https://www.mql5.com/en/articles/4227
https://www.mql5.com/en/articles/4228
https://www.mql5.com/en/articles/4722
https://blog.csdn.net/zwqjoy/article/details/80431496
https://www.jianshu.com/p/11083abc5738
Ensemble Learning
1995-
1. Classifier ensembles
1.1 How is an ensemble model defined?
1.2 Is this the right direction?
In “Multiple Classifier Combination: Lessons and Next Steps”, published in 2002, Tin Kam Ho wrote:
“Instead of looking for the best set of features and the best classifier, now we look for the best set of classifiers and then the best combination method. One can imagine that very soon we will be looking for the best set of combination methods and then the best way to use them all. If we do not take the chance to review the fundamental problems arising from this challenge, we are bound to be driven into such an infinite recurrence, dragging along more and more complicated combination schemes and theories, and gradually losing sight of the original problem.”
In other words, we risk getting trapped in a cycle of building ever more complex combination schemes while losing sight of the original problem.
2. Angles of approach
Idea: move toward an ensemble model at each level of the design.
2.1 Combiner
- Non-trainable. An example of such a method is simple “majority voting”.
- Trainable. This group includes “weighted majority voting” and “Naive Bayes”, as well as the “classifier selection” approach, where the decision on a given object is made by a single classifier of the ensemble.
- Meta classifier. The outputs of the base classifiers are treated as inputs for a new classifier to be trained, which becomes the combiner. This approach is called “complex generalization”, “generalization through training”, or simply “stacking”. Building a training set for the meta classifier is one of the main problems of this combiner.
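Below is a minimal sketch of the first two combiner types, assuming scikit-learn; the synthetic dataset, the base learners, and the voting weights are illustrative choices, not taken from the source (a stacking sketch follows in section 3.5).

```python
# Non-trainable vs. trainable combiners, sketched with scikit-learn.
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
base = [("lr", LogisticRegression(max_iter=1000)),
        ("nb", GaussianNB()),
        ("dt", DecisionTreeClassifier(max_depth=3))]

# Non-trainable combiner: simple majority voting over the base outputs.
majority = VotingClassifier(estimators=base, voting="hard")

# Trainable combiner: weighted voting; the weights are tunable parameters.
weighted = VotingClassifier(estimators=base, voting="soft", weights=[2, 1, 1])

for name, model in [("majority voting", majority), ("weighted voting", weighted)]:
    print(name, model.fit(X, y).score(X, y))
```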
2.2 Diversity
How can diversity among the ensemble members be generated? The following options are suggested.
- Manipulate the training parameters. Use different approaches and parameters when training the individual base classifiers. For example, the neuron weights in the hidden layers of each base classifier’s neural network can be initialized with different random values. The hyperparameters can also be set randomly.
- Manipulate the samples — take a custom bootstrap sample from the training set for each member of the ensemble.
- Manipulate the predictors — prepare a custom set of randomly determined predictors for each base classifier. This is the so-called vertical split of the training set.
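As a rough illustration (assuming numpy and scikit-learn; the dataset, subset sizes, and depth range are arbitrary choices), the three sources of diversity can be combined per ensemble member:

```python
# Generating diversity: bootstrap samples, random predictor subsets,
# and varied training parameters, one set per ensemble member.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
rng = np.random.default_rng(0)
members = []

for i in range(10):
    # Manipulate the samples: a bootstrap sample for this member.
    rows = rng.integers(0, len(X), size=len(X))
    # Manipulate the predictors: a random subset of columns ("vertical split").
    cols = rng.choice(X.shape[1], size=8, replace=False)
    # Manipulate the training parameters: vary a hyperparameter per member.
    depth = int(rng.integers(2, 6))
    clf = DecisionTreeClassifier(max_depth=depth, random_state=i)
    clf.fit(X[rows][:, cols], y[rows])
    members.append((clf, cols))

# Combine the members by simple majority voting.
votes = np.array([clf.predict(X[:, cols]) for clf, cols in members])
majority = (votes.mean(axis=0) > 0.5).astype(int)
print("training accuracy:", (majority == y).mean())
```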
2.3 Ensemble size
How do we determine the number of classifiers in an ensemble? Is the ensemble built by training the required number of classifiers all at once, or iteratively by adding/removing classifiers? Possible options:
- The number is fixed in advance.
- The number is set in the course of training (e.g. boosting).
- Classifiers are overproduced and a subset is then selected.
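A small sketch of these options, assuming scikit-learn (the dataset, the counts, and the naive selection criterion are illustrative only):

```python
# Three ways of determining the ensemble size.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# 1) Fixed in advance: the number of trees is chosen before training.
fixed = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# 2) Set in the course of training: boosting with early stopping keeps adding
#    learners until the validation score stops improving.
grown = GradientBoostingClassifier(n_estimators=500, n_iter_no_change=10,
                                   validation_fraction=0.2,
                                   random_state=0).fit(X, y)
print("trees actually used:", grown.n_estimators_)

# 3) Overproduce and select: train many members, then keep only a subset
#    (here, naively, the individually best trees; real selection is smarter).
pool = RandomForestClassifier(n_estimators=300, random_state=0).fit(X, y)
scored = sorted(pool.estimators_, key=lambda t: t.score(X, y), reverse=True)
selected = scored[:30]   # a pruned ensemble of the 30 best-scoring members
```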
3. Mainstream ensemble methods: bagging / boosting / stacking
3.1 Motivation
Variance / bias / predictive performance (end to end)
3.2 How the main ensemble methods arise
- reduce variance — bagging;
- reduce bias — boosting;
- improve predictions — stacking.
- Parallel methods of constructing an ensemble, where the base models are generated in parallel (for example, a random forest). The idea is to exploit the independence between the base models and to reduce the error by averaging. Hence, the main requirement for the models is low mutual correlation and high diversity.
- Sequential ensemble methods, where the base models are generated sequentially (for example, AdaBoost, XGBoost). The main idea here is to exploit the dependency between the base models: the overall quality can be increased by assigning higher weights to examples that were previously misclassified.
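For concreteness, a brief sketch contrasting the two families, assuming scikit-learn; the models and dataset are illustrative:

```python
# Parallel (random forest) vs. sequential (AdaBoost) ensemble construction.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# Parallel: independent trees on bootstrap samples, errors reduced by averaging.
parallel = RandomForestClassifier(n_estimators=200, random_state=0)

# Sequential: each new weak learner focuses on previously misclassified examples.
sequential = AdaBoostClassifier(n_estimators=200, random_state=0)

for name, model in [("random forest", parallel), ("AdaBoost", sequential)]:
    print(name, cross_val_score(model, X, y, cv=5).mean())
```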
3.3 Bagging
Classifiers are computed in parallel -> their outputs are combined.
- A bootstrap sample is extracted from the training set;
- Each classifier is trained on its own sample;
- The individual outputs from the separate classifiers are combined into one class label. If the individual outputs have the form of a class label, simple majority voting is used. If the output of the classifiers is a continuous variable, either averaging is applied, or the variable is converted into a class label followed by simple majority voting.
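These three steps can be sketched by hand as follows (assuming numpy and scikit-learn; the base learner and ensemble size are arbitrary):

```python
# A hand-rolled bagging sketch following the steps above.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
rng = np.random.default_rng(0)
classifiers = []

for _ in range(25):
    # 1) Extract a bootstrap sample from the training set.
    idx = rng.integers(0, len(X), size=len(X))
    # 2) Train each classifier on its own sample.
    clf = DecisionTreeClassifier().fit(X[idx], y[idx])
    classifiers.append(clf)

# 3) Combine individual outputs. Class-label outputs: simple majority voting.
labels = np.array([clf.predict(X) for clf in classifiers])
vote = (labels.mean(axis=0) > 0.5).astype(int)

#    Continuous outputs (e.g. probabilities): average, then convert to a label.
proba = np.mean([clf.predict_proba(X)[:, 1] for clf in classifiers], axis=0)
avg_label = (proba > 0.5).astype(int)
print("majority-vote accuracy:", (vote == y).mean())
```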
3.4 Boosting
Boosting refers to a family of algorithms that turn weak learners into a strong learner.
The main principle of boosting is to train a sequence of weak learners, i.e. models only slightly better than random guessing (for example, small decision trees), on weighted versions of the data, with larger weights assigned to examples that were misclassified in earlier rounds.
- First, a base learner is trained on the initial training set;
- The distribution over the training samples is then adjusted according to the performance of this base learner, so that samples misclassified by earlier base learners receive more attention later;
- The next base learner is trained on the adjusted sample distribution;
- These steps are repeated until the number of base learners reaches a predefined value T; the T base learners are finally combined with weights.
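A hand-rolled AdaBoost-style sketch of this loop, assuming numpy and scikit-learn (decision stumps as weak learners, labels mapped to {-1, +1}; T and the dataset are illustrative):

```python
# Minimal AdaBoost-style reweighting loop.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y01 = make_classification(n_samples=500, n_features=20, random_state=0)
y = np.where(y01 == 1, 1, -1)          # binary labels in {-1, +1}

n, T = len(y), 50
w = np.full(n, 1.0 / n)                # start from a uniform sample distribution
learners, alphas = [], []

for _ in range(T):
    stump = DecisionTreeClassifier(max_depth=1)      # a weak learner
    stump.fit(X, y, sample_weight=w)                 # trained on the weighted data
    pred = stump.predict(X)
    err = np.clip(np.sum(w[pred != y]), 1e-10, 1 - 1e-10)   # weighted error
    alpha = 0.5 * np.log((1 - err) / err)            # weight of this learner
    w *= np.exp(-alpha * y * pred)                   # up-weight misclassified samples
    w /= w.sum()                                     # renormalise the distribution
    learners.append(stump)
    alphas.append(alpha)

# Final prediction: weighted combination of the T weak learners.
agg = np.sign(sum(a * m.predict(X) for m, a in zip(learners, alphas)))
print("training accuracy:", (agg == y).mean())
```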
3.5 Stacking
The meta model is trained using the outputs of the base models as its features, somewhat like a food chain where one level feeds the next.
The base models usually involve different learning algorithms, so stacking is typically a heterogeneous ensemble.
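A minimal stacking sketch, assuming scikit-learn; the heterogeneous base models and the meta model are illustrative choices:

```python
# Stacking: out-of-fold predictions of heterogeneous base models become the
# features of a meta model.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# Heterogeneous base models: different learning algorithms, as noted above.
base = [("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
        ("knn", KNeighborsClassifier()),
        ("svm", SVC(probability=True, random_state=0))]

# The meta model is trained on cross-validated predictions of the base models
# (cv=5), which is how the training set for the meta classifier is built here.
stack = StackingClassifier(estimators=base,
                           final_estimator=LogisticRegression(),
                           cv=5)
print("training accuracy:", stack.fit(X, y).score(X, y))
```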
3.6 Comparison of ensemble methods
- Bagging (e.g. random forest): mainly reduces variance.
- Boosting (e.g. AdaBoost): mainly corrects bias.
- Stacking (e.g. Model/Model/Model/… -> XGBoost / Neural Network / AdaBoost … -> final model): mainly improves overall predictive performance.