Ensemble

Averaging

Linear blending of two models with significantly different behavior.

1) Simple average of multiple models' outputs

2) Weighted average of multiple models; the weights can be determined by linear blending on a validation set

3) Conditional averaging: pick different models under different conditions (all three are sketched below)
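
A minimal numpy sketch of all three (hypothetical arrays: pred1/pred2 are two diverse models' test predictions, pred1_val/pred2_val/y_val are their validation counterparts, and x is a feature used for the conditional case):

import numpy as np

# 1) simple average of two models' outputs
pred_avg = (pred1 + pred2) / 2

# 2) weighted average; the weight is found by linear blending on validation
ws = np.linspace(0, 1, 101)
mse = [np.mean((w * pred1_val + (1 - w) * pred2_val - y_val) ** 2) for w in ws]
w = ws[int(np.argmin(mse))]
pred_blend = w * pred1 + (1 - w) * pred2

# 3) conditional averaging: trust one model below a threshold, the other above
pred_cond = np.where(x < 0.5, pred1, pred2)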

Bagging

Averaging different versions of the same model. Models trained under different conditions have different biases and variances; a simple average of them reduces the final model's variance and thus improves accuracy.
 
Ways to bag (the bagged models are independent of each other, so training can be parallelized):
  • Change the random seed, so each model starts from different initial parameters
  • Row sampling or bootstrapping, i.e. sampling with replacement
  • Shuffling: shuffle the dataset to make training more robust
  • Column sampling, equivalent to sampling the features of the dataset
  • Number of models: the more models, the better the blend tends to be
 
Random forest is itself a good implementation of the bagging idea. A simple example shows what bagging looks like in code:
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# hypothetical train / y / test arrays
model = RandomForestRegressor()
bags = 10
seed = 1
bagged_prediction = np.zeros(test.shape[0])
for n in range(bags):
    # vary the seed so every bag is a genuinely different version of the model
    model.set_params(random_state=seed + n)
    model.fit(train, y)
    bagged_prediction += model.predict(test)
bagged_prediction /= bags  # simple average over all bags

Boosting

Models are built sequentially: each new model's optimization depends on the performance of the previous ones.
1) Weight-based boosting
Boosting parameters:
  • Learning rate (shrinkage, eta): controls how strongly the sample weights are updated
  • Number of models: estimators
  • Input model: the base learner
  • AdaBoost is the classic example (see the sketch below)
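
A minimal sketch of weight-based boosting via sklearn's AdaBoost (assuming scikit-learn >= 1.2, where the base learner is passed as estimator; X_train/y_train/X_test are hypothetical):

from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

ada = AdaBoostClassifier(
    estimator=DecisionTreeClassifier(max_depth=1),  # input model: a weak learner
    n_estimators=100,                               # number of models
    learning_rate=0.5,                              # shrinkage / eta
    random_state=1,
)
ada.fit(X_train, y_train)   # reweights misclassified samples each round
preds = ada.predict(X_test)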
2) Residual-based boosting (a minimal sketch follows this list)
Boosting parameters:
  • Learning rate (shrinkage, eta): scales each new model's contribution
  • Number of models: estimators
  • Row sampling
  • Column sampling
  • Input model: the base learner
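
A hand-rolled sketch of residual-based boosting with squared loss (hypothetical X_train/y_train/X_test arrays): each tree fits the residuals the current ensemble still gets wrong, and the learning rate shrinks its contribution:

import numpy as np
from sklearn.tree import DecisionTreeRegressor

eta = 0.1           # learning rate / shrinkage
n_estimators = 100
train_pred = np.zeros(len(y_train))
test_pred = np.zeros(len(X_test))
for _ in range(n_estimators):
    residual = y_train - train_pred           # what is left to explain
    tree = DecisionTreeRegressor(max_depth=3)
    tree.fit(X_train, residual)
    train_pred += eta * tree.predict(X_train)
    test_pred += eta * tree.predict(X_test)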
 
Typical implementations include:
  • xgboost
  • lightgbm
  • catboost
  • sklearn's GBM
  • H2O's GBM

Stacking

Usually a two-level stack of models.
Handling time-series data: splits for the meta model must respect temporal order (see the tips below).
 
 
Generating model diversity
Stacking will find out when a model is good and when a model is actually bad or fairly weak, so you don't need to worry too much about making every model really strong; stacking can extract the juice from each prediction. What you really need to focus on is: am I making a model that brings in some information, even if it is generally weak? This is true in practice: there have been many situations where I had quite weak models in my ensemble (compared to the top performers), and they nevertheless added a lot of value in stacking, because they brought in new information that the meta model could leverage.
Normally, you introduce diversity in two ways:
1) Choose a different algorithm. This makes sense: different algorithms capitalize on different relationships within the data. For example, a linear model focuses on linear relationships, while a non-linear model can better capture non-linear relationships, so their predictions will come out somewhat different.
2) Run the same model on different transformations of the input data: fewer features, or a completely different transformation. For example, in one dataset you may one-hot encode the categorical features; in another you may use label encoding, and the result will probably be a very different model.
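
Putting the pieces together, here is a minimal sketch of the classic two-level scheme with out-of-fold predictions (hypothetical numpy arrays X, y, X_test):

import numpy as np
from sklearn.model_selection import KFold
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge, LinearRegression

base_models = [RandomForestRegressor(random_state=1), Ridge()]  # diverse level-1 models
# for time-sensitive data, replace the shuffled KFold with time-ordered splits
kf = KFold(n_splits=5, shuffle=True, random_state=1)

meta_train = np.zeros((len(X), len(base_models)))    # out-of-fold predictions
meta_test = np.zeros((len(X_test), len(base_models)))
for j, model in enumerate(base_models):
    for tr_idx, val_idx in kf.split(X):
        model.fit(X[tr_idx], y[tr_idx])
        meta_train[val_idx, j] = model.predict(X[val_idx])
    model.fit(X, y)                                   # refit on all data for the test set
    meta_test[:, j] = model.predict(X_test)

meta_model = LinearRegression()                       # keep the meta model modest
meta_model.fit(meta_train, y)
final_pred = meta_model.predict(meta_test)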
 
Common strategies
  • With time-sensitive data, respect time
  • Diversity is as important as performance
  • Diversity may come from:
    • Different algorithms
    • Different input features
  • Performance plateaus after N models
  • The meta model is normally modest (the top-level stacking model does not need to be complex)

StackNet

Multi-level stacking: 3 or 4 layers of different models stacked on top of each other.
 
In summary: use a large number of different models, build different features, train different models on different feature groups to obtain diverse results, then stack those results layer by layer in multi-level training to produce the final output.
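
A compact sketch of the multi-level idea, where each level's out-of-fold predictions become the next level's features (hypothetical X, y, X_test arrays; cross_val_predict does the out-of-fold step):

import numpy as np
from sklearn.model_selection import cross_val_predict
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.linear_model import Ridge, LinearRegression

def stack_level(models, X, y, X_test):
    # out-of-fold predictions on train, full-refit predictions on test
    oof = np.column_stack([cross_val_predict(m, X, y, cv=5) for m in models])
    test = np.column_stack([m.fit(X, y).predict(X_test) for m in models])
    return oof, test

level1 = [RandomForestRegressor(random_state=1), GradientBoostingRegressor(), Ridge()]
level2 = [Ridge(), RandomForestRegressor(max_depth=3, random_state=1)]

X1, T1 = stack_level(level1, X, y, X_test)          # level 1
X2, T2 = stack_level(level2, X1, y, T1)             # level 2 stacks level-1 outputs
final = LinearRegression().fit(X2, y).predict(T2)   # simple model at the top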
 
Example 1

Example 2:

Tips About StackNet
  • Supports stacking many different models
  • You can use classifiers in regression problems and vice versa (very useful)
  • Remember: the deeper the level, the simpler the models should be
 
Typical model choices
First-level models
Second-level models