Stacking

Basic Idea

The basic idea behind stacked generalization is to train a pool of base classifiers, then train another classifier to combine their predictions, with the aim of reducing the generalization error.

Algorithms

Standard Stacking

  • Split the training set $(\text{labels}, F)$ into $k$ folds $f_1, f_2, \dots, f_k$. For each fold in turn, fit learner $L_i$ on the other $k-1$ folds and use it to predict the held-out fold, giving predictions on $f_1, f_2, \dots, f_k$ denoted $V_{i1}, V_{i2}, \dots, V_{ik}$.
  • Meanwhile, each time we obtain such an $L_i$, use it to make predictions on the whole test set, so that we end up with $k$ sets of predictions $P_{i1}, P_{i2}, \dots, P_{ik}$.
  • $V_i = (V_{i1}; V_{i2}; \dots; V_{ik})$
    $P_i = \frac{1}{k} \sum_{m=1}^{k} P_{im}$, or another averaging method
  • Repeat the steps above for $i = 1$ to $n$ with different learners, which we collectively call Model 1, giving
    $V = (V_1, V_2, \dots, V_n)$
    $P = (P_1, P_2, \dots, P_n)$
  • Treat $(\text{labels}, V)$ and $P$ as the new training set and test set respectively, then train Model 2 on them and predict to get the final results. Model 2 is usually Logistic Regression; popular non-linear algorithms for stacking are GBM, KNN, NN, RF and ET (extra trees). A code sketch of the whole procedure follows this list.
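
The recipe above maps directly onto a few lines of scikit-learn. Below is a minimal sketch, assuming binary classification with probability outputs; the toy data, the choice of base learners, and $k = 5$ are placeholder assumptions, not part of the method:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import KFold
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression

# Toy data standing in for the real train/test split.
X_train, y_train = make_classification(n_samples=500, random_state=0)
X_test, _ = make_classification(n_samples=200, random_state=1)

base_learners = [RandomForestClassifier(random_state=0),
                 GradientBoostingClassifier(random_state=0)]  # "Model 1"
k = 5
kf = KFold(n_splits=k, shuffle=True, random_state=0)  # one fixed partition, reused by every learner

n = len(base_learners)
V = np.zeros((len(X_train), n))  # out-of-fold predictions -> new training set
P = np.zeros((len(X_test), n))   # averaged test predictions -> new test set

for i, learner in enumerate(base_learners):
    for train_idx, val_idx in kf.split(X_train):
        learner.fit(X_train[train_idx], y_train[train_idx])
        # V_im: predictions on the held-out fold
        V[val_idx, i] = learner.predict_proba(X_train[val_idx])[:, 1]
        # P_im: predictions on the whole test set, averaged over the k folds
        P[:, i] += learner.predict_proba(X_test)[:, 1] / k

# "Model 2": train on (labels, V), predict on P
meta = LogisticRegression()
meta.fit(V, y_train)
final_pred = meta.predict(P)
```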

In a word, it models

$$y = \sum_{i=1}^{n} w_i g_i$$

where $w_i$ is the weight of the $i$-th learner and $g_i$ is the corresponding prediction. A few details deserve attention:

  • Averaging works when the task is regression, or when it is classification and the Model 1 learners output probabilities. In other cases, voting can be better than averaging (see the illustration after this list).
  • In fact, you can also get $P_i$ by simply training $L_i$ on the whole training set and making predictions on the test set, which may consume more computing resources but slightly lowers the coding complexity.
  • The partition of the training set into folds must be the same for all $n$ estimators, especially when you are working as a team; otherwise it will lead to information leakage and therefore over-fitting.
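
As a tiny illustration of the averaging-versus-voting distinction, here are hypothetical probability outputs from three learners on two samples; note that the two rules can disagree:

```python
import numpy as np

# Rows are learners, columns are samples (hypothetical probabilities).
probs = np.array([[0.95, 0.40],
                  [0.45, 0.30],
                  [0.45, 0.80]])

# Averaging: mean probability, then threshold.
avg_pred = (probs.mean(axis=0) > 0.5).astype(int)   # [1 0]

# Voting: each learner casts a hard 0/1 vote, majority wins.
votes = (probs > 0.5).astype(int)
vote_pred = (votes.sum(axis=0) > probs.shape[0] / 2).astype(int)  # [0 0]
```

On the first sample the average (about 0.62) says positive while the majority vote says negative, which is why the choice of combination rule matters.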

Feature-Weighted Linear Stacking

Replace $w_i$ with $\sum_k v_{ik} x_k$, where $x_k$ represents the $k$-th feature of a sample and $v_{ik}$ is the corresponding weight, so that each learner's weight becomes a linear function of the sample's features.
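
A minimal sketch of how this can be fit, on random placeholder data: building one column per product $x_k g_i$ reduces the model to an ordinary linear regression over those products.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Placeholder data: X holds the per-sample features x_k,
# G holds the base learners' predictions g_i.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))  # (n_samples, n_features)
G = rng.normal(size=(100, 2))  # (n_samples, n_learners)
y = rng.normal(size=100)

# Each column of Z is one product x_k * g_i, so fitting a linear model on Z
# learns the v_ik in y = sum_i sum_k v_ik * x_k * g_i.
Z = np.einsum('sk,si->ski', X, G).reshape(len(X), -1)
model = LinearRegression(fit_intercept=False)
model.fit(Z, y)
```

Including a constant feature among the $x_k$ makes standard linear stacking a special case of this model.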

Further

We can also append the predictions of Model 1 to the original features to obtain an expanded feature set; note that the predictions and the raw features are on different scales, therefore normalization is necessary.
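
A minimal sketch of such an expansion, assuming the $F$ and $V$ matrices from the first sketch (the shapes here are placeholders):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# F: original features, V: Model 1's out-of-fold predictions.
rng = np.random.default_rng(0)
F = rng.random((500, 20))
V = rng.random((500, 3))

# Raw features and predicted probabilities live on different scales,
# so standardize each block before concatenating.
expanded = np.hstack([StandardScaler().fit_transform(F),
                      StandardScaler().fit_transform(V)])
```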
