Stacking

Basic Idea

The basic idea behind stacked generalization is to train a pool of base classifiers, then train another classifier to combine their predictions, with the aim of reducing the generalization error.

Algorithms

Standard Stacking

  • Split the training set $(\text{labels}, F)$ into $k$ folds $f_1, f_2, \dots, f_k$. For each fold in turn, fit learner $L_i$ on the other $k-1$ folds and use it to predict the held-out fold, giving predictions on $f_1, f_2, \dots, f_k$ denoted $V_{i1}, V_{i2}, \dots, V_{ik}$.
  • Meanwhile, each time we obtain such an $L_i$, use it to make predictions on the whole test set, so that we end up with $k$ sets of predictions $P_{i1}, P_{i2}, \dots, P_{ik}$.
  • $V_i = (V_{i1}; V_{i2}; \dots; V_{ik})$
    $P_i = \frac{1}{k} \sum_{m=1}^{k} P_{im}$, or another averaging method
  • Repeat the steps above for $i = 1$ to $n$ with different learners, which we collectively call Model 1, giving
    $V = (V_1, V_2, \dots, V_n)$
    $P = (P_1, P_2, \dots, P_n)$
  • Treat $(\text{labels}, V)$ and $P$ as the new training set and test set respectively, then train Model 2 on them and predict to get the final results. Model 2 is usually Logistic Regression; popular non-linear algorithms for stacking are GBM, KNN, NN, RF and ET (extra trees). A code sketch of the whole procedure follows this list.
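
The recipe above maps directly onto a few lines of scikit-learn. Below is a minimal sketch, assuming binary classification with probability outputs; the toy data, the choice of base learners, and $k = 5$ are placeholder assumptions, not part of the method:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import KFold
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression

# Toy data standing in for the real train/test split.
X_train, y_train = make_classification(n_samples=500, random_state=0)
X_test, _ = make_classification(n_samples=200, random_state=1)

base_learners = [RandomForestClassifier(random_state=0),
                 GradientBoostingClassifier(random_state=0)]  # "Model 1"
k = 5
kf = KFold(n_splits=k, shuffle=True, random_state=0)  # one fixed partition, reused by every learner

n = len(base_learners)
V = np.zeros((len(X_train), n))  # out-of-fold predictions -> new training set
P = np.zeros((len(X_test), n))   # averaged test predictions -> new test set

for i, learner in enumerate(base_learners):
    for train_idx, val_idx in kf.split(X_train):
        learner.fit(X_train[train_idx], y_train[train_idx])
        # V_im: predictions on the held-out fold
        V[val_idx, i] = learner.predict_proba(X_train[val_idx])[:, 1]
        # P_im: predictions on the whole test set, averaged over the k folds
        P[:, i] += learner.predict_proba(X_test)[:, 1] / k

# "Model 2": train on (labels, V), predict on P
meta = LogisticRegression()
meta.fit(V, y_train)
final_pred = meta.predict(P)
```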

In a word, it models

$$y = \sum_{i=1}^{n} w_i g_i$$

where $w_i$ is the weight of the $i$-th learner and $g_i$ is the corresponding prediction. A few details deserve attention:

  • Averaging works when the task is regression, or when it is classification and the Model 1 learners output probabilities. In other cases, voting can be better than averaging (see the illustration after this list).
  • In fact, you can also get $P_i$ by simply training $L_i$ on the whole training set and making predictions on the test set, which may consume more computing resources but slightly lowers the coding complexity.
  • The partition of the training set into folds must be the same for all $n$ estimators, especially when you are working as a team; otherwise it will lead to information leakage and therefore over-fitting.
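
As a tiny illustration of the averaging-versus-voting distinction, here are hypothetical probability outputs from three learners on two samples; note that the two rules can disagree:

```python
import numpy as np

# Rows are learners, columns are samples (hypothetical probabilities).
probs = np.array([[0.95, 0.40],
                  [0.45, 0.30],
                  [0.45, 0.80]])

# Averaging: mean probability, then threshold.
avg_pred = (probs.mean(axis=0) > 0.5).astype(int)   # [1 0]

# Voting: each learner casts a hard 0/1 vote, majority wins.
votes = (probs > 0.5).astype(int)
vote_pred = (votes.sum(axis=0) > probs.shape[0] / 2).astype(int)  # [0 0]
```

On the first sample the average (about 0.62) says positive while the majority vote says negative, which is why the choice of combination rule matters.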

Feature-Weighted Linear Stacking

Replace $w_i$ with $\sum_k v_{ik} x_k$, where $x_k$ represents the $k$-th feature of a sample and $v_{ik}$ is the corresponding weight, so that each learner's weight becomes a linear function of the sample's features.
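
A minimal sketch of how this can be fit, on random placeholder data: building one column per product $x_k g_i$ reduces the model to an ordinary linear regression over those products.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Placeholder data: X holds the per-sample features x_k,
# G holds the base learners' predictions g_i.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))  # (n_samples, n_features)
G = rng.normal(size=(100, 2))  # (n_samples, n_learners)
y = rng.normal(size=100)

# Each column of Z is one product x_k * g_i, so fitting a linear model on Z
# learns the v_ik in y = sum_i sum_k v_ik * x_k * g_i.
Z = np.einsum('sk,si->ski', X, G).reshape(len(X), -1)
model = LinearRegression(fit_intercept=False)
model.fit(Z, y)
```

Including a constant feature among the $x_k$ makes standard linear stacking a special case of this model.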

Further

We can also append the predictions of Model 1 to the original features to obtain an expanded feature set; note that the predictions and the raw features are on different scales, therefore normalization is necessary.
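
A minimal sketch of such an expansion, assuming the $F$ and $V$ matrices from the first sketch (the shapes here are placeholders):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# F: original features, V: Model 1's out-of-fold predictions.
rng = np.random.default_rng(0)
F = rng.random((500, 20))
V = rng.random((500, 3))

# Raw features and predicted probabilities live on different scales,
# so standardize each block before concatenating.
expanded = np.hstack([StandardScaler().fit_transform(F),
                      StandardScaler().fit_transform(V)])
```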
