Ensemble Learning: Bagging, Boosting, Stacking

Gu_NN

Last modified: 2022-08-02 01:07:51


Category: Ensemble Learning. Tags: boosting, machine learning, deep learning

First published: 2021-07-23 02:03:51

Article link: https://blog.csdn.net/Gu_NN/article/details/119012265


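Bias and Variance

Let the data be generated by the model
$y_i = f(\textbf{X}_i)+\epsilon_i,i\in\{1,2,...,n\}$
where the noise $\epsilon$ is assumed to have mean 0 and variance $\sigma^2$ and to be independent of the training set $D$. Training aims to minimize the loss $L$, the expected squared error of the estimate $\hat{f}_D$ learned from $D$: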

$\begin{aligned} L(\hat{f}) & = \mathbb{E}_D(y-\hat{f}_D)^2\\ & = \mathbb{E}_D(f+\epsilon-\hat{f}_D+\mathbb{E}_D[\hat{f}_{D}]-\mathbb{E}_D[\hat{f}_{D}])^2 \\ & = \mathbb{E}_D[(f-\mathbb{E}_D[\hat{f}_{D}])+(\mathbb{E}_D[\hat{f}_{D}]-\hat{f}_D)+\epsilon]^2 \\ & = \mathbb{E}_D[(f-\mathbb{E}_D[\hat{f}_{D}])^2] + \mathbb{E}_D[(\mathbb{E}_D[\hat{f}_{D}]-\hat{f}_D)^2] + \mathbb{E}_D[\epsilon^2] + 2\mathbb{E}_D[(f-\mathbb{E}_D[\hat{f}_{D}])(\mathbb{E}_D[\hat{f}_{D}]-\hat{f}_D)] + 2\mathbb{E}_D[\epsilon(f-\mathbb{E}_D[\hat{f}_{D}])]+ 2\mathbb{E}_D[\epsilon(\mathbb{E}_D[\hat{f}_{D}]-\hat{f}_D)] \\ & = \mathbb{E}_D[(f-\mathbb{E}_D[\hat{f}_{D}])^2] +\mathbb{E}_D[(\mathbb{E}_D[\hat{f}_{D}]-\hat{f}_D)^2] + \mathbb{E}_D[\epsilon^2] \\ & = [f-\mathbb{E}_D[\hat{f}_{D}]]^2 + \mathbb{E}_D[(\mathbb{E}_D[\hat{f}_{D}]-\hat{f}_D)^2] + \sigma^2 \end{aligned}$

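The last three terms of the fourth line (the cross terms) have zero expectation: $f-\mathbb{E}_D[\hat{f}_{D}]$ is a constant and $\mathbb{E}_D[\mathbb{E}_D[\hat{f}_{D}]-\hat{f}_D]=0$ , while $\epsilon$ has zero mean and is independent of $\hat{f}_D$ . The fourth line therefore reduces to the fifth.
The three terms on the right-hand side of the final equality are:

The first term is the squared bias: the squared gap between the true function value and the model's average prediction.
The second term is the variance of the model's predictions.
The third term is the irreducible noise in the data.

The smaller the bias, the better the model fits the underlying function; the smaller the variance, the more robust the model is to perturbations of the training data. Bias and variance generally cannot both be minimized at once, so a trade-off between them has to be chosen.
Ensemble learning either combines several low-bias learners to reduce the variance of the model, or combines several low-variance learners to reduce its bias.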
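
The decomposition can be checked numerically. The sketch below is only an illustration and not part of the derivation: it assumes a hypothetical true function $f(x)=\sin(2\pi x)$, Gaussian noise, and polynomial least-squares fits as the model family (the names `true_f` and `bias_variance_at_point` are made up for this example). It estimates the squared bias and the variance of the prediction at a single point over many simulated training sets.

```python
import numpy as np


def true_f(x):
    """Assumed ground-truth function f (chosen only for illustration)."""
    return np.sin(2 * np.pi * x)


def bias_variance_at_point(degree, x0=0.25, n_datasets=500, n_points=30,
                           noise_std=0.3, seed=0):
    """Estimate squared bias and variance of a polynomial fit at x0."""
    rng = np.random.default_rng(seed)
    preds = np.empty(n_datasets)
    for d in range(n_datasets):
        x = rng.uniform(0.0, 1.0, n_points)                    # a new training set D
        y = true_f(x) + rng.normal(0.0, noise_std, n_points)   # y = f(x) + eps
        coefs = np.polyfit(x, y, degree)                       # fit f_hat_D
        preds[d] = np.polyval(coefs, x0)                       # f_hat_D(x0)
    bias_sq = (true_f(x0) - preds.mean()) ** 2                 # [f - E_D f_hat_D]^2
    variance = preds.var()                                     # E_D[(E_D f_hat_D - f_hat_D)^2]
    mse_to_f = np.mean((true_f(x0) - preds) ** 2)              # E_D[(f - f_hat_D)^2]
    return bias_sq, variance, mse_to_f


for degree in (1, 5):
    b, v, m = bias_variance_at_point(degree)
    print("degree %d: bias^2=%.4f variance=%.4f sum=%.4f mse=%.4f"
          % (degree, b, v, b + v, m))
```

In this setup the straight line (degree 1) cannot follow the sine curve and carries a large squared bias, while the degree-5 polynomial has a much smaller one. Adding the noise variance $\sigma^2$ to `mse_to_f` gives the full expected loss of the derivation.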

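Voting

Voting fuses the results of several models by majority rule. It helps improve generalization and reduce the error rate.

Regression models
The final prediction is the average of the predictions of the individual regression models.
Classification models
- Hard voting: the prediction is the class predicted most often by the individual models.
- Soft voting: the predicted probabilities for each class are summed over the models, and the class label with the largest sum is chosen.

Voting works well when two conditions hold:

The base models should not differ too much in quality. A base model that is much worse than the others is likely to act as noise.
The base models should be as heterogeneous as possible. For example, when the base models predict about equally well, voting between a tree model and a linear model usually beats voting between two tree models or two linear models.

Drawback: every submodel contributes equally to the prediction, so plain voting cannot take into account that some models do well in certain situations and poorly in others.

Python library: sklearn provides VotingRegressor and VotingClassifier.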
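
As a minimal sketch of hard versus soft voting, the example below combines three heterogeneous classifiers on the iris dataset. The choice of base models, their hyperparameters and the helper name `make_voter` are assumptions made for this illustration.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)


def make_voter(voting):
    """Build a voting ensemble of a linear, a tree and a neighbor model."""
    estimators = [
        ("lr", LogisticRegression(max_iter=1000)),
        ("tree", DecisionTreeClassifier(max_depth=3, random_state=0)),
        ("knn", KNeighborsClassifier(n_neighbors=5)),
    ]
    # voting="hard": majority class; voting="soft": largest summed probability
    return VotingClassifier(estimators=estimators, voting=voting)


scores = {}
for voting in ("hard", "soft"):
    scores[voting] = cross_val_score(make_voter(voting), X, y, cv=5).mean()
    print("%s voting accuracy: %.3f" % (voting, scores[voting]))
```

Soft voting requires every base model to implement `predict_proba`, which all three models chosen here do.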

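Ensemble Learning

The process of combining several weak classifiers into a strong classifier.

Bagging

Bagging is a parallel ensemble method. Its full name is bootstrap aggregating, i.e. an aggregation algorithm based on bootstrap sampling.

Bootstraps

Bootstrap sampling is sampling with replacement.

Bagging

Idea
Draw several sample sets with the bootstrap, train a base learner on each sample set, and then average the outputs of these base learners.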
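Model variance for regression tasks
Suppose each base learner's output $y^{(i)}$ has variance $\sigma^2$ and any two base learners have correlation coefficient $\rho$ . The variance of the ensemble output is then:
$\begin{aligned} Var(\hat{y})&=Var(\frac{\sum_{i=1}^My^{(i)}}{M})\\ &= \frac{1}{M^2}[\sum_{i=1}^MVar(y^{(i)})+\sum_{i\neq j}Cov(y^{(i)},y^{(j)})]\\ &= \frac{1}{M^2}[M\sigma^2+M(M-1)\rho\sigma^2]\\ &= \rho\sigma^2 + (1-\rho)\frac{\sigma^2}{M} \end{aligned}$
Sampling with replacement means that any two models are likely to differ in some of the samples they are trained on, so $\rho<1$ . The ensemble variance decreases as the correlation between models decreases; to reduce that correlation further, the base learners themselves need additional design.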
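
The closed form $\rho\sigma^2 + (1-\rho)\frac{\sigma^2}{M}$ can be checked by simulation. The sketch below assumes, purely for illustration, that the base learner outputs are jointly Gaussian with equal variances and equal pairwise correlations; the function names are hypothetical.

```python
import numpy as np


def ensemble_variance(sigma2, rho, M):
    """Closed-form variance of the average of M equicorrelated outputs."""
    return rho * sigma2 + (1.0 - rho) * sigma2 / M


def simulated_ensemble_variance(sigma2, rho, M, n_draws=200_000, seed=0):
    """Monte Carlo estimate of the same quantity (Gaussian outputs assumed)."""
    rng = np.random.default_rng(seed)
    # Covariance matrix: sigma2 on the diagonal, rho * sigma2 elsewhere
    cov = sigma2 * (rho * np.ones((M, M)) + (1.0 - rho) * np.eye(M))
    outputs = rng.multivariate_normal(np.zeros(M), cov, size=n_draws)
    return outputs.mean(axis=1).var()


sigma2, rho, M = 4.0, 0.3, 10
print("closed form:", ensemble_variance(sigma2, rho, M))
print("simulated:  ", simulated_ensemble_variance(sigma2, rho, M))
```

As $M$ grows, the second term vanishes and the variance approaches the floor $\rho\sigma^2$, which is why lowering the correlation between base learners matters more than adding further learners.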
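Differences between the sampled datasets
Let the dataset contain N samples and let n draws be made with replacement.
- Probability that a given sample is selected at least once
  $1-(1-\frac{1}{N})^n$
  For a standard bootstrap $n=N$ , and as $N\rightarrow\infty$ the probability tends to $1-e^{-1}\approx 0.632$ .
- Expected number of distinct samples selected, where $A_i$ is the event that sample $i$ is selected at least once
  $\begin{aligned} \mathbb{E}\sum_{i=1}^N\mathbb{1}_{\{A_i\}} &= \sum_{i=1}^N\mathbb{E}\mathbb{1}_{\{A_i\}} \\ &= \sum_{i=1}^NP(A_i)\\ &= \sum_{i=1}^N \left[1-(1-\frac{1}{N})^n\right]\\ &= N[1-(1-\frac{1}{N})^n] \end{aligned}$
  With $n=N$ , the expected fraction of the original dataset that is selected is $\lim_{N\to \infty}\frac{\mathbb{E}\sum_{i=1}^N\mathbb{1}_{\{A_i\}}}{N}=\lim_{N\to\infty} [1-(1-\frac{1}{N})^N]=1-e^{-1}$
Suppose the population has 100 samples and each round draws 10 samples from it by bootstrap sampling. The expected number of rounds until every sample has been drawn at least once is approximately $\frac{100}{10}\sum_{i=1}^{100}\frac{1}{i}\approx 10\times 5.187\approx 52$ (the coupon collector's problem).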
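
These quantities are easy to verify numerically. The sketch below (function names are hypothetical) compares the inclusion probability for a standard bootstrap with $n=N$ against a simulation, and evaluates the coupon-collector sum used in the 100-sample example.

```python
import numpy as np


def inclusion_probability(N, n):
    """Probability that a given sample appears at least once in n draws."""
    return 1.0 - (1.0 - 1.0 / N) ** n


def simulated_unique_fraction(N, n_rounds=2000, seed=0):
    """Average fraction of distinct samples in a bootstrap sample of size N."""
    rng = np.random.default_rng(seed)
    fractions = [np.unique(rng.integers(0, N, size=N)).size / N
                 for _ in range(n_rounds)]
    return float(np.mean(fractions))


def expected_draws_to_see_all(N):
    """Coupon collector: expected single draws until all N samples are seen."""
    return N * sum(1.0 / i for i in range(1, N + 1))


print("analytic  :", inclusion_probability(1000, 1000))
print("simulated :", simulated_unique_fraction(1000))
print("limit     :", 1.0 - np.exp(-1.0))
# About 518.7 single draws, i.e. roughly 52 rounds of 10 draws each
print("draws for N=100:", expected_draws_to_see_all(100))
```

Dividing the expected number of single draws by 10 treats rounds as fractional, so the figure of 52 rounds is an approximation.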
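Model optimization
Let the output of the i-th base model be $\hat{f}^{(i)}(\mathbf{X})$ ; the output of the overall model is then $\sum_{i=1}^M\alpha_i\hat{f}^{(i)}(\mathbf{X})$ . When a boosting algorithm fits the T-th learner, it already has the ensemble output of the first T-1 learners, $\sum_{i=1}^{T-1}\alpha_i\hat{f}^{(i)}(\mathbf{X})$ . For a loss function $L(y,\hat{y})$ , the objective in the current round is to minimize $L(y,\alpha_{T}\hat{f}^{(T)}(\mathbf{X})+\sum_{i=1}^{T-1}\alpha_i\hat{f}^{(i)}(\mathbf{X}))$ .
How the prediction error is reduced
Prediction error = squared bias + variance + variance of the noise term. The bagging output is an equally weighted average of the outputs of the trained models, so
- the more submodels there are, the smaller the bagging variance;
- the lower the correlation between submodels, the smaller the bagging variance.
Bagging and random forests
A random forest samples features as well as samples, which reduces the variance further.
Python library: sklearn provides BaggingRegressor and BaggingClassifier.
BaggingClassifier versus DecisionTreeClassifier
- Models: sklearn.ensemble.BaggingClassifier and sklearn.tree.DecisionTreeClassifier
- Dataset: the iris dataset
- Decision tree criterion: Gini impurity
- Code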

from numpy import mean
from numpy import std
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import RepeatedStratifiedKFold
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

# Load the iris dataset
iris = load_iris()
# Bagging model (its default base estimator is a decision tree)
model1 = BaggingClassifier(random_state=0)
# Plain decision tree
model2 = DecisionTreeClassifier(random_state=0)
# Repeated stratified k-fold with k = 10
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)
n_scores1 = cross_val_score(model1, iris.data, iris.target, cv=cv)
n_scores2 = cross_val_score(model2, iris.data, iris.target, cv=cv)
# Results: mean accuracy (standard deviation)
print('k = 10')
print('BaggingClassifier_Accuracy: %.3f (%.3f)' % (mean(n_scores1), std(n_scores1)))
print('DecisionTreeClassifier_Accuracy: %.3f (%.3f)' % (mean(n_scores2), std(n_scores2)))
# Repeated stratified k-fold with k = 2
cv = RepeatedStratifiedKFold(n_splits=2, n_repeats=3, random_state=1)
n_scores1 = cross_val_score(model1, iris.data, iris.target, cv=cv)
n_scores2 = cross_val_score(model2, iris.data, iris.target, cv=cv)
# Results: mean accuracy (standard deviation)
print('k = 2')
print('BaggingClassifier_Accuracy: %.3f (%.3f)' % (mean(n_scores1), std(n_scores1)))
print('DecisionTreeClassifier_Accuracy: %.3f (%.3f)' % (mean(n_scores2), std(n_scores2)))

Results
k = 10
BaggingClassifier_Accuracy: 0.949 (0.066)
DecisionTreeClassifier_Accuracy: 0.951 (0.062)
k = 2
BaggingClassifier_Accuracy: 0.942 (0.025)
DecisionTreeClassifier_Accuracy: 0.929 (0.026)
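Analysis: with 10-fold cross-validation the plain decision tree has a slightly higher mean accuracy and a slightly lower standard deviation than the bagged trees. With 2-fold cross-validation, where each model is trained on less data, bagging does better than the plain decision tree. In both cases the gap is smaller than one standard deviation. Bagging itself builds its submodels by repeated resampling, which overlaps with what cross-validation does.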

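Boosting

Basic concepts

Idea: learn repeatedly from the same dataset to obtain a sequence of simple models, then combine these models into a model with strong predictive performance.
Essential difference from bagging
Boosting reduces the prediction error by repeatedly reducing the bias, whereas bagging does so by reducing the variance.
Weak learner: a learning algorithm whose error rate is below 1/2, i.e. whose accuracy is only slightly better than random guessing.
Strong learner: a learning algorithm that achieves high accuracy and runs in polynomial time.
Most boosting methods change the probability distribution over the training data (the weights of the training samples) and call the weak learning algorithm on each distribution in turn to learn a sequence of weak classifiers.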

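Adaboost

Basic idea
Overall: 1. Raise the weights of the samples misclassified by the previous round's classifier and lower the weights of the correctly classified ones. 2. Give larger weights to weak classifiers with a lower classification error.
Step 1: initialize the distribution over the training data
Assume the initial weights are uniform:
$D_{1}=\left(w_{11}, \cdots, w_{1 i}, \cdots, w_{1 N}\right), \quad w_{1 i}=\frac{1}{N}, \quad i=1,2, \cdots, N$
Step 2: train the base classifiers $G_m(x)$ iteratively
For m=1,2,…,M
- Learn from the training set under the weight distribution $D_m$ to obtain the base classifier
  $G_{m}(x): \mathcal{X} \rightarrow\{-1,+1\}$
- Compute the classification error rate of $G_m(x)$ on the training set
  $\begin{aligned} e_{m}&=\sum_{i=1}^{N} P\left(G_{m}\left(x_{i}\right) \neq y_{i}\right)\\ &=\sum_{i=1}^{N} w_{m i} I\left(G_{m}\left(x_{i}\right) \neq y_{i}\right) \end{aligned}$
- Compute the importance of $G_m(x)$ in the overall model
  $\alpha_{m}=\frac{1}{2} \ln \frac{1-e_{m}}{e_{m}}$
  (The smaller the classification error, the more important the model and the larger its coefficient; in the next step this coefficient also gives misclassified samples larger weights.)
- Update the weight distribution over the training set
  $\begin{array}{c} D_{m+1}=\left(w_{m+1,1}, \cdots, w_{m+1, i}, \cdots, w_{m+1, N}\right) \\ w_{m+1, i}=\frac{w_{m i}}{Z_{m}} \exp \left(-\alpha_{m} y_{i} G_{m}\left(x_{i}\right)\right), \quad i=1,2, \cdots, N \end{array}$
  ( $y_{i} G_{m}(x_i)$ is positive when the prediction is correct, so correctly classified samples are down-weighted and misclassified ones are up-weighted.)
  Here $Z_m$ is the normalization factor that makes $D_{m+1}$ a probability distribution: $Z_{m}=\sum_{i=1}^{N} w_{m i} \exp \left(-\alpha_{m} y_{i} G_{m}\left(x_{i}\right)\right)$
Step 3: build the final classifier as a linear combination of the base classifiers
$\begin{aligned} G(x) &=\operatorname{sign}(f(x)) \\ &=\operatorname{sign}\left(\sum_{m=1}^{M} \alpha_{m} G_{m}(x)\right) \end{aligned}$ The $\alpha_m$ need not sum to 1. The sign of $f (x)$ determines which class sample x belongs to.
Python library: sklearn.ensemble.AdaBoostClassifier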
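
The three steps can be sketched directly in code. This is a minimal illustration rather than the implementation used by sklearn: it assumes decision stumps as base classifiers, the breast cancer dataset as a convenient binary problem, and a small clipping guard on $e_m$ that is not part of the algorithm as stated. The function names are hypothetical.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier


def adaboost_fit(X, y, n_rounds=50):
    """Fit AdaBoost with decision stumps; y must take values in {-1, +1}."""
    n_samples = X.shape[0]
    w = np.full(n_samples, 1.0 / n_samples)       # Step 1: uniform weights D_1
    stumps, alphas = [], []
    for _ in range(n_rounds):                     # Step 2: m = 1, ..., M
        stump = DecisionTreeClassifier(max_depth=1, random_state=0)
        stump.fit(X, y, sample_weight=w)          # learn G_m under D_m
        pred = stump.predict(X)
        e = np.sum(w * (pred != y))               # weighted error e_m
        e = np.clip(e, 1e-10, 1 - 1e-10)          # numerical guard (an addition)
        alpha = 0.5 * np.log((1 - e) / e)         # importance alpha_m
        w = w * np.exp(-alpha * y * pred)         # re-weight the samples
        w = w / w.sum()                           # divide by Z_m
        stumps.append(stump)
        alphas.append(alpha)
    return stumps, np.array(alphas)


def adaboost_predict(stumps, alphas, X):
    """Step 3: sign of the alpha-weighted vote of the stumps."""
    f = sum(a * s.predict(X) for a, s in zip(alphas, stumps))
    return np.where(f >= 0, 1, -1)


X, y01 = load_breast_cancer(return_X_y=True)
y = 2 * y01 - 1                                   # map {0, 1} to {-1, +1}
stumps, alphas = adaboost_fit(X, y)
train_acc = np.mean(adaboost_predict(stumps, alphas, X) == y)
print("training accuracy after %d rounds: %.3f" % (len(stumps), train_acc))
```

The accuracy printed here is measured on the training data, so it shows that the weighted vote fits better than a single stump, not how well it generalizes.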

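Forward Stagewise Algorithm

An additive ensemble model: $f(x)=\sum_{m=1}^{M} \beta_{m} b\left(x ; \gamma_{m}\right)$
where
$b\left(x ; \gamma_{m}\right)$ is a base classifier,
$\gamma_{m}$ are the parameters of the base classifier,
$\beta_m$ is the weight of the base classifier.

Given training data and a loss function $L (y, f (x))$ , learning $f (x)$ means solving:
$\min _{\beta_{m}, \gamma_{m}} \sum_{i=1}^{N} L\left(y_{i}, \sum_{m=1}^{M} \beta_{m} b\left(x_{i} ; \gamma_{m}\right)\right)$
(This is hard to solve with simple convex-optimization tools.)

The basic idea of the forward stagewise algorithm is to replace the problem of solving for all parameters $\beta_{m}$ , $\gamma_{m}$ from m=1 to M simultaneously with the simpler problem of solving for each $\beta_{m}$ , $\gamma_{m}$ one at a time. (The result is not necessarily a global optimum.)
Steps
Given the dataset $T=\left\{\left(x_{1}, y_{1}\right),\left(x_{2}, y_{2}\right), \cdots,\left(x_{N}, y_{N}\right)\right\}$ , $x_{i} \in \mathcal{X} \subseteq \mathbf{R}^{n}$ , $y_{i} \in \mathcal{Y}=\{+1,-1\}$ , a loss function $L (y, f (x))$ and a set of basis functions $\{b(x ; \gamma)\}$ , output the additive model $f (x)$ .
Step 1: initialize $f_{0}(x)=0$
Step 2: minimize the loss function iteratively
For m = 1,2,…,M:
$\left(\beta_{m}, \gamma_{m}\right)=\arg \min _{\beta, \gamma} \sum_{i=1}^{N} L\left(y_{i}, f_{m-1}\left(x_{i}\right)+\beta b\left(x_{i} ; \gamma\right)\right)$
$f_{m}(x)=f_{m-1}(x)+\beta_{m} b\left(x ; \gamma_{m}\right)$
Step 3: the final additive model
$f(x)=f_{M}(x)=\sum_{m=1}^{M} \beta_{m} b\left(x ; \gamma_{m}\right)$
Adaboost is a special case of the forward stagewise algorithm: an additive model built from base classifiers, with the exponential loss as the loss function.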
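
The link between the two algorithms can be checked numerically for a single stage. With exponential loss $L(y,f)=\exp(-yf)$ and a fixed base classifier, minimizing the weighted loss over $\beta$ should recover the Adaboost coefficient $\frac{1}{2}\ln\frac{1-e}{e}$. The sketch below uses made-up labels and a hypothetical base classifier that is right about 70% of the time; it only illustrates this one step.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 200
y = rng.choice([-1, 1], size=N)                  # synthetic labels
# Hypothetical base classifier output: agrees with y about 70% of the time
g = np.where(rng.random(N) < 0.7, y, -y)
w = np.full(N, 1.0 / N)                          # current sample weights

# Closed form from the Adaboost section
e = np.sum(w * (g != y))                         # weighted error rate
alpha_closed = 0.5 * np.log((1 - e) / e)

# Forward stagewise step: minimize sum_i w_i * exp(-beta * y_i * g_i) over beta
betas = np.linspace(0.0, 2.0, 2001)
losses = (w[None, :] * np.exp(-betas[:, None] * (y * g)[None, :])).sum(axis=1)
alpha_grid = betas[np.argmin(losses)]

print("weighted error e  :", e)
print("closed-form alpha :", alpha_closed)
print("grid-search alpha :", alpha_grid)
```

The grid search is deliberately crude; it is there to show that the minimizer of the exponential loss and the Adaboost formula agree up to the grid spacing.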

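Gradient Boosting Decision Tree (GBDT)

Boosting tree: additive model + forward stagewise framework.
Gradient boosting tree: in each round, fit the new tree to the negative gradient of the loss function with respect to the current model (a family of algorithms, one per loss).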
$\begin{array}{l|l|l} \hline \text { Setting } & \text { Loss Function } & -\partial L\left(y_{i}, f\left(x_{i}\right)\right) / \partial f\left(x_{i}\right) \\ \hline \text { Regression } & \frac{1}{2}\left[y_{i}-f\left(x_{i}\right)\right]^{2} & y_{i}-f\left(x_{i}\right) \\ \hline \text { Regression } & \left|y_{i}-f\left(x_{i}\right)\right| & \operatorname{sign}\left[y_{i}-f\left(x_{i}\right)\right] \\ \hline \text { Regression } & \text { Huber } & y_{i}-f\left(x_{i}\right) \text { for }\left|y_{i}-f\left(x_{i}\right)\right| \leq \delta_{m} \\ & & \delta_{m} \operatorname{sign}\left[y_{i}-f\left(x_{i}\right)\right] \text { for }\left|y_{i}-f\left(x_{i}\right)\right|>\delta_{m} \\ & & \text { where } \delta_{m}=\alpha \text { th-quantile }\left\{\left|y_{i}-f\left(x_{i}\right)\right|\right\} \\ \hline \text { Classification } & \text { Deviance } & k \text { th component: } I\left(y_{i}=\mathcal{G}_{k}\right)-p_{k}\left(x_{i}\right) \\ \hline \end{array}$
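Adaboost and GBDT
The forward stagewise algorithm underlying GBDT is a framework that generalizes Adaboost, and both methods fit additive models. What each round optimizes differs: Adaboost reweights the samples according to the classification error rate, whereas GBDT fits the negative gradient of the loss function with respect to the current model.
Python library: sklearn.ensemble.GradientBoostingRegressor and sklearn.ensemble.GradientBoostingClassifier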
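
For the squared loss $\frac{1}{2}[y_i-f(x_i)]^2$ the negative gradient is simply the residual $y_i-f(x_i)$, so gradient boosting reduces to fitting trees to residuals. The sketch below is a minimal illustration under several assumptions that are not in the text: synthetic one-dimensional data, shallow regression trees, initialization with the mean of $y$ rather than 0, and a shrinkage factor `learning_rate`. The function names are hypothetical.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor


def gbdt_fit(X, y, n_rounds=100, learning_rate=0.1, max_depth=2):
    """Gradient boosting for squared loss: each tree fits the residuals."""
    f0 = float(np.mean(y))                 # constant initial model (assumption)
    pred = np.full(len(y), f0)
    trees = []
    for _ in range(n_rounds):
        residual = y - pred                # negative gradient of 1/2 (y - f)^2
        tree = DecisionTreeRegressor(max_depth=max_depth, random_state=0)
        tree.fit(X, residual)
        pred = pred + learning_rate * tree.predict(X)   # f_m = f_{m-1} + lr * tree
        trees.append(tree)
    return f0, trees, learning_rate


def gbdt_predict(model, X):
    """Evaluate the additive model: initial constant plus all shrunken trees."""
    f0, trees, learning_rate = model
    return f0 + learning_rate * sum(tree.predict(X) for tree in trees)


rng = np.random.default_rng(0)
X = rng.uniform(-3.0, 3.0, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(0.0, 0.1, size=200)

model = gbdt_fit(X, y)
mse_constant = np.mean((y - y.mean()) ** 2)
mse_boosted = np.mean((y - gbdt_predict(model, X)) ** 2)
print("training MSE, constant model: %.4f" % mse_constant)
print("training MSE, boosted model : %.4f" % mse_boosted)
```

For the other losses in the table, only the `residual` line changes: it would be replaced by the corresponding negative gradient.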

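XGBoost

XGBoost is a GBDT algorithm that uses CART decision trees as submodels.
One reason it runs faster than exact split search over decision trees is its approximate greedy algorithm for finding splits.
In the XGBoost system, users can choose freely among the exact greedy algorithm, the approximate algorithm with the global strategy and the approximate algorithm with the local strategy; all of them are configured through parameters.
Python library: xgboost
Example
Classification with XGBoost, with hyperparameter tuning
- Dataset: the iris dataset
- Code: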

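import xgboost as xgb
from sklearn.model_selection import train_test_split
from sklearn.model_selection import GridSearchCV
from sklearn.datasets import load_iris

iris = load_iris()
X, y = iris.data, iris.target
# Split into a training set and a validation set
train_x, valid_x, train_y, valid_y = train_test_split(X, y, test_size=0.3, random_state=1)

# Candidate values for each hyperparameter. GridSearchCV fits every combination,
# and ten lists of five values give 5**10 (about 9.8 million) combinations,
# so only a few parameters are searched per pass (see `searched` below).
candidates = {
    'max_depth': [5, 10, 15, 20, 25],
    'learning_rate': [0.01, 0.02, 0.05, 0.1, 0.15],
    'n_estimators': [500, 1000, 2000, 3000, 5000],
    'min_child_weight': [0, 2, 5, 10, 20],
    'max_delta_step': [0, 0.2, 0.6, 1, 2],
    'subsample': [0.6, 0.7, 0.8, 0.85, 0.95],
    'colsample_bytree': [0.5, 0.6, 0.7, 0.8, 0.9],
    'reg_alpha': [0, 0.25, 0.5, 0.75, 1],
    'reg_lambda': [0.2, 0.4, 0.6, 0.8, 1],
    'scale_pos_weight': [0.2, 0.4, 0.6, 0.8, 1],
}
searched = ['max_depth', 'learning_rate']  # change this list for later passes
parameters = {name: candidates[name] for name in searched}

# Assuming a recent xgboost release: `silent`, `nthread` and `seed` are replaced
# by `verbosity`, `n_jobs` and `random_state`; `missing=None` and `num_class`
# are dropped (the scikit-learn wrapper infers the number of classes).
xlf = xgb.XGBClassifier(max_depth=10,
                        learning_rate=0.01,
                        n_estimators=2000,
                        verbosity=0,
                        objective='multi:softmax',
                        n_jobs=-1,
                        gamma=0,
                        min_child_weight=1,
                        max_delta_step=0,
                        subsample=0.85,
                        colsample_bytree=0.7,
                        colsample_bylevel=1,
                        reg_alpha=0,
                        reg_lambda=1,
                        scale_pos_weight=1,
                        random_state=0)

gs = GridSearchCV(xlf, param_grid=parameters, scoring='accuracy', cv=3)
gs.fit(train_x, train_y)

print("Best score: %0.3f" % gs.best_score_)
print("Best parameters set: %s" % gs.best_params_)
# Accuracy of the refitted best model on the held-out validation set
print("Validation accuracy: %0.3f" % gs.score(valid_x, valid_y))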

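LightGBM

An optimization built on the ideas of XGBoost

Optimizations in speed and memory usage
- Lowers the cost of computing the gain of each split.
- Uses histogram subtraction for a further speed-up.
- Reduces memory usage.
- Reduces the cost of parallel learning.
Sparse optimization
- Replaces continuous values with discrete bins. If #bins is small, a small data type (e.g. uint8_t) can be used to store the training data.
- No additional information for pre-sorting feature values needs to be stored.
Optimizations in accuracy
- Grows trees leaf-wise (guided by the number of leaves) rather than depth-wise.
- Optimized encoding of categorical features
- Optimized network communication
- Optimized parallel learning
- GPU support

Advantages:

Faster training
Lower memory usage
Better accuracy
Support for parallel learning
Can handle large-scale data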
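
The binning and histogram-subtraction ideas listed above can be illustrated with plain NumPy. This is a conceptual sketch, not LightGBM's actual implementation: the quantile binning, the random per-sample gradients and the random left/right split are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_bins = 1000, 16
feature = rng.normal(size=n_samples)        # one continuous feature
gradients = rng.normal(size=n_samples)      # hypothetical per-sample gradients

# Replace continuous values with discrete bins; 16 bins fit easily in uint8
edges = np.quantile(feature, np.linspace(0, 1, n_bins + 1)[1:-1])
bin_ids = np.digitize(feature, edges).astype(np.uint8)   # values in 0..n_bins-1


def gradient_histogram(bin_ids, gradients, n_bins):
    """Sum of gradients falling into each bin of the feature."""
    return np.bincount(bin_ids, weights=gradients, minlength=n_bins)


# Histogram of the parent node (all samples)
parent_hist = gradient_histogram(bin_ids, gradients, n_bins)

# Stand-in for a split made on some other feature: 40% of samples go left
in_left = rng.random(n_samples) < 0.4
left_hist = gradient_histogram(bin_ids[in_left], gradients[in_left], n_bins)

# Histogram subtraction: the sibling's histogram comes for free
right_hist = parent_hist - left_hist

print("bytes as float64:", feature.nbytes, "| bytes as uint8 bins:", bin_ids.nbytes)
print("right child histogram:", np.round(right_hist, 2))
```

Only the smaller child needs a pass over its samples; the other child's histogram is obtained by subtraction, which is where the speed-up comes from.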

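Decision boundaries of a basic decision tree and of a boosted classifier

Dataset: the iris dataset (features reduced with PCA)
Models compared: decision tree, Adaboost
Code: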

import numpy as np
import pandas as pd 
import matplotlib.pyplot as plt
plt.style.use("ggplot")
from sklearn.datasets import load_iris
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.decomposition import PCA
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

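# Load the iris dataset
iris = load_iris()
# Reduce the features to 2 dimensions with PCA
X = PCA(n_components=2).fit_transform(iris.data)
y = iris.target
X_train,X_test,y_train,y_test = train_test_split(X,y,test_size=0.2,random_state=1,stratify=y)

# Model definitions
tree = DecisionTreeClassifier(criterion='entropy',random_state=1,max_depth=1)
# The base tree is passed positionally because its keyword name differs across
# scikit-learn versions (base_estimator in older releases, estimator in newer ones).
ada = AdaBoostClassifier(tree,n_estimators=500,learning_rate=0.1,random_state=1)

# Plot the decision boundaries of the single-level decision tree and of Adaboost: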
x_min = X_train[:, 0].min() - 1
x_max = X_train[:, 0].max() + 1
y_min = X_train[:, 1].min() - 1
y_max = X_train[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.1),np.arange(y_min, y_max, 0.1))
f, axarr = plt.subplots(nrows=1, ncols=2,sharex='col',sharey='row',figsize=(12, 6))
for idx, clf, tt in zip([0, 1],[tree, ada],['Decision tree', 'Adaboost']):
    clf.fit(X_train, y_train)
    Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)
    axarr[idx].contourf(xx, yy, Z, alpha=0.3)
    axarr[idx].scatter(X_train[y_train==0, 0],X_train[y_train==0, 1],c='blue', marker='^')
    axarr[idx].scatter(X_train[y_train==1, 0],X_train[y_train==1, 1],c='red', marker='o')
    axarr[idx].scatter(X_train[y_train==2, 0],X_train[y_train==2, 1],c='green', marker='*')
    axarr[idx].set_title(tt)
axarr[0].set_ylabel('PCA component2', fontsize=12)
plt.tight_layout()
plt.text(0, -0.2,s='PCA component1',ha='center',va='center',fontsize=12,transform=axarr[1].transAxes)
plt.show()

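Results (figure not included: decision boundaries of 'Decision tree' and 'Adaboost' over PCA component1 and PCA component2)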

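Stacking

Basic idea: the outputs of one layer of models serve as the inputs of the next model (usually logistic regression).

Blending

Steps:
Step 1: split the data into a training set and a test set (test_set); the training set is then split again into a training set (train_set) and a validation set (val_set).
Step 2: create the first-layer models; they can be homogeneous or heterogeneous.
Step 3: train the models from Step 2 on train_set, then use the trained models to predict on val_set and test_set, giving val_predict and test_predict1.
Step 4: create the second-layer model and train it with val_predict as its training set.
Step 5: use the trained second-layer model to predict on the second-layer test set test_predict1; the result is the prediction for the whole test set.
Code
- Dataset: iris
- First-layer classifiers: SVC, RF, KNN
- Second-layer model: linear regression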

import numpy as np
import matplotlib.pyplot as plt
plt.style.use("ggplot")
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Load the iris dataset
iris = load_iris() 
# Create the training set and the test set
X_train1,X_test,y_train1,y_test = train_test_split(iris.data, iris.target, test_size=0.2, random_state=1)
# Split the training set again into a training set and a validation set
X_train,X_val,y_train,y_val = train_test_split(X_train1, y_train1, test_size=0.3, random_state=1)
# Define the first-layer classifiers
clfs = [SVC(probability = True),RandomForestClassifier(n_estimators=5, n_jobs=-1, criterion='gini'),KNeighborsClassifier()]
# Define the second-layer model
lr = LinearRegression()
# First-layer outputs on the validation set and on the test set
val_features = np.zeros((X_val.shape[0],len(clfs)))  # validation-set meta-features
test_features = np.zeros((X_test.shape[0],len(clfs)))  # test-set meta-features
for i,clf in enumerate(clfs):
    clf.fit(X_train,y_train)
    # Column 1 of predict_proba is the predicted probability of class 1
    val_feature = clf.predict_proba(X_val)[:, 1]
    test_feature = clf.predict_proba(X_test)[:,1]
    val_features[:,i] = val_feature
    test_features[:,i] = test_feature  
# Train the second-layer model on the first-layer validation outputs
lr.fit(val_features,y_val)
# Cross-validated score (R^2 for LinearRegression) on the test-set meta-features
print(cross_val_score(lr,test_features,y_test,cv=5))
# Plot
x_min = val_features[:, 0].min() - 1
x_max = val_features[:, 0].max() + 1
y_min = val_features[:, 1].min() - 1
y_max = val_features[:, 1].max() + 1
z_min = val_features[:, 2].min() - 1
z_max = val_features[:, 2].max() + 1
xx, yy,zz = np.meshgrid(np.arange(x_min, x_max, 0.1),np.arange(y_min, y_max, 0.1),np.arange(z_min, z_max, 0.1))
Z = lr.predict(np.c_[xx.ravel(), yy.ravel(),zz.ravel()])
Z = Z.reshape(xx.shape)
plt.contourf(xx[:,:,0], yy[:,:,0], Z[:,:,0], alpha=0.3)
plt.scatter(val_features[y_val==0, 0],val_features[y_val==0, 1],c='blue', marker='^')
plt.scatter(val_features[y_val==1, 0],val_features[y_val==1, 1],c='red', marker='o')
plt.scatter(val_features[y_val==2, 0],val_features[y_val==2, 1],c='green', marker='*')
plt.xlabel("SVC results")
plt.ylabel("RF results")
plt.show()

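Results (figure not included: second-layer predictions plotted over the "SVC results" and "RF results" axes)
Pros: simple and direct
Cons: wastes data, may overfit

Stacking

An improvement on blending: the first-layer classifiers are trained with cross-validation, which enlarges the training set of the second-layer model.

Python library: pip install mlxtend
Example (quoted)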

from sklearn import datasets

iris = datasets.load_iris()
X, y = iris.data[:, 1:3], iris.target
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB 
from sklearn.ensemble import RandomForestClassifier
from mlxtend.classifier import StackingCVClassifier

RANDOM_SEED = 42

clf1 = KNeighborsClassifier(n_neighbors=1)
clf2 = RandomForestClassifier(random_state=RANDOM_SEED)
clf3 = GaussianNB()
lr = LogisticRegression()

# Starting from v0.16.0, StackingCVRegressor supports
# `random_state` to get deterministic result.
sclf = StackingCVClassifier(classifiers=[clf1, clf2, clf3],  # first-layer classifiers
                            meta_classifier=lr,   # second-layer classifier
                            random_state=RANDOM_SEED)

print('3-fold cross validation:\n')

for clf, label in zip([clf1, clf2, clf3, sclf], ['KNN', 'Random Forest', 'Naive Bayes','StackingClassifier']):
    scores = cross_val_score(clf, X, y, cv=3, scoring='accuracy')
    print("Accuracy: %0.2f (+/- %0.2f) [%s]" % (scores.mean(), scores.std(), label))
# Plot the decision regions
import matplotlib.pyplot as plt
from mlxtend.plotting import plot_decision_regions
import matplotlib.gridspec as gridspec
import itertools

gs = gridspec.GridSpec(2, 2)
fig = plt.figure(figsize=(10,8))
for clf, lab, grd in zip([clf1, clf2, clf3, sclf], 
                         ['KNN', 
                          'Random Forest', 
                          'Naive Bayes',
                          'StackingCVClassifier'],
                          itertools.product([0, 1], repeat=2)):
    clf.fit(X, y)
    ax = plt.subplot(gs[grd[0], grd[1]])
    fig = plot_decision_regions(X=X, y=y, clf=clf)
    plt.title(lab)
plt.show()

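The class probabilities predicted by the first-layer classifiers can also be used as the second-layer input. This requires setting use_probas = True in StackingClassifier, together with average_probas = True to combine the probabilities by averaging. The example below uses StackingCVClassifier with use_probas=True.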

clf1 = KNeighborsClassifier(n_neighbors=1)
clf2 = RandomForestClassifier(random_state=1)
clf3 = GaussianNB()
lr = LogisticRegression()

sclf = StackingCVClassifier(classifiers=[clf1, clf2, clf3],
                            use_probas=True,  # feed class probabilities, not labels, to the meta-classifier
                            meta_classifier=lr,
                            random_state=42)

print('3-fold cross validation:\n')

for clf, label in zip([clf1, clf2, clf3, sclf], 
                      ['KNN', 
                       'Random Forest', 
                       'Naive Bayes',
                       'StackingClassifier']):

    scores = cross_val_score(clf, X, y, cv=3, scoring='accuracy')
    print("Accuracy: %0.2f (+/- %0.2f) [%s]"
          % (scores.mean(), scores.std(), label))

Implementing stacking with k-fold cross-validation, without the mlxtend library
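
A minimal sketch of k-fold stacking using only scikit-learn is given here. It is one possible way to do it, not the mlxtend implementation: it assumes out-of-fold class probabilities as meta-features, 5 folds, a 70/30 train/test split, and base models that are refit once on the full training set to produce the test-set meta-features (a simplification compared with predicting the test set with every fold model).

```python
import numpy as np
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict, train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

RANDOM_SEED = 42
iris = datasets.load_iris()
X, y = iris.data[:, 1:3], iris.target
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=RANDOM_SEED, stratify=y)

# First-layer classifiers
base_clfs = [KNeighborsClassifier(n_neighbors=1),
             RandomForestClassifier(random_state=RANDOM_SEED),
             GaussianNB()]

# Out-of-fold probabilities: each training row is predicted by a model
# that did not see that row, so the meta-features are not leaked.
train_meta = np.hstack([
    cross_val_predict(clf, X_train, y_train, cv=5, method="predict_proba")
    for clf in base_clfs])

# Refit each base model on the full training set to build test meta-features
test_meta = np.hstack([
    clf.fit(X_train, y_train).predict_proba(X_test) for clf in base_clfs])

# Second-layer classifier trained on the out-of-fold meta-features
meta_clf = LogisticRegression(max_iter=1000)
meta_clf.fit(train_meta, y_train)
stack_acc = meta_clf.score(test_meta, y_test)
print("Stacking accuracy on the test set: %.3f" % stack_acc)
```

Unlike blending, every training row contributes a meta-feature here, so the second-layer model is trained on the whole training set rather than on a held-out validation slice.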




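Stacking versus Blending

If 𝑚 base models are used with 𝑘-fold cross-validation, how many training runs and how many predictions do stacking and blending each need?

Stacking: mk+1 training runs and 2mk+1 predictions. Each of the m base models is trained once per fold (mk runs), and each fold model predicts both its held-out fold and the test set (2mk predictions); the second-layer model adds one training run and one prediction.
Blending: m+1 training runs and 2m+1 predictions. Each base model is trained once and predicts the validation set and the test set; the second-layer model again adds one training run and one prediction.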
