Datawhale Ensemble Learning: The Blending Ensemble Algorithm


Mr.小林


Tags: machine learning

Article link: https://blog.csdn.net/weixin_41221544/article/details/116677276


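This article introduces the basic idea of Blending, an ensemble method that works like a simplified version of Stacking. The data are split into subsets, models are trained in layers, and the predictions of the first-layer models become the inputs of the second-layer model. Blending's strength is its simplicity; its weakness is that only part of the data is used for validation. The article also walks through Blending in practice, first on synthetic data and then on the iris dataset.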


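Introduction

Blending is a simplified version of Stacking. Stacking can be understood as a two-layer ensemble: the first layer contains several base classifiers, whose predictions (the meta-features) are passed to the second layer. The second-layer classifier, usually logistic regression, treats the first layer's outputs as features and fits them to produce the final prediction.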
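To make the two-layer structure concrete, here is a minimal sketch using scikit-learn's built-in `StackingClassifier`. The choice of base classifiers (k-nearest neighbours and a random forest), the sample count, and the 5-fold setting are assumptions made for illustration; they are not taken from the article.

```python
from sklearn.datasets import make_blobs
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Small synthetic two-class dataset (illustrative size).
X, y = make_blobs(n_samples=1000, centers=2, random_state=1, cluster_std=1.0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1
)

# First layer: several base classifiers (an assumed, arbitrary pair).
base_learners = [
    ("knn", KNeighborsClassifier(n_neighbors=5)),
    ("rf", RandomForestClassifier(n_estimators=50, random_state=1)),
]

# Second layer: logistic regression fitted on the first layer's
# cross-validated predictions (the meta-features).
stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=LogisticRegression(),
    cv=5,
)
stack.fit(X_train, y_train)

stacking_accuracy = stack.score(X_test, y_test)
print("Stacking test accuracy:", stacking_accuracy)
```

Stacking builds its meta-features with cross-validation, so every training sample contributes to the second layer. Blending replaces that step with a single hold-out split.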

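The basic idea of Blending

Split the data into a training set and a test set by some ratio, then split the training set again into a training set and a validation set.
Create several first-layer models, which may be homogeneous or heterogeneous.
Train the first-layer models on the training set, then run them on the validation set and the test set to obtain val_predict and test_predict.
Create the second-layer model (usually a linear model) and train it with val_predict as its training features.
Apply the trained second-layer model to test_predict; its output is the final prediction for the whole test set.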
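The five steps above can be sketched end to end as follows. This is one possible implementation under stated assumptions: the three first-layer models, the use of class-1 probabilities as meta-features, and the sample count are illustrative choices, not necessarily what the article uses.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Step 1: train/test split, then train/validation split.
X, y = make_blobs(n_samples=2000, centers=2, random_state=1, cluster_std=1.0)
X_train_full, X_test, y_train_full, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1
)
X_train, X_val, y_train, y_val = train_test_split(
    X_train_full, y_train_full, test_size=0.3, random_state=1
)

# Step 2: heterogeneous first-layer models (an assumed selection).
first_layer = [
    KNeighborsClassifier(n_neighbors=5),
    DecisionTreeClassifier(max_depth=3, random_state=1),
    LogisticRegression(),
]

# Step 3: fit on the training set, then collect predictions on the
# validation and test sets. One column per first-layer model.
val_predict = np.zeros((X_val.shape[0], len(first_layer)))
test_predict = np.zeros((X_test.shape[0], len(first_layer)))
for j, model in enumerate(first_layer):
    model.fit(X_train, y_train)
    val_predict[:, j] = model.predict_proba(X_val)[:, 1]
    test_predict[:, j] = model.predict_proba(X_test)[:, 1]

# Step 4: the second-layer model learns from val_predict and y_val.
second_layer = LogisticRegression()
second_layer.fit(val_predict, y_val)

# Step 5: final prediction for the test set.
y_pred = second_layer.predict(test_predict)
blend_accuracy = float(np.mean(y_pred == y_test))
print("Blending test accuracy:", blend_accuracy)
```

Note that the second-layer model never sees the raw features: its inputs are only the first-layer predictions, and its training labels come from the validation set alone.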

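Pros and cons of Blending

Pros: the method is simple. Training, validation, and testing form a single pass in which each stage hands its output to the next.
Cons: only the held-out validation portion is used for validation and for training the second-layer model, so the data are used less efficiently than in Stacking.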
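A short calculation shows how much data each layer actually sees. The helper `blending_split_sizes` is a hypothetical function written for this illustration; the inputs (10000 samples, test ratio 0.2, validation ratio 0.3) are the values used in the worked example in the next section.

```python
def blending_split_sizes(n_samples, test_size, val_size):
    """Return (n_train, n_val, n_test) for a two-stage hold-out split.

    Hypothetical helper. It rounds to the nearest integer, which matches
    the actual split whenever the ratios divide the sample counts evenly.
    """
    n_test = round(n_samples * test_size)
    n_train_full = n_samples - n_test
    n_val = round(n_train_full * val_size)
    n_train = n_train_full - n_val
    return n_train, n_val, n_test


n_total = 10000
n_train, n_val, n_test = blending_split_sizes(n_total, test_size=0.2, val_size=0.3)

# Share of all samples used to fit each layer.
first_layer_share = n_train / n_total
second_layer_share = n_val / n_total

print(n_train, n_val, n_test)  # 5600 2400 2000
print(first_layer_share, second_layer_share)
```

Under these ratios the first-layer models are fitted on 56% of the samples and the second-layer model on only 24%, which is the cost of skipping cross-validation.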

Blending in practice (using generated data)

# import packages
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
plt.style.use('ggplot')
# %matplotlib inline  # IPython/Jupyter magic; uncomment when running in a notebook
import seaborn as sns

# Create the data
from sklearn import datasets
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split

data, target = make_blobs(n_samples=10000, centers=2, random_state=1, cluster_std=1.0)

## Split into training and test sets
X_train_origin, X_test, y_train_origin, y_test = train_test_split(data, target, 
                                                                  test_size=0.2, random_state=1)

## Split the training set into training and validation sets
X_train, X_val, y_train, y_val = train_test_split(X_train_origin, 
                                                   y_train_origin, 
                                                   test_size=0.3, 
                                                   random_state=1)

print("The shape of training X:", X_train.shape)
print("The shape of training y:", y_train.shape)
print("The shape of test X:", X_test.shape)
