Ensemble Learning
0.Official Description
The goal of ensemble methods is to combine the predictions of several base estimators built with a given learning algorithm in order to improve generalizability / robustness over a single estimator.
Two families of ensemble methods are usually distinguished:
- In averaging methods, the driving principle is to build several estimators independently and then to average their predictions. On average, the combined estimator is usually better than any single base estimator because its variance is reduced.
  Examples: Bagging methods, Forests of randomized trees, …
- By contrast, in boosting methods, base estimators are built sequentially and one tries to reduce the bias of the combined estimator. The motivation is to combine several weak models to produce a powerful ensemble.
  Examples: AdaBoost, Gradient Tree Boosting, …
As bagging methods provide a way to reduce overfitting, they work best with strong and complex models (e.g., fully developed decision trees), in contrast with boosting methods, which usually work best with weak models (e.g., shallow decision trees).
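The contrast above can be seen directly in scikit-learn: bagging over fully grown trees versus AdaBoost over decision stumps. This is a minimal sketch; the synthetic dataset and all parameter values (`n_estimators=50`, etc.) are illustrative choices, not recommendations.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# A synthetic classification dataset just for illustration.
X, y = make_classification(n_samples=500, random_state=0)

# Averaging family: bagging over strong, fully developed trees (no depth limit).
bagging = BaggingClassifier(DecisionTreeClassifier(max_depth=None),
                            n_estimators=50, random_state=0)

# Boosting family: AdaBoost over weak, shallow trees (decision stumps).
boosting = AdaBoostClassifier(DecisionTreeClassifier(max_depth=1),
                              n_estimators=50, random_state=0)

print(cross_val_score(bagging, X, y, cv=5).mean())
print(cross_val_score(boosting, X, y, cv=5).mean())
```

Note how the base estimator is deliberately strong for bagging and deliberately weak for boosting, matching the rule of thumb stated above.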
1.What Is Ensemble Learning
Ensemble learning completes a learning task by building and combining multiple learners; it is also known as a multi-classifier system or committee-based learning. By combining multiple learners, an ensemble can often achieve generalization performance significantly better than that of any single learner.
# Generalization ability refers to how well a machine learning algorithm adapts to unseen samples. The goal of learning is to capture the regularities hidden behind the data, so that the trained model also gives reasonable outputs on data outside the training set that follow the same regularities; this capability is called generalization ability.
2.Categories of Ensemble Learning
According to how the base learners are generated, current ensemble methods fall roughly into two families:
1. Bagging
A parallel approach: the base learners have no strong dependencies on each other and can be generated simultaneously.
2. Boosting
A sequential approach: the base learners have strong dependencies on each other and must be generated one after another.
# base learner / base classifier / weak learner =====> a model trained on a sub-training-set by a machine learning algorithm
# These all refer to the same thing, just under different names, collectively called a weak learner; sloppy translations are everywhere, so don't let the terminology confuse you.
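The "parallel" style of Bagging can be sketched from scratch: each base learner is trained independently on a bootstrap resample of the training data, so the iterations have no dependency on one another. The toy base learner here (a least-squares slope through the origin) and all names are illustrative assumptions, not part of any library API.

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.arange(1, 11, dtype=float)          # toy 1-D training inputs
y = 2.0 * X + rng.normal(0.0, 1.0, size=X.shape)  # noisy targets, true slope 2

def fit_base_learner(Xs, ys):
    # Toy base learner: least-squares slope through the origin.
    return np.dot(Xs, ys) / np.dot(Xs, Xs)

T = 20
slopes = []
for _ in range(T):
    # Bootstrap resample: draw len(X) indices with replacement.
    # Each iteration is independent of the others, so in principle
    # the T base learners could be trained in parallel.
    idx = rng.integers(0, len(X), size=len(X))
    slopes.append(fit_base_learner(X[idx], y[idx]))

H = float(np.mean(slopes))                 # average the T base learners
print(H)                                   # should be close to the true slope 2.0
```

Boosting, by contrast, could not be written this way: each round would need the residuals or sample weights produced by the previous round, forcing a sequential loop.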
3.Combination Strategies
Common strategies for combining the base learners' outputs include the following:
- Averaging
For numerical outputs, the most common combination strategy is averaging:
$$H(x)=\frac{1}{T}\sum_{i=1}^{T}h_{i}(x)$$
where $h_{i}(x)$ is the output of base learner $i$, $H(x)$ is the output of the final ensemble, and $T$ is the number of base learners.
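As a quick numeric sketch of the averaging rule, suppose $T=3$ hypothetical base learners each predict values for the same three inputs (the numbers below are made up for illustration):

```python
import numpy as np

# Row i holds h_i(x) evaluated at three inputs, for T = 3 base learners.
h_outputs = np.array([
    [2.9, 5.1, 7.0],   # h_1(x)
    [3.1, 4.8, 7.2],   # h_2(x)
    [3.0, 5.0, 6.8],   # h_3(x)
])

# H(x) = (1/T) * sum_i h_i(x): average over the T base learners.
H = h_outputs.mean(axis=0)
print(H)
```

Averaging shrinks the idiosyncratic errors of the individual learners, which is exactly the variance-reduction effect described in the official text above.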