Machine Learning (4): Classification with Ensemble Methods

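Ensemble learning combines several classifiers, and the combined model is called an ensemble method or meta-algorithm. Ensembles can take many forms: the same algorithm or classifier trained under different settings, a combination of different algorithms, or different parts of the dataset assigned to different algorithms.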

1. The two main ensemble methods

Ensemble methods fall mainly into two families: bagging and boosting.
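1.1 The basic idea of bagging
A. Use the bootstrap method to draw n training samples from the original set; the same sample may be drawn more than once. Repeat this for k rounds to obtain k training sets, which are drawn independently of one another.
What is the bootstrap method? It resamples the original data with replacement, keeping the sample size at n. On every draw, each observation in the original data has the same probability, 1/n, of being selected. The resulting sample is called a bootstrap sample.
B. Train one model on each of the k training sets, giving k models; which algorithm to use depends on the application.
C. For classification, combine the k models' results by voting; for regression, take their average.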
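Steps A to C can be sketched in plain Python as follows. This is a minimal illustration under our own assumptions, not the author's code: the base learner is a hypothetical one-dimensional decision stump (`train_stump`), labels are -1/+1, and the helper names `bootstrap_sample`, `bagging_fit` and `bagging_predict` are made up for this example.

```python
import random
from collections import Counter


def bootstrap_sample(data, rng):
    """Draw len(data) items with replacement; each draw picks any item with probability 1/n."""
    n = len(data)
    return [data[rng.randrange(n)] for _ in range(n)]


def train_stump(samples):
    """Hypothetical base learner: a 1-D threshold rule chosen to minimize training errors."""
    best = None
    for thr in sorted({x for x, _ in samples}):
        for sign in (1, -1):
            errors = sum(1 for x, y in samples if (sign if x >= thr else -sign) != y)
            if best is None or errors < best[0]:
                best = (errors, thr, sign)
    _, thr, sign = best
    return lambda x: sign if x >= thr else -sign


def bagging_fit(data, k, seed=0):
    """Steps A and B: draw k independent bootstrap samples and train one model on each."""
    rng = random.Random(seed)
    return [train_stump(bootstrap_sample(data, rng)) for _ in range(k)]


def bagging_predict(models, x):
    """Step C (classification): majority vote over the k models."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]


# Toy 1-D data: label +1 for x >= 5, otherwise -1.
data = [(x, 1 if x >= 5 else -1) for x in range(10)]
models = bagging_fit(data, k=11)
print([bagging_predict(models, x) for x in range(10)])
```

Because the bootstrap samples are independent, the `k` training calls could run in parallel. For regression, the vote in `bagging_predict` would be replaced by the mean of the models' outputs.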
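1.2 The basic idea of boosting
Boosting trains classifiers iteratively by re-weighting the samples. The weight distribution in each round depends on the previous round's results: the larger a sample's error, the higher the weight it receives. The classifiers are then combined sequentially as a weighted linear sum.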

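1.3 Differences between bagging and boosting
Sample selection: bagging samples with replacement from the original set, and the training sets of different rounds are drawn independently; in boosting the training set stays the same in every round, and only the weight assigned to each sample changes.
Sample weights: bagging samples uniformly, so all samples have equal weight; boosting gives larger weights to samples with a higher error.
Prediction functions: in bagging all prediction functions have equal weight; in boosting each weak classifier has its own weight, and classifiers with a smaller classification error get a larger weight.
Parallel computation: in bagging the prediction functions can be generated in parallel; in boosting they must be generated sequentially, because each round depends on the results of the previous one.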

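2. AdaBoost

Procedure:
2.1 Initialize the sample weights
Each sample's weight is usually initialized to 1/n, where n is the number of training samples.
2.2 Compute the error rate
The error rate e is the number of misclassified samples divided by the total number of samples. When the sample weights are not uniform, e is the sum of the weights of the misclassified samples.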
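A small sketch of the weight initialization and the error rate, assuming labels in {-1, +1}; `init_weights` and `error_rate` are illustrative names, not from any library. With uniform weights the weighted error reduces to the plain ratio of misclassified samples.

```python
def init_weights(n):
    """Uniform initial distribution: each of the n samples gets weight 1/n."""
    return [1.0 / n] * n


def error_rate(weights, y_true, y_pred):
    """Sum of the weights of misclassified samples (weights assumed to sum to 1)."""
    return sum(w for w, t, p in zip(weights, y_true, y_pred) if t != p)


y_true = [1, 1, -1, -1, 1]
y_pred = [1, -1, -1, 1, 1]
w = init_weights(len(y_true))
print(error_rate(w, y_true, y_pred))  # 2 of 5 misclassified -> 0.4
```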
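2.3 Compute the weight of the weak classifier
The weight α is computed from the error rate e:
$$\alpha = \frac{1}{2}\ln\left(\frac{1-e}{e}\right)$$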
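The weight formula can be checked numerically with a short, hypothetical helper `classifier_weight`. The guard for e outside (0, 1) is our own addition, since the logarithm is undefined at e = 0 and e = 1.

```python
import math


def classifier_weight(e):
    """alpha = 1/2 * ln((1 - e) / e); only defined for 0 < e < 1."""
    if not 0.0 < e < 1.0:
        raise ValueError("error rate must lie strictly between 0 and 1")
    return 0.5 * math.log((1.0 - e) / e)


for e in (0.1, 0.3, 0.5, 0.7):
    print(e, round(classifier_weight(e), 4))
```

An error of 0.5 (no better than random guessing) gives α = 0, an error below 0.5 gives a positive weight that grows as the error shrinks, and an error above 0.5 gives a negative weight.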
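2.4 Update the sample weights
After each round the sample weights are updated so that misclassified samples receive larger weights. With labels $y_i \in \{-1, +1\}$, $h_t$ the weak classifier of round $t$, and $\alpha_t$ the weight $\alpha$ computed in that round:
$$D_{t+1}(i) = \frac{D_t(i)\,\exp\left(-\alpha_t\, y_i\, h_t(x_i)\right)}{Z_t}$$
$Z_t$ is the normalization factor, i.e. the sum of the updated weights, which makes $D_{t+1}$ sum to 1:
$$Z_t = \sum_{i=1}^{n} D_t(i)\,\exp\left(-\alpha_t\, y_i\, h_t(x_i)\right)$$
Since $y_i h_t(x_i) = +1$ for a correctly classified sample and $-1$ for a misclassified one, the formula simplifies to:
$$D_{t+1}(i) = \begin{cases} \dfrac{D_t(i)\,\exp(-\alpha_t)}{Z_t}, & h_t(x_i) = y_i \\[4pt] \dfrac{D_t(i)\,\exp(\alpha_t)}{Z_t}, & h_t(x_i) \neq y_i \end{cases}$$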
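Both forms of the update can be written directly and compared. This sketch assumes labels and predictions in {-1, +1}; `update_weights` follows the general exp(-α·y·h(x)) formula and `update_weights_piecewise` the simplified two-case form. Both names are hypothetical.

```python
import math


def update_weights(weights, alpha, y_true, y_pred):
    """General form: D_{t+1}(i) = D_t(i) * exp(-alpha * y_i * h_t(x_i)) / Z_t."""
    unnormalized = [w * math.exp(-alpha * t * p) for w, t, p in zip(weights, y_true, y_pred)]
    z = sum(unnormalized)  # Z_t, the normalization factor
    return [u / z for u in unnormalized]


def update_weights_piecewise(weights, alpha, y_true, y_pred):
    """Simplified form: scale by exp(-alpha) if correct, exp(alpha) if misclassified, then normalize."""
    scaled = [w * (math.exp(-alpha) if t == p else math.exp(alpha))
              for w, t, p in zip(weights, y_true, y_pred)]
    z = sum(scaled)  # Z_t again
    return [s / z for s in scaled]


y_true = [1, 1, 1, -1, -1]
y_pred = [1, 1, 1, -1, 1]                  # only the last sample is misclassified
weights = [0.2] * 5
alpha = 0.5 * math.log((1 - 0.2) / 0.2)    # error rate e = 0.2
print(update_weights(weights, alpha, y_true, y_pred))
print(update_weights_piecewise(weights, alpha, y_true, y_pred))
```

Here e = 0.2, so α = ½ ln 4 = ln 2. The four correct samples drop to about 0.125 each and the misclassified one rises to about 0.5, and the two functions agree up to floating-point rounding.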
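2.5 The AdaBoost algorithm
Repeat the steps above for $T$ rounds. This yields $T$ weak classifiers, and the final output is their weighted vote:
$$H(x) = \operatorname{sign}\left(\sum_{t=1}^{T} \alpha_t\, h_t(x)\right)$$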
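Putting steps 2.1 to 2.5 together gives a compact AdaBoost loop. This is a sketch under stated assumptions rather than the author's implementation: the weak learner is a hypothetical one-dimensional threshold stump (`fit_stump`) that minimizes the weighted error, labels are -1/+1, the error is clamped away from 0 and 1 to keep the logarithm finite, and a score of exactly 0 is mapped to +1.

```python
import math


def fit_stump(xs, ys, weights):
    """Hypothetical weak learner: threshold rule with the lowest weighted error."""
    best = None
    for thr in sorted(set(xs)):
        for sign in (1, -1):
            preds = [sign if x >= thr else -sign for x in xs]
            err = sum(w for w, p, y in zip(weights, preds, ys) if p != y)
            if best is None or err < best[0]:
                best = (err, thr, sign, preds)
    return best


def adaboost_fit(xs, ys, rounds):
    n = len(xs)
    weights = [1.0 / n] * n                      # uniform initial weights
    ensemble = []
    for _ in range(rounds):
        err, thr, sign, preds = fit_stump(xs, ys, weights)   # weighted error rate e
        err = min(max(err, 1e-10), 1 - 1e-10)                # keep log((1-e)/e) finite
        alpha = 0.5 * math.log((1 - err) / err)              # classifier weight
        ensemble.append((alpha, thr, sign))
        # Re-weight: misclassified samples grow, correct ones shrink, then normalize by Z_t.
        weights = [w * math.exp(-alpha * y * p) for w, y, p in zip(weights, ys, preds)]
        z = sum(weights)
        weights = [w / z for w in weights]
    return ensemble


def adaboost_predict(ensemble, x):
    """H(x) = sign(sum of alpha_t * h_t(x)); a zero score is treated as +1."""
    score = sum(alpha * (sign if x >= thr else -sign) for alpha, thr, sign in ensemble)
    return 1 if score >= 0 else -1


xs = [1, 2, 3, 4, 5, 6]
ys = [1, 1, -1, -1, 1, 1]
model = adaboost_fit(xs, ys, rounds=3)
print([adaboost_predict(model, x) for x in xs])
```

On this toy data no single threshold stump classifies every point correctly, but three boosting rounds combine three stumps whose weighted vote does.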

For details, see the PDF:

http://download.csdn.net/download/u011730199/10050276
