Boosting

5.2 Boosting (Sequential) Learning

Boosting falls under the PAC learning framework. It builds an ensemble incrementally, training each new model to emphasize the training instances that previous models misclassified.
 
 

5.21 AdaBoost

AdaBoost = Weak Algorithm + Re-Weighting + Linear Aggregation

5.211 What is AdaBoost?

AdaBoost calls a given weak or base learning algorithm repeatedly in a series of rounds $t = 1, 2, \dots, T$. It is used for classification, mostly binary in practice.
The algorithm takes as input a training set $(x_1, y_1), (x_2, y_2), \dots, (x_N, y_N)$, where each $x_n$ belongs to some domain or instance space $X$ and each label $y_n$ is in some label set $Y$.
One of the main ideas of the algorithm is to maintain a distribution, or set of weights, over the training set. The weight of this distribution on $\mathbf{x}_n$ in round $t$ is denoted $u_n^{(t)}$. Initially all weights are set equally, but on each round the weights of incorrectly classified examples are increased, so that the weak learner is forced to focus on the hard examples in the training set.

  1. It is an ensemble learning method for classification problems.
  2. Base hypotheses (base learners, possibly weak) are trained sequentially.
  3. During training, the learner in the current round puts more weight on the samples misclassified by the previous learner (see the numeric sketch below).
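
The reweighting idea can be seen in a small worked example. The sketch below is a hedged illustration (the numbers are made up, and the update rule it uses is the one stated later in Section 5.213): with four equally weighted examples and one mistake, the misclassified example ends up carrying half of the total weight.

```python
# Minimal numeric sketch of AdaBoost's reweighting (illustrative values only).
import numpy as np

u = np.full(4, 0.25)                          # uniform initial weights
wrong = np.array([True, False, False, False]) # one example is misclassified
eps = u[wrong].sum() / u.sum()                # weighted error = 0.25
scale = np.sqrt((1 - eps) / eps)              # sqrt(3) ~ 1.732
u_next = np.where(wrong, u * scale, u / scale)
print(u_next / u_next.sum())                  # ~ [0.5, 0.167, 0.167, 0.167]
```
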
5.212 Why AdaBoost?

AdaBoost can handle weak hypotheses which output real-valued or confidence-rated predictions. That is, for each instance, the weak hypothesis $g_t$ outputs a prediction $g_t(\mathbf{x}_n) \in \mathbb{R}$, whose sign is the predicted label ($+1$ or $-1$) and whose magnitude $|g_t(\mathbf{x}_n)|$ gives a measure of confidence in the prediction.
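
As a small hedged illustration (the scores below are invented, not from the text), a confidence-rated output splits into a label and a confidence like this:

```python
# Sign = predicted label, magnitude = confidence (illustrative scores only).
import numpy as np

scores = np.array([0.9, -0.1, -2.3])  # g_t(x_n) for three instances
labels = np.sign(scores)              # [ 1., -1., -1.]
confidence = np.abs(scores)           # [0.9, 0.1, 2.3]
```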

5.213 How AdaBoost?

Our task is to find a weak hypothesis $g_t$ in each round, together with the weights used in the final linear aggregation.
  
Given the learning data set $D = \{(x_1, y_1), (x_2, y_2), \dots, (x_N, y_N)\}$ and algorithm $A$:

  1. Set the initial weights $u^{(1)} = \left[\frac{1}{N}, \frac{1}{N}, \dots, \frac{1}{N}\right]$
  2. $\mathbf{for}\ t = 1, 2, \dots, T$
  3. $g_t \leftarrow A(D, u^{(t)})$, i.e. train the base algorithm on $D$ weighted by $u^{(t)}$
  4. $\epsilon_t = \frac{\sum_{n=1}^{N} u_n^{(t)}\, I\,[y_n \neq g_t(\mathbf{x}_n)]}{\sum_{n=1}^{N} u_n^{(t)}}$
  5. $\mathbf{if}\ \epsilon_t > 0.5$, then break
  6. $\alpha_t = \frac{1}{2}\ln\!\left(\frac{1-\epsilon_t}{\epsilon_t}\right)$
  7. Update $u^{(t+1)}$ from $u^{(t)}$:
       $y_n \neq g_t(\mathbf{x}_n):\quad u_n^{(t+1)} \leftarrow u_n^{(t)}\sqrt{\frac{1-\epsilon_t}{\epsilon_t}}$
       $y_n = g_t(\mathbf{x}_n):\quad u_n^{(t+1)} \leftarrow u_n^{(t)}\Big/\sqrt{\frac{1-\epsilon_t}{\epsilon_t}}$
  8. $\mathbf{end\ for}$
  9. $\mathbf{Return}\ G(\mathbf{x}) = \mathrm{sign}\!\left[\sum_{t=1}^{T} \alpha_t g_t(\mathbf{x})\right]$
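
The procedure above maps directly onto code. Below is a minimal NumPy sketch, not an implementation from this text: the weak learner $A$ is assumed to be a decision stump over a single feature, and all names (`fit_stump`, `adaboost`, `predict`) are illustrative.

```python
# A minimal sketch of the AdaBoost procedure above, with decision stumps
# assumed as the weak learner A. Names are illustrative, not from a library.
import numpy as np

def fit_stump(X, y, u):
    """Pick the stump (feature, threshold, polarity) with the smallest
    weighted error on (X, y) under the weights u."""
    d = X.shape[1]
    best, best_err = (0, 0.0, 1), np.inf
    for j in range(d):
        for thr in np.unique(X[:, j]):
            for polarity in (1, -1):
                pred = np.where(polarity * (X[:, j] - thr) >= 0, 1, -1)
                err = np.sum(u * (pred != y)) / np.sum(u)
                if err < best_err:
                    best_err, best = err, (j, thr, polarity)
    return best, best_err

def stump_predict(stump, X):
    j, thr, polarity = stump
    return np.where(polarity * (X[:, j] - thr) >= 0, 1, -1)

def adaboost(X, y, T=10):
    """Steps 1-9: reweighting plus linear aggregation of weak hypotheses."""
    N = len(y)
    u = np.full(N, 1.0 / N)                  # step 1: uniform initial weights
    stumps, alphas = [], []
    for t in range(T):                        # step 2
        stump, eps = fit_stump(X, y, u)       # steps 3-4: g_t and epsilon_t
        if eps == 0:                          # perfect stump: keep it and stop
            stumps.append(stump)
            alphas.append(1.0)
            break
        if eps > 0.5:                         # step 5
            break
        scale = np.sqrt((1 - eps) / eps)
        alpha = np.log(scale)                 # step 6: 0.5 * ln((1-eps)/eps)
        pred = stump_predict(stump, X)
        u = np.where(pred != y, u * scale, u / scale)  # step 7
        stumps.append(stump)
        alphas.append(alpha)
    return stumps, alphas

def predict(stumps, alphas, X):
    """Step 9: G(x) = sign( sum_t alpha_t * g_t(x) )."""
    scores = sum(a * stump_predict(s, X) for s, a in zip(stumps, alphas))
    return np.sign(scores)

# Tiny usage example on a toy 1-D problem.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([-1, -1, 1, 1])
stumps, alphas = adaboost(X, y, T=5)
print(predict(stumps, alphas, X))             # expected: [-1. -1.  1.  1.]
```

One design point worth noting: the step-7 update rescales the weights so that, under $u^{(t+1)}$, the correctly and incorrectly classified examples carry equal total weight. As a result $g_t$ has weighted error exactly $0.5$ on the new distribution, which forces the next round to produce a genuinely different hypothesis.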