Derivation and Error Analysis of the AdaBoost Algorithm

This post derives the AdaBoost algorithm in detail using the forward stagewise algorithm. We first give a brief introduction to the forward stagewise algorithm.

The Forward Stagewise Algorithm

The forward stagewise algorithm learns an additive model:
$$f(x)=\sum_{m=1}^{M}\alpha_{m}G(x;\gamma_{m})$$
where $G(x;\gamma_{m})$ is a base learner, $\gamma_{m}$ its parameters, and $\alpha_{m}$ its coefficient.
The algorithm proceeds from front to back, adding one base learner and its coefficient at a time so as to minimize the loss
$$\min_{\gamma_{m},\alpha_{m}}\ \sum_{i=1}^{N}L\Big(y_{i},\ \sum_{k=1}^{m-1}\alpha_{k}G(x_{i};\gamma_{k})+\alpha_{m}G(x_{i};\gamma_{m})\Big)$$
where $N$ is the number of samples, $(x_{i},y_{i})$ are the samples, and $L(y,f(x))$ is the loss function.
Note that, given a training set and a loss function, the forward stagewise algorithm produces an additive model.
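To make the procedure concrete, here is a minimal sketch of forward stagewise fitting on a toy regression problem, assuming squared loss and step functions $G(x;t)=I(x>t)$ as base learners; the base-learner class, grid sizes, and all names are illustrative choices, not part of the algorithm itself.

```python
import numpy as np

# Toy forward stagewise fit: additive model of step functions under squared loss.
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0.0, 1.0, 100))
target = np.sin(2 * np.pi * x)

f = np.zeros_like(x)                     # f_0(x) = 0
for m in range(20):
    best = None
    for t in np.linspace(0.0, 1.0, 50):  # candidate base-learner parameters gamma = t
        g = (x > t).astype(float)
        denom = (g * g).sum()
        # for squared loss, the optimal coefficient alpha is a 1-d least-squares fit
        alpha = (g * (target - f)).sum() / denom if denom else 0.0
        loss = ((target - f - alpha * g) ** 2).sum()
        if best is None or loss < best[0]:
            best = (loss, alpha, t)
    _, alpha_m, t_m = best
    f = f + alpha_m * (x > t_m)          # f_m = f_{m-1} + alpha_m * G(x; t_m)

mse = np.mean((f - target) ** 2)
```

Each stage freezes the earlier terms and optimizes only the newest pair $(\alpha_{m},\gamma_{m})$, which is exactly the greedy minimization above.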

Derivation

Suppose the forward stagewise algorithm chooses the exponential loss function
$$L(y,f(x))=\exp(-yf(x))$$
When the output satisfies $f(x)\in\{-1,+1\}$, the exponential loss relates to the classification error rate $err$ as follows:

$$\begin{aligned} \sum_{i=1}^{N}L(y_{i},f(x_{i})) &=\sum_{i=1}^{N}\exp(-y_{i}f(x_{i})) \\ &=\sum_{i=1}^{N}\exp(-y_{i}f(x_{i}))\,I(y_{i}\neq f(x_{i}))+\sum_{i=1}^{N}\exp(-y_{i}f(x_{i}))\,I(y_{i}= f(x_{i}))\\ &=\sum_{i=1}^{N}e\,I(y_{i}\neq f(x_{i}))+\sum_{i=1}^{N}e^{-1}\,I(y_{i}= f(x_{i}))\\ &=\sum_{i=1}^{N}e\,I(y_{i}\neq f(x_{i}))+e^{-1}\Big(N-\sum_{i=1}^{N}I(y_{i}\neq f(x_{i}))\Big)\\ &=(e-e^{-1})\sum_{i=1}^{N}I(y_{i}\neq f(x_{i}))+e^{-1}N\\ &=N(e-e^{-1})\,err+e^{-1}N \end{aligned}$$
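The third equality above uses the assumption $f(x)\in\{-1,+1\}$, so that each term $\exp(-y_{i}f(x_{i}))$ equals either $e$ or $e^{-1}$. A quick numerical check of the identity on random $\pm1$ labels and predictions:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 1000
y = rng.choice([-1, 1], size=N)   # true labels
f = rng.choice([-1, 1], size=N)   # predictions of a {-1,+1}-valued classifier

exp_loss = np.exp(-y * f).sum()   # sum of exponential losses
err = np.mean(y != f)             # classification error rate
identity = N * (np.e - 1 / np.e) * err + N / np.e
```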
Suppose the first $m-1$ iterations have produced
$$f_{m-1}(x)=\alpha_{1}G_{1}+\alpha_{2}G_{2}+\dots+\alpha_{m-1}G_{m-1}$$
At the $m$-th iteration, $(\alpha_{m},G_{m})$ is obtained from
$$(\alpha_{m},G_{m})=\arg\min_{\alpha,G}\sum_{i=1}^{N}\exp\big(-y_{i}(f_{m-1}(x_{i})+\alpha G(x_{i}))\big)$$
Let $\bar{\omega}_{mi}=\exp(-y_{i}f_{m-1}(x_{i}))$. Since $\bar{\omega}_{mi}$ does not depend on $(\alpha,G)$ and is updated at each iteration, we have
$$(\alpha_{m},G_{m})=\arg\min_{\alpha,G}\sum_{i=1}^{N}\bar{\omega}_{mi}\exp(-\alpha y_{i}G(x_{i}))\qquad(*)$$
For any $\alpha>0$,
$$\begin{aligned} \sum_{i=1}^{N}\bar{\omega}_{mi}\exp(-\alpha y_{i}G(x_{i})) &=\sum_{i=1}^{N}\bar{\omega}_{mi}e^{\alpha}I(y_{i}\neq G(x_{i}))+\sum_{i=1}^{N}\bar{\omega}_{mi}e^{-\alpha}I(y_{i}=G(x_{i}))\\ &=\sum_{i=1}^{N}\bar{\omega}_{mi}e^{\alpha}I(y_{i}\neq G(x_{i}))+e^{-\alpha}\Big(\sum_{i=1}^{N}\bar{\omega}_{mi}-\sum_{i=1}^{N}\bar{\omega}_{mi}I(y_{i}\neq G(x_{i}))\Big)\\ &=e^{-\alpha}\sum_{i=1}^{N}\bar{\omega}_{mi}+(e^{\alpha}-e^{-\alpha})\sum_{i=1}^{N}\bar{\omega}_{mi}I(y_{i}\neq G(x_{i}))\qquad(**) \end{aligned}$$
Since $e^{\alpha}-e^{-\alpha}>0$, the $G(x)$ minimizing $(*)$ is
$$G_{m}=\arg\min_{G}\sum_{i=1}^{N}\bar{\omega}_{mi}I(y_{i}\neq G(x_{i}))$$
Next we solve for $\alpha_{m}$: differentiating $(**)$ with respect to $\alpha$ and setting the derivative to zero gives
$$\alpha_{m}=\frac{1}{2}\log\frac{1-err_{m}}{err_{m}}$$
where $err_{m}$ is the weighted error rate of $G_{m}$ on the data set,
$$err_{m}=\frac{\sum_{i=1}^{N}\bar{\omega}_{mi}I(y_{i}\neq G(x_{i}))}{\sum_{i=1}^{N}\bar{\omega}_{mi}}=\sum_{i=1}^{N}\omega_{mi}I(y_{i}\neq G(x_{i}))$$

That is, $(**)$ attains its minimum at $\alpha_{m}=\frac{1}{2}\log\frac{1-err_{m}}{err_{m}}$. We assume the weak classifier's error rate $err_{m}$ is below $1/2$, i.e., better than random guessing, so $\alpha_{m}>0$.
This yields the model update
$$f_{m}(x)=f_{m-1}(x)+\alpha_{m}G_{m}(x)$$
To summarize the algorithm:
Input: training set $\{(x_{i},y_{i})\}_{i=1,2,\dots,N}$, $x_{i}\in\chi\subseteq R^{n}$, $y_{i}=\pm1$;
loss function $L(y,f(x))=\exp(-yf(x))$.
1. Initialize $f_{0}(x)=0$; repeat steps 2-6 for $m=1,2,\dots$
2. Compute the weights $\bar{\omega}_{mi}=\exp(-y_{i}f_{m-1}(x_{i})),\ i=1,\dots,N$.
3. Find the base classifier $G_{m}=\arg\min_{G}\sum_{i=1}^{N}\bar{\omega}_{mi}I(y_{i}\neq G(x_{i}))$.
4. Compute the error rate $err_{m}=\frac{\sum_{i=1}^{N}\bar{\omega}_{mi}I(y_{i}\neq G(x_{i}))}{\sum_{i=1}^{N}\bar{\omega}_{mi}}$.
5. Compute the classifier coefficient $\alpha_{m}=\frac{1}{2}\log\frac{1-err_{m}}{err_{m}}$.
6. Obtain the AdaBoost model $f_{m}(x)=f_{m-1}(x)+\alpha_{m}G_{m}(x)$.
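The six steps can be sketched in Python, assuming decision stumps (single-feature threshold classifiers) as the base-learner family; `stump_fit`, `adaboost_fit`, and `adaboost_predict` are hypothetical names, and the stopping conditions are illustrative choices.

```python
import numpy as np

def stump_fit(X, y, w):
    """Step 3: the decision stump minimizing the weighted error, found exhaustively."""
    best = None
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            for sign in (1, -1):
                pred = sign * np.where(X[:, j] <= thr, 1, -1)
                err = w[pred != y].sum() / w.sum()   # step 4: weighted error rate
                if best is None or err < best[0]:
                    best = (err, j, thr, sign)
    return best

def adaboost_fit(X, y, M=10):
    """AdaBoost as forward stagewise minimization of the exponential loss."""
    f = np.zeros(len(y))                       # step 1: f_0(x) = 0
    model = []
    for m in range(M):
        w = np.exp(-y * f)                     # step 2: (unnormalized) weights
        err, j, thr, sign = stump_fit(X, y, w)
        if err == 0 or err >= 0.5:             # degenerate cases: stop boosting
            break
        alpha = 0.5 * np.log((1 - err) / err)  # step 5: classifier coefficient
        pred = sign * np.where(X[:, j] <= thr, 1, -1)
        f = f + alpha * pred                   # step 6: f_m = f_{m-1} + alpha_m G_m
        model.append((alpha, j, thr, sign))
    return model

def adaboost_predict(model, X):
    f = np.zeros(len(X))
    for alpha, j, thr, sign in model:
        f = f + alpha * sign * np.where(X[:, j] <= thr, 1, -1)
    return np.sign(f)

# toy usage: a 2-d problem whose boundary no single stump can represent exactly
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)
model = adaboost_fit(X, y, M=20)
train_acc = np.mean(adaboost_predict(model, X) == y)
```

Note that the exhaustive stump search realizes step 3 exactly for this base-learner family; any learner that can minimize a weighted error could be substituted.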

The weights can equivalently be maintained recursively. Define
$$\omega_{m+1,i}=\frac{\omega_{mi}}{Z_{m}}\exp(-\alpha_{m}y_{i}G_{m}(x_{i})),\quad i=1,\dots,N$$
$$\omega_{1i}=\frac{1}{N},\quad i=1,\dots,N$$
where $Z_{m}$ is the normalization factor
$$Z_{m}=\sum_{i=1}^{N}\omega_{mi}\exp(-\alpha_{m}y_{i}G_{m}(x_{i}))$$
Then for any $m\ge1$,
$$\omega_{mi}=\frac{\bar{\omega}_{mi}}{\sum_{i=1}^{N}\bar{\omega}_{mi}}$$
that is, $\omega_{mi}$ and $\bar{\omega}_{mi}$ differ only by a normalization factor:

$$\begin{aligned} \frac{\bar{\omega}_{mi}}{\sum_{i=1}^{N}\bar{\omega}_{mi}} &=\frac{\frac{1}{N}\prod_{k=1}^{m-1}\exp(-y_{i}\alpha_{k}G_{k}(x_{i}))}{\frac{1}{N}\sum_{i=1}^{N}\prod_{k=1}^{m-1}\exp(-y_{i}\alpha_{k}G_{k}(x_{i}))}\qquad(1)\\ &=\frac{\frac{1}{N}\prod_{k=1}^{m-1}\exp(-y_{i}\alpha_{k}G_{k}(x_{i}))}{\sum_{i=1}^{N}\omega_{2i}Z_{1}\prod_{k=2}^{m-1}\exp(-y_{i}\alpha_{k}G_{k}(x_{i}))}\qquad(2)\\ &=\frac{\frac{1}{N}\prod_{k=1}^{m-1}\exp(-y_{i}\alpha_{k}G_{k}(x_{i}))}{\sum_{i=1}^{N}\omega_{3i}Z_{1}Z_{2}\prod_{k=3}^{m-1}\exp(-y_{i}\alpha_{k}G_{k}(x_{i}))}\qquad(3)\\ &=\cdots\qquad(4)\\ &=\frac{\frac{1}{N}\prod_{k=1}^{m-1}\exp(-y_{i}\alpha_{k}G_{k}(x_{i}))}{Z_{1}Z_{2}\cdots Z_{m-1}}\qquad(5)\\ &=\frac{\omega_{2i}\prod_{k=2}^{m-1}\exp(-y_{i}\alpha_{k}G_{k}(x_{i}))}{Z_{2}\cdots Z_{m-1}}\\ &=\frac{\omega_{3i}\prod_{k=3}^{m-1}\exp(-y_{i}\alpha_{k}G_{k}(x_{i}))}{Z_{3}\cdots Z_{m-1}}\\ &=\cdots\\ &=\frac{\omega_{m-1,i}\exp(-y_{i}\alpha_{m-1}G_{m-1}(x_{i}))}{Z_{m-1}}\\ &=\omega_{mi} \end{aligned}$$
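This chain of equalities can be verified numerically: starting from uniform $\omega_{1i}=1/N$ and applying the update $\omega_{m+1,i}=\omega_{mi}\exp(-\alpha_{m}y_{i}G_{m}(x_{i}))/Z_{m}$ reproduces the normalized $\bar{\omega}$. The base-classifier outputs and coefficients below are arbitrary, hypothetical stand-ins:

```python
import numpy as np

rng = np.random.default_rng(0)
N, M = 50, 5
y = rng.choice([-1, 1], size=N)
G = rng.choice([-1, 1], size=(M, N))    # G[k, i] stands in for G_{k+1}(x_i)
alpha = rng.uniform(0.1, 1.0, size=M)   # arbitrary positive coefficients

# direct route: w_bar_{M+1, i} = exp(-y_i f_M(x_i)), then normalize
f = (alpha[:, None] * G).sum(axis=0)
w_bar = np.exp(-y * f)
w_bar = w_bar / w_bar.sum()

# recursive route: w_{m+1} = w_m * exp(-alpha_m y G_m) / Z_m
w = np.full(N, 1.0 / N)
for m in range(M):
    u = w * np.exp(-alpha[m] * y * G[m])
    w = u / u.sum()                     # u.sum() is exactly Z_m
```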
Since $\bar{\omega}_{mi}$ and $\omega_{mi}$ differ only by a normalization factor, which does not affect the choice of $G_{m}$, the algorithm can be restated as:

Input: training set $\{(x_{i},y_{i})\}_{i=1,2,\dots,N}$, $x_{i}\in\chi\subseteq R^{n}$, $y_{i}=\pm1$;
loss function $L(y,f(x))=\exp(-yf(x))$.
1. Initialize $f_{0}(x)=0$ and $\omega_{1i}=\frac{1}{N}$; repeat steps 2-6 for $m=1,2,\dots$
2. Compute the weights $\omega_{mi}=\frac{\omega_{m-1,i}\exp(-\alpha_{m-1}y_{i}G_{m-1}(x_{i}))}{Z_{m-1}},\ m\ge2,\ i=1,\dots,N$.
3. Find the base classifier $G_{m}=\arg\min_{G}\sum_{i=1}^{N}\omega_{mi}I(y_{i}\neq G(x_{i}))$.
4. Compute the error rate $err_{m}=\sum_{i=1}^{N}\omega_{mi}I(y_{i}\neq G(x_{i}))$.
5. Compute the classifier coefficient $\alpha_{m}=\frac{1}{2}\log\frac{1-err_{m}}{err_{m}}$.
6. Obtain the AdaBoost model $f_{m}(x)=f_{m-1}(x)+\alpha_{m}G_{m}(x)$.
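A worked single round of these steps, on a hypothetical toy set of ten one-dimensional points, with the stump $G_{1}(x)=+1$ for $x<2.5$ and $-1$ otherwise chosen as the first base classifier:

```python
import numpy as np

x = np.arange(10)
y = np.array([1, 1, 1, -1, -1, -1, 1, 1, 1, -1])

w = np.full(10, 0.1)                      # step 1: uniform initial weights
G1 = np.where(x < 2.5, 1, -1)             # step 3: first stump (misses x = 6, 7, 8)
err1 = w[G1 != y].sum()                   # step 4: weighted error = 0.3
alpha1 = 0.5 * np.log((1 - err1) / err1)  # step 5: alpha_1 = 0.5*ln(7/3)

u = w * np.exp(-alpha1 * y * G1)          # step 2 of the next round
w2 = u / u.sum()                          # the three misclassified points now
                                          # each carry weight 1/6
```

After renormalizing, the misclassified half of the weight mass is concentrated on just three points, which is what forces the next base classifier to attend to them.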

Error Analysis

The classification error of AdaBoost satisfies
$$\frac{1}{N}\sum_{i=1}^{N}I(f(x_{i})\neq y_{i})\le \exp\Big(-2\sum_{m=1}^{M}\gamma_{m}^{2}\Big)$$
where $\gamma_{m}=\frac{1}{2}-err_{m}$, $f(x)=\sum_{k=1}^{m}\alpha_{k}G_{k}(x)$, and $I(f(x_{i})\neq y_{i})$ denotes misclassification by the final classifier $\mathrm{sign}(f(x))$.
Proof:
First we show that $\frac{1}{N}\sum_{i=1}^{N}I(f(x_{i})\neq y_{i})\le \prod_{k=1}^{m}Z_{k}$.
If $x_{i}$ is classified correctly, i.e. $y_{i}f(x_{i})>0$, then
$$I(f(x_{i})\neq y_{i})=0\le \exp(-y_{i}f(x_{i}))$$
If $x_{i}$ is misclassified, i.e. $y_{i}f(x_{i})\le 0$, then
$$I(f(x_{i})\neq y_{i})=1\le \exp(-y_{i}f(x_{i}))$$


Therefore
$$\frac{1}{N}\sum_{i=1}^{N}I(f(x_{i})\neq y_{i})\le\frac{1}{N}\sum_{i=1}^{N}\exp(-y_{i}f(x_{i}))$$
Moreover, by equations $(1)(2)(3)(4)(5)$,
$$\frac{1}{N}\sum_{i=1}^{N}\exp(-y_{i}f(x_{i}))=\prod_{k=1}^{m}Z_{k}$$
Next we show that $\prod_{k=1}^{m}Z_{k}\le \exp(-2\sum_{k=1}^{m}\gamma_{k}^{2})$.
For every $k$ with $1\le k\le m$,
$$\begin{aligned} Z_{k} &=\sum_{i=1}^{N}\omega_{ki}\exp(-\alpha_{k}y_{i}G_{k}(x_{i}))\\ &=\sum_{G_{k}(x_{i})=y_{i}}\omega_{ki}e^{-\alpha_{k}}+\sum_{G_{k}(x_{i})\neq y_{i}}\omega_{ki}e^{\alpha_{k}}\\ &=(1-err_{k})e^{-\alpha_{k}}+err_{k}\,e^{\alpha_{k}}\\ &=2\big(err_{k}(1-err_{k})\big)^{\frac{1}{2}}\\ &=(1-4\gamma_{k}^{2})^{\frac{1}{2}} \end{aligned}$$
Hence
$$\prod_{k=1}^{m}Z_{k}=\prod_{k=1}^{m}(1-4\gamma_{k}^{2})^{\frac{1}{2}}$$
and for every $k$ with $1\le k\le m$ it suffices to show
$$(1-4\gamma_{k}^{2})^{\frac{1}{2}}\le \exp(-2\gamma_{k}^{2})\qquad(*)$$
from which it follows that
$$\prod_{k=1}^{m}Z_{k}=\prod_{k=1}^{m}(1-4\gamma_{k}^{2})^{\frac{1}{2}}\le \exp\Big(-2\sum_{k=1}^{m}\gamma_{k}^{2}\Big)$$
Taking logarithms of both sides of $(*)$, it is equivalent to show
$$\frac{1}{2}\ln(1-4\gamma_{k}^{2})\le -2\gamma_{k}^{2}$$
Expanding $\ln(1-4\gamma_{k}^{2})$ in a Taylor series gives
$$\ln(1-4\gamma_{k}^{2})=-4\gamma_{k}^{2}-\frac{(4\gamma_{k}^{2})^{2}}{2}-\frac{(4\gamma_{k}^{2})^{3}}{3}-\dots-\frac{(4\gamma_{k}^{2})^{n}}{n}-\dots\le-4\gamma_{k}^{2}$$
so indeed
$$\frac{1}{2}\ln(1-4\gamma_{k}^{2})\le -2\gamma_{k}^{2}$$
This completes the proof.

Remark:
If there exists $\gamma>0$ such that $\gamma_{k}\ge\gamma$ for every $k$, i.e., every base classifier's error rate is at most $\frac{1}{2}-\gamma$, then

$$\frac{1}{N}\sum_{i=1}^{N}I(f(x_{i})\neq y_{i})\le \exp(-2m\gamma^{2})$$
That is, AdaBoost's training error decreases exponentially in the number of iterations.
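Both inequalities just proved can be checked on a small boosting run; the data set and stump pool below are toy choices for illustration:

```python
import numpy as np

x = np.arange(10)
y = np.array([1, 1, 1, -1, -1, -1, 1, 1, 1, -1])
# pool of candidate base classifiers: threshold stumps in both orientations
stumps = [s * np.where(x < t, 1, -1) for t in np.arange(0.5, 10) for s in (1, -1)]

w = np.full(10, 0.1)
f = np.zeros(10)
Z_prod, gamma_sq_sum = 1.0, 0.0
for m in range(3):
    errs = [w[g != y].sum() for g in stumps]   # w stays normalized below
    e = min(errs)
    G = stumps[int(np.argmin(errs))]
    a = 0.5 * np.log((1 - e) / e)
    f = f + a * G
    Z = (w * np.exp(-a * y * G)).sum()         # normalization factor Z_m
    w = w * np.exp(-a * y * G) / Z
    Z_prod *= Z
    gamma_sq_sum += (0.5 - e) ** 2

train_err = np.mean(np.sign(f) != y)
bound = np.exp(-2 * gamma_sq_sum)
```

The run should satisfy the full chain: training error $\le \prod_{k}Z_{k} \le \exp(-2\sum_{k}\gamma_{k}^{2})$.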

Regularization

To guard against overfitting, a regularization (shrinkage) factor $v$ can be inserted into AdaBoost's weak-classifier update:
$$f_{m}(x)=f_{m-1}(x)+v\,\alpha_{m}G_{m}(x)$$
Here $v$ is called the step size (learning rate). The smaller $v$ is, the more iterations are required; in practice the step size and the maximum number of iterations are tuned together to control how closely the algorithm fits the data.
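A minimal illustration of the step-size effect, reusing the toy stump setup (all choices hypothetical): with a smaller $v$, reaching the same training accuracy takes more boosting rounds.

```python
import numpy as np

x = np.arange(10)
y = np.array([1, 1, 1, -1, -1, -1, 1, 1, 1, -1])
stumps = [s * np.where(x < t, 1, -1) for t in np.arange(0.5, 10) for s in (1, -1)]

def boost(v, max_rounds=100):
    """Boost with shrinkage v; return rounds until zero training error (or max_rounds)."""
    w = np.full(10, 0.1)
    f = np.zeros(10)
    for m in range(1, max_rounds + 1):
        errs = [w[g != y].sum() / w.sum() for g in stumps]
        e = min(errs)
        G = stumps[int(np.argmin(errs))]
        a = 0.5 * np.log((1 - e) / e)
        f = f + v * a * G               # shrunken update f_m = f_{m-1} + v*alpha_m*G_m
        w = w * np.exp(-v * a * y * G)  # weights track the shrunken model
        if np.all(np.sign(f) == y):
            return m
    return max_rounds

rounds_full = boost(v=1.0)
rounds_small = boost(v=0.1)
```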

Pros and Cons

Pros:
1) High classification accuracy;
2) Relatively resistant to overfitting;
3) Within the AdaBoost framework, a wide range of classification and regression learners can serve as weak learners; decision trees and neural networks are common choices.
Cons:
1) Sensitive to noise: anomalous samples may be assigned very large weights.
