Suppose the final model $f_m(x)$ is a weighted combination of the base models $G_i(x)$, $i \in [1, m]$, where $\alpha_i$ is the importance (weight) of the $i$-th base model and the labels satisfy $y \in \{-1, 1\}$:
$$f_m(x) = \sum_{i=1}^{m} \alpha_i G_i(x)$$
$$f_m(x) = \sum_{i=1}^{m-1} \alpha_i G_i(x) + \alpha_m G_m(x)$$
$$f_m(x) = f_{m-1}(x) + \alpha_m G_m(x)$$
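The recursion above says that each round only adds one weighted term to the previous model. The following is a minimal sketch of that idea, not a full AdaBoost implementation; the names `ensemble_score` and `add_learner`, the two stumps, and their weights are hypothetical and chosen only for illustration.

```python
from typing import Callable, Sequence

# A base classifier maps one sample to a label in {-1, +1}.
Classifier = Callable[[float], int]
Scorer = Callable[[float], float]


def ensemble_score(x: float, alphas: Sequence[float],
                   learners: Sequence[Classifier]) -> float:
    """Direct form: f_m(x) = sum_i alpha_i * G_i(x)."""
    return sum(a * g(x) for a, g in zip(alphas, learners))


def add_learner(f_prev: Scorer, alpha_m: float, g_m: Classifier) -> Scorer:
    """Recursive form: f_m(x) = f_{m-1}(x) + alpha_m * G_m(x)."""
    return lambda x: f_prev(x) + alpha_m * g_m(x)


# Two hypothetical stumps and weights (assumptions for this example).
stumps = [lambda x: 1 if x > 0 else -1, lambda x: 1 if x > 2 else -1]
alphas = [0.5, 0.25]

f = lambda x: 0.0  # f_0(x) = 0
for a, g in zip(alphas, stumps):
    f = add_learner(f, a, g)

# Both forms give the same score for any input.
print(f(1.0), ensemble_score(1.0, alphas, stumps))
```

The predicted label would then be the sign of the score.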
The loss for a single sample is the exponential loss $L(y, f(x)) = e^{-y f(x)}$.
So the loss function of the whole AdaBoost model over the $n$ training samples is:
$$L = \sum_{i=1}^{n} \exp(-y_i f(x_i))$$
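As a quick illustration of this loss, the sketch below sums $\exp(-y_i f(x_i))$ over a small set of labels and scores. The function name `exponential_loss` and the sample values are assumptions made for this example.

```python
import math
from typing import Sequence


def exponential_loss(ys: Sequence[int], scores: Sequence[float]) -> float:
    """L = sum_i exp(-y_i * f(x_i)), with y_i in {-1, +1}."""
    return sum(math.exp(-y * s) for y, s in zip(ys, scores))


ys = [1, -1, 1]
good_scores = [2.0, -1.5, 0.5]   # every score has the same sign as its label
bad_scores = [2.0, -1.5, -0.5]   # the last sample is now misclassified

# A wrong-signed score is penalized more heavily than a right-signed one.
print(exponential_loss(ys, good_scores), exponential_loss(ys, bad_scores))
```

A sample with $y_i f(x_i) > 0$ contributes less than 1 to the loss, and a misclassified sample contributes more than 1, so the loss grows quickly with confident mistakes.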
In this loss, $\alpha_m$ and $G_m(x)$ are the quantities to be solved for:

$$(\alpha_m, G_m(x)) = \mathop{\arg\min}\limits_{\alpha_m, G_m} \sum_{i=1}^{n} \exp\big(-y_i (f_{m-1}(x_i) + \alpha_m G_m(x_i))\big)$$

where

$$\sum_{i=1}^{n} \exp\big(-y_i (f_{m-1}(x_i) + \alpha_m G_m(x_i))\big) = \sum_{i=1}^{n} \exp(-y_i f_{m-1}(x_i)) \exp(-y_i \alpha_m G_m(x_i)) \tag{1}$$
Let $\omega_i^{m} = \exp(-y_i f_{m-1}(x_i))$. Substituting this into equation (1) gives:

$$\sum_{i=1}^{n} \omega_i^{m} \exp(-y_i \alpha_m G_m(x_i)) \tag{2}$$

When $y_i = G_m(x_i)$ we have $y_i G_m(x_i) = 1$, and when $y_i \neq G_m(x_i)$ we have $y_i G_m(x_i) = -1$, so equation (2) can be written as
$$
\begin{aligned}
&\sum_{y_i = G_m(x_i)} \omega_i^{m} \exp(-\alpha_m) + \sum_{y_i \neq G_m(x_i)} \omega_i^{m} \exp(\alpha_m) \\
={}& \sum_{y_i = G_m(x_i)} \omega_i^{m} \exp(-\alpha_m) + \sum_{y_i \neq G_m(x_i)} \omega_i^{m} \exp(\alpha_m) + \sum_{y_i \neq G_m(x_i)} \omega_i^{m} \exp(-\alpha_m) - \sum_{y_i \neq G_m(x_i)} \omega_i^{m} \exp(-\alpha_m) \\
={}& e^{-\alpha_m} \sum_{i=1}^{n} \omega_i^{m} - (e^{-\alpha_m} - e^{\alpha_m}) \sum_{y_i \neq G_m(x_i)} \omega_i^{m} \\
={}& e^{-\alpha_m} \sum_{i=1}^{n} \omega_i^{m} - (e^{-\alpha_m} - e^{\alpha_m}) \sum_{i=1}^{n} \omega_i^{m} I(y_i \neq G_m(x_i))
\end{aligned}
\tag{3}
$$
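The rearrangement from equation (2) to equation (3) can be checked numerically. The sketch below evaluates both forms on a small hypothetical data set; the function names and sample values are assumptions for illustration only.

```python
import math
from typing import Sequence


def loss_direct(weights: Sequence[float], ys: Sequence[int],
                preds: Sequence[int], alpha: float) -> float:
    """Equation (2): sum_i w_i * exp(-y_i * alpha * G(x_i))."""
    return sum(w * math.exp(-y * alpha * p)
               for w, y, p in zip(weights, ys, preds))


def loss_decomposed(weights: Sequence[float], ys: Sequence[int],
                    preds: Sequence[int], alpha: float) -> float:
    """Equation (3): e^-a * sum(w) - (e^-a - e^a) * sum(w over mistakes)."""
    total = sum(weights)
    wrong = sum(w for w, y, p in zip(weights, ys, preds) if y != p)
    return math.exp(-alpha) * total - (math.exp(-alpha) - math.exp(alpha)) * wrong


weights = [0.1, 0.2, 0.3, 0.4]
ys = [1, -1, 1, -1]
preds = [1, 1, 1, -1]  # the second sample is misclassified
alpha = 0.7

print(loss_direct(weights, ys, preds, alpha),
      loss_decomposed(weights, ys, preds, alpha))
```

The two values should agree up to floating-point rounding for any choice of weights, predictions, and $\alpha_m$.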
For $\alpha_m > 0$ the factor $e^{\alpha_m} - e^{-\alpha_m}$ is positive, so minimizing equation (3) over $G_m$ means choosing the classifier with the smallest weighted error:

$$G_m(x) = \mathop{\arg\min}\limits_{G} \sum_{i=1}^{n} \omega_i^{m} I(y_i \neq G(x_i))$$
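To make the weighted-error criterion concrete, here is a minimal sketch that searches one-dimensional decision stumps for the smallest weighted error. Using stumps as the base learner is an assumption of this example (the derivation does not fix the type of $G_m$), and `weighted_error` and `best_stump` are hypothetical names.

```python
from typing import List, Optional, Sequence, Tuple


def weighted_error(weights: Sequence[float], ys: Sequence[int],
                   preds: Sequence[int]) -> float:
    """sum_i w_i * I(y_i != G(x_i))."""
    return sum(w for w, y, p in zip(weights, ys, preds) if y != p)


def best_stump(xs: Sequence[float], ys: Sequence[int],
               weights: Sequence[float]) -> Tuple[float, float, int]:
    """Return (error, threshold, sign) of the stump with minimal weighted error.

    A stump predicts `sign` when x > threshold and `-sign` otherwise.
    """
    best: Optional[Tuple[float, float, int]] = None
    for t in sorted(set(xs)):
        for s in (1, -1):
            preds: List[int] = [s if x > t else -s for x in xs]
            err = weighted_error(weights, ys, preds)
            if best is None or err < best[0]:
                best = (err, t, s)
    assert best is not None, "xs must not be empty"
    return best


xs = [1.0, 2.0, 3.0, 4.0]
ys = [-1, -1, 1, 1]
weights = [0.25, 0.25, 0.25, 0.25]
print(best_stump(xs, ys, weights))
```

Because the error is weighted, samples with large $\omega_i^{m}$ influence the choice of stump more than samples with small weights.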
Because the sample weights are divided by their sum every time they are computed, $\sum_{i=1}^{n} \omega_i^{m} = 1$. Differentiating equation (3) with respect to $\alpha_m$ and setting the derivative to zero gives:

$$-e^{-\alpha_m} - (-e^{-\alpha_m} - e^{\alpha_m}) \sum_{i=1}^{n} \omega_i^{m} I(y_i \neq G_m(x_i)) = 0$$

$$e^{-\alpha_m} \Big(1 - \sum_{i=1}^{n} \omega_i^{m} I(y_i \neq G_m(x_i))\Big) = e^{\alpha_m} \sum_{i=1}^{n} \omega_i^{m} I(y_i \neq G_m(x_i)) \tag{4}$$

Let $e_m = \sum_{i=1}^{n} \omega_i^{m} I(y_i \neq G_m(x_i))$ denote the weighted error rate.
Equation (4) then becomes $e^{-\alpha_m}(1 - e_m) = e^{\alpha_m} e_m$, that is $e^{2\alpha_m} = \frac{1 - e_m}{e_m}$, which simplifies to

$$\alpha_m = \frac{1}{2} \ln\Big(\frac{1 - e_m}{e_m}\Big)$$
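The closed form for $\alpha_m$ can be sanity-checked numerically. With normalized weights, equation (3) reduces to $e^{-\alpha_m}(1 - e_m) + e^{\alpha_m} e_m$; the sketch below evaluates that expression around the closed-form value. The names `learner_weight` and `loss_at` are hypothetical.

```python
import math


def learner_weight(e_m: float) -> float:
    """alpha_m = 0.5 * ln((1 - e_m) / e_m), defined for 0 < e_m < 1."""
    if not 0.0 < e_m < 1.0:
        raise ValueError("weighted error must lie strictly between 0 and 1")
    return 0.5 * math.log((1.0 - e_m) / e_m)


def loss_at(alpha: float, e_m: float) -> float:
    """Equation (3) with sum(w) = 1: e^-alpha * (1 - e_m) + e^alpha * e_m."""
    return math.exp(-alpha) * (1.0 - e_m) + math.exp(alpha) * e_m


e_m = 0.2
alpha = learner_weight(e_m)

# The loss at the closed-form alpha should be lower than at nearby values.
print(alpha, loss_at(alpha - 0.1, e_m), loss_at(alpha, e_m), loss_at(alpha + 0.1, e_m))
```

Note that $\alpha_m > 0$ exactly when $e_m < 0.5$, so a base model that beats random guessing receives a positive weight, and $\alpha_m$ grows as the error shrinks.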
Since $\omega_i^{m} = \exp(-y_i f_{m-1}(x_i))$, the weights for the next round follow as

$$
\begin{aligned}
\omega_i^{m+1} &= \exp(-y_i f_m(x_i)) = \exp\big(-y_i (f_{m-1}(x_i) + \alpha_m G_m(x_i))\big) = \exp(-y_i f_{m-1}(x_i)) \exp(-y_i \alpha_m G_m(x_i)) \\
&= \omega_i^{m} \exp(-y_i \alpha_m G_m(x_i))
\end{aligned}
$$
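The update rule can be sketched in a few lines. This example also divides by the sum of the new weights, following the earlier remark that the weights are normalized each round; the function name `update_weights` and the sample data are assumptions for illustration.

```python
import math
from typing import List, Sequence


def update_weights(weights: Sequence[float], ys: Sequence[int],
                   preds: Sequence[int], alpha: float) -> List[float]:
    """w_i <- w_i * exp(-y_i * alpha * G(x_i)), then renormalize to sum to 1."""
    raw = [w * math.exp(-y * alpha * p) for w, y, p in zip(weights, ys, preds)]
    total = sum(raw)
    return [w / total for w in raw]


weights = [0.25, 0.25, 0.25, 0.25]
ys = [1, 1, -1, -1]
preds = [1, 1, -1, 1]  # only the last sample is misclassified

# Weighted error and the corresponding alpha from the closed form.
e_m = sum(w for w, y, p in zip(weights, ys, preds) if y != p)
alpha = 0.5 * math.log((1.0 - e_m) / e_m)

new_weights = update_weights(weights, ys, preds, alpha)
print(new_weights)
```

Misclassified samples are multiplied by $e^{\alpha_m}$ and correctly classified ones by $e^{-\alpha_m}$, so the next base model concentrates on the samples the current one got wrong.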