This article gives a detailed derivation of the AdaBoost algorithm from the forward stagewise algorithm. We first briefly introduce the forward stagewise algorithm.
Forward Stagewise Algorithm
The forward stagewise algorithm learns an additive model:

$$f(x)=\sum_{m=1}^{M}\alpha_{m}G(x;\gamma_{m})$$
where $G(x;\gamma_{m})$ is a base learner, $\gamma_{m}$ are the base learner's parameters, and $\alpha_{m}$ is its coefficient.
Proceeding from front to back, the forward stagewise algorithm adds one base learner and its coefficient at each step, minimizing the loss

$$\min_{\gamma_{m},\alpha_{m}}\ \sum_{i=1}^{N}L\Big(y_{i},\ \sum_{k=1}^{m-1}\alpha_{k}G(x_{i};\gamma_{k})+\alpha_{m}G(x_{i};\gamma_{m})\Big)$$
where $N$ is the number of samples, $(x_{i},y_{i})$ are the samples, and $L(y,f(x))$ is the loss function.
Note that, given a training set and a loss function, the forward stagewise algorithm is all that is needed to learn an additive model, as the sketch below illustrates.
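To make the procedure concrete, here is a minimal, hypothetical numpy sketch of forward stagewise fitting. It assumes squared loss and 1-D threshold-stump base learners, so each stage simply fits the current residual and the coefficient $\alpha_{m}$ is absorbed into the stump's leaf values; these choices are illustrative only and are not part of the AdaBoost derivation below.

```python
import numpy as np

def fit_stump(x, residual):
    """Fit a stump g(x) = c1*[x <= t] + c2*[x > t] to the residual by
    enumerating thresholds (a toy base learner; gamma_m = (t, c1, c2))."""
    best = None
    for t in np.unique(x):
        left, right = residual[x <= t], residual[x > t]
        c1 = left.mean() if left.size else 0.0
        c2 = right.mean() if right.size else 0.0
        sse = np.sum((residual - np.where(x <= t, c1, c2)) ** 2)
        if best is None or sse < best[0]:
            best = (sse, t, c1, c2)
    return best[1:]

def forward_stagewise(x, y, M=10):
    """Greedily build f(x) = sum_m alpha_m * G(x; gamma_m). With squared
    loss, the stage-m subproblem reduces to fitting the current residual."""
    f = np.zeros_like(y, dtype=float)
    model = []
    for _ in range(M):
        t, c1, c2 = fit_stump(x, y - f)      # minimize the stage-m loss
        f += np.where(x <= t, c1, c2)        # add the new base learner
        model.append((t, c1, c2))
    return model
```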
Derivation
Suppose the forward stagewise algorithm uses the exponential loss function

$$L(y,f(x))=\exp(-yf(x))$$
When both $y_{i}$ and $f(x_{i})$ take values in $\{-1,+1\}$, the exponential loss is related to the classification error rate $err=\frac{1}{N}\sum_{i=1}^{N}I(y_{i}\neq f(x_{i}))$ as follows:

$$\begin{aligned} \sum_{i=1}^{N}L(y_{i},f(x_{i})) &=\sum_{i=1}^{N}\exp(-y_{i}f(x_{i})) \\ &=\sum_{i=1}^{N}\exp(-y_{i}f(x_{i}))I(y_{i}\neq f(x_{i}))+\sum_{i=1}^{N}\exp(-y_{i}f(x_{i}))I(y_{i}=f(x_{i}))\\ &=\sum_{i=1}^{N}e\,I(y_{i}\neq f(x_{i}))+\sum_{i=1}^{N}e^{-1}I(y_{i}=f(x_{i}))\\ &=\sum_{i=1}^{N}e\,I(y_{i}\neq f(x_{i}))+e^{-1}\Big(N-\sum_{i=1}^{N}I(y_{i}\neq f(x_{i}))\Big)\\ &=(e-e^{-1})\sum_{i=1}^{N}I(y_{i}\neq f(x_{i}))+e^{-1}N\\ &=N(e-e^{-1})\,err+e^{-1}N \end{aligned}$$

so the total exponential loss is an increasing affine function of $err$.
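This affine relationship is easy to verify numerically. A small sketch with made-up $\pm1$ labels and predictions:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 1000
y = rng.choice([-1, 1], size=N)      # true labels
f = rng.choice([-1, 1], size=N)      # predictions, also in {-1, +1}

err = np.mean(y != f)                # classification error rate
lhs = np.sum(np.exp(-y * f))         # total exponential loss
rhs = N * (np.e - np.e**-1) * err + np.e**-1 * N

print(np.isclose(lhs, rhs))          # True
```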
Suppose that after $m-1$ iterations we have obtained

$$f_{m-1}(x)=\alpha_{1}G_{1}(x)+\alpha_{2}G_{2}(x)+\cdots+\alpha_{m-1}G_{m-1}(x)$$
At the $m$-th iteration, $(\alpha_{m},G_{m})$ is obtained from

$$(\alpha_{m},G_{m})=\arg\min_{\alpha,G}\sum_{i=1}^{N}\exp\big(-y_{i}(f_{m-1}(x_{i})+\alpha G(x_{i}))\big)$$
Let $\bar{\omega}_{mi}=\exp(-y_{i}f_{m-1}(x_{i}))$. Since $\bar{\omega}_{mi}$ does not depend on $(\alpha,G)$ (it only changes from round to round), we have
$$(\alpha_{m},G_{m})=\arg\min_{\alpha,G}\sum_{i=1}^{N}\bar{\omega}_{mi}\exp(-\alpha y_{i}G(x_{i}))\qquad(*)$$
For any $\alpha>0$, we have
$$\begin{aligned} \sum_{i=1}^{N}\bar{\omega}_{mi}\exp(-\alpha y_{i}G(x_{i})) &=\sum_{i=1}^{N}\bar{\omega}_{mi}e^{\alpha}I(y_{i}\neq G(x_{i}))+\sum_{i=1}^{N}\bar{\omega}_{mi}e^{-\alpha}I(y_{i}=G(x_{i}))\\ &=\sum_{i=1}^{N}\bar{\omega}_{mi}e^{\alpha}I(y_{i}\neq G(x_{i}))+e^{-\alpha}\Big(\sum_{i=1}^{N}\bar{\omega}_{mi}-\sum_{i=1}^{N}\bar{\omega}_{mi}I(y_{i}\neq G(x_{i}))\Big)\\ &=e^{-\alpha}\sum_{i=1}^{N}\bar{\omega}_{mi}+(e^{\alpha}-e^{-\alpha})\sum_{i=1}^{N}\bar{\omega}_{mi}I(y_{i}\neq G(x_{i}))\qquad(**) \end{aligned}$$
Since $e^{\alpha}-e^{-\alpha}>0$, the $G(x)$ that minimizes $(*)$ is

$$G_{m}=\arg\min_{G}\sum_{i=1}^{N}\bar{\omega}_{mi}I(y_{i}\neq G(x_{i}))$$
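In practice, this minimizer $G_{m}$ is whatever base learner can handle sample weights; a brute-force decision stump is the classic choice. A minimal numpy sketch (exhaustive search over features, thresholds, and polarities; the function name is ours):

```python
import numpy as np

def fit_stump_weighted(X, y, w):
    """Pick the threshold stump minimizing the weighted 0-1 error
    sum_i w_i * I(y_i != G(x_i)); labels y are in {-1, +1}, weights w >= 0."""
    best, best_err = None, np.inf
    for j in range(X.shape[1]):                  # each feature
        for t in np.unique(X[:, j]):             # each candidate threshold
            for sign in (1, -1):                 # each polarity
                pred = sign * np.where(X[:, j] <= t, 1, -1)
                werr = np.sum(w * (pred != y))
                if werr < best_err:
                    best, best_err = (j, t, sign), werr
    return best, best_err
```

Any learner that accepts sample weights (a weighted decision tree, for example) can play the same role.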
Next we solve for $\alpha_{m}$: differentiate $(**)$ with respect to $\alpha$ and note that the derivative is positive precisely when

$$\alpha>\frac{1}{2}\log\frac{1-err_{m}}{err_{m}}$$
Here $err_{m}$ is the weighted error rate of $G_{m}$ on the training set,

$$err_{m}=\frac{\sum_{i=1}^{N}\bar{\omega}_{mi}I(y_{i}\neq G(x_{i}))}{\sum_{i=1}^{N}\bar{\omega}_{mi}}=\sum_{i=1}^{N}\omega_{mi}I(y_{i}\neq G(x_{i}))$$

where $\omega_{mi}$ denotes the normalized weight (defined below).
So $(**)$ decreases before this point and increases after it, attaining its minimum at

$$\alpha_{m}=\frac{1}{2}\log\frac{1-err_{m}}{err_{m}}$$

We assume each weak classifier's error rate $err_{m}$ is less than $1/2$, i.e., it does better than random guessing, so $\alpha_{m}>0$.
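As a numeric sanity check on this closed form, the sketch below uses made-up random weights $\bar{\omega}_{mi}$ and margins $y_{i}G(x_{i})$ and confirms that a grid search over $\alpha$ lands on $\frac{1}{2}\log\frac{1-err_{m}}{err_{m}}$:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 200
w = rng.random(N)                        # unnormalized weights w_bar_mi
margin = rng.choice([-1, 1], size=N)     # margins y_i * G(x_i)

err = np.sum(w * (margin == -1)) / np.sum(w)    # weighted error rate err_m
alpha_star = 0.5 * np.log((1 - err) / err)      # claimed minimizer

alphas = np.linspace(alpha_star - 1, alpha_star + 1, 2001)
loss = [np.sum(w * np.exp(-a * margin)) for a in alphas]   # objective (**)
print(np.isclose(alphas[np.argmin(loss)], alpha_star, atol=1e-3))  # True
```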
This yields the model update

$$f_{m}(x)=f_{m-1}(x)+\alpha_{m}G_{m}(x)$$
To briefly summarize the whole procedure:

Input: training set $\{(x_{i},y_{i})\}_{i=1,2,...,N}$, $x_{i}\in \chi \subseteq R^{n}$, $y_{i}=\pm1$;
loss function $L(y,f(x))=\exp(-yf(x))$.
1. Initialize $f_{0}(x)=0$;
2. Compute the weights $\bar{\omega}_{mi}=\exp(-y_{i}f_{m-1}(x_{i})),\ m\ge1,\ i=1,...,N$;
3. Find the base classifier $G_{m}=\arg\min_{G}\sum_{i=1}^{N}\bar{\omega}_{mi}I(y_{i}\neq G(x_{i}))$;
4. Compute the classification error rate $err_{m}=\frac{\sum_{i=1}^{N}\bar{\omega}_{mi}I(y_{i}\neq G(x_{i}))}{\sum_{i=1}^{N}\bar{\omega}_{mi}}$;
5. Compute the classifier coefficient $\alpha_{m}=\frac{1}{2}\log\frac{1-err_{m}}{err_{m}}$;
6. Obtain the AdaBoost model $f_{m}(x)=f_{m-1}(x)+\alpha_{m}G_{m}(x)$.
Now define

$$\omega_{m+1,i}=\frac{\omega_{mi}}{Z_{m}}\exp(-\alpha_{m}y_{i}G_{m}(x_{i})),\quad i=1,...,N$$

$$\omega_{1i}=\frac{1}{N},\quad i=1,...,N$$
where $Z_{m}$ is the normalization factor

$$Z_{m}=\sum_{i=1}^{N}\omega_{mi}\exp(-\alpha_{m}y_{i}G_{m}(x_{i}))$$
Then for any $m\ge1$,

$$\omega_{mi}=\frac{\bar{\omega}_{mi}}{\sum_{i=1}^{N}\bar{\omega}_{mi}}$$

i.e., $\omega_{mi}$ and $\bar{\omega}_{mi}$ differ only by a normalization factor. Indeed, absorbing the initial $\frac{1}{N}$ into $\bar{\omega}_{mi}$ (which leaves the ratio unchanged),
$$\begin{aligned} \frac{\bar{\omega}_{mi}}{\sum_{i=1}^{N}\bar{\omega}_{mi}} & =\frac{\frac{1}{N}\prod_{k=1}^{m-1}\exp(-y_{i}\alpha_{k}G_{k}(x_{i}))}{\frac{1}{N}\sum_{i=1}^{N}\prod_{k=1}^{m-1}\exp(-y_{i}\alpha_{k}G_{k}(x_{i}))} &&(1)\\ &=\frac{\frac{1}{N}\prod_{k=1}^{m-1}\exp(-y_{i}\alpha_{k}G_{k}(x_{i}))}{\sum_{i=1}^{N}\omega_{2i}Z_{1}\prod_{k=2}^{m-1}\exp(-y_{i}\alpha_{k}G_{k}(x_{i}))} &&(2)\\ &=\frac{\frac{1}{N}\prod_{k=1}^{m-1}\exp(-y_{i}\alpha_{k}G_{k}(x_{i}))}{\sum_{i=1}^{N}\omega_{3i}Z_{1}Z_{2}\prod_{k=3}^{m-1}\exp(-y_{i}\alpha_{k}G_{k}(x_{i}))} &&(3)\\ &=\cdots &&(4)\\ &=\frac{\frac{1}{N}\prod_{k=1}^{m-1}\exp(-y_{i}\alpha_{k}G_{k}(x_{i}))}{Z_{1}Z_{2}\cdots Z_{m-1}} &&(5)\\ &=\frac{\omega_{2i}\prod_{k=2}^{m-1}\exp(-y_{i}\alpha_{k}G_{k}(x_{i}))}{Z_{2}\cdots Z_{m-1}}\\ &=\frac{\omega_{3i}\prod_{k=3}^{m-1}\exp(-y_{i}\alpha_{k}G_{k}(x_{i}))}{Z_{3}\cdots Z_{m-1}}\\ &=\cdots\\ &=\frac{\omega_{m-1,i}\exp(-y_{i}\alpha_{m-1}G_{m-1}(x_{i}))}{Z_{m-1}}\\ &=\omega_{mi} \end{aligned}$$
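Before moving on, this telescoping identity can be checked numerically: with arbitrary made-up coefficients $\alpha_{k}$ and margins $y_{i}G_{k}(x_{i})$, the normalized exponential weights coincide with the recursively updated ones. A small sketch:

```python
import numpy as np

rng = np.random.default_rng(2)
N, M = 50, 6
alpha = rng.random(M)                        # made-up coefficients alpha_k
yG = rng.choice([-1, 1], size=(M, N))        # made-up margins y_i * G_k(x_i)

# Direct weights at round M: w_bar_Mi = exp(-y_i * f_{M-1}(x_i))
yf = np.cumsum(alpha[:, None] * yG, axis=0)  # row m-1 holds y_i * f_m(x_i)
w_bar = np.exp(-yf[M - 2])                   # f_{M-1} sums the first M-1 terms

# Recursive weights: w_1i = 1/N, w_{m+1,i} = w_mi * exp(-alpha_m y_i G_m(x_i)) / Z_m
w = np.full(N, 1.0 / N)
for k in range(M - 1):
    w = w * np.exp(-alpha[k] * yG[k])
    w = w / w.sum()                          # dividing by Z_k normalizes

print(np.allclose(w, w_bar / w_bar.sum()))   # True
```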
That is, $\bar{\omega}_{mi}$ and $\omega_{mi}$ differ only by a normalization factor, which has no effect on finding $G_{m}$, so the algorithm can be restated as:
Input: training set $\{(x_{i},y_{i})\}_{i=1,2,...,N}$, $x_{i}\in \chi \subseteq R^{n}$, $y_{i}=\pm1$;
loss function $L(y,f(x))=\exp(-yf(x))$.
1. Initialize $f_{0}(x)=0$;
2. Compute the weights: $\omega_{1i}=\frac{1}{N}$ and, for $m\ge2$, $\omega_{mi}=\frac{\omega_{m-1,i}\exp(-\alpha_{m-1}y_{i}G_{m-1}(x_{i}))}{Z_{m-1}},\ i=1,...,N$;
3. Find the base classifier $G_{m}=\arg\min_{G}\sum_{i=1}^{N}\omega_{mi}I(y_{i}\neq G(x_{i}))$;
4. Compute the classification error rate $err_{m}=\sum_{i=1}^{N}\omega_{mi}I(y_{i}\neq G(x_{i}))$;
5. Compute the classifier coefficient $\alpha_{m}=\frac{1}{2}\log\frac{1-err_{m}}{err_{m}}$;
6. Obtain the AdaBoost model $f_{m}(x)=f_{m-1}(x)+\alpha_{m}G_{m}(x)$.
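Putting the restated steps together, here is a compact numpy sketch of AdaBoost with brute-force decision stumps as base classifiers (function names and the `1e-12` guard against a zero error rate are our own choices):

```python
import numpy as np

def fit_stump(X, y, w):
    """Step 3: base classifier G_m minimizing the weighted error
    sum_i w_i * I(y_i != G(x_i)) by brute force over threshold stumps."""
    best, best_err = None, np.inf
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            for sign in (1, -1):
                pred = sign * np.where(X[:, j] <= t, 1, -1)
                werr = np.sum(w * (pred != y))
                if werr < best_err:
                    best, best_err = (j, t, sign), werr
    return best, best_err

def stump_predict(stump, X):
    j, t, sign = stump
    return sign * np.where(X[:, j] <= t, 1, -1)

def adaboost_fit(X, y, M=20):
    """Forward stagewise minimization of the exponential loss (steps 1-6)."""
    w = np.full(len(y), 1.0 / len(y))            # step 2 at m=1: w_1i = 1/N
    stumps, alphas = [], []
    for _ in range(M):
        stump, err = fit_stump(X, y, w)          # steps 3-4: G_m and err_m
        err = max(err, 1e-12)                    # guard against err_m = 0
        alpha = 0.5 * np.log((1 - err) / err)    # step 5: alpha_m
        margin = y * stump_predict(stump, X)
        w = w * np.exp(-alpha * margin)          # step 2: weight update...
        w = w / w.sum()                          # ...divided by Z_m
        stumps.append(stump)
        alphas.append(alpha)
    return stumps, alphas

def adaboost_predict(stumps, alphas, X):
    """Step 6: sign of f_M(x) = sum_m alpha_m * G_m(x)."""
    f = sum(a * stump_predict(s, X) for s, a in zip(stumps, alphas))
    return np.sign(f)
```

Calling `adaboost_fit(X, y)` and then `adaboost_predict(stumps, alphas, X)` reproduces steps 1–6; the two weight-update lines implement $\omega_{m+1,i}=\omega_{mi}\exp(-\alpha_{m}y_{i}G_{m}(x_{i}))/Z_{m}$.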
Error Analysis
AdaBoost's classification error satisfies

$$\frac{1}{N}\sum_{i=1}^{N}I(f(x_{i})\neq y_{i})\le \exp\Big(-2\sum_{m=1}^{M}\gamma_{m}^{2}\Big)$$

where $\gamma_{m}=\frac{1}{2}-err_{m}$, $f(x)=\sum_{m=1}^{M}\alpha_{m}G_{m}(x)$, and $f(x_{i})\neq y_{i}$ is shorthand for misclassification by the sign of $f$, i.e., $y_{i}f(x_{i})\le 0$.
Proof:

First we show that, for the model $f=f_{m}$ after $m$ rounds,

$$\frac{1}{N}\sum_{i=1}^{N}I(f(x_{i})\neq y_{i})\le \prod_{k=1}^{m}Z_{k}$$
When $f(x_{i})=y_{i}$ (so that $y_{i}f(x_{i})>0$),

$$I(f(x_{i})\neq y_{i})=0\le \exp(-y_{i}f(x_{i}))$$

and when $f(x_{i})\neq y_{i}$ (so that $y_{i}f(x_{i})\le0$),

$$I(f(x_{i})\neq y_{i})=1\le \exp(-y_{i}f(x_{i}))$$
Hence

$$\frac{1}{N}\sum_{i=1}^{N}I(f(x_{i})\neq y_{i})\le\frac{1}{N}\sum_{i=1}^{N}\exp(-y_{i}f(x_{i}))$$
Moreover, by equations $(1)$–$(5)$ above (applied with $f=f_{m}$),

$$\frac{1}{N}\sum_{i=1}^{N}\exp(-y_{i}f(x_{i}))=\prod_{k=1}^{m}Z_{k}$$
Next we show that

$$\prod_{k=1}^{m}Z_{k}\le \exp\Big(-2\sum_{k=1}^{m}\gamma_{k}^{2}\Big)$$
For any $k$ with $1\le k\le m$, we have
$$\begin{aligned} Z_{k} & =\sum_{i=1}^{N}\omega_{ki}\exp(-\alpha_{k}y_{i}G_{k}(x_{i}))\\ &=\sum_{G_{k}(x_{i})=y_{i}}\omega_{ki}e^{-\alpha_{k}}+\sum_{G_{k}(x_{i})\neq y_{i}}\omega_{ki}e^{\alpha_{k}}\\ &=(1-err_{k})e^{-\alpha_{k}}+err_{k}e^{\alpha_{k}}\\ &=2\big(err_{k}(1-err_{k})\big)^{\frac{1}{2}}\\ &=(1-4\gamma_{k}^{2})^{\frac{1}{2}} \end{aligned}$$

where the fourth line substitutes $\alpha_{k}=\frac{1}{2}\log\frac{1-err_{k}}{err_{k}}$ and the last line uses $\gamma_{k}=\frac{1}{2}-err_{k}$.
Therefore

$$\prod_{k=1}^{m}Z_{k}=\prod_{k=1}^{m}(1-4\gamma_{k}^{2})^{\frac{1}{2}}$$
For any $k$ with $1\le k\le m$, it thus suffices to show

$$(1-4\gamma_{k}^{2})^{\frac{1}{2}}\le \exp(-2\gamma_{k}^{2})\qquad(*)$$
from which it follows that

$$\prod_{k=1}^{m}Z_{k}=\prod_{k=1}^{m}(1-4\gamma_{k}^{2})^{\frac{1}{2}}\le \exp\Big(-2\sum_{k=1}^{m}\gamma_{k}^{2}\Big)$$
Taking logarithms on both sides, $(*)$ is equivalent to

$$\frac{1}{2}\ln(1-4\gamma_{k}^{2})\le -2\gamma_{k}^{2}$$
Expanding $\ln(1-4\gamma_{k}^{2})$ in a Taylor series,

$$\ln(1-4\gamma_{k}^{2})=-4\gamma_{k}^{2}-\frac{(4\gamma_{k}^{2})^{2}}{2}-\frac{(4\gamma_{k}^{2})^{3}}{3}-\cdots-\frac{(4\gamma_{k}^{2})^{n}}{n}-\cdots$$

Every term on the right is nonpositive, so $\ln(1-4\gamma_{k}^{2})\le-4\gamma_{k}^{2}$, and hence
$$\frac{1}{2}\ln(1-4\gamma_{k}^{2})\le -2\gamma_{k}^{2}$$
This completes the proof.
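As a side check, the key inequality $(1-4\gamma_{k}^{2})^{1/2}\le \exp(-2\gamma_{k}^{2})$ is easy to confirm numerically over the whole range $\gamma_{k}\in(-\frac{1}{2},\frac{1}{2})$:

```python
import numpy as np

gamma = np.linspace(-0.499, 0.499, 1001)   # gamma_k ranges over (-1/2, 1/2)
lhs = np.sqrt(1 - 4 * gamma**2)            # (1 - 4*gamma^2)^(1/2)
rhs = np.exp(-2 * gamma**2)                # exp(-2*gamma^2)
print(np.all(lhs <= rhs))                  # True
```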
Note: if there exists $\gamma>0$ such that $\gamma_{k}\ge\gamma$ for all $k$, i.e., every base classifier's error rate is at most $\frac{1}{2}-\gamma<\frac{1}{2}$, then

$$\frac{1}{N}\sum_{i=1}^{N}I(f(x_{i})\neq y_{i})\le \exp(-2m\gamma^{2})$$

That is, AdaBoost's training error decreases exponentially in the number of iterations.
Regularization
To guard against overfitting, a regularization (shrinkage) factor $v$ is introduced into the weak-classifier update, i.e.,

$$f_{m}(x)=f_{m-1}(x)+v\,\alpha_{m}G_{m}(x)$$
Here $v$ is called the step size (learning rate). The smaller $v$ is, the more iterations are required; in practice the step size and the maximum number of iterations are tuned together to control the fit, as in the sketch below.
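A sketch of the shrunken update, written as a generic boosting loop; `base_fit(X, y, w)` and `base_predict(g, X)` are assumed to be supplied by the caller (for instance, the stump helpers sketched earlier), and the interface is illustrative rather than a fixed API.

```python
import numpy as np

def boost_with_shrinkage(X, y, base_fit, base_predict, M=200, v=0.1):
    """Boosting loop with learning rate v in (0, 1]:
    f_m(x) = f_{m-1}(x) + v * alpha_m * G_m(x)."""
    w = np.full(len(y), 1.0 / len(y))
    model = []
    for _ in range(M):
        g = base_fit(X, y, w)                    # weighted base learner G_m
        pred = base_predict(g, X)
        err = max(np.sum(w * (pred != y)), 1e-12)
        alpha = 0.5 * np.log((1 - err) / err)
        w = w * np.exp(-v * alpha * y * pred)    # shrunken weight update
        w = w / w.sum()
        model.append((v * alpha, g))             # store the shrunken coefficient
    return model
```

Note that the shrinkage also enters the weight update, since $\bar{\omega}_{mi}=\exp(-y_{i}f_{m-1}(x_{i}))$ now accumulates the $v\alpha_{k}$ contributions.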
Pros and Cons
Advantages:
1) High classification accuracy;
2) Not prone to overfitting;
3) Within the AdaBoost framework, many kinds of classification and regression learners can serve as weak learners; decision trees and neural networks are the usual choices.
Disadvantages:
1) Sensitive to noise: anomalous samples may be assigned very large weights.