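These notes are from a 深度之眼 course; some screenshots are taken from the course videos and from the second edition of Li Hang's 统计学习方法 (Statistical Learning Methods).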
For typing formulas, see: Online LaTeX formula editor
Proof of the training error bound of AdaBoost
On p. 160 of the book, the training error of the final AdaBoost classifier is bounded by:
$$\cfrac{1}{N}\sum_{i=1}^NI(G(x_i)\ne y_i)\le \cfrac{1}{N}\sum_{i=1}^N\exp(-y_if(x_i))=\prod_{m=1}^MZ_m\tag1$$
where
$$G(x)=\mathrm{sign}(f(x))=\mathrm{sign}\left(\sum_{m=1}^M\alpha_mG_m(x)\right)$$
For a correctly classified point, $y_i$ and $f(x_i)$ have the same sign, and since the exponential is always positive:
$$I(G(x_i)\ne y_i)=0<\exp(-y_if(x_i))$$
For a misclassified point, $y_if(x_i)\le 0$, so $-y_if(x_i)\ge 0$ and the exponential is at least $e^0=1$:
$$I(G(x_i)\ne y_i)=1\le \exp(-y_if(x_i))$$
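So $I(G(x_i)\ne y_i)\le\exp(-y_if(x_i))$ holds for every sample, and averaging over the $N$ samples gives the inequality part of Eq. (1) directly.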
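To make the pointwise inequality concrete, here is a small Python sketch that evaluates both losses on a grid of values of $f(x)$. The helper names `zero_one_loss` and `exp_loss` are hypothetical, and counting $f(x)=0$ as an error is an assumption that matches the condition $y_if(x_i)\le 0$ used above.

```python
import math

def zero_one_loss(y, f_value):
    """I(G(x) != y) with G(x) = sign(f(x)); f(x) = 0 is counted as an error."""
    return 1 if y * f_value <= 0 else 0

def exp_loss(y, f_value):
    """Exponential loss exp(-y * f(x))."""
    return math.exp(-y * f_value)

# Check I(G(x_i) != y_i) <= exp(-y_i f(x_i)) for both labels on a grid of f values.
for y in (-1, 1):
    for k in range(-40, 41):
        f_value = k / 10
        assert zero_one_loss(y, f_value) <= exp_loss(y, f_value)
```

The exponential loss is a smooth upper bound on the 0-1 loss, which is what allows the training error to be controlled through $\exp(-y_if(x_i))$.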
$Z_m$ is the normalization factor that appears in the weight-update rule:
$$w_{m+1,i}=\cfrac{w_{mi}}{Z_m}\exp(-\alpha_my_iG_m(x_i))$$
$Z_m$ is defined as:
$$Z_m=\sum_{i=1}^Nw_{mi}\exp(-\alpha_my_iG_m(x_i))$$
Multiplying both sides of the weight update by $Z_m$:
$$Z_mw_{m+1,i}=w_{mi}\exp(-\alpha_my_iG_m(x_i))\tag2$$
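A single update step can be sketched in Python as follows, assuming weights are stored in a plain list and predictions $G_m(x_i)$ are in $\{-1,+1\}$. The function name `update_weights` and the four-sample toy data (where the weak classifier misclassifies the third sample) are hypothetical; the loop checks Eq. (2) for every sample.

```python
import math

def update_weights(w, y, g, alpha):
    """One AdaBoost weight update (hypothetical helper).

    w: current weights w_{m,i} summing to 1; y: labels in {-1, +1};
    g: predictions G_m(x_i) in {-1, +1}; alpha: coefficient alpha_m.
    Returns (w_next, z) with w_next[i] = w[i] * exp(-alpha * y[i] * g[i]) / z.
    """
    unnormalized = [wi * math.exp(-alpha * yi * gi) for wi, yi, gi in zip(w, y, g)]
    z = sum(unnormalized)  # normalization factor Z_m
    return [u / z for u in unnormalized], z

# Hypothetical toy data: the weak classifier gets the third sample wrong.
w = [0.25, 0.25, 0.25, 0.25]
y = [1, 1, -1, -1]
g = [1, 1, 1, -1]
e = sum(wi for wi, yi, gi in zip(w, y, g) if yi != gi)  # weighted error e_m
alpha = 0.5 * math.log((1 - e) / e)
w_next, z = update_weights(w, y, g, alpha)

# Eq. (2): Z_m * w_{m+1,i} == w_{m,i} * exp(-alpha_m * y_i * G_m(x_i))
for wi, wn, yi, gi in zip(w, w_next, y, g):
    assert math.isclose(z * wn, wi * math.exp(-alpha * yi * gi))
```

With $e_m=1/4$ the misclassified sample's weight rises to $1/2$ and the other three drop to $1/6$, so the new weights still sum to 1.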
With this relation, it remains to prove the equality part of Eq. (1):
$$\cfrac{1}{N}\sum_{i=1}^N\exp(-y_if(x_i))=\prod_{m=1}^MZ_m\tag3$$
Substitute $f(x_i)=\sum_{m=1}^M\alpha_mG_m(x_i)$ into the left-hand side of Eq. (3):
$$\cfrac{1}{N}\sum_{i=1}^N\exp\left[-y_i\sum_{m=1}^M\alpha_mG_m(x_i)\right]=\sum_{i=1}^N\cfrac{1}{N}\exp\left[-\sum_{m=1}^M\alpha_my_iG_m(x_i)\right]$$
Here $\cfrac{1}{N}$ is exactly the initial weight assigned to each of the $N$ samples, so it can be written as $w_{1,i}$. The exponential of a sum is the product of the exponentials, so the expression becomes:
$$\sum_{i=1}^Nw_{1,i}\prod_{m=1}^M\exp[-\alpha_my_iG_m(x_i)]$$
Writing out the $m=1$ factor separately:
$$\sum_{i=1}^Nw_{1,i}\exp[-\alpha_1y_iG_1(x_i)]\prod_{m=2}^M\exp[-\alpha_my_iG_m(x_i)]\tag4$$
For the $m=1$ factor, apply Eq. (2) with $m=1$:
$$Z_1w_{2,i}=w_{1,i}\exp(-\alpha_1y_iG_1(x_i))\tag5$$
Replacing $w_{1,i}\exp[-\alpha_1y_iG_1(x_i)]$ in Eq. (4) with $Z_1w_{2,i}$ from Eq. (5), and pulling the constant $Z_1$ out of the sum:
$$Z_1\sum_{i=1}^Nw_{2,i}\prod_{m=2}^M\exp[-\alpha_my_iG_m(x_i)]\tag6$$
Applying the same step to Eq. (6) with $m=2$, then $m=3$, and so on up to $m=M$:
$$Z_1Z_2\sum_{i=1}^Nw_{3,i}\prod_{m=3}^M\exp[-\alpha_my_iG_m(x_i)]=\cdots=Z_1Z_2\cdots Z_M\sum_{i=1}^Nw_{M+1,i}$$
Since $w_{M+1,i}$ is a probability distribution (it was normalized by $Z_M$), $\sum_{i=1}^Nw_{M+1,i}=1$, so
$$\cfrac{1}{N}\sum_{i=1}^N\exp(-y_if(x_i))=Z_1Z_2\cdots Z_M=\prod_{m=1}^MZ_m$$
This completes the proof.
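As a numerical sanity check of Eq. (1), the following minimal Python sketch runs three rounds of AdaBoost with one-dimensional decision stumps and compares the training error, the average exponential loss, and $\prod_mZ_m$. The stump learner, the toy dataset, and clipping $e_m$ away from 0 and 1 are assumptions made for illustration, not part of the original notes.

```python
import math

def stump_predict(x, threshold, polarity):
    """Hypothetical 1-D decision stump: polarity if x < threshold, else -polarity."""
    return polarity if x < threshold else -polarity

def best_stump(xs, ys, w):
    """Return (weighted_error, threshold, polarity) of the stump with the lowest weighted error."""
    values = sorted(set(xs))
    thresholds = [values[0] - 1] + [(a + b) / 2 for a, b in zip(values, values[1:])]
    best = None
    for t in thresholds:
        for p in (1, -1):
            err = sum(wi for xi, yi, wi in zip(xs, ys, w) if stump_predict(xi, t, p) != yi)
            if best is None or err < best[0]:
                best = (err, t, p)
    return best

def adaboost(xs, ys, rounds):
    """Run AdaBoost; return the weak learners (alpha_m, threshold, polarity) and the Z_m values."""
    n = len(xs)
    w = [1.0 / n] * n                      # w_{1,i} = 1/N
    learners, zs = [], []
    for _ in range(rounds):
        e, t, p = best_stump(xs, ys, w)
        e = min(max(e, 1e-12), 1 - 1e-12)  # keep log((1-e)/e) finite
        alpha = 0.5 * math.log((1 - e) / e)
        unnorm = [wi * math.exp(-alpha * yi * stump_predict(xi, t, p))
                  for xi, yi, wi in zip(xs, ys, w)]
        z = sum(unnorm)                    # normalization factor Z_m
        w = [u / z for u in unnorm]        # w_{m+1,i}
        learners.append((alpha, t, p))
        zs.append(z)
    return learners, zs

def f(x, learners):
    """f(x) = sum_m alpha_m G_m(x)."""
    return sum(a * stump_predict(x, t, p) for a, t, p in learners)

xs = list(range(10))
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]    # hypothetical toy labels
learners, zs = adaboost(xs, ys, rounds=3)

n = len(xs)
train_err = sum(1 for x, y in zip(xs, ys) if y * f(x, learners) <= 0) / n
exp_avg = sum(math.exp(-y * f(x, learners)) for x, y in zip(xs, ys)) / n
prod_z = math.prod(zs)
print(f"training error = {train_err}, mean exp loss = {exp_avg:.6f}, prod Z_m = {prod_z:.6f}")
```

The equality between the mean exponential loss and $\prod_mZ_m$ holds to floating-point precision for any choice of $\alpha_m$, because it follows algebraically from the update rule; only the inequality uses $G=\mathrm{sign}(f)$.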
Training error bound for binary AdaBoost
First, derive the quantity to be bounded. Recall the definition of $Z_m$:
$$Z_m=\sum_{i=1}^Nw_{mi}\exp(-\alpha_my_iG_m(x_i))$$
Split the sum into misclassified samples ($y_iG_m(x_i)=-1$) and correctly classified samples ($y_iG_m(x_i)=+1$):
$$Z_m=\sum_{G_m(x_i)\ne y_i}w_{mi}\exp(\alpha_m)+\sum_{G_m(x_i)= y_i}w_{mi}\exp(-\alpha_m)$$
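The weighted error rate is $e_m=\sum_{G_m(x_i)\ne y_i}w_{mi}$, and since the weights $w_{mi}$ sum to 1, the correctly classified samples carry total weight $1-e_m$. The expression therefore becomes: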
$$Z_m=(1-e_m)e^{-\alpha_m}+e_me^{\alpha_m}$$
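The split can be checked numerically with a short Python sketch. The weights, labels, predictions (about 70% correct), and $\alpha_m=0.4$ below are arbitrary random or hypothetical values; the point is only that the direct sum and the two-term form agree for any of them.

```python
import math
import random

random.seed(0)
n = 8
raw = [random.random() for _ in range(n)]
total = sum(raw)
w = [r / total for r in raw]                            # weights w_{m,i}, summing to 1
y = [random.choice((-1, 1)) for _ in range(n)]          # labels y_i
g = [yi if random.random() < 0.7 else -yi for yi in y]  # predictions G_m(x_i)
alpha = 0.4                                             # arbitrary coefficient alpha_m

z_direct = sum(wi * math.exp(-alpha * yi * gi) for wi, yi, gi in zip(w, y, g))
e = sum(wi for wi, yi, gi in zip(w, y, g) if gi != yi)  # weighted error e_m
z_split = (1 - e) * math.exp(-alpha) + e * math.exp(alpha)
assert math.isclose(z_direct, z_split)
```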
As covered in the introductory section, the coefficient of $G_m$ is ($\log$ is the natural logarithm):
$$\alpha_m = \cfrac{1}{2}\log\cfrac{1-e_m}{e_m}$$
Substituting this into $Z_m$:
$$Z_m=(1-e_m)e^{-\frac{1}{2}\log\frac{1-e_m}{e_m}}+e_me^{\frac{1}{2}\log\frac{1-e_m}{e_m}}$$
Then, using $a\log b=\log b^a$ and $e^{\log a}=a$:
$$\begin{aligned}
Z_m&=(1-e_m)\left(\cfrac{1-e_m}{e_m}\right)^{-\frac{1}{2}}+e_m\left(\cfrac{1-e_m}{e_m}\right)^{\frac{1}{2}}\\
&=(1-e_m)\sqrt{\cfrac{e_m}{1-e_m}}+e_m\sqrt{\cfrac{1-e_m}{e_m}}\\
&=\sqrt{e_m(1-e_m)}+\sqrt{e_m(1-e_m)}\\
&=2\sqrt{e_m(1-e_m)}
\end{aligned}$$
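The simplification can be spot-checked in Python for a few error rates in $(0,1)$. The helper names `z_of` and `alpha_of` are hypothetical; they simply encode $Z_m=(1-e_m)e^{-\alpha_m}+e_me^{\alpha_m}$ and $\alpha_m=\frac{1}{2}\log\frac{1-e_m}{e_m}$.

```python
import math

def z_of(e, alpha):
    """Z_m = (1 - e_m) e^{-alpha_m} + e_m e^{alpha_m}."""
    return (1 - e) * math.exp(-alpha) + e * math.exp(alpha)

def alpha_of(e):
    """alpha_m = 1/2 * ln((1 - e_m) / e_m), natural logarithm."""
    return 0.5 * math.log((1 - e) / e)

# With alpha_m chosen this way, Z_m collapses to 2 sqrt(e_m (1 - e_m)).
for e in (0.05, 0.1, 0.25, 0.4, 0.49):
    assert math.isclose(z_of(e, alpha_of(e)), 2 * math.sqrt(e * (1 - e)))
```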
Combining this with the conclusion of the previous section:
$$\prod_{m=1}^MZ_m=\prod_{m=1}^M2\sqrt{e_m(1-e_m)}$$
Now let
$$\gamma_m=\cfrac{1}{2}-e_m$$
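Then $e_m(1-e_m)=\left(\cfrac{1}{2}-\gamma_m\right)\left(\cfrac{1}{2}+\gamma_m\right)=\cfrac{1}{4}-\gamma_m^2$, so $2\sqrt{e_m(1-e_m)}=\sqrt{1-4\gamma_m^2}$ and the product becomes:
$$\prod_{m=1}^M\sqrt{1-4\gamma_m^2}$$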
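A quick Python check of this substitution over the whole range $e_m\in[0,1]$ shows that the factor 2 is absorbed into the square root. The helper name `factor_forms` is hypothetical.

```python
import math

def factor_forms(e):
    """Return (2 sqrt(e_m (1 - e_m)), sqrt(1 - 4 gamma_m^2)) with gamma_m = 1/2 - e_m."""
    gamma = 0.5 - e
    return 2 * math.sqrt(e * (1 - e)), math.sqrt(1 - 4 * gamma ** 2)

for e in (0.0, 0.1, 0.3, 0.5, 0.8, 1.0):
    lhs, rhs = factor_forms(e)
    assert math.isclose(lhs, rhs, abs_tol=1e-12)
```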
It remains to show that this product has the upper bound:
$$\prod_{m=1}^M\sqrt{1-4\gamma_m^2}\le\exp\left(-2\sum_{m=1}^M\gamma_m^2\right)$$
This uses a Taylor series expansion. For each factor on the left-hand side, consider
$$f(x)=\sqrt{1-x}=(1-x)^{\frac{1}{2}}$$
Its expansion around $x=0$ is:
$$f(x)=f(0)+xf'(0)+\cfrac{1}{2}x^2f''(0)+\cdots$$
Keeping terms up to second order, with $f(0)=1$, $f'(0)=-\cfrac{1}{2}$ and $f''(0)=-\cfrac{1}{4}$:
$$f(x)\approx 1-\cfrac{1}{2}x-\cfrac{1}{8}x^2$$
Correspondingly, with $x=4\gamma^2$:
$$f(4\gamma^2)=\sqrt{1-4\gamma^2}\approx 1-2\gamma^2-2\gamma^4$$
Similarly, for each factor on the right-hand side, using $e^y\approx 1+y+\cfrac{1}{2}y^2$ with $y=-2\gamma^2$:
$$\exp(-2\gamma^2)\approx 1-2\gamma^2+2\gamma^4$$
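The two second-order approximations can be compared with the exact values in Python for a few small $\gamma$. The error threshold $10\gamma^6$ is a loose bound chosen for this check; it reflects that the first neglected terms are of order $\gamma^6$.

```python
import math

def lhs_exact(g):
    return math.sqrt(1 - 4 * g * g)   # sqrt(1 - 4 gamma^2)

def lhs_approx(g):
    return 1 - 2 * g**2 - 2 * g**4    # second-order Taylor approximation

def rhs_exact(g):
    return math.exp(-2 * g * g)       # exp(-2 gamma^2)

def rhs_approx(g):
    return 1 - 2 * g**2 + 2 * g**4    # second-order Taylor approximation

for g in (0.05, 0.1, 0.2):
    # Both approximation errors are of order gamma^6 for small gamma.
    assert abs(lhs_exact(g) - lhs_approx(g)) < 10 * g**6
    assert abs(rhs_exact(g) - rhs_approx(g)) < 10 * g**6
    # The two approximations differ by exactly 4 gamma^4 >= 0.
    assert math.isclose(rhs_approx(g) - lhs_approx(g), 4 * g**4)
```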
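The two approximations differ by $4\gamma^4\ge 0$, so the right-hand side is at least the left-hand side, with equality when $\gamma=0$. The comparison also holds exactly, not only to second order: since $1-x\le e^{-x}$ for all $x$, taking $x=4\gamma_m^2$ (where $0\le 1-4\gamma_m^2$ because $0\le e_m\le 1$) gives $\sqrt{1-4\gamma_m^2}\le\sqrt{e^{-4\gamma_m^2}}=e^{-2\gamma_m^2}$ for each $m$, and multiplying these inequalities between nonnegative factors over $m=1,\dots,M$ yields $\exp\left(-2\sum_{m=1}^M\gamma_m^2\right)$.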
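Finally, the bound can be checked numerically in Python: first the per-factor inequality on a grid of $\gamma\in[-1/2,1/2]$, then the product form for a random, purely hypothetical sequence of 20 weighted error rates $e_m\in[0.05,0.45]$. The helper name `factor` is an assumption for illustration.

```python
import math
import random

def factor(gamma):
    """Per-round factor sqrt(1 - 4 gamma^2), valid for |gamma| <= 1/2."""
    return math.sqrt(1 - 4 * gamma * gamma)

# Per-factor bound sqrt(1 - 4 gamma^2) <= exp(-2 gamma^2) on a grid of gamma.
for k in range(-500, 501):
    gamma = k / 1000
    assert factor(gamma) <= math.exp(-2 * gamma * gamma) + 1e-15

# Product form for a hypothetical sequence of weighted error rates e_m.
random.seed(1)
errors = [random.uniform(0.05, 0.45) for _ in range(20)]
gammas = [0.5 - e for e in errors]
product = math.prod(factor(g) for g in gammas)
bound = math.exp(-2 * sum(g * g for g in gammas))
assert product <= bound
print(f"prod = {product:.3e}, bound = {bound:.3e}")
```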