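Reference: https://www.cnblogs.com/pinard/p/6133937.html
The reference is fairly detailed, but parts of its derivation are hard to follow. How the weight of a new weak learner is obtained, for example, puzzled me. After consulting the Watermelon Book (西瓜书) I have added my own explanation of that part in this post. If you spot mistakes or know a better way to explain something, please point it out.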
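Ensemble learning
Ensemble learning completes a learning task by building several learners and combining them.
Two problems need to be solved:
- how to build the individual learners
- which strategy to use for combining them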
boosting
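The basic idea of boosting is to train several weak learners iteratively, where the training of the current weak learner depends on the result of the previous one. Samples misclassified in the previous round get a higher weight, so the current round pays more attention to them.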
AdaBoost
AdaBoost is one of the best-known boosting algorithms.
Algorithm
Assume the goal is a binary classification problem.
Let the training set be:
$$\mathbf T = \{(x_1,y_1), (x_2,y_2), \dots, (x_m,y_m)\}, \quad y_i \in \{-1, 1\}$$
The sample weights of the training set used by the $k$-th weak learner are:
$$\mathbf w_k = (w_{k1}, w_{k2}, \dots, w_{km}), \quad w_{1i} = \frac{1}{m}, \quad i = 1, 2, \dots, m$$
That is, the first round starts from uniform weights.
AdaBoost has to compute the following quantities:
- The weighted error rate $e_k$ of the $k$-th weak classifier $G_k(x)$ on the training set:
$$e_k = \sum_{i=1}^{m} w_{ki}\, \mathbb I(G_k(x_i) \neq y_i) = \sum_{i=1}^{m} w_{ki}\, e_{ki} \tag{1}$$
The relative error $e_{ki}$ of a single sample is:
$$e_{ki} = \frac{|y_i - G_k(x_i)|}{E_k} \tag{2}$$
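For the $k$-th weak learner, $E_k$ is its largest error on the training set:
$$E_k = \max_i |y_i - G_k(x_i)| \tag{3}$$
With labels in $\{-1, 1\}$, $|y_i - G_k(x_i)|$ is either 0 or 2. So as long as at least one sample is misclassified, $E_k = 2$ and the relative error in (2) equals the indicator $\mathbb I(G_k(x_i) \neq y_i)$ used in (1).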
There is also an exponential form of the relative error:
$$e_{ki} = 1 - \exp\left(\frac{-|y_i - G_k(x_i)|}{E_k}\right)$$
- The weight coefficient $a_k$ of the $k$-th weak classifier $G_k(x)$:
$$a_k = \frac{1}{2}\log\frac{1-e_k}{e_k} \tag{4}$$
- The sample weights for the $(k+1)$-th weak classifier:
$$w_{k+1,i} = \frac{w_{ki}}{Z_k}\exp(-a_k y_i G_k(x_i)), \quad Z_k = \sum_{i=1}^{m} w_{ki}\exp(-a_k y_i G_k(x_i)) \tag{5}$$
- The final strong learner:
$$f(x) = \mathrm{sign}\left(\sum_{k=1}^{K} a_k G_k(x)\right) \tag{6}$$
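To make steps (1), (4), (5) and (6) concrete, here is a minimal sketch in plain Python. It assumes a threshold "stump" on one-dimensional inputs as the weak learner. The function names (`stump_predict`, `fit_stump`, `adaboost_fit`, `adaboost_predict`), the clipping of $e_k$ and the toy data are made up for illustration and do not come from the referenced article.

```python
import math

def stump_predict(x, threshold, polarity):
    """Hypothetical weak learner: +polarity when x > threshold, else -polarity."""
    return polarity if x > threshold else -polarity

def fit_stump(xs, ys, w):
    """Return (weighted error, threshold, polarity) with the smallest error, eq. (1)."""
    best = None
    pts = sorted(set(xs))
    # Candidate thresholds: one below the minimum, then midpoints between neighbours.
    thresholds = [pts[0] - 1.0] + [(a + b) / 2.0 for a, b in zip(pts, pts[1:])]
    for t in thresholds:
        for polarity in (1, -1):
            err = sum(wi for xi, yi, wi in zip(xs, ys, w)
                      if stump_predict(xi, t, polarity) != yi)
            if best is None or err < best[0]:
                best = (err, t, polarity)
    return best

def adaboost_fit(xs, ys, n_rounds):
    m = len(xs)
    w = [1.0 / m] * m                       # w_{1i} = 1/m
    ensemble = []                           # list of (a_k, threshold, polarity)
    for _ in range(n_rounds):
        e_k, t, polarity = fit_stump(xs, ys, w)
        # Assumption: clip e_k so the logarithm in eq. (4) stays finite.
        e_k = min(max(e_k, 1e-12), 1.0 - 1e-12)
        a_k = 0.5 * math.log((1.0 - e_k) / e_k)          # eq. (4)
        # Eq. (5): reweight every sample, then divide by Z_k.
        unnorm = [wi * math.exp(-a_k * yi * stump_predict(xi, t, polarity))
                  for xi, yi, wi in zip(xs, ys, w)]
        z_k = sum(unnorm)
        w = [u / z_k for u in unnorm]
        ensemble.append((a_k, t, polarity))
    return ensemble

def adaboost_predict(ensemble, x):
    # Eq. (6): sign of the weighted vote of all weak classifiers.
    score = sum(a * stump_predict(x, t, p) for a, t, p in ensemble)
    return 1 if score >= 0 else -1

# Toy data, chosen only for illustration.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]
model = adaboost_fit(xs, ys, n_rounds=3)
predictions = [adaboost_predict(model, x) for x in xs]
print(predictions == ys)
```

No single stump separates this toy data, but with the candidate thresholds above three rounds should be enough for `predictions` to match `ys`.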
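Now comes the part that is hardest to understand:
Derivation 1
AdaBoost uses the exponential loss, and each round picks the coefficient and the weak learner that minimize it:
$$(a_k, G_k) = \underset{a_k, G_k}{\arg\min} \sum_{i=1}^{m} \exp(-y_i f_k(x_i))$$
where $f_k(x) = \sum_{j=1}^{k} a_j G_j(x)$ is the weighted sum of the first $k$ weak learners.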
Since the first $k-1$ weak learners have already been trained, $f_k(x) = f_{k-1}(x) + a_k G_k(x)$ and the expression above becomes:
$$(a_k, G_k) = \underset{a_k, G_k}{\arg\min} \sum_{i=1}^{m} \exp\big(-y_i (f_{k-1}(x_i) + a_k G_k(x_i))\big) \tag{7}$$
Now let $w_{ki}' = \exp(-y_i f_{k-1}(x_i))$, where $w_{ki}'$ is the sample weight before normalization. It does not depend on $a_k$ or $G_k$, so (7) can be written as:
$$(a_k, G_k) = \underset{a_k, G_k}{\arg\min} \sum_{i=1}^{m} w_{ki}' \exp(-y_i a_k G_k(x_i)) \tag{8*}$$
Splitting the sum into correctly and incorrectly classified samples:
$$\begin{aligned}
\sum_{i=1}^{m} w_{ki}' \exp(-y_i a_k G_k(x_i)) &= \sum_{G_k(x_i) = y_i} w_{ki}' e^{-a_k} + \sum_{G_k(x_i) \neq y_i} w_{ki}' e^{a_k} \\
&= e^{-a_k} \sum_{i=1}^m w_{ki}' \mathbb I(y_i = G_k(x_i)) + e^{a_k} \sum_{i=1}^m w_{ki}' \mathbb I(y_i \neq G_k(x_i)) \\
&= e^{-a_k} \sum_{i=1}^m w_{ki}' + (e^{a_k} - e^{-a_k}) \sum_{i=1}^m w_{ki}' \mathbb I(y_i \neq G_k(x_i))
\end{aligned} \tag{9*}$$
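The decomposition in (9*) is easy to check numerically. The short sketch below evaluates both sides for one arbitrary choice of un-normalized weights, labels, predictions and coefficient. All of these numbers are hypothetical and serve only as a sanity check.

```python
import math

# Hypothetical un-normalized weights w'_{ki}, labels y_i and predictions G_k(x_i).
w_prime = [0.5, 1.2, 0.8, 2.0, 0.3]
y = [1, -1, 1, 1, -1]
g = [1, -1, -1, 1, 1]
a_k = 0.7  # arbitrary coefficient; the identity holds for any real a_k

# Left-hand side of (9*): the weighted exponential loss.
lhs = sum(w * math.exp(-yi * a_k * gi) for w, yi, gi in zip(w_prime, y, g))

# Right-hand side of (9*): total weight plus a correction for misclassified samples.
total = sum(w_prime)
wrong = sum(w for w, yi, gi in zip(w_prime, y, g) if yi != gi)
rhs = math.exp(-a_k) * total + (math.exp(a_k) - math.exp(-a_k)) * wrong

print(abs(lhs - rhs) < 1e-12)
```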
By (9*), once $a_k > 0$ is fixed, the factor $e^{a_k} - e^{-a_k}$ is positive, so finding $G_k$ amounts to finding the best weak learner under the current sample weights:
$$G_k(x) = \underset{G_k}{\arg\min} \sum_{i=1}^{m} w_{ki}' \mathbb I(G_k(x_i) \neq y_i) \tag{10*}$$
Take the partial derivative of (9*) with respect to $a_k$:
$$\frac{\partial}{\partial a_k}\left( e^{-a_k} \sum_{i=1}^m w_{ki}' + (e^{a_k} - e^{-a_k}) \sum_{i=1}^m w_{ki}' \mathbb I(y_i \neq G_k(x_i)) \right) = -e^{-a_k} \sum_{i=1}^m w_{ki}' + (e^{a_k} + e^{-a_k}) \sum_{i=1}^m w_{ki}' \mathbb I(y_i \neq G_k(x_i)) \tag{11*}$$
Now treat $w_{ki}'$ as normalized weights (equivalently, divide (11*) by $\sum_i w_{ki}'$, which is positive), so that $\sum_i w_{ki}' = 1$ and $\sum_i w_{ki}' \mathbb I(y_i \neq G_k(x_i))$ is the weighted error rate $e_k$ (see (1)). Setting the derivative (11*) to zero and taking the logarithm of both sides gives $a_k$ as in (4):
$$\begin{aligned}
-e^{-a_k} \sum_{i=1}^m w_{ki}' + (e^{a_k} + e^{-a_k}) \sum_{i=1}^m w_{ki}' \mathbb I(y_i \neq G_k(x_i)) &= 0 \\
(e^{a_k} + e^{-a_k})\, e_k &= e^{-a_k} \\
e^{a_k} e_k &= e^{-a_k} (1 - e_k) \\
a_k + \ln e_k &= -a_k + \ln(1 - e_k) \\
a_k &= \frac{1}{2} \ln \frac{1 - e_k}{e_k}
\end{aligned}$$
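As a sanity check on this step, the sketch below plugs the closed-form $a_k$ from (4) into the derivative (11*) while keeping the weights un-normalized, and confirms that the derivative vanishes there. The weights, labels and predictions are hypothetical values chosen for illustration, and `loss` and `grad` are just helper names for (9*) and (11*).

```python
import math

# Hypothetical un-normalized weights w'_{ki}, labels y_i and predictions G_k(x_i).
w_prime = [0.5, 1.2, 0.8, 2.0, 0.3]
y = [1, -1, 1, 1, -1]
g = [1, -1, -1, 1, 1]

total = sum(w_prime)
wrong = sum(w for w, yi, gi in zip(w_prime, y, g) if yi != gi)

# Weighted error rate computed with normalized weights, eq. (1).
e_k = wrong / total
# Closed-form coefficient, eq. (4).
a_k = 0.5 * math.log((1.0 - e_k) / e_k)

def loss(a):
    """Right-hand side of (9*) as a function of the coefficient a."""
    return math.exp(-a) * total + (math.exp(a) - math.exp(-a)) * wrong

def grad(a):
    """Derivative (11*) of the loss with respect to a."""
    return -math.exp(-a) * total + (math.exp(a) + math.exp(-a)) * wrong

print(abs(grad(a_k)) < 1e-9)
print(loss(a_k) <= loss(a_k - 0.1) and loss(a_k) <= loss(a_k + 0.1))
```

Normalizing the weights only rescales the loss by a positive constant, which is why the minimizer is the same whether or not $w_{ki}'$ is normalized.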
Because
$$f_k(x_i) = f_{k-1}(x_i) + a_k G_k(x_i)$$
$$w_{ki}' = \exp(-y_i f_{k-1}(x_i))$$
So for the next round, round $k+1$, the un-normalized sample weights $w_{k+1,i}'$ are:
$$\begin{aligned}
w_{k+1,i}' &= \exp\big(-y_i (f_{k-1}(x_i) + a_k G_k(x_i))\big) \\
&= w_{ki}' \exp(-y_i a_k G_k(x_i))
\end{aligned} \tag{12*}$$
Finally, normalize $w_{k+1,i}'$. Dividing by $\sum_j w_{k+1,j}'$ and writing $w_{ki}$ for the normalized weights of round $k$ gives (5):
$$w_{k+1,i} = \frac{w_{ki}}{Z_k} \exp(-y_i a_k G_k(x_i)), \quad Z_k = \sum_{j=1}^{m} w_{kj} \exp(-y_j a_k G_k(x_j))$$
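The claim that the round-by-round update with normalization yields the same weights as normalizing $\exp(-y_i f_k(x_i))$ directly can be checked with a small sketch. The labels, the weak-classifier outputs and the coefficients below are hypothetical. In particular the coefficients are arbitrary numbers, not computed from (4), because the equivalence does not depend on how they were chosen.

```python
import math

y = [1, 1, -1, -1, 1]
# Hypothetical outputs G_k(x_i) of three weak classifiers, one row per round.
G = [
    [1, 1, -1, 1, -1],
    [1, -1, -1, -1, 1],
    [-1, 1, -1, -1, 1],
]
a = [0.4, 0.6, 0.5]  # hypothetical coefficients a_k

m = len(y)
w = [1.0 / m] * m      # normalized weights, start uniform
f = [0.0] * m          # running sum f_k(x_i), with f_0 = 0

for a_k, g_k in zip(a, G):
    # Recursive update: reweight, then divide by the normalization factor Z_k.
    unnorm = [wi * math.exp(-a_k * yi * gi) for wi, yi, gi in zip(w, y, g_k)]
    z_k = sum(unnorm)
    w = [u / z_k for u in unnorm]
    # Keep track of the additive model as well.
    f = [fi + a_k * gi for fi, gi in zip(f, g_k)]

# Direct route: exponential loss of the final additive model, normalized once.
direct = [math.exp(-yi * fi) for yi, fi in zip(y, f)]
s = sum(direct)
direct = [d / s for d in direct]

print(all(abs(wi - di) < 1e-12 for wi, di in zip(w, direct)))
```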
What follows is a dead end I went down. There is no need to worry that much about the effect of the normalization factor.
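Derivation 2
AdaBoost generates the $k$-th weak learner from the results of all previous learners, to complement the first $k-1$ weak learners. The loss of each sample under the first $k-1$ weak learners, after normalization, is used as the sample weight for the $k$-th learner.
(In other words, the larger a sample's loss, the larger its new weight, and the new weights of all samples must sum to 1.)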
Now let the sample weights be $w_{ki} = \exp(-y_i f_{k-1}(x_i)) / Z_{k-1}$, where $Z_{k-1} = \sum_i \exp(-y_i f_{k-1}(x_i))$ makes the weights sum to 1. Clearly $w_{ki}$ does not depend on $a_k$ or $G_k$, only on $f_{k-1}(x)$, so the loss can be rewritten as:
$$(a_k, G_k) = \underset{a_k, G_k}{\arg\min} \sum_{i=1}^{m} w_{ki} Z_{k-1} \exp(-y_i a_k G_k(x_i))$$
$Z_{k-1}$ is a constant, so the loss simplifies further to:
$$(a_k, G_k) = \underset{a_k, G_k}{\arg\min} \sum_{i=1}^{m} w_{ki} \exp(-y_i a_k G_k(x_i)) \tag{8}$$
$$\begin{aligned}
\sum_{i=1}^{m} w_{ki} \exp(-y_i a_k G_k(x_i)) &= \sum_{G_k(x_i) = y_i} w_{ki} e^{-a_k} + \sum_{G_k(x_i) \neq y_i} w_{ki} e^{a_k} \\
&= e^{-a_k} (1 - e_k) + e^{a_k} e_k
\end{aligned} \tag{9}$$
Because the weights are normalized, $\sum_{G_k(x_i) \neq y_i} w_{ki}$ is exactly the weighted error rate $e_k$ (see (1)) and the remaining weights sum to $1 - e_k$, which gives the last line of (9) directly.
By (8), assuming $a_k$ is known and positive, finding $G_k$ amounts to finding the best weak learner under the current sample weights:
$$G_k(x) = \underset{G_k}{\arg\min} \sum_{i=1}^{m} w_{ki}\, \mathbb I(G_k(x_i) \neq y_i) \tag{10}$$
Take the partial derivative of (9) with respect to $a_k$:
$$\frac{\partial \left( e^{-a_k} (1 - e_k) + e^{a_k} e_k \right)}{\partial a_k} = -e^{-a_k} (1 - e_k) + e^{a_k} e_k \tag{11}$$
Setting the derivative (11) to zero gives $e^{a_k} e_k = e^{-a_k} (1 - e_k)$, and taking the logarithm of both sides yields $a_k$ as in (4).
Because
$$f_k(x_i) = f_{k-1}(x_i) + a_k G_k(x_i)$$
$$w_{ki} = \exp(-y_i f_{k-1}(x_i)) / Z_{k-1}$$
So for the next round, round $k+1$, the sample weights $\mathbf w_{k+1}$ are:
$$\begin{aligned}
w_{k+1,i} &= \frac{\exp\big(-y_i (f_{k-1}(x_i) + a_k G_k(x_i))\big)}{Z_k} \\
&= \frac{w_{ki} Z_{k-1}}{Z_k} \exp(-y_i a_k G_k(x_i))
\end{aligned} \tag{12}$$
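Does this give (5)? It does once the notation is untangled: in (12), $Z_{k-1}$ and $Z_k$ stand for $\sum_i \exp(-y_i f_{k-1}(x_i))$ and $\sum_i \exp(-y_i f_k(x_i))$, and their ratio $Z_k / Z_{k-1}$ equals $\sum_i w_{ki} \exp(-a_k y_i G_k(x_i))$, which is the normalization factor called $Z_k$ in (5).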
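The relation between the two kinds of normalization factor can be verified numerically. In the sketch below, `n_prev` and `n_new` are my own names for the sums of $\exp(-y_i f_{k-1}(x_i))$ and $\exp(-y_i f_k(x_i))$, and `z_k` is the factor defined in (5). The values of `f_prev`, `g_k` and `a_k` are hypothetical.

```python
import math

y = [1, -1, 1, -1]
f_prev = [0.3, 0.2, -0.5, -0.1]   # hypothetical f_{k-1}(x_i)
g_k = [1, -1, -1, 1]              # hypothetical G_k(x_i)
a_k = 0.45                        # hypothetical coefficient

# Normalizer of exp(-y_i f_{k-1}(x_i)) and the resulting weights w_{ki}.
exp_prev = [math.exp(-yi * fi) for yi, fi in zip(y, f_prev)]
n_prev = sum(exp_prev)
w_k = [e / n_prev for e in exp_prev]

# Normalizer of exp(-y_i f_k(x_i)) after adding the k-th weak learner.
f_new = [fi + a_k * gi for fi, gi in zip(f_prev, g_k)]
n_new = sum(math.exp(-yi * fi) for yi, fi in zip(y, f_new))

# Weights according to (12), using the ratio of the two normalizers.
w_next_12 = [wi * n_prev / n_new * math.exp(-yi * a_k * gi)
             for wi, yi, gi in zip(w_k, y, g_k)]

# Weights according to (5), using Z_k = sum_i w_{ki} exp(-a_k y_i G_k(x_i)).
z_k = sum(wi * math.exp(-a_k * yi * gi) for wi, yi, gi in zip(w_k, y, g_k))
w_next_5 = [wi / z_k * math.exp(-a_k * yi * gi)
            for wi, yi, gi in zip(w_k, y, g_k)]

print(abs(z_k - n_new / n_prev) < 1e-12)
print(all(abs(p - q) < 1e-12 for p, q in zip(w_next_12, w_next_5)))
```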