5.2 Boosting (Sequential) Learning
Boosting originates in the PAC (probably approximately correct) learning framework. It builds an ensemble incrementally by training each new model to emphasize the training instances that previous models misclassified.
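For a quick sense of boosting in practice, here is a minimal sketch using scikit-learn's AdaBoostClassifier on a synthetic dataset; the dataset and hyperparameters are arbitrary demonstration choices, not part of the derivation below.

```python
# A minimal boosting sketch using scikit-learn's AdaBoostClassifier.
# The synthetic dataset and hyperparameters are arbitrary demo choices.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Each round fits a new weak learner (a shallow decision tree by default)
# that focuses more on the examples earlier rounds got wrong.
clf = AdaBoostClassifier(n_estimators=50, random_state=0)
clf.fit(X_tr, y_tr)
print("test accuracy:", clf.score(X_te, y_te))
```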
5.21 AdaBoost
AdaBoost = Weak Algorithm + Re-Weighting + Linear Aggregation
5.211 What is AdaBoost?
AdaBoost calls a given weak or base learning algorithm repeatedly in a series of rounds $t = 1, 2, \dots, T$. It is used for classification, mostly binary classification in practice.
This algorithm takes as input a training set $(\mathbf{x}_1, y_1), (\mathbf{x}_2, y_2), \dots, (\mathbf{x}_N, y_N)$, where each $\mathbf{x}_n$ belongs to some domain or instance space $X$ and each label $y_n$ belongs to some label set $Y$.
One of the main ideas of the algorithm is to maintain a distribution, or set of weights, over the training set. The weight of this distribution on example $\mathbf{x}_n$ on round $t$ is denoted $u_n^{(t)}$. Initially, all weights are set equally, but on each round the weights of incorrectly classified examples are increased so that the weak learner is forced to focus on the hard examples in the training set.
- Ensemble learning, suited to classification problems.
- Base hypotheses (base learners, possibly weak) are trained sequentially.
- The base learner in the current round pays more attention to the samples that the previous learner misclassified.
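To make the re-weighting idea concrete, here is a minimal sketch of a single re-weighting step, assuming binary labels in $\{-1, +1\}$; the scaling factor anticipates the update rule given in Section 5.213, and the toy numbers are only for illustration.

```python
import numpy as np

# A minimal sketch of one re-weighting step, assuming binary labels in {-1, +1}
# and an already-fitted weak hypothesis g_t; names follow the notation above.
def reweight(u, y, pred, eps):
    """Scale up the weights of misclassified examples, scale down the rest."""
    factor = np.sqrt((1.0 - eps) / eps)   # > 1 when the weak learner beats random guessing
    wrong = (y != pred)                   # indicator [y_n != g_t(x_n)]
    return np.where(wrong, u * factor, u / factor)

# Toy usage: 4 examples, the weak hypothesis gets the last one wrong.
u = np.full(4, 0.25)
y = np.array([+1, -1, +1, -1])
pred = np.array([+1, -1, +1, +1])
eps = u[y != pred].sum() / u.sum()        # weighted error rate
print(reweight(u, y, pred, eps))          # the misclassified example gains weight
```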
5.212 Why AdaBoost?
AdaBoost can handle weak hypotheses that output real-valued or confidence-rated predictions. That is, for each instance the weak hypothesis $g_t$ outputs a prediction $g_t(\mathbf{x}_n) \in \mathbb{R}$, whose sign is the predicted label ($+1$ or $-1$) and whose magnitude $|g_t(\mathbf{x}_n)|$ gives a measure of "confidence" in the prediction.
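As an illustration, here is a minimal sketch of a hypothetical confidence-rated weak hypothesis: a decision stump that outputs half the log-odds of the positive class in each branch instead of a hard $\pm 1$ label, so its sign is the predicted label and its magnitude the confidence. The function names and toy data are assumptions for demonstration, not part of the original text.

```python
import numpy as np

# Hypothetical confidence-rated stump: each branch outputs half the log-odds of
# the positive class, so sign(g_t(x)) is the label and |g_t(x)| the confidence.
def make_confidence_stump(X, y, feature, threshold):
    def half_log_odds(labels):
        if labels.size == 0:          # empty branch: no evidence, zero confidence
            return 0.0
        p = np.clip(np.mean(labels == +1), 1e-3, 1 - 1e-3)  # smooth away 0 and 1
        return 0.5 * np.log(p / (1 - p))
    left_score = half_log_odds(y[X[:, feature] <= threshold])
    right_score = half_log_odds(y[X[:, feature] > threshold])
    return lambda Xq: np.where(Xq[:, feature] <= threshold, left_score, right_score)

# Toy usage: feature 0 separates the classes, so both branches are confident.
X = np.array([[0.1], [0.4], [0.6], [0.9]])
y = np.array([+1, +1, -1, -1])
g_t = make_confidence_stump(X, y, feature=0, threshold=0.5)
print(g_t(X))   # positive scores on the left branch, negative on the right
```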
5.213 How AdaBoost?
Our mission is to find a weak hypothesis $g_t$ for each round, together with the weight $\alpha_t$ it receives in the final linear aggregation.
Given the learning data set $D = \{(\mathbf{x}_1, y_1), (\mathbf{x}_2, y_2), \dots, (\mathbf{x}_N, y_N)\}$ and a base learning algorithm $A$:
- Set the initial weights $u^{(1)} = \left[\frac{1}{N}, \frac{1}{N}, \dots, \frac{1}{N}\right]$.
- $\mathbf{for}\ t = 1, 2, \dots, T$:
  - Obtain $g_t$ by running $A$ on $D$ weighted by $u^{(t)}$.
  - $\epsilon_t = \dfrac{\sum_{n=1}^{N} u_n^{(t)}\, I\,[\,y_n \neq g_t(\mathbf{x}_n)\,]}{\sum_{n=1}^{N} u_n^{(t)}}$
  - $\mathbf{if}\ \epsilon_t > 0.5$, then break.
  - $\alpha_t = \frac{1}{2}\ln\!\left(\frac{1 - \epsilon_t}{\epsilon_t}\right)$
  - Update the weights $u^{(t)} \rightarrow u^{(t+1)}$ for every example $n$:
    - if $y_n \neq g_t(\mathbf{x}_n)$: $\ u_n^{(t+1)} \leftarrow u_n^{(t)} \sqrt{\frac{1 - \epsilon_t}{\epsilon_t}}$
    - if $y_n = g_t(\mathbf{x}_n)$: $\ u_n^{(t+1)} \leftarrow u_n^{(t)} \Big/ \sqrt{\frac{1 - \epsilon_t}{\epsilon_t}}$
- $\mathbf{end\ for}$
- $\mathbf{Return}$ $G(\mathbf{x}) = \mathrm{sign}\!\left[\sum_{t=1}^{T} \alpha_t g_t(\mathbf{x})\right]$
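Below is a minimal from-scratch sketch of the loop above, assuming labels in $\{-1, +1\}$ and using exhaustive decision stumps as the base algorithm $A$; the stump search and the toy dataset are illustrative choices rather than part of the derivation.

```python
import numpy as np

def train_stump(X, y, u):
    """Base algorithm A: pick the (feature, threshold, polarity) stump with the
    lowest u-weighted error. Returns the stump and its weighted error epsilon."""
    best, best_err = None, np.inf
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            for polarity in (+1, -1):
                pred = np.where(X[:, j] <= thr, polarity, -polarity)
                err = u[pred != y].sum() / u.sum()
                if err < best_err:
                    best_err, best = err, (j, thr, polarity)
    return best, best_err

def stump_predict(stump, X):
    j, thr, polarity = stump
    return np.where(X[:, j] <= thr, polarity, -polarity)

def adaboost(X, y, T=10):
    """AdaBoost as in the loop above: re-weight examples and aggregate stumps."""
    N = X.shape[0]
    u = np.full(N, 1.0 / N)                   # u^(1) = [1/N, ..., 1/N]
    stumps, alphas = [], []
    for t in range(T):
        g_t, eps = train_stump(X, y, u)       # g_t from A on D weighted by u^(t)
        if eps > 0.5:                         # worse than random guessing: stop
            break
        eps = max(eps, 1e-10)                 # guard against division by zero
        alpha = 0.5 * np.log((1 - eps) / eps)
        pred = stump_predict(g_t, X)
        factor = np.sqrt((1 - eps) / eps)
        u = np.where(pred != y, u * factor, u / factor)   # re-weighting step
        stumps.append(g_t)
        alphas.append(alpha)
    def G(Xq):                                # G(x) = sign(sum_t alpha_t g_t(x))
        scores = sum(a * stump_predict(g, Xq) for a, g in zip(alphas, stumps))
        return np.sign(scores)
    return G

# Toy usage on a tiny 1-D dataset with labels in {-1, +1}.
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]])
y = np.array([+1, +1, +1, -1, -1, -1])
G = adaboost(X, y, T=5)
print(G(X))   # should recover the training labels on this easy example
```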