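Preface
Boosting is a widely used and effective family of statistical learning methods. For classification, it reweights the training samples from round to round, learns one classifier per round, and combines these classifiers linearly to improve classification performance.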
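1. What is AdaBoost?
Standard AdaBoost addresses binary classification. It trains a sequence of weak classifiers and combines them into a strong classifier. In each round it raises the weights of the samples that the previous weak classifier misclassified and lowers the weights of the samples it classified correctly. The final prediction is a weighted majority vote over the weak classifiers: a weak classifier with a smaller classification error rate receives a larger coefficient and therefore has more say in the vote.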
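To make the weighted vote concrete, here is a minimal sketch. The coefficients and votes are hypothetical numbers chosen for illustration, and `weighted_vote` is a helper name introduced here, not something from the article's code.

```python
def weighted_vote(alphas, predictions):
    """Return sign(sum_m alpha_m * G_m(x)) for one sample.

    A score of exactly 0 is mapped to -1 here; that tie-break is an arbitrary choice.
    """
    score = sum(a * p for a, p in zip(alphas, predictions))
    return 1 if score > 0 else -1


# Hypothetical coefficients and votes of three weak classifiers on a single sample
alphas = [0.3, 0.3, 0.9]
predictions = [1, 1, -1]

plain_majority = 1 if sum(predictions) > 0 else -1   # every vote counts equally
weighted = weighted_vote(alphas, predictions)        # votes scaled by alpha_m
print(plain_majority, weighted)
```

In this example two of the three classifiers vote +1, but the single classifier voting -1 carries a larger coefficient than the other two combined, so the weighted vote comes out as -1.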
2. The AdaBoost algorithm
- Input: training set $T=\{(x_1,y_1),(x_2,y_2),\ldots,(x_N,y_N)\}$, where $x_i\in \mathbb{R}^n$ and $y_i\in \{-1,1\}$; a weak learning algorithm (typically a decision stump).
- Output: the final classifier $G(x)$.
- Initialize the weight distribution over the training data: $D_1=(w_{11},w_{12},\ldots,w_{1N}),\ w_{1i}=\frac{1}{N}$.
- For $m=1,2,\ldots,M$ (where $M$ is the number of weak classifiers):
  - Fit a base classifier $G_m(x)$ on the training data weighted by $D_m$.
  - Compute the classification error rate of $G_m(x)$:
    $$e_m=\sum_{i=1}^N w_{mi}I(G_m(x_i)\ne y_i)$$
  - Compute the coefficient (weight) of $G_m(x)$:
    $$\alpha_m=\frac{1}{2}\ln \frac{1-e_m}{e_m}$$
  - Update the weight distribution over the training data:
    $$D_{m+1}=(w_{m+1,1},w_{m+1,2},\ldots,w_{m+1,N})$$
    $$w_{m+1,i}=\frac{w_{mi}e^{-\alpha_m y_iG_m(x_i)}}{Z_m},\qquad Z_m=\sum_{i=1}^N w_{mi}e^{-\alpha_m y_iG_m(x_i)}$$
- Final classifier:
  $$G(x)=\mathrm{sign}\left(\sum_{m=1}^M\alpha_mG_m(x)\right)$$
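The three per-round steps above (error rate, coefficient, weight update) can be sketched in a few lines of plain Python. The 10-point data set, the fixed threshold 2.5 and the names `stump` and `adaboost_round` are assumptions made for this illustration; they are not part of the article's implementation.

```python
import math

# Toy 1-D data set (hypothetical, for illustration only)
xs = list(range(10))
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]


def stump(threshold):
    """Weak classifier: predict +1 below the threshold, -1 otherwise."""
    return lambda x: 1 if x < threshold else -1


def adaboost_round(weights, classifier):
    """One round: error rate e_m, coefficient alpha_m, and the new weights D_{m+1}."""
    preds = [classifier(x) for x in xs]
    # e_m: total weight of the misclassified samples
    e_m = sum(w for w, p, y in zip(weights, preds, ys) if p != y)
    # alpha_m = 1/2 * ln((1 - e_m) / e_m)
    alpha_m = 0.5 * math.log((1 - e_m) / e_m)
    # Unnormalised update, then divide by Z_m so the weights sum to 1 again
    unnorm = [w * math.exp(-alpha_m * y * p) for w, p, y in zip(weights, preds, ys)]
    z_m = sum(unnorm)
    return e_m, alpha_m, [u / z_m for u in unnorm]


n = len(xs)
d1 = [1.0 / n] * n                      # D_1: uniform weights
e1, alpha1, d2 = adaboost_round(d1, stump(2.5))
print(f"e_1 = {e1:.4f}, alpha_1 = {alpha1:.4f}")
print("D_2 =", [round(w, 4) for w in d2])
```

With this stump the three samples at x = 6, 7, 8 are misclassified, so their weights grow from 1/10 to 1/6 while the correctly classified samples shrink to 1/14. After the update the misclassified samples hold exactly half of the total weight, which is what forces the next weak classifier to focus on them.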
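3. Interpreting the AdaBoost algorithm
AdaBoost can be interpreted as the binary classification method obtained when the model is an additive model, the loss function is the exponential loss, and the learning algorithm is the forward stagewise algorithm.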
- The forward stagewise algorithm
Additive model:
$$f(x)=\sum_{m=1}^M\beta_m b(x;\gamma_m)$$
where $b(x;\gamma_m)$ is a basis function, $\gamma_m$ is its parameter, $\beta_m$ is its coefficient, and $M$ is the number of basis functions.
Learning the additive model $f(x)$ amounts to minimizing the empirical risk:
$$\min_{\beta_m,\gamma_m}\sum_{i=1}^N L\left(y_i,\sum_{m=1}^M\beta_m b(x_i;\gamma_m)\right)$$
This is usually a hard optimization problem. The forward stagewise algorithm tackles it by learning only one basis function and its coefficient per step, from front to back, approaching the optimum incrementally. Concretely, step $m$ solves:
$$\min_{\beta,\gamma}\sum_{i=1}^N L\left(y_i,f_{m-1}(x_i)+\beta b(x_i;\gamma)\right)$$
- The forward stagewise algorithm and AdaBoost
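Before specializing to the exponential loss, it may help to see the generic stagewise loop in code. The sketch below swaps in the squared loss and a +1/-1 step function as the basis function purely for illustration; the data, the candidate thresholds and the function names are all assumptions, and this is not the article's AdaBoost implementation.

```python
def basis(x, gamma):
    """Basis function b(x; gamma): a +1/-1 step at threshold gamma."""
    return 1.0 if x < gamma else -1.0


def forward_stagewise(xs, ys, gammas, n_steps):
    """Greedily add one term beta * b(x; gamma) per step under the squared loss."""
    f = [0.0] * len(xs)                  # current model values f_{m-1}(x_i)
    terms, losses = [], []
    for _ in range(n_steps):
        residuals = [y - fi for y, fi in zip(ys, f)]
        best = None
        for gamma in gammas:
            b = [basis(x, gamma) for x in xs]
            # For the squared loss with b in {-1, +1}, the optimal beta
            # is the mean of residual_i * b_i
            beta = sum(r * bi for r, bi in zip(residuals, b)) / len(xs)
            loss = sum((r - beta * bi) ** 2 for r, bi in zip(residuals, b))
            if best is None or loss < best[0]:
                best = (loss, beta, gamma, b)
        loss, beta, gamma, b = best
        # Earlier terms are frozen; only the new term is added to the model
        f = [fi + beta * bi for fi, bi in zip(f, b)]
        terms.append((beta, gamma))
        losses.append(loss)
    return terms, losses


xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.0, 1.2, 0.9, -1.0, -1.1, -0.8]
gammas = [0.5, 1.5, 2.5, 3.5, 4.5]
terms, losses = forward_stagewise(xs, ys, gammas, n_steps=3)
print(terms)
print(losses)
```

The key property is that each step only optimizes the newly added pair $(\beta,\gamma)$ and never revisits earlier terms, so the training loss cannot increase from one step to the next.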
The forward stagewise algorithm learns basis functions one at a time, which matches the way AdaBoost learns base classifiers one at a time.
When the loss function of the forward stagewise algorithm is the exponential loss,
$$L(y,f(x))=e^{-yf(x)}$$
its concrete learning steps are equivalent to those of AdaBoost.
Suppose that after $m-1$ rounds the forward stagewise algorithm has obtained:
$$f_{m-1}(x)=\alpha_1G_1(x)+\alpha_2G_2(x)+\ldots+\alpha_{m-1}G_{m-1}(x)$$
Round $m$ produces $\alpha_m$, $G_m(x)$ and $f_m(x)=f_{m-1}(x)+\alpha_mG_m(x)$, where the goal is:
$$(\alpha_m,G_m(x))=\arg\min_{\alpha_m,G_m(x)}\sum_{i=1}^N e^{-y_i(f_{m-1}(x_i)+\alpha_mG_m(x_i))}$$
Let $\overline w_{mi}=e^{-y_if_{m-1}(x_i)}$. The expression above then becomes:
$$\begin{aligned}
(\alpha_m,G_m)&=\arg\min_{\alpha_m,G_m}\sum_{i=1}^N\overline w_{mi}e^{-\alpha_my_iG_m(x_i)} \\
&=\arg\min_{\alpha_m,G_m}\ e^{-\alpha_m}\sum_{y_i=G_m(x_i)}\overline w_{mi}+e^{\alpha_m}\sum_{y_i\ne G_m(x_i)}\overline w_{mi} \\
&=\arg\min_{\alpha_m,G_m}\ e^{-\alpha_m}\left(\sum_{i=1}^N\overline w_{mi}-\sum_{y_i\ne G_m(x_i)}\overline w_{mi}\right)+e^{\alpha_m}\sum_{y_i\ne G_m(x_i)}\overline w_{mi} \\
&=\arg\min_{\alpha_m,G_m}\ e^{-\alpha_m}\sum_{i=1}^N\overline w_{mi}+(e^{\alpha_m}-e^{-\alpha_m})\sum_{y_i\ne G_m(x_i)}\overline w_{mi} \\
&=\arg\min_{\alpha_m,G_m}\ e^{-\alpha_m}\sum_{i=1}^N\overline w_{mi}+(e^{\alpha_m}-e^{-\alpha_m})\sum_{i=1}^N\overline w_{mi}I(y_i\ne G_m(x_i))
\end{aligned}$$
For any fixed $\alpha_m>0$, both $e^{-\alpha_m}\sum_{i=1}^N\overline w_{mi}$ and $e^{\alpha_m}-e^{-\alpha_m}$ are constants, and the latter is positive, so the problem is equivalent to:
$$G_m^*=\arg\min_{G_m}\sum_{i=1}^N\overline w_{mi}I(y_i\ne G_m(x_i))$$
This is the same base classifier that AdaBoost looks for: the one with the smallest weighted classification error (rescaling all weights by a positive constant does not change the minimizer).
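As a rough illustration of this minimization, the sketch below searches decision stumps on a toy 1-D data set for the one with the smallest weighted 0-1 error. The data, the candidate thresholds and the name `best_stump` are assumptions introduced here; the second weight vector is simply an example of weights that emphasize previously misclassified points.

```python
def best_stump(xs, ys, weights, thresholds):
    """Return (weighted_error, threshold, direction) minimising the weighted 0-1 error."""
    best = None
    for t in thresholds:
        for direction in (1, -1):
            # direction = +1: predict +1 below t and -1 otherwise; direction = -1: the reverse
            preds = [direction if x < t else -direction for x in xs]
            err = sum(w for w, p, y in zip(weights, preds, ys) if p != y)
            if best is None or err < best[0]:
                best = (err, t, direction)
    return best


xs = list(range(10))
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]
thresholds = [i + 0.5 for i in range(-1, 10)]        # -0.5, 0.5, ..., 9.5

# Uniform weights: every mistake costs the same
uniform = [0.1] * 10
err1, t1, dir1 = best_stump(xs, ys, uniform, thresholds)

# Weights that emphasise x = 6, 7, 8 (hypothetical reweighting)
reweighted = [1 / 14] * 6 + [1 / 6] * 3 + [1 / 14]
err2, t2, dir2 = best_stump(xs, ys, reweighted, thresholds)
print(err1, t1, dir1)
print(err2, t2, dir2)
```

Changing only the weights changes which stump wins: under uniform weights the search settles on threshold 2.5, while under the reweighted distribution it moves to threshold 8.5, which classifies the heavily weighted points correctly.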
Next, differentiate the objective with respect to $\alpha_m$ and set the derivative to zero, which gives:
$$\alpha_m^*=\frac{1}{2}\ln \frac{1-e_m}{e_m},\qquad e_m=\frac{\sum_{i=1}^N\overline w_{mi}I(y_i\ne G_m(x_i))}{\sum_{i=1}^N\overline w_{mi}}$$
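The differentiation step can be written out explicitly. The shorthand symbols $S$, $E$ and $J$ below are introduced here only to keep the lines short; they do not appear elsewhere in the article.

```latex
% Shorthand (introduced for this derivation only):
%   S = total unnormalised weight, E = unnormalised weight of the misclassified samples
\begin{aligned}
S &= \sum_{i=1}^N \overline w_{mi}, \qquad
E = \sum_{i=1}^N \overline w_{mi} I(y_i \ne G_m(x_i)), \qquad
e_m = \frac{E}{S} \\
J(\alpha_m) &= e^{-\alpha_m} S + (e^{\alpha_m} - e^{-\alpha_m}) E \\
\frac{\partial J}{\partial \alpha_m} &= -e^{-\alpha_m} S + (e^{\alpha_m} + e^{-\alpha_m}) E = 0 \\
\Longrightarrow \quad e^{\alpha_m} E &= e^{-\alpha_m} (S - E) \\
\Longrightarrow \quad e^{2\alpha_m} &= \frac{S - E}{E} = \frac{1 - e_m}{e_m} \\
\Longrightarrow \quad \alpha_m^* &= \frac{1}{2} \ln \frac{1 - e_m}{e_m}
\end{aligned}
```

The second derivative, $e^{-\alpha_m}(S-E)+e^{\alpha_m}E$, is positive whenever $0<E<S$, so this stationary point is a minimum.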
Let $w_{mi}=\frac{\overline w_{mi}}{\sum_{i=1}^N\overline w_{mi}}$; then $e_m=\sum_{i=1}^N w_{mi}I(y_i\ne G_m(x_i))$, which matches AdaBoost. In particular, for the first round ($m=1$) the model so far is $f_0(x)=0$, so $\overline w_{1i}=e^{-y_i\cdot 0}=1$ and $w_{1i}=\frac{\overline w_{1i}}{\sum_{i=1}^N\overline w_{1i}}=\frac{1}{N}$, which is exactly AdaBoost's initial weight distribution.
From $\overline w_{mi}=e^{-y_if_{m-1}(x_i)}$ and $f_m(x_i)=f_{m-1}(x_i)+\alpha_mG_m(x_i)$ we obtain:
$$\overline w_{m+1,i}=\overline w_{mi}e^{-y_i\alpha_mG_m(x_i)}$$
Combining this with $w_{m+1,i}=\frac{\overline w_{m+1,i}}{\sum_{i=1}^N\overline w_{m+1,i}}$ gives:
$$\begin{aligned}
w_{m+1,i}&=\frac{\overline w_{mi}e^{-y_i\alpha_mG_m(x_i)}}{\sum_{i=1}^N\overline w_{mi}e^{-y_i\alpha_mG_m(x_i)}} \\
&=\frac{\frac{\overline w_{mi}}{\sum_{k=1}^N\overline w_{mk}}e^{-y_i\alpha_mG_m(x_i)}}{\sum_{i=1}^N\frac{\overline w_{mi}}{\sum_{k=1}^N\overline w_{mk}}e^{-y_i\alpha_mG_m(x_i)}} \\
&=\frac{w_{mi}e^{-y_i\alpha_mG_m(x_i)}}{\sum_{i=1}^N w_{mi}e^{-y_i\alpha_mG_m(x_i)}}
\end{aligned}$$
which matches the AdaBoost weight update.
In summary, AdaBoost can be derived by taking an additive model, the exponential loss function, and the forward stagewise algorithm as the learning algorithm.
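The equivalence of the two weight formulas can also be checked numerically. In the sketch below the labels, the base-classifier outputs and the coefficients are hypothetical values chosen for illustration; the identity does not depend on how they were obtained.

```python
import math

ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]
# Hypothetical (alpha_m, [G_m(x_1), ..., G_m(x_N)]) pairs for three rounds
rounds = [
    (0.42, [1, 1, 1, -1, -1, -1, -1, -1, -1, -1]),
    (0.65, [1, 1, 1, 1, 1, 1, 1, 1, 1, -1]),
    (0.75, [-1, -1, -1, -1, -1, -1, 1, 1, 1, 1]),
]
n = len(ys)

# (a) AdaBoost view: recursive update, normalised after every round
w = [1.0 / n] * n
for alpha, g in rounds:
    unnorm = [wi * math.exp(-alpha * y * gi) for wi, y, gi in zip(w, ys, g)]
    z = sum(unnorm)
    w = [u / z for u in unnorm]

# (b) Forward stagewise view: w_bar_i = exp(-y_i * f(x_i)), normalised once at the end
f = [sum(alpha * g[i] for alpha, g in rounds) for i in range(n)]
w_bar = [math.exp(-y * fi) for y, fi in zip(ys, f)]
total = sum(w_bar)
w_from_f = [wb / total for wb in w_bar]

max_gap = max(abs(a - b) for a, b in zip(w, w_from_f))
print(f"largest difference between the two weight vectors: {max_gap:.2e}")
```

Both routes give the same distribution up to floating-point rounding, because the per-round normalizers $Z_m$ only rescale all weights by a common factor.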
4. Code implementation
"""
AdaBoost algorithm
"""
import numpy as np
from sklearn.datasets import load_digits
from tqdm import tqdm


class BasicClassifier(object):
    def __init__(self, train_xs, train_ys, weights, attr_type, split_cnt=10):
        """
        :param train_xs: features, shape (m, n)
        :param train_ys: labels in {-1, +1}, shape (m,)
        :param weights: sample weights, shape (m,)
        :param attr_type: type of each attribute (0 = discrete, any other value = continuous)
        :param split_cnt: number of intervals a continuous attribute's range is divided into
        """
        self.train_xs = train_xs
        self.train_ys = train_ys
        self.m, self.n = self.train_xs.shape
        assert len(weights) == self.m
        assert len(attr_type) == self.n
        self.weights = weights
        self.attr_type = attr_type
        self.split_cnt = split_cnt

    def build(self):
        """
        Build one base classifier (a decision stump) by exhaustive search.
        :return: (min_em, attr_index, attr_value, predict_ys, side)
        """
        min_em = float('inf')  # smallest weighted error rate found so far
        attr_index = -1
        attr_value = -1
        predict_ys = None
        # Continuous attribute: which side of the split is predicted as -1
        #   'lt' = values below the split, 'gt' = values at or above the split
        # Discrete attribute: how samples equal to the chosen value are predicted
        #   'eq' = they are predicted -1, 'neq' = they are predicted +1
        side = None
        for i in range(len(self.attr_type)):
            if self.attr_type[i] == 0:  # discrete attribute
                uniques = np.unique(self.train_xs[:, i])
                for j in range(len(uniques)):
                    for ineq in ['eq', 'neq']:
                        _predict_ys = np.ones((self.m,))
                        if ineq == 'eq':
                            _predict_ys[self.train_xs[:, i] == uniques[j]] = -1
                        else:
                            _predict_ys[self.train_xs[:, i] != uniques[j]] = -1
                        # weighted error rate: total weight of misclassified samples
                        em = self.weights[_predict_ys != self.train_ys].sum()
                        if em < min_em:
                            min_em = em
                            attr_index = i
                            attr_value = uniques[j]
                            predict_ys = _predict_ys
                            side = ineq
            else:  # continuous attribute
                _min, _max = np.min(self.train_xs[:, i]), np.max(self.train_xs[:, i])
                step = (_max - _min) / self.split_cnt
                for j in range(self.split_cnt + 1):
                    split_value = _min + step * j
                    for ineq in ['lt', 'gt']:
                        _predict_ys = np.ones((self.m,))
                        if ineq == 'lt':
                            _predict_ys[self.train_xs[:, i] < split_value] = -1
                        else:
                            _predict_ys[self.train_xs[:, i] >= split_value] = -1
                        # weighted error rate: total weight of misclassified samples
                        em = self.weights[_predict_ys != self.train_ys].sum()
                        if em < min_em:
                            min_em = em
                            attr_index = i
                            attr_value = split_value
                            predict_ys = _predict_ys
                            side = ineq
        return min_em, attr_index, attr_value, predict_ys, side


class AdaBoost(object):
    """
    AdaBoost boosting method
    """
    def __init__(self) -> None:
        # Each entry is (alpha_m, attr_index, attr_value, side)
        self.classifiers = []

    def train(self, train_xs, train_ys, test_xs, test_ys, attr_type, iters=500, test_freq=50):
        m = train_xs.shape[0]
        weights = (1. / m) * np.ones((m,))  # D_1: uniform weights
        for i in tqdm(range(iters)):
            basicclassifier = BasicClassifier(train_xs, train_ys, weights, attr_type)
            min_em, attr_index, attr_value, predict_ys, side = basicclassifier.build()
            # Guard against an error rate of exactly 0 or 1, which would make the log blow up
            min_em = np.clip(min_em, 1e-10, 1 - 1e-10)
            am = 0.5 * np.log((1 - min_em) / min_em)
            # Reweight the samples, then renormalise so the weights sum to 1
            weights = weights * np.exp(-am * train_ys.reshape((m,)) * predict_ys)
            weights /= np.sum(weights)
            self.classifiers.append((am, attr_index, attr_value, side))
            if test_xs is not None and test_ys is not None and (i + 1) % test_freq == 0:
                accuracy = self.test(test_xs, test_ys, attr_type)
                print("iters:%d, accuracy is %.4f" % (i + 1, accuracy))

    def test(self, test_xs, test_ys, attr_type):
        """
        Evaluate the current ensemble and return its accuracy.
        """
        # Accumulates sum_m alpha_m * G_m(x) for every test sample
        predict_ys = np.zeros((test_xs.shape[0]))
        for am, attr_index, attr_value, side in self.classifiers:
            if attr_type[attr_index] == 0:  # discrete attribute
                if side == 'eq':
                    predict_ys[test_xs[:, attr_index] == attr_value] += -am
                    predict_ys[test_xs[:, attr_index] != attr_value] += am
                else:
                    predict_ys[test_xs[:, attr_index] == attr_value] += am
                    predict_ys[test_xs[:, attr_index] != attr_value] += -am
            else:  # continuous attribute
                if side == 'lt':
                    predict_ys[test_xs[:, attr_index] < attr_value] += -am
                    predict_ys[test_xs[:, attr_index] >= attr_value] += am
                else:
                    predict_ys[test_xs[:, attr_index] < attr_value] += am
                    predict_ys[test_xs[:, attr_index] >= attr_value] += -am
        # Take the sign of the weighted vote (an exact tie stays 0 and counts as an error)
        predict_ys[predict_ys > 0] = 1
        predict_ys[predict_ys < 0] = -1
        accuracy = (predict_ys == test_ys).sum() / test_xs.shape[0]
        return accuracy


if __name__ == '__main__':
    # Load the handwritten digits data set that ships with sklearn
    digits = load_digits()
    features = digits.data
    # Binary task: digits 5-9 are labelled +1, digits 0-4 are labelled -1
    targets = (digits.target > 4).astype(int)
    targets[targets == 0] = -1
    # Shuffle the data
    shuffle_indices = np.random.permutation(features.shape[0])
    features = features[shuffle_indices]
    targets = targets[shuffle_indices]
    # Split into training and test sets (80% / 20%)
    train_count = int(len(features) * 0.8)
    train_xs, train_ys = features[:train_count], targets[:train_count]
    test_xs, test_ys = features[train_count:], targets[train_count:]
    # Treat every attribute as continuous
    attr_type = [1] * train_xs.shape[1]
    adaboost = AdaBoost()
    adaboost.train(train_xs, train_ys, test_xs, test_ys, attr_type)