Model Overview
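AdaBoost is one member of the boosting family. Boosting starts from a weak learning algorithm, applies it repeatedly to obtain a sequence of weak classifiers (also called base classifiers), and then combines them into a strong classifier. Most boosting methods change the probability distribution (the sample weights) of the training data in each round and learn a weak classifier for each reweighted distribution.
AdaBoost's idea is to raise, in every round, the weights of the samples that the previous round's weak classifier misclassified, so that the next classifier concentrates on correcting those errors. Once all classifiers are trained, AdaBoost combines them by weighted majority vote: classifiers with a lower classification error rate receive larger weights and therefore play a larger role in the vote.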
AdaBoost can be viewed as a special case of an additive model, of the form:
$$f(x) = \sum_{m=1}^{M} \alpha_m G_m(x)$$
where $G_m(x)$ is the $m$-th base classifier and $\alpha_m$ is its coefficient.
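To make the additive form concrete, here is a minimal sketch in plain Python. The three threshold classifiers `g1`, `g2`, `g3` and their coefficients are made-up values chosen only for illustration; they are not taken from the text.

```python
# Hypothetical base classifiers G_m: each maps a number x to -1 or +1.
def g1(x):
    return 1 if x < 2.5 else -1

def g2(x):
    return 1 if x < 8.5 else -1

def g3(x):
    return 1 if x > 5.5 else -1

# Hypothetical coefficients alpha_m (illustrative values only).
classifiers = [g1, g2, g3]
alphas = [0.4, 0.6, 0.8]

def additive_score(x):
    """f(x) = sum over m of alpha_m * G_m(x)."""
    return sum(a * g(x) for a, g in zip(alphas, classifiers))

def vote(x):
    """Weighted majority vote: the sign of f(x)."""
    return 1 if additive_score(x) > 0 else -1

for x in (1, 4, 7, 9):
    print(x, round(additive_score(x), 2), vote(x))
```

A classifier with a larger coefficient can outvote two classifiers with smaller ones, which is what "weighted majority" means here.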
Model Strategy
AdaBoost can be viewed as an additive model whose loss function is taken to be the exponential loss:
$$L(y, f(x)) = \exp[-y f(x)]$$
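As a small illustration of how the exponential loss behaves, the sketch below evaluates it for a few hand-picked scores $f(x)$; the numbers are illustrative only. The loss is below 1 when the sign of $f(x)$ agrees with $y$ and grows quickly when it does not.

```python
import math

def exp_loss(y, f):
    """Exponential loss L(y, f(x)) = exp(-y * f(x)) for a label y in {-1, +1}."""
    return math.exp(-y * f)

# The product y * f(x) is the margin: positive when the sign of f(x) matches y.
for y, f in [(1, 2.0), (1, 0.5), (1, -0.5), (1, -2.0)]:
    print(f"y={y:+d}  f(x)={f:+.1f}  loss={exp_loss(y, f):.4f}")
```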
Let $f_k(x)$ denote the learner obtained by combining the first $k$ weak learners after $k$ rounds of learning. If the parameters of the $k$-th iteration are $\alpha_k$ and $G_k(x)$, then
$$f_k(x) = f_{k-1}(x) + \alpha_k G_k(x)$$
Substituting this into the loss function and summing over the training set gives:
$$\text{loss} = \sum_{i=1}^{N} \exp[-y_i (f_{k-1}(x_i) + \alpha G(x_i))]$$
By the principle of empirical loss minimization, $\alpha_k$ and $G_k(x)$ are given by:
$$(\alpha_k, G_k(x)) = \mathop{\arg\min}\limits_{\alpha, G(x)} \sum_{i=1}^{N} \exp[-y_i (f_{k-1}(x_i) + \alpha G(x_i))]$$
From here on the round index is written $m$ (the $k$ above). For any fixed $\alpha > 0$, the classifier $G_m^*(x)$ that minimizes the expression above is
$$G_m^*(x) = \mathop{\arg\min}\limits_{G} \sum_{i=1}^{N} w_{mi}^{-} I(y_i \neq G(x_i))$$
where $w_{mi}^{-} = \exp[-y_i f_{m-1}(x_i)]$ depends on neither $\alpha$ nor $G$.
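The claim that, for a fixed positive $\alpha$, minimizing the exponential loss amounts to minimizing the weighted error can be checked numerically. The sketch below uses made-up labels, made-up scores $f_{m-1}(x_i)$, and two hypothetical candidate classifiers given directly by their predictions.

```python
import math

# Toy data (illustrative values only): labels and the scores f_{m-1}(x_i)
# of the ensemble built so far.
y = [1, 1, -1, -1, 1]
f_prev = [0.8, -0.3, -0.5, 0.4, 0.1]

# w_mi^- = exp(-y_i * f_{m-1}(x_i)): samples the ensemble gets wrong weigh more.
w = [math.exp(-yi * fi) for yi, fi in zip(y, f_prev)]

def weighted_error(pred):
    """Sum of w_mi^- over the samples where pred disagrees with y."""
    return sum(wi for wi, yi, pi in zip(w, y, pred) if yi != pi)

def exp_loss_sum(pred, alpha):
    """Sum of w_mi^- * exp(-y_i * alpha * G(x_i)) for candidate predictions."""
    return sum(wi * math.exp(-yi * alpha * pi) for wi, yi, pi in zip(w, y, pred))

# Two hypothetical candidate classifiers, given by their predictions.
cand_a = [1, -1, -1, -1, 1]   # wrong only on the second sample
cand_b = [1, 1, -1, 1, 1]     # wrong only on the fourth sample

print(weighted_error(cand_a), weighted_error(cand_b))
for alpha in (0.1, 0.5, 2.0):
    print(alpha, exp_loss_sum(cand_a, alpha), exp_loss_sum(cand_b, alpha))
```

The candidate with the smaller weighted error also has the smaller loss for every positive $\alpha$ tried, so the choice of $G$ does not depend on the particular value of $\alpha$.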
For $\alpha_m^*$, the loss function can be split according to whether each sample is classified correctly:
$$\begin{aligned}
\sum_{i=1}^{N} w_{mi}^{-} \exp[-y_i \alpha G(x_i)] &= \sum_{y_i = G(x_i)} w_{mi}^{-} e^{-\alpha} + \sum_{y_i \neq G(x_i)} w_{mi}^{-} e^{\alpha} \\
&= (e^{\alpha} - e^{-\alpha}) \sum_{i=1}^{N} w_{mi}^{-} I(y_i \neq G(x_i)) + e^{-\alpha} \sum_{i=1}^{N} w_{mi}^{-}
\end{aligned}$$
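Fixing $G = G_m$ (the minimizer found above), differentiating this expression with respect to $\alpha$, and setting the derivative to zero gives $(e^{\alpha} + e^{-\alpha}) \sum_{i} w_{mi}^{-} I(y_i \neq G_m(x_i)) = e^{-\alpha} \sum_{i} w_{mi}^{-}$, that is, $e^{2\alpha} = (1 - e_m)/e_m$, so
$$\alpha_m^* = \frac{1}{2} \log \frac{1 - e_m}{e_m}$$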
where $e_m$ is the weighted classification error rate:
$$e_m = \frac{\sum\limits_{i=1}^{N} w_{mi}^{-} I(y_i \neq G_m(x_i))}{\sum\limits_{i=1}^{N} w_{mi}^{-}} = \sum\limits_{i=1}^{N} w_{mi} I(y_i \neq G_m(x_i))$$
and $w_{mi} = w_{mi}^{-} / \sum_{j=1}^{N} w_{mj}^{-}$ is the normalized weight.
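The closed form for $\alpha_m^*$ can be sanity-checked against a brute-force search. In the sketch below the error rate 0.3 and the search grid are arbitrary illustrative choices, and `objective` is the loss divided by the total weight $\sum_i w_{mi}^{-}$.

```python
import math

def alpha_from_error(e_m):
    """alpha_m = 0.5 * log((1 - e_m) / e_m), defined for 0 < e_m < 1."""
    return 0.5 * math.log((1 - e_m) / e_m)

def objective(alpha, e_m):
    """Loss divided by the total weight: (e^a - e^-a) * e_m + e^-a."""
    return (math.exp(alpha) - math.exp(-alpha)) * e_m + math.exp(-alpha)

e_m = 0.3  # illustrative error rate
alpha_star = alpha_from_error(e_m)

# Brute-force check on a grid: no grid point should beat the closed form.
grid = [i / 1000 for i in range(-2000, 2001)]
best_on_grid = min(grid, key=lambda a: objective(a, e_m))
print(alpha_star, best_on_grid)
```

Note that $\alpha_m$ is positive only when $e_m < 0.5$ and grows as the error rate shrinks, which is why more accurate classifiers get a larger say in the final vote.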
Finally, the per-round weight update follows from $w_{mi}^{-} = \exp[-y_i f_{m-1}(x_i)]$ and
$$f_m(x) = f_{m-1}(x) + \alpha_m G_m(x)$$
which together give:
$$w_{m+1,i}^{-} = w_{m,i}^{-} \exp[-y_i \alpha_m G_m(x_i)]$$
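The recursion can be verified on a few numbers. The sketch below computes the round $m+1$ weights directly from their definition and again through the recursive update; the labels, scores, predictions, and coefficient are all illustrative values.

```python
import math

y = [1, -1, 1, -1]                 # labels
f_prev = [0.2, 0.7, -0.4, -0.9]    # f_{m-1}(x_i), illustrative values
g_m = [1, 1, -1, -1]               # G_m(x_i), predictions of the new classifier
alpha_m = 0.5                      # illustrative coefficient

# Direct definition at round m and at round m + 1.
w_m = [math.exp(-yi * fi) for yi, fi in zip(y, f_prev)]
f_m = [fi + alpha_m * gi for fi, gi in zip(f_prev, g_m)]
w_next_direct = [math.exp(-yi * fi) for yi, fi in zip(y, f_m)]

# Recursive update: w_{m+1,i}^- = w_{m,i}^- * exp(-y_i * alpha_m * G_m(x_i)).
w_next_recursive = [wi * math.exp(-yi * alpha_m * gi)
                    for wi, yi, gi in zip(w_m, y, g_m)]

for i, (a, b) in enumerate(zip(w_next_direct, w_next_recursive)):
    print(i, round(a, 6), round(b, 6))
```

The two columns agree, and the samples that $G_m$ misclassifies (the second and third here) end up with larger weights than before.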
Model Algorithm
Input: training dataset $T = \{(x_1, y_1), (x_2, y_2), \dots, (x_N, y_N)\}$, where $x_i \in X \subseteq R^n$, $y_i \in Y = \{-1, +1\}$
Output: final classifier $G(x)$
(1) Initialize the weight distribution over the training data
$$D_1 = (w_{11}, \dots, w_{1i}, \dots, w_{1N}), \quad w_{1i} = \frac{1}{N}, \quad i = 1, 2, \dots, N$$
(2) For m = 1, 2, …, M:
( a ) Learn from the training dataset with weight distribution $D_m$ to obtain the base classifier
$$G_m(x): X \to \{-1, +1\}$$
( b ) Compute the classification error rate of $G_m(x)$ on the training dataset
$$e_m = \frac{\sum\limits_{i=1}^{N} w_{mi}^{-} I(y_i \neq G_m(x_i))}{\sum\limits_{i=1}^{N} w_{mi}^{-}} = \sum\limits_{i=1}^{N} w_{mi} I(y_i \neq G_m(x_i))$$
( c ) Compute the coefficient of $G_m$
$$\alpha_m = \frac{1}{2} \log \frac{1 - e_m}{e_m}$$
( d ) Update the weight distribution over the training dataset
$$D_{m+1} = (w_{m+1,1}, \dots, w_{m+1,i}, \dots, w_{m+1,N})$$
$$w_{m+1,i} = \frac{w_{m,i}}{Z_m} \exp(-\alpha_m y_i G_m(x_i))$$
$$Z_m = \sum_{i=1}^{N} w_{m,i} \exp(-\alpha_m y_i G_m(x_i))$$
where the normalization factor $Z_m$ makes $D_{m+1}$ a probability distribution.
(3) Build the final classifier
$$G(x) = \operatorname{sign}\left(\sum_{m=1}^{M} \alpha_m G_m(x)\right)$$
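Steps (1) to (3) can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the base classifiers are one-dimensional threshold classifiers (decision stumps), the helper names `stump_predict`, `best_stump`, `adaboost`, and `predict` are hypothetical, and the ten-point dataset is a toy example.

```python
import math

def stump_predict(x, threshold, direction):
    """direction=+1 predicts +1 for x < threshold; direction=-1 the opposite."""
    return direction if x < threshold else -direction

def best_stump(xs, ys, w):
    """Steps (a) and (b): the stump with the lowest weighted error, and that error."""
    sx = sorted(set(xs))
    thresholds = [(a + b) / 2 for a, b in zip(sx, sx[1:])]
    best = None
    for t in thresholds:
        for d in (1, -1):
            err = sum(wi for xi, yi, wi in zip(xs, ys, w)
                      if stump_predict(xi, t, d) != yi)
            if best is None or err < best[0]:
                best = (err, t, d)
    return best

def adaboost(xs, ys, n_rounds):
    n = len(xs)
    w = [1.0 / n] * n                                  # step (1): uniform D_1
    model = []
    for _ in range(n_rounds):                          # step (2)
        err, t, d = best_stump(xs, ys, w)              # (a), (b)
        err = min(max(err, 1e-10), 1 - 1e-10)          # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)        # (c)
        w = [wi * math.exp(-alpha * yi * stump_predict(xi, t, d))
             for xi, yi, wi in zip(xs, ys, w)]         # (d), before normalizing
        z = sum(w)                                     # Z_m
        w = [wi / z for wi in w]
        model.append((alpha, t, d))
    return model

def predict(model, x):
    """Step (3): sign of the weighted vote."""
    score = sum(alpha * stump_predict(x, t, d) for alpha, t, d in model)
    return 1 if score > 0 else -1

# Toy one-dimensional dataset that no single stump can classify perfectly.
xs = list(range(10))
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]
model = adaboost(xs, ys, 3)
print(model)
print([predict(model, x) for x in xs])
```

On this toy set the best single stump misclassifies three of the ten points, while the combination of three stumps classifies all of them correctly.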
Code Implementation
First, import the required packages:
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
import pandas as pd
import matplotlib.pyplot as pyplot
Load the test data:
def create_data():
    iris = load_iris()
    df = pd.DataFrame(iris.data, columns=iris.feature_names)
    df['label'] = iris.target
    df.columns = ['sepal length', 'sepal width', 'petal length', 'petal width', 'label']
    # Keep the first 100 samples (classes 0 and 1) and the first two features
    data = np.array(df.iloc[:100, [0, 1, -1]])
    # Map the labels to {-1, +1}: class 1 -> +1, class 0 -> -1
    data[:, -1] = np.where(data[:, -1] == 1, 1, -1)
    return data[:, :2], data[:, -1]
The core of the algorithm consists of generating $G(x)$ and $\alpha$:
class Adaboost:
    def __init__(self, n_estimators, learning_rate):
        self.n_estimators = n_estimators
        # learning_rate is used here as the step size of the threshold search
        self.learning_rate = learning_rate
        self.model = []

    def fit(self, X_train, y_train):
        """Fit the training data."""
        self.m, self.n = X_train.shape
        # Initialize the weight distribution D_1 uniformly
        self.weight = [np.ones(self.m) / self.m]
        for i in range(self.n_estimators):
            compare_array, position, threshold, error, axis = self._G(X_train, y_train, self.weight[i])
            alpha_i = self.caculate_alpha(error)
            Z_i = self.caculate_Z(alpha_i, self.weight[i], compare_array, y_train)
            # w_{m+1,i} = w_{m,i} * exp(-alpha_m * y_i * G_m(x_i)) / Z_m
            self.weight.append(self.weight[i] * np.exp(-alpha_i * compare_array * y_train) / Z_i)
            self.model.append((axis, alpha_i, position, threshold))

    def caculate_alpha(self, error):
        # Clip the error away from 0 and 1 to avoid division by zero and log(0)
        error = np.clip(error, 1e-10, 1 - 1e-10)
        return 0.5 * np.log((1 - error) / error)

    def caculate_Z(self, alpha, weight, pre_y, y):
        # Normalization factor Z_m
        return np.dot(weight, np.exp(-alpha * pre_y * y))

    def calculate_err_rate(self, pre_y, y, weight):
        # Weighted error: total weight of the misclassified samples
        return np.sum(weight[pre_y != y])

    def G(self, threshold, x, position):
        # Base classifier: a decision stump on a single feature column x
        if position == 'positive':
            return np.where(x > threshold, 1, -1)
        return np.where(x > threshold, -1, 1)

    def _G(self, X_train, y_train, weight):
        """Search every feature and threshold for the stump with the lowest weighted error."""
        min_error = np.inf
        position = None
        threshold = None
        compare_array = None
        axis = None
        for i in range(self.n):
            feature = X_train[:, i]
            feature_max = max(feature)
            feature_min = min(feature)
            iter_num = int((feature_max - feature_min) // self.learning_rate)
            for j in range(iter_num):
                vi = feature_min + j * self.learning_rate
                pre_y_positive = self.G(vi, feature, 'positive')
                err_positive = self.calculate_err_rate(pre_y_positive, y_train, weight)
                pre_y_negative = self.G(vi, feature, 'negative')
                err_negative = self.calculate_err_rate(pre_y_negative, y_train, weight)
                if err_positive > err_negative:
                    if err_negative < min_error:
                        min_error = err_negative
                        position = 'negative'
                        compare_array = pre_y_negative
                        threshold = vi
                        axis = i
                else:
                    if err_positive < min_error:
                        min_error = err_positive
                        position = 'positive'
                        compare_array = pre_y_positive
                        threshold = vi
                        axis = i
        return compare_array, position, threshold, min_error, axis

    def predict(self, X_test):
        """Predict labels for the test data."""
        result = np.zeros(X_test.shape[0])
        for axis, alpha_i, position, threshold in self.model:
            # Each stump votes on its own feature column, weighted by alpha
            result += alpha_i * self.G(threshold, X_test[:, axis], position)
        return np.where(result > 0, 1, -1)

    def score(self, X_test, y_test):
        """Accuracy of the model on the test data."""
        return float(np.mean(self.predict(X_test) == y_test))