Basic Concepts
AdaBoost (Adaptive Boosting) is an adaptive boosting algorithm. Its core ideas are: (1) increase the weights of samples misclassified by the previous round's classifier and decrease the weights of correctly classified samples; (2) assign larger weights to weak classifiers with lower classification error rates.
Classification Task
Basic Concepts
For a $K$-class problem, when a sample's label vector $\mathbf{y}=[y_1,...,y_K]^T$ belongs to class $k$, i.e. $c(\mathbf{y})=k$ $(k=1,...,K)$, each component $y_i$ satisfies
$$y_i=\left\{ \begin{aligned} &1,\quad &{\rm if}\ i=k\\ &-\frac{1}{K-1},\quad &{\rm if}\ i\neq k \end{aligned} \right. \quad\quad\quad\quad(\text{Eq. 1})$$
so that
$$\sum_{i=1}^{K} y_i = 1 + (K-1)\cdot\left(-\frac{1}{K-1}\right) = 0$$
- Loss function
Let the model output be $\mathbf{f}=[f_1,...,f_K]^T$. The loss function is defined as
$$L(\mathbf{y},\mathbf{f})=\exp\left(-\frac{\mathbf{y}^T\mathbf{f}}{K}\right)$$
Since for any constant vector $\boldsymbol{a}=[a,a,...,a]^T$ we have
$$\begin{aligned} L(\mathbf{y}, \mathbf{f}+\boldsymbol{a})&= \exp\left(-\frac{\mathbf{y}^T\mathbf{f}}{K}-\frac{\mathbf{y}^T\boldsymbol{a}}{K}\right)\\ &= \exp\left(-\frac{\mathbf{y}^T\mathbf{f}}{K}-\frac{a}{K}\sum_{i=1}^{K} y_i\right)\\ &= \exp\left(-\frac{\mathbf{y}^T\mathbf{f}}{K}\right)\\ &= L(\mathbf{y}, \mathbf{f}) \end{aligned}$$
the loss is invariant to adding the same constant to every component of $\mathbf{f}$: only the relative differences between components matter, which is why the symmetric constraint $f_1+f_2+...+f_K=0$ below can be imposed without loss of generality.
Example: consider a 3-class problem whose true label is class 2, i.e. $\mathbf{y}=[-0.5,1,-0.5]^T$, and suppose the model outputs $\mathbf{f}=[-0.1,-0.3,0.4]^T$. The exponential loss is
$$L=\exp\left(-\frac{\mathbf{y}^T\mathbf{f}}{K}\right)=\exp\left(-\frac{(-0.5)\times(-0.1)+1\times(-0.3)+(-0.5)\times 0.4}{3}\right)=\exp(0.15)\approx 1.16$$
Since the model favors the wrong class (class 3), the loss exceeds 1.
- Significance of the exponential loss
When the expected loss $\mathbb{E}_{\mathbf{Y}\vert\mathbf{x}}L(\mathbf{Y},\mathbf{f})$ is minimized subject to the symmetric constraint $f_1+f_2+...+f_K=0$, the method of Lagrange multipliers yields the model output
$$\begin{aligned} k^* &= \mathop{\arg\max}_k f_k^*(\mathbf{x})\\&= \mathop{\arg\max}_k (K-1)\left[\ln P(c=k\vert \mathbf{x})-\frac{1}{K}\sum_{i=1}^K\ln P(c=i\vert \mathbf{x})\right] \\&= \mathop{\arg\max}_k P(c=k\vert \mathbf{x}) \end{aligned}$$
That is, when the expected loss is minimized, the model outputs the class that maximizes the posterior probability $P(c\vert \mathbf{x})$. In other words, the exponential loss satisfies the Bayes optimal decision rule.
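A sketch of this derivation (a reconstruction using the encoding of (Eq. 1), not spelled out in the references): when $c(\mathbf{y})=k$, the constraint $\sum_i f_i=0$ gives $\frac{\mathbf{y}^T\mathbf{f}}{K}=\frac{f_k}{K-1}$, so
$$\mathbb{E}_{\mathbf{Y}\vert\mathbf{x}}L(\mathbf{Y},\mathbf{f})=\sum_{k=1}^K P(c=k\vert\mathbf{x})\exp\left(-\frac{f_k}{K-1}\right)$$
Setting the derivative of the Lagrangian $\sum_k P(c=k\vert\mathbf{x})e^{-f_k/(K-1)}+\lambda\sum_k f_k$ with respect to each $f_k$ to zero and eliminating $\lambda$ via the constraint gives
$$f_k^*(\mathbf{x})=(K-1)\left[\ln P(c=k\vert\mathbf{x})-\frac{1}{K}\sum_{i=1}^K\ln P(c=i\vert\mathbf{x})\right]$$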
SAMME Algorithm
SAMME (Stagewise Additive Modeling using a Multi-class Exponential loss function).
Basic Definitions
- Total model output: $\mathbf{f}^{(M)}(\mathbf{x})=\sum_{m=1}^M \beta^{(m)} \mathbf{b}^{(m)}(\mathbf{x})$
where $M$ is the total number of boosting rounds, $\beta^{(m)}\in \mathbb{R}^+$ is the weight of the round-$m$ model, and $\mathbf{b}^{(m)}(\mathbf{x}) \in\mathbb{R}^K$ is the label vector encoding the class output by the base model $G$, computed per (Eq. 1).
- Round-$m$ model output: $\mathbf{f}^{(m)}(\mathbf{x}_i)=\mathbf{f}^{(m-1)}(\mathbf{x}_i)+\beta^{*(m)}\mathbf{b}^{*(m)}(\mathbf{x}_i)$
- Predicted class of sample $\mathbf{x}_i$ at round $m$: $k_i^*=\mathop{\arg\max}_{k} f_k^{(m)}(\mathbf{x}_i)$
- Round-$m$ optimization objective:
$$\begin{aligned} (\beta^{*(m)}, \mathbf{b}^{*(m)})&= \mathop{\arg\min}_{\beta^{(m)}, \mathbf{b}^{(m)}}\sum_{i=1}^n L(\mathbf{y}_i, \mathbf{f}^{(m-1)}(\mathbf{x}_i)+\beta^{(m)}\mathbf{b}^{(m)}(\mathbf{x}_i))\\&= \mathop{\arg\min}_{\beta^{(m)}, \mathbf{b}^{(m)}}\sum_{i=1}^n w_i\exp\left(-\frac{1}{K}\beta^{(m)}\mathbf{y}_i^T\mathbf{b}^{(m)}(\mathbf{x}_i)\right) \end{aligned}$$
- Round-$m$ sample weight: $w_i=\exp(-\frac{1}{K}\mathbf{y}_i^T\mathbf{f}^{(m-1)}(\mathbf{x}_i))$
- Round-$m$ loss function (for a correctly classified sample $\mathbf{y}_i^T\mathbf{b}^{(m)}(\mathbf{x}_i)=\frac{K}{K-1}$; for a misclassified one it equals $-\frac{K}{(K-1)^2}$):
$$\begin{aligned} \tilde{L}(\beta^{(m)}, \mathbf{b}^{(m)})&= \sum_{i=1}^n w_i\exp\left(-\frac{1}{K}\beta^{(m)}\mathbf{y}_i^T\mathbf{b}^{(m)}(\mathbf{x}_i)\right) \\ &= \sum_{i\in T}w_i\exp\left[-\frac{\beta^{(m)}}{K-1}\right]+\sum_{i \notin T}w_i\exp\left[\frac{\beta^{(m)}}{(K-1)^2}\right] \\ &= \sum_{i\in T}w_i\exp\left[-\frac{\beta^{(m)}}{K-1}\right] +\sum_{i\notin T}w_i\exp\left[-\frac{\beta^{(m)}}{K-1}\right]-\sum_{i\notin T}w_i\exp\left[-\frac{\beta^{(m)}}{K-1}\right] +\sum_{i \notin T}w_i\exp\left[\frac{\beta^{(m)}}{(K-1)^2}\right] \\ &=\exp\left[-\frac{\beta^{(m)}}{K-1}\right]\sum_{i=1}^nw_i + \left\{ \exp\left[\frac{\beta^{(m)}}{(K-1)^2}\right]-\exp\left[-\frac{\beta^{(m)}}{K-1}\right] \right\}\sum_{i=1}^nw_i\mathbb{I}_{\{i\notin T\}} \end{aligned}$$
where $T$ is the index set of samples predicted correctly in round $m$.
- Base model label-vector estimate: $\mathbf{b}^{*(m)}=\mathop{\arg\min}_{\mathbf{b}^{(m)}}\sum_{i=1}^n w_i\mathbb{I}_{\{i\notin T\}}$
- Round-$m$ model weight estimate: $\beta^{*(m)}=\frac{(K-1)^2}{K}\left[\ln\frac{1-err^{(m)}}{err^{(m)}}+\ln(K-1)\right]$ (see the derivation sketch after this list),
where the weighted error rate is $err^{(m)}=\sum_{i=1}^n\frac{w_i}{\sum_{j=1}^nw_j}\mathbb{I}_{\{i\notin T\}}$.
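As a sketch of where $\beta^{*(m)}$ comes from (a reconstruction from the loss above): setting $\partial\tilde{L}/\partial\beta^{(m)}=0$ and dividing through by $\sum_i w_i$ gives
$$\frac{1}{K-1}e^{-\frac{\beta^{(m)}}{K-1}}\left(1-err^{(m)}\right)=\frac{1}{(K-1)^2}e^{\frac{\beta^{(m)}}{(K-1)^2}}err^{(m)}$$
Taking logarithms and using $\frac{1}{K-1}+\frac{1}{(K-1)^2}=\frac{K}{(K-1)^2}$,
$$\frac{K}{(K-1)^2}\beta^{(m)}=\ln\frac{1-err^{(m)}}{err^{(m)}}+\ln(K-1)$$
which rearranges to the expression for $\beta^{*(m)}$ above.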
Algorithm Steps
- Step 1: Initialize the training sample weight distribution to be uniform:
$$D_{1}=\left(w_{11}, \cdots, w_{1 i}, \cdots, w_{1 N}\right), \quad w_{1 i}=\frac{1}{N}, \quad i=1,2, \cdots, N$$
- Step 2: For $m=1,2,\ldots,M$, train base classifiers $G_m(x)$ iteratively:
  - Fit a base classifier on the training data weighted by $D_m$:
$$G_{m}(x)=\mathop{\arg\min}_{G}\sum_{i=1}^n w_i\mathbb{I}_{\{i\notin T\}}$$
  - Compute the weighted error rate $err^{(m)}$ of $G_m(x)$
  - Compute the model weight $\beta^{*(m)}$ of $G_m(x)$
  - Update the training sample weight distribution:
$$\begin{array}{c} D_{m+1}=\left(w_{m+1,1}, \cdots, w_{m+1, i}, \cdots, w_{m+1, N}\right) \\ w_{m+1, i}=w_{m,i} \exp\left(-\frac{1}{K}\beta^{(m)}\mathbf{y}_i^T\mathbf{b}^{(m)}(\mathbf{x}_i)\right) \end{array}$$
  - Compute the model output $\mathbf{f}^{(m)}(\mathbf{x}_i)$
- Step 3: Compute the final prediction $k^*=\mathop{\arg\max}_{k} f_k^{(M)}(\mathbf{x})$
Algorithm Simplification
This simplified version is the algorithm presented in Li Hang's Statistical Learning Methods.
- Simplification 1: the weighted error rate uses already-normalized weights, $err^{(m)}=\sum_{i=1}^nw_i\mathbb{I}_{\{i\notin T\}}$
- Simplification 2: replace $\beta^{*(m)}$ with $\alpha^{*(m)}=\ln\frac{1-err^{(m)}}{err^{(m)}}+\ln(K-1)$, dropping the constant factor $\frac{(K-1)^2}{K}$ (see the worked example after this list)
- Simplification 3: simplify the $w_i$ update to $\tilde{w}_i = w_i\cdot\exp(\alpha^{*(m)}\mathbb{1}_{\{i\notin T\}})$, followed by renormalization.
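As a worked example of the simplified quantities (the numbers are illustrative, not from the references): for $K=3$ classes and a weighted error rate $err^{(m)}=0.3$,
$$\alpha^{*(m)}=\ln\frac{1-0.3}{0.3}+\ln(3-1)\approx 0.847+0.693=1.540$$
so each misclassified sample's weight is multiplied by $e^{1.540}\approx 4.67$ before renormalization, while correctly classified weights are unchanged. Note that the $\ln(K-1)$ term keeps $\alpha^{*(m)}>0$ whenever $err^{(m)}<1-\frac{1}{K}$, i.e. the base classifier only needs to beat random guessing.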
SAMME.R Algorithm
Differences from SAMME
SAMME.R (SAMME.Real): in each round the model outputs real values (class probabilities) rather than a discrete class label.
Since the sample weights penalize the overall loss in a consistent direction, consider a base model $G$ trained with sample weights $w$: its output probability $P_w(s(\mathbf{y})=k\vert \mathbf{x})$ is used in place of $\left.w\right|_{s(\mathbf{y})=k}\cdot P(s(\mathbf{y})=k\vert \mathbf{x})$. In other words, through the weights $w$, $G$ approximately "distributes" the loss that originally acted on $L$ to the base classifier's own loss.
| Item | SAMME | SAMME.R |
|---|---|---|
| Per-round prediction | class label | class probabilities |
| Optimized parameters | $\beta^{*(m)}$, $\mathbf{b}^{*(m)}$ | $h^{*(m)}$ |
| Loss function | $\tilde{L}(\beta^{(m)}, \mathbf{b}^{(m)})=\exp[-\frac{\beta^{(m)}}{K-1}]\sum_{i=1}^nw_i + \{ \exp[\frac{\beta^{(m)}}{(K-1)^2}]-\exp[-\frac{\beta^{(m)}}{K-1}] \}\sum_{i=1}^nw_i\mathbb{I}_{\{i\notin T\}}$ | $\mathbb{E} [L\vert \mathbf{x}] = \sum_{k=1}^K P_w(s(\mathbf{y})=k\vert \mathbf{x})\exp(-\frac{h^{(m)}_k(\mathbf{x})}{K-1})$ |
Algorithm Steps
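A sketch of the SAMME.R procedure, reconstructed to be consistent with the loss in the table above and the implementation below:
- Step 1: Initialize uniform sample weights $w_i=\frac{1}{N}$.
- Step 2: For $m=1,2,\ldots,M$:
  - Fit a base classifier on the data weighted by $w$ and obtain class probability estimates $p_k^{(m)}(\mathbf{x})=P_w(s(\mathbf{y})=k\vert\mathbf{x})$.
  - Minimizing $\mathbb{E}[L\vert\mathbf{x}]$ under the symmetric constraint $\sum_k h_k=0$ (the same Lagrange argument as before) gives
$$h_k^{(m)}(\mathbf{x})=(K-1)\left[\ln p_k^{(m)}(\mathbf{x})-\frac{1}{K}\sum_{i=1}^K\ln p_i^{(m)}(\mathbf{x})\right]$$
  - Update $w_i \leftarrow w_i\exp\left(-\frac{K-1}{K}\mathbf{y}_i^T\ln \mathbf{p}^{(m)}(\mathbf{x}_i)\right)$ and renormalize.
- Step 3: Output $k^*=\mathop{\arg\max}_k \sum_{m=1}^M h_k^{(m)}(\mathbf{x})$.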
Code
- Python library: sklearn.ensemble.AdaBoostClassifier
- Code:
```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import accuracy_score


class AdaBoost:
    def __init__(self, n_estimators, algorithm):
        self.n_estimators = n_estimators
        self.algorithm = algorithm  # "SAMME" or "SAMME.R"
        self.boostors = []
        if self.algorithm == "SAMME":
            self.boostor_weights = []
        self.classes = None

    def fit(self, X, y, **kwargs):
        # Assumes class labels are integers 0, 1, ..., K-1.
        w = np.ones(X.shape[0]) / X.shape[0]  # uniform initial weights
        self.classes = np.unique(y.reshape(-1)).shape[0]
        for n in range(self.n_estimators):
            cla = DecisionTreeClassifier(max_depth=1)  # decision stump
            cla.fit(X, y, sample_weight=w)
            if self.algorithm == "SAMME":
                y_pred = cla.predict(X)
                # weighted error rate err^(m) (weights already normalized)
                err = (w * (y != y_pred)).sum()
                # simplified coefficient alpha^(m)
                alpha = np.log((1 - err) / err) + np.log(self.classes - 1)
                self.boostors.append(cla)
                self.boostor_weights.append(alpha)
                # multiply misclassified sample weights by exp(alpha), renormalize
                w *= np.exp(alpha * (y != y_pred))
                w /= w.sum()
            elif self.algorithm == "SAMME.R":
                proba = cla.predict_proba(X)
                log_proba = np.log(proba + 1e-6)  # avoid log(0)
                # label-vector matrix per (Eq. 1)
                temp_y = np.full((X.shape[0], self.classes), -1 / (self.classes - 1))
                temp_y[np.arange(X.shape[0]), y] = 1
                self.boostors.append(cla)
                # w_i <- w_i * exp(-(K-1)/K * y_i^T log p(x_i)), renormalize
                w *= np.exp((1 - self.classes) / self.classes * (temp_y * log_proba).sum(1))
                w /= w.sum()

    def predict(self, X):
        result = 0
        if self.algorithm == "SAMME":
            for alpha, cla in zip(self.boostor_weights, self.boostors):
                cur_pred = cla.predict(X)
                # encode predictions per (Eq. 1) and accumulate alpha-weighted votes
                temp_output = np.full((X.shape[0], self.classes), -1 / (self.classes - 1))
                temp_output[np.arange(X.shape[0]), cur_pred] = 1
                result += alpha * temp_output
        elif self.algorithm == "SAMME.R":
            for cla in self.boostors:
                log_proba = np.log(cla.predict_proba(X) + 1e-6)
                # h_k(x) = (K-1) * (log p_k - mean_i log p_i)
                result += (self.classes - 1) * (log_proba - log_proba.mean(1).reshape(-1, 1))
        return np.argmax(result, axis=1)

    def score(self, X_test, y_test):
        return accuracy_score(y_test, self.predict(X_test))


if __name__ == '__main__':
    iris = load_iris()
    X, y = iris.data, iris.target
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

    adaboost = AdaBoost(n_estimators=100, algorithm='SAMME')
    adaboost.fit(X_train, y_train)
    print(adaboost.score(X_test, y_test))

    # compare with sklearn's implementation
    clf = AdaBoostClassifier(n_estimators=100, algorithm='SAMME')
    clf.fit(X_train, y_train)
    print(clf.score(X_test, y_test))
```
Regression Task: AdaBoost.R2
Algorithm Steps
- Training process: in each round $m$, fit a base regressor on the training data weighted by $w$; compute each sample's normalized absolute error $e_i=\frac{|y_i-\hat{y}_i|}{\max_j|y_j-\hat{y}_j|}$, the weighted error $err^{(m)}=\sum_i w_ie_i$, and $\beta^{(m)}=\frac{err^{(m)}}{1-err^{(m)}}$; set the predictor weight $\alpha^{(m)}=\ln\frac{1}{\beta^{(m)}}$; update $w_i \leftarrow w_i(\beta^{(m)})^{1-e_i}$ and renormalize (these steps match the implementation below).
- Prediction process
Let the base models' predictions for a new test sample be $y_1,...,y_M$, with corresponding predictor weights $\alpha^{(1)},...,\alpha^{(M)}$. AdaBoost.R2 outputs the weighted median (the value at which the weight on either side sums to half of the total):
$$y=\inf \left\{ y\,\Big|\, \sum_{m\in \{m\vert y_m\leq y\}}\alpha^{(m)} \geq 0.5 \sum_{m=1}^M\alpha^{(m)}\right\}$$
When the weights are proportional to the frequencies with which the predicted values occur (e.g. uniform weights), the weighted median reduces to the ordinary median.
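To make the definition concrete, here is a minimal standalone sketch of the weighted median (the function name and example values are illustrative, not part of the algorithm above):

```python
import numpy as np

def weighted_median(values, weights):
    # Sort predictions; the weighted median is the smallest value whose
    # cumulative weight reaches half of the total weight.
    order = np.argsort(values)
    values = np.asarray(values, dtype=float)[order]
    weights = np.asarray(weights, dtype=float)[order]
    cdf = np.cumsum(weights)
    return values[np.searchsorted(cdf, 0.5 * cdf[-1])]

# Predictions 1,2,3,4 with weights 0.1,0.2,0.3,0.4: the cumulative weights
# are 0.1,0.3,0.6,1.0, and 0.6 is the first to reach 0.5, so the answer is 3.
print(weighted_median([1, 2, 3, 4], [0.1, 0.2, 0.3, 0.4]))  # 3.0
```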
Code
- Python library: sklearn.ensemble.AdaBoostRegressor
- Code:
```python
import warnings
import numbers
import numpy as np
from sklearn.tree import DecisionTreeRegressor


def stable_cumsum(arr, axis=None, rtol=1e-05, atol=1e-08):
    # Cumulative sum with a sanity check that the last element matches np.sum.
    out = np.cumsum(arr, axis=axis, dtype=np.float64)
    expected = np.sum(arr, axis=axis, dtype=np.float64)
    if not np.all(
        np.isclose(
            out.take(-1, axis=axis), expected, rtol=rtol, atol=atol, equal_nan=True
        )
    ):
        warnings.warn(
            "cumsum was found to be unstable: "
            "its last element does not correspond to sum",
            RuntimeWarning,
        )
    return out


def _num_samples(x):
    """Return number of samples in array-like x."""
    message = "Expected sequence or array-like, got %s" % type(x)
    if hasattr(x, "fit") and callable(x.fit):
        # Don't get num_samples from an ensemble's length!
        raise TypeError(message)
    if not hasattr(x, "__len__") and not hasattr(x, "shape"):
        if hasattr(x, "__array__"):
            x = np.asarray(x)
        else:
            raise TypeError(message)
    if hasattr(x, "shape") and x.shape is not None:
        if len(x.shape) == 0:
            raise TypeError(
                "Singleton array %r cannot be considered a valid collection." % x
            )
        # Check that shape is returning an integer or default to len
        # (Dask dataframes may not return a numeric shape[0] value)
        if isinstance(x.shape[0], numbers.Integral):
            return x.shape[0]
    try:
        return len(x)
    except TypeError as type_error:
        raise TypeError(message) from type_error


class AdaBoostR2:
    def __init__(self, n_estimators):
        self.n_estimators = n_estimators
        self.boostors = []
        self.weight = []

    def fit(self, X, y, **kwargs):
        w = np.ones(X.shape[0]) / X.shape[0]  # uniform initial weights
        for n in range(self.n_estimators):
            cla = DecisionTreeRegressor(max_depth=3)
            cla.fit(X, y, sample_weight=w)  # fit on the weighted data
            y_pred = cla.predict(X)
            # normalized absolute error e_i in [0, 1]
            e = np.abs(y_pred - y)
            e /= e.max()
            err = (w * e).sum()              # weighted error err^(m)
            beta = err / (1 - err)
            alpha = np.log(1 / beta + 1e-6)  # 1e-6 guards against log(0)
            # small-error samples (e_i -> 0) are down-weighted by beta^(1-e_i)
            # relative to large-error ones (when err < 0.5, beta < 1)
            w *= np.power(beta, 1 - e)
            w /= w.sum()
            self.boostors.append(cla)
            self.weight.append(alpha)

    def _get_median_predict(self, X):
        # Evaluate predictions of all estimators: shape (n_samples, n_estimators)
        predictions = np.array([boostor.predict(X) for boostor in self.boostors]).T
        sorted_idx = np.argsort(predictions, axis=1)
        # Cumulative predictor weights along each sample's sorted predictions
        weight_cdf = stable_cumsum(np.asarray(self.weight)[sorted_idx], axis=1)
        median_or_above = weight_cdf >= 0.5 * weight_cdf[:, -1][:, np.newaxis]
        median_idx = median_or_above.argmax(axis=1)
        median_estimators = sorted_idx[np.arange(_num_samples(X)), median_idx]
        # Return the weighted-median predictions
        return predictions[np.arange(_num_samples(X)), median_estimators]

    def predict(self, X):
        return self._get_median_predict(X)
```
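A minimal usage sketch (the toy dataset and hyperparameters are illustrative choices, not from the references):

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=200, n_features=4, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

reg = AdaBoostR2(n_estimators=50)
reg.fit(X_train, y_train)
print(np.mean((reg.predict(X_test) - y_test) ** 2))  # test MSE
```

For comparison, sklearn.ensemble.AdaBoostRegressor with loss='linear' implements the same AdaBoost.R2 scheme.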
[References]:
- DataWhale Ensemble Learning
- Statistical Learning Methods, Li Hang