AdaBoost (Adaptive Boosting)

Basic Concepts

AdaBoost (Adaptive Boosting) is an adaptive boosting algorithm. Its basic strategy is: (1) increase the weights of samples misclassified by the previous round's classifier and decrease the weights of correctly classified samples; (2) give larger voting weights to weak classifiers with lower classification error rates.

Classification

Basic Concepts

For a $K$-class problem, when the class $c(\mathbf{y})$ of the sample label $\mathbf{y}=[y_1,\dots,y_K]^T$ is the $k$-th class $(k=1,\dots,K)$, each component $y_i$ satisfies

$$y_i=\begin{cases}1, & \text{if } c(\mathbf{y})=i\\ -\dfrac{1}{K-1}, & \text{if } c(\mathbf{y})\neq i\end{cases}\qquad\text{(Eq. 1)}$$

so that

$$\sum_{i=1}^{K} y_i=0$$
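The (Eq. 1) encoding is straightforward to build in code. Below is a minimal NumPy sketch (the helper name `encode_label` is illustrative, not from any library) that constructs the label vector and confirms the zero-sum property:

import numpy as np

def encode_label(k, K):
    # Map a 0-based class index k to the (Eq. 1) label vector
    y = np.full(K, -1.0 / (K - 1))  # every entry is -1/(K-1) ...
    y[k] = 1.0                      # ... except the true class, which is 1
    return y

y = encode_label(1, K=3)  # class 2 of 3 -> [-0.5, 1.0, -0.5]
print(y, y.sum())         # the components sum to 0 by construction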

  • Loss function
    Let the model output be $\mathbf{f}=[f_1,\dots,f_K]^T$. The loss function is
    $$L(\mathbf{y},\mathbf{f})=\exp\left(-\frac{\mathbf{y}^T\mathbf{f}}{K}\right)$$
    For any constant vector $\boldsymbol{a}=[a,a,\dots,a]^T$,
    $$\begin{aligned} L(\mathbf{y}, \mathbf{f}+\boldsymbol{a})&= \exp\left(-\frac{\mathbf{y}^T\mathbf{f}}{K}-\frac{\mathbf{y}^T\boldsymbol{a}}{K}\right)\\ &= \exp\left(-\frac{\mathbf{y}^T\mathbf{f}}{K}-\frac{a}{K}\sum_{i=1}^{K} y_i\right)\\ &= \exp\left(-\frac{\mathbf{y}^T\mathbf{f}}{K}\right)\\ &= L(\mathbf{y}, \mathbf{f}) \end{aligned}$$
    so the loss is invariant to shifting every component of $\mathbf{f}$ by the same constant.
    Example: for a 3-class problem whose label is class 2, i.e. $\mathbf{y}=[-0.5,1,-0.5]^T$, with model output $\mathbf{f}=[-0.1,-0.3,0.4]^T$, the exponential loss is $L=\exp\left(-\frac{\mathbf{y}^T\mathbf{f}}{K}\right)=\exp\left(-\frac{(-0.5)\times(-0.1)+1\times(-0.3)+(-0.5)\times 0.4}{3}\right)=\exp(0.15)\approx 1.16$.
  • Significance of the exponential loss
    Under the symmetry constraint $f_1+f_2+\dots+f_K=0$, when the expected loss $\mathbb{E}_{\mathbf{Y}\vert\mathbf{x}}L(\mathbf{Y},\mathbf{f})$ attains its minimum, the Lagrange multiplier method yields
    $$\begin{aligned} k^* &= \mathop{\arg\max}_k f_k^*(\mathbf{x})\\&= \mathop{\arg\max}_k\ (K-1)\left[\ln P(c=k\vert \mathbf{x})-\frac{1}{K}\sum_{i=1}^K\ln P(c=i\vert \mathbf{x})\right] \\&= \mathop{\arg\max}_k P(c=k\vert \mathbf{x}) \end{aligned}$$
    That is, when the expected loss is minimized, the model outputs the class that maximizes the posterior probability $P(c\vert \mathbf{x})$; choosing the exponential loss therefore satisfies the Bayes-optimal decision rule.
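As a quick numerical check of the example and of the shift invariance, here is a small self-contained sketch (`exp_loss` is an illustrative helper, not a library function):

import numpy as np

def exp_loss(y, f, K):
    # Multiclass exponential loss L(y, f) = exp(-y^T f / K)
    return np.exp(-y @ f / K)

y = np.array([-0.5, 1.0, -0.5])   # label: class 2 of 3
f = np.array([-0.1, -0.3, 0.4])   # model output favors the wrong class
print(exp_loss(y, f, K=3))        # ~1.16, matching the worked example
print(exp_loss(y, f + 0.7, K=3))  # identical: a constant shift changes nothing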

The SAMME Algorithm

SAMME stands for Stagewise Additive Modeling using a Multi-class Exponential loss function.

Basic Definitions

  • Total model output: $\mathbf{f}^{(M)}(\mathbf{x})=\sum_{m=1}^M \beta^{(m)} \mathbf{b}^{(m)}(\mathbf{x})$
    where $M$ is the total number of boosting rounds,
    $\beta^{(m)}\in \mathbb{R}^+$ is the weight coefficient of the round-$m$ model,
    and $\mathbf{b}^{(m)}(\mathbf{x})\in\mathbb{R}^K$ is the label vector of the class output by the base model $G$, encoded as in (Eq. 1).

  • Round-$m$ model output: $\mathbf{f}^{(m)}(\mathbf{x}_i)=\mathbf{f}^{(m-1)}(\mathbf{x}_i)+\beta^{*(m)}\mathbf{b}^{*(m)}(\mathbf{x}_i)$

  • Predicted class of sample $\mathbf{x}_i$ in round $m$: $k_i^*=\mathop{\arg\max}_{k} f_k^{(m)}(\mathbf{x}_i)$

  • Round-$m$ optimization objective:
    $$\begin{aligned} (\beta^{*(m)}, \mathbf{b}^{*(m)})&= \mathop{\arg\min}_{\beta^{(m)}, \mathbf{b}^{(m)}}\sum_{i=1}^n L\left(\mathbf{y}_i, \mathbf{f}^{(m-1)}(\mathbf{x}_i)+\beta^{(m)}\mathbf{b}^{(m)}(\mathbf{x}_i)\right)\\&= \mathop{\arg\min}_{\beta^{(m)}, \mathbf{b}^{(m)}}\sum_{i=1}^n w_i\exp\left(-\frac{\beta^{(m)}}{K}\mathbf{y}_i^T\mathbf{b}^{(m)}(\mathbf{x}_i)\right) \end{aligned}$$

  • Round-$m$ sample weights: $w_i=\exp\left(-\frac{1}{K}\mathbf{y}_i^T\mathbf{f}^{(m-1)}(\mathbf{x}_i)\right)$

  • Round-$m$ loss function
    Using $\mathbf{y}_i^T\mathbf{b}^{(m)}(\mathbf{x}_i)=\frac{K}{K-1}$ for a correctly classified sample and $-\frac{K}{(K-1)^2}$ for a misclassified one,
    $$\begin{aligned} \tilde{L}(\beta^{(m)}, \mathbf{b}^{(m)})&= \sum_{i=1}^n w_i\exp\left(-\frac{\beta^{(m)}}{K}\mathbf{y}_i^T\mathbf{b}^{(m)}(\mathbf{x}_i)\right) \\ &= \sum_{i\in T}w_i\exp\left[-\frac{\beta^{(m)}}{K-1}\right]+\sum_{i \notin T}w_i\exp\left[\frac{\beta^{(m)}}{(K-1)^2}\right] \\ &= \sum_{i\in T}w_i\exp\left[-\frac{\beta^{(m)}}{K-1}\right]+\sum_{i\notin T}w_i\exp\left[-\frac{\beta^{(m)}}{K-1}\right]-\sum_{i\notin T}w_i\exp\left[-\frac{\beta^{(m)}}{K-1}\right]+\sum_{i \notin T}w_i\exp\left[\frac{\beta^{(m)}}{(K-1)^2}\right] \\ &=\exp\left[-\frac{\beta^{(m)}}{K-1}\right]\sum_{i=1}^n w_i + \left\{ \exp\left[\frac{\beta^{(m)}}{(K-1)^2}\right]-\exp\left[-\frac{\beta^{(m)}}{K-1}\right] \right\}\sum_{i=1}^n w_i\mathbb{I}_{\{i\notin T\}} \end{aligned}$$
    where $T$ is the index set of samples correctly classified in round $m$.

  • Estimate of the base model's output label vector: $\mathbf{b}^{*(m)}=\mathop{\arg\min}_{\mathbf{b}^{(m)}}\sum_{i=1}^n w_i\mathbb{I}_{\{i\notin T\}}$, i.e., for fixed weights the optimal base model is the one that minimizes the weighted classification error.

  • Estimate of the round-$m$ weight coefficient: $\beta^{*(m)}=\frac{(K-1)^2}{K}\left[\ln\frac{1-err^{(m)}}{err^{(m)}}+\ln(K-1)\right]$
    where the weighted error rate is $err^{(m)}=\sum_{i=1}^n\frac{w_i}{\sum_{j=1}^n w_j}\mathbb{I}_{\{i\notin T\}}$
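As a sanity check on the closed form, the sketch below (`beta_star` is an illustrative name, assuming the definitions above) evaluates $\beta^{*(m)}$ and shows that for $K=2$ it reduces to the familiar binary AdaBoost coefficient $\frac{1}{2}\ln\frac{1-err}{err}$:

import numpy as np

def beta_star(err, K):
    # Round coefficient beta*(m) = (K-1)^2/K * [ln((1-err)/err) + ln(K-1)]
    return (K - 1) ** 2 / K * (np.log((1 - err) / err) + np.log(K - 1))

print(beta_star(0.3, K=2))  # ~0.424 = 0.5*ln(0.7/0.3), the binary AdaBoost value
print(beta_star(0.3, K=3))  # positive whenever err < 1 - 1/K (better than random)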

Algorithm Steps

  • Step 1: Initialize the training-sample weight distribution to be uniform:
    $$D_{1}=\left(w_{11}, \cdots, w_{1i}, \cdots, w_{1N}\right), \quad w_{1i}=\frac{1}{N}, \quad i=1,2,\cdots,N$$
  • Step 2: Iterate the base classifiers $G_m(x)$. For $m=1,2,\dots,M$:
    • Fit a base classifier to the training data weighted by $D_m$:
      $$G_{m}(x)=\mathop{\arg\min}_{G}\sum_{i=1}^n w_i\mathbb{I}_{\{i\notin T\}}$$
    • Compute the weighted error rate $err^{(m)}$ of $G_m(x)$
    • Compute the weight coefficient $\beta^{*(m)}$ of $G_m(x)$
    • Update the training-sample weight distribution:
      $$D_{m+1}=\left(w_{m+1,1}, \cdots, w_{m+1,i}, \cdots, w_{m+1,N}\right),\quad w_{m+1,i}=w_{m,i}\exp\left(-\frac{\beta^{(m)}}{K}\mathbf{y}_i^T\mathbf{b}^{(m)}(\mathbf{x}_i)\right)$$
    • Compute the model output $\mathbf{f}^{(m)}(\mathbf{x}_i)$
  • Step 3: Compute the final prediction:
    $$k^*=\mathop{\arg\max}_k f_k^{(M)}(\mathbf{x})$$

Algorithm Simplification

This simplified version is the algorithm presented in Li Hang's Statistical Learning Methods.

  • Simplification 1: the weighted error rate becomes $err^{(m)}=\sum_{i=1}^n w_i\mathbb{I}_{\{i\notin T\}}$ (the weights are renormalized every round, so the denominator $\sum_j w_j=1$ can be dropped)
  • Simplification 2: replace $\beta^{*(m)}$ with $\alpha^{*(m)}=\ln\frac{1-err^{(m)}}{err^{(m)}}+\ln(K-1)$, dropping the constant factor $\frac{(K-1)^2}{K}$, which is the same in every round
  • Simplification 3: simplify the $w_i$ update to $\tilde{w}_i = w_i\cdot\exp\left(\alpha^{*(m)}\mathbb{I}_{\{i\notin T\}}\right)$, followed by normalization; see the worked example below
    (figure omitted: pseudocode of the simplified algorithm)
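To make the simplified update concrete: with $K=3$ and $err^{(m)}=0.3$, the coefficient is $\alpha^{*(m)}=\ln\frac{0.7}{0.3}+\ln 2\approx 0.847+0.693=1.540$, so each misclassified sample's weight is multiplied by $e^{1.540}\approx 4.66$ before normalization, while correctly classified samples are left untouched and shrink only through the normalization step.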

The SAMME.R Algorithm

Differences from SAMME

SAMME.R (SAMME.Real): the model's output at each round is real-valued rather than a discrete class label.
Since the sample weights penalize the overall loss in a consistent direction, consider a base model $G$ fitted with sample weights $w$: its output probability $P_w(s(\mathbf{y})=k\vert \mathbf{x})$ is used in place of $w\vert_{S(\mathbf{y})=k}\cdot P(S(\mathbf{y})=k\vert \mathbf{x})$. In other words, through the weights $w$, $G$ approximately "redistributes" the loss that originally acted on $L$ onto the base classifier's loss.

| Item | SAMME | SAMME.R |
| --- | --- | --- |
| Per-round prediction | class label | class probabilities |
| Optimized parameters | $\beta^{*(m)}$, $\mathbf{b}^{*(m)}$ | $\mathbf{h}^{*(m)}$ |
| Loss function | $\tilde{L}(\beta^{(m)}, \mathbf{b}^{(m)})=\exp\left[-\frac{\beta^{(m)}}{K-1}\right]\sum_{i=1}^n w_i + \left\{\exp\left[\frac{\beta^{(m)}}{(K-1)^2}\right]-\exp\left[-\frac{\beta^{(m)}}{K-1}\right]\right\}\sum_{i=1}^n w_i\mathbb{I}_{\{i\notin T\}}$ | $\mathbb{E}[L\vert \mathbf{x}] = \sum_{k=1}^K P_w(s(\mathbf{y})=k\vert \mathbf{x})\exp\left(-\frac{h^{(m)}_k(\mathbf{x})}{K-1}\right)$ |
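The SAMME.R additive term $h_k^{(m)}$ can be computed directly from a base classifier's class probabilities. A minimal sketch (mirroring the `temp_output` computation in the code below; the clipping constant `eps` is an assumption to avoid $\log 0$):

import numpy as np

def samme_r_term(proba, eps=1e-6):
    # h_k(x) = (K-1) * (log p_k(x) - mean_i log p_i(x)), computed row-wise
    K = proba.shape[1]
    log_p = np.log(proba + eps)  # eps keeps log away from -inf
    return (K - 1) * (log_p - log_p.mean(axis=1, keepdims=True))

proba = np.array([[0.2, 0.5, 0.3]])
print(samme_r_term(proba))  # each row sums to ~0, satisfying the symmetry constraint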

Algorithm Steps

  • Step 1: Initialize the sample weights uniformly: $w_i=\frac{1}{N}$.
  • Step 2: For $m=1,2,\dots,M$: fit a base classifier to the training data weighted by $w$ and obtain its class-probability estimates $p_k^{(m)}(\mathbf{x})=P_w(c=k\vert\mathbf{x})$; form the real-valued round output $h_k^{(m)}(\mathbf{x})=(K-1)\left(\ln p_k^{(m)}(\mathbf{x})-\frac{1}{K}\sum_{i=1}^K\ln p_i^{(m)}(\mathbf{x})\right)$; update the sample weights $w_i \leftarrow w_i\exp\left(-\frac{K-1}{K}\mathbf{y}_i^T\ln \mathbf{p}^{(m)}(\mathbf{x}_i)\right)$ and renormalize.
  • Step 3: Output the final prediction $k^*=\mathop{\arg\max}_k \sum_{m=1}^M h_k^{(m)}(\mathbf{x})$.

Code

import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import accuracy_score

class AdaBoost:
    def __init__(self, n_estimators, algorithm):
        self.n_estimators = n_estimators
        self.algorithm = algorithm
        self.boostors = []
        if self.algorithm == "SAMME":
            self.boostor_weights = []
        self.classes = None

    def fit(self, X, y, **kwargs):
        w = np.ones(X.shape[0]) / X.shape[0]
        self.classes = np.unique(y.reshape(-1)).shape[0]

        for n in range(self.n_estimators):
            cla = DecisionTreeClassifier(max_depth=1)
            cla.fit(X, y, sample_weight=w)
            if self.algorithm == "SAMME":
                y_pred = cla.predict(X)
                err = (w * (y != y_pred)).sum()
                # Simplification 2: alpha = ln((1-err)/err) + ln(K-1)
                alpha = np.log((1 - err) / err) + np.log(self.classes - 1)
                self.boostors.append(cla)
                self.boostor_weights.append(alpha)
                # Simplification 3: scale up misclassified samples, then normalize
                w *= np.exp(alpha * (y != y_pred))
                w /= w.sum()
            elif self.algorithm == "SAMME.R":
                y_pred = cla.predict_proba(X)
                log_proba = np.log(y_pred + 1e-6)  # clip to avoid log(0)
                # Build the (Eq. 1) label matrix for the true classes
                temp_y = np.full(
                    (X.shape[0], self.classes), -1 / (self.classes - 1))
                temp_y[np.arange(X.shape[0]), y] = 1
                self.boostors.append(cla)
                # Weight update: w_i *= exp(-(K-1)/K * y_i^T log p(x_i))
                w *= np.exp(
                    (1 - self.classes) / self.classes * (temp_y * log_proba).sum(1))
                w /= w.sum()

    def predict(self, X):
        result = 0
        if self.algorithm == "SAMME":
            for alpha, cla in zip(self.boostor_weights, self.boostors):
                cur_pred = cla.predict(X)
                # Build the (Eq. 1) label matrix for the predicted classes
                temp_output = np.full(
                    (X.shape[0], self.classes), -1 / (self.classes - 1))
                temp_output[np.arange(X.shape[0]), cur_pred] = 1
                result += alpha * temp_output
        elif self.algorithm == "SAMME.R":
            for cla in self.boostors:
                y_pred = cla.predict_proba(X)
                log_proba = np.log(y_pred + 1e-6)
                # h_k(x) = (K-1) * (log p_k(x) - mean_i log p_i(x))
                temp_output = (
                    self.classes - 1) * (log_proba - log_proba.mean(1).reshape(-1, 1))
                result += temp_output
        return np.argmax(result, axis=1)

    def score(self, X_test, y_test):
        p = self.predict(X_test)
        return accuracy_score(y_test, p)


if __name__ == '__main__':
    iris = load_iris()
    X = iris.data
    y = iris.target
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
    adaboost = AdaBoost(n_estimators=100, algorithm='SAMME')
    adaboost.fit(X_train, y_train)
    print(adaboost.score(X_test, y_test))

    clf = AdaBoostClassifier(n_estimators=100, algorithm='SAMME')
    clf.fit(X_train, y_train)
    print(clf.score(X_test, y_test))
   

Regression: AdaBoost.R2

Algorithm Steps

  • Training process
    • Step 1: Initialize the sample weights uniformly: $w_i=\frac{1}{N}$.
    • Step 2: For $m=1,2,\dots,M$: fit a base regressor to the weighted training data; compute the normalized absolute errors $e_i=\frac{\vert y_i-\hat{y}_i\vert}{\max_j\vert y_j-\hat{y}_j\vert}$ (linear loss), the weighted error $err^{(m)}=\sum_i w_i e_i$, and $\beta^{(m)}=\frac{err^{(m)}}{1-err^{(m)}}$; set the predictor weight $\alpha^{(m)}=\ln\frac{1}{\beta^{(m)}}$; update the weights $w_i \leftarrow w_i\left(\beta^{(m)}\right)^{1-e_i}$ and renormalize.
  • Prediction process
    Let each base model's prediction for a new test sample be $y_1,\dots,y_M$, with corresponding predictor weights $\alpha^{(1)},\dots,\alpha^{(M)}$. AdaBoost.R2 outputs the weighted median (the value with half of the total weight on each side):
    $$y=\inf\left\{y\,\Big\vert \sum_{m\in \{m\vert y_m\leq y\}}\alpha^{(m)} \geq 0.5\sum_{m=1}^M\alpha^{(m)}\right\}$$
    When the predictor weights are all equal, the weighted median reduces to the ordinary median. A small sketch of this rule follows below.
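Here is a compact sketch of the weighted-median rule for a single test sample (a simplified version of the `_get_median_predict` method in the code below; `weighted_median` is an illustrative name):

import numpy as np

def weighted_median(preds, alphas):
    # Smallest prediction whose cumulative weight reaches half the total weight
    order = np.argsort(preds)
    cum = np.cumsum(np.asarray(alphas, dtype=float)[order])
    idx = np.searchsorted(cum, 0.5 * cum[-1])
    return np.asarray(preds)[order][idx]

print(weighted_median([3.0, 1.0, 2.0], [0.3, 0.2, 0.5]))  # 2.0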

Code

import warnings
import numbers
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def stable_cumsum(arr, axis=None, rtol=1e-05, atol=1e-08):
    """Cumulative sum in float64 with a sanity check against the plain sum."""
    out = np.cumsum(arr, axis=axis, dtype=np.float64)
    expected = np.sum(arr, axis=axis, dtype=np.float64)
    if not np.all(
        np.isclose(
            out.take(-1, axis=axis), expected, rtol=rtol, atol=atol, equal_nan=True
        )
    ):
        warnings.warn(
            "cumsum was found to be unstable: "
            "its last element does not correspond to sum",
            RuntimeWarning,
        )
    return out

def _num_samples(x):
    """Return number of samples in array-like x."""
    message = "Expected sequence or array-like, got %s" % type(x)
    if hasattr(x, "fit") and callable(x.fit):
        # Don't get num_samples from an ensemble's length!
        raise TypeError(message)

    if not hasattr(x, "__len__") and not hasattr(x, "shape"):
        if hasattr(x, "__array__"):
            x = np.asarray(x)
        else:
            raise TypeError(message)

    if hasattr(x, "shape") and x.shape is not None:
        if len(x.shape) == 0:
            raise TypeError(
                "Singleton array %r cannot be considered a valid collection." % x
            )
        # Check that shape is returning an integer or default to len
        # Dask dataframes may not return numeric shape[0] value
        if isinstance(x.shape[0], numbers.Integral):
            return x.shape[0]
    try:
        return len(x)
    except TypeError as type_error:
        raise TypeError(message) from type_error

class AdaBoostR2:
    def __init__(self, n_estimators):
        self.n_estimators = n_estimators
        self.boostors = []
        self.weight = []

    def fit(self, X, y, **kwargs):
        w = np.ones(X.shape[0]) / X.shape[0]

        for n in range(self.n_estimators):
            cla = DecisionTreeRegressor(max_depth=3)
            cla.fit(X, y, sample_weight=w)  # fit on the weighted sample
            y_pred = cla.predict(X)
            e = np.abs(y_pred - y)
            e /= e.max()                    # normalized absolute (linear) loss
            err = (w * e).sum()
            beta = err / (1 - err)
            alpha = np.log(1 / beta + 1e-6)  # 1e-6 keeps the log argument away from zero
            w *= np.power(beta, 1 - e)       # shrink weights of well-predicted samples
            w /= w.sum()
            self.boostors.append(cla)
            self.weight.append(alpha)

    def _get_median_predict(self, X):
        # Evaluate predictions of all estimators: shape (n_samples, n_estimators)
        predictions = np.array([boostor.predict(X) for boostor in self.boostors]).T

        sorted_idx = np.argsort(predictions, axis=1)
        weight_cdf = stable_cumsum(np.asarray(self.weight)[sorted_idx], axis=1)
        median_or_above = weight_cdf >= 0.5 * weight_cdf[:, -1][:, np.newaxis]
        median_idx = median_or_above.argmax(axis=1)
        median_estimators = sorted_idx[np.arange(_num_samples(X)), median_idx]

        # Return median predictions
        return predictions[np.arange(_num_samples(X)), median_estimators]

    def predict(self, X):
        return self._get_median_predict(X)
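A quick usage sketch on synthetic data (the sine-curve target and all constants here are illustrative, not from the original):

if __name__ == '__main__':
    rng = np.random.RandomState(0)
    X = np.sort(5 * rng.rand(200, 1), axis=0)
    y = np.sin(X).ravel() + 0.1 * rng.randn(200)
    reg = AdaBoostR2(n_estimators=50)
    reg.fit(X, y)
    print(reg.predict(X[:5]))  # weighted-median predictions for the first samples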

References:

  1. DataWhale ensemble learning tutorial (集成学习)
  2. Li Hang, Statistical Learning Methods (《统计学习方法》)