Basic Concepts
AdaBoost (Adaptive Boosting) is an adaptive boosting algorithm. Its core ideas are: (1) increase the weights of samples misclassified by the previous round's classifier and decrease the weights of correctly classified samples; (2) assign larger weights to weak classifiers with lower classification error rates.
Classification Task
Basic Concepts
For a $K$-class problem, when a sample's label vector $\mathbf{y}=[y_1,...,y_K]^T$ belongs to class $k$, i.e. $c(\mathbf{y})=k$ $(k=1,...,K)$, each component $y_i$ satisfies
$$y_i=\left\{ \begin{aligned} &1,\quad &{\rm if}\ i=k\\ &-\frac{1}{K-1},\quad &{\rm if}\ i\neq k \end{aligned} \right. \quad\quad\quad\quad(\text{Eq. 1})$$
so that
$$\sum_{i=1}^{K} y_i = 1 + (K-1)\cdot\left(-\frac{1}{K-1}\right) = 0$$
- Loss function
Let the model output be $\mathbf{f}=[f_1,...,f_K]^T$. The loss function is defined as
$$L(\mathbf{y},\mathbf{f})=\exp\left(-\frac{\mathbf{y}^T\mathbf{f}}{K}\right)$$
Since for any constant vector $\boldsymbol{a}=[a,a,...,a]^T$ we have
$$\begin{aligned} L(\mathbf{y}, \mathbf{f}+\boldsymbol{a})&= \exp\left(-\frac{\mathbf{y}^T\mathbf{f}}{K}-\frac{\mathbf{y}^T\boldsymbol{a}}{K}\right)\\ &= \exp\left(-\frac{\mathbf{y}^T\mathbf{f}}{K}-\frac{a}{K}\sum_{i=1}^{K} y_i\right)\\ &= \exp\left(-\frac{\mathbf{y}^T\mathbf{f}}{K}\right)\\ &= L(\mathbf{y}, \mathbf{f}) \end{aligned}$$
the loss is invariant to adding the same constant to every component of $\mathbf{f}$: only the relative differences between components matter, which is why the symmetric constraint $f_1+f_2+...+f_K=0$ below can be imposed without loss of generality.
Example: consider a 3-class problem whose true label is class 2, i.e. $\mathbf{y}=[-0.5,1,-0.5]^T$, and suppose the model outputs $\mathbf{f}=[-0.1,-0.3,0.4]^T$. The exponential loss is
$$L=\exp\left(-\frac{\mathbf{y}^T\mathbf{f}}{K}\right)=\exp\left(-\frac{(-0.5)\times(-0.1)+1\times(-0.3)+(-0.5)\times 0.4}{3}\right)=\exp(0.15)\approx 1.16$$
Since the model favors the wrong class (class 3), the loss exceeds 1.
- Significance of the exponential loss
When the expected loss $\mathbb{E}_{\mathbf{Y}\vert\mathbf{x}}L(\mathbf{Y},\mathbf{f})$ is minimized subject to the symmetric constraint $f_1+f_2+...+f_K=0$, the method of Lagrange multipliers yields the model output
$$\begin{aligned} k^* &= \mathop{\arg\max}_k f_k^*(\mathbf{x})\\&= \mathop{\arg\max}_k (K-1)\left[\ln P(c=k\vert \mathbf{x})-\frac{1}{K}\sum_{i=1}^K\ln P(c=i\vert \mathbf{x})\right] \\&= \mathop{\arg\max}_k P(c=k\vert \mathbf{x}) \end{aligned}$$
That is, when the expected loss is minimized, the model outputs the class that maximizes the posterior probability $P(c\vert \mathbf{x})$. In other words, the exponential loss satisfies the Bayes optimal decision rule.
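A sketch of this derivation (a reconstruction using the encoding of (Eq. 1), not spelled out in the references): when $c(\mathbf{y})=k$, the constraint $\sum_i f_i=0$ gives $\frac{\mathbf{y}^T\mathbf{f}}{K}=\frac{f_k}{K-1}$, so
$$\mathbb{E}_{\mathbf{Y}\vert\mathbf{x}}L(\mathbf{Y},\mathbf{f})=\sum_{k=1}^K P(c=k\vert\mathbf{x})\exp\left(-\frac{f_k}{K-1}\right)$$
Setting the derivative of the Lagrangian $\sum_k P(c=k\vert\mathbf{x})e^{-f_k/(K-1)}+\lambda\sum_k f_k$ with respect to each $f_k$ to zero and eliminating $\lambda$ via the constraint gives
$$f_k^*(\mathbf{x})=(K-1)\left[\ln P(c=k\vert\mathbf{x})-\frac{1}{K}\sum_{i=1}^K\ln P(c=i\vert\mathbf{x})\right]$$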
SAMME Algorithm
SAMME (Stagewise Additive Modeling using a Multi-class Exponential loss function).
Basic Definitions
- Total model output: $\mathbf{f}^{(M)}(\mathbf{x})=\sum_{m=1}^M \beta^{(m)} \mathbf{b}^{(m)}(\mathbf{x})$
where $M$ is the total number of boosting rounds, $\beta^{(m)}\in \mathbb{R}^+$ is the weight of the round-$m$ model, and $\mathbf{b}^{(m)}(\mathbf{x}) \in\mathbb{R}^K$ is the label vector encoding the class output by the base model $G$, computed per (Eq. 1).
- Round-$m$ model output: $\mathbf{f}^{(m)}(\mathbf{x}_i)=\mathbf{f}^{(m-1)}(\mathbf{x}_i)+\beta^{*(m)}\mathbf{b}^{*(m)}(\mathbf{x}_i)$
- Predicted class of sample $\mathbf{x}_i$ at round $m$: $k_i^*=\mathop{\arg\max}_{k} f_k^{(m)}(\mathbf{x}_i)$
- Round-$m$ optimization objective:
$$\begin{aligned} (\beta^{*(m)}, \mathbf{b}^{*(m)})&= \mathop{\arg\min}_{\beta^{(m)}, \mathbf{b}^{(m)}}\sum_{i=1}^n L(\mathbf{y}_i, \mathbf{f}^{(m-1)}(\mathbf{x}_i)+\beta^{(m)}\mathbf{b}^{(m)}(\mathbf{x}_i))\\&= \mathop{\arg\min}_{\beta^{(m)}, \mathbf{b}^{(m)}}\sum_{i=1}^n w_i\exp\left(-\frac{1}{K}\beta^{(m)}\mathbf{y}_i^T\mathbf{b}^{(m)}(\mathbf{x}_i)\right) \end{aligned}$$
- Round-$m$ sample weight: $w_i=\exp(-\frac{1}{K}\mathbf{y}_i^T\mathbf{f}^{(m-1)}(\mathbf{x}_i))$
- Round-$m$ loss function (for a correctly classified sample $\mathbf{y}_i^T\mathbf{b}^{(m)}(\mathbf{x}_i)=\frac{K}{K-1}$; for a misclassified one it equals $-\frac{K}{(K-1)^2}$):
$$\begin{aligned} \tilde{L}(\beta^{(m)}, \mathbf{b}^{(m)})&= \sum_{i=1}^n w_i\exp\left(-\frac{1}{K}\beta^{(m)}\mathbf{y}_i^T\mathbf{b}^{(m)}(\mathbf{x}_i)\right) \\ &= \sum_{i\in T}w_i\exp\left[-\frac{\beta^{(m)}}{K-1}\right]+\sum_{i \notin T}w_i\exp\left[\frac{\beta^{(m)}}{(K-1)^2}\right] \\ &= \sum_{i\in T}w_i\exp\left[-\frac{\beta^{(m)}}{K-1}\right] +\sum_{i\notin T}w_i\exp\left[-\frac{\beta^{(m)}}{K-1}\right]-\sum_{i\notin T}w_i\exp\left[-\frac{\beta^{(m)}}{K-1}\right] +\sum_{i \notin T}w_i\exp\left[\frac{\beta^{(m)}}{(K-1)^2}\right] \\ &=\exp\left[-\frac{\beta^{(m)}}{K-1}\right]\sum_{i=1}^nw_i + \left\{ \exp\left[\frac{\beta^{(m)}}{(K-1)^2}\right]-\exp\left[-\frac{\beta^{(m)}}{K-1}\right] \right\}\sum_{i=1}^nw_i\mathbb{I}_{\{i\notin T\}} \end{aligned}$$
where $T$ is the index set of samples predicted correctly in round $m$.
- Base model label-vector estimate: $\mathbf{b}^{*(m)}=\mathop{\arg\min}_{\mathbf{b}^{(m)}}\sum_{i=1}^n w_i\mathbb{I}_{\{i\notin T\}}$
- Round-$m$ model weight estimate: $\beta^{*(m)}=\frac{(K-1)^2}{K}\left[\ln\frac{1-err^{(m)}}{err^{(m)}}+\ln(K-1)\right]$ (see the derivation sketch after this list),
where the weighted error rate is $err^{(m)}=\sum_{i=1}^n\frac{w_i}{\sum_{j=1}^nw_j}\mathbb{I}_{\{i\notin T\}}$.
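As a sketch of where $\beta^{*(m)}$ comes from (a reconstruction from the loss above): setting $\partial\tilde{L}/\partial\beta^{(m)}=0$ and dividing through by $\sum_i w_i$ gives
$$\frac{1}{K-1}e^{-\frac{\beta^{(m)}}{K-1}}\left(1-err^{(m)}\right)=\frac{1}{(K-1)^2}e^{\frac{\beta^{(m)}}{(K-1)^2}}err^{(m)}$$
Taking logarithms and using $\frac{1}{K-1}+\frac{1}{(K-1)^2}=\frac{K}{(K-1)^2}$,
$$\frac{K}{(K-1)^2}\beta^{(m)}=\ln\frac{1-err^{(m)}}{err^{(m)}}+\ln(K-1)$$
which rearranges to the expression for $\beta^{*(m)}$ above.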
Algorithm Steps
- Step 1: Initialize the training sample weight distribution to be uniform:
$$D_{1}=\left(w_{11}, \cdots, w_{1 i}, \cdots, w_{1 N}\right), \quad w_{1 i}=\frac{1}{N}, \quad i=1,2, \cdots, N$$
- Step 2: For $m=1,2,\ldots,M$, train base classifiers $G_m(x)$ iteratively:
  - Fit a base classifier on the training data weighted by $D_m$:
$$G_{m}(x)=\mathop{\arg\min}_{G}\sum_{i=1}^n w_i\mathbb{I}_{\{i\notin T\}}$$
  - Compute the weighted error rate $err^{(m)}$ of $G_m(x)$
  - Compute the model weight $\beta^{*(m)}$ of $G_m(x)$
  - Update the training sample weight distribution:
$$\begin{array}{c} D_{m+1}=\left(w_{m+1,1}, \cdots, w_{m+1, i}, \cdots, w_{m+1, N}\right) \\ w_{m+1, i}=w_{m,i} \exp\left(-\frac{1}{K}\beta^{(m)}\mathbf{y}_i^T\mathbf{b}^{(m)}(\mathbf{x}_i)\right) \end{array}$$
  - Compute the model output $\mathbf{f}^{(m)}(\mathbf{x}_i)$
- Step 3: Compute the final prediction $k^*=\mathop{\arg\max}_{k} f_k^{(M)}(\mathbf{x})$
Algorithm Simplification
This simplified version is the algorithm presented in Li Hang's Statistical Learning Methods.
- Simplification 1: the weighted error rate uses already-normalized weights, $err^{(m)}=\sum_{i=1}^nw_i\mathbb{I}_{\{i\notin T\}}$
- Simplification 2: replace $\beta^{*(m)}$ with $\alpha^{*(m)}=\ln\frac{1-err^{(m)}}{err^{(m)}}+\ln(K-1)$, dropping the constant factor $\frac{(K-1)^2}{K}$ (see the worked example after this list)
- Simplification 3: simplify the $w_i$ update to $\tilde{w}_i = w_i\cdot\exp(\alpha^{*(m)}\mathbb{1}_{\{i\notin T\}})$, followed by renormalization.
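As a worked example of the simplified quantities (the numbers are illustrative, not from the references): for $K=3$ classes and a weighted error rate $err^{(m)}=0.3$,
$$\alpha^{*(m)}=\ln\frac{1-0.3}{0.3}+\ln(3-1)\approx 0.847+0.693=1.540$$
so each misclassified sample's weight is multiplied by $e^{1.540}\approx 4.67$ before renormalization, while correctly classified weights are unchanged. Note that the $\ln(K-1)$ term keeps $\alpha^{*(m)}>0$ whenever $err^{(m)}<1-\frac{1}{K}$, i.e. the base classifier only needs to beat random guessing.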
SAMME.R Algorithm
Differences from SAMME
SAMME.R (SAMME.Real): in each round the model outputs real values (class probabilities) rather than a discrete class label.
Since the sample weights penalize the overall loss in a consistent direction, consider a base model $G$ trained with sample weights $w$: its output probability $P_w(s(\mathbf{y})=k\vert \mathbf{x})$ is used in place of $\left.w\right|_{s(\mathbf{y})=k}\cdot P(s(\mathbf{y})=k\vert \mathbf{x})$. In other words, through the weights $w$, $G$ approximately "distributes" the loss that originally acted on $L$ to the base classifier's own loss.
| Item | SAMME | SAMME.R |
|---|---|---|
| Per-round prediction | class label | class probabilities |
| Optimized parameters | $\beta^{*(m)}$, $\mathbf{b}^{*(m)}$ | $h^{*(m)}$ |
| Loss function | $\tilde{L}(\beta^{(m)}, \mathbf{b}^{(m)})=\exp[-\frac{\beta^{(m)}}{K-1}]\sum_{i=1}^nw_i + \{ \exp[\frac{\beta^{(m)}}{(K-1)^2}]-\exp[-\frac{\beta^{(m)}}{K-1}] \}\sum_{i=1}^nw_i\mathbb{I}_{\{i\notin T\}}$ | $\mathbb{E} [L\vert \mathbf{x}] = \sum_{k=1}^K P_w(s(\mathbf{y})=k\vert \mathbf{x})\exp(-\frac{h^{(m)}_k(\mathbf{x})}{K-1})$ |
Algorithm Steps
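A sketch of the SAMME.R procedure, reconstructed to be consistent with the loss in the table above and the implementation below:
- Step 1: Initialize uniform sample weights $w_i=\frac{1}{N}$.
- Step 2: For $m=1,2,\ldots,M$:
  - Fit a base classifier on the data weighted by $w$ and obtain class probability estimates $p_k^{(m)}(\mathbf{x})=P_w(s(\mathbf{y})=k\vert\mathbf{x})$.
  - Minimizing $\mathbb{E}[L\vert\mathbf{x}]$ under the symmetric constraint $\sum_k h_k=0$ (the same Lagrange argument as before) gives
$$h_k^{(m)}(\mathbf{x})=(K-1)\left[\ln p_k^{(m)}(\mathbf{x})-\frac{1}{K}\sum_{i=1}^K\ln p_i^{(m)}(\mathbf{x})\right]$$
  - Update $w_i \leftarrow w_i\exp\left(-\frac{K-1}{K}\mathbf{y}_i^T\ln \mathbf{p}^{(m)}(\mathbf{x}_i)\right)$ and renormalize.
- Step 3: Output $k^*=\mathop{\arg\max}_k \sum_{m=1}^M h_k^{(m)}(\mathbf{x})$.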
Code
- Python library: sklearn.ensemble.AdaBoostClassifier
- Code:
```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import accuracy_score


class AdaBoost:
    def __init__(self, n_estimators, algorithm):
        self.n_estimators = n_estimators
        self.algorithm = algorithm  # "SAMME" or "SAMME.R"
        self.boostors = []
        if self.algorithm == "SAMME":
            self.boostor_weights = []
        self.classes = None

    def fit(self, X, y, **kwargs):
        # Assumes class labels are integers 0, 1, ..., K-1.
        w = np.ones(X.shape[0]) / X.shape[0]  # uniform initial weights
        self.classes = np.unique(y.reshape(-1)).shape[0]
        for n in range(self.n_estimators):
            cla = DecisionTreeClassifier(max_depth=1)  # decision stump
            cla.fit(X, y, sample_weight=w)
            if self.algorithm == "SAMME":
                y_pred = cla.predict(X)
                # weighted error rate err^(m) (weights already normalized)
                err = (w * (y != y_pred)).sum()
                # simplified coefficient alpha^(m)
                alpha = np.log((1 - err) / err) + np.log(self.classes - 1)
                self.boostors.append(cla)
                self.boostor_weights.append(alpha)
                # multiply misclassified sample weights by exp(alpha), renormalize
                w *= np.exp(alpha * (y != y_pred))
                w /= w.sum()
            elif self.algorithm == "SAMME.R":
                proba = cla.predict_proba(X)
                log_proba = np.log(proba + 1e-6)  # avoid log(0)
                # label-vector matrix per (Eq. 1)
                temp_y = np.full((X.shape[0], self.classes), -1 / (self.classes - 1))
                temp_y[np.arange(X.shape[0]), y] = 1
                self.boostors.append(cla)
                # w_i <- w_i * exp(-(K-1)/K * y_i^T log p(x_i)), renormalize
                w *= np.exp((1 - self.classes) / self.classes * (temp_y * log_proba).sum(1))
                w /= w.sum()

    def predict(self, X):
        result = 0
        if self.algorithm == "SAMME":
            for alpha, cla in zip(self.boostor_weights, self.boostors):
                cur_pred = cla.predict(X)
                # encode predictions per (Eq. 1) and accumulate alpha-weighted votes
                temp_output = np.full((X.shape[0], self.classes), -1 / (self.classes - 1))
                temp_output[np.arange(X.shape[0]), cur_pred] = 1
                result += alpha * temp_output
        elif self.algorithm == "SAMME.R":
            for cla in self.boostors:
                log_proba = np.log(cla.predict_proba(X) + 1e-6)
                # h_k(x) = (K-1) * (log p_k - mean_i log p_i)
                result += (self.classes - 1) * (log_proba - log_proba.mean(1).reshape(-1, 1))
        return np.argmax(result, axis=1)

    def score(self, X_test, y_test):
        return accuracy_score(y_test, self.predict(X_test))


if __name__ == '__main__':
    iris = load_iris()
    X, y = iris.data, iris.target
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

    adaboost = AdaBoost(n_estimators=100, algorithm='SAMME')
    adaboost.fit(X_train, y_train)
    print(adaboost.score(X_test, y_test))

    # compare with sklearn's implementation
    clf = AdaBoostClassifier(n_estimators=100, algorithm='SAMME')
    clf.fit(X_train, y_train)
    print(clf.score(X_test, y_test))
```
Regression Task: AdaBoost.R2
Algorithm Steps
- Training process: in each round $m$, fit a base regressor on the training data weighted by $w$; compute each sample's normalized absolute error $e_i=\frac{|y_i-\hat{y}_i|}{\max_j|y_j-\hat{y}_j|}$, the weighted error $err^{(m)}=\sum_i w_ie_i$, and $\beta^{(m)}=\frac{err^{(m)}}{1-err^{(m)}}$; set the predictor weight $\alpha^{(m)}=\ln\frac{1}{\beta^{(m)}}$; update $w_i \leftarrow w_i(\beta^{(m)})^{1-e_i}$ and renormalize (these steps match the implementation below).
- Prediction process
Let the base models' predictions for a new test sample be $y_1,...,y_M$, with corresponding predictor weights $\alpha^{(1)},...,\alpha^{(M)}$. AdaBoost.R2 outputs the weighted median (the value at which the weight on either side sums to half of the total):
$$y=\inf \left\{ y\,\Big|\, \sum_{m\in \{m\vert y_m\leq y\}}\alpha^{(m)} \geq 0.5 \sum_{m=1}^M\alpha^{(m)}\right\}$$
When the weights are proportional to the frequencies with which the predicted values occur (e.g. uniform weights), the weighted median reduces to the ordinary median.
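To make the definition concrete, here is a minimal standalone sketch of the weighted median (the function name and example values are illustrative, not part of the algorithm above):

```python
import numpy as np

def weighted_median(values, weights):
    # Sort predictions; the weighted median is the smallest value whose
    # cumulative weight reaches half of the total weight.
    order = np.argsort(values)
    values = np.asarray(values, dtype=float)[order]
    weights = np.asarray(weights, dtype=float)[order]
    cdf = np.cumsum(weights)
    return values[np.searchsorted(cdf, 0.5 * cdf[-1])]

# Predictions 1,2,3,4 with weights 0.1,0.2,0.3,0.4: the cumulative weights
# are 0.1,0.3,0.6,1.0, and 0.6 is the first to reach 0.5, so the answer is 3.
print(weighted_median([1, 2, 3, 4], [0.1, 0.2, 0.3, 0.4]))  # 3.0
```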
Code
- Python library: sklearn.ensemble.AdaBoostRegressor
- Code:
```python
import warnings
import numbers
import numpy as np
from sklearn.tree import DecisionTreeRegressor


def stable_cumsum(arr, axis=None, rtol=1e-05, atol=1e-08):
    # Cumulative sum with a sanity check that the last element matches np.sum.
    out = np.cumsum(arr, axis=axis, dtype=np.float64)
    expected = np.sum(arr, axis=axis, dtype=np.float64)
    if not np.all(
        np.isclose(
            out.take(-1, axis=axis), expected, rtol=rtol, atol=atol, equal_nan=True
        )
    ):
        warnings.warn(
            "cumsum was found to be unstable: "
            "its last element does not correspond to sum",
            RuntimeWarning,
        )
    return out


def _num_samples(x):
    """Return number of samples in array-like x."""
    message = "Expected sequence or array-like, got %s" % type(x)
    if hasattr(x, "fit") and callable(x.fit):
        # Don't get num_samples from an ensemble's length!
        raise TypeError(message)
    if not hasattr(x, "__len__") and not hasattr(x, "shape"):
        if hasattr(x, "__array__"):
            x = np.asarray(x)
        else:
            raise TypeError(message)
    if hasattr(x, "shape") and x.shape is not None:
        if len(x.shape) == 0:
            raise TypeError(
                "Singleton array %r cannot be considered a valid collection." % x
            )
        # Check that shape is returning an integer or default to len
        # (Dask dataframes may not return a numeric shape[0] value)
        if isinstance(x.shape[0], numbers.Integral):
            return x.shape[0]
    try:
        return len(x)
    except TypeError as type_error:
        raise TypeError(message) from type_error


class AdaBoostR2:
    def __init__(self, n_estimators):
        self.n_estimators = n_estimators
        self.boostors = []
        self.weight = []

    def fit(self, X, y, **kwargs):
        w = np.ones(X.shape[0]) / X.shape[0]  # uniform initial weights
        for n in range(self.n_estimators):
            cla = DecisionTreeRegressor(max_depth=3)
            cla.fit(X, y, sample_weight=w)  # fit on the weighted data
            y_pred = cla.predict(X)
            # normalized absolute error e_i in [0, 1]
            e = np.abs(y_pred - y)
            e /= e.max()
            err = (w * e).sum()              # weighted error err^(m)
            beta = err / (1 - err)
            alpha = np.log(1 / beta + 1e-6)  # 1e-6 guards against log(0)
            # small-error samples (e_i -> 0) are down-weighted by beta^(1-e_i)
            # relative to large-error ones (when err < 0.5, beta < 1)
            w *= np.power(beta, 1 - e)
            w /= w.sum()
            self.boostors.append(cla)
            self.weight.append(alpha)

    def _get_median_predict(self, X):
        # Evaluate predictions of all estimators: shape (n_samples, n_estimators)
        predictions = np.array([boostor.predict(X) for boostor in self.boostors]).T
        sorted_idx = np.argsort(predictions, axis=1)
        # Cumulative predictor weights along each sample's sorted predictions
        weight_cdf = stable_cumsum(np.asarray(self.weight)[sorted_idx], axis=1)
        median_or_above = weight_cdf >= 0.5 * weight_cdf[:, -1][:, np.newaxis]
        median_idx = median_or_above.argmax(axis=1)
        median_estimators = sorted_idx[np.arange(_num_samples(X)), median_idx]
        # Return the weighted-median predictions
        return predictions[np.arange(_num_samples(X)), median_estimators]

    def predict(self, X):
        return self._get_median_predict(X)
```
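A minimal usage sketch (the toy dataset and hyperparameters are illustrative choices, not from the references):

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=200, n_features=4, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

reg = AdaBoostR2(n_estimators=50)
reg.fit(X_train, y_train)
print(np.mean((reg.predict(X_test) - y_test) ** 2))  # test MSE
```

For comparison, sklearn.ensemble.AdaBoostRegressor with loss='linear' implements the same AdaBoost.R2 scheme.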
[References]:
- DataWhale Ensemble Learning
- Statistical Learning Methods, Li Hang