Boosting Methods: The AdaBoost Algorithm


Preface

Boosting is a commonly used statistical learning method that is both widely applicable and effective. In classification problems, it changes the weights of training samples, learns multiple classifiers, and combines these classifiers linearly to improve classification performance.


1. What is AdaBoost?

Standard AdaBoost addresses binary classification. It trains a sequence of weak classifiers and combines them into a strong classifier: in each training round, the weights of the samples misclassified by the previous round's weak classifier are increased, while the weights of the correctly classified samples are decreased. The model's final prediction is a weighted majority vote over the weak classifiers' predictions; specifically, weak classifiers with smaller classification error rates receive larger weights, so they play a larger role in the vote.
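
As a quick illustration, here is a minimal sketch of my own (not from the original post) using scikit-learn's AdaBoostClassifier, whose default weak learner is a depth-1 decision tree (a stump), on a binary version of the digits task that Section 4 below also uses:

# Minimal sketch (assumed setup, not from the original post): AdaBoost with
# decision stumps on a binary digits task (digits 5-9 vs 0-4).
from sklearn.datasets import load_digits
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

digits = load_digits()
X, y = digits.data, (digits.target > 4).astype(int)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Each boosting round reweights the samples; the final prediction is a
# weighted vote of the per-round stumps, weighted by their coefficients.
model = AdaBoostClassifier(n_estimators=200)
model.fit(X_train, y_train)
print("test accuracy: %.4f" % model.score(X_test, y_test))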

2. The AdaBoost Algorithm Procedure

  • Input: training data set $T=\{(x_1,y_1),(x_2,y_2),\dots,(x_N,y_N)\}$, where $x_i\in \mathbb{R}^n$ and $y_i\in \{-1,+1\}$; a weak learning algorithm (typically decision stumps)
  • Output: final classifier $G(x)$.
    1. Initialize the weight distribution over the training data: $D_1=(w_{11},w_{12},\dots,w_{1N}),\ w_{1i}=\frac{1}{N}$
    2. For $m=1,2,\dots,M$ ($M$ is the number of weak classifiers; a minimal NumPy sketch of one such round follows this list):
      • Learn the $m$-th base classifier $G_m(x)$ from the training data weighted by $D_m$
      • Compute the classification error rate of $G_m(x)$:
        $e_m=\sum_{i=1}^N w_{mi}\,I(G_m(x_i)\ne y_i)$
      • Compute the coefficient (weight) of $G_m(x)$:
        $\alpha_m=\frac{1}{2}\ln\frac{1-e_m}{e_m}$
      • Update the weight distribution over the training data:
        $D_{m+1}=(w_{m+1,1},w_{m+1,2},\dots,w_{m+1,N})$, where
        $w_{m+1,i}=\frac{w_{mi}\,e^{-\alpha_m y_i G_m(x_i)}}{Z_m}$ and
        $Z_m=\sum_{i=1}^N w_{mi}\,e^{-\alpha_m y_i G_m(x_i)}$
    3. Construct the final classifier:
        $G(x)=\operatorname{sign}\Big(\sum_{m=1}^M\alpha_m G_m(x)\Big)$
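
To make one round of these updates concrete, here is a small NumPy sketch (the sample labels and the predictions of $G_m$ are made up for illustration; only the formulas above are used):

import numpy as np

# Hypothetical quantities for one boosting round: N = 5 samples with current
# weights D_m, true labels y, and base-classifier predictions G_m(x_i).
w = np.full(5, 1 / 5)                      # D_m (uniform here, as in round 1)
y = np.array([1, -1, 1, 1, -1])
g = np.array([1, -1, -1, 1, -1])           # G_m misclassifies the third sample

e_m = np.sum(w * (g != y))                 # weighted error rate (0.2 here)
alpha_m = 0.5 * np.log((1 - e_m) / e_m)    # coefficient of G_m

unnorm = w * np.exp(-alpha_m * y * g)      # w_{mi} * exp(-alpha_m * y_i * G_m(x_i))
Z_m = unnorm.sum()                         # normalization factor Z_m
w_next = unnorm / Z_m                      # D_{m+1}

print(e_m, alpha_m)
print(w_next)                              # the misclassified sample now carries more weight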

3. Interpretation of the AdaBoost Algorithm

The AdaBoost algorithm can be interpreted as the binary classification learning algorithm obtained when the model is an additive model, the loss function is the exponential loss, and the learning algorithm is the forward stagewise algorithm.

  • The forward stagewise algorithm
    Additive model:
    $f(x)=\sum_{m=1}^M\beta_m b(x;\gamma_m)$
    where $b(x;\gamma_m)$ is a basis function, $\gamma_m$ its parameters, $\beta_m$ its coefficient, and $M$ the number of basis functions.
    Learning the additive model $f(x)$ amounts to minimizing the empirical risk:
    $\min_{\beta_m,\gamma_m}\sum_{i=1}^N L\Big(y_i,\sum_{m=1}^M\beta_m b(x_i;\gamma_m)\Big)$
    This is usually a complex optimization problem. The idea of the forward stagewise algorithm is to learn only one basis function and its coefficient per step, moving from front to back and gradually approaching the optimum. Concretely, each step solves:
    $\min_{\beta,\gamma}\sum_{i=1}^N L\big(y_i,f_{m-1}(x_i)+\beta\, b(x_i;\gamma)\big)$
  • The forward stagewise algorithm and AdaBoost
    The forward stagewise algorithm learns one basis function at a time, which matches the way AdaBoost learns one base classifier at a time.
    When the loss function of the forward stagewise algorithm is the exponential loss,
    $L(y,f(x))=e^{-yf(x)},$
    its learning steps are equivalent to those of the AdaBoost algorithm.
    Suppose that after $m-1$ rounds the forward stagewise algorithm has obtained
    $f_{m-1}(x)=\alpha_1 G_1(x)+\alpha_2 G_2(x)+\dots+\alpha_{m-1}G_{m-1}(x).$
    In round $m$ it obtains $\alpha_m$, $G_m(x)$ and $f_m(x)=f_{m-1}(x)+\alpha_m G_m(x)$, and the goal is
    $(\alpha_m,G_m(x))=\arg\min_{\alpha_m,G_m}\sum_{i=1}^N e^{-y_i\left(f_{m-1}(x_i)+\alpha_m G_m(x_i)\right)}$
    Let $\overline w_{mi}=e^{-y_i f_{m-1}(x_i)}$; then the objective above can be rewritten as
    $$\begin{aligned}
    (\alpha_m,G_m)&=\arg\min_{\alpha_m,G_m}\sum_{i=1}^N\overline w_{mi}\,e^{-\alpha_m y_i G_m(x_i)}\\
    &=\arg\min_{\alpha_m,G_m}\;e^{-\alpha_m}\sum_{y_i=G_m(x_i)}\overline w_{mi}+e^{\alpha_m}\sum_{y_i\ne G_m(x_i)}\overline w_{mi}\\
    &=\arg\min_{\alpha_m,G_m}\;e^{-\alpha_m}\Big(\sum_{i=1}^N\overline w_{mi}-\sum_{y_i\ne G_m(x_i)}\overline w_{mi}\Big)+e^{\alpha_m}\sum_{y_i\ne G_m(x_i)}\overline w_{mi}\\
    &=\arg\min_{\alpha_m,G_m}\;e^{-\alpha_m}\sum_{i=1}^N\overline w_{mi}+\big(e^{\alpha_m}-e^{-\alpha_m}\big)\sum_{y_i\ne G_m(x_i)}\overline w_{mi}\\
    &=\arg\min_{\alpha_m,G_m}\;e^{-\alpha_m}\sum_{i=1}^N\overline w_{mi}+\big(e^{\alpha_m}-e^{-\alpha_m}\big)\sum_{i=1}^N\overline w_{mi}\,I(y_i\ne G_m(x_i))
    \end{aligned}$$
    For a fixed $\alpha_m>0$, both $e^{-\alpha_m}\sum_{i=1}^N\overline w_{mi}$ and $e^{\alpha_m}-e^{-\alpha_m}$ are positive constants, so the minimization over $G_m$ is equivalent to
    $G_m^*=\arg\min_{G_m}\sum_{i=1}^N\overline w_{mi}\,I(y_i\ne G_m(x_i))$
    This is exactly the base classifier that AdaBoost seeks in each round.
    Then differentiating with respect to $\alpha_m$ and setting the derivative to zero (the short derivation after this list spells out this step) gives
    $\alpha_m^*=\frac{1}{2}\ln\frac{1-e_m}{e_m},\qquad e_m=\frac{\sum_{i=1}^N\overline w_{mi}\,I(y_i\ne G_m(x_i))}{\sum_{i=1}^N\overline w_{mi}}$
    Letting $w_{mi}=\frac{\overline w_{mi}}{\sum_{i=1}^N\overline w_{mi}}$, we get $e_m=\sum_{i=1}^N w_{mi}\,I(y_i\ne G_m(x_i))$, which agrees with AdaBoost. In particular, for $m=1$ we have $f_0(x)=0$, so $\overline w_{1i}=e^{-y_i\cdot 0}=1$ and $w_{1i}=\frac{\overline w_{1i}}{\sum_{i=1}^N\overline w_{1i}}=\frac{1}{N}$.

From $\overline w_{mi}=e^{-y_i f_{m-1}(x_i)}$ and $f_m(x_i)=f_{m-1}(x_i)+\alpha_m G_m(x_i)$ we get
$\overline w_{m+1,i}=\overline w_{mi}\,e^{-y_i\alpha_m G_m(x_i)}$
Combining this with $w_{m+1,i}=\frac{\overline w_{m+1,i}}{\sum_{i=1}^N\overline w_{m+1,i}}$ gives
$$w_{m+1,i}=\frac{\overline w_{mi}\,e^{-y_i\alpha_m G_m(x_i)}}{\sum_{i=1}^N\overline w_{mi}\,e^{-y_i\alpha_m G_m(x_i)}}
=\frac{\frac{\overline w_{mi}}{\sum_{i=1}^N\overline w_{mi}}\,e^{-y_i\alpha_m G_m(x_i)}}{\sum_{i=1}^N\frac{\overline w_{mi}}{\sum_{i=1}^N\overline w_{mi}}\,e^{-y_i\alpha_m G_m(x_i)}}
=\frac{w_{mi}\,e^{-y_i\alpha_m G_m(x_i)}}{\sum_{i=1}^N w_{mi}\,e^{-y_i\alpha_m G_m(x_i)}}$$
which is exactly the AdaBoost weight update.
In summary, AdaBoost can be derived by taking the model to be an additive model, the loss function to be the exponential loss, and the learning algorithm to be the forward stagewise algorithm.
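
As a small numerical sanity check (a sketch with made-up numbers, not part of the original derivation), the closed-form $\alpha_m^*$ should coincide with the minimizer of the weighted exponential loss found by a simple grid search:

import numpy as np

# Hypothetical round-m quantities: unnormalized weights w_bar, labels y,
# and base-classifier predictions g = G_m(x_i).
w_bar = np.array([0.3, 0.2, 0.5, 0.1, 0.4])
y = np.array([1, -1, 1, 1, -1])
g = np.array([1, -1, -1, 1, -1])                   # one misclassified sample

e_m = np.sum(w_bar * (g != y)) / np.sum(w_bar)     # weighted error rate e_m
alpha_star = 0.5 * np.log((1 - e_m) / e_m)         # closed-form alpha_m^*

def exp_loss(alpha):
    # weighted exponential loss: sum_i w_bar_i * exp(-alpha * y_i * G_m(x_i))
    return np.sum(w_bar * np.exp(-alpha * y * g))

grid = np.linspace(0.01, 3.0, 1000)
alpha_grid = grid[np.argmin([exp_loss(a) for a in grid])]

print(alpha_star, alpha_grid)   # the two values agree up to the grid resolution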

4. Code Implementation

"""
AdaBoost algorithm
"""

import numpy as np
from sklearn.datasets import load_digits
from tqdm import tqdm


class BasicClassifier(object):
    def __init__(self, train_xs, train_ys, weights, attr_type, split_cnt=10):
        """
        :param train_xs: features
        :param train_ys: labels
        :param weights: sample weights
        :param attr_type: type of each attribute (0 = discrete, otherwise continuous)
        :param split_cnt: number of split regions for a continuous attribute
        """
        self.train_xs = train_xs
        self.train_ys = train_ys
        self.m, self.n = self.train_xs.shape
        assert len(weights) == self.m
        assert len(attr_type) == self.n
        self.weights = weights
        self.attr_type = attr_type
        self.split_cnt = split_cnt

    def build(self):
        """
        Build one basic classifier (a decision stump).
        :return: (min_em, attr_index, attr_value, predict_ys, side)
        """
        min_em = float('inf')  # minimum weighted error rate
        attr_index = -1
        attr_value = -1
        predict_ys = None

        # For a continuous attribute, `side` records which side of the split value is predicted as -1:
        # 'lt' for the left side, 'gt' for the right side.
        # For a discrete attribute, it records whether the selected value is predicted as -1 ('eq') or as +1 ('neq').
        side = None

        for i in range(len(self.attr_type)):
            if self.attr_type[i] == 0:  # this attribute is discrete
                uniques = np.unique(self.train_xs[:, i])
                for j in range(len(uniques)):
                    for ineq in ['eq', 'neq']:
                        _predict_ys = np.ones((self.m,))
                        if ineq == 'eq':
                            _predict_ys[self.train_xs[:, i] == uniques[j]] = -1
                        else:
                            _predict_ys[self.train_xs[:, i] != uniques[j]] = -1
                        em = self.weights[_predict_ys != self.train_ys].sum()  # weighted error rate

                        if em < min_em:
                            min_em = em
                            attr_index = i
                            attr_value = uniques[j]
                            predict_ys = _predict_ys
                            side = ineq

            else:  # this attribute is continuous
                _min, _max = np.min(self.train_xs[:, i]), np.max(self.train_xs[:, i])
                step = (_max - _min) / self.split_cnt
                for j in range(self.split_cnt+1):
                    split_value = _min + step * j
                    for ineq in ['lt', 'gt']:
                        _predict_ys = np.ones((self.m,))
                        if ineq == 'lt':
                            _predict_ys[self.train_xs[:, i] < split_value] = -1
                        else:
                            _predict_ys[self.train_xs[:, i] >= split_value] = -1
                        em = self.weights[_predict_ys != self.train_ys].sum()  # weighted error rate

                        if em < min_em:
                            min_em = em
                            attr_index = i
                            attr_value = split_value
                            predict_ys = _predict_ys
                            side = ineq

        return min_em, attr_index, attr_value, predict_ys, side


class AdaBoost(object):
    """
    The AdaBoost boosting method.
    """
    def __init__(self) -> None:
        self.classifers = []

    def train(self, train_xs, train_ys, test_xs, test_ys, attr_type, iters=500, test_freq=50):
        m, n = train_xs.shape
        weights = (1. / m) * np.ones((m,))

        for i in tqdm(range(iters)):
            basicclassifier = BasicClassifier(train_xs, train_ys, weights, attr_type)
            min_em, attr_index, attr_value, predict_ys, side = basicclassifier.build()

            am = 0.5 * np.log((1 - min_em) / min_em)  # alpha_m = 0.5 * ln((1 - e_m) / e_m)
            weights = weights * np.exp(-am * train_ys.reshape((m,)) * predict_ys)
            weights /= np.sum(weights)  # normalize by Z_m

            self.classifers.append((am, attr_index, attr_value, side))

            if test_xs is not None and test_ys is not None and (i+1) % test_freq == 0:
                accuracy = self.test(test_xs, test_ys, attr_type)
                print("iters:%d, accuracy is %.4f" % (i+1, accuracy))

    def test(self, test_xs, test_ys, attr_type):
        """
        Evaluate the ensemble on the test set and return the accuracy.
        """
        predict_ys = np.zeros((test_xs.shape[0]))
        for am, attr_index, attr_value, side in self.classifers:
            if attr_type[attr_index] == 0:  # discrete attribute
                if side == 'eq':
                    predict_ys[test_xs[:, attr_index] == attr_value] += -am
                    predict_ys[test_xs[:, attr_index] != attr_value] += am
                else:
                    predict_ys[test_xs[:, attr_index] == attr_value] += am
                    predict_ys[test_xs[:, attr_index] != attr_value] += -am

            else:   # continuous attribute
                if side == 'lt':
                    predict_ys[test_xs[:, attr_index] < attr_value] += -am
                    predict_ys[test_xs[:, attr_index] >= attr_value] += am
                else:
                    predict_ys[test_xs[:, attr_index] < attr_value] += am
                    predict_ys[test_xs[:, attr_index] >= attr_value] += -am

        predict_ys[predict_ys > 0] = 1
        predict_ys[predict_ys < 0] = -1

        accuracy = (predict_ys == test_ys).sum() / test_xs.shape[0]
        return accuracy


if __name__ == '__main__':

    # Load the handwritten-digits dataset bundled with sklearn
    digits = load_digits()
    features = digits.data
    targets = (digits.target > 4).astype(int)  # binary task: digits 5-9 vs digits 0-4
    targets[targets == 0] = -1                 # relabel classes as {-1, +1}

    # Randomly shuffle the data
    shuffle_indices = np.random.permutation(features.shape[0])
    features = features[shuffle_indices]
    targets = targets[shuffle_indices]

    # Split into training and test sets
    train_count = int(len(features)*0.8)
    train_xs, train_ys = features[:train_count], targets[:train_count]
    test_xs, test_ys = features[train_count:], targets[train_count:]

    attr_type = [1] * train_xs.shape[1]  # all pixel attributes are treated as continuous
    adaboost = AdaBoost()
    adaboost.train(train_xs, train_ys, test_xs, test_ys, attr_type)
