AdaBoost Formula Derivation

From Forward Stagewise Additive Modeling to AdaBoost

What is the relationship between the forward stagewise algorithm and AdaBoost? Beyond the fact that both are Boosting models, AdaBoost is in fact the special case of the forward stagewise algorithm whose loss function is the exponential loss. This post walks through the derivation.

Forward Stagewise Additive Modeling


  1. Initialize $f_0(x) = 0$.
  2. For $m = 1, 2, \dots, M$:
    (a) Solve

$$(\beta_m,\gamma_m) = \arg\min_{\beta,\gamma} \sum_{i=1}^N L(y_i, f_{m-1}(x_i)+\beta b(x_i;\gamma))$$

    (b) Update

$$f_m(x) = f_{m-1}(x) + \beta_m b(x;\gamma_m)$$


The steps of the forward stagewise algorithm are given above; personally I think "forward stagewise additive model" is the more fitting translation, because the final decision function $f(x)$ is the accumulated sum of the basis functions $\beta_m b(x;\gamma_m)$. In code form, the two steps map onto a simple loop, as shown in the sketch below.
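Here is a schematic sketch of that loop; `fit_basis` is a hypothetical routine standing in for the joint minimization over $\beta$ and $\gamma$ in step (a):

```python
import numpy as np

def forward_stagewise(X, y, fit_basis, M):
    """Generic forward stagewise additive modeling skeleton.

    fit_basis(X, y, f) is assumed to solve step (a): it returns a pair
    (beta, b) minimizing sum_i L(y_i, f(x_i) + beta * b(x_i)).
    """
    f = np.zeros(len(y))                # f_0(x) = 0
    model = []
    for m in range(M):                  # m = 1, ..., M
        beta, b = fit_basis(X, y, f)    # step (a)
        f = f + beta * b(X)             # step (b): f_m = f_{m-1} + beta_m * b
        model.append((beta, b))
    return model
```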

For regression problems, the forward stagewise algorithm can use the squared loss as its loss function, i.e.

$$L(y_i,f(x)) = (y_i - f(x))^2$$

So we have

$$L(y_i, f_{m-1}(x_i)+\beta b(x_i;\gamma)) = (y_i - f_{m-1}(x_i) - \beta b(x_i;\gamma))^2 = (r_{im} - \beta b(x_i;\gamma))^2$$

where $r_{im} = y_i - f_{m-1}(x_i)$; in other words, each new basis function is fit to the residuals of the current model.
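As a concrete illustration, here is a minimal sketch of forward stagewise regression with squared loss, assuming depth-1 regression trees from scikit-learn as the basis functions $b(x;\gamma)$ and fixing $\beta_m = 1$ for simplicity:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Toy 1-D regression data
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)

M = 20                      # number of boosting rounds
f = np.zeros_like(y)        # f_0(x) = 0
stumps = []
for m in range(M):
    r = y - f               # r_im = y_i - f_{m-1}(x_i): current residuals
    b = DecisionTreeRegressor(max_depth=1).fit(X, r)  # fit b(x; gamma_m) to residuals
    stumps.append(b)
    f += b.predict(X)       # f_m = f_{m-1} + b(x; gamma_m), with beta_m = 1 here

print("training MSE:", np.mean((y - f) ** 2))
```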


AdaBoost, however, is a classifier, and for classification the squared loss is not well suited. So the exponential loss is introduced instead, i.e.

$$L(y,f(x)) = \exp(-y f(x))$$

Basic AdaBoost is a binary classification model; let its base function be $b(x;\gamma) = G(x)$, where $G(x) \in \{-1, +1\}$.
Under the exponential loss, the problem to solve becomes

$$(\beta_m,G_m) = \arg\min_{\beta,G} \sum_{i=1}^N \exp[-y_i(f_{m-1}(x_i)+\beta G(x_i))]$$

Let $w_i^{(m)} = \exp(-y_i f_{m-1}(x_i))$, which depends on neither $\beta$ nor $G$; the problem above can then be written as

$$(\beta_m,G_m) = \arg\min_{\beta,G} \sum_{i=1}^N w_i^{(m)} \exp(-\beta y_i G(x_i))$$

Since $y_i \in \{-1,1\}$ and $G(x_i) \in \{-1,1\}$, the sum splits into correctly and incorrectly classified terms:

$$e^{-\beta} \sum_{y_i = G(x_i)} w_i^{(m)} + e^{\beta} \sum_{y_i \ne G(x_i)} w_i^{(m)}$$
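A quick numerical sanity check of this split (a sketch with random weights and labels; the identity holds because $-\beta y_i G(x_i)$ equals $-\beta$ on correctly classified points and $+\beta$ on misclassified ones):

```python
import numpy as np

rng = np.random.default_rng(1)
N, beta = 100, 0.7
w = rng.uniform(0.1, 1.0, size=N)          # weights w_i^{(m)}
y = rng.choice([-1, 1], size=N)            # true labels
G = rng.choice([-1, 1], size=N)            # classifier outputs

lhs = np.sum(w * np.exp(-beta * y * G))
rhs = (np.exp(-beta) * w[y == G].sum() +
       np.exp(beta) * w[y != G].sum())
assert np.isclose(lhs, rhs)
```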

Adding and subtracting the same term on top of this, we get

$$e^{-\beta} \sum_{y_i=G(x_i)} w_i^{(m)} + e^{\beta} \sum_{y_i \ne G(x_i)} w_i^{(m)} + e^{-\beta} \sum_{y_i \ne G(x_i)} w_i^{(m)} - e^{-\beta} \sum_{y_i \ne G(x_i)} w_i^{(m)}$$

Combining terms then yields

$$(e^{\beta} - e^{-\beta}) \sum_{i=1}^N w_i^{(m)} I(y_i \ne G(x_i)) + e^{-\beta} \sum_{i=1}^N w_i^{(m)} \tag{1}$$

At iteration $m$, for any fixed $\beta > 0$, the second term of (1) does not depend on $G$, so minimizing (1) over $G$ amounts to minimizing the weighted error term. Therefore

$$G_m = \arg\min_G \sum_{i=1}^N w_i^{(m)} I(y_i \ne G(x_i))$$
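In practice, $G_m$ is simply the base learner trained with the current weights; with scikit-learn this corresponds to passing $w^{(m)}$ as `sample_weight`. A sketch, assuming decision stumps as the base classifier (note a tree minimizes an impurity criterion rather than the weighted 0-1 loss exactly, so this is a practical stand-in for the arg min):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def fit_Gm(X, y, w):
    """Fit G_m: a weighted stump approximating the weighted-error minimizer."""
    G = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
    weighted_err = np.sum(w * (G.predict(X) != y))  # sum_i w_i I(y_i != G(x_i))
    return G, weighted_err
```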

To solve for $\beta_m$, substitute $G_m$ into (1) and take the partial derivative with respect to $\beta$:

$$\frac{\partial L}{\partial \beta} = e^{\beta} \sum_{i=1}^N w_i^{(m)} I(y_i \ne G(x_i)) + e^{-\beta} \sum_{i=1}^N w_i^{(m)} I(y_i \ne G(x_i)) - e^{-\beta} \sum_{i=1}^N w_i^{(m)}$$
Setting $\frac{\partial L}{\partial \beta} = 0$ gives
$$e^{\beta} \sum_{i=1}^N w_i^{(m)} I(y_i \ne G(x_i)) = \left[\sum_{i=1}^N w_i^{(m)} - \sum_{i=1}^N w_i^{(m)} I(y_i \ne G(x_i))\right] e^{-\beta}$$
Taking the log of both sides, we get
$$\log \sum_{i=1}^N w_i^{(m)} I(y_i \ne G(x_i)) + \log e^{\beta} = \log\left[\sum_{i=1}^N w_i^{(m)} - \sum_{i=1}^N w_i^{(m)} I(y_i \ne G(x_i))\right] + \log e^{-\beta}$$
Since $\log e^{-\beta} = -\log e^{\beta}$, it follows that
$$\log e^{\beta} = \frac{1}{2} \log \frac{\sum_{i=1}^N w_i^{(m)} - \sum_{i=1}^N w_i^{(m)} I(y_i \ne G(x_i))}{\sum_{i=1}^N w_i^{(m)} I(y_i \ne G(x_i))}$$
so we solve to get
$$\beta_m = \frac{1}{2} \log \frac{\sum_{i=1}^N w_i^{(m)} - \sum_{i=1}^N w_i^{(m)} I(y_i \ne G(x_i))}{\sum_{i=1}^N w_i^{(m)} I(y_i \ne G(x_i))}$$
Defining the weighted error rate as
$$\mathrm{err}_m = \frac{\sum_{i=1}^N w_i^{(m)} I(y_i \ne G(x_i))}{\sum_{i=1}^N w_i^{(m)}}$$
$\beta_m$ can be written as
$$\beta_m = \frac{1}{2} \log \frac{1 - \mathrm{err}_m}{\mathrm{err}_m}$$
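The closed form can be verified numerically by comparing it against a grid minimization of objective (1). A sketch with random weights and misclassification indicators:

```python
import numpy as np

rng = np.random.default_rng(2)
w = rng.uniform(0.1, 1.0, size=100)
miss = rng.random(100) < 0.3               # I(y_i != G(x_i))

def objective(beta):
    # (e^beta - e^-beta) * sum_i w_i I(...) + e^-beta * sum_i w_i, i.e. (1)
    return ((np.exp(beta) - np.exp(-beta)) * w[miss].sum()
            + np.exp(-beta) * w.sum())

err = w[miss].sum() / w.sum()              # weighted error rate err_m
beta_closed = 0.5 * np.log((1 - err) / err)

grid = np.linspace(0.01, 3, 10000)
beta_grid = grid[np.argmin([objective(b) for b in grid])]
assert abs(beta_closed - beta_grid) < 1e-2
```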

Having solved for $G_m(x)$ and $\beta_m$, we obtain the update formula for $f_m(x)$:

$$f_m(x) = f_{m-1}(x) + \beta_m G_m(x)$$

From $w_i^{(m)} = \exp(-y_i f_{m-1}(x_i))$, the update formula for the sample weights follows:

$$w_i^{(m+1)} = \exp(-y_i f_m(x_i)) = \exp(-y_i (f_{m-1}(x_i)+\beta_m G_m(x_i))) = w_i^{(m)} \exp(-\beta_m y_i G_m(x_i))$$
Since $y_i, G_m(x_i) \in \{-1,1\}$, we have $-y_i G_m(x_i) = 2I(y_i \ne G_m(x_i)) - 1$; substituting into the formula above gives

$$w_i^{(m+1)} = \exp(-y_i f_m(x_i)) = w_i^{(m)} \cdot e^{2\beta_m I(y_i \ne G_m(x_i))} \cdot e^{-\beta_m}$$

Now let $\alpha_m = 2\beta_m$. The factor $e^{-\beta_m}$ is the same for every sample (and disappears under weight normalization), so it can be dropped. This gives

$$w_i^{(m+1)} = w_i^{(m)} \cdot e^{\alpha_m I(y_i \ne G_m(x_i))}$$
This is exactly AdaBoost's sample-weight update formula.
And $\alpha_m = 2\beta_m = \log \frac{1-\mathrm{err}_m}{\mathrm{err}_m}$ matches AdaBoost's weak-classifier coefficient as well.
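Because the dropped factor $e^{-\beta_m}$ is common to all samples, both update rules give identical weights after normalization, as a quick check confirms (a sketch):

```python
import numpy as np

rng = np.random.default_rng(3)
w = rng.uniform(0.1, 1.0, size=50)
y = rng.choice([-1, 1], size=50)
G = rng.choice([-1, 1], size=50)
beta = 0.4
alpha = 2 * beta

w_full = w * np.exp(-beta * y * G)       # exact exponential-loss update
w_ada = w * np.exp(alpha * (y != G))     # AdaBoost-style update, e^{-beta} dropped

# After normalization the two coincide
assert np.allclose(w_full / w_full.sum(), w_ada / w_ada.sum())
```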

This completes the derivation: when the loss function of the forward stagewise algorithm is chosen to be the exponential loss, the forward stagewise algorithm is precisely AdaBoost.
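Putting the pieces together, the whole derivation condenses into a compact AdaBoost implementation. A minimal sketch, assuming binary labels in $\{-1, +1\}$ and depth-1 scikit-learn trees as the weak classifiers:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, y, M=50):
    """Train AdaBoost via the forward-stagewise formulas derived above."""
    N = len(y)
    w = np.full(N, 1.0 / N)                  # initial uniform weights
    learners, alphas = [], []
    for m in range(M):
        G = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
        pred = G.predict(X)
        err = np.sum(w * (pred != y)) / w.sum()   # weighted error rate err_m
        if err >= 0.5 or err == 0:                # stop if no useful weak learner
            break
        alpha = np.log((1 - err) / err)           # alpha_m = 2 * beta_m
        w = w * np.exp(alpha * (pred != y))       # weight update from the derivation
        w /= w.sum()                              # normalize
        learners.append(G)
        alphas.append(alpha)
    return learners, alphas

def adaboost_predict(X, learners, alphas):
    """sign of f(x) = sum_m alpha_m G_m(x); rescaling beta_m by 2 keeps the sign."""
    f = sum(a * G.predict(X) for a, G in zip(learners, alphas))
    return np.sign(f)

# Usage on toy data
rng = np.random.default_rng(4)
X = rng.normal(size=(300, 2))
y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)
learners, alphas = adaboost_fit(X, y)
print("training accuracy:", np.mean(adaboost_predict(X, learners, alphas) == y))
```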

