Statistical Learning Methods: The AdaBoost Training Error Bound Theorem

Theorem (training error bound of AdaBoost): the training error of the final classifier produced by the AdaBoost algorithm satisfies
$$\frac{1}{N}\sum_{i=1}^N I(G(x_i)\neq y_i)\leq\frac{1}{N}\sum_{i=1}^N\exp(-y_i f(x_i))=\prod_{m}Z_m$$
Here $G(x)$, $f(x)$ and $Z_m$ are as defined in the book Statistical Learning Methods.

Proof:

Here $f(x)=\sum_{m=1}^M \alpha_m G_m(x)$ is the linear combination of the weak classifiers, and the final classifier obtained by AdaBoost is $G(x)=\mathrm{sign}(f(x))$.

$Z_m=\sum_{i=1}^N w_{mi}\exp(-\alpha_m y_i G_m(x_i))$ is the normalization factor of the training-data weight distribution used by the $(m+1)$-th weak classifier, where
$$w_{mi}=\frac{w_{m-1,i}}{Z_{m-1}}\exp(-\alpha_{m-1} y_i G_{m-1}(x_i))$$
is the weight of the $i$-th sample in the distribution used by the $m$-th weak classifier.

$\alpha_m=\frac{1}{2}\log\frac{1-e_m}{e_m}$ is the coefficient of the $m$-th weak classifier, where
$$e_m=P(G_m(x_i)\neq y_i)=\sum_{i=1}^N w_{mi}\, I(G_m(x_i)\neq y_i)$$
is the weighted classification error rate of $G_m$ on the training set.
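To make these definitions concrete, here is a minimal Python sketch of a single AdaBoost round computing $e_m$, $\alpha_m$, $Z_m$ and $w_{m+1,i}$. It assumes labels and weak-classifier outputs are lists of ±1 values and that $0<e_m<1$; the function name `adaboost_round` and the toy labels and predictions are hypothetical, chosen only for illustration.

```python
import math

def adaboost_round(w, y, pred):
    """One AdaBoost round (illustrative sketch).

    w    -- current weights w_{m,i} (a probability distribution)
    y    -- labels y_i in {-1, +1}
    pred -- outputs G_m(x_i) in {-1, +1} of the m-th weak classifier
    """
    # e_m = sum_i w_{mi} * I(G_m(x_i) != y_i): weighted error rate
    e = sum(wi for wi, yi, gi in zip(w, y, pred) if gi != yi)
    # alpha_m = 1/2 * log((1 - e_m) / e_m); assumes 0 < e_m < 1
    alpha = 0.5 * math.log((1 - e) / e)
    # Unnormalized new weights w_{mi} * exp(-alpha_m * y_i * G_m(x_i))
    unnorm = [wi * math.exp(-alpha * yi * gi) for wi, yi, gi in zip(w, y, pred)]
    # Z_m: normalization factor that makes w_{m+1} a distribution again
    Z = sum(unnorm)
    w_next = [u / Z for u in unnorm]
    return e, alpha, Z, w_next


# Hypothetical usage: uniform initial weights w_{1i} = 1/N and a weak
# classifier that misclassifies 3 of the 10 points
y = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]
pred = [1, 1, 1, -1, -1, -1, -1, -1, -1, -1]
e, alpha, Z, w2 = adaboost_round([0.1] * 10, y, pred)
print(f"e_1 = {e:.4f}, alpha_1 = {alpha:.4f}, Z_1 = {Z:.4f}")
```

Returning `Z` separately from `w_next` keeps the normalization factor available, since the theorem below is stated entirely in terms of the $Z_m$.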

Note that the theorem uses the product of all the normalization factors $Z_m$ as an upper bound on the training error.

First, consider a single sample $x_i$. If $G(x_i)\neq y_i$, then $y_i f(x_i)\leq 0$, so $\exp(-y_i f(x_i))\geq 1=I(G(x_i)\neq y_i)$. If $G(x_i)=y_i$, then $I(G(x_i)\neq y_i)=0<\exp(-y_i f(x_i))$. Hence, for every $i$,
$$I(G(x_i)\neq y_i)\leq\exp(-y_i f(x_i)).$$

Averaging over all $N$ training samples then gives the inequality on the left of the theorem:

$$\frac{1}{N}\sum_{i=1}^N I(G(x_i)\neq y_i)\leq\frac{1}{N}\sum_{i=1}^N\exp(-y_i f(x_i))$$
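The pointwise inequality, and hence its average, can be sanity-checked numerically. The sketch below is only an illustration: it draws hypothetical margins $y_i f(x_i)$ at random and counts a zero margin as a misclassification (the conservative convention for $\mathrm{sign}(0)$); the helper names are made up for this example.

```python
import math
import random

def zero_one_loss(margin):
    # I(G(x_i) != y_i) written in terms of the margin y_i * f(x_i);
    # a zero margin is conservatively counted as an error
    return 1.0 if margin <= 0 else 0.0

def exp_loss(margin):
    # exp(-y_i * f(x_i))
    return math.exp(-margin)

random.seed(0)
# Hypothetical margins y_i * f(x_i), plus the boundary case 0
margins = [random.uniform(-3.0, 3.0) for _ in range(1000)] + [0.0]

pointwise_ok = all(zero_one_loss(m) <= exp_loss(m) for m in margins)
train_error = sum(map(zero_one_loss, margins)) / len(margins)
exp_bound = sum(map(exp_loss, margins)) / len(margins)
print(pointwise_ok, train_error, exp_bound)
```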

Next we prove the equality on the right-hand side of the theorem. Expanding $f(x_i)$:
$$\begin{aligned}\frac{1}{N}\sum_{i=1}^N\exp(-y_i f(x_i)) &= \frac{1}{N}\sum_{i=1}^N\exp\Big(-y_i\sum_{m=1}^M\alpha_m G_m(x_i)\Big)\\ &= \frac{1}{N}\sum_{i=1}^N\exp\Big(-\sum_{m=1}^M\alpha_m y_i G_m(x_i)\Big)\\ &= \frac{1}{N}\sum_{i=1}^N\prod_{m=1}^M\exp(-\alpha_m y_i G_m(x_i))\end{aligned}$$
From the weight-update rule above, $w_{m+1,i}Z_m=w_{mi}\exp(-\alpha_m y_i G_m(x_i))$. Moreover, AdaBoost initializes the weights uniformly, $w_{1i}=1/N$, and every weight vector is a distribution, $\sum_i w_{mi}=1$. Therefore:
$$\begin{aligned}\frac{1}{N}\sum_{i=1}^N\exp(-y_i f(x_i)) &= \sum_{i=1}^N w_{1i}\prod_{m=1}^M\exp(-\alpha_m y_i G_m(x_i))\\ &= Z_1\sum_{i=1}^N w_{2i}\prod_{m=2}^M\exp(-\alpha_m y_i G_m(x_i))\\ &= Z_1Z_2\sum_{i=1}^N w_{3i}\prod_{m=3}^M\exp(-\alpha_m y_i G_m(x_i))\\ &= \cdots\\ &= Z_1Z_2\cdots Z_{M-1}\sum_{i=1}^N w_{Mi}\exp(-\alpha_M y_i G_M(x_i))\\ &= Z_1Z_2\cdots Z_M\sum_{i=1}^N w_{M+1,i}\\ &= \prod_{m=1}^M Z_m\end{aligned}$$
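The telescoping step can be checked numerically. The following sketch uses hypothetical ±1 outputs of three weak classifiers on five points (not a trained model) and confirms that unrolling the weight recursion gives $w_{M+1,i}Z_1\cdots Z_M=\frac{1}{N}\exp(-y_i f(x_i))$ for every $i$; summing over $i$ then yields $\prod_m Z_m$, because the weights $w_{M+1,i}$ sum to 1.

```python
import math

# Hypothetical labels and outputs G_m(x_i) of M = 3 weak classifiers (rows)
# on N = 5 points; chosen so that every e_m lies strictly between 0 and 1
y = [1, 1, -1, -1, 1]
G = [[1, 1, -1, -1, -1],
     [1, -1, -1, 1, 1],
     [1, 1, 1, -1, 1]]
N = len(y)

w = [1.0 / N] * N  # w_{1i} = 1/N
alphas, Zs = [], []
for pred in G:
    e = sum(wi for wi, yi, gi in zip(w, y, pred) if gi != yi)
    alpha = 0.5 * math.log((1 - e) / e)
    unnorm = [wi * math.exp(-alpha * yi * gi) for wi, yi, gi in zip(w, y, pred)]
    Z = sum(unnorm)
    # w_{m+1,i} = w_{mi} * exp(-alpha_m y_i G_m(x_i)) / Z_m
    w = [u / Z for u in unnorm]
    alphas.append(alpha)
    Zs.append(Z)

prod_Z = math.prod(Zs)
# f(x_i) = sum_m alpha_m G_m(x_i)
f = [sum(a * G[m][i] for m, a in enumerate(alphas)) for i in range(N)]

# Per-sample identity: w_{M+1,i} * Z_1 ... Z_M == (1/N) * exp(-y_i f(x_i))
lhs = [wi * prod_Z for wi in w]
rhs = [math.exp(-yi * fi) / N for yi, fi in zip(y, f)]
print(max(abs(a - b) for a, b in zip(lhs, rhs)), sum(rhs), prod_Z)
```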
This completes the proof.
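As an end-to-end check of the whole theorem, the sketch below runs AdaBoost with decision stumps on a hypothetical 10-point one-dimensional dataset and compares the training error, the exponential-loss average and $\prod_m Z_m$. The stump learner, the function names and the choice `M=5` are assumptions made for illustration, not the book's implementation.

```python
import math

def stump(x, v, s):
    # Decision stump: output s where x < v, and -s otherwise
    return [s if xi < v else -s for xi in x]

def fit_stump(x, y, w):
    # Try every midpoint threshold and both directions; keep the stump
    # with the smallest weighted error e_m
    xs = sorted(set(x))
    best = None
    for v in ((a + b) / 2 for a, b in zip(xs, xs[1:])):
        for s in (1, -1):
            pred = stump(x, v, s)
            e = sum(wi for wi, yi, gi in zip(w, y, pred) if gi != yi)
            if best is None or e < best[0]:
                best = (e, pred)
    return best

def adaboost(x, y, M):
    N = len(x)
    w = [1.0 / N] * N                      # w_{1i} = 1/N
    alphas, preds, Zs = [], [], []
    for _ in range(M):
        e, pred = fit_stump(x, y, w)
        alpha = 0.5 * math.log((1 - e) / e)  # assumes 0 < e_m < 1
        unnorm = [wi * math.exp(-alpha * yi * gi) for wi, yi, gi in zip(w, y, pred)]
        Z = sum(unnorm)                    # Z_m
        w = [u / Z for u in unnorm]        # w_{m+1,i}
        alphas.append(alpha)
        preds.append(pred)
        Zs.append(Z)
    # f(x_i) = sum_m alpha_m G_m(x_i) on the training points
    f = [sum(a * p[i] for a, p in zip(alphas, preds)) for i in range(N)]
    return f, Zs

x = list(range(10))
y = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]     # hypothetical toy labels
f, Zs = adaboost(x, y, M=5)
N = len(x)
G = [1 if fi > 0 else -1 for fi in f]      # sign(f), with f = 0 mapped to -1
train_error = sum(gi != yi for gi, yi in zip(G, y)) / N
exp_bound = sum(math.exp(-yi * fi) for yi, fi in zip(y, f)) / N
prod_Z = math.prod(Zs)
print(train_error, exp_bound, prod_Z)
```

No single stump separates this toy data, so every weighted error $e_m$ stays strictly positive and $\alpha_m$ is always finite.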
