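Introduction
The book Statistical Learning Methods (《统计学习方法》) explains the relationship between expected risk minimization and posterior probability maximization, but it skips part of the derivation. This article fills in the omitted steps.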
Proof
First, assume that the loss is the 0-1 loss function:
$$
\text{Loss} = L(Y, f(X)) = \begin{cases} 1, & Y \neq f(X) \\ 0, & Y = f(X) \end{cases} \tag{1}
$$
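As a minimal illustration, the 0-1 loss defined above can be written as a small Python function. The name `zero_one_loss` and the string labels in the usage lines are hypothetical; any comparable label type works.

```python
def zero_one_loss(y, y_pred):
    """0-1 loss L(y, f(x)): 1 if the prediction differs from the true label, else 0."""
    return 1 if y != y_pred else 0

print(zero_one_loss("spam", "ham"))   # 1 (wrong prediction)
print(zero_one_loss("spam", "spam"))  # 0 (correct prediction)
```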
The expected risk is then
$$
\begin{aligned}
R_{exp}(f) = E_P\big[L(Y, f(X))\big] &= \int_{\mathcal{X} \times \mathcal{Y}} L(y, f(x))\, P(x, y)\, dx\, dy \\
&= \int_{\mathcal{X} \times \mathcal{Y}} L(y, f(x))\, P(y \mid x)\, P(x)\, dx\, dy \\
&= \int_{\mathcal{X}} \left( \int_{\mathcal{Y}} L(y, f(x))\, P(y \mid x)\, dy \right) P(x)\, dx \\
&= E_X\left[ \int_{\mathcal{Y}} L(y, f(X))\, P(y \mid X)\, dy \right]
\end{aligned}
$$
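The integrals cannot be evaluated directly in a short snippet, so the following minimal sketch swaps in a small discrete joint distribution as a stand-in. The table `P_joint` and the classifiers `f` and `g` are hypothetical. It checks that summing the 0-1 loss against $P(x, y)$ gives the same value as the factorized form $E_X[\sum_y L(y, f(X)) P(y \mid X)]$; exact `Fraction` arithmetic keeps the comparison free of rounding error.

```python
from fractions import Fraction

def zero_one_loss(y, y_pred):
    return 1 if y != y_pred else 0

# Hypothetical toy joint distribution P(X = x, Y = y), x in {0, 1}, y in {"a", "b"}.
P_joint = {
    (0, "a"): Fraction(3, 10), (0, "b"): Fraction(1, 10),
    (1, "a"): Fraction(2, 10), (1, "b"): Fraction(4, 10),
}
xs = sorted({x for x, _ in P_joint})
ys = sorted({y for _, y in P_joint})

# Marginal P(x) and conditional P(y | x) = P(x, y) / P(x).
P_x = {x: sum(P_joint[(x, y)] for y in ys) for x in xs}
P_y_given_x = {(y, x): P_joint[(x, y)] / P_x[x] for x in xs for y in ys}

def risk_joint(f):
    """Expected risk summed directly over the joint distribution P(x, y)."""
    return sum(zero_one_loss(y, f(x)) * P_joint[(x, y)] for x in xs for y in ys)

def risk_factorized(f):
    """Same risk written as E_X[ sum_y L(y, f(X)) P(y | X) ]."""
    return sum(
        P_x[x] * sum(zero_one_loss(y, f(x)) * P_y_given_x[(y, x)] for y in ys)
        for x in xs
    )

f = lambda x: "a"                        # hypothetical: always predict "a"
g = lambda x: "a" if x == 0 else "b"     # hypothetical: input-dependent rule
print(risk_joint(f), risk_factorized(f))  # 1/2 1/2
print(risk_joint(g), risk_factorized(g))  # 3/10 3/10
```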
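In naive Bayes the output $Y$ is discrete and takes values in the class set $\{c_1, c_2, \dots, c_K\}$, so the inner integral over $Y$ becomes a sum: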
$$
R_{exp}(f) = E_X\left[ \int_{\mathcal{Y}} L(y, f(X))\, P(y \mid X)\, dy \right] = E_X\left[ \sum_{k=1}^{K} L(c_k, f(X))\, P(c_k \mid X) \right]
$$
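For one fixed input $x$, the bracketed sum is the conditional risk of predicting $y$. A minimal sketch, assuming a hypothetical 3-class posterior stored as a dict `{c_k: P(c_k | X = x)}`, evaluates $\sum_{k=1}^{K} L(c_k, y)\, P(c_k \mid X = x)$ for each candidate $y$:

```python
from fractions import Fraction

def zero_one_loss(c_k, y):
    """0-1 loss L(c_k, y): 1 if the prediction y differs from class c_k, else 0."""
    return 1 if c_k != y else 0

def conditional_risk(y, posterior):
    """sum_{k=1}^{K} L(c_k, y) * P(c_k | X = x) for posterior {c_k: P(c_k | X = x)}."""
    return sum(zero_one_loss(c_k, y) * p for c_k, p in posterior.items())

# Hypothetical posterior P(c_k | X = x) over K = 3 classes at one fixed x.
posterior = {"c1": Fraction(1, 2), "c2": Fraction(3, 10), "c3": Fraction(1, 5)}

for y in posterior:
    print(y, conditional_risk(y, posterior))
# c1 1/2
# c2 7/10
# c3 4/5
```

Each printed risk equals one minus that class's posterior, and the smallest risk belongs to `c1`, the class with the largest posterior.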
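The expectation is taken over $X$, and for a fixed $X = x$ the bracketed term depends on $f$ only through the single value $f(x)$. Therefore, to minimize the expected risk it suffices to minimize that term separately for each $X = x$: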
$$
\begin{aligned}
f(x) &= \arg\min_{y \in \mathcal{Y}} \sum_{k=1}^{K} L(c_k, y)\, P(c_k \mid X = x) \\
&= \arg\min_{y \in \mathcal{Y}} \sum_{k:\, c_k \neq y} P(c_k \mid X = x) \\
&= \arg\min_{y \in \mathcal{Y}} \big(1 - P(Y = y \mid X = x)\big) \\
&= \arg\max_{y \in \mathcal{Y}} P(Y = y \mid X = x)
\end{aligned}
$$
Here $y = f(x)$ is the value being chosen. The second line follows from Equation (1): the term with $c_k = y$ has zero loss and every other term has loss 1, so only the posterior mass of the classes different from $y$ remains. The third line holds because $y$ coincides with exactly one class $c_k$ and $\sum_{k=1}^{K} P(c_k \mid X = x) = 1$. Hence the risk-minimizing classifier is the maximum a posteriori rule $f(x) = \arg\max_{c_k} P(c_k \mid X = x)$.
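The chain of equalities can be checked numerically. The sketch below (helper names and the randomly generated posteriors are hypothetical) compares the minimizer of the conditional risk with the maximizer of the posterior on many random discrete posteriors, using exact fractions so that ties are handled consistently:

```python
import random
from fractions import Fraction

def conditional_risk(y, posterior):
    # Under 0-1 loss only classes c_k != y contribute, each weighted by P(c_k | X = x).
    return sum(p for c_k, p in posterior.items() if c_k != y)

def min_risk_decision(posterior):
    # argmin_y sum_k L(c_k, y) P(c_k | X = x)
    return min(posterior, key=lambda y: conditional_risk(y, posterior))

def bayes_decision(posterior):
    # argmax_{c_k} P(c_k | X = x)
    return max(posterior, key=posterior.get)

def random_posterior(k, rng):
    # Hypothetical test data: k classes with random positive weights, normalized to sum to 1.
    weights = [rng.randint(1, 100) for _ in range(k)]
    total = sum(weights)
    return {f"c{i + 1}": Fraction(w, total) for i, w in enumerate(weights)}

rng = random.Random(0)
for _ in range(1000):
    post = random_posterior(rng.randint(2, 6), rng)
    y = min_risk_decision(post)
    assert y == bayes_decision(post)
    assert conditional_risk(y, post) == 1 - post[y]
print("argmin of conditional risk == argmax of posterior on 1000 random posteriors")
```

Python's `min` and `max` both return the first of several equal candidates, so even tied posteriors lead both functions to the same class.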
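Conclusion
Therefore, under the 0-1 loss, minimizing the expected risk is equivalent to maximizing the posterior probability, which is the decision rule used by the naive Bayes classifier.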
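As an end-to-end sanity check of this conclusion, one can enumerate every classifier $f$ on a tiny, hypothetical discrete problem and confirm that the one with the smallest expected risk is exactly the posterior-maximizing rule. Because $P(c_k \mid x) = P(x, c_k) / P(x)$ and $P(x) > 0$ does not depend on $c_k$, the sketch takes the argmax over the joint table directly:

```python
import itertools
from fractions import Fraction

# Hypothetical toy joint distribution P(X = x, Y = c_k): 3 inputs, 2 classes.
P_joint = {
    (0, "c1"): Fraction(2, 10), (0, "c2"): Fraction(1, 10),
    (1, "c1"): Fraction(1, 10), (1, "c2"): Fraction(3, 10),
    (2, "c1"): Fraction(2, 10), (2, "c2"): Fraction(1, 10),
}
xs = [0, 1, 2]
classes = ["c1", "c2"]

def expected_risk(f):
    """R_exp(f) under 0-1 loss: total probability mass where f(x) != y."""
    return sum(p for (x, y), p in P_joint.items() if f[x] != y)

# Enumerate every classifier f: {0, 1, 2} -> {c1, c2}, stored as a dict x -> label.
all_f = [dict(zip(xs, labels)) for labels in itertools.product(classes, repeat=len(xs))]
best_f = min(all_f, key=expected_risk)

# Posterior-maximizing rule: argmax_c P(x, c) equals argmax_c P(c | x) for fixed x.
map_f = {x: max(classes, key=lambda c: P_joint[(x, c)]) for x in xs}

print(best_f, expected_risk(best_f))  # {0: 'c1', 1: 'c2', 2: 'c1'} 3/10
print(map_f, expected_risk(map_f))    # {0: 'c1', 1: 'c2', 2: 'c1'} 3/10
```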