- Suppose the target to be predicted is $Y$ and the features are $X$. The underlying model is $Y=f(X)+\varepsilon$, where $\varepsilon \sim N(0,\sigma_{\varepsilon}^{2})$ is noise, and the estimated model is $\hat{f}(X)$.
- Derivation
- $Err(X)=E[(Y-\hat{f}(X))^{2}]$
- $Err(X)=E[(f(X)+\varepsilon-\hat{f}(X))^{2}]$
- $Err(X)=E[(f(X)-\hat{f}(X))^{2}+2\varepsilon (f(X)-\hat{f}(X))+\varepsilon^{2}]$
Since $\varepsilon$ has mean 0 and is independent of $\hat{f}(X)$, the expectation of $2\varepsilon (f(X)-\hat{f}(X))$ is 0, and the expectation of $\varepsilon^{2}$ equals its variance $\sigma_{\varepsilon}^{2}$. Adding and subtracting $E(\hat{f}(X))$ inside the remaining square gives:
- $Err(X)=E[(E(\hat{f}(X))-f(X)+\hat{f}(X)-E(\hat{f}(X)))^{2}]+\sigma_{\varepsilon}^{2}$
- $Err(X)=E[(E(\hat{f}(X))-f(X))^{2}]+E[(\hat{f}(X)-E(\hat{f}(X)))^{2}]+2E[(E(\hat{f}(X))-f(X))(\hat{f}(X)-E(\hat{f}(X)))]+\sigma_{\varepsilon}^{2}$
- Expanding the cross term $E[(E(\hat{f}(X))-f(X))(\hat{f}(X)-E(\hat{f}(X)))]$ further gives:
$E[E(\hat{f}(X))\hat{f}(X)-(E(\hat{f}(X)))^{2}-f(X)\hat{f}(X)+f(X)E(\hat{f}(X))]$
The first two terms sum to 0 in expectation, leaving $E[f(X)E(\hat{f}(X))-f(X)\hat{f}(X)]$.
- $E(\hat{f}(X))$ is a fixed value, so it can be pulled out of the expectation, and $f(X)$ and $\hat{f}(X)$ are independent of each other (at a fixed $X$, $f(X)$ is deterministic). Hence
$E[f(X)E(\hat{f}(X))-f(X)\hat{f}(X)]=E(\hat{f}(X))E(f(X))-E(f(X))E(\hat{f}(X))=0$
- $Err(X)=E[(E(\hat{f}(X))-f(X))^{2}]+E[(\hat{f}(X)-E(\hat{f}(X)))^{2}]+\sigma_{\varepsilon}^{2}$
- $Err(X)=Bias^{2}+Var(X)+\sigma_{\varepsilon}^{2}$, where $Bias=E(\hat{f}(X))-f(X)$ and $Var(X)=E[(\hat{f}(X)-E(\hat{f}(X)))^{2}]$.
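- The generalization error therefore decomposes into squared bias plus variance, plus the irreducible noise term $\sigma_{\varepsilon}^{2}$.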
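- Bias: the gap between the model's expected output on a sample and the true label. It measures the accuracy of the model itself, that is, its capacity to fit the data.
- Variance: the deviation between the output of the function learned from different training sets and the expected output. It measures the model's stability, that is, how much its predictions fluctuate.
- Generalization error analysis: underfitting means high bias and low variance; overfitting means low bias and high variance.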
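
The decomposition can be checked numerically. The following is a minimal sketch, not part of the original derivation: the true function `f(x) = sin(2x)`, the noise level `SIGMA`, the query point `X0`, and the two toy estimators `predict_mean` and `predict_1nn` are all assumptions chosen for illustration. It repeatedly draws training sets, records $\hat{f}(X)$ at the fixed point, and compares the Monte Carlo estimate of $Err(X)$ with $Bias^{2}+Var(X)+\sigma_{\varepsilon}^{2}$.

```python
import math
import random
import statistics

SIGMA = 0.3   # assumed noise standard deviation (sigma_epsilon)
X0 = 1.0      # assumed fixed query point at which Err(X) is evaluated


def f(x):
    """Assumed true function; the derivation leaves f unspecified."""
    return math.sin(2.0 * x)


def sample_training_set(rng, n=20):
    """Draw n points with x uniform on [0, 2] and y = f(x) + eps."""
    xs = [rng.uniform(0.0, 2.0) for _ in range(n)]
    ys = [f(x) + rng.gauss(0.0, SIGMA) for x in xs]
    return xs, ys


def predict_mean(xs, ys, x0):
    """Very rigid model: ignore x0 and predict the average label."""
    return sum(ys) / len(ys)


def predict_1nn(xs, ys, x0):
    """Very flexible model: copy the label of the nearest training point."""
    i = min(range(len(xs)), key=lambda j: abs(xs[j] - x0))
    return ys[i]


def decompose(predict, trials=5000, seed=0):
    """Monte Carlo estimates of Err, Bias^2 and Var at X0 over many training sets."""
    rng = random.Random(seed)
    preds, sq_errors = [], []
    for _ in range(trials):
        xs, ys = sample_training_set(rng)
        p = predict(xs, ys, X0)
        # Fresh noisy target, drawn independently of the fitted prediction.
        y0 = f(X0) + rng.gauss(0.0, SIGMA)
        preds.append(p)
        sq_errors.append((y0 - p) ** 2)
    err = statistics.fmean(sq_errors)
    bias2 = (statistics.fmean(preds) - f(X0)) ** 2
    var = statistics.pvariance(preds)
    return err, bias2, var


for name, model in [("mean", predict_mean), ("1-NN", predict_1nn)]:
    err, bias2, var = decompose(model)
    total = bias2 + var + SIGMA ** 2
    print(f"{name:5s} Err={err:.4f}  Bias^2={bias2:.4f}  Var={var:.4f}  "
          f"Bias^2+Var+sigma^2={total:.4f}")
```

Under these assumptions the two printed totals should agree with `Err` up to Monte Carlo error. The constant-mean model plays the underfitting role (larger squared bias, smaller variance) and the 1-nearest-neighbour model the overfitting role (smaller squared bias, larger variance).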
A Brief Derivation of the Bias-Variance Decomposition