Expected Generalization Error of a Learning Algorithm
(This post is a personal study note.)
Original formula:
$$
\begin{aligned}
E(f ; D) &= \mathbb{E}_{D}\left[\left(f(\boldsymbol{x} ; D)-y_{D}\right)^{2}\right] \\
&= \mathbb{E}_{D}\left[\left(f(\boldsymbol{x} ; D)-\bar{f}(\boldsymbol{x})+\bar{f}(\boldsymbol{x})-y_{D}\right)^{2}\right] \\
&= \mathbb{E}_{D}\left[(f(\boldsymbol{x} ; D)-\bar{f}(\boldsymbol{x}))^{2}\right]+\mathbb{E}_{D}\left[\left(\bar{f}(\boldsymbol{x})-y_{D}\right)^{2}\right] \\
&\quad+\mathbb{E}_{D}\left[2(f(\boldsymbol{x} ; D)-\bar{f}(\boldsymbol{x}))\left(\bar{f}(\boldsymbol{x})-y_{D}\right)\right] \\
&= \mathbb{E}_{D}\left[(f(\boldsymbol{x} ; D)-\bar{f}(\boldsymbol{x}))^{2}\right]+\mathbb{E}_{D}\left[\left(\bar{f}(\boldsymbol{x})-y_{D}\right)^{2}\right] \\
&= \mathbb{E}_{D}\left[(f(\boldsymbol{x} ; D)-\bar{f}(\boldsymbol{x}))^{2}\right]+\mathbb{E}_{D}\left[\left(\bar{f}(\boldsymbol{x})-y+y-y_{D}\right)^{2}\right] \\
&= \mathbb{E}_{D}\left[(f(\boldsymbol{x} ; D)-\bar{f}(\boldsymbol{x}))^{2}\right]+\mathbb{E}_{D}\left[(\bar{f}(\boldsymbol{x})-y)^{2}\right]+\mathbb{E}_{D}\left[\left(y-y_{D}\right)^{2}\right] \\
&\quad+2 \mathbb{E}_{D}\left[(\bar{f}(\boldsymbol{x})-y)\left(y-y_{D}\right)\right] \\
&= \mathbb{E}_{D}\left[(f(\boldsymbol{x} ; D)-\bar{f}(\boldsymbol{x}))^{2}\right]+(\bar{f}(\boldsymbol{x})-y)^{2}+\mathbb{E}_{D}\left[\left(y_{D}-y\right)^{2}\right]
\end{aligned}
$$
1. Step 1: subtract $\bar{f}(\boldsymbol{x})$ and then add $\bar{f}(\boldsymbol{x})$ back; this is a simple identity transformation.
2. Step 2: first expand the square inside the brackets:
$$
\mathbb{E}_{D}\left[(f(\boldsymbol{x} ; D)-\bar{f}(\boldsymbol{x}))^{2}+\left(\bar{f}(\boldsymbol{x})-y_{D}\right)^{2}+2(f(\boldsymbol{x} ; D)-\bar{f}(\boldsymbol{x}))\left(\bar{f}(\boldsymbol{x})-y_{D}\right)\right]
$$
Then, by the linearity of expectation, this becomes:
$$
\mathbb{E}_{D}\left[(f(\boldsymbol{x} ; D)-\bar{f}(\boldsymbol{x}))^{2}\right]+\mathbb{E}_{D}\left[\left(\bar{f}(\boldsymbol{x})-y_{D}\right)^{2}\right]+\mathbb{E}_{D}\left[2(f(\boldsymbol{x} ; D)-\bar{f}(\boldsymbol{x}))\left(\bar{f}(\boldsymbol{x})-y_{D}\right)\right]
$$
3. Step 3: apply the linearity of expectation again to expand the last term of the expression obtained in Step 2:
$$
\begin{aligned}
\mathbb{E}_{D}\left[2(f(\boldsymbol{x} ; D)-\bar{f}(\boldsymbol{x}))\left(\bar{f}(\boldsymbol{x})-y_{D}\right)\right] &= \mathbb{E}_{D}\left[2(f(\boldsymbol{x} ; D)-\bar{f}(\boldsymbol{x})) \cdot \bar{f}(\boldsymbol{x})\right]-\mathbb{E}_{D}\left[2(f(\boldsymbol{x} ; D)-\bar{f}(\boldsymbol{x})) \cdot y_{D}\right] \\
&= 0-0=0
\end{aligned}
$$
The first term vanishes because $\bar{f}(\boldsymbol{x})$ is a constant with respect to $D$ and $\mathbb{E}_{D}\left[f(\boldsymbol{x} ; D)-\bar{f}(\boldsymbol{x})\right]=0$ by the definition $\bar{f}(\boldsymbol{x})=\mathbb{E}_{D}[f(\boldsymbol{x} ; D)]$. The second term vanishes under the assumption that the noise does not depend on $f$, so $f(\boldsymbol{x} ; D)$ and $y_{D}$ are independent and the expectation of their product factorizes.
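4. Step 4: as in Step 1, subtract $y$ and then add $y$ back; this is a simple identity transformation.
5. Step 5: as in Step 2, expand the square in the last term and apply the linearity of expectation.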
6. Step 6: because $\bar{f}(\boldsymbol{x})$ and $y$ are both constants with respect to $D$, the second term of the expression obtained in Step 5 simplifies to
$$
\mathbb{E}_{D}\left[(\bar{f}(\boldsymbol{x})-y)^{2}\right]=(\bar{f}(\boldsymbol{x})-y)^{2}
$$
Similarly, the last term of the expression obtained in Step 5 becomes:
$$
2 \mathbb{E}_{D}\left[(\bar{f}(\boldsymbol{x})-y)\left(y-y_{D}\right)\right]=2(\bar{f}(\boldsymbol{x})-y)\, \mathbb{E}_{D}\left[y-y_{D}\right]
$$
Since the noise is assumed to have zero mean, i.e. $\mathbb{E}_{D}\left[y-y_{D}\right]=0$, it follows that:
$$
2 \mathbb{E}_{D}\left[(\bar{f}(\boldsymbol{x})-y)\left(y-y_{D}\right)\right]=2(\bar{f}(\boldsymbol{x})-y) \cdot 0=0
$$
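The derivation can also be checked numerically. The following is only a minimal Monte Carlo sketch under assumptions that are not part of the notes: the true function is taken to be $x^2$, the learner is a least-squares line through the origin, the noise is Gaussian with zero mean and is drawn independently of the training set, and the name `bias_variance_terms` and all of its parameters are hypothetical. It estimates each term of the decomposition at a single test point, together with the two cross terms that Steps 3 and 6 argue are zero.

```python
import numpy as np


def bias_variance_terms(x0=0.8, n_train=20, n_trials=20000, sigma=0.3, seed=0):
    """Monte Carlo estimate of each term of the decomposition at one test point x0."""
    rng = np.random.default_rng(seed)

    def target(x):
        # Assumed true function (illustrative choice), so y = target(x0).
        return x ** 2

    y = target(x0)

    # One row per training set D: inputs and their noisy labels.
    X = rng.uniform(0.0, 1.0, size=(n_trials, n_train))
    T = target(X) + rng.normal(0.0, sigma, size=X.shape)

    # Hypothetical learner: least-squares line through the origin, f(x; D) = w_D * x.
    w = (X * T).sum(axis=1) / (X * X).sum(axis=1)
    f = w * x0            # f(x0; D) for every sampled D
    f_bar = f.mean()      # sample estimate of f_bar(x0) = E_D[f(x0; D)]

    # Noisy label of the test point, drawn independently of the training set.
    y_D = y + rng.normal(0.0, sigma, size=n_trials)

    return {
        "error": np.mean((f - y_D) ** 2),
        "variance": np.mean((f - f_bar) ** 2),
        "bias2": (f_bar - y) ** 2,
        "noise": np.mean((y_D - y) ** 2),
        "cross_step3": np.mean(2 * (f - f_bar) * (f_bar - y_D)),
        "cross_step6": 2 * (f_bar - y) * np.mean(y - y_D),
    }


terms = bias_variance_terms()
total = terms["variance"] + terms["bias2"] + terms["noise"]
for name, value in terms.items():
    print(f"{name:12s} {value: .5f}")
print(f"{'sum of 3':12s} {total: .5f}")
```

With a finite number of trials the two cross terms are only approximately zero, so `error` and the sum of the three terms should agree up to sampling noise rather than exactly. Adding the two cross terms back recovers `error` up to floating-point rounding, which mirrors the algebra of Steps 2 and 5.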