Definitions
$$
\begin{aligned}
z_j^l &= \sum_k w_{jk}^l a_k^{l-1} + b_j^l & (1.1)\\
a_j^l &= \sigma(z_j^l) & (1.2)\\
C &= \frac{1}{2}\sum_j (y_j - a_j^L)^2 & (1.3)
\end{aligned}
$$
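where
- $z_j^l$ is the weighted input to the activation function of the $j$-th neuron in layer $l$
- $a_j^l$ is the activation output of the $j$-th neuron in layer $l$, and $\sigma$ is the activation function
- $C$ is the quadratic cost function at the output layer, where $L$ is the index of the output layer and $y_j$ is the target value for output neuron $j$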
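The definitions above can be sketched as a forward pass in plain Python. This is a minimal illustration, not a reference implementation: it assumes the logistic function for $\sigma$ (the text leaves $\sigma$ unspecified), and the names `forward`, `quadratic_cost`, and the 2-2-1 network with all-zero parameters are hypothetical choices made for the example.

```python
import math

def sigmoid(z):
    """Logistic function, an assumed choice for sigma."""
    return 1.0 / (1.0 + math.exp(-z))

def forward(weights, biases, x):
    """Apply Eq. (1.1) and Eq. (1.2) layer by layer.

    weights[i][j][k] connects neuron k of the previous layer to neuron j.
    Returns the weighted inputs z and the activations a of every layer,
    where activations[0] is the network input x.
    """
    activations = [list(x)]
    zs = []
    for W, b in zip(weights, biases):
        prev = activations[-1]
        # Eq. (1.1): weighted sum over the previous layer, plus one bias.
        z = [sum(w_jk * a_k for w_jk, a_k in zip(row, prev)) + b_j
             for row, b_j in zip(W, b)]
        zs.append(z)
        # Eq. (1.2): apply the activation function.
        activations.append([sigmoid(z_j) for z_j in z])
    return zs, activations

def quadratic_cost(output, y):
    """Eq. (1.3): half the squared error over the output neurons."""
    return 0.5 * sum((y_j - a_j) ** 2 for y_j, a_j in zip(y, output))

# Hypothetical 2-2-1 network with all-zero parameters.
weights = [[[0.0, 0.0], [0.0, 0.0]], [[0.0, 0.0]]]
biases = [[0.0, 0.0], [0.0]]
zs, activations = forward(weights, biases, [1.0, -1.0])
cost = quadratic_cost(activations[-1], [1.0])
print(activations[-1], cost)  # [0.5] 0.125
```

With every weight and bias at zero, each $z_j^l$ is 0, so every activation is $\sigma(0) = 0.5$ and the cost is $\frac{1}{2}(1 - 0.5)^2 = 0.125$.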
Define the error $\delta_j^l$ of the $j$-th neuron in layer $l$ as:
$$
\delta_j^l = \frac{\partial C}{\partial z_j^l} \tag{2}
$$
The fundamental BP equations
$$
\begin{aligned}
\delta_j^L &= (a_j^L - y_j)\,\sigma'(z_j^L) & (3.1)\\
\delta_j^l &= \sigma'(z_j^l)\sum_k w_{kj}^{l+1}\delta_k^{l+1} & (3.2)\\
\frac{\partial C}{\partial b_j^l} &= \delta_j^l & (3.3)\\
\frac{\partial C}{\partial w_{jk}^l} &= a_k^{l-1}\delta_j^l & (3.4)
\end{aligned}
$$
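where
- $\delta_j^L$ is the error of the $j$-th neuron in the output layer
- $\delta_j^l$ is the error of the $j$-th neuron in layer $l$; Eq. (3.2) computes the error of the current layer from the error of the next layer
- $\frac{\partial C}{\partial b_j^l}$ is the rate of change of the cost function with respect to the $j$-th bias in layer $l$; Eq. (3.3) shows that this rate of change equals the error of the corresponding neuron
- $\frac{\partial C}{\partial w_{jk}^l}$ is the rate of change of the cost function with respect to the weight connecting the $k$-th neuron in layer $l-1$ to the $j$-th neuron in layer $l$; Eq. (3.4) shows that it depends only on the error of that neuron and the activation output of the $k$-th neuron in layer $l-1$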
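Taken together, the four equations describe one backward sweep through the network. The sketch below is one possible way to arrange them in plain Python; it assumes the logistic function for $\sigma$, nested lists for the parameters, and a hypothetical 2-3-1 network, none of which come from the original text.

```python
import math

def sigmoid(z):
    """Logistic function, an assumed choice for sigma."""
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_prime(z):
    """Derivative of the logistic function."""
    s = sigmoid(z)
    return s * (1.0 - s)

def backprop(weights, biases, x, y):
    """Return (grad_w, grad_b) of the quadratic cost for one example.

    weights[i][j][k] connects neuron k of the previous layer to neuron j.
    """
    # Forward pass, storing every weighted input z and activation a.
    activations, zs = [list(x)], []
    for W, b in zip(weights, biases):
        z = [sum(w * a for w, a in zip(row, activations[-1])) + b_j
             for row, b_j in zip(W, b)]
        zs.append(z)
        activations.append([sigmoid(v) for v in z])

    # Eq. (3.1): error of the output layer.
    delta = [(a - t) * sigmoid_prime(z)
             for a, t, z in zip(activations[-1], y, zs[-1])]

    grad_w = [None] * len(weights)
    grad_b = [None] * len(weights)
    for i in range(len(weights) - 1, -1, -1):
        # Eq. (3.3): the bias gradient is the error itself.
        grad_b[i] = list(delta)
        # Eq. (3.4): previous-layer activation times this layer's error.
        grad_w[i] = [[a_k * d_j for a_k in activations[i]] for d_j in delta]
        if i > 0:
            # Eq. (3.2): propagate the error back to the previous layer.
            W_next = weights[i]
            delta = [sigmoid_prime(zs[i - 1][j]) *
                     sum(W_next[k][j] * delta[k] for k in range(len(delta)))
                     for j in range(len(zs[i - 1]))]
    return grad_w, grad_b

# Hypothetical 2-3-1 network and a single training example.
weights = [[[0.1, -0.2], [0.4, 0.3], [-0.5, 0.2]], [[0.3, -0.1, 0.2]]]
biases = [[0.0, 0.1, -0.1], [0.05]]
grad_w, grad_b = backprop(weights, biases, [1.0, 0.5], [1.0])
print(len(grad_w[0]), len(grad_w[0][0]), len(grad_b[1]))  # 3 2 1
```

Each `grad_w[i]` has the same shape as `weights[i]`, and each `grad_b[i]` the same shape as `biases[i]`, so the result can be compared entry by entry against a finite-difference estimate of the cost.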
Derivation of the equations
Equation 3.1
$$
\begin{aligned}
\delta_j^L &= \frac{\partial C}{\partial z_j^L}\\
&= \frac{\partial C}{\partial a_j^L}\frac{\partial a_j^L}{\partial z_j^L}\\
&= \frac{\partial C}{\partial a_j^L}\frac{\partial [\sigma(z_j^L)]}{\partial z_j^L}\\
&= \frac{\partial \left[\frac{1}{2}\sum_k (y_k - a_k^L)^2\right]}{\partial a_j^L}\,\sigma'(z_j^L)\\
&= (a_j^L - y_j)\,\sigma'(z_j^L)
\end{aligned}
$$
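Eq. (3.1) can be sanity-checked numerically by comparing it with a central finite difference of the cost with respect to one $z_j^L$. The snippet below is a small sketch under assumptions: $\sigma$ is taken to be the logistic function, and the values of `z_L` and `y` are arbitrary illustrative numbers.

```python
import math

def sigmoid(z):
    """Logistic function, an assumed choice for sigma."""
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_prime(z):
    """Derivative of the logistic function."""
    s = sigmoid(z)
    return s * (1.0 - s)

def output_delta(z_L, y):
    """Eq. (3.1) evaluated for every output neuron j."""
    return [(sigmoid(z_j) - y_j) * sigmoid_prime(z_j)
            for z_j, y_j in zip(z_L, y)]

def cost_from_z(z_L, y):
    """Quadratic cost written as a function of the output-layer inputs."""
    return 0.5 * sum((y_j - sigmoid(z_j)) ** 2 for z_j, y_j in zip(z_L, y))

# Arbitrary illustrative values for a two-neuron output layer.
z_L, y = [0.0, 1.5], [1.0, 0.0]
delta = output_delta(z_L, y)

# Central finite difference of C with respect to z_0 only.
eps = 1e-6
numeric = (cost_from_z([z_L[0] + eps, z_L[1]], y)
           - cost_from_z([z_L[0] - eps, z_L[1]], y)) / (2 * eps)
print(delta[0], numeric)
```

For the first neuron, $a = \sigma(0) = 0.5$ and $\sigma'(0) = 0.25$, so Eq. (3.1) gives $(0.5 - 1) \times 0.25 = -0.125$, and the finite-difference estimate should agree with it closely.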
Equation 3.2
$$
\begin{aligned}
\delta_j^l &= \frac{\partial C}{\partial z_j^l}\\
&= \sum_k \left( \frac{\partial C}{\partial z_k^{l+1}}\frac{\partial z_k^{l+1}}{\partial z_j^l} \right)\\
&= \sum_k \left( \delta_k^{l+1}\frac{\partial z_k^{l+1}}{\partial z_j^l} \right)
\end{aligned}
$$
Because
$$
\begin{aligned}
z_k^{l+1} &= \sum_j w_{kj}^{l+1} a_j^l + b_k^{l+1}\\
&= \sum_j w_{kj}^{l+1}\sigma(z_j^l) + b_k^{l+1}
\end{aligned}
$$
we have
$$
\frac{\partial z_k^{l+1}}{\partial z_j^l} = w_{kj}^{l+1}\sigma'(z_j^l)
$$
Therefore
$$
\delta_j^l = \sigma'(z_j^l)\sum_k \left( \delta_k^{l+1} w_{kj}^{l+1} \right)
$$
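The component-wise result for Eq. (3.2) can also be restated in vector form. The notation here is an assumption not used elsewhere in this text: $W^{l+1}$ is the matrix with entries $w_{kj}^{l+1}$, $\delta^l$ and $z^l$ are column vectors over the neurons of layer $l$, $\sigma'$ is applied element-wise, and $\odot$ is the element-wise product.

$$
\delta^l = \left( (W^{l+1})^{\mathsf T}\,\delta^{l+1} \right) \odot \sigma'(z^l)
$$

Component $j$ of $(W^{l+1})^{\mathsf T}\delta^{l+1}$ is $\sum_k w_{kj}^{l+1}\delta_k^{l+1}$, which matches the sum in Eq. (3.2).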
Equation 3.3
$$
\begin{aligned}
\frac{\partial C}{\partial b_j^l} &= \frac{\partial C}{\partial z_j^l}\frac{\partial z_j^l}{\partial b_j^l}\\
&= \delta_j^l\frac{\partial z_j^l}{\partial b_j^l}
\end{aligned}
$$
Because
$$
z_j^l = \sum_k w_{jk}^l a_k^{l-1} + b_j^l
$$
we have
$$
\frac{\partial z_j^l}{\partial b_j^l} = 1
$$
Therefore
$$
\frac{\partial C}{\partial b_j^l} = \delta_j^l
$$
Equation 3.4
$$
\begin{aligned}
\frac{\partial C}{\partial w_{jk}^l} &= \frac{\partial C}{\partial z_j^l}\frac{\partial z_j^l}{\partial w_{jk}^l}\\
&= \delta_j^l\frac{\partial z_j^l}{\partial w_{jk}^l}\\
&= \delta_j^l\frac{\partial \left( \sum_i w_{ji}^l a_i^{l-1} + b_j^l \right)}{\partial w_{jk}^l}\\
&= \delta_j^l a_k^{l-1}
\end{aligned}
$$
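For a single layer, Eq. (3.3) and Eq. (3.4) say that once $\delta^l$ is known, the gradients follow without further differentiation. The helper below is a hypothetical illustration: `layer_gradients` is not a name from any library, and the numbers are made up to keep the arithmetic easy to follow.

```python
def layer_gradients(delta, a_prev):
    """Gradients of one layer from its error and the previous activations.

    delta[j]  is the error of neuron j in this layer.
    a_prev[k] is the activation of neuron k in the previous layer.
    """
    # Eq. (3.3): dC/db_j equals delta_j.
    grad_b = list(delta)
    # Eq. (3.4): dC/dw_jk equals a_prev[k] * delta[j].
    grad_w = [[d_j * a_k for a_k in a_prev] for d_j in delta]
    return grad_w, grad_b

# Made-up error and activation values for a layer with 2 neurons and 3 inputs.
grad_w, grad_b = layer_gradients([0.5, -0.25], [1.0, 2.0, 4.0])
print(grad_w)  # [[0.5, 1.0, 2.0], [-0.25, -0.5, -1.0]]
print(grad_b)  # [0.5, -0.25]
```

Row $j$ of `grad_w` is the previous layer's activation vector scaled by $\delta_j^l$, so the weight gradient has one row per neuron in layer $l$ and one column per neuron in layer $l-1$.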