# Derivation of the BP Algorithm for Multilayer Perceptrons

## Forward Computation

The network is a chain of layer functions $g_{l}$ with parameters $W^{l}$. The last layer is followed by the output function $f_{L}$, and the loss is $J(W,y)$:

$$
\begin{aligned}
z^{1} &= g_{1}(x, W^{1})\\
z^{2} &= g_{2}(z^{1}, W^{2})\\
&\;\;\vdots\\
z^{l-1} &= g_{l-1}(z^{l-2}, W^{l-1})\\
z^{l} &= g_{l}(z^{l-1}, W^{l})\\
z^{l+1} &= g_{l+1}(z^{l}, W^{l+1})\\
&\;\;\vdots\\
z^{L} &= g_{L}(z^{L-1}, W^{L})\\
y &= f_{L}(z^{L})
\end{aligned}
$$
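The forward recursion is just repeated function application, recording each intermediate $z^{l}$ along the way. Here is a minimal Python sketch. It assumes scalar values and caller-supplied layer functions, and the names `forward`, `gs` and `f_L` are hypothetical rather than taken from the original:

```python
from typing import Callable, List, Sequence, Tuple

Layer = Callable[[float, float], float]  # g_l(z^{l-1}, W^l) -> z^l


def forward(x: float, Ws: Sequence[float], gs: Sequence[Layer],
            f_L: Callable[[float], float]) -> Tuple[List[float], float]:
    """Apply z^l = g_l(z^{l-1}, W^l) for l = 1..L, then y = f_L(z^L).

    Returns every recorded z^l (backpropagation needs them later) and y.
    """
    zs: List[float] = []
    z = x  # treat the input x as z^0
    for g, W in zip(gs, Ws):
        z = g(z, W)
        zs.append(z)
    return zs, f_L(z)


if __name__ == "__main__":
    def scale(z: float, W: float) -> float:
        return W * z  # hypothetical layer: z^l = W^l * z^{l-1}

    def identity(v: float) -> float:
        return v

    zs, y = forward(2.0, [3.0, 4.0], [scale, scale], identity)
    print(zs, y)  # [6.0, 24.0] 24.0
```

Keeping the whole list `zs` rather than only the final value is the design choice that matters here: the backward pass reuses these stored values instead of recomputing them.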

Unrolling the chain shows how $J(W,y)$ depends on each intermediate quantity.

Dependence of $J(W,y)$ on $x$:

$$J(W,y)=J\Bigl(W, f_{L}\bigl(g_{L}(\cdots g_{2}(g_{1}(x,W^{1}),W^{2})\cdots,W^{L})\bigr)\Bigr)$$

Dependence of $J(W,y)$ on $z^{1}$:

$$J(W,y)=J\Bigl(W, f_{L}\bigl(g_{L}(\cdots g_{2}(z^{1},W^{2})\cdots,W^{L})\bigr)\Bigr)$$

Dependence of $J(W,y)$ on $z^{2}$:

$$J(W,y)=J\Bigl(W, f_{L}\bigl(g_{L}(\cdots g_{3}(z^{2},W^{3})\cdots,W^{L})\bigr)\Bigr)$$

…

Dependence of $J(W,y)$ on $z^{l}$:

$$J(W,y)=J\Bigl(W, f_{L}\bigl(g_{L}(\cdots g_{l+1}(z^{l},W^{l+1})\cdots,W^{L})\bigr)\Bigr)$$

In other words, $J$ depends on $z^{l}$ only through layers $l+1,\dots,L$. This is what allows the chain rule to be applied one layer at a time.
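Reading these expressions from the bottom up, once $z^{l}$ is fixed, $J$ can be evaluated by running only layers $l+1,\dots,L$. The small sketch below illustrates that idea. It assumes scalar layers and a loss that depends only on $y$ (no weight penalty), and `loss_from_layer` and its arguments are hypothetical:

```python
from typing import Callable, Sequence

Layer = Callable[[float, float], float]  # g_m(z^{m-1}, W^m) -> z^m


def loss_from_layer(l: int, z_l: float, Ws: Sequence[float], gs: Sequence[Layer],
                    f_L: Callable[[float], float],
                    J: Callable[[float], float]) -> float:
    """Evaluate J as a function of z^l alone.

    Layers are numbered 1..L as in the text, so layer m uses gs[m-1] and Ws[m-1].
    l = 0 means z_l is the input x. Only layers l+1..L are applied.
    """
    z = z_l
    for m in range(l + 1, len(Ws) + 1):
        z = gs[m - 1](z, Ws[m - 1])
    return J(f_L(z))


if __name__ == "__main__":
    def scale(z: float, W: float) -> float:
        return W * z  # hypothetical layer

    def identity(v: float) -> float:
        return v

    Ws, gs = [3.0, 4.0, 5.0], [scale, scale, scale]
    # Forward from x = 2: z^1 = 6, z^2 = 24, z^3 = 120.
    # Starting from any recorded z^l gives the same loss.
    for l, z_l in enumerate([2.0, 6.0, 24.0, 120.0]):
        print(l, loss_from_layer(l, z_l, Ws, gs, identity, identity))
```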

## Backpropagation

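From here on, layer indices are written in parentheses ($W^{(l)}$ is the $W^{l}$ above), and $\bm{b}^{(l)}$ is the bias of layer $l$. The parameters are updated by gradient descent with learning rate $\alpha$, where $J(W,\bm{b})$ is the average loss over the $N$ training samples:

$$
\begin{aligned}
W^{(l)} &\leftarrow W^{(l)}-\alpha\frac{\partial J(W,\bm{b})}{\partial W^{(l)}}=W^{(l)}-\alpha\frac{\partial}{\partial W^{(l)}}\Bigl[\frac{1}{N}\sum_{i=1}^{N}J(W,\bm{b};\bm{x}^{(i)},y^{(i)})\Bigr]\\
\bm{b}^{(l)} &\leftarrow \bm{b}^{(l)}-\alpha\frac{\partial J(W,\bm{b})}{\partial \bm{b}^{(l)}}=\bm{b}^{(l)}-\alpha\frac{\partial}{\partial \bm{b}^{(l)}}\Bigl[\frac{1}{N}\sum_{i=1}^{N}J(W,\bm{b};\bm{x}^{(i)},y^{(i)})\Bigr]
\end{aligned}
$$

Differentiation is linear, so the gradient of the average is the average of the per-sample gradients. It therefore suffices to derive the gradients of $J(W,\bm{b};\bm{x},y)$ for a single sample $(\bm{x},y)$.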
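Because the derivative of an average is the average of the derivatives, one update step needs only the per-sample gradients. The minimal full-batch sketch below uses a hypothetical one-dimensional linear model with per-sample loss $\tfrac12(wx+b-y)^2$. This is a stand-in for the network, chosen so the numbers are easy to check by hand, and `gd_step` and `squared_error_grad` are illustrative names:

```python
from typing import Callable, List, Sequence, Tuple

GradFn = Callable[[float, float, float, float], Tuple[float, float]]


def squared_error_grad(w: float, b: float, x: float, y: float) -> Tuple[float, float]:
    """Per-sample gradient of J = 0.5 * (w*x + b - y)**2 with respect to (w, b)."""
    r = w * x + b - y
    return r * x, r


def gd_step(w: float, b: float, data: Sequence[Tuple[float, float]],
            grad: GradFn, alpha: float) -> Tuple[float, float]:
    """w <- w - alpha * dJ/dw, b <- b - alpha * dJ/db, with J the mean per-sample loss."""
    grads: List[Tuple[float, float]] = [grad(w, b, x, y) for x, y in data]
    n = len(grads)
    gw = sum(g[0] for g in grads) / n  # (1/N) * sum_i dJ_i/dw
    gb = sum(g[1] for g in grads) / n  # (1/N) * sum_i dJ_i/db
    return w - alpha * gw, b - alpha * gb


if __name__ == "__main__":
    data = [(1.0, 2.0), (2.0, 4.0)]
    print(gd_step(0.0, 0.0, data, squared_error_grad, alpha=0.1))  # about (0.5, 0.3)
```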

For a single sample, let $\delta^{(l)}$ denote the gradient of the loss with respect to $z^{(l)}$ of layer $l$. Let $a^{(l)}=f^{(l)}(z^{(l)})$ be the output of layer $l$'s activation function. Then $z^{(l+1)}$ depends on $z^{(l)}$ only through $a^{(l)}$, and the chain rule gives:

$$
\begin{aligned}
\delta^{(l)} &= \frac{\partial J(W,b;x,y)}{\partial z^{(l)}}=\frac{\partial z^{(l+1)}}{\partial z^{(l)}}\cdot\frac{\partial J(W,b;x,y)}{\partial z^{(l+1)}}\\
&= \frac{\partial a^{(l)}}{\partial z^{(l)}}\cdot\frac{\partial z^{(l+1)}}{\partial a^{(l)}}\cdot\frac{\partial J(W,b;x,y)}{\partial z^{(l+1)}}\\
&= \frac{\partial a^{(l)}}{\partial z^{(l)}}\cdot\frac{\partial z^{(l+1)}}{\partial a^{(l)}}\cdot\delta^{(l+1)}
\end{aligned}
$$
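As a quick worked case (an illustrative assumption, not part of the original derivation), take one unit per layer, a sigmoid activation $a^{(l)}=\sigma(z^{(l)})$, and $z^{(l+1)}=w^{(l+1)}a^{(l)}+b^{(l+1)}$. The two local factors in the chain then become:

$$
\begin{aligned}
\frac{\partial a^{(l)}}{\partial z^{(l)}} &= \sigma(z^{(l)})\bigl(1-\sigma(z^{(l)})\bigr), \qquad \frac{\partial z^{(l+1)}}{\partial a^{(l)}}=w^{(l+1)},\\
\delta^{(l)} &= \sigma(z^{(l)})\bigl(1-\sigma(z^{(l)})\bigr)\,w^{(l+1)}\,\delta^{(l+1)}.
\end{aligned}
$$

For example, with $z^{(l)}=0$, $w^{(l+1)}=2$ and $\delta^{(l+1)}=1$, we have $\sigma(0)=0.5$, so $\delta^{(l)}=0.5\cdot0.5\cdot2\cdot1=0.5$.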

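Assume the gradient $\delta^{(l+1)}$ of layer $l+1$ is already known. We now compute the gradient $\delta^{(l)}$ of layer $l$ one component at a time. With an elementwise activation $a_{i}^{(l)}=f_{i}^{(l)}(z_{i}^{(l)})$, and with $w_{ij}^{(l+1)}$ the weight from unit $i$ of layer $l$ to unit $j$ of layer $l+1$, each local derivative is: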

$$
\frac{\partial z_{j}^{(l+1)}}{\partial z_{i}^{(l)}}=\frac{\partial a_{i}^{(l)}}{\partial z_{i}^{(l)}}\cdot\frac{\partial z_{j}^{(l+1)}}{\partial a_{i}^{(l)}}=f_{i}^{\prime(l)}(z_{i}^{(l)})\,w_{ij}^{(l+1)}
$$
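The entry formula can be checked numerically against finite differences of the map $z^{(l)}\mapsto z^{(l+1)}$. The sketch below assumes a sigmoid activation and weights stored as `w[i][j]` (from unit $i$ of layer $l$ to unit $j$ of layer $l+1$). All names and numbers are hypothetical:

```python
import math
from typing import List

Matrix = List[List[float]]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def next_z(z: List[float], w: Matrix, b: List[float]) -> List[float]:
    """z_j^{(l+1)} = sum_i w[i][j] * sigmoid(z_i^{(l)}) + b[j]."""
    a = [sigmoid(zi) for zi in z]
    return [sum(w[i][j] * a[i] for i in range(len(a))) + b[j] for j in range(len(b))]


def jacobian_formula(z: List[float], w: Matrix) -> Matrix:
    """Entry [i][j] = f'(z_i) * w[i][j], with f'(z) = sigmoid(z) * (1 - sigmoid(z))."""
    return [[sigmoid(zi) * (1.0 - sigmoid(zi)) * w[i][j] for j in range(len(w[i]))]
            for i, zi in enumerate(z)]


def jacobian_numeric(z: List[float], w: Matrix, b: List[float],
                     eps: float = 1e-6) -> Matrix:
    """Central differences: entry [i][j] approximates d z_j^{(l+1)} / d z_i^{(l)}."""
    jac: Matrix = []
    for i in range(len(z)):
        z_up, z_dn = list(z), list(z)
        z_up[i] += eps
        z_dn[i] -= eps
        up, dn = next_z(z_up, w, b), next_z(z_dn, w, b)
        jac.append([(up[j] - dn[j]) / (2 * eps) for j in range(len(b))])
    return jac


if __name__ == "__main__":
    z = [0.5, -1.0, 2.0]
    w = [[0.2, -0.4], [0.7, 0.1], [-0.3, 0.9]]
    b = [0.05, -0.1]
    exact, approx = jacobian_formula(z, w), jacobian_numeric(z, w, b)
    # The largest difference should be very small.
    print(max(abs(e - a) for row_e, row_a in zip(exact, approx)
              for e, a in zip(row_e, row_a)))
```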
The gradient of the $i$-th value $z_{i}^{(l)}$ of layer $l$ collects contributions from every unit $j$ of layer $l+1$ that it feeds into:

$$
\begin{aligned}
\delta_{i}^{(l)} &= \frac{\partial J}{\partial z_{i}^{(l)}}=\sum_{j}\frac{\partial z_{j}^{(l+1)}}{\partial z_{i}^{(l)}}\frac{\partial J}{\partial z_{j}^{(l+1)}}=\sum_{j}\frac{\partial z_{j}^{(l+1)}}{\partial z_{i}^{(l)}}\,\delta_{j}^{(l+1)}\\
&= \sum_{j}f_{i}^{\prime(l)}(z_{i}^{(l)})\,w_{ij}^{(l+1)}\,\delta_{j}^{(l+1)}=f_{i}^{\prime(l)}(z_{i}^{(l)})\sum_{j}w_{ij}^{(l+1)}\,\delta_{j}^{(l+1)}
\end{aligned}
$$
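The final form of the recursion maps directly onto code. Because $f_{i}^{\prime(l)}(z_{i}^{(l)})$ does not depend on $j$, it is factored out and applied once per unit, after the weighted sum over $j$. In the sketch below, the activation derivative is passed in as a function; `backprop_delta`, `relu_prime` and the weight layout `w_next[i][j]` are assumptions for illustration:

```python
from typing import Callable, List


def backprop_delta(z_l: List[float], w_next: List[List[float]],
                   delta_next: List[float],
                   f_prime: Callable[[float], float]) -> List[float]:
    """delta_i^{(l)} = f'(z_i^{(l)}) * sum_j w_next[i][j] * delta_next[j].

    w_next[i][j] is the weight from unit i of layer l to unit j of layer l+1.
    """
    return [f_prime(z_i) * sum(w_ij * d_j for w_ij, d_j in zip(w_next[i], delta_next))
            for i, z_i in enumerate(z_l)]


def relu_prime(z: float) -> float:
    """Derivative of ReLU (taken as 0 at z = 0)."""
    return 1.0 if z > 0 else 0.0


if __name__ == "__main__":
    w_next = [[1.0, 2.0], [3.0, 4.0]]
    print(backprop_delta([1.0, -1.0], w_next, [1.0, 1.0], relu_prime))  # [3.0, 0.0]
```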

Since $z_{i}^{(l)}=\sum_{k}w_{ki}^{(l)}a_{k}^{(l-1)}+b_{i}^{(l)}$, the weight and bias gradients follow directly from $\delta_{i}^{(l)}$:

$$
\begin{aligned}
\frac{\partial J}{\partial w_{ki}^{(l)}} &= \frac{\partial z_{i}^{(l)}}{\partial w_{ki}^{(l)}}\frac{\partial J}{\partial z_{i}^{(l)}}=a_{k}^{(l-1)}\,\delta_{i}^{(l)}\\[4pt]
\frac{\partial J}{\partial b_{i}^{(l)}} &= \frac{\partial z_{i}^{(l)}}{\partial b_{i}^{(l)}}\frac{\partial J}{\partial z_{i}^{(l)}}=\delta_{i}^{(l)}
\end{aligned}
$$
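For each sample, the weight gradient of layer $l$ is therefore an outer product of the previous layer's activations and the current $\delta^{(l)}$, and the bias gradient is $\delta^{(l)}$ itself. The minimal sketch below uses a hypothetical helper `param_grads`; for $l=1$, one would pass the input $x$ as $a^{(0)}$:

```python
from typing import List, Tuple


def param_grads(a_prev: List[float],
                delta: List[float]) -> Tuple[List[List[float]], List[float]]:
    """Return (dW, db) with dW[k][i] = a_k^{(l-1)} * delta_i^{(l)} and db[i] = delta_i^{(l)}."""
    dW = [[a_k * d_i for d_i in delta] for a_k in a_prev]  # outer product of a_prev and delta
    db = list(delta)
    return dW, db


if __name__ == "__main__":
    print(param_grads([1.0, 2.0], [3.0, 4.0]))  # ([[3.0, 4.0], [6.0, 8.0]], [3.0, 4.0])
```

Over a batch, these per-sample results would be averaged before the update, matching the $\frac{1}{N}\sum_{i}$ in the update rule.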

## Steps of the BP Algorithm for an MLP

(1) Forward pass: compute and record every $z_{i}^{(l)}$, together with the activations $a_{i}^{(l)}=f_{i}^{(l)}(z_{i}^{(l)})$, which step (3) needs.

(2) Backward pass: compute the gradient $\delta_{i}^{(l)}$ of each $z_{i}^{(l)}$. Start at the output layer, where $\delta^{(L)}=\partial J/\partial z^{(L)}$ is obtained directly from the loss, and move toward the input:

$$
\delta_{i}^{(l)}=f_{i}^{\prime(l)}(z_{i}^{(l)})\sum_{j}w_{ij}^{(l+1)}\delta_{j}^{(l+1)}
$$

(3) Compute the gradients of the weight and bias parameters:

$$
\begin{aligned}
\frac{\partial J}{\partial w_{ki}^{(l)}} &= \frac{\partial z_{i}^{(l)}}{\partial w_{ki}^{(l)}}\frac{\partial J}{\partial z_{i}^{(l)}}=a_{k}^{(l-1)}\,\delta_{i}^{(l)}\\[4pt]
\frac{\partial J}{\partial b_{i}^{(l)}} &= \frac{\partial z_{i}^{(l)}}{\partial b_{i}^{(l)}}\frac{\partial J}{\partial z_{i}^{(l)}}=\delta_{i}^{(l)}
\end{aligned}
$$
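Putting steps (1) to (3) together, here is a small end-to-end sketch in plain Python, with a numerical gradient check. It assumes sigmoid activations in every layer and a squared-error loss $J=\tfrac12\sum_i(a_i^{(L)}-t_i)^2$ for a target $t$, so that $\delta_i^{(L)}=(a_i^{(L)}-t_i)\,\sigma'(z_i^{(L)})$. These choices, the function names and the weight layout `Ws[l][k][i]` are illustrative assumptions, not from the original. Python index `l` refers to layer `l+1` in the text.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]  # W[k][i]: from unit k of the previous layer to unit i


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_prime(z: float) -> float:
    s = sigmoid(z)
    return s * (1.0 - s)


def forward(x: Vector, Ws: List[Matrix], bs: List[Vector]) -> Tuple[List[Vector], List[Vector]]:
    """Step (1): record every z (zs[l]) and activation (acts[l]); acts[0] is x."""
    zs: List[Vector] = []
    acts: List[Vector] = [list(x)]
    for W, b in zip(Ws, bs):
        a = acts[-1]
        z = [sum(W[k][i] * a[k] for k in range(len(a))) + b[i] for i in range(len(b))]
        zs.append(z)
        acts.append([sigmoid(zi) for zi in z])
    return zs, acts


def loss(x: Vector, t: Vector, Ws: List[Matrix], bs: List[Vector]) -> float:
    _, acts = forward(x, Ws, bs)
    return 0.5 * sum((o - ti) ** 2 for o, ti in zip(acts[-1], t))


def backprop(x: Vector, t: Vector, Ws: List[Matrix],
             bs: List[Vector]) -> Tuple[List[Matrix], List[Vector]]:
    zs, acts = forward(x, Ws, bs)  # step (1)
    n_layers = len(Ws)
    deltas: List[Vector] = [[] for _ in range(n_layers)]
    # Output layer: delta^{(L)} = dJ/dz^{(L)} = (a^{(L)} - t) * sigma'(z^{(L)})
    deltas[-1] = [(o - ti) * sigmoid_prime(z) for o, ti, z in zip(acts[-1], t, zs[-1])]
    # Step (2): delta_i^{(l)} = f'(z_i^{(l)}) * sum_j w_ij^{(l+1)} * delta_j^{(l+1)}
    for l in range(n_layers - 2, -1, -1):
        W_next, d_next = Ws[l + 1], deltas[l + 1]
        deltas[l] = [sigmoid_prime(z_i) * sum(w * d for w, d in zip(W_next[i], d_next))
                     for i, z_i in enumerate(zs[l])]
    # Step (3): dJ/dw_ki = a_k^{(l-1)} * delta_i^{(l)}, dJ/db_i = delta_i^{(l)}
    dWs = [[[a_k * d_i for d_i in deltas[l]] for a_k in acts[l]] for l in range(n_layers)]
    dbs = [list(d) for d in deltas]
    return dWs, dbs


def max_grad_error(x: Vector, t: Vector, Ws: List[Matrix], bs: List[Vector],
                   eps: float = 1e-6) -> float:
    """Largest |analytic - central difference| over all weights and biases."""
    dWs, dbs = backprop(x, t, Ws, bs)
    params = [(Ws[l][k], i, dWs[l][k][i]) for l in range(len(Ws))
              for k in range(len(Ws[l])) for i in range(len(Ws[l][k]))]
    params += [(bs[l], i, dbs[l][i]) for l in range(len(bs)) for i in range(len(bs[l]))]
    worst = 0.0
    for row, i, analytic in params:
        orig = row[i]
        row[i] = orig + eps
        up = loss(x, t, Ws, bs)
        row[i] = orig - eps
        down = loss(x, t, Ws, bs)
        row[i] = orig  # restore the parameter
        worst = max(worst, abs((up - down) / (2 * eps) - analytic))
    return worst


if __name__ == "__main__":
    random.seed(0)
    sizes = [3, 4, 2]  # input, hidden, output widths (arbitrary)
    Ws = [[[random.uniform(-1, 1) for _ in range(n_out)] for _ in range(n_in)]
          for n_in, n_out in zip(sizes, sizes[1:])]
    bs = [[0.0] * n_out for n_out in sizes[1:]]
    print(max_grad_error([0.5, -0.2, 0.1], [1.0, 0.0], Ws, bs))
```

A printed value near zero (well below `1e-6`) suggests that the gradients from steps (2) and (3) agree with the finite-difference estimates. In practice, a framework's automatic differentiation would replace the hand-written loops.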
