Derivation of the Backpropagation (BP) Algorithm for the Multilayer Perceptron

Forward Computation

The network consists of an input layer, one or more hidden layers, and one output layer. Neurons in adjacent layers are fully connected; neurons within the same layer are not connected.

In the figure,

$$z^{(l)} = W^{(l)} a^{(l-1)} + b^{(l)}, \qquad a^{(l)} = f^{(l)}(z^{(l)})$$

where $f(\cdot)$ is the activation function and $a^{(l)}$ is the output of layer $l$.
The chain of variable relations:

$$
\begin{aligned}
z^{1} &= g_{1}(x, W^{1}) \\
z^{2} &= g_{2}(z^{1}, W^{2}) \\
&\cdots \\
z^{l-1} &= g_{l-1}(z^{l-2}, W^{l-1}) \\
z^{l} &= g_{l}(z^{l-1}, W^{l}) \\
z^{l+1} &= g_{l+1}(z^{l}, W^{l+1}) \\
&\cdots \\
z^{L} &= g_{L}(z^{L-1}, W^{L}) \\
y &= f_{L}(z^{L}) \\
&\,J(W, y)
\end{aligned}
$$
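The forward chain above can be sketched in NumPy. This is a minimal illustration, not the article's code; the sigmoid activation and the 2-3-1 layer sizes are assumptions chosen for concreteness:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, weights, biases):
    """Forward pass: a^(0)=x, z^(l)=W^(l) a^(l-1)+b^(l), a^(l)=f(z^(l)).
    Records every z^(l) and a^(l), which backprop will need later."""
    a = x
    zs, acts = [], [a]
    for W, b in zip(weights, biases):
        z = W @ a + b
        a = sigmoid(z)
        zs.append(z)
        acts.append(a)
    return zs, acts

# hypothetical 2-3-1 network with random parameters
rng = np.random.default_rng(0)
weights = [rng.standard_normal((3, 2)), rng.standard_normal((1, 3))]
biases = [rng.standard_normal(3), rng.standard_normal(1)]
zs, acts = forward(np.array([0.5, -0.2]), weights, biases)
print(acts[-1].shape)  # (1,)
```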
Variable dependencies:

Dependence of $J(W,y)$ on $x$: $J(W,y)=J\bigl(W,\,f(g_{L}(\ldots g_{2}(g_{1}(x,W^{1}),W^{2})\ldots,W^{L}))\bigr)$

Dependence of $J(W,y)$ on $z^{1}$: $J(W,y)=J\bigl(W,\,f(g_{L}(\ldots g_{2}(z^{1},W^{2})\ldots,W^{L}))\bigr)$

Dependence of $J(W,y)$ on $z^{2}$: $J(W,y)=J\bigl(W,\,f(g_{L}(\ldots g_{3}(z^{2},W^{3})\ldots,W^{L}))\bigr)$

…

Dependence of $J(W,y)$ on $z^{l}$: $J(W,y)=J\bigl(W,\,f(g_{L}(\ldots g_{l+1}(z^{l},W^{l+1})\ldots,W^{L}))\bigr)$

Backpropagation

The goal is to minimize the loss function via gradient descent:

$$
\begin{aligned}
W^{(l)} &= W^{(l)} - \alpha \frac{\partial J(W,b)}{\partial W^{(l)}}
= W^{(l)} - \alpha \frac{\partial}{\partial W^{(l)}} \frac{1}{N}\sum_{i=1}^{N} J\bigl(W,b;\,x^{(i)}, y^{(i)}\bigr) \\
b^{(l)} &= b^{(l)} - \alpha \frac{\partial J(W,b)}{\partial b^{(l)}}
= b^{(l)} - \alpha \frac{\partial}{\partial b^{(l)}} \frac{1}{N}\sum_{i=1}^{N} J\bigl(W,b;\,x^{(i)}, y^{(i)}\bigr)
\end{aligned}
$$
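A minimal sketch of this update rule (the learning rate value and the gradient arrays below are hypothetical; the gradients are assumed to be already averaged over the $N$ samples):

```python
import numpy as np

def gd_step(weights, biases, grads_W, grads_b, alpha):
    """One gradient-descent step per layer: theta <- theta - alpha * dJ/dtheta."""
    new_W = [W - alpha * gW for W, gW in zip(weights, grads_W)]
    new_b = [b - alpha * gb for b, gb in zip(biases, grads_b)]
    return new_W, new_b

# tiny one-layer example with made-up gradients
W = [np.ones((2, 2))]
b = [np.zeros(2)]
gW = [np.full((2, 2), 0.5)]
gb = [np.full(2, 0.5)]
W2, b2 = gd_step(W, b, gW, gb, alpha=0.1)
print(W2[0][0, 0], b2[0][0])  # 0.95 -0.05
```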
Local gradient recursion:
Let $\delta^{(l)}$ denote the gradient of the loss with respect to $z^{(l)}$ at layer $l$:

$$
\begin{aligned}
\delta^{(l)} &= \frac{\partial J(W,b;x,y)}{\partial z^{(l)}}
= \frac{\partial z^{(l+1)}}{\partial z^{(l)}} \cdot \frac{\partial J(W,b;x,y)}{\partial z^{(l+1)}} \\
&= \frac{\partial a^{(l)}}{\partial z^{(l)}} \cdot \frac{\partial z^{(l+1)}}{\partial a^{(l)}} \cdot \frac{\partial J(W,b;x,y)}{\partial z^{(l+1)}}
= \frac{\partial a^{(l)}}{\partial z^{(l)}} \cdot \frac{\partial z^{(l+1)}}{\partial a^{(l)}} \cdot \delta^{(l+1)}
\end{aligned}
$$
The expressions above are in matrix form; next we derive the update for a single connection weight.
Suppose the gradient $\delta^{(l+1)}$ of layer $l+1$ is known, and we want the gradient $\delta^{(l)}$ of layer $l$.

For the $j$-th neuron of layer $l+1$, the pre-activation is

$$z_{j}^{(l+1)} = \sum_{i} a_{i}^{(l)} w_{ij}^{(l+1)} = \sum_{i} f_{i}^{(l)}\bigl(z_{i}^{(l)}\bigr)\, w_{ij}^{(l+1)}$$
From the above,

$$\frac{\partial z_{j}^{(l+1)}}{\partial z_{i}^{(l)}}
= \frac{\partial a_{i}^{(l)}}{\partial z_{i}^{(l)}} \cdot \frac{\partial z_{j}^{(l+1)}}{\partial a_{i}^{(l)}}
= f_{i}'^{(l)}\bigl(z_{i}^{(l)}\bigr)\, w_{ij}^{(l+1)}$$
The gradient with respect to the $i$-th pre-activation $z_{i}^{(l)}$ of layer $l$ is then:

$$
\delta_{i}^{(l)} = \frac{\partial L}{\partial z_{i}^{(l)}}
= \sum_{j} \frac{\partial z_{j}^{(l+1)}}{\partial z_{i}^{(l)}} \frac{\partial L}{\partial z_{j}^{(l+1)}}
= \sum_{j} \frac{\partial z_{j}^{(l+1)}}{\partial z_{i}^{(l)}} \delta_{j}^{(l+1)}
= \sum_{j} f_{i}'^{(l)}\bigl(z_{i}^{(l)}\bigr) w_{ij}^{(l+1)} \delta_{j}^{(l+1)}
= f_{i}'^{(l)}\bigl(z_{i}^{(l)}\bigr) \sum_{j} w_{ij}^{(l+1)} \delta_{j}^{(l+1)}
$$
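The per-neuron sum above is equivalent to a matrix-vector form, $\delta^{(l)} = f'^{(l)}(z^{(l)}) \odot (W^{(l+1)} \delta^{(l+1)})$ with $W^{(l+1)}$ stored as $w_{ij}^{(l+1)}$ (row $i$ in layer $l$, column $j$ in layer $l+1$). The sketch below checks the two against each other on random values; the sigmoid activation and layer sizes are assumptions:

```python
import numpy as np

def sigmoid_prime(z):
    s = 1.0 / (1.0 + np.exp(-z))
    return s * (1.0 - s)

rng = np.random.default_rng(1)
n_l, n_next = 4, 3
z_l = rng.standard_normal(n_l)                # pre-activations z_i^(l)
W_next = rng.standard_normal((n_l, n_next))   # w_ij^(l+1): row i in layer l, col j in layer l+1
delta_next = rng.standard_normal(n_next)      # delta_j^(l+1), assumed known

# per-neuron sum, exactly as in the derivation
delta_sum = np.array([
    sigmoid_prime(z_l[i]) * sum(W_next[i, j] * delta_next[j] for j in range(n_next))
    for i in range(n_l)
])

# equivalent vectorized form
delta_vec = sigmoid_prime(z_l) * (W_next @ delta_next)
print(np.allclose(delta_sum, delta_vec))  # True
```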
The gradient at the final (output) layer is:

$$\delta_{o}^{(L)} = \frac{\partial L}{\partial z_{o}^{(L)}}
= \frac{\partial a_{o}^{(L)}}{\partial z_{o}^{(L)}} \frac{\partial L}{\partial a_{o}^{(L)}}
= f_{o}'^{(L)}\bigl(z_{o}^{(L)}\bigr) \frac{\partial L}{\partial a_{o}^{(L)}}$$
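As a concrete instance (an assumption for illustration; the text leaves $L$ and $f$ generic): with squared-error loss $L = \tfrac{1}{2}\bigl(a_{o}^{(L)} - y\bigr)^{2}$ and a sigmoid output $f_{o}^{(L)} = \sigma$, we have $\sigma'(z) = \sigma(z)\bigl(1-\sigma(z)\bigr)$ and $\partial L / \partial a_{o}^{(L)} = a_{o}^{(L)} - y$, so the output-layer gradient becomes:

```latex
\delta_{o}^{(L)}
  = f_{o}'^{(L)}\bigl(z_{o}^{(L)}\bigr)\,\frac{\partial L}{\partial a_{o}^{(L)}}
  = \sigma\bigl(z_{o}^{(L)}\bigr)\bigl(1-\sigma(z_{o}^{(L)})\bigr)\,\bigl(a_{o}^{(L)}-y\bigr)
```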
The gradient computation thus proceeds backward through the network:
Finally, compute the gradients of the weights $\{w_{ki}^{(l)}\}_{k=1}^{K}$ (where $K$ is the number of neurons in layer $l-1$) and the bias $b_{i}^{(l)}$ associated with $z_{i}^{(l)}$:

$$
\frac{\partial J}{\partial w_{ki}^{(l)}} = \frac{\partial z_{i}^{(l)}}{\partial w_{ki}^{(l)}} \frac{\partial J}{\partial z_{i}^{(l)}} = a_{k}^{(l-1)} \delta_{i}^{(l)},
\qquad
\frac{\partial J}{\partial b_{i}^{(l)}} = \frac{\partial z_{i}^{(l)}}{\partial b_{i}^{(l)}} \frac{\partial J}{\partial z_{i}^{(l)}} = \delta_{i}^{(l)}
$$
From this we can summarize the general steps of the BP algorithm.

Steps of the BP Algorithm for an MLP

(1) Run the forward pass and record each $z_{i}^{(l)}$.
(2) Compute the gradient $\delta_{i}^{(l)}$ of each $z_{i}^{(l)}$ in the backward pass:

First the output layer: $\delta_{o}^{(L)} = f_{o}'^{(L)}\bigl(z_{o}^{(L)}\bigr) \frac{\partial L}{\partial a_{o}^{(L)}}$

Then, proceeding from back to front:

$$\delta_{i}^{(l)} = f_{i}'^{(l)}\bigl(z_{i}^{(l)}\bigr) \sum_{j} w_{ij}^{(l+1)} \delta_{j}^{(l+1)}$$
(3) Compute the gradients of the weights and biases:

$$
\frac{\partial J}{\partial w_{ki}^{(l)}} = \frac{\partial z_{i}^{(l)}}{\partial w_{ki}^{(l)}} \frac{\partial J}{\partial z_{i}^{(l)}} = a_{k}^{(l-1)} \delta_{i}^{(l)},
\qquad
\frac{\partial J}{\partial b_{i}^{(l)}} = \frac{\partial z_{i}^{(l)}}{\partial b_{i}^{(l)}} \frac{\partial J}{\partial z_{i}^{(l)}} = \delta_{i}^{(l)}
$$
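Steps (1)-(3) can be combined into one sketch, verified against a numerical derivative. This is an illustrative implementation under assumptions the text does not fix: sigmoid activations, squared-error loss $J = \tfrac{1}{2}\lVert a^{(L)} - y\rVert^{2}$, a single training sample, and weight matrices stored with rows indexing the target layer (the transpose of the text's $w_{ki}^{(l)}$ convention, hence `W.T @ delta` in the recursion):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)

def backprop(x, y, weights, biases):
    """BP steps (1)-(3) for one sample with squared-error loss."""
    # (1) forward pass, recording z^(l) and a^(l)
    a = x
    zs, acts = [], [a]
    for W, b in zip(weights, biases):
        z = W @ a + b
        a = sigmoid(z)
        zs.append(z)
        acts.append(a)
    # (2) output-layer delta, then recurse from back to front
    delta = sigmoid_prime(zs[-1]) * (acts[-1] - y)   # dL/da^(L) = a^(L) - y
    grads_W = [None] * len(weights)
    grads_b = [None] * len(weights)
    for l in range(len(weights) - 1, -1, -1):
        # (3) parameter gradients: dJ/dW^(l) = delta^(l) (a^(l-1))^T, dJ/db^(l) = delta^(l)
        grads_W[l] = np.outer(delta, acts[l])
        grads_b[l] = delta
        if l > 0:
            delta = sigmoid_prime(zs[l - 1]) * (weights[l].T @ delta)
    return grads_W, grads_b

# check one entry against a numerical derivative (hypothetical 2-3-1 network)
rng = np.random.default_rng(2)
weights = [rng.standard_normal((3, 2)), rng.standard_normal((1, 3))]
biases = [rng.standard_normal(3), rng.standard_normal(1)]
x, y = np.array([0.3, -0.7]), np.array([1.0])

def loss(ws):
    a = x
    for W, b in zip(ws, biases):
        a = sigmoid(W @ a + b)
    return 0.5 * np.sum((a - y) ** 2)

gW, gb = backprop(x, y, weights, biases)
eps = 1e-6
W0 = [w.copy() for w in weights]
W0[0][1, 0] += eps
num = (loss(W0) - loss(weights)) / eps
print(abs(num - gW[0][1, 0]) < 1e-4)  # True
```

The gradient check is a standard way to validate a hand-derived BP implementation: perturb one parameter, re-run the forward pass, and compare the finite difference with the analytic gradient.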
