Explanation and Derivation of the BP (Backpropagation) Algorithm

The neural network structure under consideration:
[Figure: a feed-forward neural network with layers indexed by $j$]

and the known relations:

  • $\mathbf{a}^{(j)} = f\left(\mathbf{z}^{(j)}\right)$
  • $\mathbf{z}^{(j)} = \mathbf{W}^{(j)}\mathbf{a}^{(j-1)} + \mathbf{b}^{(j)}$, where $\theta^{(j)} = \left\{\mathbf{W}^{(j)}, \mathbf{b}^{(j)}\right\}$
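
These two relations define the forward pass. As a concrete illustration, here is a minimal NumPy sketch (the 2-3-1 layer sizes and the choice of sigmoid for $f$ are assumptions made purely for the example):

```python
import numpy as np

def f(z):
    # Sigmoid activation (an assumed choice for f).
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, params):
    # Forward pass: z(j) = W(j) @ a(j-1) + b(j), a(j) = f(z(j)).
    # Returns every z(j) and a(j), which backpropagation needs later.
    a, zs, activations = x, [], [x]
    for W, b in params:
        z = W @ a + b
        a = f(z)
        zs.append(z)
        activations.append(a)
    return zs, activations

# Example: a 2-3-1 network with random parameters (assumed sizes).
rng = np.random.default_rng(0)
params = [(rng.standard_normal((3, 2)), rng.standard_normal(3)),
          (rng.standard_normal((1, 3)), rng.standard_normal(1))]
zs, activations = forward(np.array([0.5, -0.2]), params)
```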

For the figure above, if we want $\frac{\partial l}{\partial \theta^{(j)}}$, we can connect $l$ and $\theta^{(j)}$ through $\mathbf{z}^{(j)}$:

$$\frac{\partial l}{\partial \theta^{(j)}} = \frac{\partial l}{\partial \mathbf{z}^{(j)}} \cdot \frac{\partial \mathbf{z}^{(j)}}{\partial \theta^{(j)}}$$

The connection between $l$ and $\mathbf{z}^{(j)}$ can in turn be established through $\mathbf{z}^{(j+1)}$:

$$\frac{\partial l}{\partial \mathbf{z}^{(j)}} = \frac{\partial l}{\partial \mathbf{z}^{(j+1)}} \cdot \frac{\partial \mathbf{z}^{(j+1)}}{\partial \mathbf{z}^{(j)}} = \frac{\partial l}{\partial \mathbf{z}^{(j+1)}} \cdot \frac{\partial \mathbf{z}^{(j+1)}}{\partial \mathbf{a}^{(j)}} \cdot \frac{\partial \mathbf{a}^{(j)}}{\partial \mathbf{z}^{(j)}}$$

Combining the two, we get

$$\frac{\partial l}{\partial \theta^{(j)}} = \frac{\partial l}{\partial \mathbf{z}^{(j+1)}} \cdot \frac{\partial \mathbf{z}^{(j+1)}}{\partial \mathbf{a}^{(j)}} \cdot \frac{\partial \mathbf{a}^{(j)}}{\partial \mathbf{z}^{(j)}} \cdot \frac{\partial \mathbf{z}^{(j)}}{\partial \theta^{(j)}}$$

(the chain rule), and this differentiation can be iterated backward layer by layer.
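
To see the chain rule in action numerically, here is a minimal sketch (the tiny two-layer scalar network, tanh as $f$, and all parameter values are assumptions for illustration) comparing the chain-rule gradient with a finite-difference estimate:

```python
import numpy as np

# Tiny scalar network: z1 = w1*x + b1, a1 = f(z1), z2 = w2*a1 + b2, l = z2**2.
f = np.tanh
df = lambda z: 1.0 - np.tanh(z) ** 2  # f'(z) for tanh

def loss(w1, b1, w2, b2, x):
    a1 = f(w1 * x + b1)
    return (w2 * a1 + b2) ** 2

w1, b1, w2, b2, x = 0.3, -0.1, 0.7, 0.2, 1.5

# Chain rule: dl/dw1 = (dl/dz2) * (dz2/da1) * (da1/dz1) * (dz1/dw1).
z1 = w1 * x + b1
z2 = w2 * f(z1) + b2
dl_dw1 = 2 * z2 * w2 * df(z1) * x

# Finite-difference estimate of the same derivative.
eps = 1e-6
numeric = (loss(w1 + eps, b1, w2, b2, x) - loss(w1 - eps, b1, w2, b2, x)) / (2 * eps)
print(dl_dw1, numeric)  # the two values agree to within ~1e-9
```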

Now look carefully at the following expression:

$$\frac{\partial l}{\partial \mathbf{z}^{(j)}} = \frac{\partial l}{\partial \mathbf{z}^{(j+1)}} \cdot \frac{\partial \mathbf{z}^{(j+1)}}{\partial \mathbf{a}^{(j)}} \cdot \frac{\partial \mathbf{a}^{(j)}}{\partial \mathbf{z}^{(j)}}$$

Here $\frac{\partial \mathbf{z}^{(j+1)}}{\partial \mathbf{a}^{(j)}} = \mathbf{w}^{(j+1)}$ and $\frac{\partial \mathbf{a}^{(j)}}{\partial \mathbf{z}^{(j)}} = f'\left(\mathbf{z}^{(j)}\right)$. Substituting these two identities into the expression above gives a new formula:

$$\frac{\partial l}{\partial \mathbf{z}^{(j)}} = \frac{\partial l}{\partial \mathbf{z}^{(j+1)}} \cdot \mathbf{w}^{(j+1)} \cdot f'\left(\mathbf{z}^{(j)}\right)$$

So what do $\frac{\partial l}{\partial \mathbf{W}^{(j)}}$ and $\frac{\partial l}{\partial \mathbf{b}^{(j)}}$ look like?

Since $\mathbf{z}^{(j)} = \mathbf{W}^{(j)}\mathbf{a}^{(j-1)} + \mathbf{b}^{(j)}$, we have $\frac{\partial \mathbf{z}^{(j)}}{\partial \mathbf{W}^{(j)}} = \mathbf{a}^{(j-1)}$ and $\frac{\partial \mathbf{z}^{(j)}}{\partial \mathbf{b}^{(j)}} = 1$, so

$$\frac{\partial l}{\partial \mathbf{W}^{(j)}} = \frac{\partial l}{\partial \mathbf{z}^{(j)}} \cdot \mathbf{a}^{(j-1)}, \qquad \frac{\partial l}{\partial \mathbf{b}^{(j)}} = \frac{\partial l}{\partial \mathbf{z}^{(j)}}$$
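
In vectorized form, these two gradients for a fully connected layer can be written as a short sketch (the outer-product convention and vector shapes are assumptions consistent with the definitions above):

```python
import numpy as np

def layer_grads(delta, a_prev):
    # delta  : dl/dz(j), shape (n_j,)
    # a_prev : a(j-1),   shape (n_{j-1},)
    dW = np.outer(delta, a_prev)  # dl/dW(j) = delta(j) a(j-1)^T
    db = delta                    # dl/db(j) = delta(j)
    return dW, db
```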

Now let us analyze the backpropagation process of a slightly more complex network structure:

[Figure: a network whose first layer contains a single neuron $z_1^{(1)}$ feeding two neurons in layer 2, whose activations $a_1^{(2)}, a_2^{(2)}$ feed the single output $h$]

with the known conditions:

  • $l = l(h)$
  • $h = f\left(w_{1,1}^{(3)} a_1^{(2)} + w_{2,1}^{(3)} a_2^{(2)}\right) = f\left(w_{1,1}^{(3)} f\left(z_1^{(2)}\right) + w_{2,1}^{(3)} f\left(z_2^{(2)}\right)\right) = f\left(w_{1,1}^{(3)} f\left(w_{1,1}^{(2)} f\left(z_1^{(1)}\right)\right) + w_{2,1}^{(3)} f\left(w_{2,1}^{(2)} f\left(z_1^{(1)}\right)\right)\right)$

Now let $g_1\left(z_1^{(1)}\right) = w_{1,1}^{(3)} f\left(w_{1,1}^{(2)} f\left(z_1^{(1)}\right)\right)$ and $g_2\left(z_1^{(1)}\right) = w_{2,1}^{(3)} f\left(w_{2,1}^{(2)} f\left(z_1^{(1)}\right)\right)$, and rewrite the expression for $h$ above as:

  • $h = f\left(g_1\left(z_1^{(1)}\right) + g_2\left(z_1^{(1)}\right)\right)$

Next, we compute $\frac{\partial h}{\partial z_1^{(1)}}$ and simplify:

  • $\frac{\partial h}{\partial z_1^{(1)}} = \frac{\partial h}{\partial g_1} \cdot \frac{\partial g_1}{\partial z_1^{(1)}} + \frac{\partial h}{\partial g_2} \cdot \frac{\partial g_2}{\partial z_1^{(1)}} = \frac{\partial h}{\partial z_1^{(2)}} w_{1,1}^{(2)} f'\left(z_1^{(1)}\right) + \frac{\partial h}{\partial z_2^{(2)}} w_{2,1}^{(2)} f'\left(z_1^{(1)}\right) = \left[\frac{\partial h}{\partial z_1^{(2)}} w_{1,1}^{(2)} + \frac{\partial h}{\partial z_2^{(2)}} w_{2,1}^{(2)}\right] f'\left(z_1^{(1)}\right)$, where the second step uses $z_k^{(2)} = w_{k,1}^{(2)} f\left(z_1^{(1)}\right)$, so the path through each $g_k$ contributes $\frac{\partial h}{\partial z_k^{(2)}} \cdot w_{k,1}^{(2)} f'\left(z_1^{(1)}\right)$.
  • Writing $\delta_i^{(j)} \equiv \frac{\partial h}{\partial z_i^{(j)}}$, this yields the recursive relation: $\delta_1^{(1)} = \left[\delta_1^{(2)} w_{1,1}^{(2)} + \delta_2^{(2)} w_{2,1}^{(2)}\right] f'\left(z_1^{(1)}\right)$

Finally, from the expression above we obtain $\frac{\partial h}{\partial w_1^{(1)}}$ and $\frac{\partial h}{\partial b_1^{(1)}}$ as follows (verified numerically in the sketch after this list):

  • $\frac{\partial h}{\partial w_1^{(1)}} = \frac{\partial h}{\partial z_1^{(1)}} \frac{\partial z_1^{(1)}}{\partial w_1^{(1)}} = \delta_1^{(1)} a^{(0)} = \delta_1^{(1)} x_1$
  • $\frac{\partial h}{\partial b_1^{(1)}} = \frac{\partial h}{\partial z_1^{(1)}} \frac{\partial z_1^{(1)}}{\partial b_1^{(1)}} = \delta_1^{(1)}$
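
As a sanity check on this small network, the following sketch (with assumed weight values and tanh standing in for $f$; biases in layers 2 and 3 are omitted, as in the expressions above) propagates $\delta$ backward and compares $\partial h / \partial w_1^{(1)}$ against a finite-difference estimate:

```python
import numpy as np

f, df = np.tanh, lambda z: 1.0 - np.tanh(z) ** 2

# Assumed parameters for the 1-2-1 network above.
w1, b1 = 0.4, 0.1           # layer 1: z1(1) = w1*x + b1
w2 = np.array([0.8, -0.5])  # layer 2: zk(2) = w2[k] * f(z1(1))
w3 = np.array([0.3, 0.9])   # layer 3: h = f(w3 . a(2))
x = 1.2

def h_of(w1_):
    z1 = w1_ * x + b1
    return f(w3 @ f(w2 * f(z1)))

# Forward pass, keeping every z.
z1 = w1 * x + b1
z2 = w2 * f(z1)
z3 = w3 @ f(z2)

# Backward pass: delta(3) -> delta(2) -> delta(1).
d3 = df(z3)                  # dh/dz(3)
d2 = d3 * w3 * df(z2)        # delta_k(2)
d1 = (d2 @ w2) * df(z1)      # delta_1(1) = [sum_k delta_k(2) w_{k,1}(2)] f'(z1(1))
print(d1 * x)                # dh/dw1(1) = delta_1(1) * x1

eps = 1e-6                   # finite-difference check of the same derivative
print((h_of(w1 + eps) - h_of(w1 - eps)) / (2 * eps))
```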

By generalizing the relation between $\delta^{(j)}$ and $\delta^{(j+1)}$, we arrive at a particularly important formula, indeed the most important BP formula:

  • $\delta_i^{(j)} = f'\left(z_i^{(j)}\right) \cdot \left[\sum_{k=1}^{N_{j+1}} w_{k,i}^{(j+1)} \delta_k^{(j+1)}\right]$, where $N_{j+1}$ is the number of neurons in layer $j+1$.

As illustrated in the figure:

[Figure: each $\delta_k^{(j+1)}$ in layer $j+1$ flows backward through the weight $w_{k,i}^{(j+1)}$ and is accumulated into $\delta_i^{(j)}$]

Here $w_{k,i}^{(j+1)}$ is simply the stored weight value; $\delta_k^{(j+1)}$ was itself obtained by backpropagation from $\delta^{(j+2)}$; and $f'\left(z_i^{(j)}\right)$ is computed by plugging $z_i^{(j)}$ into the derivative of layer $j$'s activation function. Several common activation functions and their derivatives:

| Activation | $f(z)$ | $f'(z)$ |
| --- | --- | --- |
| Sigmoid | $\sigma(z) = \dfrac{1}{1 + e^{-z}}$ | $\sigma(z)\left(1 - \sigma(z)\right)$ |
| Tanh | $\tanh(z)$ | $1 - \tanh^2(z)$ |
| ReLU | $\max(0, z)$ | $1$ if $z > 0$, else $0$ |
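
Putting all the pieces together, here is a minimal NumPy sketch of a complete backward pass that applies the BP formula layer by layer (the 2-3-1 shape, sigmoid activation, and squared-error loss $l = \frac{1}{2}\lVert a^{(L)} - y \rVert^2$ are assumptions for illustration):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)

def backprop(x, y, params):
    # One forward + backward pass; returns (dl/dW, dl/db) per layer.
    # params: list of (W, b); loss l = 0.5 * ||a(L) - y||^2 (assumed).
    a, zs, acts = x, [], [x]
    for W, b in params:          # forward pass, storing every z(j), a(j)
        z = W @ a + b
        a = sigmoid(z)
        zs.append(z)
        acts.append(a)

    grads = []
    # Output layer: delta(L) = dl/da(L) * f'(z(L)).
    delta = (acts[-1] - y) * sigmoid_prime(zs[-1])
    for j in reversed(range(len(params))):
        # Parameter gradients for layer j from the current (cached) delta.
        grads.append((np.outer(delta, acts[j]), delta))
        if j > 0:
            # The key BP formula:
            # delta_i(j) = f'(z_i(j)) * sum_k w_{k,i}(j+1) delta_k(j+1)
            W_next = params[j][0]
            delta = (W_next.T @ delta) * sigmoid_prime(zs[j - 1])
    return list(reversed(grads))

# Usage with a random 2-3-1 network (assumed shapes).
rng = np.random.default_rng(0)
params = [(rng.standard_normal((3, 2)), rng.standard_normal(3)),
          (rng.standard_normal((1, 3)), rng.standard_normal(1))]
grads = backprop(np.array([0.5, -0.2]), np.array([1.0]), params)
```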

But why should we use the BP algorithm in the first place?


Explanation:

  • Without BP, computing the gradient for any one layer's parameters would require traversing all the layers behind it and redoing the entire chain of derivative multiplications, and this would have to be repeated for the parameters of every single neuron: the amount of computation explodes.
  • With BP, we simply compute each layer's intermediate gradient $\delta^{(j+1)}$ once, starting from the back, store it, and then use it to compute the previous layer's $\delta^{(j)}$, iterating forward in this way.
  • The essence of BP is dynamic programming. The core idea: "save results that have already been computed, pull them out the next time they are needed, and exploit the recursive relation between them, thereby saving an enormous amount of computation." The toy count below makes this concrete.
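
Here is a toy multiplication count (a sketch under the assumed simplification of $L$ layers of uniform width $n$, counting only Jacobian products) comparing recomputation from scratch against BP's cached $\delta$'s:

```python
# Toy cost model (assumption: L layers of width n; one Jacobian product ~ n^2 ops).

def naive_ops(L, n):
    # Without caching, layer j must re-multiply the (L - j) Jacobians behind it.
    return sum((L - j) * n**2 for j in range(1, L + 1))

def bp_ops(L, n):
    # With BP, one backward sweep computes each Jacobian product exactly once.
    return L * n**2

for L in (10, 100, 1000):
    print(L, naive_ops(L, 10), bp_ops(L, 10))  # naive grows ~L^2, BP grows ~L
```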