Deep Learning Optimization: Forward Propagation, the Chain Rule, and BP (Backpropagation)

This article takes a close look at forward propagation and backpropagation in neural networks. Forward propagation passes the input data through the network layer by layer to compute the output, and a loss function measures the gap between the prediction and the true value. Backpropagation uses the chain rule to compute the gradient of the loss with respect to every parameter, and the weights are then updated to reduce the error. Using a simple neural network as a running example, we walk through the computation from input to output, show how the chain rule yields the derivatives, and finally update the weights with the BP algorithm.

1. Forward Propagation

1.1 Concept

Forward propagation refers to feeding input data into the neural network and passing it forward layer by layer until the output layer is reached.
After the forward pass, there is an error between the final result and the true value; this error is measured by the loss function.

1.2 Forward Propagation Computation

Take a simple neural network as an example, with sigmoid as the activation function. The network has two inputs, $i_1 = 0.05$ and $i_2 = 0.1$, two hidden units, and two output units whose targets are $target_{o1} = 0.01$ and $target_{o2} = 0.99$.
[Figure: the example network]
$net_{h1} = w_1 i_1 + w_2 i_2 + b_1 = 0.15 \times 0.05 + 0.2 \times 0.1 + 0.35 = 0.3775$

$net_{h2} = w_3 i_1 + w_4 i_2 + b_1 = 0.25 \times 0.05 + 0.3 \times 0.1 + 0.35 = 0.3925$

$out_{h1} = \frac{1}{1+e^{-net_{h1}}} = \frac{1}{1+e^{-0.3775}} = 0.5933$

$out_{h2} = \frac{1}{1+e^{-net_{h2}}} = \frac{1}{1+e^{-0.3925}} = 0.5969$

$net_{o1} = w_5 out_{h1} + w_6 out_{h2} + b_2 = 0.4 \times 0.5933 + 0.45 \times 0.5969 + 0.6 = 1.1059$

$net_{o2} = w_7 out_{h1} + w_8 out_{h2} + b_2 = 0.5 \times 0.5933 + 0.55 \times 0.5969 + 0.6 = 1.2249$

$out_{o1} = \frac{1}{1+e^{-net_{o1}}} = \frac{1}{1+e^{-1.1059}} = 0.7514$

$out_{o2} = \frac{1}{1+e^{-net_{o2}}} = \frac{1}{1+e^{-1.2249}} = 0.7729$

$E_{total} = \sum \frac{1}{2}(target - output)^2$

$E_{total} = E_{o1} + E_{o2} = \frac{1}{2}(0.01 - 0.7514)^2 + \frac{1}{2}(0.99 - 0.7729)^2 = 0.2748 + 0.0236 = 0.2984$
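The arithmetic above is easy to check in code. Below is a minimal Python sketch of this forward pass, assuming the weights $w_1,\dots,w_8$, the biases $b_1 = 0.35$ and $b_2 = 0.6$, the inputs, and the targets from the worked example (the variable names are mine):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

i1, i2 = 0.05, 0.10                      # inputs
w1, w2, w3, w4 = 0.15, 0.20, 0.25, 0.30  # input -> hidden weights
w5, w6, w7, w8 = 0.40, 0.45, 0.50, 0.55  # hidden -> output weights
b1, b2 = 0.35, 0.60                      # hidden / output biases
t1, t2 = 0.01, 0.99                      # targets

# hidden layer
net_h1 = w1 * i1 + w2 * i2 + b1                    # 0.3775
net_h2 = w3 * i1 + w4 * i2 + b1                    # 0.3925
out_h1, out_h2 = sigmoid(net_h1), sigmoid(net_h2)  # 0.5933, 0.5969

# output layer
net_o1 = w5 * out_h1 + w6 * out_h2 + b2            # 1.1059
net_o2 = w7 * out_h1 + w8 * out_h2 + b2            # 1.2249
out_o1, out_o2 = sigmoid(net_o1), sigmoid(net_o2)  # 0.7514, 0.7729

# squared-error loss
E_total = 0.5 * (t1 - out_o1) ** 2 + 0.5 * (t2 - out_o2) ** 2
print(E_total)                                     # ~0.2984
```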

2. The Chain Rule

For a complicated composite function, we decompose it into a sequence of elementary operations (addition, subtraction, multiplication, division, exponentials, logarithms, trigonometric functions, and so on) and differentiate it via the chain rule. We illustrate the process with a composite function that is common in neural networks. Let the composite function $f(x;w,b)$ be:

$f(x;w,b) = \frac{1}{\exp\left(-(wx+b)\right) + 1}$

where $x$ is the input data, $w$ is the weight, and $b$ is the bias. We decompose the composite function as follows:

| Function | Derivative |
| --- | --- |
| $h_1 = x \cdot w$ | $\frac{\partial h_1}{\partial w} = x$, $\frac{\partial h_1}{\partial x} = w$ |
| $h_2 = h_1 + b$ | $\frac{\partial h_2}{\partial h_1} = 1$, $\frac{\partial h_2}{\partial b} = 1$ |
| $h_3 = -h_2$ | $\frac{\partial h_3}{\partial h_2} = -1$ |
| $h_4 = \exp(h_3)$ | $\frac{\partial h_4}{\partial h_3} = \exp(h_3)$ |
| $h_5 = h_4 + 1$ | $\frac{\partial h_5}{\partial h_4} = 1$ |
| $h_6 = \frac{1}{h_5}$ | $\frac{\partial h_6}{\partial h_5} = -\frac{1}{h_5^2}$ |

Represented graphically:
[Figure: computational graph of the decomposition]
The derivative of the whole composite function $f(x;w,b)$ with respect to the parameters $w$ and $b$ is obtained by multiplying together all the local derivatives along the path between $f(x;w,b)$ and each parameter:

$\frac{\partial f(x;w,b)}{\partial w} = \frac{\partial f(x;w,b)}{\partial h_6} \cdot \frac{\partial h_6}{\partial h_5} \cdot \frac{\partial h_5}{\partial h_4} \cdot \frac{\partial h_4}{\partial h_3} \cdot \frac{\partial h_3}{\partial h_2} \cdot \frac{\partial h_2}{\partial h_1} \cdot \frac{\partial h_1}{\partial w}$

$\frac{\partial f(x;w,b)}{\partial b} = \frac{\partial f(x;w,b)}{\partial h_6} \cdot \frac{\partial h_6}{\partial h_5} \cdot \frac{\partial h_5}{\partial h_4} \cdot \frac{\partial h_4}{\partial h_3} \cdot \frac{\partial h_3}{\partial h_2} \cdot \frac{\partial h_2}{\partial b}$
w w w为例,当 x x x=1, w w w=0, b b b=0时,可以得到:

$h_1 = x \cdot w = 0$
$h_2 = h_1 + b = 0$
$h_3 = -h_2 = 0$
$h_4 = \exp(h_3) = 1$
$h_5 = h_4 + 1 = 2$
$h_6 = \frac{1}{h_5} = \frac{1}{2}$
$f(x;w,b) = h_6 = \frac{1}{2}$

$\begin{aligned} \left.\frac{\partial f(x;w,b)}{\partial w}\right|_{x=1,w=0,b=0} &= \frac{\partial f(x;w,b)}{\partial h_6} \cdot \frac{\partial h_6}{\partial h_5} \cdot \frac{\partial h_5}{\partial h_4} \cdot \frac{\partial h_4}{\partial h_3} \cdot \frac{\partial h_3}{\partial h_2} \cdot \frac{\partial h_2}{\partial h_1} \cdot \frac{\partial h_1}{\partial w} \\ &= 1 \times (-0.25) \times 1 \times 1 \times (-1) \times 1 \times 1 \\ &= 0.25 \end{aligned}$

$\begin{aligned} \left.\frac{\partial f(x;w,b)}{\partial b}\right|_{x=1,w=0,b=0} &= \frac{\partial f(x;w,b)}{\partial h_6} \cdot \frac{\partial h_6}{\partial h_5} \cdot \frac{\partial h_5}{\partial h_4} \cdot \frac{\partial h_4}{\partial h_3} \cdot \frac{\partial h_3}{\partial h_2} \cdot \frac{\partial h_2}{\partial b} \\ &= 1 \times (-0.25) \times 1 \times 1 \times (-1) \times 1 \\ &= 0.25 \end{aligned}$
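This bookkeeping can be written out directly in code. Here is a small sketch, following the decomposition table above, that evaluates each $h_i$ and then multiplies the local derivatives along the path; since $f = h_6$, the leading factor $\partial f / \partial h_6 = 1$ is left implicit:

```python
import math

x, w, b = 1.0, 0.0, 0.0

# forward pass through the decomposition
h1 = x * w          # 0
h2 = h1 + b         # 0
h3 = -h2            # 0
h4 = math.exp(h3)   # 1
h5 = h4 + 1         # 2
h6 = 1 / h5         # 0.5

# local derivatives from the table
dh6_dh5 = -1 / h5 ** 2   # -0.25
dh5_dh4 = 1.0
dh4_dh3 = math.exp(h3)   # 1
dh3_dh2 = -1.0
dh2_dh1 = 1.0
dh2_db  = 1.0
dh1_dw  = x              # 1

# chain rule: multiply along each path
df_dw = dh6_dh5 * dh5_dh4 * dh4_dh3 * dh3_dh2 * dh2_dh1 * dh1_dw
df_db = dh6_dh5 * dh5_dh4 * dh4_dh3 * dh3_dh2 * dh2_db
print(df_dw, df_db)      # 0.25 0.25
```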

3. The BP (Backpropagation) Algorithm

The backpropagation algorithm uses the chain rule to update the weights at every node of the neural network.

  • Output-layer weights:
    $w_{jk} = w_{jk} - \eta \frac{\partial E}{\partial w_{jk}}$
  • Hidden-layer weights:
    $w_{ij} = w_{ij} - \eta \frac{\partial E}{\partial w_{ij}}$
  • Bias update:
    $b_j = b_j - \eta \frac{\partial E}{\partial b_j}$
where $\eta$ is the learning rate.
  We continue with the forward-propagation example and first compute the simplest derivative, that of the error $E$ with respect to $w_5$. To apply the chain rule, we first need the derivative of $E$ with respect to $out_{o1}$, then the derivative of $out_{o1}$ with respect to $net_{o1}$, and finally the derivative of $net_{o1}$ with respect to $w_5$; chaining these together yields the derivative of $E$ with respect to $w_5$, as shown in the figure below:
[Figure: the chain-rule path from $E$ back to $w_5$]

3.1 Computing the Derivatives

$E_{total} = \frac{1}{2}(target_{o1} - out_{o1})^2 + \frac{1}{2}(target_{o2} - out_{o2})^2$

$\frac{\partial E_{total}}{\partial out_{o1}} = 2 \times \frac{1}{2} \times (target_{o1} - out_{o1})^{2-1} \times (-1) + 0 = -(target_{o1} - out_{o1}) = -(0.01 - 0.7514) = 0.7414$

$out_{o1} = \frac{1}{1+e^{-net_{o1}}}$

$\frac{\partial out_{o1}}{\partial net_{o1}} = out_{o1}(1 - out_{o1}) = 0.7514 \times (1 - 0.7514) = 0.1868$

$net_{o1} = w_5 out_{h1} + w_6 out_{h2} + b_2$

$\frac{\partial net_{o1}}{\partial w_5} = out_{h1} + 0 + 0 = 0.5933$

Therefore:

$\frac{\partial E_{total}}{\partial w_5} = \frac{\partial E_{total}}{\partial out_{o1}} \cdot \frac{\partial out_{o1}}{\partial net_{o1}} \cdot \frac{\partial net_{o1}}{\partial w_5} = 0.7414 \times 0.1868 \times 0.5933 = 0.0822$
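As a quick check, the three chain-rule factors can be multiplied in code, plugging in the values from the forward pass above (a minimal sketch; the variable names are mine):

```python
t1     = 0.01     # target for o1
out_o1 = 0.7514   # forward-pass output of o1
out_h1 = 0.5933   # forward-pass output of h1

dE_dout   = -(t1 - out_o1)          # 0.7414
dout_dnet = out_o1 * (1 - out_o1)   # 0.1868
dnet_dw5  = out_h1                  # 0.5933

dE_dw5 = dE_dout * dout_dnet * dnet_dw5
print(dE_dw5)                       # ~0.0822
```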

3.2 Updating the Parameters

From the derivation above, and repeating it for the remaining output-layer weights:

$\frac{\partial E_{total}}{\partial w_5} = -(target_{o1} - out_{o1}) \cdot out_{o1}(1 - out_{o1}) \cdot out_{h1} = 0.0822$

$\frac{\partial E_{total}}{\partial w_6} = -(target_{o1} - out_{o1}) \cdot out_{o1}(1 - out_{o1}) \cdot out_{h2} = 0.0827$

$\frac{\partial E_{total}}{\partial w_7} = -(target_{o2} - out_{o2}) \cdot out_{o2}(1 - out_{o2}) \cdot out_{h1} = -0.0226$

$\frac{\partial E_{total}}{\partial w_8} = -(target_{o2} - out_{o2}) \cdot out_{o2}(1 - out_{o2}) \cdot out_{h2} = -0.0227$

With the learning rate $\eta = 0.5$:

$w_5^+ = w_5 - \eta \cdot \frac{\partial E_{total}}{\partial w_5} = 0.4 - 0.5 \times 0.0822 = 0.3589$

$w_6^+ = w_6 - \eta \cdot \frac{\partial E_{total}}{\partial w_6} = 0.45 - 0.5 \times 0.0827 = 0.4087$

$w_7^+ = w_7 - \eta \cdot \frac{\partial E_{total}}{\partial w_7} = 0.50 - 0.5 \times (-0.0226) = 0.5113$

$w_8^+ = w_8 - \eta \cdot \frac{\partial E_{total}}{\partial w_8} = 0.55 - 0.5 \times (-0.0227) = 0.5614$
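All four output-layer updates fit in a compact sketch; the error signal $\delta_o = \partial E / \partial net_o$ is factored out, which is the usual way to organize BP (values are plugged in from the forward pass; the names are mine):

```python
out_h1, out_h2 = 0.5933, 0.5969
out_o1, out_o2 = 0.7514, 0.7729
t1, t2, eta = 0.01, 0.99, 0.5

d_o1 = -(t1 - out_o1) * out_o1 * (1 - out_o1)   # ~0.1385
d_o2 = -(t2 - out_o2) * out_o2 * (1 - out_o2)   # ~-0.0381

# each gradient is delta_o times the hidden activation feeding the weight
for name, w_old, grad in [("w5", 0.40, d_o1 * out_h1),
                          ("w6", 0.45, d_o1 * out_h2),
                          ("w7", 0.50, d_o2 * out_h1),
                          ("w8", 0.55, d_o2 * out_h2)]:
    print(name, round(w_old - eta * grad, 4))
# w5 0.3589, w6 0.4087, w7 0.5113, w8 0.5614
```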

For the derivative of the error $E$ with respect to $w_1$, there is more than one path to differentiate along; the computation is shown in the figure below:
[Figure: the two chain-rule paths from $E$ back to $w_1$]

$\frac{\partial E_{total}}{\partial w_1} = \frac{\partial E_{total}}{\partial out_{h1}} \cdot \frac{\partial out_{h1}}{\partial net_{h1}} \cdot \frac{\partial net_{h1}}{\partial w_1}$

$\frac{\partial E_{total}}{\partial out_{h1}} = \frac{\partial E_{o1}}{\partial out_{h1}} + \frac{\partial E_{o2}}{\partial out_{h1}}$

$\frac{\partial E_{o1}}{\partial out_{h1}} = \frac{\partial E_{o1}}{\partial out_{o1}} \cdot \frac{\partial out_{o1}}{\partial net_{o1}} \cdot \frac{\partial net_{o1}}{\partial out_{h1}}$

$\frac{\partial E_{o2}}{\partial out_{h1}} = \frac{\partial E_{o2}}{\partial out_{o2}} \cdot \frac{\partial out_{o2}}{\partial net_{o2}} \cdot \frac{\partial net_{o2}}{\partial out_{h1}}$

$\frac{\partial E_{total}}{\partial w_1} = \left(\frac{\partial E_{o1}}{\partial out_{o1}} \cdot \frac{\partial out_{o1}}{\partial net_{o1}} \cdot \frac{\partial net_{o1}}{\partial out_{h1}} + \frac{\partial E_{o2}}{\partial out_{o2}} \cdot \frac{\partial out_{o2}}{\partial net_{o2}} \cdot \frac{\partial net_{o2}}{\partial out_{h1}}\right) \cdot \frac{\partial out_{h1}}{\partial net_{h1}} \cdot \frac{\partial net_{h1}}{\partial w_1}$

This gives:

$w_1^+ = w_1 - \eta \cdot \frac{\partial E_{total}}{\partial w_1} = 0.15 - 0.5 \times 0.000438568 = 0.149780716$

$w_2^+ = 0.19956143$

$w_3^+ = 0.24975114$

$w_4^+ = 0.29950229$
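The multi-path gradient can be verified the same way. In the sketch below (values plugged in from the forward pass, names mine), note how the contributions of the two paths are summed before the final two factors; $\partial net_{o1} / \partial out_{h1} = w_5$ and $\partial net_{o2} / \partial out_{h1} = w_7$:

```python
i1, w5, w7 = 0.05, 0.40, 0.50
out_h1, out_o1, out_o2 = 0.5933, 0.7514, 0.7729
t1, t2, eta = 0.01, 0.99, 0.5

delta_o1 = -(t1 - out_o1) * out_o1 * (1 - out_o1)  # dE_o1/dnet_o1
delta_o2 = -(t2 - out_o2) * out_o2 * (1 - out_o2)  # dE_o2/dnet_o2

dE_douth1 = delta_o1 * w5 + delta_o2 * w7          # both paths summed
dout_dnet = out_h1 * (1 - out_h1)
dnet_dw1  = i1

dE_dw1 = dE_douth1 * dout_dnet * dnet_dw1
print(dE_dw1)               # ~0.000439
print(0.15 - eta * dE_dw1)  # w1+ ~0.14978
```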

  Through the steps above, all of the weights have been updated. With the original inputs 0.05 and 0.1, the network's initial error was 0.298371109. After the first round of propagation, the total error drops to 0.291027924. After repeating the process 10,000 times, the error falls to 0.000035085, and the two output neurons produce 0.015912196 (against the target 0.01) and 0.984065734 (against the target 0.99).
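Putting the pieces together, here is a self-contained sketch of the whole training loop. It assumes, as the worked example does, a single training pair, $\eta = 0.5$, and biases that are left fixed; run it and the results should land close to the quoted numbers:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

i1, i2, t1, t2 = 0.05, 0.10, 0.01, 0.99
b1, b2, eta = 0.35, 0.60, 0.5
w = [0.15, 0.20, 0.25, 0.30, 0.40, 0.45, 0.50, 0.55]  # w1..w8

def forward(w):
    out_h1 = sigmoid(w[0] * i1 + w[1] * i2 + b1)
    out_h2 = sigmoid(w[2] * i1 + w[3] * i2 + b1)
    out_o1 = sigmoid(w[4] * out_h1 + w[5] * out_h2 + b2)
    out_o2 = sigmoid(w[6] * out_h1 + w[7] * out_h2 + b2)
    return out_h1, out_h2, out_o1, out_o2

for _ in range(10000):
    out_h1, out_h2, out_o1, out_o2 = forward(w)
    # output-layer error signals dE/dnet_o
    d_o1 = -(t1 - out_o1) * out_o1 * (1 - out_o1)
    d_o2 = -(t2 - out_o2) * out_o2 * (1 - out_o2)
    # hidden-layer error signals: both output paths summed
    d_h1 = (d_o1 * w[4] + d_o2 * w[6]) * out_h1 * (1 - out_h1)
    d_h2 = (d_o1 * w[5] + d_o2 * w[7]) * out_h2 * (1 - out_h2)
    # gradients for w1..w8, then a simultaneous gradient-descent step
    grads = [d_h1 * i1, d_h1 * i2, d_h2 * i1, d_h2 * i2,
             d_o1 * out_h1, d_o1 * out_h2, d_o2 * out_h1, d_o2 * out_h2]
    w = [wi - eta * g for wi, g in zip(w, grads)]

_, _, out_o1, out_o2 = forward(w)
E = 0.5 * (t1 - out_o1) ** 2 + 0.5 * (t2 - out_o2) ** 2
print(E, out_o1, out_o2)  # error ~3.5e-5, outputs near 0.0159 and 0.9841
```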
