Derivation of the Backpropagation (BP) Algorithm

This post is my own reorganization and summary based on another article; if anything here is unclear, please refer to that original.

1 Preliminaries

1.1 Derivative of the Sigmoid activation function

The Sigmoid function is defined as:

$$\sigma(x) = \frac{1}{1 + e^{-x}}$$

Its derivative is:

$$\begin{aligned} \frac{d\sigma(x)}{dx} &= \frac{d}{dx}\left( \frac{1}{1+e^{-x}} \right) \\ &= \frac{e^{-x}}{\left( 1+e^{-x} \right)^2} = \frac{\left( 1+e^{-x} \right) - 1}{\left( 1+e^{-x} \right)^2} \\ &= \frac{1+e^{-x}}{\left( 1+e^{-x} \right)^2} - \left( \frac{1}{1+e^{-x}} \right)^2 \\ &= \sigma(x) - \sigma(x)^2 = \sigma(x)\left( 1 - \sigma(x) \right) \end{aligned}$$
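As a quick numerical sanity check, here is a minimal Python sketch (the function names are my own) comparing the analytic derivative $\sigma(x)(1-\sigma(x))$ with a central finite-difference approximation:

```python
import numpy as np

def sigmoid(x):
    """Sigmoid activation: 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    """Analytic derivative: sigma(x) * (1 - sigma(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)

# Compare against a central finite difference at an arbitrary point.
x, eps = 0.3775, 1e-6
numeric = (sigmoid(x + eps) - sigmoid(x - eps)) / (2 * eps)
print(sigmoid_grad(x), numeric)  # both ~0.2413; they agree to ~1e-10
```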

2 Network structure

$i$ denotes the input layer, while $h$ and $o$ are two fully connected layers.

The network's weights $w$ and biases $b$ are initialized as shown in the figure below:

3 Forward pass

1. For the $h$ layer:

1. Compute the total input to node $h_1$:
$$\begin{aligned} net_{h1} &= w_1 \times i_1 + w_2 \times i_2 + b_1 \times 1 \\ &= 0.15 \times 0.05 + 0.2 \times 0.1 + 0.35 \times 1 \\ &= 0.3775 \end{aligned}$$
2. Compute the output of node $h_1$. The formula for node $h_1$ is $out_{h1} = \sigma(wx + b)$, where $x$ is the node's input (here, $i$), $w$ is the weight, $b$ is the bias, and $\sigma$ is the activation function (here, the Sigmoid function from Section 1.1). The output of node $h_1$ is:
$$\begin{aligned} out_{h1} &= \sigma(wx + b) \\ &= \sigma(net_{h1}) \\ &= \frac{1}{1+e^{-net_{h1}}} = \frac{1}{1+e^{-0.3775}} \\ &= 0.593269992 \end{aligned}$$
3. By the same method, $out_{h2} = 0.596884378$.

2. For the $o$ layer:

1. Repeat the same process for the $o$ layer:
$$\begin{aligned} net_{o1} &= w_5 \times out_{h1} + w_6 \times out_{h2} + b_2 \times 1 \\ &= 0.4 \times 0.593269992 + 0.45 \times 0.596884378 + 0.6 \\ &= 1.105905967 \end{aligned}$$
The output is then:
$$out_{o1} = \frac{1}{1+e^{-net_{o1}}} = 0.75136507$$
Similarly, $out_{o2} = 0.772929456$. (A code sketch of the full forward pass follows below.)
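The following NumPy sketch reproduces the forward pass above. Note that the figure with the initial values is not reproduced in this text, so $w_3$, $w_4$, $w_7$, and $w_8$ below are assumptions taken from the standard version of this example, chosen to be consistent with the $out_{h2}$ and $out_{o2}$ values quoted above:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Inputs, weights, and biases from the worked example.
# w3, w4, w7, w8 are assumed (the figure is not reproduced here).
i1, i2 = 0.05, 0.10
w1, w2, w3, w4 = 0.15, 0.20, 0.25, 0.30   # h-layer weights (w3, w4 assumed)
w5, w6, w7, w8 = 0.40, 0.45, 0.50, 0.55   # o-layer weights (w7, w8 assumed)
b1, b2 = 0.35, 0.60

# Hidden layer
net_h1 = w1 * i1 + w2 * i2 + b1 * 1
net_h2 = w3 * i1 + w4 * i2 + b1 * 1
out_h1, out_h2 = sigmoid(net_h1), sigmoid(net_h2)

# Output layer
net_o1 = w5 * out_h1 + w6 * out_h2 + b2 * 1
net_o2 = w7 * out_h1 + w8 * out_h2 + b2 * 1
out_o1, out_o2 = sigmoid(net_o1), sigmoid(net_o2)

print(out_h1, out_h2)  # ~0.593269992, ~0.596884378
print(out_o1, out_o2)  # ~0.75136507,  ~0.7729285
```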

4 Computing the error (Loss)

Here we use the mean squared error loss: $E_{total} = \frac{1}{2}\sum_{k=1}^{K}{(y_k - o_k)^2}$, where $y_k$ is the target (expected) value and $o_k$ is the output value.

As in Figure 2 above, for node $o_1$ the target value is 0.01, while the network's forward-pass output is 0.75136507, so its error is:
$$\begin{aligned} E_{o1} &= \frac{1}{2}(target - output)^2 = \frac{1}{2} \times (0.01 - 0.75136507)^2 \\ &= 0.274811 \end{aligned}$$
Similarly, $E_{o2} = 0.023560026$.

The total error is therefore: $E_{total} = E_{o1} + E_{o2} = 0.274811 + 0.023560026 = 0.298371$
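In code, continuing from the forward-pass sketch (here $target_{o2} = 0.99$ is an assumption read from the missing figure, consistent with the quoted $E_{o2}$):

```python
# target_o1 = 0.01 is given in the text; target_o2 = 0.99 is assumed.
target_o1, target_o2 = 0.01, 0.99

E_o1 = 0.5 * (target_o1 - out_o1) ** 2   # ~0.274811
E_o2 = 0.5 * (target_o2 - out_o2) ** 2   # ~0.023560
E_total = E_o1 + E_o2                    # ~0.298371
```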

5 Backpropagation

1. For the output layer (the $o$ layer)

For $w_5$, we want to know how much a change in it affects the total error, so we need to compute $\frac{\partial E_{total}}{\partial w_5}$.

By the chain rule: $\frac{\partial E_{total}}{\partial w_5} = \frac{\partial E_{total}}{\partial out_{o1}} \times \frac{\partial out_{o1}}{\partial net_{o1}} \times \frac{\partial net_{o1}}{\partial w_5}$

1. For $\frac{\partial E_{total}}{\partial out_{o1}}$:

$$\begin{aligned} E_{total} &= \frac{1}{2}(target_{o1} - out_{o1})^2 + \frac{1}{2}(target_{o2} - out_{o2})^2 \\ \frac{\partial E_{total}}{\partial out_{o1}} &= 2 \times \frac{1}{2}(target_{o1} - out_{o1})^{2-1} \times (-1) + 0 \\ &= -(target_{o1} - out_{o1}) \\ &= -(0.01 - 0.75136507) = 0.741365 \end{aligned}$$
2. For $\frac{\partial out_{o1}}{\partial net_{o1}}$:
$$\begin{aligned} out_{o1} &= \frac{1}{1+e^{-net_{o1}}} \\ \frac{\partial out_{o1}}{\partial net_{o1}} &= out_{o1}(1 - out_{o1}) = 0.186815602 \end{aligned}$$
3. For $\frac{\partial net_{o1}}{\partial w_5}$:
$$\begin{aligned} net_{o1} &= w_5 \times out_{h1} + w_6 \times out_{h2} + b_2 \times 1 \\ \frac{\partial net_{o1}}{\partial w_5} &= 1 \times out_{h1} \times w_5^{(1-1)} + 0 + 0 = 0.593269992 \end{aligned}$$
Putting these together:
$$\begin{aligned} \frac{\partial E_{total}}{\partial w_5} &= \frac{\partial E_{total}}{\partial out_{o1}} \times \frac{\partial out_{o1}}{\partial net_{o1}} \times \frac{\partial net_{o1}}{\partial w_5} \\ &= 0.741365 \times 0.186815602 \times 0.593269992 \\ &= 0.082167 \end{aligned}$$
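The three chain-rule factors translate directly into code, reusing the quantities from the earlier sketches:

```python
# Chain-rule factors for dE_total/dw5.
dE_dout_o1   = -(target_o1 - out_o1)      # = 0.741365...
dout_dnet_o1 = out_o1 * (1.0 - out_o1)    # = 0.186815602
dnet_o1_dw5  = out_h1                     # = 0.593269992

dE_dw5 = dE_dout_o1 * dout_dnet_o1 * dnet_o1_dw5   # ~0.082167
```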

Next, an optimizer uses this value to adjust the weight $w_5$ (for more on optimizers, see: Link). Here we use the most basic method, standard gradient descent (GD), with learning rate $\eta = 0.5$:
$$w_5^{+} = w_5 - \eta \times \frac{\partial E_{total}}{\partial w_5} = 0.4 - 0.5 \times 0.082167041 = 0.358916$$
This gives the updated value $w_5^{+}$ of the weight $w_5$.
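The update step itself is one line:

```python
eta = 0.5                    # learning rate from the example
w5_new = w5 - eta * dE_dw5   # 0.4 - 0.5 * 0.082167... ~ 0.358916
```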

Repeating the same steps yields $w_6^{+}$, $w_7^{+}$, and $w_8^{+}$.

2. For the hidden layer (the $h$ layer)

The procedure for the $h$ layer is similar to that for the output layer; for $w_1$ we need to compute: $\frac{\partial E_{total}}{\partial w_1} = \frac{\partial E_{total}}{\partial out_{h1}} \times \frac{\partial out_{h1}}{\partial net_{h1}} \times \frac{\partial net_{h1}}{\partial w_1}$. The one new wrinkle is that $out_{h1}$ feeds both output nodes, so $\frac{\partial E_{total}}{\partial out_{h1}} = \frac{\partial E_{o1}}{\partial out_{h1}} + \frac{\partial E_{o2}}{\partial out_{h1}}$.

The remaining steps are analogous and are not repeated here; they yield $w_1^{+}$, $w_2^{+}$, $w_3^{+}$, and $w_4^{+}$ (see the sketch below).
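A sketch of the hidden-layer gradient, again reusing values from the earlier snippets (including the assumed weight $w_7$):

```python
# Error signals ("deltas") at the output nodes: delta = dE/dnet.
delta_o1 = -(target_o1 - out_o1) * out_o1 * (1.0 - out_o1)
delta_o2 = -(target_o2 - out_o2) * out_o2 * (1.0 - out_o2)

# out_h1 feeds both o1 and o2, so its gradient sums both contributions.
dE_dout_h1   = delta_o1 * w5 + delta_o2 * w7
dout_dnet_h1 = out_h1 * (1.0 - out_h1)
dnet_h1_dw1  = i1

dE_dw1 = dE_dout_h1 * dout_dnet_h1 * dnet_h1_dw1   # ~0.000438568
w1_new = w1 - eta * dE_dw1                          # ~0.149780716
```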

In summary, by repeating the operations above, the parameters of every layer in the network are updated, which is what training accomplishes.


During actual training, the weights are updated via the procedure above after every batch. After many epochs the network converges (i.e., the error becomes very small), the forward-pass outputs are close to the expected values, and training of the network is complete.
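Finally, a minimal end-to-end sketch: the loop below repeats forward pass, backward pass, and gradient-descent update for the same 2-2-2 network until the error is tiny. It is written in plain NumPy under the same assumptions as the earlier sketches (weight values as assumed above; biases held fixed, matching the derivation, which never updates them; all names are my own):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.array([0.05, 0.10])                   # inputs i1, i2
t = np.array([0.01, 0.99])                   # targets (0.99 assumed)
W1 = np.array([[0.15, 0.20], [0.25, 0.30]])  # rows: h1, h2
W2 = np.array([[0.40, 0.45], [0.50, 0.55]])  # rows: o1, o2
b1, b2 = 0.35, 0.60
eta = 0.5

for epoch in range(10000):
    # Forward pass
    out_h = sigmoid(W1 @ x + b1)
    out_o = sigmoid(W2 @ out_h + b2)
    # Backward pass: deltas are dE/dnet at each layer
    delta_o = -(t - out_o) * out_o * (1 - out_o)
    delta_h = (W2.T @ delta_o) * out_h * (1 - out_h)
    # Gradient descent step (biases kept fixed, as in the example)
    W2 -= eta * np.outer(delta_o, out_h)
    W1 -= eta * np.outer(delta_h, x)

print(out_o)                            # close to the targets [0.01, 0.99]
print(0.5 * np.sum((t - out_o) ** 2))   # E_total, now very small
```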
