MLP Backpropagation

一、The Chain Rule

  1. The chain rule is used to differentiate composite functions, and it underlies the backpropagation algorithm in neural networks.
  2. Chain rule: $\frac{dy}{dx}=\frac{dy}{du}\cdot\frac{du}{dx}$
  3. Applying the chain rule in a neural network:
    [Figure: the chain rule applied across the layers of a network]
    $\frac{\partial E}{\partial w_{jk}^{1}}=\frac{\partial E}{\partial O_{k}^{1}}\frac{\partial O_{k}^{1}}{\partial w_{jk}^{1}}=\frac{\partial E}{\partial O_{k}^{2}}\frac{\partial O_{k}^{2}}{\partial O_{k}^{1}}\frac{\partial O_{k}^{1}}{\partial w_{jk}^{1}}$
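As a quick sanity check on the chain rule, the short Python sketch below compares the analytic derivative $\frac{dy}{dx}=\frac{dy}{du}\cdot\frac{du}{dx}$ with a central finite difference. The function $y=\sin(x^2)$ (so $u=x^2$) is an illustrative choice, not from the text above:

```python
import math

# y = sin(u) with u = x**2, so dy/dx = cos(x**2) * 2*x by the chain rule.
def y(x):
    return math.sin(x ** 2)

x = 1.3
analytic = math.cos(x ** 2) * 2 * x        # dy/du * du/dx
h = 1e-6
numeric = (y(x + h) - y(x - h)) / (2 * h)  # central finite difference

print(analytic, numeric)  # the two values agree to ~1e-9
```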

二、Intuition for Backpropagation (adapted from a Zhihu post)

  1. Take the partial derivatives of $e=(a+b)*(b+1)$ as an example. Introducing the intermediate nodes $c=a+b$ and $d=b+1$, its computational graph is as follows:
    [Figure: computational graph of $e=(a+b)*(b+1)$]
    At $a=2$, $b=1$, the gradient of $e$ can be expressed in terms of the partial derivatives along each edge:
    [Figure: local partial derivatives attached to each edge of the graph]
  2. Derivation via the chain rule:
    1. $\frac{\partial e}{\partial a}=\frac{\partial e}{\partial c}\frac{\partial c}{\partial a}=2*1=2$
    2. $\frac{\partial e}{\partial b}=\frac{\partial e}{\partial c}\frac{\partial c}{\partial b}+\frac{\partial e}{\partial d}\frac{\partial d}{\partial b}=2*1+3*1=5$
  3. The general pattern:
    1. To compute a partial derivative, multiply the local derivatives along each path from the output node down to the leaf node in question, then sum the products over all such paths.
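The two gradients derived above can be reproduced by walking the computational graph in code. A minimal Python sketch, with variables `c` and `d` named after the intermediate nodes:

```python
# Graph: e = c * d, with c = a + b and d = b + 1, evaluated at a = 2, b = 1.
a, b = 2.0, 1.0
c = a + b   # 3
d = b + 1   # 2
e = c * d   # 6

# Local derivatives at the top node.
de_dc = d   # ∂e/∂c = d = 2
de_dd = c   # ∂e/∂d = c = 3

# Multiply along each path from e to the leaf, then sum over the paths.
de_da = de_dc * 1.0                  # single path: e -> c -> a
de_db = de_dc * 1.0 + de_dd * 1.0   # two paths: e -> c -> b and e -> d -> b

print(de_da, de_db)  # 2.0 5.0
```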

三、A Worked Example

1. Problem Setup

  1. Consider the following network:
    [Figure: a 2-2-2 network with weights $w_1$–$w_8$ and biases $b_1$, $b_2$]
  • Layer 1: input layer; layer 2: hidden layer; layer 3: output layer.
  • Activation function: sigmoid.
  2. With concrete initial values assigned:
    [Figure: the same network annotated with its initial weights and biases]
  3. Goal: given the inputs $i_1, i_2 = (0.05, 0.10)$, make the outputs as close as possible to the targets $o_1, o_2 = (0.01, 0.99)$.

2. Forward Pass

  1. Input layer → hidden layer
    1. Weighted sum of the inputs to neuron h1: $net_{h1}=w_1*i_1+w_2*i_2+b_1=0.15*0.05+0.20*0.10+0.35=0.3775$
    2. Output of h1:
      $out_{h1}=\frac{1}{1+e^{-net_{h1}}}=\frac{1}{1+e^{-0.3775}}=0.593269992$
    3. Likewise, the output of h2:
      $out_{h2}=0.596884378$
  2. Hidden layer → output layer
    1. Values of the output neurons o1 and o2:
      $net_{o1}=w_5*out_{h1}+w_6*out_{h2}+b_2=0.4*0.593269992+0.45*0.596884378+0.6=1.105905967$
      $out_{o1}=\frac{1}{1+e^{-net_{o1}}}=\frac{1}{1+e^{-1.105905967}}=0.75136507$
      $out_{o2}=0.772928465$
  3. Compare the outputs with the targets
    1. The forward pass produces the outputs [0.75136507, 0.772928465], far from the targets [0.01, 0.99]. We now propagate the error backward to update the weights $w$ and recompute the outputs.
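The forward pass above can be reproduced in a few lines of Python. The text only quotes some of the weights; the full set (w1 = 0.15, w2 = 0.20, w3 = 0.25, w4 = 0.30, w5 = 0.40, w6 = 0.45, w7 = 0.50, w8 = 0.55, b1 = 0.35, b2 = 0.60) is assumed from the figure, and it does reproduce every number quoted above:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Inputs, weights, and biases (w3, w4, w7, w8 assumed from the figure).
i1, i2 = 0.05, 0.10
w1, w2, w3, w4 = 0.15, 0.20, 0.25, 0.30
w5, w6, w7, w8 = 0.40, 0.45, 0.50, 0.55
b1, b2 = 0.35, 0.60

# Input -> hidden
out_h1 = sigmoid(w1 * i1 + w2 * i2 + b1)   # ≈ 0.593269992
out_h2 = sigmoid(w3 * i1 + w4 * i2 + b1)   # ≈ 0.596884378

# Hidden -> output
out_o1 = sigmoid(w5 * out_h1 + w6 * out_h2 + b2)   # ≈ 0.75136507
out_o2 = sigmoid(w7 * out_h1 + w8 * out_h2 + b2)   # ≈ 0.772928465

print(out_o1, out_o2)
```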

3. Backward Pass

  1. Compute the total error:
    1. $E_{total}=\sum{\frac{1}{2}(target-output)^{2}}$
    2. $E_{o1}=\frac{1}{2}(target_{o1}-out_{o1})^2=\frac{1}{2}(0.01-0.75136507)^2=0.274811083$
    3. $E_{o2}=0.023560026$
    4. $E_{total}=E_{o1}+E_{o2}=0.274811083+0.023560026=0.298371109$
  2. Hidden layer → output layer weight update:
    1. Take the weight $w_5$ as an example: the partial derivative of the total error with respect to $w_5$ tells us how much $w_5$ contributes to the total error.
    2. $\frac{\partial E_{total}}{\partial w_5}=\frac{\partial E_{total}}{\partial out_{o1}}\frac{\partial out_{o1}}{\partial net_{o1}}\frac{\partial net_{o1}}{\partial w_5}$, as illustrated below
      [Figure: the backpropagation path from $E_{total}$ to $w_5$]
    3. Compute $\frac{\partial E_{total}}{\partial out_{o1}}$
      1. $E_{total}=\frac{1}{2}(target_{o1}-out_{o1})^2+\frac{1}{2}(target_{o2}-out_{o2})^2$
      2. $\frac{\partial E_{total}}{\partial out_{o1}}=-(target_{o1}-out_{o1})=-(0.01-0.75136507)=0.74136507$
    4. Compute $\frac{\partial out_{o1}}{\partial net_{o1}}$
      1. $out_{o1}=\frac{1}{1+e^{-net_{o1}}}$
      2. $\frac{\partial out_{o1}}{\partial net_{o1}}=out_{o1}(1-out_{o1})=0.75136507(1-0.75136507)=0.186815602$
    5. Compute $\frac{\partial net_{o1}}{\partial w_{5}}$
      1. $net_{o1}=w_5*out_{h1}+w_6*out_{h2}+b_2$
      2. $\frac{\partial net_{o1}}{\partial w_{5}}=out_{h1}=0.593269992$
    6. Multiplying the three factors: $\frac{\partial E_{total}}{\partial w_5}=0.74136507*0.186815602*0.593269992=0.082167041$
    7. In symbolic form, $\frac{\partial E_{total}}{\partial w_5}=-(target_{o1}-out_{o1})*out_{o1}(1-out_{o1})*out_{h1}$
      1. Let $\delta_{o1}$ denote the output-layer error term, i.e. $\delta_{o1}=\frac{\partial E_{total}}{\partial out_{o1}}\frac{\partial out_{o1}}{\partial net_{o1}}=\frac{\partial E_{total}}{\partial net_{o1}}=-(target_{o1}-out_{o1})*out_{o1}(1-out_{o1})$
      2. Then $\frac{\partial E_{total}}{\partial w_5}=\delta_{o1}\,out_{h1}$
    8. Finally, update $w_5$:
      1. $w_5^+=w_5-\eta*\frac{\partial E_{total}}{\partial w_5}$
      2. where $\eta$ is the learning rate
  3. Input layer → hidden layer weight update:
    The method is the same as above, except that $out_{h1}$ receives error from both $E_{o1}$ and $E_{o2}$, so both contributions must be summed when computing the partial derivative with respect to $w_1$.
    [Figure: error flowing back to h1 from both output neurons]
    1. Compute $\frac{\partial E_{total}}{\partial w_{1}}=\frac{\partial E_{total}}{\partial out_{h1}}\frac{\partial out_{h1}}{\partial net_{h1}}\frac{\partial net_{h1}}{\partial w_{1}}$
    2. Expanding the first factor over both output neurons: $\frac{\partial E_{total}}{\partial w_{1}}=\left(\sum_o\frac{\partial E_{total}}{\partial out_o}\frac{\partial out_o}{\partial net_o}\frac{\partial net_o}{\partial out_{h1}}\right)*\frac{\partial out_{h1}}{\partial net_{h1}}*\frac{\partial net_{h1}}{\partial w_{1}}=\left(\sum_o\delta_o w_{ho}\right)*out_{h1}(1-out_{h1})*i_1=\delta_{h1}i_1$
    3. Finally, update $w_1$:
      1. $w_1^+=w_1-\eta*\frac{\partial E_{total}}{\partial w_1}$
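Putting the whole section together, one full training step (forward pass, output-layer deltas, hidden-layer deltas, weight updates) can be sketched as below. The learning rate $\eta=0.5$ and the weights w3, w4, w7, w8 are assumptions not stated in the text above:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Setup (w3, w4, w7, w8 and eta = 0.5 are assumed, as noted above).
i1, i2 = 0.05, 0.10
w1, w2, w3, w4 = 0.15, 0.20, 0.25, 0.30
w5, w6, w7, w8 = 0.40, 0.45, 0.50, 0.55
b1, b2 = 0.35, 0.60
t1, t2 = 0.01, 0.99
eta = 0.5

# Forward pass.
out_h1 = sigmoid(w1 * i1 + w2 * i2 + b1)
out_h2 = sigmoid(w3 * i1 + w4 * i2 + b1)
out_o1 = sigmoid(w5 * out_h1 + w6 * out_h2 + b2)
out_o2 = sigmoid(w7 * out_h1 + w8 * out_h2 + b2)

# Output-layer deltas: delta_o = -(target - out) * out * (1 - out).
delta_o1 = -(t1 - out_o1) * out_o1 * (1 - out_o1)
delta_o2 = -(t2 - out_o2) * out_o2 * (1 - out_o2)

# Gradient and update for w5: dE/dw5 = delta_o1 * out_h1.
dE_dw5 = delta_o1 * out_h1   # ≈ 0.082167041
w5_new = w5 - eta * dE_dw5   # ≈ 0.35891648

# Hidden-layer delta for h1 sums the error arriving from o1 and o2.
delta_h1 = (delta_o1 * w5 + delta_o2 * w7) * out_h1 * (1 - out_h1)
dE_dw1 = delta_h1 * i1
w1_new = w1 - eta * dE_dw1   # ≈ 0.149780716

print(w5_new, w1_new)
```

Repeating this step drives the outputs toward the targets [0.01, 0.99], as the text describes.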