Forward Propagation and Backpropagation




Preface

Forward propagation is the process in which, at every layer of a neural network, the inputs are multiplied by the weights and summed, a bias term is added, and the result is passed through an activation function to produce an output; that output then serves as the input to the next layer, and this repeats until the final output is obtained.
Backpropagation is the process in which the network uses the final output to compute how much each layer's weights influence that output (measured by partial derivatives) and then, following the principle of gradient descent, subtracts the learning rate times the partial derivative from each current weight, thereby updating the weights.
Below I walk through both processes with an example.

Suppose we have a three-layer neural network:

[Figure: the three-layer network, with inputs i1 and i2, hidden neurons h1 and h2, and outputs o1 and o2]
i1 and i2 are the input-layer neurons, h1 and h2 the hidden-layer neurons, and o1 and o2 the output-layer neurons; b1 is the bias from the input layer to the hidden layer, b2 the bias from the hidden layer to the output layer, and the w's are the weights from each layer to the next. Every layer has two neurons, and the activation function is the Sigmoid function. We then assign each parameter an initial value.

[Figure: the same network annotated with the initial parameter values]
where:
$$\begin{aligned} \text{Input data:} \quad & i_1 = 0.05,\; i_2 = 0.10 \\ \text{Target outputs:} \quad & o_1 = 0.01,\; o_2 = 0.99 \\ \text{Initial weights:} \quad & w_1 = 0.15,\; w_2 = 0.20,\; w_3 = 0.25,\; w_4 = 0.30 \\ & w_5 = 0.40,\; w_6 = 0.45,\; w_7 = 0.50,\; w_8 = 0.55 \\ \text{Biases:} \quad & b_1 = 0.35,\; b_2 = 0.60 \end{aligned}$$
Goal: given the inputs i1 = 0.05 and i2 = 0.10, make the final outputs as close as possible to the target outputs o1 = 0.01 and o2 = 0.99.

Forward Propagation

  1. Input layer -----> hidden layer
    Compute the input to hidden neuron h1:
    $$\begin{aligned} net_{h1} &= i_1 \cdot w_1 + i_2 \cdot w_2 + b_1 \cdot 1 \\ &= 0.05 \times 0.15 + 0.10 \times 0.20 + 0.35 \times 1 \\ &= 0.3775 \end{aligned}$$
    The output of neuron h1:
    $$out_{h1} = \dfrac{1}{1+e^{-net_{h1}}} = \dfrac{1}{1+e^{-0.3775}} = 0.59326999$$
    Likewise, the output of neuron h2 is:
    $$out_{h2} = 0.59688438$$
  2. Hidden layer -----> output layer
    Compute the input to output neuron o1:
    $$\begin{aligned} net_{o1} &= out_{h1} \cdot w_5 + out_{h2} \cdot w_6 + b_2 \cdot 1 \\ &= 0.59326999 \times 0.40 + 0.59688438 \times 0.45 + 0.60 \times 1 \\ &= 1.10590597 \end{aligned}$$
    The output of neuron o1:
    $$out_{o1} = \dfrac{1}{1+e^{-net_{o1}}} = \dfrac{1}{1+e^{-1.10590597}} = 0.75136507$$
    Likewise, the output of neuron o2 is:
    $$out_{o2} = 0.77292847$$
    This completes the forward pass: the network outputs o1 = 0.75136507 and o2 = 0.77292847, still far from the targets o1 = 0.01 and o2 = 0.99. Next we propagate the error backward, update the weights, and recompute the outputs.
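    We can check the forward pass with a few lines of Python (a minimal sketch of the arithmetic above; the variable names are mine, not the article's):

import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

i1, i2, b1, b2 = 0.05, 0.10, 0.35, 0.60
w1, w2, w3, w4 = 0.15, 0.20, 0.25, 0.30
w5, w6, w7, w8 = 0.40, 0.45, 0.50, 0.55

# input layer -> hidden layer
out_h1 = sigmoid(i1 * w1 + i2 * w2 + b1)          # 0.59326999
out_h2 = sigmoid(i1 * w3 + i2 * w4 + b1)          # 0.59688438

# hidden layer -> output layer
out_o1 = sigmoid(out_h1 * w5 + out_h2 * w6 + b2)  # 0.75136507
out_o2 = sigmoid(out_h1 * w7 + out_h2 * w8 + b2)  # 0.77292847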

Backpropagation

  1. Compute the total error
    The total error is the sum of the per-output errors:
    $$\begin{aligned} E_{total} &= \sum \dfrac{1}{2}(target - output)^2 = E_{o1} + E_{o2} \\ E_{o1} &= \dfrac{1}{2}(target_{o1} - out_{o1})^2 = \dfrac{1}{2}(0.01 - 0.75136507)^2 = 0.27481108 \\ E_{o2} &= 0.02356003 \\ E_{total} &= E_{o1} + E_{o2} = 0.29837111 \end{aligned}$$
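    Continuing the same sketch, these error values check out numerically:

target_o1, target_o2 = 0.01, 0.99

E_o1 = 0.5 * (target_o1 - out_o1) ** 2  # 0.27481108
E_o2 = 0.5 * (target_o2 - out_o2) ** 2  # 0.02356003
E_total = E_o1 + E_o2                   # 0.29837111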

  2. Hidden-layer -----> output-layer weight updates
    Take the weight $w_7$ as an example and compute its effect on the overall error, using the chain rule to split the derivative.
    The figure below shows how the error propagates backward:
    [Figure: the error flowing backward from $E_{total}$ through $out_{o2}$ and $net_{o2}$ to $w_7$]

    $$\dfrac{\partial E_{total}}{\partial w_7} = \dfrac{\partial E_{total}}{\partial out_{o2}} \times \dfrac{\partial out_{o2}}{\partial net_{o2}} \times \dfrac{\partial net_{o2}}{\partial w_7}$$

    Compute $\dfrac{\partial E_{total}}{\partial out_{o2}}$:
    $$\begin{aligned} E_{total} &= \sum \dfrac{1}{2}(target - output)^2 \\ &= \dfrac{1}{2}(target_{o1} - out_{o1})^2 + \dfrac{1}{2}(target_{o2} - out_{o2})^2 \\ \dfrac{\partial E_{total}}{\partial out_{o2}} &= \dfrac{1}{2} \times 2 \times (target_{o2} - out_{o2}) \times (-1) \\ &= -(target_{o2} - out_{o2}) \\ &= -(0.99 - 0.77292847) \\ &= -0.21707153 \end{aligned}$$
    Compute $\dfrac{\partial out_{o2}}{\partial net_{o2}}$:
    $$\begin{aligned} out_{o2} &= \dfrac{1}{1+e^{-net_{o2}}} \\ \dfrac{\partial out_{o2}}{\partial net_{o2}} &= out_{o2}(1 - out_{o2}) \quad \text{(the derivative of the Sigmoid function)} \\ &= 0.77292847 \times (1 - 0.77292847) \\ &= 0.17551005 \end{aligned}$$
    Compute $\dfrac{\partial net_{o2}}{\partial w_7}$:
    $$\begin{aligned} net_{o2} &= out_{h1} \times w_7 + out_{h2} \times w_8 + b_2 \times 1 \\ \dfrac{\partial net_{o2}}{\partial w_7} &= out_{h1} = 0.59326999 \end{aligned}$$

    Multiplying the three factors:
    $$\begin{aligned} \dfrac{\partial E_{total}}{\partial w_7} &= \dfrac{\partial E_{total}}{\partial out_{o2}} \times \dfrac{\partial out_{o2}}{\partial net_{o2}} \times \dfrac{\partial net_{o2}}{\partial w_7} \\ &= -0.21707153 \times 0.17551005 \times 0.59326999 \\ &= -0.02260254 \end{aligned}$$
    This is the partial derivative of the total error $E_{total}$ with respect to $w_7$.

    Looking back at the formula above, we notice that:
    $$\dfrac{\partial E_{total}}{\partial w_7} = -(target_{o2} - out_{o2}) \cdot out_{o2}(1 - out_{o2}) \cdot out_{h1}$$
    Now let $\delta_{o2}$ denote the error term of output neuron o2:
    $$\begin{aligned} \delta_{o2} &= \dfrac{\partial E_{total}}{\partial out_{o2}} \times \dfrac{\partial out_{o2}}{\partial net_{o2}} = \dfrac{\partial E_{total}}{\partial net_{o2}} \\ &= -(target_{o2} - out_{o2}) \cdot out_{o2}(1 - out_{o2}) \end{aligned}$$
    so the partial derivative of $E_{total}$ with respect to $w_7$ can be written compactly as:
    $$\dfrac{\partial E_{total}}{\partial w_7} = \delta_{o2} \cdot out_{h1}$$

    Now we can update $w_7$; let the learning rate be $\eta = 0.5$:
    $$\begin{aligned} w_7^* &= w_7 - \eta \times \dfrac{\partial E_{total}}{\partial w_7} \\ &= 0.5 - 0.5 \times (-0.02260254) \\ &= 0.51130127 \end{aligned}$$
    Likewise, update $w_5$, $w_6$, and $w_8$:
    $$\begin{aligned} w_5^* &= 0.35891648 \\ w_6^* &= 0.40866619 \\ w_8^* &= 0.56137012 \end{aligned}$$
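    Extending the sketch checks all four updates (eta is the learning rate $\eta$; delta_o1 and delta_o2 are the $\delta$ terms defined above):

eta = 0.5

# delta of each output neuron: -(target - out) * out * (1 - out)
delta_o1 = -(target_o1 - out_o1) * out_o1 * (1 - out_o1)
delta_o2 = -(target_o2 - out_o2) * out_o2 * (1 - out_o2)

# gradient = downstream delta * upstream output; update = old weight - eta * gradient
w5_new = w5 - eta * delta_o1 * out_h1  # 0.35891648
w6_new = w6 - eta * delta_o1 * out_h2  # 0.40866619
w7_new = w7 - eta * delta_o2 * out_h1  # 0.51130127
w8_new = w8 - eta * delta_o2 * out_h2  # 0.56137012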

  3. Input-layer -----> hidden-layer weight updates (taking $w_3$ as the example)
    Updating the input-to-hidden weights follows much the same idea as the hidden-to-output updates above. The difference is that the hidden-to-output chain runs $out_{o2} \rightarrow net_{o2} \rightarrow w_7$, whereas the input-to-hidden chain runs $out_{h2} \rightarrow net_{h2} \rightarrow w_3$, and $out_{h2}$ receives error from both $E_{o1}$ and $E_{o2}$, so both contributions must be computed. The figure below shows the directions along which the error flows.
    [Figure: both error paths, from $E_{o1}$ and $E_{o2}$, converging on $out_{h2}$]
    $$\dfrac{\partial E_{total}}{\partial w_3} = \dfrac{\partial E_{total}}{\partial out_{h2}} \cdot \dfrac{\partial out_{h2}}{\partial net_{h2}} \cdot \dfrac{\partial net_{h2}}{\partial w_3}$$

Compute $\dfrac{\partial E_{total}}{\partial out_{h2}}$:
$$\begin{aligned} E_{total} &= E_{o1} + E_{o2} \\ \dfrac{\partial E_{total}}{\partial out_{h2}} &= \dfrac{\partial E_{o1}}{\partial out_{h2}} + \dfrac{\partial E_{o2}}{\partial out_{h2}} \end{aligned}$$
Then compute $\dfrac{\partial E_{o1}}{\partial out_{h2}}$ and $\dfrac{\partial E_{o2}}{\partial out_{h2}}$:
$$\dfrac{\partial E_{o1}}{\partial out_{h2}} = \dfrac{\partial E_{o1}}{\partial out_{o1}} \cdot \dfrac{\partial out_{o1}}{\partial net_{o1}} \cdot \dfrac{\partial net_{o1}}{\partial out_{h2}}$$
Computing each of the three factors in turn:
$$\begin{aligned} E_{o1} &= \dfrac{1}{2}(target_{o1} - out_{o1})^2 \\ \dfrac{\partial E_{o1}}{\partial out_{o1}} &= \dfrac{1}{2} \times 2 \times (target_{o1} - out_{o1}) \times (-1) \\ &= -(0.01 - 0.75136507) = 0.74136507 \\ out_{o1} &= \dfrac{1}{1+e^{-net_{o1}}} \\ \dfrac{\partial out_{o1}}{\partial net_{o1}} &= out_{o1}(1 - out_{o1}) \\ &= 0.75136507 \times (1 - 0.75136507) = 0.18681560 \\ net_{o1} &= out_{h1} \cdot w_5 + out_{h2} \cdot w_6 + 1 \cdot b_2 \\ \dfrac{\partial net_{o1}}{\partial out_{h2}} &= w_6 = 0.45 \end{aligned}$$
So:
$$\dfrac{\partial E_{o1}}{\partial out_{h2}} = 0.74136507 \times 0.18681560 \times 0.45 = 0.06232435$$

Likewise (here $\partial net_{o2} / \partial out_{h2} = w_8 = 0.55$): $\dfrac{\partial E_{o2}}{\partial out_{h2}} = -0.21707153 \times 0.17551005 \times 0.55 = -0.02095403$
So:
$$\begin{aligned} \dfrac{\partial E_{total}}{\partial out_{h2}} &= \dfrac{\partial E_{o1}}{\partial out_{h2}} + \dfrac{\partial E_{o2}}{\partial out_{h2}} \\ &= 0.06232435 + (-0.02095403) \\ &= 0.04137032 \end{aligned}$$

Compute $\dfrac{\partial out_{h2}}{\partial net_{h2}}$:
$$\begin{aligned} out_{h2} &= \dfrac{1}{1+e^{-net_{h2}}} \\ \dfrac{\partial out_{h2}}{\partial net_{h2}} &= out_{h2} \cdot (1 - out_{h2}) \\ &= 0.59688438 \times (1 - 0.59688438) \\ &= 0.24061342 \end{aligned}$$

Compute $\dfrac{\partial net_{h2}}{\partial w_3}$:
$$\begin{aligned} net_{h2} &= i_1 \cdot w_3 + i_2 \cdot w_4 + 1 \cdot b_1 \\ \dfrac{\partial net_{h2}}{\partial w_3} &= i_1 = 0.05 \end{aligned}$$
So:
$$\begin{aligned} \dfrac{\partial E_{total}}{\partial w_3} &= \dfrac{\partial E_{total}}{\partial out_{h2}} \cdot \dfrac{\partial out_{h2}}{\partial net_{h2}} \cdot \dfrac{\partial net_{h2}}{\partial w_3} \\ &= 0.04137032 \times 0.24061342 \times 0.05 \\ &= 0.00049771 \end{aligned}$$
For brevity, let $\delta_{h2}$ denote the error term of hidden neuron $h_2$:
$$\begin{aligned} \dfrac{\partial E_{total}}{\partial w_3} &= \dfrac{\partial E_{total}}{\partial out_{h2}} \cdot \dfrac{\partial out_{h2}}{\partial net_{h2}} \cdot \dfrac{\partial net_{h2}}{\partial w_3} \\ &= \left( \dfrac{\partial E_{o1}}{\partial out_{h2}} + \dfrac{\partial E_{o2}}{\partial out_{h2}} \right) \cdot out_{h2}(1 - out_{h2}) \cdot i_1 \\ &= \left( \dfrac{\partial E_{o1}}{\partial out_{o1}} \cdot \dfrac{\partial out_{o1}}{\partial net_{o1}} \cdot \dfrac{\partial net_{o1}}{\partial out_{h2}} + \dfrac{\partial E_{o2}}{\partial out_{o2}} \cdot \dfrac{\partial out_{o2}}{\partial net_{o2}} \cdot \dfrac{\partial net_{o2}}{\partial out_{h2}} \right) \cdot out_{h2}(1 - out_{h2}) \cdot i_1 \\ &= \big( (out_{o1} - target_{o1}) \cdot out_{o1}(1 - out_{o1}) \cdot w_6 + (out_{o2} - target_{o2}) \cdot out_{o2}(1 - out_{o2}) \cdot w_8 \big) \cdot out_{h2}(1 - out_{h2}) \cdot i_1 \\ &= \delta_{h2} \cdot i_1 \end{aligned}$$
Finally, update $w_3$:
$$w_3^* = w_3 - \eta \cdot \dfrac{\partial E_{total}}{\partial w_3} = 0.25 - 0.5 \times 0.00049771 = 0.24975114$$
In the same way, update $w_1$, $w_2$, and $w_4$:
$$\begin{aligned} w_1^* &= 0.14978072 \\ w_2^* &= 0.19956143 \\ w_4^* &= 0.29950229 \end{aligned}$$
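The hidden-layer updates drop out of the same sketch; note that the error is propagated back through the old values of w5..w8, not the freshly updated ones:

# each hidden neuron collects error from every output neuron it feeds
delta_h1 = (delta_o1 * w5 + delta_o2 * w7) * out_h1 * (1 - out_h1)
delta_h2 = (delta_o1 * w6 + delta_o2 * w8) * out_h2 * (1 - out_h2)

w1_new = w1 - eta * delta_h1 * i1  # 0.14978072
w2_new = w2 - eta * delta_h1 * i2  # 0.19956143
w3_new = w3 - eta * delta_h2 * i1  # 0.24975114
w4_new = w4 - eta * delta_h2 * i2  # 0.29950229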
This completes one round of error backpropagation. We then plug the new weights back in and recompute the error, repeating until the outputs are close enough to the targets.
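One more forward pass with the updated weights shows the error shrinking (still the same sketch; the biases stay fixed here, as in the worked example):

out_h1 = sigmoid(i1 * w1_new + i2 * w2_new + b1)
out_h2 = sigmoid(i1 * w3_new + i2 * w4_new + b1)
out_o1 = sigmoid(out_h1 * w5_new + out_h2 * w6_new + b2)
out_o2 = sigmoid(out_h1 * w7_new + out_h2 * w8_new + b2)

E_total = 0.5 * (target_o1 - out_o1) ** 2 + 0.5 * (target_o2 - out_o2) ** 2
print(E_total)  # about 0.2910, down from 0.29837111

The complete, runnable implementation of the whole example follows.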

#coding:utf-8
import random
import math


#   Naming conventions:
#   "pd_"  : prefix for a partial derivative
#   "d_"   : prefix for a derivative
#   "w_ho" : index of a hidden-layer-to-output-layer weight
#   "w_ih" : index of an input-layer-to-hidden-layer weight

class NeuralNetwork:
    LEARNING_RATE = 0.5

    def __init__(self, num_inputs, num_hidden, num_outputs, hidden_layer_weights = None, hidden_layer_bias = None, output_layer_weights = None, output_layer_bias = None):
        self.num_inputs = num_inputs

        self.hidden_layer = NeuronLayer(num_hidden, hidden_layer_bias)
        self.output_layer = NeuronLayer(num_outputs, output_layer_bias)

        self.init_weights_from_inputs_to_hidden_layer_neurons(hidden_layer_weights)
        self.init_weights_from_hidden_layer_neurons_to_output_layer_neurons(output_layer_weights)

    def init_weights_from_inputs_to_hidden_layer_neurons(self, hidden_layer_weights):
        weight_num = 0
        for h in range(len(self.hidden_layer.neurons)):
            for i in range(self.num_inputs):
                if not hidden_layer_weights:
                    self.hidden_layer.neurons[h].weights.append(random.random())
                else:
                    self.hidden_layer.neurons[h].weights.append(hidden_layer_weights[weight_num])
                weight_num += 1

    def init_weights_from_hidden_layer_neurons_to_output_layer_neurons(self, output_layer_weights):
        weight_num = 0
        for o in range(len(self.output_layer.neurons)):
            for h in range(len(self.hidden_layer.neurons)):
                if not output_layer_weights:
                    self.output_layer.neurons[o].weights.append(random.random())
                else:
                    self.output_layer.neurons[o].weights.append(output_layer_weights[weight_num])
                weight_num += 1

    def inspect(self):
        print('------')
        print('* Inputs: {}'.format(self.num_inputs))
        print('------')
        print('* Hidden Layer')
        self.hidden_layer.inspect()
        print('------')
        print('* Output Layer')
        self.output_layer.inspect()
        print('------')

    def feed_forward(self, inputs):
        hidden_layer_outputs = self.hidden_layer.feed_forward(inputs)
        return self.output_layer.feed_forward(hidden_layer_outputs)

    def train(self, training_inputs, training_outputs):
        self.feed_forward(training_inputs)

        # 1. Error gradient with respect to each output neuron's net input
        pd_errors_wrt_output_neuron_total_net_input = [0] * len(self.output_layer.neurons)
        for o in range(len(self.output_layer.neurons)):

            # ∂E/∂zⱼ
            pd_errors_wrt_output_neuron_total_net_input[o] = self.output_layer.neurons[o].calculate_pd_error_wrt_total_net_input(training_outputs[o])

        # 2. Error gradient with respect to each hidden neuron's net input
        pd_errors_wrt_hidden_neuron_total_net_input = [0] * len(self.hidden_layer.neurons)
        for h in range(len(self.hidden_layer.neurons)):

            # dE/dyⱼ = Σ ∂E/∂zⱼ * ∂z/∂yⱼ = Σ ∂E/∂zⱼ * wᵢⱼ
            d_error_wrt_hidden_neuron_output = 0
            for o in range(len(self.output_layer.neurons)):
                d_error_wrt_hidden_neuron_output += pd_errors_wrt_output_neuron_total_net_input[o] * self.output_layer.neurons[o].weights[h]

            # ∂E/∂zⱼ = dE/dyⱼ * dyⱼ/dzⱼ
            pd_errors_wrt_hidden_neuron_total_net_input[h] = d_error_wrt_hidden_neuron_output * self.hidden_layer.neurons[h].calculate_pd_total_net_input_wrt_input()

        # 3. Update the hidden-to-output weights
        for o in range(len(self.output_layer.neurons)):
            for w_ho in range(len(self.output_layer.neurons[o].weights)):

                # ∂E/∂wᵢⱼ = ∂E/∂zⱼ * ∂zⱼ/∂wᵢⱼ
                pd_error_wrt_weight = pd_errors_wrt_output_neuron_total_net_input[o] * self.output_layer.neurons[o].calculate_pd_total_net_input_wrt_weight(w_ho)

                # Δw = α * ∂E/∂wᵢⱼ
                self.output_layer.neurons[o].weights[w_ho] -= self.LEARNING_RATE * pd_error_wrt_weight

        # 4. Update the input-to-hidden weights
        for h in range(len(self.hidden_layer.neurons)):
            for w_ih in range(len(self.hidden_layer.neurons[h].weights)):

                # ∂E/∂wᵢ = ∂E/∂zⱼ * ∂zⱼ/∂wᵢ
                pd_error_wrt_weight = pd_errors_wrt_hidden_neuron_total_net_input[h] * self.hidden_layer.neurons[h].calculate_pd_total_net_input_wrt_weight(w_ih)

                # Δw = α * ∂E/∂wᵢ
                self.hidden_layer.neurons[h].weights[w_ih] -= self.LEARNING_RATE * pd_error_wrt_weight

    def calculate_total_error(self, training_sets):
        total_error = 0
        for t in range(len(training_sets)):
            training_inputs, training_outputs = training_sets[t]
            self.feed_forward(training_inputs)
            for o in range(len(training_outputs)):
                total_error += self.output_layer.neurons[o].calculate_error(training_outputs[o])
        return total_error

class NeuronLayer:
    def __init__(self, num_neurons, bias):

        # Neurons in the same layer share a single bias term b
        self.bias = bias if bias is not None else random.random()

        self.neurons = []
        for i in range(num_neurons):
            self.neurons.append(Neuron(self.bias))

    def inspect(self):
        print('Neurons:', len(self.neurons))
        for n in range(len(self.neurons)):
            print(' Neuron', n)
            for w in range(len(self.neurons[n].weights)):
                print('  Weight:', self.neurons[n].weights[w])
            print('  Bias:', self.bias)

    def feed_forward(self, inputs):
        outputs = []
        for neuron in self.neurons:
            outputs.append(neuron.calculate_output(inputs))
        return outputs

    def get_outputs(self):
        outputs = []
        for neuron in self.neurons:
            outputs.append(neuron.output)
        return outputs

class Neuron:
    def __init__(self, bias):
        self.bias = bias
        self.weights = []

    def calculate_output(self, inputs):
        self.inputs = inputs
        self.output = self.squash(self.calculate_total_net_input())
        return self.output

    def calculate_total_net_input(self):
        total = 0
        for i in range(len(self.inputs)):
            total += self.inputs[i] * self.weights[i]
        return total + self.bias

    # Sigmoid activation function
    def squash(self, total_net_input):
        return 1 / (1 + math.exp(-total_net_input))


    # ∂E/∂zⱼ = ∂E/∂yⱼ * dyⱼ/dzⱼ
    def calculate_pd_error_wrt_total_net_input(self, target_output):
        return self.calculate_pd_error_wrt_output(target_output) * self.calculate_pd_total_net_input_wrt_input()

    # Each neuron's error is computed with the squared-error formula
    def calculate_error(self, target_output):
        return 0.5 * (target_output - self.output) ** 2

    
    # ∂E/∂yⱼ = -(target - output)
    def calculate_pd_error_wrt_output(self, target_output):
        return -(target_output - self.output)

    
    # dyⱼ/dzⱼ = output * (1 - output), the sigmoid derivative
    # (despite its name, this is the derivative of the output w.r.t. the net input)
    def calculate_pd_total_net_input_wrt_input(self):
        return self.output * (1 - self.output)


    # ∂zⱼ/∂wᵢ = the input carried by that connection
    def calculate_pd_total_net_input_wrt_weight(self, index):
        return self.inputs[index]


# The example from the article:

nn = NeuralNetwork(2, 2, 2, hidden_layer_weights=[0.15, 0.2, 0.25, 0.3], hidden_layer_bias=0.35, output_layer_weights=[0.4, 0.45, 0.5, 0.55], output_layer_bias=0.6)
for i in range(10000):
    nn.train([0.05, 0.1], [0.01, 0.99])
    print(i, round(nn.calculate_total_error([[[0.05, 0.1], [0.01, 0.99]]]), 9))


# Another example (XOR); comment out the example above before running this one:

# training_sets = [
#     [[0, 0], [0]],
#     [[0, 1], [1]],
#     [[1, 0], [1]],
#     [[1, 1], [0]]
# ]

# nn = NeuralNetwork(len(training_sets[0][0]), 5, len(training_sets[0][1]))
# for i in range(10000):
#     training_inputs, training_outputs = random.choice(training_sets)
#     nn.train(training_inputs, training_outputs)
#     print(i, nn.calculate_total_error(training_sets))

Author: Charlotte77
Source: http://www.cnblogs.com/charlotte77/
