



You can implement backpropagation in Python; I previously implemented the backpropagation algorithm in this Github repo.


For an interactive visualization of a neural network as it learns, check out my Neural Network visualization.


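If you find this tutorial useful and want to keep learning about neural networks and their applications, I highly recommend Adrian Rosebrock's excellent tutorial Getting Started with Deep Learning and Python.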




$$net_{h1} = w_1 * i_1 + w_2 * i_2 + b_1 * 1$$

$$net_{h1} = 0.15 * 0.05 + 0.2 * 0.1 + 0.35 * 1 = 0.3775$$

$$out_{h1} = \frac{1}{1+e^{-net_{h1}}} = \frac{1}{1+e^{-0.3775}} = 0.593269992$$
Carrying out the same process for $h_2$ gives:

$$out_{h2} = 0.596884378$$
We repeat the process for the output-layer neurons, using the outputs of the hidden-layer neurons as their inputs. Here is the output of $o_1$:

$$net_{o1} = w_5 * out_{h1} + w_6 * out_{h2} + b_2 * 1$$

$$out_{o1} = \frac{1}{1+e^{-net_{o1}}} = 0.75136507$$
Repeating the process for $o_2$ gives:

$$out_{o2} = 0.772928465$$
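The forward pass above can be sketched in Python as follows. This is a minimal sketch, not the author's repo code. It assumes initial values $w_3 = 0.25$, $w_4 = 0.30$, $w_6 = 0.45$, $w_7 = 0.50$, $w_8 = 0.55$ and $b_2 = 0.60$, which this section does not list; they are chosen because they reproduce the outputs shown above. It also assumes $h_2$ uses $w_3$, $w_4$, $b_1$ and $o_2$ uses $w_7$, $w_8$, $b_2$, mirroring the $h_1$ and $o_1$ formulas.

```python
import math

def sigmoid(x):
    """Logistic activation used by every neuron in this example."""
    return 1.0 / (1.0 + math.exp(-x))

# Inputs and biases: i1, i2, b1 are from the text; b2 = 0.60 is ASSUMED.
i1, i2 = 0.05, 0.10
b1, b2 = 0.35, 0.60
# Weights: w1, w2, w5 are from the text; w3, w4, w6, w7, w8 are ASSUMED.
w1, w2, w3, w4 = 0.15, 0.20, 0.25, 0.30
w5, w6, w7, w8 = 0.40, 0.45, 0.50, 0.55

# Hidden layer: weighted sum of the inputs plus bias, squashed by the sigmoid
net_h1 = w1 * i1 + w2 * i2 + b1 * 1
net_h2 = w3 * i1 + w4 * i2 + b1 * 1
out_h1 = sigmoid(net_h1)
out_h2 = sigmoid(net_h2)

# Output layer: same operation, with the hidden outputs as its inputs
net_o1 = w5 * out_h1 + w6 * out_h2 + b2 * 1
net_o2 = w7 * out_h1 + w8 * out_h2 + b2 * 1
out_o1 = sigmoid(net_o1)
out_o2 = sigmoid(net_o2)

print(f"out_h1={out_h1:.9f} out_h2={out_h2:.9f}")
print(f"out_o1={out_o1:.8f} out_o2={out_o2:.9f}")
```

Writing each neuron out by hand keeps the code line-for-line comparable with the equations; a real implementation would use matrices instead.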


The total error is the sum of the squared errors of the output neurons:

$$E_{total} = \sum \frac{1}{2}(target - output)^2$$
For example, the target output for $o_1$ is 0.01, but the network outputs 0.75136507, so its error is:

$$E_{o1} = \frac{1}{2}(target_{o1} - out_{o1})^2 = \frac{1}{2}(0.01 - 0.75136507)^2 = 0.274811083$$
Repeating this for $o_2$:

$$E_{o2} = 0.023560026$$

The total error of the network is the sum of these errors:

$$E_{total} = E_{o1} + E_{o2} = 0.274811083 + 0.023560026 = 0.298371109$$
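The error calculation is a one-liner per neuron. This sketch simply plugs in the targets (0.01 and 0.99, as stated in this section) and the forward-pass outputs from above; the helper name `squared_error` is illustrative.

```python
def squared_error(target, output):
    """Error for one output neuron: 1/2 * (target - output)^2."""
    return 0.5 * (target - output) ** 2

# Targets and forward-pass outputs as given in the text
targets = [0.01, 0.99]
outputs = [0.75136507, 0.772928465]

E_o1 = squared_error(targets[0], outputs[0])
E_o2 = squared_error(targets[1], outputs[1])
# Total error: sum over all output neurons
E_total = sum(squared_error(t, o) for t, o in zip(targets, outputs))
print(E_o1, E_o2, E_total)
```

The factor 1/2 is there so that it cancels the exponent when the error is differentiated.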




Consider $w_5$. We want to know how much a change in $w_5$ affects the total error, i.e. $\frac{\partial E_{total}}{\partial w_5}$. By the chain rule:

$$\frac{\partial E_{total}}{\partial w_5} = \frac{\partial E_{total}}{\partial out_{o1}} * \frac{\partial out_{o1}}{\partial net_{o1}} * \frac{\partial net_{o1}}{\partial w_5}$$
First, how much does the total error change with respect to the output?

$$E_{total} = \frac{1}{2}(target_{o1} - out_{o1})^2 + \frac{1}{2}(target_{o2} - out_{o2})^2$$

$$\frac{\partial E_{total}}{\partial out_{o1}} = 2 * \frac{1}{2}(target_{o1} - out_{o1})^{2-1} * (-1) + 0$$

$$\frac{\partial E_{total}}{\partial out_{o1}} = -(target_{o1} - out_{o1}) = -(0.01 - 0.75136507) = 0.74136507$$
Next, how much does the output of $o_1$ change with respect to its net input?
$$out_{o1} = \frac{1}{1+e^{-net_{o1}}}$$

$$\frac{\partial out_{o1}}{\partial net_{o1}} = out_{o1}(1 - out_{o1}) = 0.75136507(1 - 0.75136507) = 0.186815602$$
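The identity $\frac{\partial out}{\partial net} = out(1 - out)$ for the logistic function can be sanity-checked numerically. This sketch compares it with a central finite difference; the step size `h = 1e-6` and the test points are arbitrary choices, not part of the original walkthrough.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_prime_from_output(out):
    """Derivative of the logistic function expressed via its own output."""
    return out * (1.0 - out)

# Compare the closed form with a central finite difference at a few net inputs
h = 1e-6
for net in (-2.0, 0.0, 0.3775, 1.1):
    numeric = (sigmoid(net + h) - sigmoid(net - h)) / (2 * h)
    analytic = sigmoid_prime_from_output(sigmoid(net))
    assert abs(numeric - analytic) < 1e-8

# The value used in the text for o1
d_out_o1_d_net_o1 = sigmoid_prime_from_output(0.75136507)
print(d_out_o1_d_net_o1)
```

Expressing the derivative through the output is convenient because the forward pass has already computed that output.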
Finally, how much does the net input of $o_1$ change with respect to $w_5$?
$$net_{o1} = w_5 * out_{h1} + w_6 * out_{h2} + b_2 * 1$$

$$\frac{\partial net_{o1}}{\partial w_5} = 1 * out_{h1} * w_5^{(1-1)} + 0 + 0 = out_{h1} = 0.593269992$$
Putting it all together:

$$\frac{\partial E_{total}}{\partial w_5} = \frac{\partial E_{total}}{\partial out_{o1}} * \frac{\partial out_{o1}}{\partial net_{o1}} * \frac{\partial net_{o1}}{\partial w_5}$$

$$\frac{\partial E_{total}}{\partial w_5} = 0.74136507 * 0.186815602 * 0.593269992 = 0.082167041$$
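A quick way to confirm the chain-rule product is to compare it with a finite-difference estimate of $\frac{\partial E_{total}}{\partial w_5}$. This is a minimal sketch: it assumes $w_6 = 0.45$ and $b_2 = 0.60$ (not listed in this section) and only differentiates $E_{o1}$, since $E_{o2}$ does not depend on $w_5$.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Values from the text
target_o1 = 0.01
out_h1, out_h2 = 0.593269992, 0.596884378
# ASSUMED starting values, not shown in this section
w6, b2 = 0.45, 0.60

def error_o1(w5):
    """E_o1 as a function of w5 alone (E_o2 does not depend on w5)."""
    out_o1 = sigmoid(w5 * out_h1 + w6 * out_h2 + b2)
    return 0.5 * (target_o1 - out_o1) ** 2

w5 = 0.40
out_o1 = sigmoid(w5 * out_h1 + w6 * out_h2 + b2)

# Chain rule: dE/dout * dout/dnet * dnet/dw5
dE_dout = -(target_o1 - out_o1)
dout_dnet = out_o1 * (1 - out_o1)
dnet_dw5 = out_h1
analytic = dE_dout * dout_dnet * dnet_dw5

# Central finite difference as an independent check
h = 1e-6
numeric = (error_o1(w5 + h) - error_o1(w5 - h)) / (2 * h)
print(analytic, numeric)
```

This kind of gradient check is a common way to catch sign or indexing mistakes in a hand-written backward pass.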
You will often see this calculation combined into the form of the delta rule:
$$\frac{\partial E_{total}}{\partial w_5} = -(target_{o1} - out_{o1}) * out_{o1}(1 - out_{o1}) * out_{h1}$$
We can combine $\frac{\partial E_{total}}{\partial out_{o1}}$ and $\frac{\partial out_{o1}}{\partial net_{o1}}$ into $\frac{\partial E_{total}}{\partial net_{o1}}$, written $\delta_{o1}$ (the node delta), and use it to rewrite the expression above:
$$\delta_{o1} = \frac{\partial E_{total}}{\partial out_{o1}} * \frac{\partial out_{o1}}{\partial net_{o1}} = \frac{\partial E_{total}}{\partial net_{o1}}$$

$$\delta_{o1} = -(target_{o1} - out_{o1}) * out_{o1}(1 - out_{o1})$$

$$\frac{\partial E_{total}}{\partial w_5} = \delta_{o1} out_{h1}$$
To decrease the error, we subtract this gradient, multiplied by the learning rate $\eta$ (0.5 here), from the current weight:

$$w_5^{t+1} = w_5^t - \eta * \frac{\partial E_{total}}{\partial w_5^t} = 0.4 - 0.5 * 0.082167041 = 0.35891648$$
Repeating this process gives the new weights $w_6$, $w_7$ and $w_8$:
$$w_6^{t+1} = 0.408666186$$

$$w_7^{t+1} = 0.511301270$$

$$w_8^{t+1} = 0.561370121$$
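The delta rule makes all four hidden-to-output updates mechanical: compute one $\delta$ per output neuron, then multiply by the hidden output feeding each weight. This sketch assumes the starting values $w_6 = 0.45$, $w_7 = 0.50$, $w_8 = 0.55$ (not listed in this section), and that $w_7$, $w_8$ connect $h_1$, $h_2$ to $o_2$.

```python
eta = 0.5  # learning rate used in the text
targets = {"o1": 0.01, "o2": 0.99}
outs = {"o1": 0.75136507, "o2": 0.772928465}
out_h = {"h1": 0.593269992, "h2": 0.596884378}

# Node deltas: delta_o = dE_total/dnet_o = -(target - out) * out * (1 - out)
delta = {o: -(targets[o] - outs[o]) * outs[o] * (1 - outs[o]) for o in outs}

# (weight name, hidden neuron it comes from, output neuron it feeds, current value)
# w5 = 0.40 is from the text; w6, w7, w8 are ASSUMED initial values.
weights = [
    ("w5", "h1", "o1", 0.40),
    ("w6", "h2", "o1", 0.45),
    ("w7", "h1", "o2", 0.50),
    ("w8", "h2", "o2", 0.55),
]

new_weights = {}
for name, h, o, w in weights:
    grad = delta[o] * out_h[h]          # dE_total/dw = delta_o * out_h
    new_weights[name] = w - eta * grad  # gradient-descent step
print(new_weights)
```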


Next, we continue the backward pass by calculating new values for $w_1$, $w_2$, $w_3$ and $w_4$. This is what we need to work out:
$$\frac{\partial E_{total}}{\partial w_1} = \frac{\partial E_{total}}{\partial out_{h1}} * \frac{\partial out_{h1}}{\partial net_{h1}} * \frac{\partial net_{h1}}{\partial w_1}$$
We use a similar process for the hidden-layer neurons, with one difference: the output of each hidden neuron contributes to the output (and therefore the error) of several output neurons. We know that $out_{h1}$ affects both $out_{o1}$ and $out_{o2}$, so $\frac{\partial E_{total}}{\partial out_{h1}}$ must account for its effect on both output neurons:
$$\frac{\partial E_{total}}{\partial out_{h1}} = \frac{\partial E_{o1}}{\partial out_{h1}} + \frac{\partial E_{o2}}{\partial out_{h1}}$$
Starting with $\frac{\partial E_{o1}}{\partial out_{h1}}$:
$$\frac{\partial E_{o1}}{\partial out_{h1}} = \frac{\partial E_{o1}}{\partial net_{o1}} * \frac{\partial net_{o1}}{\partial out_{h1}}$$
We can compute $\frac{\partial E_{o1}}{\partial net_{o1}}$ from values calculated earlier:
$$\frac{\partial E_{o1}}{\partial net_{o1}} = \frac{\partial E_{o1}}{\partial out_{o1}} * \frac{\partial out_{o1}}{\partial net_{o1}} = 0.74136507 * 0.186815602 = 0.138498562$$
And $\frac{\partial net_{o1}}{\partial out_{h1}}$ equals $w_5$:
$$net_{o1} = w_5 * out_{h1} + w_6 * out_{h2} + b_2 * 1$$

$$\frac{\partial net_{o1}}{\partial out_{h1}} = w_5 = 0.40$$
Plugging these in:

$$\frac{\partial E_{o1}}{\partial out_{h1}} = \frac{\partial E_{o1}}{\partial net_{o1}} * \frac{\partial net_{o1}}{\partial out_{h1}} = 0.138498562 * 0.40 = 0.055399425$$
Following the same process for $\frac{\partial E_{o2}}{\partial out_{h1}}$ gives:
$$\frac{\partial E_{o2}}{\partial out_{h1}} = -0.019049119$$
Therefore:

$$\frac{\partial E_{total}}{\partial out_{h1}} = \frac{\partial E_{o1}}{\partial out_{h1}} + \frac{\partial E_{o2}}{\partial out_{h1}} = 0.055399425 + (-0.019049119) = 0.036350306$$
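The two contributions to $\frac{\partial E_{total}}{\partial out_{h1}}$ are each an output delta times the weight connecting $h_1$ to that output. This sketch assumes $w_7 = 0.50$ (not listed in this section) as the $h_1 \to o_2$ weight, and uses the weights from before this round's update, as the walkthrough does with $w_5 = 0.40$.

```python
# Outputs and targets from the forward pass in the text
out_o1, out_o2 = 0.75136507, 0.772928465
target_o1, target_o2 = 0.01, 0.99

# Node deltas of the output neurons: dE_total/dnet_o
delta_o1 = -(target_o1 - out_o1) * out_o1 * (1 - out_o1)
delta_o2 = -(target_o2 - out_o2) * out_o2 * (1 - out_o2)

# Weights from h1 into each output neuron, BEFORE this round's update.
# w5 = 0.40 is from the text; w7 = 0.50 is an ASSUMED initial value.
w5, w7 = 0.40, 0.50

# dE_o/dout_h1 = (dE_o/dnet_o) * (dnet_o/dout_h1) = delta_o * w
dE_o1_dout_h1 = delta_o1 * w5
dE_o2_dout_h1 = delta_o2 * w7

# h1 feeds both outputs, so the two contributions add up
dE_total_dout_h1 = dE_o1_dout_h1 + dE_o2_dout_h1
print(dE_o1_dout_h1, dE_o2_dout_h1, dE_total_dout_h1)
```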
Now that we have $\frac{\partial E_{total}}{\partial out_{h1}}$, we still need $\frac{\partial out_{h1}}{\partial net_{h1}}$, and then $\frac{\partial net_{h1}}{\partial w}$ for each weight:
$$out_{h1} = \frac{1}{1+e^{-net_{h1}}}$$

$$\frac{\partial out_{h1}}{\partial net_{h1}} = out_{h1}(1 - out_{h1}) = 0.59326999(1 - 0.59326999) = 0.241300709$$
Next we compute the partial derivative of the net input of $h_1$ with respect to $w_1$:
$$net_{h1} = w_1 * i_1 + w_2 * i_2 + b_1 * 1$$

$$\frac{\partial net_{h1}}{\partial w_1} = i_1 = 0.05$$
Putting it all together:

$$\frac{\partial E_{total}}{\partial w_1} = \frac{\partial E_{total}}{\partial out_{h1}} * \frac{\partial out_{h1}}{\partial net_{h1}} * \frac{\partial net_{h1}}{\partial w_1}$$

$$\frac{\partial E_{total}}{\partial w_1} = 0.036350306 * 0.241300709 * 0.05 = 0.000438568$$
The same calculation can also be written in general delta-rule form, summing over the output neurons $o$:

$$\frac{\partial E_{total}}{\partial w_1} = \left(\sum_o \frac{\partial E_{total}}{\partial out_o} * \frac{\partial out_o}{\partial net_o} * \frac{\partial net_o}{\partial out_{h1}}\right) * \frac{\partial out_{h1}}{\partial net_{h1}} * \frac{\partial net_{h1}}{\partial w_1}$$

$$\frac{\partial E_{total}}{\partial w_1} = \left(\sum_o \delta_o * w_{ho}\right) * out_{h1}(1 - out_{h1}) * i_1$$

$$\frac{\partial E_{total}}{\partial w_1} = \delta_{h1} i_1$$
We can now update $w_1$:
$$w_1^{t+1} = w_1^t - \eta * \frac{\partial E_{total}}{\partial w_1^t} = 0.15 - 0.5 * 0.000438568 = 0.149780716$$
Repeating this for $w_2$, $w_3$ and $w_4$:
$$w_2^{t+1} = 0.19956143$$

$$w_3^{t+1} = 0.24975114$$

$$w_4^{t+1} = 0.29950229$$
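The hidden-layer updates follow the general form above: $\delta_h = (\sum_o \delta_o w_{ho}) \cdot out_h(1 - out_h)$, then $\frac{\partial E_{total}}{\partial w} = \delta_h \cdot i$. This sketch assumes the initial values $w_3 = 0.25$, $w_4 = 0.30$, $w_6 = 0.45$, $w_7 = 0.50$, $w_8 = 0.55$ (not listed in this section), and, as in the walkthrough, uses the output weights from before they were updated.

```python
eta = 0.5  # learning rate used in the text
inputs = {"i1": 0.05, "i2": 0.10}
out_h = {"h1": 0.593269992, "h2": 0.596884378}
out_o1, out_o2 = 0.75136507, 0.772928465
target_o1, target_o2 = 0.01, 0.99

# Output deltas, exactly as in the output-layer step
delta_o1 = -(target_o1 - out_o1) * out_o1 * (1 - out_o1)
delta_o2 = -(target_o2 - out_o2) * out_o2 * (1 - out_o2)

# Pre-update hidden->output weights: w5 from the text; w6, w7, w8 ASSUMED
w5, w6, w7, w8 = 0.40, 0.45, 0.50, 0.55

# delta_h = (sum_o delta_o * w_ho) * out_h * (1 - out_h)
delta_h = {
    "h1": (delta_o1 * w5 + delta_o2 * w7) * out_h["h1"] * (1 - out_h["h1"]),
    "h2": (delta_o1 * w6 + delta_o2 * w8) * out_h["h2"] * (1 - out_h["h2"]),
}

# (name, input it reads, hidden neuron it feeds, current value)
# w1, w2 are from the text; w3, w4 are ASSUMED initial values.
input_weights = [
    ("w1", "i1", "h1", 0.15),
    ("w2", "i2", "h1", 0.20),
    ("w3", "i1", "h2", 0.25),
    ("w4", "i2", "h2", 0.30),
]

# dE_total/dw = delta_h * i, followed by a gradient-descent step
grads = {name: delta_h[h] * inputs[x] for name, x, h, _ in input_weights}
new_weights = {name: w - eta * grads[name] for name, _, _, w in input_weights}
print(grads["w1"], new_weights)
```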
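Finally, we have updated all of the weights. When we originally fed forward the inputs 0.05 and 0.1, the network's error was 0.298371109. After one round of backpropagation, the total error drops to 0.291027924. That may not seem like much, but after repeating the process 10,000 times the error plummets to 0.000035085. At that point, feeding forward 0.05 and 0.1 makes the two output neurons produce 0.015912196 (vs. a target of 0.01) and 0.984065734 (vs. a target of 0.99).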
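The whole procedure can be wrapped in a training loop. This is a minimal sketch, not the code from the author's repo. It assumes the initial values $w_3 = 0.25$, $w_4 = 0.30$, $w_6 = 0.45$, $w_7 = 0.50$, $w_8 = 0.55$, $b_2 = 0.60$ (not listed in this section), keeps the biases $b_1$ and $b_2$ fixed (an assumption: the section only describes updating $w_1$ to $w_8$), and computes the hidden deltas with the pre-update output weights. Under these assumptions the error after the first update should come out at about 0.291027924.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

I1, I2 = 0.05, 0.10   # inputs
T1, T2 = 0.01, 0.99   # targets
B1, B2 = 0.35, 0.60   # biases (b2 ASSUMED); held fixed, only w1..w8 are trained
ETA = 0.5             # learning rate
# Initial w1..w8; w3, w4, w6, w7, w8 are ASSUMED starting values
W0 = [0.15, 0.20, 0.25, 0.30, 0.40, 0.45, 0.50, 0.55]

def forward(w):
    w1, w2, w3, w4, w5, w6, w7, w8 = w
    out_h1 = sigmoid(w1 * I1 + w2 * I2 + B1)
    out_h2 = sigmoid(w3 * I1 + w4 * I2 + B1)
    out_o1 = sigmoid(w5 * out_h1 + w6 * out_h2 + B2)
    out_o2 = sigmoid(w7 * out_h1 + w8 * out_h2 + B2)
    return out_h1, out_h2, out_o1, out_o2

def total_error(w):
    _, _, out_o1, out_o2 = forward(w)
    return 0.5 * (T1 - out_o1) ** 2 + 0.5 * (T2 - out_o2) ** 2

def train_step(w):
    """One forward pass plus backward pass; returns the updated w1..w8."""
    _, _, _, _, w5, w6, w7, w8 = w
    out_h1, out_h2, out_o1, out_o2 = forward(w)
    # Output-layer deltas
    d_o1 = -(T1 - out_o1) * out_o1 * (1 - out_o1)
    d_o2 = -(T2 - out_o2) * out_o2 * (1 - out_o2)
    # Hidden-layer deltas, computed with the pre-update w5..w8
    d_h1 = (d_o1 * w5 + d_o2 * w7) * out_h1 * (1 - out_h1)
    d_h2 = (d_o1 * w6 + d_o2 * w8) * out_h2 * (1 - out_h2)
    grads = [d_h1 * I1, d_h1 * I2, d_h2 * I1, d_h2 * I2,
             d_o1 * out_h1, d_o1 * out_h2, d_o2 * out_h1, d_o2 * out_h2]
    return [wk - ETA * g for wk, g in zip(w, grads)]

w = list(W0)
initial_error = total_error(w)
w = train_step(w)
error_after_one = total_error(w)
for _ in range(9999):  # 10,000 updates in total
    w = train_step(w)
final_error = total_error(w)
_, _, final_o1, final_o2 = forward(w)
print(initial_error, error_after_one, final_error, final_o1, final_o2)
```

Computing every gradient before changing any weight keeps each step a plain gradient-descent update on the error surface defined by the current weights.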




