Forward Propagation
Consider the network: $i_1$ and $i_2$ are the two inputs; the hidden layer has two neurons, $h_1$ and $h_2$; $b_1$ and $b_2$ are the bias terms; and the output layer also has two neurons, $o_1$ and $o_2$.
1. First compute the linear part of the hidden-layer neurons:
$l_{h1} = w_1 \times i_1 + w_2 \times i_2 + b_1$
$l_{h2} = w_3 \times i_1 + w_4 \times i_2 + b_1$
2. Then compute the nonlinear part of the hidden-layer neurons (the sigmoid activation):
$out_{h1} = \frac{1}{1+e^{-l_{h1}}}$
$out_{h2} = \frac{1}{1+e^{-l_{h2}}}$
3. Next compute the linear part of the output-layer neurons:
$l_{o1} = w_5 \times out_{h1} + w_6 \times out_{h2} + b_2$
$l_{o2} = w_7 \times out_{h1} + w_8 \times out_{h2} + b_2$
4. Then compute the nonlinear part of the output-layer neurons:
$out_{o1} = \frac{1}{1+e^{-l_{o1}}}$
$out_{o2} = \frac{1}{1+e^{-l_{o2}}}$
5. The loss function is built from the difference between the output-layer output and the true value (target), e.g. the mean squared error:
$E_{total} = \frac{1}{n}\sum_{i=1}^{n}(target - output)^2$
6. Compute the loss for $out_{o1}$ and $out_{o2}$ separately; the constant is taken as $\frac{1}{2}$ so that it cancels when differentiating:
$E_{o1} = \frac{1}{2}(target - out_{o1})^2$
$E_{o2} = \frac{1}{2}(target - out_{o2})^2$
$E_{total} = E_{o1} + E_{o2}$
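Steps 5-6 in code. The output and target values below are assumed example numbers, not values from the text:

```python
# Assumed example values: network outputs and their targets
out_o1, out_o2 = 0.7514, 0.7729
target_o1, target_o2 = 0.01, 0.99

# Per-output squared error with the 1/2 factor, then the total
E_o1 = 0.5 * (target_o1 - out_o1) ** 2
E_o2 = 0.5 * (target_o2 - out_o2) ** 2
E_total = E_o1 + E_o2
print(E_total)
```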
Backward Propagation
I. How much does the weight $w_5$ affect the loss function?
1. Write out the chain rule for the derivative of $E_{total}$ with respect to $w_5$:
$\frac{\partial E_{total}}{\partial w_5} = \frac{\partial E_{total}}{\partial out_{o1}} \times \frac{\partial out_{o1}}{\partial l_{o1}} \times \frac{\partial l_{o1}}{\partial w_5}$
2. From the forward pass we already have $E_{total}$:
$E_{total} = E_{o1} + E_{o2} = \frac{1}{2}(target - out_{o1})^2 + \frac{1}{2}(target - out_{o2})^2$
3. Compute the three derivatives in the chain.
Since $out_{o1} = \frac{1}{1+e^{-l_{o1}}}$ and $l_{o1} = w_5 \times out_{h1} + w_6 \times out_{h2} + b_2$,
$w_5$ affects the loss only through $l_{o1}$ and $out_{o1}$:
$\frac{\partial E_{total}}{\partial out_{o1}} = (target - out_{o1}) \times (-1)$
$\frac{\partial out_{o1}}{\partial l_{o1}} = out_{o1} \times (1 - out_{o1})$
$\frac{\partial l_{o1}}{\partial w_5} = out_{h1}$
4. The value of the chain-rule expression:
$\frac{\partial E_{total}}{\partial w_5} = (target - out_{o1}) \times (-1) \times out_{o1} \times (1 - out_{o1}) \times out_{h1}$
5. Update the weight by gradient descent, where $\eta$ is the learning rate:
$w_5^+ = w_5 - \eta \frac{\partial E_{total}}{\partial w_5}$
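The whole $w_5$ update can be checked numerically. The target, output, hidden activation, and learning rate below are assumed example values:

```python
# Assumed example values for one training step
target, out_o1, out_h1 = 0.01, 0.7514, 0.5933
w5, eta = 0.40, 0.5

# The three factors of the chain rule
dE_dout_o1 = (target - out_o1) * (-1)    # dE_total/dout_o1
dout_dl_o1 = out_o1 * (1 - out_o1)       # dout_o1/dl_o1 (sigmoid derivative)
dl_dw5 = out_h1                          # dl_o1/dw5

grad_w5 = dE_dout_o1 * dout_dl_o1 * dl_dw5
w5_new = w5 - eta * grad_w5              # gradient-descent update
print(grad_w5, w5_new)
```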
II. How much does the weight $w_1$ affect the loss function?
1. Write out the chain rule for the derivative of $E_{total}$ with respect to $w_1$:
$\frac{\partial E_{total}}{\partial w_1} = \frac{\partial E_{total}}{\partial out_{h1}} \times \frac{\partial out_{h1}}{\partial l_{h1}} \times \frac{\partial l_{h1}}{\partial w_1}$
2. First find $\frac{\partial E_{total}}{\partial out_{h1}}$.
Since $E_{total} = E_{o1} + E_{o2}$, and $out_{h1}$ feeds into both output neurons,
$\frac{\partial E_{total}}{\partial out_{h1}} = \frac{\partial E_{o1}}{\partial out_{h1}} + \frac{\partial E_{o2}}{\partial out_{h1}}$
3. Find $\frac{\partial E_{o1}}{\partial out_{h1}}$:
$E_{o1} = \frac{1}{2}(target - out_{o1})^2$
$out_{o1} = \frac{1}{1+e^{-l_{o1}}}$
$\frac{\partial E_{o1}}{\partial out_{h1}} = \frac{\partial E_{o1}}{\partial out_{o1}} \times \frac{\partial out_{o1}}{\partial l_{o1}} \times \frac{\partial l_{o1}}{\partial out_{h1}}$
Computing each factor:
$\frac{\partial E_{o1}}{\partial out_{o1}} = (target - out_{o1}) \times (-1)$
$\frac{\partial out_{o1}}{\partial l_{o1}} = out_{o1} \times (1 - out_{o1})$
Since $l_{o1} = w_5 \times out_{h1} + w_6 \times out_{h2} + b_2$,
$\frac{\partial l_{o1}}{\partial out_{h1}} = w_5$
4. In the same way as step 3 (note that $l_{o2}$ depends on $out_{h1}$ through $w_7$), we obtain
$\frac{\partial E_{o2}}{\partial out_{h1}} = (target - out_{o2}) \times (-1) \times out_{o2} \times (1 - out_{o2}) \times w_7$
5. Find $\frac{\partial out_{h1}}{\partial l_{h1}}$:
$out_{h1} = \frac{1}{1+e^{-l_{h1}}}$
$\frac{\partial out_{h1}}{\partial l_{h1}} = out_{h1}(1 - out_{h1})$
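The identity $\sigma'(x) = \sigma(x)(1-\sigma(x))$ can be sanity-checked against a finite-difference approximation at an arbitrary point:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

x = 0.3775  # arbitrary test point
s = sigmoid(x)
analytic = s * (1 - s)                                 # sigma(x) * (1 - sigma(x))
h = 1e-6
numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)  # central difference
print(analytic, numeric)
```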
6. Find $\frac{\partial l_{h1}}{\partial w_1}$:
$l_{h1} = w_1 \times i_1 + w_2 \times i_2 + b_1$
$\frac{\partial l_{h1}}{\partial w_1} = i_1$
7. Combine the three derivatives to get the value of the chain-rule expression:
$\frac{\partial E_{total}}{\partial w_1} = \left(\frac{\partial E_{o1}}{\partial out_{h1}} + \frac{\partial E_{o2}}{\partial out_{h1}}\right) \times out_{h1}(1 - out_{h1}) \times i_1$
8. Update the weight by gradient descent:
$w_1^+ = w_1 - \eta \frac{\partial E_{total}}{\partial w_1}$
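Steps 1-8 for the $w_1$ gradient can be sketched end to end. All numeric values below (targets, activations, weights, learning rate) are assumed for illustration:

```python
# Assumed example values
i1 = 0.05
target_o1, target_o2 = 0.01, 0.99
out_o1, out_o2 = 0.7514, 0.7729
out_h1 = 0.5933
w1, w5, w7, eta = 0.15, 0.40, 0.50, 0.5

# Step 3: dE_o1/dout_h1 (l_o1 depends on out_h1 via w5)
dEo1_douth1 = (target_o1 - out_o1) * (-1) * out_o1 * (1 - out_o1) * w5
# Step 4: same for the second output (l_o2 depends on out_h1 via w7)
dEo2_douth1 = (target_o2 - out_o2) * (-1) * out_o2 * (1 - out_o2) * w7
# Step 2: sum the two error paths through the two outputs
dEtotal_douth1 = dEo1_douth1 + dEo2_douth1

# Steps 5-7: multiply by the sigmoid derivative and dl_h1/dw1 = i1
grad_w1 = dEtotal_douth1 * out_h1 * (1 - out_h1) * i1
w1_new = w1 - eta * grad_w1  # step 8: gradient descent
print(grad_w1, w1_new)
```

Note how much smaller the $w_1$ gradient is than the $w_5$ gradient: it is attenuated by the sigmoid derivative and the small input on the way back through the hidden layer.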