I. The Chain Rule
- The chain rule is used to differentiate composite functions; it underlies the backpropagation algorithm in neural networks.
- Chain rule: $\frac{dy}{dx}=\frac{dy}{du}\cdot\frac{du}{dx}$
- Applied to a neural network:
  $\frac{\partial E}{\partial w_{jk}^{1}}=\frac{\partial E}{\partial O_{k}^{1}}\frac{\partial O_{k}^{1}}{\partial w_{jk}^{1}}=\frac{\partial E}{\partial O_{k}^{2}}\frac{\partial O_{k}^{2}}{\partial O_{k}^{1}}\frac{\partial O_{k}^{1}}{\partial w_{jk}^{1}}$
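As a quick sanity check of the rule itself, here is a minimal sketch (the function $y=\sin(x^2)$ is my own illustrative choice, not from the text) comparing the chain-rule product $\frac{dy}{du}\cdot\frac{du}{dx}$ against a numerical derivative:

```python
import math

# y = sin(u) with u = x**2, so dy/dx = cos(x**2) * 2x by the chain rule
def dy_dx_chain(x):
    u = x ** 2
    dy_du = math.cos(u)   # derivative of the outer function
    du_dx = 2 * x         # derivative of the inner function
    return dy_du * du_dx

# Central-difference approximation of the same derivative
def dy_dx_numeric(x, h=1e-6):
    f = lambda t: math.sin(t ** 2)
    return (f(x + h) - f(x - h)) / (2 * h)
```

The two agree to high precision, which is exactly what the chain rule guarantees.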
II. Intuition for Backpropagation (adapted from Zhihu)
- Take the partial derivatives of $e=(a+b)*(b+1)$ as an example, with intermediate nodes $c=a+b$ and $d=b+1$; its composition graph is shown below.
- With $a=2$, $b=1$, the gradient of $e$ can be expressed through these partial derivatives.
- Derivation via the chain rule:
  - $\frac{\partial e}{\partial a}=\frac{\partial e}{\partial c}\frac{\partial c}{\partial a}=2*1=2$
  - $\frac{\partial e}{\partial b}=\frac{\partial e}{\partial c}\frac{\partial c}{\partial b}+\frac{\partial e}{\partial d}\frac{\partial d}{\partial b}=2*1+3*1=5$
- Rule of thumb:
  - To get a partial derivative, multiply the local derivatives along each path from the current node down to the leaf, then sum over all such paths.
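The sum-over-paths rule above can be checked directly on the example $e=(a+b)*(b+1)$ at $a=2$, $b=1$:

```python
# e = (a+b)*(b+1), decomposed as c = a+b, d = b+1, e = c*d
a, b = 2.0, 1.0
c, d = a + b, b + 1.0        # c = 3, d = 2
e = c * d                    # e = 6

# Local partial derivatives at each edge of the graph
de_dc, de_dd = d, c          # ∂e/∂c = d = 2, ∂e/∂d = c = 3
dc_da, dc_db, dd_db = 1.0, 1.0, 1.0

# One path from e to a; two paths from e to b (through c and through d)
de_da = de_dc * dc_da                   # 2*1 = 2
de_db = de_dc * dc_db + de_dd * dd_db   # 2*1 + 3*1 = 5
```

Note that `b` influences `e` through both `c` and `d`, which is why its gradient is a sum of two path products.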
III. A Worked Example
1. Problem Description
- Consider the following network:
  - Layer 1: input layer; layer 2: hidden layer; layer 3: output layer.
  - Activation function: sigmoid.
- With the values filled in, the network looks as follows.
- Goal: given inputs $i_1, i_2$ (0.05, 0.10), make the outputs as close as possible to the targets $o_1, o_2$ (0.01 and 0.99).
2. Forward Pass
- Input layer → hidden layer
  - Weighted sum of neuron h1's inputs: $net_{h1}=w_1*i_1+w_2*i_2+b_1=0.15*0.05+0.20*0.10+0.35=0.3775$
  - Output of neuron h1: $out_{h1}=\frac{1}{1+e^{-net_{h1}}}=\frac{1}{1+e^{-0.3775}}=0.593269992$
  - Similarly, the output of neuron h2: $out_{h2}=0.596884378$
- Hidden layer → output layer
  - Values of output neurons o1 and o2:
    $net_{o1}=w_5*out_{h1}+w_6*out_{h2}+b_2=0.4*0.593269992+0.45*0.596884378+0.6=1.105905967$
    $out_{o1}=\frac{1}{1+e^{-net_{o1}}}=\frac{1}{1+e^{-1.105905967}}=0.75136507$
    $out_{o2}=0.772928465$
- Comparing the outputs with the targets
  - The forward pass produces [0.75136507, 0.772928465], far from the targets [0.01, 0.99]. We now backpropagate the error to update the weights $w$ and recompute the outputs.
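The full forward pass can be sketched as follows. The text only states some of the parameters ($w_1$, $w_2$, $b_1$, $w_5$, $w_6$, $b_2$); the remaining values ($w_3$, $w_4$, $w_7$, $w_8$) are assumed from the network figure, chosen to be consistent with the quoted outputs:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Inputs from the text
i1, i2 = 0.05, 0.10

# Parameters: w1, w2, b1, w5, w6, b2 appear in the text;
# w3, w4, w7, w8 are assumed from the (missing) figure
w1, w2, w3, w4, b1 = 0.15, 0.20, 0.25, 0.30, 0.35
w5, w6, w7, w8, b2 = 0.40, 0.45, 0.50, 0.55, 0.60

# Input layer -> hidden layer
net_h1 = w1 * i1 + w2 * i2 + b1
out_h1 = sigmoid(net_h1)              # ~0.593269992
net_h2 = w3 * i1 + w4 * i2 + b1
out_h2 = sigmoid(net_h2)              # ~0.596884378

# Hidden layer -> output layer
net_o1 = w5 * out_h1 + w6 * out_h2 + b2
out_o1 = sigmoid(net_o1)              # ~0.75136507
net_o2 = w7 * out_h1 + w8 * out_h2 + b2
out_o2 = sigmoid(net_o2)              # ~0.772928465
```

Running this reproduces the hidden and output activations quoted above.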
3. Backpropagation
- Total error:
  - $E_{total}=\sum\frac{1}{2}(target-output)^{2}$
  - $E_{o1}=\frac{1}{2}(target_{o1}-out_{o1})^{2}=\frac{1}{2}(0.01-0.75136507)^2=0.274811083$
  - $E_{o2}=0.023560026$
  - $E_{total}=E_{o1}+E_{o2}=0.274811083+0.023560026=0.298371109$
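The total-error computation above is just two squared-error terms summed:

```python
# Forward-pass outputs and targets from the text
out_o1, out_o2 = 0.75136507, 0.772928465
target_o1, target_o2 = 0.01, 0.99

E_o1 = 0.5 * (target_o1 - out_o1) ** 2   # ~0.274811083
E_o2 = 0.5 * (target_o2 - out_o2) ** 2   # ~0.023560026
E_total = E_o1 + E_o2                    # ~0.298371109
```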
- Hidden layer → output layer weight updates:
  - Take weight $w_5$ as an example. Differentiating the total error with respect to $w_5$ tells us how much $w_5$ contributes to the overall error:
  - $\frac{\partial E_{total}}{\partial w_5}=\frac{\partial E_{total}}{\partial out_{o1}}\frac{\partial out_{o1}}{\partial net_{o1}}\frac{\partial net_{o1}}{\partial w_5}$, as shown in the figure below.
- Compute $\frac{\partial E_{total}}{\partial out_{o1}}$:
  - $E_{total}=\frac{1}{2}(target_{o1}-out_{o1})^2+\frac{1}{2}(target_{o2}-out_{o2})^2$
  - $\frac{\partial E_{total}}{\partial out_{o1}}=-(target_{o1}-out_{o1})=-(0.01-0.75136507)=0.74136507$
- Compute $\frac{\partial out_{o1}}{\partial net_{o1}}$:
  - $out_{o1}=\frac{1}{1+e^{-net_{o1}}}$
  - $\frac{\partial out_{o1}}{\partial net_{o1}}=out_{o1}(1-out_{o1})=0.75136507(1-0.75136507)=0.186815602$
- Compute $\frac{\partial net_{o1}}{\partial w_5}$:
  - $net_{o1}=w_5*out_{h1}+w_6*out_{h2}+b_2$
  - $\frac{\partial net_{o1}}{\partial w_5}=out_{h1}=0.593269992$
- Putting it together: $\frac{\partial E_{total}}{\partial w_5}=0.74136507*0.186815602*0.593269992=0.082167041$
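The three factors computed above multiply together as follows:

```python
# Values carried over from the forward pass
out_o1, out_h1 = 0.75136507, 0.593269992
target_o1 = 0.01

dE_dout   = -(target_o1 - out_o1)         # ∂E_total/∂out_o1 ≈ 0.74136507
dout_dnet = out_o1 * (1 - out_o1)         # ∂out_o1/∂net_o1 ≈ 0.186815602
dnet_dw5  = out_h1                        # ∂net_o1/∂w5 ≈ 0.593269992

dE_dw5 = dE_dout * dout_dnet * dnet_dw5   # ≈ 0.082167041
```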
- In closed form:
  $\frac{\partial E_{total}}{\partial w_5}=-(target_{o1}-out_{o1})*out_{o1}(1-out_{o1})*out_{h1}$
- Writing the output-layer error as $\delta_{o1}$, i.e. $\delta_{o1}=\frac{\partial E_{total}}{\partial out_{o1}}\frac{\partial out_{o1}}{\partial net_{o1}}=\frac{\partial E_{total}}{\partial net_{o1}}=-(target_{o1}-out_{o1})*out_{o1}(1-out_{o1})$
- $\frac{\partial E_{total}}{\partial w_5}=\delta_{o1}out_{h1}$
- Finally, update $w_5$:
  - $w_5^+=w_5-\eta*\frac{\partial E_{total}}{\partial w_5}$
  - $\eta$ is the learning rate.
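The update rule in code, assuming a learning rate of $\eta = 0.5$ (the text does not state a value; 0.5 is the rate commonly used with this example):

```python
eta = 0.5                     # learning rate (assumed, not given in the text)
w5 = 0.40                     # current weight from the forward pass
dE_dw5 = 0.082167041          # gradient computed above

w5_new = w5 - eta * dE_dw5    # gradient-descent step, ≈ 0.358916480
```

With all the gradients in hand, each weight gets the same one-line update.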
- Input layer → hidden layer weight updates:
  - The method is the same as above, except that $out_{h1}$ receives error from both $E_{o1}$ and $E_{o2}$, so both contributions must be computed.
  - Compute $\frac{\partial E_{total}}{\partial w_{1}}=\frac{\partial E_{total}}{\partial out_{h1}}\frac{\partial out_{h1}}{\partial net_{h1}}\frac{\partial net_{h1}}{\partial w_{1}}$
  - Expanding: $\frac{\partial E_{total}}{\partial w_{1}}=(\sum_o\frac{\partial E_{total}}{\partial out_o}\frac{\partial out_o}{\partial net_o}\frac{\partial net_o}{\partial out_{h1}})*\frac{\partial out_{h1}}{\partial net_{h1}}*\frac{\partial net_{h1}}{\partial w_{1}}=(\sum_o\delta_ow_{ho})*out_{h1}(1-out_{h1})*i_1=\delta_{h1}i_1$
- Finally, update $w_1$:
  - $w_1^+=w_1-\eta*\frac{\partial E_{total}}{\partial w_1}$
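The hidden-layer step above can be sketched end to end. As before, $w_7$ and the learning rate $\eta=0.5$ are assumptions (taken from the figure and the usual setup for this example, not stated in the text):

```python
# Values carried over from the forward pass and output-layer step
i1 = 0.05
out_h1, out_o1, out_o2 = 0.593269992, 0.75136507, 0.772928465
target_o1, target_o2 = 0.01, 0.99
w1, w5, w7 = 0.15, 0.40, 0.50   # w7 assumed from the figure
eta = 0.5                       # learning rate (assumed)

# Output-layer errors: delta_o = -(target - out) * out * (1 - out)
delta_o1 = -(target_o1 - out_o1) * out_o1 * (1 - out_o1)   # ≈ 0.138498562
delta_o2 = -(target_o2 - out_o2) * out_o2 * (1 - out_o2)   # ≈ -0.038098236

# out_h1 receives error from both output neurons: sum of delta_o * w_ho
dE_douth1 = delta_o1 * w5 + delta_o2 * w7                  # ≈ 0.036350306

# Hidden-layer delta and the gradient w.r.t. w1
delta_h1 = dE_douth1 * out_h1 * (1 - out_h1)
dE_dw1 = delta_h1 * i1                                     # ≈ 0.000438568

w1_new = w1 - eta * dE_dw1                                 # ≈ 0.149780716
```

Note how the two `delta_o * w` terms implement the "sum over paths" rule from Part II: the error reaches $out_{h1}$ through both output neurons.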