Forward Propagation and Backpropagation
Preface
Forward propagation is the process in which, at each layer of a neural network, the inputs are multiplied by the weights and summed, a bias term is added, and the result is passed through an activation function to produce an output; that output then serves as the input to the next layer, and this repeats until the final output is produced.
Backpropagation is the process in which the network uses the final output to compute how much each layer's weights influence that output (measured by partial derivatives), and then, following the gradient descent rule, subtracts the learning rate times the partial derivative from the current weight, thereby updating the weights.
Below I walk through both processes with an example.
Suppose we have a three-layer neural network:
i1 and i2 form the input layer, h1 and h2 the hidden layer, and o1 and o2 the output layer; b1 and b2 are the biases from the input layer to the hidden layer and from the hidden layer to the output layer, and the w's are the weights from each layer to the next. Each layer has two neurons, and the activation function is the sigmoid. We then assign an initial value to each parameter.
where
\begin{aligned} \text{Input data: }&i1=0.05,\ i2=0.10\\ \text{Target output: }&o1=0.01,\ o2=0.99\\ \text{Initial weights: }&w1=0.15,\ w2=0.20,\ w3=0.25,\ w4=0.30\\ &w5=0.40,\ w6=0.45,\ w7=0.50,\ w8=0.55 \end{aligned}
Goal: given the inputs i1=0.05, i2=0.10, make the final outputs as close as possible to the target outputs o1=0.01, o2=0.99.
Forward Propagation
- Input layer -----> hidden layer
Compute the input to hidden neuron h1:
\begin{aligned} net_{h1} &=i1*w1+ i2*w2 +b1*1\\ &=0.05*0.15+0.10*0.20+0.35*1\\ &=0.3775 \end{aligned}
The output of neuron h1:
out_{h1}=\dfrac{1}{1+e^{-net_{h1}}}=\dfrac{1}{1+e^{-0.3775}}=0.59326999
Similarly, the output of neuron h2 is:
out_{h2}=0.59688438
- Hidden layer -----> output layer
Compute the input to output neuron o1:
\begin{aligned} net_{o1} &=out_{h1}*w5+ out_{h2}*w6 +b2*1\\ &=0.59326999*0.4+0.59688438*0.45+0.60*1\\ &=1.10590597 \end{aligned}
The output of neuron o1:
out_{o1}=\dfrac{1}{1+e^{-net_{o1}}}=\dfrac{1}{1+e^{-1.10590597}}=0.75136507
Similarly, the output of neuron o2 is:
out_{o2}=0.77292847
This completes forward propagation. The outputs we obtained, o1=0.75136507 and o2=0.77292847, are still far from the targets o1=0.01, o2=0.99. Next we propagate the error backward, update the weights, and recompute the outputs.
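The forward pass above can be reproduced in a few lines of Python (a minimal sketch using the article's numbers; the variable names are my own):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

i1, i2 = 0.05, 0.10
w1, w2, w3, w4 = 0.15, 0.20, 0.25, 0.30
w5, w6, w7, w8 = 0.40, 0.45, 0.50, 0.55
b1, b2 = 0.35, 0.60

# hidden layer: weighted sum plus bias, then sigmoid
net_h1 = i1 * w1 + i2 * w2 + b1
net_h2 = i1 * w3 + i2 * w4 + b1
out_h1, out_h2 = sigmoid(net_h1), sigmoid(net_h2)

# output layer: the hidden outputs become the inputs
net_o1 = out_h1 * w5 + out_h2 * w6 + b2
net_o2 = out_h1 * w7 + out_h2 * w8 + b2
out_o1, out_o2 = sigmoid(net_o1), sigmoid(net_o2)

print(round(out_h1, 8), round(out_o1, 8), round(out_o2, 8))
# out_h1 ≈ 0.59326999, out_o1 ≈ 0.75136507, out_o2 ≈ 0.77292847
```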
Backpropagation
- Compute the total error
The total error is the sum of the individual errors:
\begin{aligned} E_{total} &=\sum\dfrac{1}{2}(target-output)^2=E_{o1}+E_{o2}\\ E_{o1}&=\dfrac{1}{2}(target_{o1}-out_{o1})^2\\ &=\dfrac{1}{2}(0.01-0.75136507)^2\\ &=0.27481108\\ E_{o2}&=0.02356003\\ E_{total} &=E_{o1}+E_{o2}=0.29837111 \end{aligned}
- Hidden layer -----> output layer weight updates
Take weight w7 as an example and compute its effect on the overall result (splitting the derivative with the chain rule).
The figure below shows how the error propagates backward:
\begin{aligned} \dfrac {\partial E_{total}}{\partial w_{7}}=\dfrac {\partial E_{total}}{\partial out_{o2}}\times \dfrac {\partial out_{o2}}{\partial net_{o2}}\times \dfrac {\partial net_{o2}} {\partial w_{7}} \end{aligned}
Compute \dfrac {\partial E_{total}}{\partial out_{o2}}:
\begin{aligned} E_{total} &=\sum\dfrac{1}{2}(target-output)^2\\ &=\dfrac{1}{2}(target_{o1}-out_{o1})^2+\dfrac{1}{2}(target_{o2}-out_{o2})^2\\ \dfrac {\partial E_{total}}{\partial out_{o2}} &=\dfrac{1}{2}\times2\times(target_{o2}-out_{o2})\times(-1)\\ &=-(target_{o2}-out_{o2})\\ &=-(0.99-0.77292847)\\ &=-0.21707153 \end{aligned}
Compute \dfrac{\partial out_{o2}}{\partial net_{o2}}:
\begin{aligned} out_{o2} &=\dfrac{1}{1+e^{-net_{o2}}}\\ \dfrac{\partial out_{o2}}{\partial net_{o2}} &=out_{o2}(1-out_{o2})\quad \text{(this is just the derivative of the sigmoid function)}\\ &=0.77292847\times(1-0.77292847)\\ &=0.17551005 \end{aligned}
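As a quick sanity check on the identity σ'(x) = σ(x)(1−σ(x)), we can compare the analytic form against a central-difference estimate (a standalone sketch; the value net_{o2} ≈ 1.22492141 is inferred from the forward pass, not stated explicitly in the article):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

x = 1.22492141                # assumed value of net_o2 from the forward pass
s = sigmoid(x)
analytic = s * (1 - s)        # sigmoid'(x) = sigmoid(x) * (1 - sigmoid(x))
eps = 1e-6
numeric = (sigmoid(x + eps) - sigmoid(x - eps)) / (2 * eps)  # central difference
print(round(analytic, 8), round(numeric, 8))
```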
Compute \dfrac{\partial net_{o2}}{\partial w_{7}}:
\begin{aligned} net_{o2} &=out_{h1}\times w_{7}+out_{h2}\times w_{8}+b_{2}\times 1\\ \dfrac{\partial net_{o2}}{\partial w_{7}} &=out_{h1}\\ &=0.59326999 \end{aligned}
Putting these together:
\begin{aligned} \dfrac {\partial E_{total}}{\partial w_{7}} &=\dfrac {\partial E_{total}}{\partial out_{o2}}\times \dfrac {\partial out_{o2}} {\partial net_{o2}}\times \dfrac {\partial net_{o2}} {\partial w_{7}}\\ &=-0.21707153 \times 0.17551005\times 0.59326999\\ &=-0.02260254 \end{aligned}
We have now obtained the partial derivative of the total error E_{total} with respect to w_{7}. Looking back at the formulas above, notice that:
\begin{aligned} \dfrac {\partial E_{total}}{\partial w_{7}} &=-(target_{o2}-out_{o2})* out_{o2}(1-out_{o2}) *out_{h1} \end{aligned}
Now let \delta_{o2} denote the error at output neuron o2:
\begin{aligned} \delta_{o2} &=\dfrac {\partial E_{total}}{\partial out_{o2}}\times \dfrac {\partial out_{o2}} {\partial net_{o2}}=\dfrac{\partial E_{total}}{\partial net_{o2}}\\ \delta_{o2} &=-(target_{o2}-out_{o2})* out_{o2}(1-out_{o2}) \end{aligned}
So the partial derivative of the total error E_{total} with respect to w_{7} can be written compactly as:
\dfrac{\partial E_{total}}{\partial w_{7}} =\delta_{o2}*out_{h1}
Now we can update w7, taking the learning rate \eta=0.5:
\begin{aligned} w_{7}^* &=w_{7}-\eta\times \dfrac{\partial E_{total}}{\partial w_{7}} \\ &=0.5-0.5\times (-0.02260254)\\ &=0.51130127 \end{aligned}
Similarly, we update w_{5}, w_{6}, w_{8}:
\begin{aligned} w_{5}^* &=0.35891648\\ w_{6}^* &=0.40866619\\ w_{8}^* &=0.56137012 \end{aligned}
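All four hidden-to-output updates can be checked with a short script (a sketch using the δ form derived above; the numbers are the article's, the variable names mine):

```python
eta = 0.5
out_h1, out_h2 = 0.59326999, 0.59688438
out_o1, out_o2 = 0.75136507, 0.77292847
target_o1, target_o2 = 0.01, 0.99

# delta for each output neuron: dE/dnet = -(target - out) * out * (1 - out)
delta_o1 = -(target_o1 - out_o1) * out_o1 * (1 - out_o1)
delta_o2 = -(target_o2 - out_o2) * out_o2 * (1 - out_o2)

# gradient of E_total w.r.t. each weight is delta * (the weight's input),
# then one gradient-descent step: w <- w - eta * gradient
w5_new = 0.40 - eta * delta_o1 * out_h1
w6_new = 0.45 - eta * delta_o1 * out_h2
w7_new = 0.50 - eta * delta_o2 * out_h1
w8_new = 0.55 - eta * delta_o2 * out_h2
print(round(w5_new, 8), round(w6_new, 8), round(w7_new, 8), round(w8_new, 8))
```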
- Input layer -----> hidden layer weight updates (updating w_{3} as an example)
The input-to-hidden updates follow the same idea as the hidden-to-output updates above, but with one difference: a hidden-to-output update runs out_{o2} -> net_{o2} -> w_{7}, while an input-to-hidden update runs out_{h2} -> net_{h2} -> w_{3}, and out_{h2} receives error from both E_{o1} and E_{o2}, so both contributions must be computed. The figure below shows the directions along which the error flows.
\begin{aligned} \dfrac{\partial E_{total}}{\partial w_{3}} &=\dfrac{\partial E_{total}}{\partial out_{h2}}*\dfrac{\partial out_{h2}}{\partial net_{h2}}*\dfrac{\partial net_{h2}}{\partial w_{3}} \end{aligned}
Compute \dfrac{\partial E_{total}}{\partial out_{h2}}:
\begin{aligned} E_{total} &=E_{o1}+E_{o2}\\ \dfrac{\partial E_{total}}{\partial out_{h2}} &=\dfrac{\partial E_{o1}}{\partial out_{h2}}+\dfrac{\partial E_{o2}}{\partial out_{h2}} \end{aligned}
Next compute \dfrac{\partial E_{o1}}{\partial out_{h2}} and \dfrac{\partial E_{o2}}{\partial out_{h2}}:
\begin{aligned} \dfrac{\partial E_{o1}}{\partial out_{h2}} &=\dfrac{\partial E_{o1}}{\partial out_{o1}}*\dfrac{\partial out_{o1}} {\partial net_{o1}}*\dfrac{\partial net_{o1}}{\partial out_{h2}} \end{aligned}
Computing each of \dfrac{\partial E_{o1}}{\partial out_{o1}}, \dfrac{\partial out_{o1}}{\partial net_{o1}}, \dfrac{\partial net_{o1}}{\partial out_{h2}} in turn:
\begin{aligned} E_{o1} &=\dfrac{1}{2}*(target_{o1}-out_{o1})^2\\ \dfrac{\partial E_{o1}}{\partial out_{o1}} &=\dfrac{1}{2}*2*(target_{o1}-out_{o1})*(-1)\\ &=-(0.01-0.75136507)\\ &=0.74136507\\ out_{o1} &=\dfrac{1}{1+e^{-net_{o1}}}\\ \dfrac{\partial out_{o1}}{\partial net_{o1}} &=out_{o1}*(1-out_{o1})\\ &=0.75136507*(1-0.75136507)\\ &=0.18681560\\ net_{o1}&=out_{h1}*w_{5}+out_{h2}*w_{6}+1*b_{2}\\ \dfrac{\partial net_{o1}}{\partial out_{h2}} &=w_{6}=0.45 \end{aligned}
Therefore,
\begin{aligned} \dfrac{\partial E_{o1}}{\partial out_{h2}} &=\dfrac{\partial E_{o1}}{\partial out_{o1}}*\dfrac{\partial out_{o1}} {\partial net_{o1}}*\dfrac{\partial net_{o1}}{\partial out_{h2}}\\ &=0.74136507*0.18681560*0.45\\ &=0.06232435 \end{aligned}
Similarly, going through out_{o2} (where \dfrac{\partial net_{o2}}{\partial out_{h2}}=w_{8}=0.55):
\dfrac{\partial E_{o2}}{\partial out_{h2}}=-0.21707153*0.17551005*0.55=-0.02095403
So
\begin{aligned} \dfrac{\partial E_{total}}{\partial out_{h2}} &=\dfrac{\partial E_{o1}}{\partial out_{h2}}+\dfrac{\partial E_{o2}}{\partial out_{h2}}\\ &=0.06232435+(-0.02095403)\\ &=0.04137032 \end{aligned}
Compute \dfrac{\partial out_{h2}}{\partial net_{h2}}:
\begin{aligned} out_{h2}&=\dfrac{1}{1+e^{-net_{h2}}}\\ \dfrac{\partial out_{h2}}{\partial net_{h2}} &=out_{h2}*(1-out_{h2})\\ &=0.59688438*(1-0.59688438)\\ &=0.24061342 \end{aligned}
Compute \dfrac{\partial net_{h2}}{\partial w_{3}}:
\begin{aligned} net_{h2}&=i_{1}*w_{3}+i_{2}*w_{4}+1*b_{1}\\ \dfrac{\partial net_{h2}}{\partial w_{3}}&=i_{1}=0.05 \end{aligned}
So
\begin{aligned} \dfrac{\partial E_{total}}{\partial w_{3}} &=\dfrac{\partial E_{total}}{\partial out_{h2}}*\dfrac{\partial out_{h2}}{\partial net_{h2}}*\dfrac{\partial net_{h2}}{\partial w_{3}}\\ &=0.04137032*0.24061342*0.05\\ &=0.00049771 \end{aligned}
For brevity, let \delta_{h2} denote the error at hidden neuron h_{2}:
\begin{aligned} \dfrac{\partial E_{total}}{\partial w_{3}} &=\dfrac{\partial E_{total}}{\partial out_{h2}}*\dfrac{\partial out_{h2}}{\partial net_{h2}}*\dfrac{\partial net_{h2}}{\partial w_{3}}\\ &=(\dfrac{\partial E_{o1}}{\partial out_{h2}}+\dfrac{\partial E_{o2}}{\partial out_{h2}})*out_{h2}*(1-out_{h2})*i_{1}\\ &=(\dfrac{\partial E_{o1}}{\partial out_{o1}}*\dfrac{\partial out_{o1}} {\partial net_{o1}}*\dfrac{\partial net_{o1}}{\partial out_{h2}}+\dfrac{\partial E_{o2}}{\partial out_{o2}}*\dfrac{\partial out_{o2}}{\partial net_{o2}}*\dfrac{\partial net_{o2}}{\partial out_{h2}})*out_{h2}*(1-out_{h2})*i_{1}\\ &=((out_{o1}-target_{o1})*(out_{o1}*(1-out_{o1}))*w_{6}\\ &\quad+(out_{o2}-target_{o2})*(out_{o2}*(1-out_{o2}))*w_{8})\\ &\quad*out_{h2}*(1-out_{h2})*i_{1}\\ &=\delta_{h2}*i_{1} \end{aligned}
Finally, update w_{3}:
w_{3}^*=w_{3}-\eta*\dfrac{\partial E_{total}}{\partial w_{3}}=0.25-0.5*0.00049771=0.24975114
Likewise, update w_{1}, w_{2}, w_{4}:
\begin{aligned} w_{1}^*&=0.14978072\\ w_{2}^*&=0.19956143\\ w_{4}^*&=0.29950229 \end{aligned}
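The hidden-layer updates can likewise be verified numerically (a sketch; note that the *original* weights w5–w8 are used when propagating the error back, since every gradient in one round is taken at the old parameter values):

```python
eta = 0.5
i1, i2 = 0.05, 0.10
out_h1, out_h2 = 0.59326999, 0.59688438
out_o1, out_o2 = 0.75136507, 0.77292847
target_o1, target_o2 = 0.01, 0.99
w5, w6, w7, w8 = 0.40, 0.45, 0.50, 0.55  # old hidden-to-output weights

delta_o1 = -(target_o1 - out_o1) * out_o1 * (1 - out_o1)
delta_o2 = -(target_o2 - out_o2) * out_o2 * (1 - out_o2)

# the error reaching each hidden neuron is the weighted sum of both output deltas
d_E_d_out_h1 = delta_o1 * w5 + delta_o2 * w7
d_E_d_out_h2 = delta_o1 * w6 + delta_o2 * w8

# multiply by the sigmoid derivative to get each hidden neuron's delta
delta_h1 = d_E_d_out_h1 * out_h1 * (1 - out_h1)
delta_h2 = d_E_d_out_h2 * out_h2 * (1 - out_h2)

# gradient w.r.t. an input-to-hidden weight is delta_h * (that weight's input)
w1_new = 0.15 - eta * delta_h1 * i1
w2_new = 0.20 - eta * delta_h1 * i2
w3_new = 0.25 - eta * delta_h2 * i1
w4_new = 0.30 - eta * delta_h2 * i2
print(round(w1_new, 8), round(w2_new, 8), round(w3_new, 8), round(w4_new, 8))
```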
That completes one round of error backpropagation. We then plug the new weights back in, recompute the error, and repeat; once the outputs are close enough to the targets, we are done.
#coding:utf-8
import random
import math

# Naming conventions used below:
# "pd_"  : prefix for a partial derivative
# "d_"   : prefix for a (total) derivative
# "w_ho" : index of a hidden-to-output weight
# "w_ih" : index of an input-to-hidden weight
class NeuralNetwork:
    LEARNING_RATE = 0.5

    def __init__(self, num_inputs, num_hidden, num_outputs, hidden_layer_weights=None, hidden_layer_bias=None, output_layer_weights=None, output_layer_bias=None):
        self.num_inputs = num_inputs
        self.hidden_layer = NeuronLayer(num_hidden, hidden_layer_bias)
        self.output_layer = NeuronLayer(num_outputs, output_layer_bias)
        self.init_weights_from_inputs_to_hidden_layer_neurons(hidden_layer_weights)
        self.init_weights_from_hidden_layer_neurons_to_output_layer_neurons(output_layer_weights)

    def init_weights_from_inputs_to_hidden_layer_neurons(self, hidden_layer_weights):
        weight_num = 0
        for h in range(len(self.hidden_layer.neurons)):
            for i in range(self.num_inputs):
                if not hidden_layer_weights:
                    self.hidden_layer.neurons[h].weights.append(random.random())
                else:
                    self.hidden_layer.neurons[h].weights.append(hidden_layer_weights[weight_num])
                weight_num += 1

    def init_weights_from_hidden_layer_neurons_to_output_layer_neurons(self, output_layer_weights):
        weight_num = 0
        for o in range(len(self.output_layer.neurons)):
            for h in range(len(self.hidden_layer.neurons)):
                if not output_layer_weights:
                    self.output_layer.neurons[o].weights.append(random.random())
                else:
                    self.output_layer.neurons[o].weights.append(output_layer_weights[weight_num])
                weight_num += 1

    def inspect(self):
        print('------')
        print('* Inputs: {}'.format(self.num_inputs))
        print('------')
        print('Hidden Layer')
        self.hidden_layer.inspect()
        print('------')
        print('* Output Layer')
        self.output_layer.inspect()
        print('------')

    def feed_forward(self, inputs):
        hidden_layer_outputs = self.hidden_layer.feed_forward(inputs)
        return self.output_layer.feed_forward(hidden_layer_outputs)

    def train(self, training_inputs, training_outputs):
        self.feed_forward(training_inputs)
        # 1. Deltas of the output neurons
        pd_errors_wrt_output_neuron_total_net_input = [0] * len(self.output_layer.neurons)
        for o in range(len(self.output_layer.neurons)):
            # ∂E/∂zⱼ
            pd_errors_wrt_output_neuron_total_net_input[o] = self.output_layer.neurons[o].calculate_pd_error_wrt_total_net_input(training_outputs[o])
        # 2. Deltas of the hidden neurons
        pd_errors_wrt_hidden_neuron_total_net_input = [0] * len(self.hidden_layer.neurons)
        for h in range(len(self.hidden_layer.neurons)):
            # dE/dyⱼ = Σ ∂E/∂zⱼ * ∂z/∂yⱼ = Σ ∂E/∂zⱼ * wᵢⱼ
            d_error_wrt_hidden_neuron_output = 0
            for o in range(len(self.output_layer.neurons)):
                d_error_wrt_hidden_neuron_output += pd_errors_wrt_output_neuron_total_net_input[o] * self.output_layer.neurons[o].weights[h]
            # ∂E/∂zⱼ = dE/dyⱼ * ∂yⱼ/∂zⱼ
            pd_errors_wrt_hidden_neuron_total_net_input[h] = d_error_wrt_hidden_neuron_output * self.hidden_layer.neurons[h].calculate_pd_total_net_input_wrt_input()
        # 3. Update the output-layer weights
        for o in range(len(self.output_layer.neurons)):
            for w_ho in range(len(self.output_layer.neurons[o].weights)):
                # ∂Eⱼ/∂wᵢⱼ = ∂E/∂zⱼ * ∂zⱼ/∂wᵢⱼ
                pd_error_wrt_weight = pd_errors_wrt_output_neuron_total_net_input[o] * self.output_layer.neurons[o].calculate_pd_total_net_input_wrt_weight(w_ho)
                # Δw = α * ∂Eⱼ/∂wᵢⱼ
                self.output_layer.neurons[o].weights[w_ho] -= self.LEARNING_RATE * pd_error_wrt_weight
        # 4. Update the hidden-layer weights
        for h in range(len(self.hidden_layer.neurons)):
            for w_ih in range(len(self.hidden_layer.neurons[h].weights)):
                # ∂Eⱼ/∂wᵢ = ∂E/∂zⱼ * ∂zⱼ/∂wᵢ
                pd_error_wrt_weight = pd_errors_wrt_hidden_neuron_total_net_input[h] * self.hidden_layer.neurons[h].calculate_pd_total_net_input_wrt_weight(w_ih)
                # Δw = α * ∂Eⱼ/∂wᵢ
                self.hidden_layer.neurons[h].weights[w_ih] -= self.LEARNING_RATE * pd_error_wrt_weight

    def calculate_total_error(self, training_sets):
        total_error = 0
        for t in range(len(training_sets)):
            training_inputs, training_outputs = training_sets[t]
            self.feed_forward(training_inputs)
            for o in range(len(training_outputs)):
                total_error += self.output_layer.neurons[o].calculate_error(training_outputs[o])
        return total_error

class NeuronLayer:
    def __init__(self, num_neurons, bias):
        # neurons in the same layer share one bias term b
        self.bias = bias if bias else random.random()
        self.neurons = []
        for i in range(num_neurons):
            self.neurons.append(Neuron(self.bias))

    def inspect(self):
        print('Neurons:', len(self.neurons))
        for n in range(len(self.neurons)):
            print(' Neuron', n)
            for w in range(len(self.neurons[n].weights)):
                print('  Weight:', self.neurons[n].weights[w])
            print('  Bias:', self.bias)

    def feed_forward(self, inputs):
        outputs = []
        for neuron in self.neurons:
            outputs.append(neuron.calculate_output(inputs))
        return outputs

    def get_outputs(self):
        outputs = []
        for neuron in self.neurons:
            outputs.append(neuron.output)
        return outputs

class Neuron:
    def __init__(self, bias):
        self.bias = bias
        self.weights = []

    def calculate_output(self, inputs):
        self.inputs = inputs
        self.output = self.squash(self.calculate_total_net_input())
        return self.output

    def calculate_total_net_input(self):
        total = 0
        for i in range(len(self.inputs)):
            total += self.inputs[i] * self.weights[i]
        return total + self.bias

    # sigmoid activation function
    def squash(self, total_net_input):
        return 1 / (1 + math.exp(-total_net_input))

    def calculate_pd_error_wrt_total_net_input(self, target_output):
        return self.calculate_pd_error_wrt_output(target_output) * self.calculate_pd_total_net_input_wrt_input()

    # each neuron's error is computed with the squared-error formula
    def calculate_error(self, target_output):
        return 0.5 * (target_output - self.output) ** 2

    def calculate_pd_error_wrt_output(self, target_output):
        return -(target_output - self.output)

    def calculate_pd_total_net_input_wrt_input(self):
        return self.output * (1 - self.output)

    def calculate_pd_total_net_input_wrt_weight(self, index):
        return self.inputs[index]

# The example from this article (target output o1=0.01, o2=0.99):
nn = NeuralNetwork(2, 2, 2, hidden_layer_weights=[0.15, 0.2, 0.25, 0.3], hidden_layer_bias=0.35, output_layer_weights=[0.4, 0.45, 0.5, 0.55], output_layer_bias=0.6)
for i in range(10000):
    nn.train([0.05, 0.1], [0.01, 0.99])
    print(i, round(nn.calculate_total_error([[[0.05, 0.1], [0.01, 0.99]]]), 9))

# Another example (XOR); comment out the code above before running this:
# training_sets = [
#     [[0, 0], [0]],
#     [[0, 1], [1]],
#     [[1, 0], [1]],
#     [[1, 1], [0]]
# ]
# nn = NeuralNetwork(len(training_sets[0][0]), 5, len(training_sets[0][1]))
# for i in range(10000):
#     training_inputs, training_outputs = random.choice(training_sets)
#     nn.train(training_inputs, training_outputs)
#     print(i, nn.calculate_total_error(training_sets))
Author: Charlotte77
Source: http://www.cnblogs.com/charlotte77/