BP Neural Network Training Example
1. BP Neural Networks
BP neural networks were introduced in my previous post, "CV Study Notes: Inference and Training," so the basics are not repeated here. All of the concepts and background this post relies on are covered there; this post only walks through a worked example.
The basic idea of the BP algorithm:
- Feed the training data into the input layer, pass it through the hidden layer to the output layer, and produce an output. This is the forward-propagation pass.
- Since the network's output differs from the true value, compute the error between the estimate and the actual value, and propagate that error backward from the output layer through the hidden layer until it reaches the input layer.
- During backpropagation, adjust the parameters (the weights between connected neurons) according to the error so that the total loss decreases.
- Iterate the steps above (i.e., train on the data repeatedly) until a stopping criterion is met.
2. Training Example
1. Example Setup
The green nodes form the first layer, the input layer; each node represents a neuron. Here $i_1$ and $i_2$ are the input values and $b_1$ is the bias. The second layer, the hidden layer, contains the two neurons $h_1$ and $h_2$, and $b_2$ is the bias attached to the hidden layer. The third layer is the output layer, containing $o_1$ and $o_2$. The weights between the layers are $w_1$ through $w_8$, and the activation function is the sigmoid. The input values are $[i_1=0.05, i_2=0.10]$, and the correct output values are $[o_1=0.01, o_2=0.99]$.
The sigmoid is an activation function; it was introduced in my previous post, "CV Study Notes: Inference and Training," and is not repeated here.
2. Training Process
1. Forward Propagation
Input layer → hidden layer:
From the network diagram, neuron $h_1$ receives the weighted sum of $i_1$ and $i_2$ from the previous layer as its input. Denoting this input by $z_{h1}$, we have

$$\begin{aligned} z_{h1}&=w_1\times i_1+w_2\times i_2 +b_1\times 1\\&=0.15\times0.05+0.2\times0.1+0.35\times 1\\&=0.3775 \end{aligned}$$
Since the activation function is the sigmoid, the output $a_{h1}$ of neuron $h_1$ is

$$a_{h1}=\frac{1}{1+e^{-z_{h1}}}=\frac{1}{1+e^{-0.3775}}=0.593269992$$
Similarly, the output $a_{h2}$ of neuron $h_2$ is

$$a_{h2}=0.596884378$$
Hidden layer → output layer:
From the network diagram, the input $z_{o1}$ of neuron $o_1$ comes from the weighted sum of the outputs of $h_1$ and $h_2$ in the previous layer, so

$$\begin{aligned} z_{o1}&=w_5\times a_{h1}+w_6\times a_{h2}+b_2\times1\\&=0.4\times 0.593269992+0.45\times 0.596884378+0.6\times 1\\&=1.105905967 \end{aligned}$$
$z_{o2}$ is computed in the same way.
Since the network uses the sigmoid activation, the output $a_{o1}$ of $o_1$ is

$$\begin{aligned} a_{o1}&=\frac{1}{1+e^{-z_{o1}}}\\&=\frac{1}{1+e^{-1.105905967}}\\&=0.751365069 \end{aligned}$$

and similarly

$$a_{o2}=0.772928465$$
This completes one full forward pass. The output is $[0.751365069, 0.772928465]$, still far from the actual values $[0.01, 0.99]$, so the error must be propagated backward, the weights updated, and the computation repeated.
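The forward pass above can be reproduced in a few lines. This is a minimal sketch, assuming the initial weights used in the worked numbers ($w_1$–$w_8$ = 0.15, 0.20, 0.25, 0.30, 0.40, 0.45, 0.50, 0.55; $w_3$, $w_4$, $w_7$, $w_8$ are inferred from the updated values computed later in this post):

```python
import math

def sigmoid(z):
    # Logistic activation used throughout the example
    return 1.0 / (1.0 + math.exp(-z))

# Inputs, biases, and weights from the worked example
i1, i2 = 0.05, 0.10
b1, b2 = 0.35, 0.60
w1, w2, w3, w4 = 0.15, 0.20, 0.25, 0.30  # input -> hidden
w5, w6, w7, w8 = 0.40, 0.45, 0.50, 0.55  # hidden -> output

# Input layer -> hidden layer
z_h1 = w1 * i1 + w2 * i2 + b1
z_h2 = w3 * i1 + w4 * i2 + b1
a_h1, a_h2 = sigmoid(z_h1), sigmoid(z_h2)

# Hidden layer -> output layer
z_o1 = w5 * a_h1 + w6 * a_h2 + b2
z_o2 = w7 * a_h1 + w8 * a_h2 + b2
a_o1, a_o2 = sigmoid(z_o1), sigmoid(z_o2)

print(a_o1, a_o2)  # ~0.751365069, ~0.772928465
```

Running it reproduces the intermediate values above ($z_{h1}=0.3775$, $a_{h1}\approx0.593269992$, etc.) to the printed precision.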
2. Backpropagation
Computing the loss:
The error to be propagated must first be processed by the loss function, which produces a suitable value to propagate backward and to use for sensible weight updates.
$$E_{total}=\sum\frac{1}{2}(target-output)^2$$
$$E_{o1}=\frac{1}{2}(0.01-0.751365069)^2=0.274811083$$
$$E_{o2}=\frac{1}{2}(0.99-0.772928465)^2=0.023560026$$
$$E_{total}=E_{o1}+E_{o2}=0.298371109$$
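The loss terms follow directly from the forward-pass outputs (a short self-contained check, using the output values computed above):

```python
# Squared-error loss for the two outputs of the worked example
a_o1, a_o2 = 0.751365069, 0.772928465  # forward-pass outputs
t_o1, t_o2 = 0.01, 0.99                # targets

E_o1 = 0.5 * (t_o1 - a_o1) ** 2
E_o2 = 0.5 * (t_o2 - a_o2) ** 2
E_total = E_o1 + E_o2
print(E_o1, E_o2, E_total)  # ~0.274811083, ~0.023560026, ~0.298371109
```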
Hidden layer → output layer weight update:
Take the weight $w_5$ as an example. Differentiating the total loss with respect to $w_5$ gives $w_5$'s contribution to the total loss:

$$\frac{\partial E_{total}}{\partial w_5}=\frac{\partial E_{total}}{\partial a_{o1}}\times\frac{\partial a_{o1}}{\partial z_{o1}}\times\frac{\partial z_{o1}}{\partial w_5}$$
$\frac{\partial E_{total}}{\partial a_{o1}}$: the total loss is computed from the two outputs $a_{o1}$ and $a_{o2}$, so it can be differentiated with respect to $a_{o1}$ (and $a_{o2}$).
$\frac{\partial a_{o1}}{\partial z_{o1}}$: the output $a_{o1}$ is obtained by passing the input $z_{o1}$ through the sigmoid activation, so $a_{o1}$ can be differentiated with respect to $z_{o1}$.
$\frac{\partial z_{o1}}{\partial w_5}$: $z_{o1}$ is the weighted sum that includes the output of $h_1$ multiplied by $w_5$, so $z_{o1}$ can be differentiated with respect to $w_5$.
To keep the relationships straight: $w_5$ contributes to $z_{o1}$, $z_{o1}$ contributes to $a_{o1}$, and $a_{o1}$ contributes to $E_{total}$. Every level of this chain is a single branch, so the derivative factors directly into the product above. The hidden-side weight update described in a later section is slightly more involved, because there the chain branches.
From the derivation above, each factor can be computed:
$\frac{\partial E_{total}}{\partial a_{o1}}$:

$$\begin{aligned} E_{total}&=\frac{1}{2}(target_{o1}-a_{o1})^2+\frac{1}{2}(target_{o2}-a_{o2})^2\\ \frac{\partial E_{total}}{\partial a_{o1}}&=2\times\frac{1}{2}(target_{o1}-a_{o1})\times(-1)\\&=-(target_{o1}-a_{o1})\\&=0.751365069-0.01\\&=0.741365069 \end{aligned}$$
$\frac{\partial a_{o1}}{\partial z_{o1}}$:

$$\begin{aligned} a_{o1}&=\frac{1}{1+e^{-z_{o1}}}\\ \frac{\partial a_{o1}}{\partial z_{o1}}&=a_{o1}\times(1-a_{o1})\\&=0.751365069\times(1-0.751365069)\\&=0.186815602 \end{aligned}$$
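The identity $\sigma'(z)=\sigma(z)(1-\sigma(z))$ used here is easy to sanity-check numerically, e.g. against a finite difference (a small illustrative check, not part of the original derivation):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

z = 1.105905967           # z_o1 from the forward pass
a = sigmoid(z)
analytic = a * (1.0 - a)  # sigma'(z) = sigma(z) * (1 - sigma(z))

# Central finite difference as an independent check
eps = 1e-6
numeric = (sigmoid(z + eps) - sigmoid(z - eps)) / (2 * eps)

print(analytic)  # ~0.186815602
```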
$\frac{\partial z_{o1}}{\partial w_5}$:

$$\begin{aligned} z_{o1}&=w_5\times a_{h1}+w_6\times a_{h2}+b_2\times1\\ \frac{\partial z_{o1}}{\partial w_5}&=a_{h1}\\&=0.593269992 \end{aligned}$$
Multiplying the three results together:
$$\frac{\partial E_{total}}{\partial w_5}=0.741365069\times0.186815602\times0.593269992=0.082167041$$
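The three chain-rule factors can be put together in code (a sketch using the values computed above):

```python
# Chain-rule factors for dE_total/dw5, using the worked-example values
a_o1 = 0.751365069   # output of o1
a_h1 = 0.593269992   # output of h1
t_o1 = 0.01          # target for o1

dE_da_o1 = -(t_o1 - a_o1)        # = 0.741365069
da_o1_dz_o1 = a_o1 * (1 - a_o1)  # = 0.186815602 (sigmoid derivative)
dz_o1_dw5 = a_h1                 # = 0.593269992

dE_dw5 = dE_da_o1 * da_o1_dz_o1 * dz_o1_dw5
print(dE_dw5)  # ~0.082167041
```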
Stripping the concrete values from the steps above and abstracting, we get
$$\frac{\partial E_{total}}{\partial w_5}=-(target_{o1}-a_{o1})\times a_{o1}\times(1-a_{o1})\times a_{h1}$$
$$\frac{\partial E}{\partial w_{jk}}=-(t_k-o_k)\cdot \mathrm{sigmoid}\Big(\sum_j w_{jk}\cdot o_j\Big)\Big(1-\mathrm{sigmoid}\Big(\sum_j w_{jk}\cdot o_j\Big)\Big)\cdot o_j$$
The second formula appeared in my previous post; the derivation above shows where it comes from.
For notational convenience, let $\delta_{o1}$ denote the output-layer error:
$$\delta_{o1}=\frac{\partial E_{total}}{\partial a_{o1}}\times\frac{\partial a_{o1}}{\partial z_{o1}}=\frac{\partial E_{total}}{\partial z_{o1}}$$
$$\delta_{o1}=-(target_{o1}-a_{o1})\times a_{o1}\times(1-a_{o1})$$
The derivative of the total loss with respect to $w_5$ can then be written compactly as
$$\frac{\partial E_{total}}{\partial w_5}=\delta_{o1}\times a_{h1}$$
The weight update for $w_5$ is then:
$$\begin{aligned} w_5^+&=w_5-\eta\times\frac{\partial E_{total}}{\partial w_5}\\&=0.4-0.5\times0.082167041\\&=0.35891648 \end{aligned}$$
$\eta$ is the learning rate, introduced in my previous post, "CV Study Notes: Inference and Training," and not repeated here.
Similarly, $w_6$, $w_7$, and $w_8$ can be updated:
$$w_6^+=0.408666186\\ w_7^+=0.511301270\\ w_8^+=0.561370121$$
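The four output-side updates can be written compactly with the $\delta$ notation (a sketch using the values from above; learning rate $\eta=0.5$ as in the example):

```python
# delta_o = -(target - a_o) * a_o * (1 - a_o);  w_new = w - eta * delta_o * a_h
a_h1, a_h2 = 0.593269992, 0.596884378  # hidden-layer outputs
a_o1, a_o2 = 0.751365069, 0.772928465  # output-layer outputs
t_o1, t_o2 = 0.01, 0.99                # targets
w5, w6, w7, w8 = 0.40, 0.45, 0.50, 0.55
eta = 0.5

delta_o1 = -(t_o1 - a_o1) * a_o1 * (1 - a_o1)
delta_o2 = -(t_o2 - a_o2) * a_o2 * (1 - a_o2)

w5_new = w5 - eta * delta_o1 * a_h1
w6_new = w6 - eta * delta_o1 * a_h2
w7_new = w7 - eta * delta_o2 * a_h1
w8_new = w8 - eta * delta_o2 * a_h2
print(w5_new, w6_new, w7_new, w8_new)
# ~0.35891648, ~0.408666186, ~0.511301270, ~0.561370121
```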
Input layer → hidden layer weight update:
The idea is much the same, with one difference: the output $a_{h1}$ of $h_1$ contributes to both $E_{o1}$ and $E_{o2}$. So when the total loss is differentiated with respect to $a_{h1}$, by the rule for total derivatives the result splits into the derivatives of $E_{o1}$ and $E_{o2}$ with respect to $a_{h1}$:
$$\frac{\partial E_{total}}{\partial w_1}=\frac{\partial E_{total}}{\partial a_{h1}}\times\frac{\partial a_{h1}}{\partial z_{h1}}\times\frac{\partial z_{h1}}{\partial w_1}$$

where

$$\frac{\partial E_{total}}{\partial a_{h1}}=\frac{\partial E_{o1}}{\partial a_{h1}}+\frac{\partial E_{o2}}{\partial a_{h1}}$$
From this derivation, each factor can be computed:
$\frac{\partial E_{total}}{\partial a_{h1}}$:

$$\frac{\partial E_{total}}{\partial a_{h1}}=\frac{\partial E_{o1}}{\partial a_{h1}}+\frac{\partial E_{o2}}{\partial a_{h1}}$$
$\frac{\partial E_{o1}}{\partial a_{h1}}$:

$$\begin{aligned} \frac{\partial E_{o1}}{\partial a_{h1}}&=\frac{\partial E_{o1}}{\partial a_{o1}}\times\frac{\partial a_{o1}}{\partial z_{o1}}\times\frac{\partial z_{o1}}{\partial a_{h1}}\\&=0.741365069\times0.186815602\times0.4\\&=0.055399425 \end{aligned}$$
Similarly:

$$\frac{\partial E_{o2}}{\partial a_{h1}}=-0.019049119$$
Adding the two:

$$\begin{aligned} \frac{\partial E_{total}}{\partial a_{h1}}&=\frac{\partial E_{o1}}{\partial a_{h1}}+\frac{\partial E_{o2}}{\partial a_{h1}}\\&=0.055399425-0.019049119\\&=0.036350306 \end{aligned}$$
$\frac{\partial a_{h1}}{\partial z_{h1}}$:

$$\begin{aligned} \frac{\partial a_{h1}}{\partial z_{h1}}&=a_{h1}\times(1-a_{h1})\\&=0.593269992\times(1-0.593269992)\\&=0.2413007086 \end{aligned}$$
$\frac{\partial z_{h1}}{\partial w_1}$:

$$\frac{\partial z_{h1}}{\partial w_1}=i_1=0.05$$
Final result:
$$\frac{\partial E_{total}}{\partial w_1}=0.036350306\times0.2413007086\times0.05=0.000438568$$
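This branched chain can be checked in code as well (a sketch with the values from above; $w_5=0.4$ and $w_7=0.5$ are the pre-update hidden→output weights):

```python
# dE_total/dw1: contributions of a_h1 flow through both outputs o1 and o2
a_h1 = 0.593269992
a_o1, a_o2 = 0.751365069, 0.772928465
t_o1, t_o2 = 0.01, 0.99
w5, w7 = 0.40, 0.50   # pre-update weights from h1 into o1 and o2
i1 = 0.05

# Output-layer deltas
delta_o1 = -(t_o1 - a_o1) * a_o1 * (1 - a_o1)
delta_o2 = -(t_o2 - a_o2) * a_o2 * (1 - a_o2)

# dE_total/da_h1 sums the contributions through both outputs
dE_da_h1 = delta_o1 * w5 + delta_o2 * w7
dE_dw1 = dE_da_h1 * a_h1 * (1 - a_h1) * i1
print(dE_dw1)  # ~0.000438568
```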
Using the same simplification as in the previous section, let $\delta_{h1}$ denote the error of hidden unit $h_1$:
$$\begin{aligned} \frac{\partial E_{total}}{\partial w_1}&=\Big(\sum_i\frac{\partial E_{total}}{\partial a_{i}}\times\frac{\partial a_{i}}{\partial z_{i}}\times\frac{\partial z_{i}}{\partial a_{h1}}\Big)\times\frac{\partial a_{h1}}{\partial z_{h1}}\times\frac{\partial z_{h1}}{\partial w_1}\\&=\Big(\sum_i\delta_i\times w_{hi}\Big)\times a_{h1}\times(1-a_{h1})\times i_1\\&=\delta_{h1}\times i_1 \end{aligned}$$
The weight update for $w_1$ is:
$$w_1^+=w_1-\eta\times\frac{\partial E_{total}}{\partial w_1}=0.15-0.5\times0.000438568=0.149780716$$
Similarly, $w_2$, $w_3$, and $w_4$ are updated:
$$w_2^+=0.19956143\\ w_3^+=0.24975114\\ w_4^+=0.29950229$$
This completes one backpropagation pass.
Training is simply this cycle repeated: forward-propagate to get the error, backpropagate to update the weights, forward-propagate again, and so on. In this example, the total error fell from 0.298371109 to 0.291027924 after the first iteration; after 10,000 iterations it dropped to 0.000035085, with outputs $[0.015912196, 0.984065734]$.
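The whole procedure, iterated, looks like this (a self-contained sketch of this specific 2-2-2 network; I assume, as in the worked numbers, that the biases stay fixed and that all eight weights are updated simultaneously from their pre-update values):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

i1, i2 = 0.05, 0.10
t1, t2 = 0.01, 0.99
b1, b2 = 0.35, 0.60
w = [0.15, 0.20, 0.25, 0.30, 0.40, 0.45, 0.50, 0.55]  # w1..w8
eta = 0.5

def forward(w):
    a_h1 = sigmoid(w[0] * i1 + w[1] * i2 + b1)
    a_h2 = sigmoid(w[2] * i1 + w[3] * i2 + b1)
    a_o1 = sigmoid(w[4] * a_h1 + w[5] * a_h2 + b2)
    a_o2 = sigmoid(w[6] * a_h1 + w[7] * a_h2 + b2)
    return a_h1, a_h2, a_o1, a_o2

for step in range(10000):
    a_h1, a_h2, a_o1, a_o2 = forward(w)
    # Output-layer deltas
    d_o1 = -(t1 - a_o1) * a_o1 * (1 - a_o1)
    d_o2 = -(t2 - a_o2) * a_o2 * (1 - a_o2)
    # Hidden-layer deltas (sum the contributions through both outputs)
    d_h1 = (d_o1 * w[4] + d_o2 * w[6]) * a_h1 * (1 - a_h1)
    d_h2 = (d_o1 * w[5] + d_o2 * w[7]) * a_h2 * (1 - a_h2)
    # Simultaneous gradient-descent update of all eight weights
    grads = [d_h1 * i1, d_h1 * i2, d_h2 * i1, d_h2 * i2,
             d_o1 * a_h1, d_o1 * a_h2, d_o2 * a_h1, d_o2 * a_h2]
    w = [wi - eta * g for wi, g in zip(w, grads)]

_, _, a_o1, a_o2 = forward(w)
E = 0.5 * (t1 - a_o1) ** 2 + 0.5 * (t2 - a_o2) ** 2
print(E, a_o1, a_o2)  # the example reports the error dropping to ~0.000035
```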
These are personal study notes, shared for learning and exchange only. Please credit the source when reposting!