CV Study Notes - BP Neural Network Training Example (with Detailed Calculations and Formula Derivations)

BP Neural Network Training Example

1. BP Neural Networks

BP neural networks were introduced in my previous post, 《CV学习笔记-推理和训练》 (CV Study Notes: Inference and Training), so the basics are not repeated here. All of the concepts and background on BP networks used in this post can be found there; this post only walks through a concrete worked example.

The basic idea of the BP algorithm:

  • Feed the training data into the input layer of the network; it passes through the hidden layer and finally reaches the output layer, which produces the result. This is the forward propagation step.
  • Since the network's output differs from the actual value, compute the error between the estimate and the actual value, and propagate this error backward from the output layer through the hidden layer until it reaches the input layer.
  • During this backward pass, adjust the parameters (the weights between connected neurons) according to the error, so that the total loss decreases.
  • Iterate the three steps above (i.e., train on the data repeatedly) until a stopping criterion is satisfied.
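To make the three steps concrete before the full worked example, here is a minimal sketch with a single sigmoid neuron. The weight, bias, input, target, and learning-rate values here are illustrative only; they are not the ones used in the example below.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

w, b = 0.5, 0.1        # illustrative initial parameters
x, target = 1.0, 0.8   # one training sample
eta = 0.5              # learning rate

for step in range(100):
    # 1. forward propagation: input -> weighted sum -> activation
    z = w * x + b
    a = sigmoid(z)
    # 2. error between the network's estimate and the actual value
    loss = 0.5 * (target - a) ** 2
    # 3. backpropagation: dE/dw = dE/da * da/dz * dz/dw, then update
    delta = -(target - a) * a * (1 - a)
    w -= eta * delta * x
    b -= eta * delta * 1.0
```

After a few dozen iterations the loss shrinks toward zero, which is exactly the stopping behavior the last step describes.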

2. Training Example

1. Example Design

The green nodes form the first layer, the input layer; each node is a neuron. Here $i_1$ and $i_2$ are the input values and $b_1$ is the bias. The second layer, the hidden layer, contains the two nodes $h_1$ and $h_2$, with $b_2$ as its bias. The third layer is the output layer, consisting of $o_1$ and $o_2$. The weights between the layers are $w_1$ through $w_8$, and the activation function is the sigmoid. The input is $[i_1=0.05,\ i_2=0.10]$ and the correct output is $[o_1=0.01,\ o_2=0.99]$.

The sigmoid is an activation function that was introduced in my previous post 《CV学习笔记-推理和训练》 and is not covered again here.

[Figure: network structure diagram of the example (image unavailable)]

2. Training Process

1. Forward Propagation

Input layer -> hidden layer:

From the network structure diagram, neuron $h_1$ receives as input the weighted sum of $i_1$ and $i_2$ from the previous layer. Denoting this input by $z_{h1}$:

$$\begin{aligned} z_{h1}&=w_1\times i_1+w_2\times i_2+b_1\times 1\\&=0.15\times 0.05+0.2\times 0.1+0.35\times 1\\&=0.3775 \end{aligned}$$

Since the activation function is the sigmoid, the output $a_{h1}$ of neuron $h_1$ is

$$a_{h1}=\frac{1}{1+e^{-z_{h1}}}=\frac{1}{1+e^{-0.3775}}=0.593269992$$

Similarly, the output $a_{h2}$ of neuron $h_2$ is

$$a_{h2}=0.596884378$$
Hidden layer -> output layer:

From the network structure diagram, the input $z_{o1}$ of neuron $o_1$ is the weighted sum of the outputs of $h_1$ and $h_2$ from the previous layer, so

$$\begin{aligned} z_{o1}&=w_5\times a_{h1}+w_6\times a_{h2}+b_2\times 1\\&=0.4\times 0.593269992+0.45\times 0.596884378+0.6\times 1\\&=1.105905967 \end{aligned}$$

$z_{o2}$ is computed in the same way.

Since the network uses the sigmoid activation, the output $a_{o1}$ of $o_1$ is

$$a_{o1}=\frac{1}{1+e^{-z_{o1}}}=\frac{1}{1+e^{-1.105905967}}=0.751365069$$

Similarly, $a_{o2}=0.772928465$.

This completes one full forward pass. The output $[0.751365069, 0.772928465]$ is still far from the actual value $[0.01, 0.99]$, so the error must be propagated backward, the weights updated, and the computation repeated.
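The forward pass can be checked with a few lines of Python. This is a sketch: the weights $w_3$, $w_4$, $w_7$, $w_8$ do not appear explicitly in the text and are inferred from the updated values reported later ($w_3=0.25$, $w_4=0.30$, $w_7=0.50$, $w_8=0.55$).

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# inputs and parameters of the example network
i1, i2 = 0.05, 0.10
w1, w2, w3, w4 = 0.15, 0.20, 0.25, 0.30   # input -> hidden
w5, w6, w7, w8 = 0.40, 0.45, 0.50, 0.55   # hidden -> output
b1, b2 = 0.35, 0.60

# input layer -> hidden layer
z_h1 = w1 * i1 + w2 * i2 + b1
z_h2 = w3 * i1 + w4 * i2 + b1
a_h1, a_h2 = sigmoid(z_h1), sigmoid(z_h2)

# hidden layer -> output layer
z_o1 = w5 * a_h1 + w6 * a_h2 + b2
z_o2 = w7 * a_h1 + w8 * a_h2 + b2
a_o1, a_o2 = sigmoid(z_o1), sigmoid(z_o2)
```

Running this reproduces the intermediate values derived above, e.g. $z_{h1}=0.3775$ and $a_{o1}=0.751365069$.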

2. Backpropagation

Computing the loss function:

The error is passed through the loss function, which estimates a suitable value to propagate backward so that the weights can be updated sensibly.

$$\begin{aligned} E_{total}&=\sum\tfrac{1}{2}(target-output)^2\\ E_{o1}&=\tfrac{1}{2}(0.01-0.751365069)^2=0.274811083\\ E_{o2}&=\tfrac{1}{2}(0.99-0.772928465)^2=0.023560026\\ E_{total}&=E_{o1}+E_{o2}=0.298371109 \end{aligned}$$
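The loss values can be verified directly from the forward-pass outputs derived above:

```python
# squared-error loss for the example's forward-pass outputs
a_o1, a_o2 = 0.751365069, 0.772928465   # network outputs
t_o1, t_o2 = 0.01, 0.99                 # target values

E_o1 = 0.5 * (t_o1 - a_o1) ** 2
E_o2 = 0.5 * (t_o2 - a_o2) ** 2
E_total = E_o1 + E_o2
```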
Hidden layer -> output layer weight updates:

[Figure: hidden-layer-to-output-layer weight update diagram (image unavailable)]

Take the weight $w_5$ as an example. Differentiating the total loss with respect to $w_5$ gives $w_5$'s contribution to the total loss:

$$\frac{\partial E_{total}}{\partial w_5}=\frac{\partial E_{total}}{\partial a_{o1}}\times\frac{\partial a_{o1}}{\partial z_{o1}}\times\frac{\partial z_{o1}}{\partial w_5}$$

$\frac{\partial E_{total}}{\partial a_{o1}}$: the total loss is computed from the two outputs $a_{o1}$ and $a_{o2}$, so it can be differentiated with respect to $a_{o1}$ and $a_{o2}$.

$\frac{\partial a_{o1}}{\partial z_{o1}}$: the output $a_{o1}$ is obtained by passing the input $z_{o1}$ through the sigmoid, so $a_{o1}$ can be differentiated with respect to $z_{o1}$.

$\frac{\partial z_{o1}}{\partial w_5}$: $z_{o1}$ is a weighted sum in which the output of $h_1$ enters with weight $w_5$, so $z_{o1}$ can be differentiated with respect to $w_5$.

To keep these relationships straight: $w_5$ contributes to $z_{o1}$, $z_{o1}$ contributes to $a_{o1}$, and $a_{o1}$ in turn contributes to $E_{total}$. Every link in this chain is a single branch, so the derivative factors directly into the product above. The chain becomes a little more involved in the input-to-hidden weight updates described in a later section.

From the derivation above, we can compute:

$\frac{\partial E_{total}}{\partial a_{o1}}$:

$$\begin{aligned} E_{total}&=\frac{1}{2}(target_{o1}-a_{o1})^2+\frac{1}{2}(target_{o2}-a_{o2})^2\\ \frac{\partial E_{total}}{\partial a_{o1}}&=2\times\frac{1}{2}(target_{o1}-a_{o1})\times(-1)\\&=-(target_{o1}-a_{o1})\\&=0.751365069-0.01\\&=0.741365069 \end{aligned}$$
$\frac{\partial a_{o1}}{\partial z_{o1}}$:

$$\begin{aligned} a_{o1}&=\frac{1}{1+e^{-z_{o1}}}\\ \frac{\partial a_{o1}}{\partial z_{o1}}&=a_{o1}\times(1-a_{o1})\\&=0.751365069\times(1-0.751365069)\\&=0.186815602 \end{aligned}$$
$\frac{\partial z_{o1}}{\partial w_5}$:

$$\begin{aligned} z_{o1}&=w_5\times a_{h1}+w_6\times a_{h2}+b_2\times 1\\ \frac{\partial z_{o1}}{\partial w_5}&=a_{h1}=0.593269992 \end{aligned}$$
Multiplying the three results together:

$$\frac{\partial E_{total}}{\partial w_5}=0.741365069\times 0.186815602\times 0.593269992=0.082167041$$
Stripping the specific numbers from the steps above and abstracting, we get

$$\frac{\partial E_{total}}{\partial w_5}=-(target_{o1}-a_{o1})\times a_{o1}\times(1-a_{o1})\times a_{h1}$$

$$\frac{\partial E}{\partial w_{jk}}=-(t_k-o_k)\cdot \mathrm{sigmoid}\Big(\sum_j w_{jk}\cdot o_j\Big)\Big(1-\mathrm{sigmoid}\Big(\sum_j w_{jk}\cdot o_j\Big)\Big)\cdot o_j$$

The second formula appeared in my previous post; the derivation above justifies it.

For notational convenience, let $\delta_{o1}$ denote the output-layer error term:

$$\delta_{o1}=\frac{\partial E_{total}}{\partial a_{o1}}\times\frac{\partial a_{o1}}{\partial z_{o1}}=\frac{\partial E_{total}}{\partial z_{o1}}=-(target_{o1}-a_{o1})\times a_{o1}\times(1-a_{o1})$$

The derivative of the total loss with respect to $w_5$ can then be written compactly as

$$\frac{\partial E_{total}}{\partial w_5}=\delta_{o1}\times a_{h1}$$
The update for $w_5$ is:

$$\begin{aligned} w_5^+&=w_5-\eta\times\frac{\partial E_{total}}{\partial w_5}\\&=0.4-0.5\times 0.082167041\\&=0.35891648 \end{aligned}$$
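The gradient and the update of $w_5$ can be verified numerically; this is a sketch that plugs in the values derived above:

```python
# values carried over from the example's forward pass
a_h1 = 0.593269992
a_o1 = 0.751365069
t_o1 = 0.01
w5, eta = 0.40, 0.5   # initial weight and learning rate

# chain rule: dE_total/dw5 = dE/da_o1 * da_o1/dz_o1 * dz_o1/dw5
dE_da = -(t_o1 - a_o1)      # = 0.741365069
da_dz = a_o1 * (1 - a_o1)   # = 0.186815602
dz_dw = a_h1                # = 0.593269992
grad_w5 = dE_da * da_dz * dz_dw

delta_o1 = dE_da * da_dz    # the delta_o1 shorthand defined above
w5_new = w5 - eta * grad_w5
```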

$\eta$ is the learning rate, which was introduced in my previous post 《CV学习笔记-推理和训练》 and is not covered again here.

Similarly, $w_6$, $w_7$, and $w_8$ can be updated:

$$w_6^+=0.408666186,\quad w_7^+=0.511301270,\quad w_8^+=0.561370121$$
Input layer -> hidden layer weight updates:

[Figure: input-layer-to-hidden-layer weight update diagram (image unavailable)]

The idea is largely the same, with one difference: the output $a_{h1}$ of $h_1$ contributes to both $E_{o1}$ and $E_{o2}$. So when differentiating the total loss with respect to $a_{h1}$, the total-derivative rule splits it into the derivatives of $E_{o1}$ and $E_{o2}$ with respect to $a_{h1}$:

$$\frac{\partial E_{total}}{\partial w_1}=\frac{\partial E_{total}}{\partial a_{h1}}\times\frac{\partial a_{h1}}{\partial z_{h1}}\times\frac{\partial z_{h1}}{\partial w_1},\qquad \text{where}\quad \frac{\partial E_{total}}{\partial a_{h1}}=\frac{\partial E_{o1}}{\partial a_{h1}}+\frac{\partial E_{o2}}{\partial a_{h1}}$$

From this derivation, we compute:

$\frac{\partial E_{total}}{\partial a_{h1}}$:

$$\frac{\partial E_{total}}{\partial a_{h1}}=\frac{\partial E_{o1}}{\partial a_{h1}}+\frac{\partial E_{o2}}{\partial a_{h1}}$$
$\frac{\partial E_{o1}}{\partial a_{h1}}$:

$$\begin{aligned} \frac{\partial E_{o1}}{\partial a_{h1}}&=\frac{\partial E_{o1}}{\partial a_{o1}}\times\frac{\partial a_{o1}}{\partial z_{o1}}\times\frac{\partial z_{o1}}{\partial a_{h1}}\\&=0.741365069\times 0.186815602\times 0.4\\&=0.055399425 \end{aligned}$$

Similarly:

$$\frac{\partial E_{o2}}{\partial a_{h1}}=-0.019049119$$
Adding the two:

$$\begin{aligned} \frac{\partial E_{total}}{\partial a_{h1}}&=\frac{\partial E_{o1}}{\partial a_{h1}}+\frac{\partial E_{o2}}{\partial a_{h1}}\\&=0.055399425-0.019049119\\&=0.036350306 \end{aligned}$$
$\frac{\partial a_{h1}}{\partial z_{h1}}$:

$$\begin{aligned} \frac{\partial a_{h1}}{\partial z_{h1}}&=a_{h1}\times(1-a_{h1})\\&=0.593269992\times(1-0.593269992)\\&=0.2413007086 \end{aligned}$$
$\frac{\partial z_{h1}}{\partial w_1}$:

$$\frac{\partial z_{h1}}{\partial w_1}=i_1=0.05$$
Putting it all together:

$$\frac{\partial E_{total}}{\partial w_1}=0.036350306\times 0.2413007086\times 0.05=0.000438568$$
Using the same simplification as in the previous section, let $\delta_{h1}$ denote the error term of hidden unit $h_1$ (the index $i$ below runs over the output-layer units):

$$\begin{aligned} \frac{\partial E_{total}}{\partial w_1}&=\Big(\sum_i\frac{\partial E_{total}}{\partial a_{i}}\times\frac{\partial a_{i}}{\partial z_{i}}\times\frac{\partial z_{i}}{\partial a_{h1}}\Big)\times\frac{\partial a_{h1}}{\partial z_{h1}}\times\frac{\partial z_{h1}}{\partial w_1}\\&=\Big(\sum_i\delta_i\times w_{hi}\Big)\times a_{h1}\times(1-a_{h1})\times i_1\\&=\delta_{h1}\times i_1 \end{aligned}$$
The update for $w_1$ is:

$$w_1^+=w_1-\eta\times\frac{\partial E_{total}}{\partial w_1}=0.15-0.5\times 0.000438568=0.149780716$$
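The gradient and the update of $w_1$ can likewise be checked numerically. A sketch using the values derived above; $w_5=0.4$ and $w_7=0.5$ are the pre-update weights, as used in the example's derivation:

```python
# values carried over from the example's forward pass
i1 = 0.05
a_h1 = 0.593269992
a_o1, a_o2 = 0.751365069, 0.772928465
t_o1, t_o2 = 0.01, 0.99
w5, w7 = 0.40, 0.50   # pre-update hidden -> output weights
w1, eta = 0.15, 0.5

# output-layer error terms (delta notation from the previous section)
delta_o1 = -(t_o1 - a_o1) * a_o1 * (1 - a_o1)
delta_o2 = -(t_o2 - a_o2) * a_o2 * (1 - a_o2)

# dE_total/da_h1 sums the contributions routed through o1 and o2
dE_dah1 = delta_o1 * w5 + delta_o2 * w7
grad_w1 = dE_dah1 * a_h1 * (1 - a_h1) * i1
w1_new = w1 - eta * grad_w1
```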
Similarly, $w_2$, $w_3$, and $w_4$ are updated:

$$w_2^+=0.19956143,\quad w_3^+=0.24975114,\quad w_4^+=0.29950229$$
This completes one round of backpropagation.

Training repeats this cycle: forward-propagate to obtain the error, backpropagate to update the weights, then forward-propagate again, over and over. In this example, the total error drops from 0.298371109 to 0.291027924 after the first iteration; after 10,000 iterations it falls to 0.000035085, with output [0.015912196, 0.984065734].
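The whole iteration can be reproduced in a short script. This is a sketch under two assumptions: the remaining initial weights are $w_3=0.25$, $w_4=0.30$, $w_7=0.50$, $w_8=0.55$ (inferred from the updated values reported above), and the biases stay fixed, since the example never updates them.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

i1, i2 = 0.05, 0.10
t1, t2 = 0.01, 0.99
w = [0.15, 0.20, 0.25, 0.30, 0.40, 0.45, 0.50, 0.55]
b1, b2, eta = 0.35, 0.60, 0.5

def forward(w):
    a_h1 = sigmoid(w[0] * i1 + w[1] * i2 + b1)
    a_h2 = sigmoid(w[2] * i1 + w[3] * i2 + b1)
    a_o1 = sigmoid(w[4] * a_h1 + w[5] * a_h2 + b2)
    a_o2 = sigmoid(w[6] * a_h1 + w[7] * a_h2 + b2)
    return a_h1, a_h2, a_o1, a_o2

errors = []
for step in range(10000):
    a_h1, a_h2, a_o1, a_o2 = forward(w)
    errors.append(0.5 * (t1 - a_o1) ** 2 + 0.5 * (t2 - a_o2) ** 2)
    # error terms, then gradients for all eight weights
    d_o1 = -(t1 - a_o1) * a_o1 * (1 - a_o1)
    d_o2 = -(t2 - a_o2) * a_o2 * (1 - a_o2)
    d_h1 = (d_o1 * w[4] + d_o2 * w[6]) * a_h1 * (1 - a_h1)
    d_h2 = (d_o1 * w[5] + d_o2 * w[7]) * a_h2 * (1 - a_h2)
    grads = [d_h1 * i1, d_h1 * i2, d_h2 * i1, d_h2 * i2,
             d_o1 * a_h1, d_o1 * a_h2, d_o2 * a_h1, d_o2 * a_h2]
    w = [wi - eta * g for wi, g in zip(w, grads)]   # simultaneous update

a_o1, a_o2 = forward(w)[2], forward(w)[3]
final_error = 0.5 * (t1 - a_o1) ** 2 + 0.5 * (t2 - a_o2) ** 2
```

The first recorded error matches the 0.298371109 computed above, and the error shrinks steadily toward zero over the 10,000 iterations.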


These are personal study notes, shared for learning and exchange only. Please credit the source when reposting!
