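Background
Backpropagation is one of the most widely used methods for training neural networks. Many articles online try to explain how backpropagation works, but few include an example with actual numbers. This post attempts to explain how it works using concrete numbers.
Backpropagation in Python
You can implement backpropagation in Python; I have implemented the backpropagation algorithm in this Github repo.
Backpropagation Visualization
For an interactive visualization showing a neural network as it learns, check out my Neural Network visualization.
Additional Resources
If you find this tutorial useful and want to keep learning about neural networks and their applications, I highly recommend Adrian Rosebrock's excellent tutorial Getting Started with Deep Learning and Python.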
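Overview
For this tutorial, we'll use a neural network with 2 input neurons, 2 hidden neurons, and 2 output neurons. In addition, the hidden and output neurons each include a bias.
Here's the basic structure:
To have some numbers to work with, here are the initial values of the weights, the biases, and the training inputs/outputs:
The goal of backpropagation is to optimize the weights so that the neural network can learn how to correctly map arbitrary inputs to outputs.
For the rest of this tutorial we'll work with a single training example: given inputs 0.05 and 0.10, we want the network to output 0.01 and 0.99.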
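The Forward Pass
To begin, let's see what the network currently predicts for inputs 0.05 and 0.10, given the current weights and biases. To do this, we feed the inputs forward through the network.
We compute the total net input to each hidden neuron, squash it with an activation function (here, the logistic function), and then repeat the process for the output neurons.
Here's how we calculate the total net input for h1: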
$$net_{h1} = w_1 * i_1 + w_2 * i_2 + b_1 * 1$$
$$net_{h1} = 0.15 * 0.05 + 0.2 * 0.1 + 0.35 * 1 = 0.3775$$
We then squash $net_{h1}$ with the logistic function to get the output of h1:
$$out_{h1} = \frac{1}{1+e^{-net_{h1}}} = \frac{1}{1+e^{-0.3775}} = 0.593269992$$
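The forward computation for h1 is small enough to check directly. Below is a minimal Python sketch (standard `math` module only; the names `sigmoid`, `net_h1` and `out_h1` are illustrative) that reproduces the two numbers above.

```python
import math

def sigmoid(x):
    """Logistic activation: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

# Values from the text: inputs, the two weights into h1, and the hidden bias b1.
i1, i2 = 0.05, 0.10
w1, w2 = 0.15, 0.20
b1 = 0.35

net_h1 = w1 * i1 + w2 * i2 + b1 * 1  # total net input to h1
out_h1 = sigmoid(net_h1)             # squashed output of h1

print(f"net_h1 = {net_h1:.4f}")  # net_h1 = 0.3775
print(f"out_h1 = {out_h1:.9f}")
```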
Carrying out the same process for h2, we get:
$$out_{h2} = 0.596884378$$
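A sketch of the same computation for h2. The weights feeding h2 are not stated in the text (they appeared in the missing figure), so `w3 = 0.25` and `w4 = 0.30` below are assumptions; they are chosen because they reproduce the quoted $out_{h2}$.

```python
import math

def sigmoid(x):
    """Logistic activation."""
    return 1.0 / (1.0 + math.exp(-x))

i1, i2 = 0.05, 0.10  # inputs from the text
b1 = 0.35            # hidden-layer bias from the text
# ASSUMPTION: w3 and w4 (the weights into h2) are not given in the text.
w3, w4 = 0.25, 0.30

net_h2 = w3 * i1 + w4 * i2 + b1 * 1
out_h2 = sigmoid(net_h2)
print(f"out_h2 = {out_h2:.9f}")
```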
We repeat this process for the output neurons, using the outputs of the hidden neurons as their inputs.
Here's how we compute the output of o1:
$$net_{o1} = w_5 * out_{h1} + w_6 * out_{h2} + b_2 * 1$$
$$out_{o1} = \frac{1}{1+e^{-net_{o1}}} = 0.75136507$$
Carrying out the same process for o2, we get:
$$out_{o2} = 0.772928465$$
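Putting both layers together gives a small forward-pass sketch. Besides w3 and w4, it assumes w6 = 0.45, w7 = 0.50, w8 = 0.55 and b2 = 0.60, and it assumes the wiring "h1 and h2 feed o1 through w5 and w6, and feed o2 through w7 and w8". None of these are given in the text, but with them the sketch reproduces the quoted $out_{o1}$ and $out_{o2}$.

```python
import math

def sigmoid(x):
    """Logistic activation."""
    return 1.0 / (1.0 + math.exp(-x))

i1, i2 = 0.05, 0.10
w1, w2, b1 = 0.15, 0.20, 0.35  # from the text
w5 = 0.40                      # from the text
# ASSUMPTION: w3, w4, w6, w7, w8 and b2 are not given in the text.
w3, w4 = 0.25, 0.30
w6, w7, w8, b2 = 0.45, 0.50, 0.55, 0.60

# Hidden layer
out_h1 = sigmoid(w1 * i1 + w2 * i2 + b1 * 1)
out_h2 = sigmoid(w3 * i1 + w4 * i2 + b1 * 1)

# Output layer: the hidden outputs act as inputs (wiring is an ASSUMPTION).
net_o1 = w5 * out_h1 + w6 * out_h2 + b2 * 1
net_o2 = w7 * out_h1 + w8 * out_h2 + b2 * 1
out_o1 = sigmoid(net_o1)
out_o2 = sigmoid(net_o2)
print(f"out_o1 = {out_o1:.8f}, out_o2 = {out_o2:.9f}")
```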
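Calculating the Total Error
We can now calculate the error of each output neuron using the squared error function, and sum them to get the total error: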
$$E_{total} = \sum \frac{1}{2}(target - output)^2$$
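The error function translates directly into code. A minimal sketch; `squared_error` is a hypothetical helper name, and the toy inputs in the example are made up for illustration rather than taken from the network.

```python
def squared_error(targets, outputs):
    """E_total = sum over output neurons of 1/2 * (target - output)^2.

    The factor 1/2 cancels the 2 that appears when the square is differentiated.
    """
    return sum(0.5 * (t - o) ** 2 for t, o in zip(targets, outputs))

# Toy example (not from the text): 0.5 * (1 - 0.5)^2 + 0.5 * (0 - 0)^2 = 0.125
print(squared_error([1.0, 0.0], [0.5, 0.0]))  # 0.125
```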
For example, the target output for o1 is 0.01, but the network outputs 0.75136507, so its error is:
$$E_{o1} = \frac{1}{2}(target_{o1} - out_{o1})^2 = \frac{1}{2}(0.01 - 0.75136507)^2 = 0.274811083$$
Repeating this process for o2 (whose target is 0.99), we get:
$$E_{o2} = 0.023560026$$
The total error of the neural network is the sum of these errors:
$$E_{total} = E_{o1} + E_{o2} = 0.274811083 + 0.023560026 = 0.298371109$$
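The same arithmetic in Python, using only the outputs and targets quoted above (no weights are needed, so nothing is assumed here):

```python
out_o1, out_o2 = 0.75136507, 0.772928465  # network outputs from the forward pass
target_o1, target_o2 = 0.01, 0.99         # desired outputs

E_o1 = 0.5 * (target_o1 - out_o1) ** 2
E_o2 = 0.5 * (target_o2 - out_o2) ** 2
E_total = E_o1 + E_o2
print(f"E_o1 = {E_o1:.9f}, E_o2 = {E_o2:.9f}, E_total = {E_total:.9f}")
```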
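The Backward Pass
The goal of backpropagation is to update each weight in the network so that the actual outputs move closer to the target outputs, thereby minimizing the error of the output neurons.
Output Layer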
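Consider $w_5$. We want to know how much a change in $w_5$ affects the total error, i.e. $\frac{\partial E_{total}}{\partial w_5}$, the partial derivative of $E_{total}$ with respect to $w_5$.
Applying the chain rule:
$$\frac{\partial E_{total}}{\partial w_5} = \frac{\partial E_{total}}{\partial out_{o1}} * \frac{\partial out_{o1}}{\partial net_{o1}} * \frac{\partial net_{o1}}{\partial w_5}$$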
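Visually, here's what we're doing:
We need to work out each factor in this equation.
First, how much does the total error change with respect to the output $out_{o1}$?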
$$E_{total} = \frac{1}{2}(target_{o1} - out_{o1})^2 + \frac{1}{2}(target_{o2} - out_{o2})^2$$
$$\frac{\partial E_{total}}{\partial out_{o1}} = 2 * \frac{1}{2}(target_{o1} - out_{o1})^{2-1} * (-1) + 0$$
$$\frac{\partial E_{total}}{\partial out_{o1}} = -(target_{o1} - out_{o1}) = -(0.01 - 0.75136507) = 0.74136507$$
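A quick sketch of this derivative, plus a central finite-difference check that the analytic result matches how $E_{total}$ actually changes when $out_{o1}$ is nudged. The step size `h = 1e-6` is an arbitrary choice.

```python
target_o1, target_o2 = 0.01, 0.99
out_o1, out_o2 = 0.75136507, 0.772928465  # outputs from the forward pass

def total_error(o1, o2):
    """E_total as a function of the two outputs (targets held fixed)."""
    return 0.5 * (target_o1 - o1) ** 2 + 0.5 * (target_o2 - o2) ** 2

# Analytic derivative: the 2 from the exponent cancels the 1/2, the inner
# derivative of (target - out) w.r.t. out is -1, and the E_o2 term contributes 0.
dE_dout_o1 = -(target_o1 - out_o1)
print(f"{dE_dout_o1:.8f}")  # 0.74136507

# Central finite difference: nudge out_o1 only.
h = 1e-6
numeric = (total_error(out_o1 + h, out_o2) - total_error(out_o1 - h, out_o2)) / (2 * h)
```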
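Next, how much does the output of o1 change with respect to its total net input?
The partial derivative of the logistic function is the output multiplied by 1 minus the output: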
$$out_{o1} = \frac{1}{1+e^{-net_{o1}}}$$
$$\frac{\partial out_{o1}}{\partial net_{o1}} = out_{o1}(1 - out_{o1}) = 0.75136507(1 - 0.75136507) = 0.186815602$$
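A small sketch of the logistic derivative written in terms of the neuron's output, with a finite-difference sanity check at an arbitrary point (x = 0.3, chosen only for illustration). Both function names are hypothetical.

```python
import math

def sigmoid(x):
    """Logistic activation."""
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_derivative_from_output(out):
    """d(out)/d(net) for the logistic function, expressed via its output."""
    return out * (1.0 - out)

out_o1 = 0.75136507  # from the forward pass
dout_dnet_o1 = sigmoid_derivative_from_output(out_o1)
print(f"dout_o1/dnet_o1 = {dout_dnet_o1:.9f}")

# Sanity check of the identity sigma'(x) = sigma(x) * (1 - sigma(x)).
x, h = 0.3, 1e-6
numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)
analytic = sigmoid_derivative_from_output(sigmoid(x))
```

Expressing the derivative through the output is convenient because the forward pass has already computed `out`, so no extra exponential is needed.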
Finally, how much does the total net input of o1 change with respect to $w_5$?
$$net_{o1} = w_5 * out_{h1} + w_6 * out_{h2} + b_2 * 1$$
$$\frac{\partial net_{o1}}{\partial w_5} = 1 * out_{h1} * w_5^{(1-1)} + 0 + 0 = out_{h1} = 0.593269992$$
Putting it all together:
$$\frac{\partial E_{total}}{\partial w_5} = \frac{\partial E_{total}}{\partial out_{o1}} * \frac{\partial out_{o1}}{\partial net_{o1}} * \frac{\partial net_{o1}}{\partial w_5}$$
$$\frac{\partial E_{total}}{\partial w_5} = 0.74136507 * 0.186815602 * 0.593269992 = 0.082167041$$
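The chain-rule product can be checked numerically: nudge $w_5$ slightly, recompute the error, and compare. The sketch below assumes w6 = 0.45 and b2 = 0.60, which the text does not state; `out_h1` and `out_h2` are the values quoted earlier.

```python
import math

def sigmoid(x):
    """Logistic activation."""
    return 1.0 / (1.0 + math.exp(-x))

# The three chain-rule factors computed in the text.
dE_dout_o1 = 0.74136507     # dE_total / dout_o1
dout_dnet_o1 = 0.186815602  # dout_o1 / dnet_o1
dnet_dw5 = 0.593269992      # dnet_o1 / dw5 = out_h1
dE_dw5 = dE_dout_o1 * dout_dnet_o1 * dnet_dw5
print(f"analytic dE/dw5 = {dE_dw5:.9f}")

# Finite-difference check. Only E_o1 depends on w5, so dE_total/dw5 = dE_o1/dw5.
# ASSUMPTION: w6 = 0.45 and b2 = 0.60 are not given in the text.
OUT_H1, OUT_H2 = 0.593269992, 0.596884378

def error_o1(w5, w6=0.45, b2=0.60, target=0.01):
    out_o1 = sigmoid(w5 * OUT_H1 + w6 * OUT_H2 + b2)
    return 0.5 * (target - out_o1) ** 2

h = 1e-6
numeric = (error_o1(0.40 + h) - error_o1(0.40 - h)) / (2 * h)
print(f"numeric  dE/dw5 = {numeric:.9f}")
```

Gradient checks like this are a common way to catch sign or indexing mistakes in a hand-written backward pass.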
You will often see this calculation combined into a single expression, known as the delta rule:
$$\frac{\partial E_{total}}{\partial w_5} = -(target_{o1} - out_{o1}) * out_{o1}(1 - out_{o1}) * out_{h1}$$
Alternatively, the product of $\frac{\partial E_{total}}{\partial out_{o1}}$ and $\frac{\partial out_{o1}}{\partial net_{o1}}$ can be written as $\frac{\partial E_{total}}{\partial net_{o1}}$, denoted $\delta_{o1}$ (the node delta). We can use it to rewrite the expression above:
$$\delta_{o1} = \frac{\partial E_{total}}{\partial out_{o1}} * \frac{\partial out_{o1}}{\partial net_{o1}} = \frac{\partial E_{total}}{\partial net_{o1}}$$
$$\delta_{o1} = -(target_{o1} - out_{o1}) * out_{o1}(1 - out_{o1})$$
Therefore:
$$\frac{\partial E_{total}}{\partial w_5} = \delta_{o1} out_{h1}$$
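In code, the δ form is just a regrouping of the same product; this sketch uses only values stated in the text.

```python
target_o1 = 0.01
out_o1 = 0.75136507   # output of o1
out_h1 = 0.593269992  # output of h1, the neuron that w5 connects from

# Node delta for o1: dE_total / dnet_o1
delta_o1 = -(target_o1 - out_o1) * out_o1 * (1 - out_o1)

# Gradient for w5: delta of the receiving neuron times the output feeding the weight.
dE_dw5 = delta_o1 * out_h1
print(f"delta_o1 = {delta_o1:.9f}, dE/dw5 = {dE_dw5:.9f}")
```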
To decrease the error, we subtract this value, multiplied by a learning rate $\eta$ (set to 0.5 here), from the current weight:
$$w_5^{t+1} = w_5^{t} - \eta * \frac{\partial E_{total}}{\partial w_5^{t}} = 0.4 - 0.5 * 0.082167041 = 0.35891648$$
We can repeat this process to get the new weights $w_6$, $w_7$ and $w_8$:
$$w_6^{t+1} = 0.408666186$$
$$w_7^{t+1} = 0.511301270$$
$$w_8^{t+1} = 0.561370121$$
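The four output-layer updates share one pattern: new weight = old weight minus η times (δ of the receiving neuron) times (output of the sending neuron). The sketch below assumes w6 = 0.45, w7 = 0.50, w8 = 0.55 and the wiring "w5, w6 feed o1; w7, w8 feed o2", since the text does not give them; with those assumptions it reproduces the four updated weights above.

```python
eta = 0.5  # learning rate from the text
out_h1, out_h2 = 0.593269992, 0.596884378
out_o1, out_o2 = 0.75136507, 0.772928465
target_o1, target_o2 = 0.01, 0.99
# w5 = 0.40 is from the text; ASSUMPTION: w6, w7, w8 and the wiring.
w5, w6, w7, w8 = 0.40, 0.45, 0.50, 0.55

def node_delta(target, out):
    """dE_total/dnet for a logistic output neuron under squared error."""
    return -(target - out) * out * (1 - out)

delta_o1 = node_delta(target_o1, out_o1)
delta_o2 = node_delta(target_o2, out_o2)

# Gradient descent step for each weight feeding the output layer.
new_w5 = w5 - eta * delta_o1 * out_h1
new_w6 = w6 - eta * delta_o1 * out_h2
new_w7 = w7 - eta * delta_o2 * out_h1
new_w8 = w8 - eta * delta_o2 * out_h2
print(new_w5, new_w6, new_w7, new_w8)
```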
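When we continue the backpropagation algorithm below, we use the original weights, not the updated ones, so that every gradient is computed for the same network that produced the current error.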
Hidden Layer
Next, we'll continue the backward pass by calculating new values for $w_1$, $w_2$, $w_3$ and $w_4$. Here's what we need to figure out:
$$\frac{\partial E_{total}}{\partial w_1} = \frac{\partial E_{total}}{\partial out_{h1}} * \frac{\partial out_{h1}}{\partial net_{h1}} * \frac{\partial net_{h1}}{\partial w_1}$$
Visually:
We'll use a similar process for the hidden neurons, with one difference: the output of each hidden neuron contributes to multiple output neurons. We know that $out_{h1}$ affects both $out_{o1}$ and $out_{o2}$, so $\frac{\partial E_{total}}{\partial out_{h1}}$ needs to account for its effect on both output neurons:
$$\frac{\partial E_{total}}{\partial out_{h1}} = \frac{\partial E_{o1}}{\partial out_{h1}} + \frac{\partial E_{o2}}{\partial out_{h1}}$$
Starting with $\frac{\partial E_{o1}}{\partial out_{h1}}$:
$$\frac{\partial E_{o1}}{\partial out_{h1}} = \frac{\partial E_{o1}}{\partial net_{o1}} * \frac{\partial net_{o1}}{\partial out_{h1}}$$
We can calculate $\frac{\partial E_{o1}}{\partial net_{o1}}$ using values we computed earlier:
$$\frac{\partial E_{o1}}{\partial net_{o1}} = \frac{\partial E_{o1}}{\partial out_{o1}} * \frac{\partial out_{o1}}{\partial net_{o1}} = 0.74136507 * 0.186815602 = 0.138498562$$
And $\frac{\partial net_{o1}}{\partial out_{h1}}$ is equal to $w_5$:
$$net_{o1} = w_5 * out_{h1} + w_6 * out_{h2} + b_2 * 1$$
$$\frac{\partial net_{o1}}{\partial out_{h1}} = w_5 = 0.40$$
Plugging them in:
$$\frac{\partial E_{o1}}{\partial out_{h1}} = \frac{\partial E_{o1}}{\partial net_{o1}} * \frac{\partial net_{o1}}{\partial out_{h1}} = 0.138498562 * 0.40 = 0.055399425$$
Following the same process for $\frac{\partial E_{o2}}{\partial out_{h1}}$, we get:
$$\frac{\partial E_{o2}}{\partial out_{h1}} = -0.019049119$$
Therefore:
$$\frac{\partial E_{total}}{\partial out_{h1}} = \frac{\partial E_{o1}}{\partial out_{h1}} + \frac{\partial E_{o2}}{\partial out_{h1}} = 0.055399425 + (-0.019049119) = 0.036350306$$
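A sketch of this sum. The weight from h1 to o2 is not given in the text, so `w7 = 0.50` is an assumption; it is consistent with the quoted -0.019049119.

```python
target_o1, target_o2 = 0.01, 0.99
out_o1, out_o2 = 0.75136507, 0.772928465
w5 = 0.40  # weight from h1 to o1 (from the text)
w7 = 0.50  # ASSUMPTION: weight from h1 to o2, not given in the text

def node_delta(target, out):
    """dE/dnet for a logistic output neuron under squared error."""
    return -(target - out) * out * (1 - out)

# h1 feeds both o1 (via w5) and o2 (via w7), so its error signal is a sum.
dEo1_douth1 = node_delta(target_o1, out_o1) * w5
dEo2_douth1 = node_delta(target_o2, out_o2) * w7
dE_douth1 = dEo1_douth1 + dEo2_douth1
print(f"{dEo1_douth1:.9f} + ({dEo2_douth1:.9f}) = {dE_douth1:.9f}")
```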
Now that we have $\frac{\partial E_{total}}{\partial out_{h1}}$, we need to figure out $\frac{\partial out_{h1}}{\partial net_{h1}}$, and then $\frac{\partial net_{h1}}{\partial w}$ for each weight:
$$out_{h1} = \frac{1}{1+e^{-net_{h1}}}$$
$$\frac{\partial out_{h1}}{\partial net_{h1}} = out_{h1}(1 - out_{h1}) = 0.59326999(1 - 0.59326999) = 0.241300709$$
We calculate the partial derivative of the total net input to h1 with respect to $w_1$ the same way we did for the output neuron:
$$net_{h1} = w_1 * i_1 + w_2 * i_2 + b_1 * 1$$
$$\frac{\partial net_{h1}}{\partial w_1} = i_1 = 0.05$$
Putting it all together:
$$\frac{\partial E_{total}}{\partial w_1} = \frac{\partial E_{total}}{\partial out_{h1}} * \frac{\partial out_{h1}}{\partial net_{h1}} * \frac{\partial net_{h1}}{\partial w_1}$$
$$\frac{\partial E_{total}}{\partial w_1} = 0.036350306 * 0.241300709 * 0.05 = 0.000438568$$
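The same three-factor product for $w_1$, as a sketch using only numbers from the text:

```python
dE_douth1 = 0.036350306  # dE_total / dout_h1, summed over o1 and o2
out_h1 = 0.593269992
i1 = 0.05

douth1_dneth1 = out_h1 * (1 - out_h1)  # logistic derivative
dneth1_dw1 = i1                         # net_h1 is linear in w1 with coefficient i1
dE_dw1 = dE_douth1 * douth1_dneth1 * dneth1_dw1
print(f"dout_h1/dnet_h1 = {douth1_dneth1:.9f}")
print(f"dE_total/dw1    = {dE_dw1:.9f}")
```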
You might also see this written as:
$$\frac{\partial E_{total}}{\partial w_1} = \left(\sum_{o} \frac{\partial E_{total}}{\partial out_{o}} * \frac{\partial out_{o}}{\partial net_{o}} * \frac{\partial net_{o}}{\partial out_{h1}}\right) * \frac{\partial out_{h1}}{\partial net_{h1}} * \frac{\partial net_{h1}}{\partial w_1}$$
$$\frac{\partial E_{total}}{\partial w_1} = \left(\sum_{o} \delta_{o} * w_{ho}\right) * out_{h1}(1 - out_{h1}) * i_1$$
$$\frac{\partial E_{total}}{\partial w_1} = \delta_{h1} i_1$$
where $w_{ho}$ is the weight connecting h1 to output neuron o, and $\delta_{h1} = \left(\sum_{o} \delta_{o} * w_{ho}\right) * out_{h1}(1 - out_{h1})$ is the node delta of h1.
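The general form suggests a reusable helper that works for any number of output neurons. `output_delta` and `hidden_delta` below are hypothetical functions, not from any library; the example call for h1 assumes w7 = 0.50 for the weight from h1 to o2, which the text does not state.

```python
def output_delta(target, out):
    """delta_o = dE_total/dnet_o for a logistic output neuron under squared error."""
    return -(target - out) * out * (1 - out)

def hidden_delta(out_h, output_deltas, weights_to_outputs):
    """delta_h = (sum over o of delta_o * w_ho) * out_h * (1 - out_h)."""
    back_signal = sum(d * w for d, w in zip(output_deltas, weights_to_outputs))
    return back_signal * out_h * (1 - out_h)

deltas = [output_delta(0.01, 0.75136507), output_delta(0.99, 0.772928465)]
# w5 = 0.40 is from the text; w7 = 0.50 (h1 to o2) is an ASSUMPTION.
weights_from_h1 = [0.40, 0.50]
delta_h1 = hidden_delta(0.593269992, deltas, weights_from_h1)
dE_dw1 = delta_h1 * 0.05  # dE_total/dw1 = delta_h1 * i1
print(f"delta_h1 = {delta_h1:.9f}, dE/dw1 = {dE_dw1:.9f}")
```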
We can now update $w_1$:
$$w_1^{t+1} = w_1^{t} - \eta * \frac{\partial E_{total}}{\partial w_1^{t}} = 0.15 - 0.5 * 0.000438568 = 0.149780716$$
Repeating this for $w_2$, $w_3$ and $w_4$:
$$w_2^{t+1} = 0.19956143$$
$$w_3^{t+1} = 0.24975114$$
$$w_4^{t+1} = 0.29950229$$
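A sketch of all four hidden-layer updates. As the text notes, the hidden deltas use the original output weights, not the updated ones. The values w3 = 0.25, w4 = 0.30, w6 = 0.45, w7 = 0.50, w8 = 0.55 and the wiring are assumptions (they are not in the text); with them the sketch reproduces $w_1$ through $w_4$ above.

```python
eta = 0.5
i1, i2 = 0.05, 0.10
w1, w2, w5 = 0.15, 0.20, 0.40  # from the text
# ASSUMPTION (not in the text): w3, w4, w6, w7, w8 and the wiring
# h1 -> o1 via w5, h2 -> o1 via w6, h1 -> o2 via w7, h2 -> o2 via w8.
w3, w4, w6, w7, w8 = 0.25, 0.30, 0.45, 0.50, 0.55
out_h1, out_h2 = 0.593269992, 0.596884378
out_o1, out_o2 = 0.75136507, 0.772928465
t1, t2 = 0.01, 0.99

delta_o1 = -(t1 - out_o1) * out_o1 * (1 - out_o1)
delta_o2 = -(t2 - out_o2) * out_o2 * (1 - out_o2)

# Hidden deltas use the ORIGINAL output weights (w5..w8).
delta_h1 = (delta_o1 * w5 + delta_o2 * w7) * out_h1 * (1 - out_h1)
delta_h2 = (delta_o1 * w6 + delta_o2 * w8) * out_h2 * (1 - out_h2)

new_w1 = w1 - eta * delta_h1 * i1
new_w2 = w2 - eta * delta_h1 * i2
new_w3 = w3 - eta * delta_h2 * i1
new_w4 = w4 - eta * delta_h2 * i2
print(new_w1, new_w2, new_w3, new_w4)
```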
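Finally, we've updated all of our weights. When we originally fed forward the inputs 0.05 and 0.1, the error of the network was 0.298371109. After this first round of backpropagation, the total error is down to 0.291027924. That might not seem like much, but after repeating this process 10,000 times, the error drops to 0.000035085. At that point, when we feed forward 0.05 and 0.1, the two output neurons produce 0.015912196 (vs. the 0.01 target) and 0.984065734 (vs. the 0.99 target).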
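To tie the steps together, here is a compact end-to-end sketch that repeats the forward and backward passes. It is not the author's repository code, and it rests on these assumptions: w3 = 0.25, w4 = 0.30, w6 = 0.45, w7 = 0.50, w8 = 0.55 and b2 = 0.60 (not given in the text), and the biases b1 and b2 stay fixed because the text only updates the weights. Under those assumptions the error after one step should match the 0.291027924 quoted above; the 10,000-iteration figures depend on the same setup, so treat them as a target rather than a guarantee.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(w, b1, b2, i1, i2):
    """w = [w1..w8]; returns (out_h1, out_h2, out_o1, out_o2)."""
    out_h1 = sigmoid(w[0] * i1 + w[1] * i2 + b1)
    out_h2 = sigmoid(w[2] * i1 + w[3] * i2 + b1)
    out_o1 = sigmoid(w[4] * out_h1 + w[5] * out_h2 + b2)
    out_o2 = sigmoid(w[6] * out_h1 + w[7] * out_h2 + b2)
    return out_h1, out_h2, out_o1, out_o2

def total_error(out_o1, out_o2, t1, t2):
    return 0.5 * (t1 - out_o1) ** 2 + 0.5 * (t2 - out_o2) ** 2

def train_step(w, b1, b2, i1, i2, t1, t2, eta=0.5):
    """One forward + backward pass; returns NEW weights. Biases are not updated."""
    out_h1, out_h2, out_o1, out_o2 = forward(w, b1, b2, i1, i2)
    d_o1 = -(t1 - out_o1) * out_o1 * (1 - out_o1)
    d_o2 = -(t2 - out_o2) * out_o2 * (1 - out_o2)
    # Hidden deltas use the OLD output weights w[4..7].
    d_h1 = (d_o1 * w[4] + d_o2 * w[6]) * out_h1 * (1 - out_h1)
    d_h2 = (d_o1 * w[5] + d_o2 * w[7]) * out_h2 * (1 - out_h2)
    grads = [d_h1 * i1, d_h1 * i2, d_h2 * i1, d_h2 * i2,
             d_o1 * out_h1, d_o1 * out_h2, d_o2 * out_h1, d_o2 * out_h2]
    return [wk - eta * g for wk, g in zip(w, grads)]

# ASSUMPTION: w3, w4, w6, w7, w8 and b2 are not given in the text.
weights = [0.15, 0.20, 0.25, 0.30, 0.40, 0.45, 0.50, 0.55]
b1, b2 = 0.35, 0.60
i1, i2, t1, t2 = 0.05, 0.10, 0.01, 0.99

_, _, o1, o2 = forward(weights, b1, b2, i1, i2)
initial_error = total_error(o1, o2, t1, t2)
weights = train_step(weights, b1, b2, i1, i2, t1, t2)
_, _, o1, o2 = forward(weights, b1, b2, i1, i2)
error_after_one = total_error(o1, o2, t1, t2)
for _ in range(9999):  # 10,000 training steps in total
    weights = train_step(weights, b1, b2, i1, i2, t1, t2)
_, _, final_o1, final_o2 = forward(weights, b1, b2, i1, i2)
final_error = total_error(final_o1, final_o2, t1, t2)
print(f"{initial_error:.9f} -> {error_after_one:.9f} -> {final_error:.9f}")
print(f"outputs: {final_o1:.9f}, {final_o2:.9f}")
```

Keeping the eight weights in a flat list keeps `train_step` short for this tiny network; larger networks are usually written with weight matrices so the same steps apply to layers of any size.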