1 Brief Introduction of Deep Learning
Neural Network
Determining the parameters:
All the weights w and biases b of the neurons are collectively denoted θ.
Determining how the neurons are connected:
Fully Connected Feedforward Network
Expressed in matrix form:
Goodness of function
Use gradient descent to find the θ that minimizes the loss, where the loss is the cross-entropy.
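As a concrete sketch of this step, here is a minimal gradient-descent loop minimizing the cross-entropy loss of a one-parameter logistic model; the toy data, learning rate, and iteration count are all assumptions made for illustration, not values from the notes:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy data (assumed): 4 scalar inputs with binary labels.
X = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([0.0, 0.0, 1.0, 1.0])

w, b, eta = 0.0, 0.0, 0.1          # theta = (w, b), learning rate eta

for _ in range(2000):
    p = sigmoid(w * X + b)          # predicted probabilities
    # Gradient of the mean cross-entropy; the (p - y) factor is what
    # remains after the sigmoid-derivative terms cancel.
    grad_w = np.mean((p - y) * X)
    grad_b = np.mean(p - y)
    w -= eta * grad_w
    b -= eta * grad_b

loss = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
```

After training, the model separates the two classes (points below ~1.5 get probability < 0.5, points above get > 0.5).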
Backpropagation
Assume there is a single hidden layer and that the activation function is the sigmoid.
Once forward propagation produces y_hat, we can measure the error. To reduce it, we update the parameters via backpropagation; the updates of w5 (an output-layer weight) and w1 (a hidden-layer weight) serve as examples:
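The forward pass for this example can be sketched in code. The notes only quote i1 = 0.05, w1 = 0.15, w5 = 0.4, and the target 0.01; the remaining inputs, weights, and biases below are assumptions, chosen so that the intermediate values match the ones used later (h1 ≈ 0.5932, out ≈ 0.751):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Given in the text: i1 = 0.05, w1 = 0.15, w5 = 0.40, y_true = 0.01.
# Everything else below is an assumed value for illustration.
i1, i2 = 0.05, 0.10
w1, w2, w3, w4 = 0.15, 0.20, 0.25, 0.30   # input -> hidden weights
b1 = 0.35                                  # hidden-layer bias
w5, w6, w7, w8 = 0.40, 0.45, 0.50, 0.55   # hidden -> output weights
b2 = 0.60                                  # output-layer bias

# Forward pass through the hidden layer.
net_h1 = w1 * i1 + w2 * i2 + b1
net_h2 = w3 * i1 + w4 * i2 + b1
h1, h2 = sigmoid(net_h1), sigmoid(net_h2)

# Forward pass through output neuron o1.
net_o1 = w5 * h1 + w6 * h2 + b2
out_o1 = sigmoid(net_o1)

print(h1, out_o1)   # h1 ≈ 0.5933, out_o1 ≈ 0.7514 (the text truncates these)
```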
$$E=\frac{1}{2}(y_{true}-y_{pre})^2$$
$$\frac{\partial E}{\partial w_5}=\frac{\partial E}{\partial out}\frac{\partial out}{\partial net}\frac{\partial net}{\partial w_5}$$
$$\frac{\partial E}{\partial out}=-(y_{true}-y_{pre})=-(0.01-0.751)=0.741$$
$$\because out=\frac{1}{1+e^{-net}}$$
$$\frac{\partial out}{\partial net}=out(1-out)=0.751(1-0.751)=0.1868$$
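The sigmoid-derivative identity out(1 − out) can be verified numerically against a finite difference; the pre-activation value net ≈ 1.1059 is an assumption picked so that out ≈ 0.751 as in the text:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

net = 1.1059                       # assumed so that out ≈ 0.751
out = sigmoid(net)

# Analytic derivative via the identity sigma'(net) = out * (1 - out).
analytic = out * (1 - out)

# Central finite-difference approximation of the same derivative.
eps = 1e-6
numeric = (sigmoid(net + eps) - sigmoid(net - eps)) / (2 * eps)

print(analytic, numeric)           # both ≈ 0.1868
```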
$$\because net=w_5h_1+w_6h_2$$
$$\frac{\partial net}{\partial w_5}=h_1=0.5932$$
$$\therefore \frac{\partial E}{\partial w_5}=0.741\times0.1868\times0.5932=0.0821$$
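The three chain-rule factors can be multiplied out in code using the numbers from the text; the last digit differs slightly from 0.0821 because out = 0.751 is itself a rounded value:

```python
# Numbers quoted in the text.
y_true, out, h1 = 0.01, 0.751, 0.5932

dE_dout = -(y_true - out)      # 0.741
dout_dnet = out * (1 - out)    # ≈ 0.1870
dnet_dw5 = h1                  # 0.5932

# Chain rule: dE/dw5 = dE/dout * dout/dnet * dnet/dw5.
dE_dw5 = dE_dout * dout_dnet * dnet_dw5
print(dE_dw5)                  # ≈ 0.082 (text: 0.0821, rounding noise)
```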
Looking back at the formula:
$$\frac{\partial E}{\partial w_5}=-(y_{true}-y_{pre})\,out(1-out)\,h_1$$
Let $\delta_{ol}$ denote the output-layer error:
$$\delta_{ol}=\frac{\partial E_{total}}{\partial out_{ol}}\frac{\partial out_{ol}}{\partial net_{ol}}$$
so the partial derivative of the total error $E_{total}$ with respect to w5 can be written as:
$$\frac{\partial E_{total}}{\partial w_5}=\delta_{ol}\,out_{h1}$$
Finally, update w5:
$$w_5'=w_5-\eta \frac{\partial E_{total}}{\partial w_5}=0.4-0.5\times0.0821=0.359$$
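The update rule above is a single line of code; with the text's learning rate η = 0.5 the new weight works out to about 0.359:

```python
# Gradient-descent update for w5, using the numbers from the text.
eta = 0.5            # learning rate
w5 = 0.40
dE_dw5 = 0.0821      # gradient computed above

w5_new = w5 - eta * dE_dw5
print(w5_new)        # ≈ 0.359
```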
Updating the hidden-layer parameter w1:
$$\frac{\partial E_{total}}{\partial w_1}=\left(\frac{\partial E_{o1}}{\partial out_{o1}}\frac{\partial out_{o1}}{\partial net_{o1}}\frac{\partial net_{o1}}{\partial h_1}+\frac{\partial E_{o2}}{\partial out_{o2}}\frac{\partial out_{o2}}{\partial net_{o2}}\frac{\partial net_{o2}}{\partial h_1}\right)\frac{\partial h_1}{\partial net_{h1}}\frac{\partial net_{h1}}{\partial w_1}$$
Note that both output errors appear: w1 only affects h1, but h1 feeds both output neurons, so the gradient sums the paths through o1 and o2.
$$\because net_{o1}=w_5h_1+w_6h_2+b$$
$$\therefore \frac{\partial net_{o1}}{\partial h_1}=w_5=0.4$$
$$\therefore \frac{\partial E_{o1}}{\partial h_1}=0.741\times0.1868\times0.4=0.055$$
Similarly:
$$\frac{\partial E_{o2}}{\partial h_1}=\frac{\partial E_{o2}}{\partial out_{o2}}\frac{\partial out_{o2}}{\partial net_{o2}}\frac{\partial net_{o2}}{\partial h_1}=-0.019$$
Since $h_1=\sigma(net_{h1})$, its derivative uses the sigmoid output, not the pre-activation:
$$\frac{\partial h_1}{\partial net_{h1}}=h_1(1-h_1)=0.5932(1-0.5932)=0.2413$$
$$\frac{\partial net_{h1}}{\partial w_1}=i_1=0.05$$
$$\therefore \frac{\partial E_{total}}{\partial w_1}=(0.055-0.019)\times0.2413\times0.05=0.0004$$
Finally, update the weight w1:
$$w_1'=w_1-\eta \frac{\partial E_{total}}{\partial w_1}=0.15-0.5\times0.0004=0.1498$$
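The whole hidden-layer step can be checked end to end in code. All numbers are the ones quoted in the text; the value −0.019 for ∂E_o2/∂h1 is taken as given since its factors are not worked out in the notes:

```python
# Gradient of the total error with respect to w1, chaining through
# both output neurons (h1 feeds both o1 and o2).
dEo1_dh1 = 0.741 * 0.1868 * 0.4      # path through o1, ≈ 0.055
dEo2_dh1 = -0.019                    # path through o2 (given in the text)

h1, i1 = 0.5932, 0.05
dh1_dnet = h1 * (1 - h1)             # sigmoid derivative, ≈ 0.2413
dnet_dw1 = i1                        # net_h1 is linear in w1 with slope i1

dE_dw1 = (dEo1_dh1 + dEo2_dh1) * dh1_dnet * dnet_dw1
print(dE_dw1)                        # ≈ 0.0004

# Gradient-descent update for w1.
eta, w1 = 0.5, 0.15
w1_new = w1 - eta * dE_dw1
print(w1_new)                        # ≈ 0.1498
```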