Introduction
This is my first article on machine learning; more summaries of machine learning and deep learning will follow, so stay tuned.
Given the function

$$f(x) = 3x^2 + 4x + 5$$

with the unknown parameter $x$: for what value of $x$ does $f(x)$ reach its minimum? The first idea that comes to mind is to take the derivative and set it to $0$:

$$f'(x) = 6x + 4 = 0, \quad x = -\frac{2}{3}$$

That is, $f(x)$ reaches its minimum at $x = -\frac{2}{3}$, where $f\left(-\frac{2}{3}\right) = 3 \cdot \frac{4}{9} - \frac{8}{3} + 5 = \frac{11}{3} \approx 3.67$.
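As a quick sanity check (a throwaway snippet, not part of the network code below), we can evaluate $f$ at the analytic minimizer:

```python
# evaluate f(x) = 3x^2 + 4x + 5 at the analytic minimizer x = -2/3
f = lambda x: 3 * x**2 + 4 * x + 5
print(f(-2 / 3))  # 3.666... = 11/3
```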
Alternatively, we can use gradient descent. Give $x$ an initial value (say $x = 10$). We already know $f'(x) = 6x + 4$. The essence of gradient descent is to subtract from the current coordinate of $x$ the slope at that point, scaled by a step size, so that $f(x)$ keeps moving toward its minimum. Let the learning rate be $\eta = 0.1$; it controls how far $x$ moves on each step. For example,

$$f'(10) = 6 \times 10 + 4 = 64, \qquad x' = x - \eta f'(10) = 10 - 0.1 \times 64 = 3.6$$

The next step gives $x'' = 3.6 - 0.1 \times f'(3.6) = 3.6 - 0.1 \times 25.6 = 1.04$, and so on, until $x$ converges to $-\frac{2}{3}$.
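The iteration above is easy to code up. Here is a minimal sketch in plain Python (the name f_prime and the step count are my own choices):

```python
# gradient descent on f(x) = 3x^2 + 4x + 5, starting from x = 10
def f_prime(x):
    return 6 * x + 4

x = 10.0    # initial value
eta = 0.1   # learning rate
for _ in range(50):
    x -= eta * f_prime(x)   # step against the slope
print(x)    # ~ -0.6667, i.e. x has converged to -2/3
```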
Now that we understand gradient descent, we can use it to solve the backpropagation problem in neural networks. We need to define a loss function, cost_function, which plays the role of the $f(x)$ above.
The structure of the neural network is shown in the figure below:
It contains one input layer (with inputs $x_1$ and $x_2$), one hidden layer (with input $in^2$ and output $out^2$), and one output layer (with input $in^3$ and output $out^3$); the superscript is the layer index. We take the output layer's output $out^3$ and the label $\hat{y}$ to build the loss function cost_function, whose parameters are the weights $\omega$ and biases $b$. Backpropagation then tells us which values of $\omega$ and $b$ minimize cost_function.
With the goal clear, let us look at all the parameters the network involves:
Forward propagation
$$\begin{bmatrix} X_1 & X_2 \end{bmatrix} \cdot \begin{bmatrix} W_{11}^{2} & W_{12}^{2} & W_{13}^{2} \\ W_{21}^{2} & W_{22}^{2} & W_{23}^{2} \end{bmatrix} + \begin{bmatrix} b_{1}^{2} & b_{2}^{2} & b_{3}^{2} \end{bmatrix} \rightarrow \begin{bmatrix} in_{1}^{2} & in_{2}^{2} & in_{3}^{2} \end{bmatrix} \xrightarrow{sigmoid} \begin{bmatrix} out_{1}^{2} & out_{2}^{2} & out_{3}^{2} \end{bmatrix}$$
$$\begin{bmatrix} out_{1}^{2} & out_{2}^{2} & out_{3}^{2} \end{bmatrix} \cdot \begin{bmatrix} W_{11}^{3} & W_{12}^{3} \\ W_{21}^{3} & W_{22}^{3} \\ W_{31}^{3} & W_{32}^{3} \end{bmatrix} + \begin{bmatrix} b_{1}^{3} & b_{2}^{3} \end{bmatrix} \rightarrow \begin{bmatrix} in_{1}^{3} & in_{2}^{3} \end{bmatrix} \xrightarrow{sigmoid} \begin{bmatrix} out_{1}^{3} & out_{2}^{3} \end{bmatrix}$$
Writing this out element by element, we compute $in_{1}^{2}$, $in_{2}^{2}$, $in_{3}^{2}$, then $in_{1}^{3}$, $in_{2}^{3}$, and finally the cost_function:
$$\begin{aligned}
in_{1}^{2} &= W_{11}^{2} X_{1} + W_{21}^{2} X_{2} + b_{1}^{2}, & out_{1}^{2} &= sigmoid(in_{1}^{2}) \\
in_{2}^{2} &= W_{12}^{2} X_{1} + W_{22}^{2} X_{2} + b_{2}^{2}, & out_{2}^{2} &= sigmoid(in_{2}^{2}) \\
in_{3}^{2} &= W_{13}^{2} X_{1} + W_{23}^{2} X_{2} + b_{3}^{2}, & out_{3}^{2} &= sigmoid(in_{3}^{2}) \\
in_{1}^{3} &= W_{11}^{3} out_{1}^{2} + W_{21}^{3} out_{2}^{2} + W_{31}^{3} out_{3}^{2} + b_{1}^{3}, & out_{1}^{3} &= sigmoid(in_{1}^{3}) \\
in_{2}^{3} &= W_{12}^{3} out_{1}^{2} + W_{22}^{3} out_{2}^{2} + W_{32}^{3} out_{3}^{2} + b_{2}^{3}, & out_{2}^{3} &= sigmoid(in_{2}^{3}) \\
cost\_function &= \frac{1}{2}\left[(out_{1}^{3}-y_{1})^{2}+(out_{2}^{3}-y_{2})^{2}\right]
\end{aligned}$$
The corresponding code:
```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost_function(out, y):
    # the 1/2 * squared-error loss from above, summed over all samples
    return 0.5 * np.sum((out - y) ** 2)

m = 100  # number of training samples (any value works for this demo)

# training samples: 2 inputs and 2 outputs
X = np.random.rand(m, 2)
Y = np.random.rand(m, 2)
# layer 2
W2 = np.ones((2, 3))
b2 = np.ones((1, 3))
in2 = np.dot(X, W2) + b2
out2 = sigmoid(in2)
# layer 3
W3 = np.ones((3, 2))
b3 = np.ones((1, 2))
in3 = np.dot(out2, W3) + b3
out3 = sigmoid(in3)
# initial cost
cost = cost_function(out3, Y)
print("start:", cost)
```
Backpropagation
Backpropagation boils down to computing the partial derivatives of cost_function (written $C$ below) with respect to every $\omega$ and $b$. To get them, we first need $\dfrac{\partial C}{\partial in_{1}^{3}}$ and $\dfrac{\partial C}{\partial in_{2}^{3}}$, as well as $\dfrac{\partial C}{\partial in_{1}^{2}}$, $\dfrac{\partial C}{\partial in_{2}^{2}}$ and $\dfrac{\partial C}{\partial in_{3}^{2}}$. Once these values are known, the partial derivative of the loss with respect to any $\omega$ or $b$ follows.
$$\dfrac{\partial C}{\partial in_{1}^{3}}=\dfrac{\partial C}{\partial out_{1}^{3}} \dfrac{\partial out_{1}^{3}}{\partial in_{1}^{3}}=(out_{1}^{3}-y_{1})\frac{e^{-in_{1}^{3}}}{(1+e^{-in_{1}^{3}})^{2}} \\ =(out_{1}^{3}-y_{1})\frac{1}{1+e^{-in_{1}^{3}}}\left(1-\frac{1}{1+e^{-in_{1}^{3}}}\right)=(out_{1}^{3}-y_{1})\,out_{1}^{3}\left(1-out_{1}^{3}\right)$$
The value of $\dfrac{\partial C}{\partial in_{2}^{3}}$ follows in exactly the same way.
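In NumPy, both output-layer partials can be computed in one vectorized step. This is a sketch that continues the forward-pass code above; the name delta3 is my own convention:

```python
# dC/d(in^3) for every sample: (out3 - Y) * sigmoid'(in3),
# using sigmoid'(in3) = out3 * (1 - out3) from the derivation above;
# column j of delta3 holds dC/d(in_j^3)
delta3 = (out3 - Y) * out3 * (1 - out3)
```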
$$\dfrac{\partial C}{\partial in_{1}^{2}}= \dfrac{\partial C}{\partial out_{1}^{2}} \dfrac{\partial out_{1}^{2}}{\partial in_{1}^{2}}= \dfrac{\partial C}{\partial in_{1}^{3}} \dfrac{\partial in_{1}^{3}}{\partial out_{1}^{2}} \dfrac{\partial out_{1}^{2}}{\partial in_{1}^{2}} + \dfrac{\partial C}{\partial in_{2}^{3}} \dfrac{\partial in_{2}^{3}}{\partial out_{1}^{2}} \dfrac{\partial out_{1}^{2}}{\partial in_{1}^{2}}$$

Here $\dfrac{\partial C}{\partial in_{1}^{3}}$ and $\dfrac{\partial C}{\partial in_{2}^{3}}$ are already known, and from the forward equations $\dfrac{\partial in_{1}^{3}}{\partial out_{1}^{2}} = W_{11}^{3}$ and $\dfrac{\partial in_{2}^{3}}{\partial out_{1}^{2}} = W_{12}^{3}$; $\dfrac{\partial C}{\partial in_{2}^{2}}$ and $\dfrac{\partial C}{\partial in_{3}^{2}}$ are obtained the same way.
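To finish the sketch, here are the hidden-layer partials and one gradient-descent update on all the parameters, again as a hedged continuation of the code above (delta2, dW2 and friends are names I introduce; the update rule is the plain step from the 1-D example):

```python
eta = 0.1  # learning rate, as in the 1-D example

# dC/d(in^2): push the output deltas back through W3,
# then multiply by sigmoid'(in2) = out2 * (1 - out2)
delta2 = np.dot(delta3, W3.T) * out2 * (1 - out2)

# partials w.r.t. every weight and bias, summed over the m samples
dW3 = np.dot(out2.T, delta3)                 # shape (3, 2), matches W3
db3 = np.sum(delta3, axis=0, keepdims=True)  # shape (1, 2), matches b3
dW2 = np.dot(X.T, delta2)                    # shape (2, 3), matches W2
db2 = np.sum(delta2, axis=0, keepdims=True)  # shape (1, 3), matches b2

# one gradient-descent step on all parameters; repeating the forward
# and backward passes drives cost_function toward its minimum
W3 -= eta * dW3
b3 -= eta * db3
W2 -= eta * dW2
b2 -= eta * db2
```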