Backpropagation (BP) in Machine Learning

Introduction

This is the first article I have written on machine learning; more summaries of machine learning and deep learning will follow, so stay tuned.

Given the function $f(x) = 3x^2+4x+5$ with unknown parameter $x$, we want to find the value of $x$ at which $f(x)$ attains its minimum. The first approach that comes to mind is to take the derivative and set it equal to $0$:
$$f^\prime(x)=6x+4=0, \quad x=-\frac{2}{3}$$
That is, $f(x)$ attains its minimum at $x=-\frac{2}{3}$, where $f(-\frac{2}{3})=\frac{11}{3}\approx 3.67$.

Alternatively, we can use gradient descent. Assign $x$ an initial value (say $x=10$); we also know $f^\prime(x)=6x+4$. The essence of gradient descent is to subtract from the current value of $x$ a step proportional to the slope at that point, so that $f(x)$ moves toward its minimum. Let the learning rate be $\eta=0.1$; it controls how large each step is. For example:
$$f^\prime(10)=6\times10+4=64, \quad x^\prime=x-\eta f^\prime(10)=10-0.1\times64=3.6$$
Repeating this update, $x$ approaches $-\frac{2}{3}$, as in the sketch below.
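
To make this concrete, here is a minimal Python sketch of this one-dimensional gradient descent. The function, derivative, starting point, and learning rate come from the example above; the 100-iteration cap is an assumed illustration value.

# 1-D gradient descent on f(x) = 3x^2 + 4x + 5
def f_prime(x):
    return 6 * x + 4

x = 10.0    # initial value from the example
eta = 0.1   # learning rate
for _ in range(100):  # iteration count is an assumed illustration value
    x = x - eta * f_prime(x)
print(x)    # approaches -2/3 ≈ -0.6667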

Having understood gradient descent, we can use it to solve the backpropagation problem in neural networks. We need to define a loss function, cost_function, which plays the role of $f(x)$ above.
The network structure is shown in the figure below:
[Figure: two-layer neural network structure]
It contains one input layer (inputs $x_1$ and $x_2$), one hidden layer (input $in^2$, output $out^2$), and one output layer (input $in^3$, output $out^3$). We use the output of the output layer, $out^3$, together with the label $\hat{y}$ to construct the loss function cost_function, whose parameters are the weights $\omega$ and biases $b$. The backpropagation algorithm then finds the values of $\omega$ and $b$ at which cost_function attains its minimum.

With the objective clear, let us look at all the parameters the network involves:
[Figure: network parameters]

Forward Propagation

$$\begin{bmatrix} X_1 & X_2 \end{bmatrix} \cdot \begin{bmatrix} W_{11}^{2} & W_{12}^{2} & W_{13}^{2} \\ W_{21}^{2} & W_{22}^{2} & W_{23}^{2} \end{bmatrix} + \begin{bmatrix} b_{1}^{2} & b_{2}^{2} & b_{3}^{2} \end{bmatrix} \rightarrow \begin{bmatrix} in_{1}^{2} & in_{2}^{2} & in_{3}^{2} \end{bmatrix} \xrightarrow{sigmoid} \begin{bmatrix} out_{1}^{2} & out_{2}^{2} & out_{3}^{2} \end{bmatrix}$$

$$\begin{bmatrix} out_{1}^{2} & out_{2}^{2} & out_{3}^{2} \end{bmatrix} \cdot \begin{bmatrix} W_{11}^{3} & W_{12}^{3} \\ W_{21}^{3} & W_{22}^{3} \\ W_{31}^{3} & W_{32}^{3} \end{bmatrix} + \begin{bmatrix} b_{1}^{3} & b_{2}^{3} \end{bmatrix} \rightarrow \begin{bmatrix} in_{1}^{3} & in_{2}^{3} \end{bmatrix} \xrightarrow{sigmoid} \begin{bmatrix} out_{1}^{3} & out_{2}^{3} \end{bmatrix}$$

Computing $in_{1}^{2}$, $in_{2}^{2}$, $in_{3}^{2}$, then $in_{1}^{3}$, $in_{2}^{3}$, and cost_function component-wise:

$$\begin{aligned}
in_{1}^{2}&=W_{11}^{2} X_{1}+W_{21}^{2} X_{2} + b_{1}^{2}, & out_{1}^{2}&=sigmoid(in_{1}^{2}) \\
in_{2}^{2}&=W_{12}^{2} X_{1}+W_{22}^{2} X_{2} + b_{2}^{2}, & out_{2}^{2}&=sigmoid(in_{2}^{2}) \\
in_{3}^{2}&=W_{13}^{2} X_{1}+W_{23}^{2} X_{2} + b_{3}^{2}, & out_{3}^{2}&=sigmoid(in_{3}^{2}) \\
in_{1}^{3}&=W_{11}^{3} out_{1}^{2} + W_{21}^{3} out_{2}^{2} + W_{31}^{3} out_{3}^{2} + b_{1}^{3}, & out_{1}^{3}&=sigmoid(in_{1}^{3}) \\
in_{2}^{3}&=W_{12}^{3} out_{1}^{2} + W_{22}^{3} out_{2}^{2} + W_{32}^{3} out_{3}^{2} + b_{2}^{3}, & out_{2}^{3}&=sigmoid(in_{2}^{3}) \\
cost\_function &= \frac{1}{2}\left[(out_{1}^{3}-y_{1})^{2}+(out_{2}^{3}-y_{2})^{2}\right]
\end{aligned}$$

The corresponding code:

import numpy as np

m = 100  # number of training samples (an assumed value for illustration)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost_function(out, Y):
    # 1/2 * squared error summed over the two outputs; averaging over the
    # m samples is an added convention (the formula above is per-sample)
    return 0.5 * np.sum((out - Y) ** 2) / m

# training samples: 2 inputs and 2 outputs
X = np.random.rand(m, 2)
Y = np.random.rand(m, 2)

# layer 2 (hidden layer)
W2 = np.ones((2, 3))
b2 = np.ones((1, 3))
in2 = np.dot(X, W2) + b2
out2 = sigmoid(in2)

# layer 3 (output layer)
W3 = np.ones((3, 2))
b3 = np.ones((1, 2))
in3 = np.dot(out2, W3) + b3
out3 = sigmoid(in3)

# initial cost
cost = cost_function(out3, Y)
print("start:", cost)

Backward Propagation

Backpropagation is essentially the computation of the partial derivatives of cost_function with respect to each $\omega$ and $b$. Before we can obtain these, we need $\dfrac{\partial C}{\partial in_{1}^{3}}$, $\dfrac{\partial C}{\partial in_{2}^{3}}$, as well as $\dfrac{\partial C}{\partial in_{1}^{2}}$, $\dfrac{\partial C}{\partial in_{2}^{2}}$, $\dfrac{\partial C}{\partial in_{3}^{2}}$; once these values are known, the partial derivative of the loss with respect to any $\omega$ or $b$ follows directly.

$$\dfrac{\partial C}{\partial in_{1}^{3}}=\dfrac{\partial C}{\partial out_{1}^{3}}\,\dfrac{\partial out_{1}^{3}}{\partial in_{1}^{3}}=(out_{1}^{3}-y_{1})\,\frac{e^{-in_{1}^{3}}}{(1+e^{-in_{1}^{3}})^{2}}=(out_{1}^{3}-y_{1})\,\frac{1}{1+e^{-in_{1}^{3}}}\left(1-\frac{1}{1+e^{-in_{1}^{3}}}\right)=(out_{1}^{3}-y_{1})\,out_{1}^{3}\,(1-out_{1}^{3})$$

In the same way we obtain the value of $\dfrac{\partial C}{\partial in_{2}^{3}}$.
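
In code, these two output-layer derivatives can be computed in a single vectorized step, using the simplification $sigmoid^\prime(in^3)=out^3(1-out^3)$ from above. A minimal sketch, continuing the variables out3 and Y from the forward-pass code (the name delta3 is my own label, not from the original):

# dC/d(in3) for both output units, all m samples at once
delta3 = (out3 - Y) * out3 * (1 - out3)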

$$\dfrac{\partial C}{\partial in_{1}^{2}}=\dfrac{\partial C}{\partial out_{1}^{2}}\,\dfrac{\partial out_{1}^{2}}{\partial in_{1}^{2}}=\dfrac{\partial C}{\partial in_{1}^{3}}\,\dfrac{\partial in_{1}^{3}}{\partial out_{1}^{2}}\,\dfrac{\partial out_{1}^{2}}{\partial in_{1}^{2}}+\dfrac{\partial C}{\partial in_{2}^{3}}\,\dfrac{\partial in_{2}^{3}}{\partial out_{1}^{2}}\,\dfrac{\partial out_{1}^{2}}{\partial in_{1}^{2}}$$
where $\dfrac{\partial C}{\partial in_{1}^{3}}$ and $\dfrac{\partial C}{\partial in_{2}^{3}}$ are already known, and from the forward-pass equations $\dfrac{\partial in_{1}^{3}}{\partial out_{1}^{2}}=W_{11}^{3}$, $\dfrac{\partial in_{2}^{3}}{\partial out_{1}^{2}}=W_{12}^{3}$, and $\dfrac{\partial out_{1}^{2}}{\partial in_{1}^{2}}=out_{1}^{2}(1-out_{1}^{2})$; $\dfrac{\partial C}{\partial in_{2}^{2}}$ and $\dfrac{\partial C}{\partial in_{3}^{2}}$ follow in the same way.
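
Putting everything together, here is a sketch of one complete backward pass and gradient-descent update, continuing the variables from the code above. The original stops before the update step, so the learning rate eta, the averaging over the m samples, and the names delta2, dW2, etc. are my assumptions:

eta = 0.1  # learning rate (assumed)

# dC/d(in2): the chain rule above, vectorized; np.dot(delta3, W3.T)
# sums the two paths through in3_1 and in3_2 for each hidden unit
delta2 = np.dot(delta3, W3.T) * out2 * (1 - out2)

# partial derivatives of the cost w.r.t. every W and b
dW3 = np.dot(out2.T, delta3) / m
db3 = np.sum(delta3, axis=0, keepdims=True) / m
dW2 = np.dot(X.T, delta2) / m
db2 = np.sum(delta2, axis=0, keepdims=True) / m

# one gradient-descent step, exactly as in the 1-D example
W3 -= eta * dW3
b3 -= eta * db3
W2 -= eta * dW2
b2 -= eta * db2

# recompute the forward pass to see the cost decrease
out2 = sigmoid(np.dot(X, W2) + b2)
out3 = sigmoid(np.dot(out2, W3) + b3)
print("after one step:", cost_function(out3, Y))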
