A Worked Example of the BP Algorithm with Code Implementation

1. Computational Graph and Parameter Setup

1.1 Computational Graph

(Figure: computational graph of the two-layer network)

Correction (2022-10-17):
Following other references, the labels $W_{12}$ and $W_{21}$ in the figure should swap positions; the trailing subscript $n$ uniformly denotes the weight on the output of node $X_n$. Only with this convention is the later matrix multiplication with $W_0$ consistent.
Thanks to the commenter who pointed this out!

- Bias terms are omitted for simplicity.
- The hidden layer is fully connected to the input; applying a sigmoid to the pre-activations $h$ gives the hidden outputs, denoted $Z$, and the output logit $Z_y$ is passed through one more sigmoid to produce $\hat{Y}$.
- The loss function is MSE.
- The derivative of $\sigma(\cdot)$ is $\sigma'(x)=\sigma(x)(1-\sigma(x))$ (checked numerically in the sketch below).
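As a quick check of this identity, here is a minimal sketch (the helper names `sigmoid` and `sigmoid_prime` are mine, not from the post's code):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_prime(x):
    # the identity: sigma'(x) = sigma(x) * (1 - sigma(x))
    s = sigmoid(x)
    return s * (1.0 - s)

# compare against a centered finite difference at a sample point
x, eps = 0.755, 1e-6
numeric = (sigmoid(x + eps) - sigmoid(x - eps)) / (2 * eps)
print(sigmoid_prime(x), numeric)  # both approximately 0.2175
```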

1.2 Parameter Setup

$X = [0.35, 0.9]^T$

$y_{true} = 0.5$

$$W_0=\begin{bmatrix} w_{11} & w_{12} \\ w_{21} & w_{22} \end{bmatrix}=\begin{bmatrix} 0.1 & 0.8 \\ 0.4 & 0.6 \end{bmatrix}$$

$W_1 = [w_{h1}, w_{h2}] = [0.3, 0.9]$

Learning rate (step size) $\alpha = 0.01$

2. Forward Pass

1. $h = W_0 \cdot X = [0.755, 0.68]^T$
2. $Z = \sigma(h) = [0.680, 0.664]^T$
3. $Z_y = W_1 \cdot Z = 0.8014$
4. $\hat{Y} = \sigma(Z_y) = 0.6903$
5. $L = \frac{1}{2}(Y_{true}-\hat{Y})^2 = \frac{1}{2}(0.5-0.6903)^2 = 0.0181$ (all five values are reproduced in the sketch below)
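The five steps can be reproduced in a few lines of NumPy. This is a sketch using the parameter values from Section 1.2; the variable names are mine:

```python
import numpy as np

sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

X = np.array([[0.35], [0.9]])              # input column vector
y_true = 0.5                               # ground-truth label
W0 = np.array([[0.1, 0.8], [0.4, 0.6]])    # input-to-hidden weights
W1 = np.array([[0.3, 0.9]])                # hidden-to-output weights

h = W0 @ X                        # [[0.755], [0.68 ]]
Z = sigmoid(h)                    # [[0.680], [0.664]]
Z_y = W1 @ Z                      # [[0.8014]]
Y_hat = sigmoid(Z_y)              # [[0.6903]]
L = 0.5 * (y_true - Y_hat) ** 2   # [[0.0181]]
print(h.ravel(), Z.ravel(), Z_y.item(), Y_hat.item(), L.item())
```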

The core idea of neural network training:

Compute the loss $\rightarrow$ adjust the parameters $W_0, W_1$ $\rightarrow$ move along the negative gradient using gradient descent (GD) or a related method.

3. Backpropagation

第一步预想求得 ∂ L ∂ W h 1 \frac{\partial L}{\partial W_{h1}} Wh1L,根据路径回溯可以看到:

$$\begin{cases} L=\frac{1}{2}(y-\hat{y})^2 \\ \hat{y}=\sigma(Z_y) \\ Z_y=W_{h1}Z_1+W_{h2}Z_2 \end{cases}$$

Applying the chain rule:

$$\begin{aligned} \frac{\partial L}{\partial W_{h1}} &= \frac{\partial L}{\partial \hat{y}} \cdot \frac{\partial \hat{y}}{\partial Z_y} \cdot \frac{\partial Z_y}{\partial W_{h1}} \\ &= (\hat{Y}-Y_{true}) \cdot \sigma(Z_y)(1-\sigma(Z_y)) \cdot Z_1 \\ &= (0.6903-0.5) \times 0.69 \times (1-0.69) \times 0.68 \\ &= 0.02768 \end{aligned}$$
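Numerically, continuing from the forward-pass sketch above (variable names are mine):

```python
# chain rule for dL/dW_h1, factor by factor
dL_dyhat = Y_hat.item() - y_true               # dL/dy_hat = Y_hat - Y_true ~ 0.1903
dyhat_dZy = Y_hat.item() * (1 - Y_hat.item())  # sigma'(Z_y) ~ 0.2138
dZy_dWh1 = Z[0].item()                         # dZ_y/dW_h1 = Z_1 ~ 0.680
grad_Wh1 = dL_dyhat * dyhat_dZy * dZy_dWh1
print(grad_Wh1)  # ~ 0.02768
```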
Step 2: tracing back one layer further, to obtain $\frac{\partial L}{\partial W_{11}}$, follow the path:

$$\begin{cases} \cdots \\ Z_1=\sigma(h_1) \\ h_1=W_{11}X_1+W_{12}X_2 \end{cases}$$

Applying the chain rule again:

$$\begin{aligned} \frac{\partial L}{\partial W_{11}} &= \frac{\partial L}{\partial \hat{y}} \cdot \frac{\partial \hat{y}}{\partial Z_y} \cdot \frac{\partial Z_y}{\partial Z_1} \cdot \frac{\partial Z_1}{\partial h_1} \cdot \frac{\partial h_1}{\partial W_{11}} \\ &= (\hat{Y}-Y_{true}) \cdot \sigma(Z_y)(1-\sigma(Z_y)) \cdot W_{h1} \cdot \sigma(h_1)(1-\sigma(h_1)) \cdot X_1 \\ &= (0.6903-0.5) \times 0.69 \times (1-0.69) \times 0.3 \times 0.68 \times (1-0.68) \times 0.35 \\ &= 0.00093 \end{aligned}$$
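The same calculation one layer deeper, reusing the factors from the previous sketch:

```python
# chain rule for dL/dW_11: two extra factors compared with dL/dW_h1
z1 = Z[0].item()                      # Z_1 = sigma(h_1) ~ 0.680
grad_W11 = (dL_dyhat * dyhat_dZy      # dL/dZ_y ~ 0.0407
            * W1[0, 0]                # dZ_y/dZ_1 = W_h1 = 0.3
            * z1 * (1 - z1)           # dZ_1/dh_1 = sigma'(h_1) ~ 0.2176
            * X[0].item())            # dh_1/dW_11 = X_1 = 0.35
print(grad_W11)  # ~ 0.00093
```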

Similarly,

$$\nabla L_{W_0}=\begin{bmatrix}\frac{\partial L}{\partial W_{11}} & \frac{\partial L}{\partial W_{12}} \\ \frac{\partial L}{\partial W_{21}} & \frac{\partial L}{\partial W_{22}}\end{bmatrix}=\begin{bmatrix}0.00093 & 0.00239 \\ 0.00286 & 0.00736\end{bmatrix}$$

$$\nabla L_{W_1}=\left[\frac{\partial L}{\partial W_{h1}}, \frac{\partial L}{\partial W_{h2}}\right]=[0.02768, 0.02703]$$
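Both gradient matrices can be computed at once in vectorized form. This sketch continues from the snippets above; `delta_y` and `delta_h` are my names for the error signals:

```python
# vectorized backward pass: error signals, then outer products with the inputs
delta_y = (Y_hat - y_true) * Y_hat * (1 - Y_hat)  # (1,1) output error signal
grad_W1 = delta_y @ Z.T                           # (1,2) ~ [[0.0277, 0.0270]]
delta_h = (W1.T @ delta_y) * Z * (1 - Z)          # (2,1) hidden error signal
grad_W0 = delta_h @ X.T                           # (2,2) matches the matrix above
print(grad_W0)
print(grad_W1)
```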

Combining this with gradient descent, e.g. $W_{11}' = W_{11} - \alpha \cdot \frac{\partial L}{\partial W_{11}} = 0.1 - 0.01 \times 0.00093 = 0.099991$, the updated weight matrices are

$$W_0'=W_0-\alpha\nabla L_{W_0}=\begin{bmatrix}0.099991 & 0.799976 \\ 0.399971 & 0.599926\end{bmatrix}$$

$$W_1'=W_1-\alpha\nabla L_{W_1}=[0.299972, 0.899973]$$
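In code, the update is one line per weight matrix (continuing the sketch, with $\alpha = 0.01$ as above):

```python
alpha = 0.01  # learning rate from Section 1.2
W0_new = W0 - alpha * grad_W0   # ~ [[0.099991, 0.799976], [0.399971, 0.599926]]
W1_new = W1 - alpha * grad_W1   # ~ [[0.299972, 0.899973]]
```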

4. Hand-Coding It with NumPy

```python
import numpy as np
import matplotlib.pyplot as plt

def sigmoid_derive(x, derive=False):
    # Returns sigmoid(x); if derive=True, x is assumed to already be a
    # sigmoid output, so sigma' = x * (1 - x) is returned instead.
    if derive:
        return x * (1 - x)
    else:
        return 1 / (1 + np.exp(-x))

X = np.array([[0.35], [0.9]])  # input layer
y = np.array([[0.5]])          # ground truth
epochs = 200

W0 = np.array([[0.1, 0.8], [0.4, 0.6]])
W1 = np.array([[0.3, 0.9]])

print("original:\n", "W0:\n", W0, "\n W1:\n", W1)
loss = []
for epoch in range(epochs):
    print("In the process of %sth epoch" % (epoch + 1))
    l0 = X
    l1 = sigmoid_derive(np.dot(W0, l0))  # hidden-layer activations
    l2 = sigmoid_derive(np.dot(W1, l1))  # output activation
    l2_error = y - l2                    # prediction error (y - y_hat)
    Loss = 0.5 * l2_error ** 2           # MSE loss
    loss.append(Loss[0][0])
    print("The Current Loss:", Loss)
    l2_delta = l2_error * sigmoid_derive(l2, derive=True)  # output error signal
    l1_error = W1.T.dot(l2_delta)                          # backprop through W1
    l1_delta = l1_error * sigmoid_derive(l1, derive=True)  # hidden error signal
    # gradient-descent updates (implicit learning rate of 1, unlike the
    # hand calculation above, which used alpha = 0.01)
    W1 += l2_delta.dot(l1.T)
    W0 += l1_delta.dot(l0.T)
    print("After  BackProp:\n", "W0:\n", W0, "\n W1:\n", W1)
    print('=========================================')

plt.plot(loss)
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.title('Loss Decreasing')
plt.show()
```

Output (first epoch shown; later epochs omitted):

```
In the process of 1th epoch
The Current Loss: [[0.0181039]]
After  BackProp:
 W0:
 [[0.09907093 0.79761096]
 [0.39713992 0.59264552]]
 W1:
 [[0.27232597 0.87299836]]
=========================================
...
```

Plot of the training loss decreasing:

(Figure: training loss curve)
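As an optional sanity check of the analytic gradients (my own addition, not part of the original script), the backprop gradient of a single weight can be compared against a centered finite difference through the whole network; `forward_loss` is a hypothetical helper appended after the training loop:

```python
def forward_loss(W0_, W1_):
    # recompute the MSE loss for the given weights
    l1 = sigmoid_derive(np.dot(W0_, X))
    l2 = sigmoid_derive(np.dot(W1_, l1))
    return float(0.5 * (y - l2) ** 2)

eps = 1e-6
W0p, W0m = W0.copy(), W0.copy()
W0p[0, 0] += eps
W0m[0, 0] -= eps
numeric = (forward_loss(W0p, W1) - forward_loss(W0m, W1)) / (2 * eps)
print("numeric dL/dW11:", numeric)  # should match the analytic gradient at the current weights
```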
