Derivation of Forward and Backward Propagation in a Neural Network
$x_1$ and $x_2$ denote the inputs
$w_{ij}$ denotes a weight
$b_{ij}$ denotes a bias
$\sigma$ denotes the activation function; the sigmoid is used here
$out$ denotes the output
$y$ denotes the ground-truth value
$\eta$ denotes the learning rate
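Since the factor $\sigma(u)(1-\sigma(u))$ appears in every gradient derived below, here is a minimal Python sketch of the sigmoid and its derivative, checked against a finite difference (NumPy assumed, matching the code at the end of this post; the names `sigmoid` and `sigmoid_grad` are ours):

```python
import numpy as np

def sigmoid(u):
    # sigma(u) = 1 / (1 + e^{-u})
    return 1.0 / (1.0 + np.exp(-u))

def sigmoid_grad(u):
    # sigma'(u) = sigma(u) * (1 - sigma(u))
    s = sigmoid(u)
    return s * (1.0 - s)

# Compare the closed-form derivative with a centered finite difference.
u, eps = 0.7, 1e-6
numeric = (sigmoid(u + eps) - sigmoid(u - eps)) / (2 * eps)
print(sigmoid_grad(u), numeric)  # the two values should agree closely
```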
Forward propagation
$h_1 = w_{11}x_1 + w_{13}x_2 + b_{11}$, $\alpha_1 = \sigma(h_1) = \frac{1}{1+e^{-h_1}}$
$h_2 = w_{12}x_1 + w_{14}x_2 + b_{12}$, $\alpha_2 = \sigma(h_2) = \frac{1}{1+e^{-h_2}}$
$z = w_{21}\alpha_1 + w_{22}\alpha_2 + b_{21}$, $out = \sigma(z) = \frac{1}{1+e^{-z}}$
Loss function
$E = \frac{1}{2}(out - y)^2$
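A minimal sketch of this forward pass and loss in Python (NumPy assumed; the concrete input, weight, and target values are made up for illustration):

```python
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

# Made-up inputs, parameters, and target for the 2-2-1 network above.
x1, x2 = 0.5, -1.0
w11, w13, b11 = 0.1, 0.2, 0.0    # first hidden unit
w12, w14, b12 = -0.3, 0.4, 0.0   # second hidden unit
w21, w22, b21 = 0.5, -0.6, 0.0   # output unit
y = 1.0

# Hidden layer
h1 = w11 * x1 + w13 * x2 + b11
a1 = sigmoid(h1)
h2 = w12 * x1 + w14 * x2 + b12
a2 = sigmoid(h2)

# Output layer and loss
z = w21 * a1 + w22 * a2 + b21
out = sigmoid(z)
E = 0.5 * (out - y) ** 2
print(out, E)
```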
Backward propagation
Compute the gradients:
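Every gradient below combines two identities that follow directly from the definitions above: the derivative of the squared-error loss with respect to the output, and the sigmoid derivative:

$\frac{\partial E}{\partial out} = out - y, \qquad \frac{d\sigma(u)}{du} = \sigma(u)\left(1 - \sigma(u)\right)$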
$\Delta w_{21} = \frac{\partial E}{\partial w_{21}} = \frac{\partial E}{\partial out}\frac{\partial out}{\partial z}\frac{\partial z}{\partial w_{21}} = (out-y)\sigma(z)(1-\sigma(z))\alpha_1$
$\Delta w_{22} = \frac{\partial E}{\partial w_{22}} = \frac{\partial E}{\partial out}\frac{\partial out}{\partial z}\frac{\partial z}{\partial w_{22}} = (out-y)\sigma(z)(1-\sigma(z))\alpha_2$
$\Delta b_{21} = \frac{\partial E}{\partial b_{21}} = \frac{\partial E}{\partial out}\frac{\partial out}{\partial z}\frac{\partial z}{\partial b_{21}} = (out-y)\sigma(z)(1-\sigma(z))$
Update $w_{21}$, $w_{22}$, $b_{21}$:
$w_{21} = w_{21} - \eta\Delta w_{21}$
$w_{22} = w_{22} - \eta\Delta w_{22}$
$b_{21} = b_{21} - \eta\Delta b_{21}$
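A sketch of the output-layer gradients and update in Python, reusing the made-up values from the forward-pass sketch above (variable names are ours; note that $\sigma(z)(1-\sigma(z))$ is simply $out(1-out)$):

```python
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

# Same made-up values as in the forward-pass sketch above.
x1, x2, y, eta = 0.5, -1.0, 1.0, 0.1
w11, w13, b11 = 0.1, 0.2, 0.0
w12, w14, b12 = -0.3, 0.4, 0.0
w21, w22, b21 = 0.5, -0.6, 0.0

# Forward pass
a1 = sigmoid(w11 * x1 + w13 * x2 + b11)
a2 = sigmoid(w12 * x1 + w14 * x2 + b12)
out = sigmoid(w21 * a1 + w22 * a2 + b21)

# Output-layer gradients, exactly as derived above
delta = (out - y) * out * (1.0 - out)  # (out-y) * sigma(z) * (1-sigma(z))
dw21 = delta * a1
dw22 = delta * a2
db21 = delta

# Gradient-descent update
w21 -= eta * dw21
w22 -= eta * dw22
b21 -= eta * db21
print(dw21, dw22, db21)
```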
Compute the gradients:
$\Delta w_{12} = \frac{\partial E}{\partial out}\frac{\partial out}{\partial z}\frac{\partial z}{\partial \alpha_2}\frac{\partial \alpha_2}{\partial h_2}\frac{\partial h_2}{\partial w_{12}} = (out-y)\sigma(z)(1-\sigma(z))w_{22}\sigma(h_2)(1-\sigma(h_2))x_1$
$\Delta w_{14} = \frac{\partial E}{\partial out}\frac{\partial out}{\partial z}\frac{\partial z}{\partial \alpha_2}\frac{\partial \alpha_2}{\partial h_2}\frac{\partial h_2}{\partial w_{14}} = (out-y)\sigma(z)(1-\sigma(z))w_{22}\sigma(h_2)(1-\sigma(h_2))x_2$
$\Delta b_{12} = \frac{\partial E}{\partial out}\frac{\partial out}{\partial z}\frac{\partial z}{\partial \alpha_2}\frac{\partial \alpha_2}{\partial h_2}\frac{\partial h_2}{\partial b_{12}} = (out-y)\sigma(z)(1-\sigma(z))w_{22}\sigma(h_2)(1-\sigma(h_2))$
$\Delta w_{11} = \frac{\partial E}{\partial out}\frac{\partial out}{\partial z}\frac{\partial z}{\partial \alpha_1}\frac{\partial \alpha_1}{\partial h_1}\frac{\partial h_1}{\partial w_{11}} = (out-y)\sigma(z)(1-\sigma(z))w_{21}\sigma(h_1)(1-\sigma(h_1))x_1$
$\Delta w_{13} = \frac{\partial E}{\partial out}\frac{\partial out}{\partial z}\frac{\partial z}{\partial \alpha_1}\frac{\partial \alpha_1}{\partial h_1}\frac{\partial h_1}{\partial w_{13}} = (out-y)\sigma(z)(1-\sigma(z))w_{21}\sigma(h_1)(1-\sigma(h_1))x_2$
$\Delta b_{11} = \frac{\partial E}{\partial out}\frac{\partial out}{\partial z}\frac{\partial z}{\partial \alpha_1}\frac{\partial \alpha_1}{\partial h_1}\frac{\partial h_1}{\partial b_{11}} = (out-y)\sigma(z)(1-\sigma(z))w_{21}\sigma(h_1)(1-\sigma(h_1))$
Update $w_{12}$, $w_{14}$, $b_{12}$:
$w_{12} = w_{12} - \eta\Delta w_{12}$
$w_{14} = w_{14} - \eta\Delta w_{14}$
$b_{12} = b_{12} - \eta\Delta b_{12}$
Update $w_{11}$, $w_{13}$, $b_{11}$:
$w_{11} = w_{11} - \eta\Delta w_{11}$
$w_{13} = w_{13} - \eta\Delta w_{13}$
$b_{11} = b_{11} - \eta\Delta b_{11}$
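To gain confidence in the hand-derived hidden-layer formula, here is a sketch that compares the closed form for $\Delta w_{11}$ against a centered finite difference of the loss (made-up values; the helper `loss` is ours):

```python
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

# Made-up inputs, parameters, and target (same toy 2-2-1 network).
x1, x2, y = 0.5, -1.0, 1.0
w11, w13, b11 = 0.1, 0.2, 0.0
w12, w14, b12 = -0.3, 0.4, 0.0
w21, w22, b21 = 0.5, -0.6, 0.0

def loss(w11):
    # E as a function of w11 alone, all other parameters held fixed.
    a1 = sigmoid(w11 * x1 + w13 * x2 + b11)
    a2 = sigmoid(w12 * x1 + w14 * x2 + b12)
    out = sigmoid(w21 * a1 + w22 * a2 + b21)
    return 0.5 * (out - y) ** 2

# Closed form: (out-y) sigma(z)(1-sigma(z)) w21 sigma(h1)(1-sigma(h1)) x1
a1 = sigmoid(w11 * x1 + w13 * x2 + b11)
a2 = sigmoid(w12 * x1 + w14 * x2 + b12)
out = sigmoid(w21 * a1 + w22 * a2 + b21)
dw11 = (out - y) * out * (1 - out) * w21 * a1 * (1 - a1) * x1

eps = 1e-6
numeric = (loss(w11 + eps) - loss(w11 - eps)) / (2 * eps)
print(dw11, numeric)  # should agree to several decimal places
```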
```python
import matplotlib.pyplot as plt
import numpy as np

# Parameters
# N: number of samples
# D_in: input dimension
# H: number of hidden units
# D_out: output dimension
N, D_in, H, D_out = 64, 1000, 100, 10

# Generate random data
x = np.random.randn(D_in, N)
y = np.random.randn(D_out, N)

# Initialize parameters
w1 = np.random.randn(D_in, H)
b1 = np.zeros((H, N))
w2 = np.random.randn(H, D_out)
b2 = np.zeros((D_out, N))

# Learning rate
learning_rate = 1e-6
loss_list = []

# Maximum number of iterations
iters = 500
for i in range(iters):
    # Forward pass
    h = np.matmul(w1.T, x) + b1       # (100, 64)
    a = np.maximum(h, 0)              # (100, 64), ReLU activation
    y_pred = np.matmul(w2.T, a) + b2  # (10, 64)

    # Loss
    loss = np.square(y_pred - y).sum()
    loss_list.append(loss)

    # Backward pass
    grad_y_pred = 2 * (y_pred - y)         # (10, 64)
    grad_w2 = np.matmul(a, grad_y_pred.T)  # (100, 10)
    grad_b2 = grad_y_pred                  # (10, 64)
    grad_a = np.matmul(w2, grad_y_pred)    # (100, 64)
    grad_a[h < 0] = 0                      # ReLU gradient: zero where h < 0
    grad_w1 = np.matmul(x, grad_a.T)       # (1000, 100)
    grad_b1 = grad_a                       # (100, 64)

    # Update parameters
    w1 -= learning_rate * grad_w1
    b1 -= learning_rate * grad_b1
    w2 -= learning_rate * grad_w2
    b2 -= learning_rate * grad_b2

plt.plot(range(iters), loss_list)
plt.ylabel('loss')
plt.xlabel('iter')
plt.show()
```
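As a way to verify the backward pass above (including the ReLU mask), here is a small sketch comparing the analytic gradient of one entry of `w2` against a finite difference on a tiny instance (the sizes and the helper `forward_loss` are ours):

```python
import numpy as np

# Tiny instance so the finite-difference check is cheap.
N, D_in, H, D_out = 3, 4, 5, 2
rng = np.random.default_rng(0)
x = rng.standard_normal((D_in, N))
y = rng.standard_normal((D_out, N))
w1 = rng.standard_normal((D_in, H))
b1 = np.zeros((H, N))
w2 = rng.standard_normal((H, D_out))
b2 = np.zeros((D_out, N))

def forward_loss(w2):
    # Same forward pass and loss as in the training loop above.
    h = np.matmul(w1.T, x) + b1
    a = np.maximum(h, 0)
    y_pred = np.matmul(w2.T, a) + b2
    return np.square(y_pred - y).sum()

# Analytic gradient of the loss w.r.t. w2, as in the training loop.
h = np.matmul(w1.T, x) + b1
a = np.maximum(h, 0)
y_pred = np.matmul(w2.T, a) + b2
grad_w2 = np.matmul(a, (2 * (y_pred - y)).T)

# Finite-difference gradient for a single entry w2[0, 0].
eps = 1e-6
w2_plus, w2_minus = w2.copy(), w2.copy()
w2_plus[0, 0] += eps
w2_minus[0, 0] -= eps
numeric = (forward_loss(w2_plus) - forward_loss(w2_minus)) / (2 * eps)
print(grad_w2[0, 0], numeric)  # should agree closely
```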