Derivation of Forward and Backward Propagation in Neural Networks

[Figure: diagram of the 2-2-1 fully connected network used below]
$x_{1}$, $x_{2}$ denote the inputs
$w_{ij}$ denotes the weights
$b_{ij}$ denotes the biases
$\sigma$ denotes the activation function; sigmoid is used here
$out$ denotes the output
$y$ denotes the ground-truth value
$\eta$ denotes the learning rate

Forward propagation
$$h_{1} = w_{11}x_{1} + w_{13}x_{2} + b_{11}, \qquad \alpha_{1} = \sigma(h_{1}) = \frac{1}{1+e^{-h_{1}}}$$

$$h_{2} = w_{12}x_{1} + w_{14}x_{2} + b_{12}, \qquad \alpha_{2} = \sigma(h_{2}) = \frac{1}{1+e^{-h_{2}}}$$

$$z = w_{21}\alpha_{1} + w_{22}\alpha_{2} + b_{21}, \qquad out = \sigma(z) = \frac{1}{1+e^{-z}}$$
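
These three equations map one-to-one onto code. Below is a minimal sketch of the forward pass in Python; the input and parameter values are made up for illustration, not taken from the text.

import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

# Made-up example values (for illustration only)
x1, x2 = 0.5, 0.1
w11, w12, w13, w14 = 0.15, 0.20, 0.25, 0.30
w21, w22 = 0.40, 0.45
b11, b12, b21 = 0.35, 0.35, 0.60

# Hidden layer: a1, a2 correspond to alpha_1, alpha_2 above
h1 = w11 * x1 + w13 * x2 + b11
h2 = w12 * x1 + w14 * x2 + b12
a1, a2 = sigmoid(h1), sigmoid(h2)

# Output layer
z = w21 * a1 + w22 * a2 + b21
out = sigmoid(z)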

Loss function

$$E = \frac{1}{2}(out - y)^2$$

Backpropagation

Gradients for the output layer. All derivatives below use the sigmoid identity $\sigma'(t) = \sigma(t)\,(1-\sigma(t))$.
$$\Delta w_{21} = \frac{\partial E}{\partial w_{21}} = \frac{\partial E}{\partial out}\frac{\partial out}{\partial z}\frac{\partial z}{\partial w_{21}} = (out - y)\,\sigma(z)(1-\sigma(z))\,\alpha_{1}$$

$$\Delta w_{22} = \frac{\partial E}{\partial w_{22}} = \frac{\partial E}{\partial out}\frac{\partial out}{\partial z}\frac{\partial z}{\partial w_{22}} = (out - y)\,\sigma(z)(1-\sigma(z))\,\alpha_{2}$$

$$\Delta b_{21} = \frac{\partial E}{\partial b_{21}} = \frac{\partial E}{\partial out}\frac{\partial out}{\partial z}\frac{\partial z}{\partial b_{21}} = (out - y)\,\sigma(z)(1-\sigma(z))$$

Update $w_{21}$, $w_{22}$, $b_{21}$:

$$w_{21} = w_{21} - \eta\,\Delta w_{21}$$

$$w_{22} = w_{22} - \eta\,\Delta w_{22}$$

$$b_{21} = b_{21} - \eta\,\Delta b_{21}$$
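
Continuing the forward-pass sketch above (it reuses out, a1, a2 and the output-layer parameters defined there), the three gradients and updates become, with a made-up target and learning rate:

# Continues the forward-pass sketch; y and eta are made-up values
y = 0.05
d_z = (out - y) * out * (1.0 - out)  # dE/dz; sigma(z)(1-sigma(z)) = out(1-out)

dw21 = d_z * a1   # dE/dw21
dw22 = d_z * a2   # dE/dw22
db21 = d_z        # dE/db21

eta = 0.5
w21 -= eta * dw21
w22 -= eta * dw22
b21 -= eta * db21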

Gradients for the hidden layer. (In practice, all gradients are evaluated with the pre-update weights before any parameter is changed.)

$$\Delta w_{12} = \frac{\partial E}{\partial w_{12}} = \frac{\partial E}{\partial out}\frac{\partial out}{\partial z}\frac{\partial z}{\partial \alpha_{2}}\frac{\partial \alpha_{2}}{\partial h_{2}}\frac{\partial h_{2}}{\partial w_{12}} = (out - y)\,\sigma(z)(1-\sigma(z))\,w_{22}\,\sigma(h_{2})(1-\sigma(h_{2}))\,x_{1}$$

$$\Delta w_{14} = \frac{\partial E}{\partial w_{14}} = \frac{\partial E}{\partial out}\frac{\partial out}{\partial z}\frac{\partial z}{\partial \alpha_{2}}\frac{\partial \alpha_{2}}{\partial h_{2}}\frac{\partial h_{2}}{\partial w_{14}} = (out - y)\,\sigma(z)(1-\sigma(z))\,w_{22}\,\sigma(h_{2})(1-\sigma(h_{2}))\,x_{2}$$

$$\Delta b_{12} = \frac{\partial E}{\partial b_{12}} = \frac{\partial E}{\partial out}\frac{\partial out}{\partial z}\frac{\partial z}{\partial \alpha_{2}}\frac{\partial \alpha_{2}}{\partial h_{2}}\frac{\partial h_{2}}{\partial b_{12}} = (out - y)\,\sigma(z)(1-\sigma(z))\,w_{22}\,\sigma(h_{2})(1-\sigma(h_{2}))$$

$$\Delta w_{11} = \frac{\partial E}{\partial w_{11}} = \frac{\partial E}{\partial out}\frac{\partial out}{\partial z}\frac{\partial z}{\partial \alpha_{1}}\frac{\partial \alpha_{1}}{\partial h_{1}}\frac{\partial h_{1}}{\partial w_{11}} = (out - y)\,\sigma(z)(1-\sigma(z))\,w_{21}\,\sigma(h_{1})(1-\sigma(h_{1}))\,x_{1}$$

$$\Delta w_{13} = \frac{\partial E}{\partial w_{13}} = \frac{\partial E}{\partial out}\frac{\partial out}{\partial z}\frac{\partial z}{\partial \alpha_{1}}\frac{\partial \alpha_{1}}{\partial h_{1}}\frac{\partial h_{1}}{\partial w_{13}} = (out - y)\,\sigma(z)(1-\sigma(z))\,w_{21}\,\sigma(h_{1})(1-\sigma(h_{1}))\,x_{2}$$

$$\Delta b_{11} = \frac{\partial E}{\partial b_{11}} = \frac{\partial E}{\partial out}\frac{\partial out}{\partial z}\frac{\partial z}{\partial \alpha_{1}}\frac{\partial \alpha_{1}}{\partial h_{1}}\frac{\partial h_{1}}{\partial b_{11}} = (out - y)\,\sigma(z)(1-\sigma(z))\,w_{21}\,\sigma(h_{1})(1-\sigma(h_{1}))$$

Update $w_{12}$, $w_{14}$, $b_{12}$:

$$w_{12} = w_{12} - \eta\,\Delta w_{12}$$

$$w_{14} = w_{14} - \eta\,\Delta w_{14}$$

$$b_{12} = b_{12} - \eta\,\Delta b_{12}$$

Update $w_{11}$, $w_{13}$, $b_{11}$:

$$w_{11} = w_{11} - \eta\,\Delta w_{11}$$

$$w_{13} = w_{13} - \eta\,\Delta w_{13}$$

$$b_{11} = b_{11} - \eta\,\Delta b_{11}$$
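
Putting both layers together, the whole derivation can be sanity-checked numerically: the analytic gradients should match finite differences of $E$. A self-contained sketch, with made-up values throughout:

import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def forward(p, x1, x2):
    # p holds [w11, w12, w13, w14, w21, w22, b11, b12, b21]
    w11, w12, w13, w14, w21, w22, b11, b12, b21 = p
    a1 = sigmoid(w11 * x1 + w13 * x2 + b11)
    a2 = sigmoid(w12 * x1 + w14 * x2 + b12)
    out = sigmoid(w21 * a1 + w22 * a2 + b21)
    return a1, a2, out

# Made-up inputs, target, and initial parameters
x1, x2, y = 0.5, 0.1, 0.05
p = np.array([0.15, 0.20, 0.25, 0.30, 0.40, 0.45, 0.35, 0.35, 0.60])

a1, a2, out = forward(p, x1, x2)
d_z = (out - y) * out * (1 - out)     # dE/dz
d_h1 = d_z * p[4] * a1 * (1 - a1)     # dE/dh1 (p[4] = w21)
d_h2 = d_z * p[5] * a2 * (1 - a2)     # dE/dh2 (p[5] = w22)

# Analytic gradients from the derivation, in the same order as p
grads = np.array([d_h1 * x1, d_h2 * x1, d_h1 * x2, d_h2 * x2,
                  d_z * a1, d_z * a2, d_h1, d_h2, d_z])

# Finite-difference check of dE/dw11
E = lambda q: 0.5 * (forward(q, x1, x2)[-1] - y) ** 2
eps = 1e-6
q_plus, q_minus = p.copy(), p.copy()
q_plus[0] += eps
q_minus[0] -= eps
print(grads[0], (E(q_plus) - E(q_minus)) / (2 * eps))  # should agree closely

# One gradient-descent step on every parameter
eta = 0.5
p -= eta * grads

The NumPy program below scales the same idea up: a two-layer network with ReLU hidden units, trained on random data.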

import matplotlib.pyplot as plt
import numpy as np

# Parameters
# N: number of samples
# D_in: input dimension
# H: number of hidden units
# D_out: output dimension
N, D_in, H, D_out = 64, 1000, 100, 10

# Generate random data
x = np.random.randn(D_in, N)
y = np.random.randn(D_out, N)

# Initialize parameters
# (note: the biases here have one column per sample, matching the shapes below)
w1 = np.random.randn(D_in, H)
b1 = np.zeros((H, N))
w2 = np.random.randn(H, D_out)
b2 = np.zeros((D_out, N))

# Learning rate
learning_rate = 1e-6

loss_list = []

# Maximum number of iterations
n_iters = 500

for i in range(n_iters):
    # Forward pass
    h = np.matmul(w1.T, x) + b1        # (100, 64)
    a = np.maximum(h, 0)               # (100, 64), ReLU activation
    y_pred = np.matmul(w2.T, a) + b2   # (10, 64)

    # Loss: sum of squared errors
    loss = np.square(y_pred - y).sum()

    loss_list.append(loss)

    # Backward pass
    grad_y_pred = 2 * (y_pred - y)            # (10, 64)
    grad_w2 = np.matmul(a, grad_y_pred.T)     # (100, 10)
    grad_b2 = grad_y_pred                     # (10, 64)
    grad_a = np.matmul(w2, grad_y_pred)       # (100, 64)
    grad_a[h < 0] = 0                         # ReLU derivative: zero where h < 0
    grad_w1 = np.matmul(x, grad_a.T)          # (1000, 100)
    grad_b1 = grad_a                          # (100, 64)

    # Update parameters
    w1 -= learning_rate * grad_w1
    b1 -= learning_rate * grad_b1
    w2 -= learning_rate * grad_w2
    b2 -= learning_rate * grad_b2

plt.plot(range(n_iters), loss_list)
plt.ylabel('loss')
plt.xlabel('iter')
plt.show()

[Figure: loss curve produced by the code above (loss vs. iteration)]

Backpropagation is the standard method for training neural networks: its core idea is to compute the gradient of the error and use it to update the weights and biases. The derivation below is for a general three-layer network.

Setup: the input layer has $l$ neurons, the hidden layer $m$ neurons, and the output layer $n$ neurons. Let $w_{ji}$ be the weight connecting input neuron $i$ to hidden neuron $j$, and $w_{kj}$ the weight connecting hidden neuron $j$ to output neuron $k$. Let $b_j$ and $b_k$ be the biases of hidden neuron $j$ and output neuron $k$. Let $a_j$ and $z_j$ be the input (pre-activation) and output of hidden neuron $j$, and let $a_k$ and $o_k$ be the input and actual output of output neuron $k$, with $y_k$ its desired output. Let $E = \frac{1}{2}\sum_{k=1}^{n}(o_k - y_k)^2$ be the network error.

1. Compute the output-layer error:
$$\delta_k = \frac{\partial E}{\partial a_k} = \frac{\partial E}{\partial o_k}\frac{\partial o_k}{\partial a_k} = (o_k - y_k)\,f'(a_k)$$
where $f'(a_k)$ is the derivative of the activation function of output neuron $k$.

2. Compute the hidden-layer error:
$$\delta_j = \frac{\partial E}{\partial a_j} = \frac{\partial E}{\partial z_j}\frac{\partial z_j}{\partial a_j} = \Big(\sum_{k=1}^{n}\delta_k w_{kj}\Big)f'(a_j)$$

3. Compute the output-layer weight and bias gradients:
$$\frac{\partial E}{\partial w_{kj}} = \delta_k z_j, \qquad \frac{\partial E}{\partial b_k} = \delta_k$$

4. Compute the hidden-layer weight and bias gradients:
$$\frac{\partial E}{\partial w_{ji}} = \delta_j x_i, \qquad \frac{\partial E}{\partial b_j} = \delta_j$$

5. Update the weights and biases by gradient descent:
$$w_{kj} \leftarrow w_{kj} - \eta\frac{\partial E}{\partial w_{kj}}, \qquad b_k \leftarrow b_k - \eta\frac{\partial E}{\partial b_k}$$
$$w_{ji} \leftarrow w_{ji} - \eta\frac{\partial E}{\partial w_{ji}}, \qquad b_j \leftarrow b_j - \eta\frac{\partial E}{\partial b_j}$$
where $\eta$ is the learning rate.
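
The five steps above translate directly into vectorized code. Below is a minimal sketch for a single training example, assuming sigmoid activations on both layers and squared error; the layer sizes and variable names are my own, not from the text.

import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

rng = np.random.default_rng(0)
l, m, n = 4, 5, 3                     # input, hidden, output sizes (arbitrary)
W1 = rng.normal(size=(m, l))          # w_ji
b1 = np.zeros(m)                      # b_j
W2 = rng.normal(size=(n, m))          # w_kj
b2 = np.zeros(n)                      # b_k
eta = 0.1

x = rng.normal(size=l)                # one training example (made up)
y = rng.normal(size=n)                # its target

# Forward pass: a_* are pre-activations, z_j / o_k the activations
a_j = W1 @ x + b1
z_j = sigmoid(a_j)
a_k = W2 @ z_j + b2
o_k = sigmoid(a_k)

# Errors: delta_k = dE/da_k, delta_j = dE/da_j
delta_k = (o_k - y) * o_k * (1 - o_k)          # f'(a_k) = o_k(1 - o_k)
delta_j = (W2.T @ delta_k) * z_j * (1 - z_j)   # sum_k delta_k w_kj f'(a_j)

# Gradient-descent updates
W2 -= eta * np.outer(delta_k, z_j)    # dE/dw_kj = delta_k z_j
b2 -= eta * delta_k
W1 -= eta * np.outer(delta_j, x)      # dE/dw_ji = delta_j x_i
b1 -= eta * delta_j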