Backpropagation is the core algorithm in deep learning for computing the gradient of the loss function with respect to the model parameters. It applies the chain rule to propagate gradients layer by layer from the output layer back toward the input layer, and the resulting gradients are then used by gradient descent to update the parameters. Below is a detailed worked example of backpropagation, combining the mathematical derivation with a code implementation.
1. Example Setup
Suppose we have a simple neural network with the following structure:
- Input layer: 1 neuron ( x ).
- Hidden layer: 1 neuron ( h ), with a Sigmoid activation.
- Output layer: 1 neuron ( y ), with a linear activation (i.e., no activation function).
- Loss function: mean squared error (MSE).
The network is defined by:
[
h = \sigma(w_1 x + b_1)
]
[
y = w_2 h + b_2
]
[
L = \frac{1}{2} (y_{\text{true}} - y)^2
]
where:
- ( \sigma ) is the Sigmoid function: ( \sigma(z) = \frac{1}{1 + e^{-z}} ).
- ( w_1, w_2 ) are the weights.
- ( b_1, b_2 ) are the biases.
- ( L ) is the loss.
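To make this structure concrete, here is a minimal Python sketch of the forward computation (the names sigmoid, forward, and mse_loss are illustrative choices for this example, not library functions):
import numpy as np
def sigmoid(z):
    # Sigmoid activation: 1 / (1 + e^(-z))
    return 1 / (1 + np.exp(-z))
def forward(x, w1, b1, w2, b2):
    # Hidden layer: pre-activation z1 followed by sigmoid
    h = sigmoid(w1 * x + b1)
    # Output layer: purely linear
    y = w2 * h + b2
    return h, y
def mse_loss(y_true, y):
    # Squared-error loss with the 1/2 factor used above
    return 0.5 * (y_true - y) ** 2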
2. Forward Pass
Suppose the input is ( x = 1 ), the target value is ( y_{\text{true}} = 0.5 ), and the initial parameters are:
- ( w_1 = 0.5 ), ( b_1 = 0.2 )
- ( w_2 = 0.3 ), ( b_2 = 0.1 )
The computation proceeds as follows:
- Compute the hidden-layer output ( h ):
[
z_1 = w_1 x + b_1 = 0.5 \times 1 + 0.2 = 0.7
]
[
h = \sigma(z_1) = \frac{1}{1 + e^{-0.7}} \approx 0.668
]
- Compute the output ( y ):
[
y = w_2 h + b_2 = 0.3 \times 0.668 + 0.1 \approx 0.300
]
- Compute the loss ( L ):
[
L = \frac{1}{2} (y_{\text{true}} - y)^2 = \frac{1}{2} (0.5 - 0.300)^2 = 0.020
]
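As a quick numerical check of these forward-pass values (a standalone snippet, independent of the full implementation in Section 5):
import numpy as np
w1, b1, w2, b2 = 0.5, 0.2, 0.3, 0.1   # initial parameters
x, y_true = 1.0, 0.5                  # input and target
z1 = w1 * x + b1                      # 0.7
h = 1 / (1 + np.exp(-z1))             # ~ 0.668
y = w2 * h + b2                       # ~ 0.300
loss = 0.5 * (y_true - y) ** 2        # ~ 0.020
print(z1, h, y, loss)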
3. Backward Pass
The goal of backpropagation is to compute the gradient of the loss ( L ) with respect to each parameter ( w_1, b_1, w_2, b_2 ) and then update the parameters.
3.1 Gradients for the Output Layer
- Compute ( \frac{\partial L}{\partial y} ):
[
\frac{\partial L}{\partial y} = y - y_{\text{true}} = 0.300 - 0.5 = -0.200
]
- Compute ( \frac{\partial y}{\partial w_2} ) and ( \frac{\partial y}{\partial b_2} ):
[
\frac{\partial y}{\partial w_2} = h \approx 0.668
]
[
\frac{\partial y}{\partial b_2} = 1
]
- Compute ( \frac{\partial L}{\partial w_2} ) and ( \frac{\partial L}{\partial b_2} ):
[
\frac{\partial L}{\partial w_2} = \frac{\partial L}{\partial y} \cdot \frac{\partial y}{\partial w_2} = -0.200 \times 0.668 \approx -0.134
]
[
\frac{\partial L}{\partial b_2} = \frac{\partial L}{\partial y} \cdot \frac{\partial y}{\partial b_2} = -0.200 \times 1 = -0.200
]
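These two gradients can also be verified numerically with the values from the forward pass (a standalone sketch):
h, w2, y, y_true = 0.668, 0.3, 0.300, 0.5
dL_dy = y - y_true        # -0.200
dL_dw2 = dL_dy * h        # ~ -0.134
dL_db2 = dL_dy * 1        # -0.200
print(dL_dw2, dL_db2)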
3.2 Gradients for the Hidden Layer
- Compute ( \frac{\partial L}{\partial h} ):
[
\frac{\partial L}{\partial h} = \frac{\partial L}{\partial y} \cdot \frac{\partial y}{\partial h} = -0.200 \times w_2 = -0.200 \times 0.3 = -0.060
]
- Compute ( \frac{\partial h}{\partial z_1} ):
[
\frac{\partial h}{\partial z_1} = h \cdot (1 - h) \approx 0.668 \times (1 - 0.668) \approx 0.222
]
- Compute ( \frac{\partial L}{\partial z_1} ):
[
\frac{\partial L}{\partial z_1} = \frac{\partial L}{\partial h} \cdot \frac{\partial h}{\partial z_1} = -0.060 \times 0.222 \approx -0.013
]
- Compute ( \frac{\partial z_1}{\partial w_1} ) and ( \frac{\partial z_1}{\partial b_1} ):
[
\frac{\partial z_1}{\partial w_1} = x = 1
]
[
\frac{\partial z_1}{\partial b_1} = 1
]
- Compute ( \frac{\partial L}{\partial w_1} ) and ( \frac{\partial L}{\partial b_1} ):
[
\frac{\partial L}{\partial w_1} = \frac{\partial L}{\partial z_1} \cdot \frac{\partial z_1}{\partial w_1} = -0.013 \times 1 = -0.013
]
[
\frac{\partial L}{\partial b_1} = \frac{\partial L}{\partial z_1} \cdot \frac{\partial z_1}{\partial b_1} = -0.013 \times 1 = -0.013
]
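Likewise, a short numerical check of the hidden-layer gradients, using the intermediate values computed above:
h, w2, x = 0.668, 0.3, 1.0
dL_dy = -0.200            # from Section 3.1
dL_dh = dL_dy * w2        # -0.060
dh_dz1 = h * (1 - h)      # ~ 0.222
dL_dz1 = dL_dh * dh_dz1   # ~ -0.013
dL_dw1 = dL_dz1 * x       # ~ -0.013
dL_db1 = dL_dz1 * 1       # ~ -0.013
print(dL_dw1, dL_db1)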
4. Parameter Update
With a learning rate of ( \eta = 0.1 ), the parameters are updated as follows:
- Update ( w_2 ) and ( b_2 ):
[
w_2 = w_2 - \eta \cdot \frac{\partial L}{\partial w_2} = 0.3 - 0.1 \times (-0.134) \approx 0.313
]
[
b_2 = b_2 - \eta \cdot \frac{\partial L}{\partial b_2} = 0.1 - 0.1 \times (-0.200) = 0.120
]
- Update ( w_1 ) and ( b_1 ):
[
w_1 = w_1 - \eta \cdot \frac{\partial L}{\partial w_1} = 0.5 - 0.1 \times (-0.013) \approx 0.501
]
[
b_1 = b_1 - \eta \cdot \frac{\partial L}{\partial b_1} = 0.2 - 0.1 \times (-0.013) \approx 0.201
]
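Applying these updates and re-running the forward pass confirms that the loss drops slightly below its initial value of 0.020 (a standalone sketch using the gradients computed above):
import numpy as np
eta = 0.1
w1, b1, w2, b2 = 0.5, 0.2, 0.3, 0.1
dL_dw1, dL_db1, dL_dw2, dL_db2 = -0.013, -0.013, -0.134, -0.200
# Gradient-descent step: parameter <- parameter - eta * gradient
w1 -= eta * dL_dw1
b1 -= eta * dL_db1
w2 -= eta * dL_dw2
b2 -= eta * dL_db2
# Forward pass with the updated parameters
h = 1 / (1 + np.exp(-(w1 * 1.0 + b1)))
y = w2 * h + b2
print(0.5 * (0.5 - y) ** 2)   # ~ 0.015, down from ~ 0.020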
5. Code Implementation
Here is a Python implementation of this backpropagation example:
import numpy as np
# Sigmoid activation function
def sigmoid(z):
    return 1 / (1 + np.exp(-z))
# Initialize parameters
w1, b1 = 0.5, 0.2
w2, b2 = 0.3, 0.1
# Input and target value
x, y_true = 1, 0.5
# Forward pass
z1 = w1 * x + b1
h = sigmoid(z1)
y = w2 * h + b2
loss = 0.5 * (y_true - y) ** 2
# Backward pass (chain rule, as derived above)
dL_dy = y - y_true
dL_dw2 = dL_dy * h
dL_db2 = dL_dy * 1
dL_dh = dL_dy * w2
dh_dz1 = h * (1 - h)
dL_dz1 = dL_dh * dh_dz1
dL_dw1 = dL_dz1 * x
dL_db1 = dL_dz1 * 1
# Update parameters (gradient descent)
eta = 0.1
w2 -= eta * dL_dw2
b2 -= eta * dL_db2
w1 -= eta * dL_dw1
b1 -= eta * dL_db1
print(f"Updated w1: {w1}, b1: {b1}, w2: {w2}, b2: {b2}")
6. Summary
- Backpropagation computes gradients layer by layer via the chain rule.
- The gradients are used to update the model parameters so as to minimize the loss.
- By repeatedly alternating forward and backward passes, the model is optimized step by step, as sketched in the loop below.
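As a minimal sketch of that iteration, the single update step from Section 5 can be wrapped in a loop (the learning rate and step count here are arbitrary illustrative choices):
import numpy as np
def sigmoid(z):
    return 1 / (1 + np.exp(-z))
w1, b1, w2, b2 = 0.5, 0.2, 0.3, 0.1
x, y_true, eta = 1.0, 0.5, 0.1
for step in range(100):
    # Forward pass
    h = sigmoid(w1 * x + b1)
    y = w2 * h + b2
    # Backward pass (same chain-rule expressions as above)
    dL_dy = y - y_true
    dL_dw2, dL_db2 = dL_dy * h, dL_dy
    dL_dz1 = dL_dy * w2 * h * (1 - h)
    dL_dw1, dL_db1 = dL_dz1 * x, dL_dz1
    # Gradient-descent update
    w2 -= eta * dL_dw2
    b2 -= eta * dL_db2
    w1 -= eta * dL_dw1
    b1 -= eta * dL_db1
# Final loss, far below the initial 0.020
h = sigmoid(w1 * x + b1)
y = w2 * h + b2
print(0.5 * (y_true - y) ** 2)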