PyTorch 深度学习实践 (PyTorch Deep Learning Practice)
04. Back Propagation
Study notes for 刘二大人's video lectures, July 10, 2023.
This chapter introduces the back-propagation algorithm; I record the important points here.
(🥺 I found that some parts are hard to explain without the figures.)
0. Preface
Back propagation is a crucial algorithm for neural networks: it propagates gradients over a graph, which makes it possible to build far more flexible model structures.
In the linear model discussed earlier, the input and the weight can be separated out, so in the graph $\omega \cdot x$ can be treated as a neuron, and the derivative of the loss is what we need in order to update the weight.
This is easy enough in simple cases, but how do we compute the gradients in a complicated case such as a full neural network?
H denotes a hidden layer.
$h^{(1)}$, the output of the first hidden layer, is a 6-dimensional vector; to obtain it from the 5-dimensional input we need a $6 \times 5$ weight matrix, i.e. 30 weights. By the same reasoning, $h^{(2)}$ needs 42, and so on.
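As a quick sanity check of those shapes, a minimal PyTorch sketch (the dimensions 5 and 6 simply mirror the example above):

```python
import torch

x = torch.randn(5)     # a 5-dimensional input vector
W = torch.randn(6, 5)  # the 6x5 weight matrix of the first hidden layer
h1 = W @ x             # matrix-vector product gives the 6-dimensional h^(1)
print(h1.shape)        # torch.Size([6])
```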
So: is there an algorithm that treats the whole network as a graph, and uses that graph together with gradient descent to obtain the result?
1. Computational Graph
For matrix calculus formulas, The Matrix Cookbook is a good reference to look things up in.
Two layers already look cumbersome; can the expression be expanded and simplified? If we expand it, we find that no matter how many layers there are, everything collapses into a single layer: the layers become indistinguishable, and the weights computed along the way carry no meaning.
Therefore a nonlinear transformation is applied to each layer's output, as in the figure's example; this prevents the collapse and gives us a real neural network. (I only half understood why the expansion has to be avoided.)
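The collapse under expansion can also be checked numerically. Below is a small sketch (random matrices of my own choosing, with tanh as one arbitrary choice of nonlinearity): two stacked linear layers equal a single combined matrix, but once a nonlinearity sits between them, no single matrix is equivalent.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(5)
W1 = rng.standard_normal((6, 5))  # first layer: 5 -> 6
W2 = rng.standard_normal((7, 6))  # second layer: 6 -> 7

two_layers = W2 @ (W1 @ x)  # two stacked linear layers
one_layer = (W2 @ W1) @ x   # a single equivalent 7x5 layer
print(np.allclose(two_layers, one_layer))  # True: the layers collapse

with_act = W2 @ np.tanh(W1 @ x)  # nonlinearity between the layers
print(np.allclose(with_act, one_layer))    # no longer equivalent
```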
2. Chain Rule
An example makes this easier to follow. Take $z = \omega \cdot x$ with $x = 2$, $\omega = 3$:

- Forward pass: from $z = \omega \cdot x$, compute $z = 3 \cdot 2 = 6$.
- Local gradients: compute $\frac{\partial z}{\partial x} = \omega$ and $\frac{\partial z}{\partial \omega} = x$.
- Gradient given from the successive node: the value of $\frac{\partial L}{\partial z}$ passed back from the later layers; say $\frac{\partial L}{\partial z} = 5$.
- Backward pass: by the chain rule, $\frac{\partial L}{\partial x} = \frac{\partial L}{\partial z} \cdot \frac{\partial z}{\partial x} = 5 \cdot \omega = 15$ and $\frac{\partial L}{\partial \omega} = \frac{\partial L}{\partial z} \cdot \frac{\partial z}{\partial \omega} = 5 \cdot x = 10$.

These gradients are used to update $x$ and $\omega$; the value of $\frac{\partial L}{\partial x}$ is then passed back to update the previous layer in the same way.
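This four-step example can be reproduced with PyTorch's autograd: `backward()` accepts the upstream gradient as an argument, so the assumed $\frac{\partial L}{\partial z} = 5$ can be injected directly (a sketch; the variable names are mine):

```python
import torch

x = torch.tensor(2.0, requires_grad=True)
w = torch.tensor(3.0, requires_grad=True)
z = w * x  # forward pass: z = 6
z.backward(gradient=torch.tensor(5.0))  # inject dL/dz = 5 from the successive node
print(x.grad.item())  # dL/dx = 5 * w = 15.0
print(w.grad.item())  # dL/dw = 5 * x = 10.0
```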
Computational graph of the linear model:
Exercise 1: with the graph above and $x = 2$, $y = 4$, $\omega = 1$, what is the gradient?
$$
\hat{y} = \omega \cdot x = 2,\qquad r = \hat{y} - y = -2,\qquad loss = r^2 = 4
$$

$$
\frac{\partial loss}{\partial r} = 2r = -4
$$

$$
\frac{\partial loss}{\partial \hat{y}} = \frac{\partial loss}{\partial r} \cdot \frac{\partial r}{\partial \hat{y}} = -4 \cdot 1 = -4
$$

$$
\frac{\partial loss}{\partial \omega} = \frac{\partial loss}{\partial \hat{y}} \cdot \frac{\partial \hat{y}}{\partial \omega} = -4 \cdot 2 = -8
$$
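Autograd confirms the hand computation (a minimal sketch):

```python
import torch

x, y = 2.0, 4.0
w = torch.tensor(1.0, requires_grad=True)
loss = (x * w - y) ** 2  # y_hat = 2, r = -2, loss = 4
loss.backward()
print(w.grad.item())  # -8.0, matching the chain-rule result
```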
Exercise 2: with $\hat{y} = x \cdot \omega + b$ and, in the graph above, $x = 1$, $y = 2$, $\omega = 1$, $b = 2$, what are the gradients?
$$
\hat{y} = 1 \cdot 1 + 2 = 3,\qquad \frac{\partial (x\omega + b)}{\partial \omega} = x,\qquad \frac{\partial (x\omega + b)}{\partial b} = 1
$$

$$
r = \hat{y} - y = 1,\qquad loss = r^2 = 1
$$

$$
\frac{\partial loss}{\partial r} = 2r = 2
$$

$$
\frac{\partial loss}{\partial \hat{y}} = \frac{\partial loss}{\partial r} \cdot \frac{\partial r}{\partial \hat{y}} = 2 \cdot 1 = 2
$$

$$
\frac{\partial loss}{\partial \omega} = \frac{\partial loss}{\partial \hat{y}} \cdot \frac{\partial \hat{y}}{\partial \omega} = 2 \cdot 1 = 2
$$

$$
\frac{\partial loss}{\partial b} = \frac{\partial loss}{\partial \hat{y}} \cdot \frac{\partial \hat{y}}{\partial b} = 2 \cdot 1 = 2
$$
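The same autograd check for this exercise (again a sketch):

```python
import torch

x, y = 1.0, 2.0
w = torch.tensor(1.0, requires_grad=True)
b = torch.tensor(2.0, requires_grad=True)
loss = (x * w + b - y) ** 2  # y_hat = 3, r = 1, loss = 1
loss.backward()
print(w.grad.item())  # dloss/dw = 2 * r * x = 2.0
print(b.grad.item())  # dloss/db = 2 * r = 2.0
```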
3. Tensor in PyTorch
Tensor is the fundamental data type in PyTorch and stores all the values used in the computations above; it can hold a scalar, a vector, a matrix, a 3-D array, and so on. It has two key data members: `data`, the value itself, and `grad`, the derivative of the loss with respect to the weight (itself a Tensor).
Code implementation:
```python
import torch

# prepare the training set
x_data = [1.0, 2.0, 3.0]
y_data = [2.0, 4.0, 6.0]

# initial guess of weight
w = torch.Tensor([1.0])
w.requires_grad = True  # gradients must be computed for w

# the code below builds the computational graph; try matching it
# against the graph in the figure.
# forward() computes y-hat: x is converted to a Tensor; since w requires
# grad, the resulting y-hat requires grad too, while x does not
def forward(x):
    return x * w

# calculate loss
def loss(x, y):
    y_pred = forward(x)
    return (y_pred - y) ** 2

print('Predict (before training)', 4, forward(4).item())
for epoch in range(100):
    for x, y in zip(x_data, y_data):
        l = loss(x, y)  # forward pass
        l.backward()    # compute grads of every Tensor with requires_grad=True;
                        # afterwards the computational graph is freed
        print('\tgrad:', x, y, w.grad.item())
        # use .data for the update: w.grad is a Tensor, and operating on it
        # directly would be recorded as part of a new computational graph
        w.data -= 0.01 * w.grad.data
        w.grad.data.zero_()  # zero w's gradient, otherwise the next
                             # backward() call would accumulate onto it
    print('progress:', epoch, l.item())  # l.item() extracts the scalar value
print('Predict (after training)', 4, forward(4).item())
```
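The `w.grad.data.zero_()` line deserves a closer look: PyTorch accumulates gradients across `backward()` calls, so without the zeroing each update would use the sum of all previous gradients. A small demonstration (the loss expression here is made up for illustration):

```python
import torch

w = torch.tensor(1.0, requires_grad=True)
for _ in range(2):
    l = (2.0 * w - 4.0) ** 2  # dl/dw = 4 * (2w - 4) = -8 at w = 1
    l.backward()
accumulated = w.grad.item()
print(accumulated)    # -16.0: two backward() calls added their -8 gradients
w.grad.data.zero_()   # reset, as done after every update in the training loop
print(w.grad.item())  # 0.0
```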
4. Homework
Computational graph and code implementation for the model $\hat{y} = \omega_1 x^2 + \omega_2 x + b$:
```python
import torch

# prepare the training set
x_data = [1.0, 2.0, 3.0]
y_data = [2.0, 4.0, 6.0]

w1 = torch.Tensor([1.0])
w1.requires_grad = True
w2 = torch.Tensor([1.0])
w2.requires_grad = True
b = torch.Tensor([1.0])
b.requires_grad = True

# model: y-hat = w1 * x^2 + w2 * x + b
def forward(x):
    return w1 * x ** 2 + w2 * x + b

# calculate loss
def loss(x, y):
    y_pred = forward(x)
    return (y_pred - y) ** 2

print('Predict (before training)', 4, forward(4).item())
for epoch in range(100):
    for x, y in zip(x_data, y_data):
        l = loss(x, y)
        l.backward()
        print('\tgrad:', x, y, w1.grad.item(), w2.grad.item(), b.grad.item())
        w1.data -= 0.01 * w1.grad.data
        w2.data -= 0.01 * w2.grad.data
        b.data -= 0.01 * b.grad.data
        w1.grad.data.zero_()
        w2.grad.data.zero_()
        b.grad.data.zero_()
    print('progress:', epoch, l.item())
print('Predict (after training)', 4, forward(4).item())
```
Output:

```
grad: 1.0 2.0 2.0 2.0 2.0
grad: 2.0 4.0 22.880001068115234 11.440000534057617 5.720000267028809
grad: 3.0 6.0 77.04720306396484 25.682401657104492 8.560800552368164
progress: 0 18.321826934814453
grad: 1.0 2.0 -1.1466078758239746 -1.1466078758239746 -1.1466078758239746
grad: 2.0 4.0 -15.536651611328125 -7.7683258056640625 -3.8841629028320312
grad: 3.0 6.0 -30.432214736938477 -10.144071578979492 -3.381357192993164
progress: 1 2.858394145965576
grad: 1.0 2.0 0.3451242446899414 0.3451242446899414 0.3451242446899414
grad: 2.0 4.0 2.4273414611816406 1.2136707305908203 0.6068353652954102
grad: 3.0 6.0 19.449920654296875 6.483306884765625 2.161102294921875
……
progress: 98 0.0063416799530386925
grad: 1.0 2.0 0.31661415100097656 0.31661415100097656 0.31661415100097656
grad: 2.0 4.0 -1.7297439575195312 -0.8648719787597656 -0.4324359893798828
grad: 3.0 6.0 1.4307546615600586 0.47691822052001953 0.15897274017333984
progress: 99 0.00631808303296566
Predict (after training) 4 8.544171333312988
```