1. Purpose
Implement backpropagation in PyTorch for the linear model $y = \omega x$.
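2. Implementation
w is a tensor (torch.Tensor). A tensor carries two attributes, data and grad. data is itself a Tensor. grad is None at first and becomes a Tensor once l.backward() has been called, so the update step reads w.grad.data and writes w.data, which keeps the update itself out of the computation graph. If w requires gradients, every tensor computed from w in the graph also requires gradients by default.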
import torch
a = torch.tensor([1.0])
print(type(a))
print(a.data)
print(type(a.data))
print(a.grad)
print(type(a.grad))
<class 'torch.Tensor'>
tensor([1.])
<class 'torch.Tensor'>
None
<class 'NoneType'>
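The statements about grad and gradient tracking can be checked directly. The following is a minimal sketch; the values and the names x, y and l are chosen only for illustration and do not come from the training code in this post.

```python
import torch

# A leaf tensor that tracks gradients (illustrative value).
w = torch.tensor([1.0], requires_grad=True)
x = torch.tensor([2.0])   # plain input, no gradient tracking

y = w * x                 # computed from w, so it joins the graph
l = (y - 4.0) ** 2        # still depends on w

print(x.requires_grad)                   # False
print(y.requires_grad, l.requires_grad)  # True True
print(w.grad)                            # None before backward()

# dl/dw = 2 * x * (w * x - 4) = 2 * 2 * (2 - 4) = -8
l.backward()
print(w.grad)                            # tensor([-8.])
```

Only w was marked with requires_grad, yet y and l report True as well, while the input x stays False.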
Next, we implement backpropagation with PyTorch.
import matplotlib.pyplot as plt
import torch

# Dataset
x_data = [1.0, 2.0, 3.0]
y_data = [2.0, 4.0, 6.0]

# w is a tensor that requires gradient computation
w = torch.tensor([1.0])
w.requires_grad = True

# Forward pass
def forward(x):
    return w * x

# Per-sample loss; evaluating it builds the computation graph
def loss(x, y):
    y_pred = forward(x)
    return (y_pred - y) ** 2

# Training
costs = []
for epoch in range(100):
    for x_val, y_val in zip(x_data, y_data):
        # Forward pass: l is a tensor, and computing it builds the graph
        l = loss(x_val, y_val)
        # Backward pass: compute the gradients of every tensor that requires them
        l.backward()
        print('\tgrad:', x_val, y_val, w.grad.item())
        # Update the weight through .data; note that grad is itself a tensor
        w.data = w.data - 0.01 * w.grad.data
        # Zero the gradient, otherwise the next backward() would add to it
        w.grad.data.zero_()
    # Record the loss of the last sample seen in this epoch
    costs.append(l.item())
    print("epoch=", epoch, "cost=", l.item())

# Prediction
print("x=4,y=", forward(4).item())

# Plot the loss curve
plt.plot(costs)
plt.xlabel("epoch")
plt.ylabel("cost")
plt.show()
grad: 1.0 2.0 -2.0
grad: 2.0 4.0 -7.840000152587891
grad: 3.0 6.0 -16.228801727294922
epoch= 0 cost= 7.315943717956543
grad: 1.0 2.0 -1.478623867034912
grad: 2.0 4.0 -5.796205520629883
grad: 3.0 6.0 -11.998146057128906
epoch= 1 cost= 3.9987640380859375
...
grad: 1.0 2.0 -7.152557373046875e-07
grad: 2.0 4.0 -2.86102294921875e-06
grad: 3.0 6.0 -5.7220458984375e-06
epoch= 98 cost= 9.094947017729282e-13
grad: 1.0 2.0 -7.152557373046875e-07
grad: 2.0 4.0 -2.86102294921875e-06
grad: 3.0 6.0 -5.7220458984375e-06
epoch= 99 cost= 9.094947017729282e-13
x=4,y= 7.999998569488525
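The first gradients in the log can be verified by hand. For the loss $(\omega x - y)^2$ the derivative with respect to $\omega$ is $2x(\omega x - y)$, which gives $2 \cdot 1 \cdot (1 - 2) = -2$ for the first sample. The sketch below compares that formula with autograd for the first two samples. manual_grad is a helper introduced here for illustration, and the update uses torch.no_grad() instead of the .data style used in this post, so treat it as one possible variant rather than the code that produced the log.

```python
import torch

w = torch.tensor([1.0], requires_grad=True)
lr = 0.01  # same learning rate as in the training loop

def manual_grad(x, y, w_value):
    # Derivative of (w * x - y) ** 2 with respect to w.
    return 2 * x * (w_value * x - y)

autograd_grads = []
manual_grads = []
for x_val, y_val in [(1.0, 2.0), (2.0, 4.0)]:
    l = (w * x_val - y_val) ** 2
    l.backward()
    autograd_grads.append(w.grad.item())
    manual_grads.append(manual_grad(x_val, y_val, w.item()))
    with torch.no_grad():   # update without recording it in the graph
        w -= lr * w.grad
    w.grad.zero_()          # clear the gradient before the next sample

print(autograd_grads)
print(manual_grads)
```

Both lists should agree to within float32 rounding and match the first two grad lines of epoch 0, about -2.0 and -7.84.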
Next, we implement backpropagation for the model $y = \omega_1 x^2 + \omega_2 x + b$.
import matplotlib.pyplot as plt
import torch

# Dataset
x_data = [1.0, 2.0, 3.0]
y_data = [2.0, 4.0, 6.0]

# w1, w2 and b are tensors that require gradient computation
w1 = torch.tensor([1.0])
w1.requires_grad = True
w2 = torch.tensor([1.0])
w2.requires_grad = True
b = torch.tensor([0.0])
b.requires_grad = True

# Forward pass
def forward(x):
    return w1 * x ** 2 + w2 * x + b

# Per-sample loss; evaluating it builds the computation graph
def loss(x, y):
    y_pred = forward(x)
    return (y_pred - y) ** 2

# Training
costs = []
for epoch in range(100):
    for x_val, y_val in zip(x_data, y_data):
        # Forward pass: l is a tensor, and computing it builds the graph
        l = loss(x_val, y_val)
        # Backward pass: compute the gradients of every tensor that requires them
        l.backward()
        print('\tgrad:', x_val, y_val, w1.grad.item(), w2.grad.item(), b.grad.item())
        # Update the parameters through .data; note that grad is itself a tensor
        w1.data = w1.data - 0.01 * w1.grad.data
        w2.data = w2.data - 0.01 * w2.grad.data
        b.data = b.data - 0.01 * b.grad.data
        # Zero the gradients, otherwise the next backward() would add to them
        w1.grad.data.zero_()
        w2.grad.data.zero_()
        b.grad.data.zero_()
    # Record the loss of the last sample seen in this epoch
    costs.append(l.item())
    print("epoch=", epoch, "cost=", l.item())

# Prediction
print("x=4,y=", forward(4).item())

# Plot the loss curve
plt.plot(costs)
plt.xlabel("epoch")
plt.ylabel("cost")
plt.show()
grad: 1.0 2.0 0.0 0.0 0.0
grad: 2.0 4.0 16.0 8.0 4.0
grad: 3.0 6.0 77.04000854492188 25.680004119873047 8.560001373291016
epoch= 0 cost= 18.318405151367188
grad: 1.0 2.0 -2.785599946975708 -2.785599946975708 -2.785599946975708
grad: 2.0 4.0 -18.606464385986328 -9.303232192993164 -4.651616096496582
grad: 3.0 6.0 -20.65099334716797 -6.883664131164551 -2.2945547103881836
epoch= 1 cost= 1.3162453174591064
grad: 1.0 2.0 -1.3706533908843994 -1.3706533908843994 -1.3706533908843994
grad: 2.0 4.0 -2.1309146881103516 -1.0654573440551758 -0.5327286720275879
grad: 3.0 6.0 24.26446533203125 8.088154792785645 2.696051597595215
epoch= 2 cost= 1.8171736001968384
...
grad: 1.0 2.0 0.1441793441772461 0.1441793441772461 0.1441793441772461
grad: 2.0 4.0 -1.126626968383789 -0.5633134841918945 -0.28165674209594727
grad: 3.0 6.0 1.0085105895996094 0.3361701965332031 0.11205673217773438
epoch= 98 cost= 0.003139177802950144
grad: 1.0 2.0 0.14582538604736328 0.14582538604736328 0.14582538604736328
grad: 2.0 4.0 -1.1205825805664062 -0.5602912902832031 -0.28014564514160156
grad: 3.0 6.0 1.0001163482666016 0.3333721160888672 0.11112403869628906
epoch= 99 cost= 0.003087138058617711
x=4,y= 8.3240966796875
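Here the prediction at x = 4 (about 8.32) is less accurate than the one from the model $y = \omega x$ (about 8.00), because the dataset is exactly linear. To fit it, the quadratic model has to drive $\omega_1$ and $b$ toward 0, and after 100 epochs it has not done so yet: the gradients in the last epochs of the log are still far from zero.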
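One way to see that the quadratic model can represent this data exactly is to solve for the parameters directly. Three points with distinct x values determine a quadratic uniquely, so the sketch below solves the 3x3 linear system instead of running gradient descent. It is an illustration that assumes float64 and torch.linalg.solve, and is not part of the training code above.

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], dtype=torch.float64)
y = torch.tensor([2.0, 4.0, 6.0], dtype=torch.float64)

# Columns are x^2, x and 1, so that A @ [w1, w2, b] = y.
A = torch.stack([x ** 2, x, torch.ones_like(x)], dim=1)

# Solve the square system; y is passed as a column vector.
params = torch.linalg.solve(A, y.unsqueeze(1)).squeeze(1)

w1, w2, b = params.tolist()
pred = w1 * 4.0 ** 2 + w2 * 4.0 + b
print(w1, w2, b, pred)
```

Up to rounding, the solution is w1 = 0, w2 = 2, b = 0 with a prediction of 8 at x = 4, which suggests that the remaining error after 100 epochs comes from slow convergence rather than from the model form.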