二、Automatic Differentiation: torch.autograd
Concept:
torch.autograd is PyTorch's automatic differentiation engine, used to train neural networks.
Training a neural network happens in two main steps:
1、Forward propagation: the input is run through the model to produce a prediction; the goal of training is to fit the network parameters to the training data.
2、Backward propagation: the model parameters are adjusted according to the gap between the prediction and the correct result (measured by a loss function), usually with gradient descent. This is where autograd is needed.
1、Example
# Usage in PyTorch
# create a random data tensor
import torch, torchvision
model = torchvision.models.resnet18(pretrained=True)
data = torch.rand(1, 3, 64, 64)
labels = torch.rand(1, 1000)
# forward pass
prediction = model(data)
# Autograd then calculates and stores the gradients for each model parameter
# in the parameter’s .grad attribute.
loss = (prediction - labels).sum()
loss.backward() # backward pass
# Next, we load an optimizer
optim = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
# Finally, we call .step() to initiate gradient descent.
# The optimizer adjusts each parameter by its gradient stored in .grad.
optim.step() # gradient descent
Notes:
1、torch.rand(*size): returns a tensor of the given shape filled with random numbers drawn from a uniform distribution on the interval [0, 1).
2、torchvision.models.resnet18(pretrained=False, **kwargs): builds a ResNet-18 model. pretrained (bool) – if True, returns a model pre-trained on ImageNet.
Reference: https://pytorch-cn.readthedocs.io/zh/latest/torchvision/torchvision-models/
3、torch.optim and optimizer.step(): see https://pytorch.org/docs/stable/optim.html. A minimal usage sketch follows this list.
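A minimal sketch of the standard zero_grad / backward / step loop with torch.optim, using a small stand-in nn.Linear model and random data purely for illustration:
import torch
from torch import nn, optim
model = nn.Linear(4, 1)                     # tiny stand-in model for illustration
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
x = torch.rand(8, 4)                        # dummy inputs, uniform on [0, 1)
target = torch.rand(8, 1)                   # dummy targets
optimizer.zero_grad()                       # clear any gradients left in .grad
loss = ((model(x) - target) ** 2).mean()    # forward pass + loss
loss.backward()                             # backward pass fills each parameter's .grad
optimizer.step()                            # update each parameter using its .grad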
2、Differentiation in Autograd
import torch
a = torch.tensor([2., 3.], requires_grad=True)
# requires_grad=True signals to autograd that every operation on these tensors should be tracked
b = torch.tensor([6., 4.], requires_grad=True)
Q = 3*a**3 - b**2
# We need to explicitly pass a gradient argument in Q.backward() because it is a vector.
external_grad = torch.tensor([1., 1.])
Q.backward(gradient=external_grad, retain_graph=True)  # retain_graph=True keeps the graph so backward can be called again below
# Gradients are now deposited in a.grad and b.grad
print(9*a**2 == a.grad)
print(-2*b == b.grad)
print(a.grad)
print(b.grad)
# Equivalently, we can also aggregate Q into a scalar and call backward implicitly,
# like Q.sum().backward(). Note that gradients accumulate in .grad,
# so this second backward doubles a.grad and b.grad.
Q.sum().backward()
print(a.grad)
print(b.grad)
# Output:
tensor([True, True])
tensor([True, True])
tensor([36., 81.])
tensor([-12., -8.])
tensor([ 72., 162.])
tensor([-24., -16.])
3、The chain rule
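A minimal sketch of how autograd applies the chain rule, assuming a composition l = g(f(x)): backward() multiplies the local derivatives, so dl/dx = dl/dy · dy/dx.
import torch
x = torch.tensor([2.], requires_grad=True)
y = x ** 2          # inner function: dy/dx = 2x
l = (3 * y).sum()   # outer function: dl/dy = 3
l.backward()        # chain rule: dl/dx = dl/dy * dy/dx = 3 * 2x
print(x.grad)       # tensor([12.])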
4、PyTorch dynamic computation graphs
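A minimal sketch, assuming the usual eager-mode behaviour: autograd rebuilds the computation graph from scratch on every forward pass, so ordinary Python control flow can decide which operations end up in the graph.
import torch
x = torch.tensor(2., requires_grad=True)
# the graph is built on the fly, so a data-dependent branch is allowed
if x.item() > 0:
    y = 3 * x
else:
    y = x ** 2
y.backward()
print(x.grad)   # tensor(3.) -- only the branch actually executed is in the graph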
5、Cases where automatic differentiation is not needed:
(1) autograd tracks operations on all tensors whose requires_grad attribute is True; setting this attribute to False excludes a tensor from the gradient-computation DAG. When many tensors do not need gradients, this reduces memory consumption.
## 5、Exclusion from the DAG
x = torch.rand(5, 5)
y = torch.rand(5, 5)
z = torch.rand((5,5), requires_grad=True)
a = x + y
print(f"Does 'a' require gradients?: {a.requires_grad}")
b = x + z
print(f"Does 'b' require gradients?: {b.requires_grad}")
# Output:
Does 'a' require gradients?: False
Does 'b' require gradients?: True
(2) If you reuse a pretrained model and only retrain the output layer, you can set requires_grad = False on the other parameters; during backward propagation no gradients are computed for those frozen layers, so their parameters are not updated.
# Frozen parameters: a common use case when finetuning a pretrained network.
# A related tool is torch.no_grad(): a context manager that disables gradient calculation;
# it is thread-local and does not affect computation in other threads.
import torchvision
from torch import nn, optim
model = torchvision.models.resnet18(pretrained=True)
# Freeze all the parameters in the network
for param in model.parameters():
    param.requires_grad = False
# Replace the classifier; the new layer's parameters require gradients by default
model.fc = nn.Linear(512, 10)
# Optimize only the classifier: all parameters are registered with the optimizer,
# but only model.fc computes gradients, so only it is updated
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
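A minimal sketch of torch.no_grad() as a context manager, following the comment above; operations run inside it are not tracked by autograd:
import torch
x = torch.tensor([1.], requires_grad=True)
with torch.no_grad():
    y = x * 2
print(y.requires_grad)   # False -- y was computed without building a graph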