PyTorch Deep Learning 60-Minute Blitz: Notes 2

2. Automatic Differentiation: torch.autograd

Concept:

torch.autograd is PyTorch's automatic differentiation engine, used to train neural networks.

Training a neural network happens in two main steps:

1. Forward propagation:

The model runs the input data through its layers and operations to make its best guess at the correct output on the training data.

2. Backward propagation:

Based on the gap between the prediction and the correct answer (measured by a loss function), the model's parameters are adjusted, typically with gradient descent. This is where automatic differentiation (autograd) is needed.

 

1. Example

 

# Usage in PyTorch

# create a random data tensor
import torch, torchvision
model = torchvision.models.resnet18(pretrained=True)
data = torch.rand(1, 3, 64, 64)
labels = torch.rand(1, 1000)

# forward pass
prediction = model(data)
 
loss = (prediction - labels).sum()
loss.backward() # backward pass
# Autograd then calculates and stores the gradients for each model parameter
# in the parameter’s .grad attribute.

# Next, we load an optimizer
optim = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

# Finally, we call .step() to initiate gradient descent. 
# The optimizer adjusts each parameter by its gradient stored in .grad.
optim.step() # gradient descent

Notes:

 

1. torch.rand()

Returns a tensor of the given shape, filled with random numbers drawn from a uniform distribution on the interval [0, 1).

2. torchvision.models.resnet18(pretrained=False, **kwargs)

Constructs a ResNet-18 model. pretrained (bool): if True, returns a model pre-trained on ImageNet.

Reference: https://pytorch-cn.readthedocs.io/zh/latest/torchvision/torchvision-models/

3. torch.optim and optimizer.step() (a minimal usage sketch follows after these notes)

Reference: https://pytorch.org/docs/stable/optim.html
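To make note 3 concrete, here is a minimal training-loop sketch of my own (the toy nn.Linear model, MSELoss, and random data are illustrative assumptions, not part of the original example):

import torch
from torch import nn

model = nn.Linear(4, 2)                      # toy model, for illustration only
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
loss_fn = nn.MSELoss()

data = torch.rand(8, 4)                      # 8 samples, 4 features each
targets = torch.rand(8, 2)                   # random targets, illustration only

for epoch in range(5):
    optimizer.zero_grad()                    # clear gradients left in .grad
    prediction = model(data)                 # forward pass
    loss = loss_fn(prediction, targets)
    loss.backward()                          # autograd fills .grad for each parameter
    optimizer.step()                         # update each parameter using its .grad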

 

2. Differentiation in Autograd

 

## 2. Differentiation in Autograd
import torch

a = torch.tensor([2., 3.], requires_grad=True)
# requires_grad=True signals to autograd that every operation on these tensors should be tracked
b = torch.tensor([6., 4.], requires_grad=True)

Q = 3*a**3 - b**2

# We need to explicitly pass a gradient argument in Q.backward() because Q is a vector.
external_grad = torch.tensor([1., 1.])
Q.backward(gradient=external_grad, retain_graph=True)  # retain_graph=True keeps the graph so backward can be called again below

# Gradients are now deposited in a.grad and b.grad
print(9*a**2 == a.grad)
print(-2*b == b.grad)
print(a.grad)
print(b.grad)

# Equivalently, we can also aggregate Q into a scalar and call backward implicitly,
# like Q.sum().backward().
# Note: gradients accumulate into .grad, so this second backward pass doubles the stored values.
Q.sum().backward()
print(a.grad)
print(b.grad)

# Output
tensor([True, True])
tensor([True, True])
tensor([36., 81.])
tensor([-12.,  -8.])
tensor([ 72., 162.])
tensor([-24., -16.])
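
For reference, the expected gradients follow directly from Q = 3a^3 - b^2 (a short derivation of my own, matching the checks in the code above):

\frac{\partial Q}{\partial a} = 9a^2, \qquad \frac{\partial Q}{\partial b} = -2b

With a = (2, 3) and b = (6, 4), the first backward pass stores a.grad = (36, 81) and b.grad = (-12, -8); because gradients accumulate, the second backward pass adds the same values again, giving (72, 162) and (-24, -16).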

3. Chain rule
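A brief sketch of the idea (my own summary in standard notation): for a vector-valued function \vec{y} = f(\vec{x}) and a scalar loss l = g(\vec{y}), the chain rule expresses the gradient with respect to \vec{x} as a vector-Jacobian product, which is what backward() computes; the external_grad passed to Q.backward() above plays the role of \partial l / \partial \vec{y}.

\frac{\partial l}{\partial \vec{x}} = J^{\top} \cdot \frac{\partial l}{\partial \vec{y}},
\qquad
J = \begin{pmatrix}
\frac{\partial y_1}{\partial x_1} & \cdots & \frac{\partial y_1}{\partial x_n} \\
\vdots & \ddots & \vdots \\
\frac{\partial y_m}{\partial x_1} & \cdots & \frac{\partial y_m}{\partial x_n}
\end{pmatrix}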

4. PyTorch dynamic computation graph
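A minimal sketch of what "dynamic" means here (my own example, relying only on standard autograd behaviour): each operation on tensors that require gradients records a grad_fn node, and the graph is rebuilt from scratch on every forward pass, so ordinary Python control flow can change its shape between iterations.

import torch

x = torch.tensor([1., 2.], requires_grad=True)

for use_square in (True, False):
    # the DAG is rebuilt on every forward pass, so control flow may change it
    y = (x ** 2).sum() if use_square else (x * 3).sum()
    print(y.grad_fn)     # the backward node recorded for y
    y.backward()
    print(x.grad)        # [2., 4.] on the first pass, [3., 3.] on the second
    x.grad.zero_()       # clear accumulated gradients between iterations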

 

5. Cases where autograd is not needed (exclusion from the DAG):

(1) Autograd tracks operations on all tensors whose requires_grad flag is set to True; setting the requires_grad (bool) attribute to False excludes a tensor from the gradient computation DAG. When many tensors do not need gradients, this saves computation and memory.

## 5. Exclusion from the DAG
import torch

x = torch.rand(5, 5)
y = torch.rand(5, 5)
z = torch.rand((5, 5), requires_grad=True)

a = x + y
print(f"Does 'a' require gradients?: {a.requires_grad}")  # neither input requires grad, so a is excluded
b = x + z
print(f"Does 'b' require gradients?: {b.requires_grad}")  # z requires grad, so b does too


# Output
Does 'a' require gradients?: False
Does 'b' require gradients?: True

(2) When fine-tuning an existing pretrained model, we often only replace and retrain the output layer. Setting requires_grad = False on the remaining parameters freezes them, so backpropagation does not update the other layers.

# Frozen parameters: the usual approach for finetuning a pretrained network.
# (A related tool is the torch.no_grad() context manager, which disables
# gradient calculation; it is thread local and does not affect computation
# in other threads.)
import torchvision
from torch import nn, optim

model = torchvision.models.resnet18(pretrained=True)

# Freeze all the parameters in the network
for param in model.parameters():
    param.requires_grad = False

# Replace the classifier; the parameters of the new layer are unfrozen by default
model.fc = nn.Linear(512, 10)

# Optimize only the classifier: the frozen parameters receive no gradients,
# so optimizer.step() leaves them unchanged
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
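
Since the comments above mention torch.no_grad(), here is a minimal sketch of that context manager (my own illustration, not part of the original notes):

import torch

x = torch.ones(3, requires_grad=True)

with torch.no_grad():
    y = x * 2            # operations inside the block are not tracked
print(y.requires_grad)   # False

z = x * 2                # outside the block, tracking resumes
print(z.requires_grad)   # True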

 

 
