Learning PyTorch with 无神, Day 4: how autograd works, and how to configure and use gradient tracking

How autograd computes gradients

Autograd and the DAG

Conceptually, autograd keeps a record of data (tensors) & all executed operations (along with the resulting new tensors) in a directed acyclic graph (DAG) consisting of Function objects. In this DAG, leaves are the input tensors, roots are the output tensors. By tracing this graph from roots to leaves, you can automatically compute the gradients using the chain rule.

In PyTorch, automatic gradient computation is built on a DAG, a directed acyclic graph. Anyone who has studied data structures will recognize a graph as the structure that expresses many-to-many relationships between elements.

DAGs are dynamic in PyTorch: An important thing to note is that the graph is recreated from scratch; after each .backward() call, autograd starts populating a new graph. This is exactly what allows you to use control flow statements in your model; you can change the shape, size and operations at every iteration if needed.

In PyTorch the DAG is dynamic: it is rebuilt from scratch after every .backward() call, which lets us use control flow inside the model and change the size, shape, and operations at every iteration.
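To make the dynamic behaviour concrete, here is a minimal sketch (TinyNet and its layer sizes are invented for illustration): the operations recorded in the graph differ between iterations, yet backward() succeeds every time because a fresh DAG is traced on each forward pass.

import torch
from torch import nn

# TinyNet is a toy module, used only to illustrate dynamic graphs
class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(4, 4)
        self.fc2 = nn.Linear(4, 1)

    def forward(self, x, step):
        h = torch.relu(self.fc1(x))
        # Control flow: on even steps an extra operation enters the DAG
        if step % 2 == 0:
            h = h * 2
        return self.fc2(h)

net = TinyNet()
for step in range(4):
    out = net(torch.randn(2, 4), step)   # a new graph is built during this forward pass
    out.sum().backward()                 # ...and consumed (then freed) by this backward pass
    net.zero_grad()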

In a forward pass, autograd does two things simultaneously:

  • run the requested operation to compute a resulting tensor, and

  • maintain the operation’s gradient function in the DAG.

The backward pass kicks off when .backward() is called on the DAG root. autograd then:

  • computes the gradients from each .grad_fn,

  • accumulates them in the respective tensor’s .grad attribute, and

  • using the chain rule, propagates all the way to the leaf tensors.

Below is a visual representation of the DAG in our example. In the graph, the arrows are in the direction of the forward pass. The nodes represent the backward functions of each operation in the forward pass. The leaf nodes in blue represent our leaf tensors a and b.

In summary: the blue nodes in the figure are the input tensors a and b, called leaf nodes, and the arrows point in the direction of the forward pass, during which the gradient function of each operation is recorded in the DAG; the node at the bottom is the root, i.e., the output tensor. In the backward pass, the gradients from each node's grad_fn are combined via the chain rule and propagated back to the leaf tensors, where they accumulate in .grad.
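For reference, the example behind that figure can be reproduced with a few lines of code (a sketch; the expression Q = 3*a**3 - b**2 and the values of a and b are the ones used in the official autograd tutorial):

import torch

# Leaf tensors a and b (the blue nodes in the figure)
a = torch.tensor([2., 3.], requires_grad=True)
b = torch.tensor([6., 4.], requires_grad=True)

# Forward pass: autograd records each operation's gradient function in the DAG
Q = 3 * a**3 - b**2
print(Q.grad_fn)                  # the root's backward function, e.g. <SubBackward0 ...>

# Backward pass from the root Q; because Q is a vector, a gradient of the same
# shape must be passed in (it plays the role of the vector v discussed below)
external_grad = torch.tensor([1., 1.])
Q.backward(gradient=external_grad)

# Gradients are accumulated in the leaf tensors' .grad attributes
print(a.grad)                     # equals 9 * a**2
print(b.grad)                     # equals -2 * b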

The J that appears in the backward-pass math is the Jacobian matrix, a derivative operator: informally, it is the matrix obtained by differentiating every component of the output vector Y with respect to every component of the input X = (x1, x2, ..., xn); the formula is written out below.

Autograd then multiplies this Jacobian by a given vector v. This is where the chain rule comes in, the calculus rule for differentiating through an intermediate variable. Here l is a scalar function of y, i.e., an expression in y (typically the loss).

The derivative of l with respect to y is then exactly the vector v that we supply to backward().
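Written out in LaTeX, the Jacobian-vector product described in the last three paragraphs is (this matches the formulation in the official autograd tutorial):

J =
\begin{pmatrix}
\frac{\partial y_1}{\partial x_1} & \cdots & \frac{\partial y_1}{\partial x_n} \\
\vdots & \ddots & \vdots \\
\frac{\partial y_m}{\partial x_1} & \cdots & \frac{\partial y_m}{\partial x_n}
\end{pmatrix},
\qquad
v = \left( \frac{\partial l}{\partial y_1} \;\cdots\; \frac{\partial l}{\partial y_m} \right)^{T},
\qquad
J^{T} \cdot v = \left( \frac{\partial l}{\partial x_1} \;\cdots\; \frac{\partial l}{\partial x_n} \right)^{T}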

This Jacobian-vector product, the gradient of l with respect to x, is exactly the result that autograd computes in the backward pass.

Configuring gradient tracking

How to use it

In a neural network you can iterate over model.parameters() and freeze the model's parameters, just as in transfer learning, where the convolutional layers of a pretrained model are frozen.

from torch import nn, optim
from torchvision.models import resnet18, ResNet18_Weights

# Load a pretrained ResNet-18
model = resnet18(weights=ResNet18_Weights.DEFAULT)

# Freeze all the parameters in the network
for param in model.parameters():
    param.requires_grad = False

Then the last fully connected layer is replaced to act as the classifier:

Let’s say we want to finetune the model on a new dataset with 10 labels. In resnet, the classifier is the last linear layer model.fc. We can simply replace it with a new linear layer (unfrozen by default) that acts as our classifier.

model.fc = nn.Linear(512, 10)
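A quick sanity check (a sketch, not from the original post; the SGD line is just one reasonable way to set up finetuning on top of the frozen backbone):

# Only the newly created classifier still requires gradients
print(model.fc.weight.requires_grad)      # True  (new, unfrozen layer)
print(model.conv1.weight.requires_grad)   # False (frozen backbone)

# During finetuning only model.fc's weight and bias are updated
optimizer = optim.SGD(model.fc.parameters(), lr=1e-2, momentum=0.9)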

Rules of use

torch.autograd tracks operations on all tensors which have their requires_grad flag set to True. For tensors that don’t require gradients, setting this attribute to False excludes it from the gradient computation DAG.

The official documentation states the rule above: if a tensor's requires_grad flag is set to True, the tensor is added to the DAG and gradients are computed for it; setting the flag to False excludes it from the gradient computation.

The output tensor of an operation will require gradients even if only a single input tensor has requires_grad=True.

The following test illustrates this rule:

import torch

x = torch.rand(5, 5)
y = torch.rand(5, 5)
z = torch.rand((5, 5), requires_grad=True)

a = x + y
print(f"Does `a` require gradients? : {a.requires_grad}")
b = x + z
print(f"Does `b` require gradients?: {b.requires_grad}")

Output:

Does `a` require gradients? : False
Does `b` require gradients?: True
