Learning PyTorch with 无神, Day 4: how autograd works, and how to configure and use gradient tracking

How autograd computes gradients

Autograd and the DAG

Conceptually, autograd keeps a record of data (tensors) & all executed operations (along with the resulting new tensors) in a directed acyclic graph (DAG) consisting of Function objects. In this DAG, leaves are the input tensors, roots are the output tensors. By tracing this graph from roots to leaves, you can automatically compute the gradients using the chain rule.

In PyTorch, automatic gradient computation is built on a DAG, a directed acyclic graph. Anyone who has studied data structures will recognize a graph as the structure that expresses many-to-many relationships between elements.

DAGs are dynamic in PyTorch: An important thing to note is that the graph is recreated from scratch; after each .backward() call, autograd starts populating a new graph. This is exactly what allows you to use control flow statements in your model; you can change the shape, size and operations at every iteration if needed.

In PyTorch the DAG is dynamic: it is rebuilt from scratch after every .backward() call, which lets us use control flow inside the model and change the size, shape, and operations at every iteration.
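To make the dynamic behaviour concrete, here is a minimal sketch (TinyNet and its layer sizes are invented for illustration): the operations recorded in the graph differ between iterations, yet backward() succeeds every time because a fresh DAG is traced on each forward pass.

import torch
from torch import nn

# TinyNet is a toy module, used only to illustrate dynamic graphs
class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(4, 4)
        self.fc2 = nn.Linear(4, 1)

    def forward(self, x, step):
        h = torch.relu(self.fc1(x))
        # Control flow: on even steps an extra operation enters the DAG
        if step % 2 == 0:
            h = h * 2
        return self.fc2(h)

net = TinyNet()
for step in range(4):
    out = net(torch.randn(2, 4), step)   # a new graph is built during this forward pass
    out.sum().backward()                 # ...and consumed (then freed) by this backward pass
    net.zero_grad()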

In a forward pass, autograd does two things simultaneously:

  • run the requested operation to compute a resulting tensor, and

  • maintain the operation’s gradient function in the DAG.

The backward pass kicks off when .backward() is called on the DAG root. autograd then:

  • computes the gradients from each .grad_fn,

  • accumulates them in the respective tensor’s .grad attribute, and

  • using the chain rule, propagates all the way to the leaf tensors.

Below is a visual representation of the DAG in our example. In the graph, the arrows are in the direction of the forward pass. The nodes represent the backward functions of each operation in the forward pass. The leaf nodes in blue represent our leaf tensors a and b.

In summary: the blue nodes in the figure are the input tensors a and b, called leaf nodes, and the arrows point in the direction of the forward pass, during which the gradient function of each operation is recorded in the DAG; the node at the bottom is the root, i.e., the output tensor. In the backward pass, the gradients from each node's grad_fn are combined via the chain rule and propagated back to the leaf tensors, where they accumulate in .grad.
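For reference, the example behind that figure can be reproduced with a few lines of code (a sketch; the expression Q = 3*a**3 - b**2 and the values of a and b are the ones used in the official autograd tutorial):

import torch

# Leaf tensors a and b (the blue nodes in the figure)
a = torch.tensor([2., 3.], requires_grad=True)
b = torch.tensor([6., 4.], requires_grad=True)

# Forward pass: autograd records each operation's gradient function in the DAG
Q = 3 * a**3 - b**2
print(Q.grad_fn)                  # the root's backward function, e.g. <SubBackward0 ...>

# Backward pass from the root Q; because Q is a vector, a gradient of the same
# shape must be passed in (it plays the role of the vector v discussed below)
external_grad = torch.tensor([1., 1.])
Q.backward(gradient=external_grad)

# Gradients are accumulated in the leaf tensors' .grad attributes
print(a.grad)                     # equals 9 * a**2
print(b.grad)                     # equals -2 * b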

The J that appears in the backward-pass math is the Jacobian matrix, a derivative operator: informally, it is the matrix obtained by differentiating every component of the output vector Y with respect to every component of the input X = (x1, x2, ..., xn); the formula is written out below.

Autograd then multiplies this Jacobian by a given vector v. This is where the chain rule comes in, the calculus rule for differentiating through an intermediate variable. Here l is a scalar function of y, i.e., an expression in y (typically the loss).

The derivative of l with respect to y is then exactly the vector v that we supply to backward().
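Written out in LaTeX, the Jacobian-vector product described in the last three paragraphs is (this matches the formulation in the official autograd tutorial):

J =
\begin{pmatrix}
\frac{\partial y_1}{\partial x_1} & \cdots & \frac{\partial y_1}{\partial x_n} \\
\vdots & \ddots & \vdots \\
\frac{\partial y_m}{\partial x_1} & \cdots & \frac{\partial y_m}{\partial x_n}
\end{pmatrix},
\qquad
v = \left( \frac{\partial l}{\partial y_1} \;\cdots\; \frac{\partial l}{\partial y_m} \right)^{T},
\qquad
J^{T} \cdot v = \left( \frac{\partial l}{\partial x_1} \;\cdots\; \frac{\partial l}{\partial x_n} \right)^{T}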

This Jacobian-vector product, the gradient of l with respect to x, is exactly the result that autograd computes in the backward pass.

Configuring gradient tracking

How to use it

In a neural network you can iterate over model.parameters() and freeze the model's parameters, just as in transfer learning, where the convolutional layers of a pretrained model are frozen.

from torch import nn, optim
from torchvision.models import resnet18, ResNet18_Weights

# Load a pretrained ResNet-18
model = resnet18(weights=ResNet18_Weights.DEFAULT)

# Freeze all the parameters in the network
for param in model.parameters():
    param.requires_grad = False

Then the last fully connected layer is replaced to act as the classifier:

Let’s say we want to finetune the model on a new dataset with 10 labels. In resnet, the classifier is the last linear layer model.fc. We can simply replace it with a new linear layer (unfrozen by default) that acts as our classifier.

model.fc = nn.Linear(512, 10)
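A quick sanity check (a sketch, not from the original post; the SGD line is just one reasonable way to set up finetuning on top of the frozen backbone):

# Only the newly created classifier still requires gradients
print(model.fc.weight.requires_grad)      # True  (new, unfrozen layer)
print(model.conv1.weight.requires_grad)   # False (frozen backbone)

# During finetuning only model.fc's weight and bias are updated
optimizer = optim.SGD(model.fc.parameters(), lr=1e-2, momentum=0.9)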

Rules of use

torch.autograd tracks operations on all tensors which have their requires_grad flag set to True. For tensors that don’t require gradients, setting this attribute to False excludes it from the gradient computation DAG.

The official documentation states the rule above: if a tensor's requires_grad flag is set to True, the tensor is added to the DAG and gradients are computed for it; setting the flag to False excludes it from the gradient computation.

The output tensor of an operation will require gradients even if only a single input tensor has requires_grad=True.

The following test illustrates this rule:

import torch

x = torch.rand(5, 5)
y = torch.rand(5, 5)
z = torch.rand((5, 5), requires_grad=True)

a = x + y
print(f"Does `a` require gradients? : {a.requires_grad}")
b = x + z
print(f"Does `b` require gradients?: {b.requires_grad}")

Output:

Does `a` require gradients? : False
Does `b` require gradients?: True
