PyTorch Study Notes (2): gradient

u012436149

Article link: https://blog.csdn.net/u012436149/article/details/54645162

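During backpropagation, PyTorch stores each Variable's gradient on the Variable object itself, so it can be read at any time through Variable.grad. A freshly created Variable has its grad attribute initialized to 0.0 (from version 0.2 onward, printing it shows None instead).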

import torch
from torch.autograd import Variable
w1 = Variable(torch.Tensor([1.0,2.0,3.0]),requires_grad=True)  # requires_grad=True is required if gradients are needed
w2 = Variable(torch.Tensor([1.0,2.0,3.0]),requires_grad=True)
print(w1.grad)  # prints None in version 0.2
print(w2.grad)  # prints None in version 0.2

Variable containing:
 0
 0
 0
[torch.FloatTensor of size 3]

Variable containing:
 0
 0
 0
[torch.FloatTensor of size 3]
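
For comparison, here is a minimal sketch of the same check in a more recent PyTorch. It assumes version 0.4 or later, where Variable and Tensor were merged, so it is not the original code of this post. Before any backward pass has run, grad is simply None.

```python
import torch

# Assumption: PyTorch >= 0.4, where a tensor carries requires_grad directly
# and no Variable wrapper is needed.
w1 = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
w2 = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

# No backward pass has run yet, so no gradient buffer has been allocated.
print(w1.grad)
print(w2.grad)
```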

The two snippets below show that when d.backward() computes a Variable's gradient, Variable.grad is accumulated, that is: Variable.grad = Variable.grad + new_grad

d = torch.mean(w1)
d.backward()
w1.grad

Variable containing:
 0.3333
 0.3333
 0.3333
[torch.FloatTensor of size 3]

d.backward()
w1.grad

Variable containing:
 0.6667
 0.6667
 0.6667
[torch.FloatTensor of size 3]
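
The accumulation can also be checked numerically. The sketch below assumes PyTorch 0.4 or later and rebuilds the forward computation before each backward call, since newer versions may refuse to backpropagate through the same graph twice. The names first and second are only for illustration.

```python
import torch

w = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

# First pass: d(mean(w))/dw is 1/3 for every element.
torch.mean(w).backward()
first = w.grad.clone()

# Second pass on a freshly built graph: the new 1/3 is added to the stored 1/3.
torch.mean(w).backward()
second = w.grad.clone()

print(first)
print(second)
```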

Since the gradient accumulates, how do we reset it to zero?

w1.grad.data.zero_()
w1.grad

Variable containing:
 0
 0
 0
[torch.FloatTensor of size 3]
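
In recent versions the .data indirection is usually not needed for this. A minimal sketch, assuming PyTorch 0.4 or later, where w.grad is an ordinary tensor that can be zeroed in place:

```python
import torch

w = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
torch.mean(w).backward()

# In-place reset of the stored gradient; w itself is untouched.
w.grad.zero_()
print(w.grad)

# After zeroing, the next backward pass accumulates from zero again.
torch.mean(w).backward()
print(w.grad)
```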

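The method above resets grad to zero. The printed output also shows that w1.grad is itself a Variable, which makes the relationship between Variable and Tensor clearer. The previous post said that a Variable is a wrapper around a Tensor, but what kind of wrapper? From what we have seen so far, it holds two things: a Tensor that stores the weights and a Variable that stores the grad. Operations on a Variable are really operations on the Tensor inside it.
All computation in PyTorch is based on Tensors. A Variable is only a wrapper, and by default it stands for the Tensor (the weights) stored inside it. With that understood, we can manipulate grad freely.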

# How to update the weights once the gradient is available
learning_rate = 0.1
# w1.data -= learning_rate * w1.grad.data is equivalent to the line below
w1.data.sub_(learning_rate*w1.grad.data)  # w1.data is the Tensor that stores the weights

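Why update through the Tensor here instead of using the Variable directly?
Variables are mainly used in the feedforward pass, because the forward pass has to remember how the Tensors are connected in order to backpropagate correctly. A plain Tensor does not record that path. Also, operating on the Variable itself would presumably create a cycle in the graph (this is a guess).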
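
In PyTorch 0.4 and later, the same idea (update the weights without recording the update in the graph) is commonly written with torch.no_grad(). The sketch below is one way to do it under that assumption, not the method used in this post.

```python
import torch

w = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
torch.mean(w).backward()  # w.grad is now 1/3 for every element

learning_rate = 0.1

# Inside no_grad the in-place update is not tracked by autograd,
# so w stays a leaf tensor and no extra graph is built.
with torch.no_grad():
    w.sub_(learning_rate * w.grad)

# Clear the gradient so the next backward pass does not add to it.
w.grad.zero_()
print(w)
```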

torch.optim

Writing w1.data.sub_(learning_rate*w1.grad.data) for every parameter would be tedious. Fortunately, PyTorch provides the torch.optim package, which simplifies parameter updates.

import torch.optim as optim
# create your optimizer
optimizer = optim.SGD(net.parameters(), lr = 0.01)

# in your training loop:
for i in range(steps):
  optimizer.zero_grad() # zero the gradient buffers; this step is required
  output = net(input)
  loss = criterion(output, target)
  loss.backward()
  optimizer.step() # Does the update

Note: torch.optim only updates parameters; it does not take care of computing the gradients.
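
The loop above leaves net, criterion, input, target and steps undefined. A self-contained sketch with hypothetical stand-ins (a single nn.Linear layer, MSE loss and random data, all chosen only for illustration) might look like this:

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

# Hypothetical stand-ins for the undefined names in the loop above.
net = nn.Linear(3, 1)
criterion = nn.MSELoss()
inputs = torch.randn(8, 3)
target = torch.randn(8, 1)

optimizer = optim.SGD(net.parameters(), lr=0.01)

losses = []
for step in range(20):
    optimizer.zero_grad()             # clear gradients left from the previous step
    output = net(inputs)
    loss = criterion(output, target)
    loss.backward()                   # autograd computes the gradients
    optimizer.step()                  # the optimizer only applies the update
    losses.append(loss.item())

print(losses[0], losses[-1])
```

The split of responsibilities is visible in the loop: loss.backward() fills in the gradients, and optimizer.step() only reads them.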

About backward()

backward(gradient=None, retain_variables=False)
Parameters:
gradient (Tensor) – Gradient of the differentiated function w.r.t. the data. Required only if the data has more than one element

z.backward(gradient=grads)

How should the code above be interpreted?

$\frac{\partial obj}{\partial z}\frac{\partial z}{\partial w}=grads*\frac{\partial z}{\partial w}$
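
In other words, grads plays the role of the upstream gradient of some objective with respect to z. A small sketch, assuming PyTorch 0.4 or later and using arbitrary illustration values for grads:

```python
import torch

w = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
z = w * w  # z is a vector, so backward() needs an explicit gradient argument

# grads stands in for d(obj)/dz; the values are arbitrary illustration data.
grads = torch.tensor([0.1, 1.0, 10.0])
z.backward(gradient=grads)

# dz_i/dw_i = 2 * w_i, so w.grad = grads * 2 * w
print(w.grad)
```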
As for retain_variables:

import torch
from torch.autograd import Variable
w1 = Variable(torch.Tensor([1.0,2.0,3.0]),requires_grad=True)  # requires_grad=True is required if gradients are needed
w2 = Variable(torch.Tensor([1.0,2.0,3.0]),requires_grad=True)

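z = w1*w2+w1 # the second backward pass fails at this node; it is unclear what exactly gets freed after the first pass
res = torch.mean(z)
res.backward() # the first backward pass works
res.backward() # the second backward pass raises an error, unless retain_variables=True was passed to the first call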
# Trying to backward through the graph second time, but the buffers have already been 
#freed. Please specify retain_variables=True when calling backward for the first time

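This also shows that backward is the point at which some resources are released. If you do not need backward at all, remember to set the network to eval. Note that eval() by itself only switches layers such as dropout and batch normalization to inference behavior; to avoid building the graph at all, versions of that era used volatile=True on the input Variable, and later versions use torch.no_grad().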
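
In later PyTorch versions the retain_variables argument was renamed retain_graph. The following sketch assumes such a version and shows both the working case and the failing one; the names grad_after_two and failed are only for illustration.

```python
import torch

w1 = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
w2 = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

z = w1 * w2 + w1
res = torch.mean(z)

# Keep the saved buffers alive so the same graph can be traversed again.
res.backward(retain_graph=True)
# This second call works, and it frees the buffers because retain_graph is not set.
res.backward()
grad_after_two = w1.grad.clone()  # two accumulated passes of (w2 + 1) / 3

# A third call has nothing left to traverse and raises a RuntimeError.
failed = False
try:
    res.backward()
except RuntimeError:
    failed = True

print(grad_after_two, failed)
```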

Miscellaneous

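Here we test what the gradient looks like when the loss is computed from only part of a Variable and then differentiated with respect to the whole Variable.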

import torch
import torch.cuda as cuda
from torch.autograd import Variable
w1 = Variable(cuda.FloatTensor(2,3), requires_grad=True)
res = torch.mean(w1[1])  # uses only the second row of the Variable
res.backward()
print(w1.grad)

Variable containing:
 0.0000  0.0000  0.0000
 0.3333  0.3333  0.3333
[torch.cuda.FloatTensor of size 2x3 (GPU 0)]

The result matches intuition: rows that do not contribute to the loss get a zero gradient.
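
The same experiment can be run without a GPU. This is a sketch assuming PyTorch 0.4 or later, with a zero-initialized CPU tensor in place of the uninitialized cuda.FloatTensor:

```python
import torch

w = torch.zeros(2, 3, requires_grad=True)

# Only the second row takes part in the loss.
res = torch.mean(w[1])
res.backward()

# The unused first row gets zeros; each entry of the second row gets 1/3.
print(w.grad)
```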
