PyTorch study notes (2): gradient

算法学习者

gradient

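During backpropagation, PyTorch stores each Variable's gradient on the Variable object itself, so we can read it at any time through Variable.grad. In the (old) PyTorch version used in this post, the grad attribute of a freshly created Variable is initialized to 0.0.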

import torch
from torch.autograd import Variable
w1 = Variable(torch.Tensor([1.0,2.0,3.0]),requires_grad=True)  # requires_grad=True is required if we want gradients for w1
w2 = Variable(torch.Tensor([1.0,2.0,3.0]),requires_grad=True)
print(w1.grad)
print(w2.grad)

Variable containing:
 0
 0
 0
[torch.FloatTensor of size 3]

Variable containing:
 0
 0
 0
[torch.FloatTensor of size 3]
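The zero-filled grad above reflects the old Variable API this post was written against. In PyTorch 0.4 and later, Variable was merged into Tensor, and .grad starts out as None; it is only allocated by the first backward(). A minimal sketch with the current API (torch.tensor(..., requires_grad=True) replaces the Variable wrapper):

```python
import torch

# Modern replacement for Variable(torch.Tensor([...]), requires_grad=True)
w1 = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

# No backward pass has run yet, so no gradient buffer exists
print(w1.grad)  # None

# After one backward pass, .grad holds a plain tensor with the same shape as w1
torch.mean(w1).backward()
print(w1.grad)
```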

The next two snippets show that when d.backward() computes the gradient of a Variable, Variable.grad accumulates, i.e. Variable.grad = Variable.grad + new_grad.

d = torch.mean(w1)
d.backward()
w1.grad

Variable containing:
 0.3333
 0.3333
 0.3333
[torch.FloatTensor of size 3]

d.backward()
w1.grad

Variable containing:
 0.6667
 0.6667
 0.6667
[torch.FloatTensor of size 3]
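The same accumulation can be reproduced with the current Tensor API. This sketch passes retain_graph=True on the first call so that backpropagating through the same d a second time is always allowed; the values 1/3 and 2/3 follow from d = mean(w1) having three elements:

```python
import torch

w1 = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

# d = mean(w1), so dd/dw1_i = 1/3 for every element
d = torch.mean(w1)
d.backward(retain_graph=True)  # keep the graph so we can backprop through d again
first = w1.grad.clone()        # snapshot after one pass: [1/3, 1/3, 1/3]

d.backward()                   # second pass adds another 1/3 to each entry
second = w1.grad.clone()       # accumulated: [2/3, 2/3, 2/3]

print(first)
print(second)
```

In practice the accumulation usually happens across separate forward passes (one per mini-batch), which is why the buffer has to be cleared between steps.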

Since gradients accumulate, how do we reset them to zero?

w1.grad.data.zero_()
w1.grad

Variable containing:
 0
 0
 0
[torch.FloatTensor of size 3]
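With the current Tensor API, .grad is an ordinary tensor, so it can be zeroed directly without going through .data. Another option is to drop the buffer by assigning None, in which case the next backward() allocates a fresh one. A minimal sketch of both:

```python
import torch

w1 = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
torch.mean(w1).backward()

# Option 1: zero the existing buffer in place (counterpart of w1.grad.data.zero_())
w1.grad.zero_()
zeroed = w1.grad.clone()

# Option 2: drop the buffer entirely; the next backward() starts from scratch
w1.grad = None
torch.mean(w1).backward()
fresh = w1.grad.clone()

print(zeroed, fresh)
```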

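The method above sets grad to zero. The printed output also shows that w1.grad is itself a Variable, whose .data is the Tensor holding the gradient values. This makes the relationship between Variable and Tensor clearer. As mentioned in the previous post, a Variable is a wrapper around a Tensor, but what kind of wrapper? From what we know so far, it holds a Tensor that stores the weights (.data) and a Variable that stores the gradient (.grad). Operations on a Variable are really operations on the Tensor inside it.
All computation in PyTorch is based on Tensors. A Variable is only a wrapper: computing with a Variable essentially means computing with the Tensor it contains, while the Variable additionally records the operations needed for backpropagation. By default, a Variable stands for the Tensor it stores (the weights). With this understanding, we can manipulate grad however we like.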
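In current PyTorch the wrapper and the raw values live in the same Tensor object, but the split described above is still visible. The sketch below (with made-up values) shows that a user-created tensor is a leaf, that operations on it record a backward node, and that .detach() (the recommended replacement for .data) returns a tensor sharing the same memory but not tracked by autograd:

```python
import torch

w = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

# A tensor created directly by the user is a leaf with no backward function
print(w.is_leaf, w.grad_fn)  # True None

# Operations on a tracked tensor are recorded in the autograd graph
y = w * 2
print(y.grad_fn is not None)  # True

# .detach() returns a tensor sharing w's storage, but autograd ignores it
raw = w.detach()
print(raw.requires_grad)  # False
print(raw.data_ptr() == w.data_ptr())  # True: same underlying memory
```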

# Once we have the gradient, how do we update the weights?
learning_rate = 0.1
# w1.data -= learning_rate * w1.grad.data  is equivalent to the line below
w1.data.sub_(learning_rate*w1.grad.data)  # w1.data is the Tensor that stores the weights
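Repeating this update in a loop gives plain gradient descent. The sketch below uses a hypothetical toy objective f(w) = sum(w**2), whose gradient is 2w, and the current idiom of updating inside torch.no_grad() instead of going through .data. Each step multiplies w by 1 - 0.1 * 2 = 0.8, so w shrinks toward 0:

```python
import torch

# Hypothetical toy objective: f(w) = sum(w**2), minimized at w = 0
w = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
learning_rate = 0.1

for step in range(50):
    loss = (w ** 2).sum()
    loss.backward()                    # w.grad = 2 * w
    with torch.no_grad():              # keep the update out of the autograd graph
        w -= learning_rate * w.grad    # same rule as w1.data.sub_(lr * grad)
    w.grad.zero_()                     # otherwise the next step would accumulate

print(w)
```

Zeroing after each update is the manual counterpart of the accumulation behavior shown earlier.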

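Why do we update through the Tensor here instead of using the Variable directly?
Variables are mainly meant for the forward pass: the forward pass has to remember how the Tensors are connected so that backpropagation can be carried out correctly, whereas a Tensor records no such history. Also, updating the weights through Variable operations might create a cyclic graph (this last point is my guess).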
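Whatever the deeper reason, current PyTorch makes the problem concrete: an in-place update of a leaf that requires grad, done outside torch.no_grad(), raises a RuntimeError, and an out-of-place update turns the new weights into a graph node instead of a leaf. A small sketch with made-up values:

```python
import torch

w = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
(w ** 2).sum().backward()

# Updating the leaf in place while autograd is tracking it is rejected
try:
    w -= 0.1 * w.grad
    raised = False
except RuntimeError as err:
    raised = True
    print("RuntimeError:", err)

# An out-of-place update runs, but the result is a non-leaf node of the graph,
# so it would no longer collect .grad like a parameter does
w_new = w - 0.1 * w.grad
print(w_new.is_leaf, w_new.grad_fn is not None)  # False True
```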

torch.optim

Writing w1.data.sub_(learning_rate*w1.grad.data) for every single parameter would be tedious. Fortunately, PyTorch provides the torch.optim package, which simplifies parameter updates.

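import torch
import torch.nn as nn
import torch.optim as optim

# hypothetical model and data, only so that the loop below can run
net = nn.Linear(3, 1)
criterion = nn.MSELoss()
input = torch.randn(8, 3)
target = torch.randn(8, 1)
steps = 100

# create your optimizer
optimizer = optim.SGD(net.parameters(), lr = 0.01)

# in your training loop:
for i in range(steps):
  optimizer.zero_grad() # zero the gradient buffers; this is required because grads accumulate
  output = net(input)
  loss = criterion(output, target)
  loss.backward()
  optimizer.step() # Does the update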

Note: torch.optim only updates parameters; it does not care how the gradients were computed.
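One way to see this: the optimizer only reads whatever is stored in .grad. The sketch below fills .grad by hand with made-up values and calls optimizer.step() without any backward(). It assumes plain SGD with the default momentum=0 and weight_decay=0, whose update is w ← w - lr * w.grad:

```python
import torch
import torch.optim as optim

w = torch.nn.Parameter(torch.tensor([1.0, 2.0, 3.0]))
optimizer = optim.SGD([w], lr=0.1)

# No backward() at all: write a (made-up) gradient into .grad directly
w.grad = torch.tensor([1.0, 1.0, 1.0])
optimizer.step()  # plain SGD: w <- w - lr * w.grad

print(w.detach())
```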

About backward()

backward(gradient=None, retain_variables=False)
(In later PyTorch releases, retain_variables was renamed retain_graph.)
Parameters:
gradient (Tensor): Gradient of the differentiated function w.r.t. the data. Required only if the data has more than one element.
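"Required only if the data has more than one element" can be checked directly. In this sketch with made-up values (current Tensor API), calling backward() on a 3-element output without a gradient argument raises a RuntimeError, while for a scalar output the gradient defaults to 1.0:

```python
import torch

w = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
z = w * 2  # z has 3 elements, so it is not a scalar

try:
    z.backward()  # no gradient argument for a non-scalar output
    raised = False
except RuntimeError as err:
    raised = True
    print("RuntimeError:", err)  # complains that grad can only be implicitly created for scalars

# For a scalar output, gradient=None is treated as 1.0
z.sum().backward()
print(w.grad)  # d(sum(2w))/dw = 2 for every element
```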

z.backward(gradient=grads)

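How should we interpret the code above? Think of grads as the gradient of some downstream objective obj with respect to z, i.e. ∂obj/∂z. backward() then applies the chain rule to get the gradient with respect to w: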

$$\frac{\partial\, \mathrm{obj}}{\partial w} = \frac{\partial\, \mathrm{obj}}{\partial z}\,\frac{\partial z}{\partial w} = \mathrm{grads} * \frac{\partial z}{\partial w}$$
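A quick numerical check of this reading, using the current Tensor API and a hypothetical z = w**2 (so ∂z_i/∂w_i = 2w_i) with made-up grads: passing gradient=grads gives the same result as explicitly building the scalar obj = sum(grads * z) and calling backward() on it:

```python
import torch

w = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
grads = torch.tensor([0.1, 1.0, 10.0])  # hypothetical d(obj)/dz

# Route 1: hand grads to backward() directly
z = w ** 2                              # dz_i/dw_i = 2 * w_i
z.backward(gradient=grads)
via_gradient_arg = w.grad.clone()       # grads * 2w = [0.2, 4.0, 60.0]

# Route 2: build the scalar objective obj = sum(grads * z) explicitly
w.grad = None
z2 = w ** 2
(grads * z2).sum().backward()
via_scalar_obj = w.grad.clone()

print(via_gradient_arg, via_scalar_obj)
```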
