PyTorch: requires_grad vs. detach, gradient propagation details, cpu/gpu/Variable and numpy conversion

In the old (pre-0.4) PyTorch API, variables fall into three broad categories: cpu tensors, gpu tensors, and Variables. They correspond to data taking part in computation on the CPU, data taking part in computation on the GPU, and data that has been added to the gradient computation graph. Converting between the three is straightforward:

cpu to gpu: t.cuda()
gpu to cpu: t.cpu()
cpu/gpu tensor to Variable: Variable(t)
Variable to cpu/gpu tensor: v.data
tensor to numpy: t.numpy()
numpy to tensor: torch.from_numpy()
Note that y = Variable(t.cuda()) creates a single graph node y, while y = Variable(t).cuda() creates two graph nodes, t and y.
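Putting these conversions together, here is a minimal sketch; it assumes the old pre-0.4 Variable API used throughout this post and a CUDA device being available:

import torch
from torch.autograd import Variable

t = torch.Tensor([[1, 2, 3], [4, 5, 6]])   # CPU tensor

t_gpu = t.cuda()            # CPU -> GPU (needs a CUDA device)
t_cpu = t_gpu.cpu()         # GPU -> CPU

v = Variable(t)             # wrap a cpu/gpu tensor into a Variable
t_back = v.data             # Variable -> underlying tensor

a = t.numpy()               # CPU tensor -> numpy array (shares memory)
t2 = torch.from_numpy(a)    # numpy array -> tensor (shares memory)

y1 = Variable(t.cuda())     # one graph node: y1
y2 = Variable(t).cuda()     # two graph nodes: the leaf Variable and its .cuda() copy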
Converting [1] to 1, i.e. a single-element tensor to a scalar with the type unchanged (single element tensor to scalar): index element 0.
a = torch.Tensor([1])
a[0]   # 1.0
# detach_() turns a node in the computation graph into a leaf node, i.e. sets its .grad_fn to None, so the node before the detach_() is no longer connected to the current variable
>>> import torch
>>> from torch.autograd import Variable
>>> 
>>> x = Variable(torch.Tensor([[1,2,3],[4,5,6]]), requires_grad=True)
>>> y = Variable(torch.Tensor([[1,2,3],[4,5,6]]), requires_grad=True)
>>> m = 1*x
>>> m.detach_()
>>> n = y.pow(3)
>>> z = m.pow(2)+3*n.pow(2)
>>> z.backward(torch.ones(2,3))
>>> print(x.grad)  # no later variable is connected to x; x and m are disconnected
None
>>> print(y.grad)
Variable containing:
 1.8000e+01  5.7600e+02  4.3740e+03
 1.8432e+04  5.6250e+04  1.3997e+05
[torch.FloatTensor of size 2x3]

Because PyTorch builds the graph dynamically, where you call detach matters: the same call has different effects depending on its position.
import torch
from torch.autograd import Variable
a = Variable(torch.randn(2, 2), requires_grad=True)
b = a * 2
c = b * 2
b.detach_()        # detach after c was built: the a -> b -> c path is already in the graph
c.sum().backward()
print(a.grad, b.grad, c.grad)   # a.grad is all 4s; b and c are non-leaf, so their grad is None

Variable containing:
 4  4
 4  4
[torch.FloatTensor of size 2x2]
 None None


import torch
from torch.autograd import Variable
a = Variable(torch.randn(2, 2), requires_grad=True)
b = a * 2
b.detach_()        # detach before c is built: c no longer requires grad
c = b * 2
c.sum().backward()
print(a.grad, b.grad, c.grad)
# Raises: element 0 of variables does not require grad and does not have a grad_fn


import torch
from torch.autograd import Variable
a = Variable(torch.randn(2, 2), requires_grad=True)
b = a * 2
d = a * 3
temp = b.detach()      # temp shares data with b but is cut off from the graph
c = temp * 2 + d       # gradients reach a only through d
c.sum().backward()
print(a.grad, b.grad, c.grad, d.grad)

Variable containing:
 3  3
 3  3
[torch.FloatTensor of size 2x2]
 None None None
Note that a Variable separated with detach_() (or detach()) still points to the same underlying tensor, so modifying that tensor in place also affects the original:
import torch
from torch.autograd import Variable
t1 = torch.FloatTensor([1., 2.])
v1 = Variable(t1)
t2 = torch.FloatTensor([2., 3.])
v2 = Variable(t2)
v3 = v1 + v2
v3_detached = v3.detach()
v3_detached.data.add_(t1) # modifies the values of the tensor inside v3_detached
print(v3, v3_detached)    # the values of the tensor inside v3 change as well
# If you assign to tensor elements directly through indexing, those elements no longer take part in the gradient computation
>>> import torch
>>> from torch.autograd import Variable
>>> x = Variable(torch.Tensor([[1,2,3],[4,5,6]]), requires_grad=True)
>>> y = Variable(torch.Tensor([[1,2,3],[4,5,6]]), requires_grad=True)
>>> m = 1*x
>>> m[(m>4).detach()] = 0
>>> print(m)   # the values 5 and 6 in m have been overwritten with 0
Variable containing:
 1  2  3
 4  0  0
[torch.FloatTensor of size 2x3]

>>> n = y.pow(3)
>>> z = m.pow(2)+3*n.pow(2)
>>> z.backward(torch.ones(2,3))
>>> print(x.grad)  # x.grad no longer contains gradients at the positions that held 5 and 6
Variable containing:
 2  4  6
 8  0  0
[torch.FloatTensor of size 2x3]
# requires_grad=False controls whether gradients are computed for a leaf variable
>>> import torch
>>> from torch.autograd import Variable
>>> 
>>> x = Variable(torch.Tensor([[1,2,3],[4,5,6]]), requires_grad=False)
>>> y = Variable(torch.Tensor([[1,2,3],[4,5,6]]), requires_grad=True)
>>> m = x.pow(2)
>>> n = y.pow(3)
>>> z = m.pow(2)+3*n.pow(2)
>>> z.backward(torch.ones(2,3))
>>> print(m.requires_grad)
False
>>> print(z.requires_grad)
True
>>> print(x.grad)
None
>>> print(y.grad)
Variable containing:
 1.8000e+01  5.7600e+02  4.3740e+03
 1.8432e+04  5.6250e+04  1.3997e+05
[torch.FloatTensor of size 2x3]

# requires_grad can only be changed on leaf variables
>>> import torch
>>> from torch.autograd import Variable
>>> 
>>> x = Variable(torch.Tensor([[1,2,3],[4,5,6]]), requires_grad=True)
>>> y = Variable(torch.Tensor([[1,2,3],[4,5,6]]), requires_grad=True)
>>> m = x.pow(2)
>>> m.requires_grad = False
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: you can only change requires_grad flags of leaf variables. If you want to use a computed variable in a subgraph that doesn't require differentiation use var_no_grad = var.detach().
>>> n = y.pow(3)
>>> z = m.pow(2)+3*n.pow(2)
>>> z.backward(torch.ones(2,3))
>>> print(m.requires_grad)
True
>>> print(x.grad)
Variable containing:
   4   32  108
 256  500  864
[torch.FloatTensor of size 2x3]

# Result of dividing a 2D tensor (2x3) by a 1D tensor (size 3): the divisor is broadcast across the rows
Variable containing:
 0.0000  0.2447  0.0000
 0.0000  0.2447  0.0000
[torch.cuda.FloatTensor of size 2x3 (GPU 0)]

Variable containing:
 0.0010
 2.0010
 0.0010
[torch.cuda.FloatTensor of size 3 (GPU 0)]

Variable containing:
 0.0000  0.1223  0.0000
 0.0000  0.1223  0.0000
[torch.cuda.FloatTensor of size 2x3 (GPU 0)]

# The same division, with the divisor as a 1x3 tensor, gives the same result
Variable containing:
 0.0000  0.2447  0.0000
 0.0000  0.2447  0.0000
[torch.cuda.FloatTensor of size 2x3 (GPU 0)]

Variable containing:
 0.0010  2.0010  0.0010
[torch.cuda.FloatTensor of size 1x3 (GPU 0)]

Variable containing:
 0.0000  0.1223  0.0000
 0.0000  0.1223  0.0000
[torch.cuda.FloatTensor of size 2x3 (GPU 0)]
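As a minimal sketch of the broadcasting behaviour shown above (plain tensors on the CPU, example values copied from the output; broadcasting the divisor requires PyTorch >= 0.2):

import torch

a = torch.Tensor([[0.0, 0.2447, 0.0],
                  [0.0, 0.2447, 0.0]])              # 2x3
b1 = torch.Tensor([0.0010, 2.0010, 0.0010])         # size 3
b2 = torch.Tensor([[0.0010, 2.0010, 0.0010]])       # size 1x3

print(a / b1)   # the divisor is broadcast across the rows: the middle column becomes 0.1223
print(a / b2)   # the leading dimension of size 1 is broadcast the same way, same result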

The element ordering produced by d = c.view(-1,1).sum(1): view flattens the tensor in row-major order, last dimension varying fastest.
Variable containing:
(0 ,0 ,.,.) = 
  2.5000  6.5000
  4.5000  4.0000

(0 ,1 ,.,.) = 
  2.5000  6.5000
  4.5000  4.0000

(1 ,0 ,.,.) = 
  4.5000  6.5000
  6.5000  6.5000

(1 ,1 ,.,.) = 
  4.5000  6.5000
  6.5000  6.5000
[torch.cuda.FloatTensor of size 2x2x2x2 (GPU 0)]

Variable containing:
 2.5000
 6.5000
 4.5000
 4.0000
 2.5000
 6.5000
 4.5000
 4.0000
 4.5000
 6.5000
 6.5000
 6.5000
 4.5000
 6.5000
 6.5000
 6.5000
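A minimal sketch (with hypothetical values 0 to 15 instead of the ones above) showing that view flattens in row-major order, which is exactly the ordering printed above:

import torch

c = torch.arange(16).view(2, 2, 2, 2)   # values 0..15 laid out in row-major (C) order
d = c.view(-1, 1).sum(1)                # reshape to 16x1, then sum over the size-1 dim
print(d)                                # 0, 1, 2, ..., 15: same order as reading c row by row
print(torch.equal(d, c.view(-1)))       # True: summing a length-1 dimension changes nothing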

Non-leaf Variables do not store grad:
xx = Variable(torch.randn(1,1), requires_grad = True)
yy = 3*xx
zz = yy**2
zz.backward()
xx.grad # e.g. 0.5137: leaf, so the gradient is stored
yy.grad # None: non-leaf
zz.grad # None: non-leaf

Note:
a = Variable(torch.randn(2,10), requires_grad=True).cuda()
Here a is not a leaf: the Variable itself is a leaf, but calling .cuda() produces another, non-leaf Variable. Only the following form gives a leaf:
a = Variable(torch.randn(2,10).cuda(), requires_grad=True)
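A quick way to check the difference is to run a backward pass and see whether .grad is retained; a minimal sketch, assuming a CUDA device is available:

import torch
from torch.autograd import Variable

# non-leaf: .cuda() adds a copy node on top of the leaf Variable
a = Variable(torch.randn(2, 10), requires_grad=True).cuda()
a.sum().backward()
print(a.grad)   # None, because a is not a leaf and its gradient is not stored

# leaf: move the tensor to the GPU first, then wrap it
b = Variable(torch.randn(2, 10).cuda(), requires_grad=True)
b.sum().backward()
print(b.grad)   # a 2x10 tensor of ones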

If you want the grad of a non-leaf Variable, you need to register a hook:
yGrad = torch.zeros(1,1)
def extract(xVar):
	global yGrad
	yGrad = xVar	# the hook receives the gradient w.r.t. yy during backward

xx = Variable(torch.randn(1,1), requires_grad = True)
yy = 3*xx
zz = yy**2

yy.register_hook(extract)

#### Run the backprop:
print (yGrad) # shows the initial zeros
zz.backward()
print (yGrad) # now shows the correct dz/dy (= 2*yy)