pytorch_detach: cutting off backpropagation in a network

Latest recommended article published on 2024-01-16 04:45:00

JJunQw


detach

The official documentation describes this method as follows.

    detach = _add_docstr(_C._TensorBase.detach, r"""
    Returns a new Tensor, detached from the current graph.

    The result will never require gradient.

    .. note::

      Returned Tensor uses the same data tensor as the original one.
      In-place modifications on either of them will be seen, and may trigger
      errors in correctness checks.
    """)

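detach() returns a new tensor that is separated from the current graph.
The returned tensor never requires gradient.
Before version 0.4, if the Variable being detached had volatile=True, the detached result was also volatile; the volatile flag was removed in 0.4 in favor of torch.no_grad().
One more caveat: the returned tensor and the original share the same underlying data, so an in-place change to one is visible in the other.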


import torch

t1 = torch.tensor([1., 2.], requires_grad=True)
t2 = torch.tensor([2., 3.], requires_grad=True)
v3 = t1 + t2

v3_detached = v3.detach()
v3_detached.data.add_(t1)  # modify the values stored in v3_detached in place

print(v3, v3_detached)     # v3 changes too, because both share the same storage
print(v3.requires_grad, v3_detached.requires_grad)

'''
tensor([4., 7.], grad_fn=<AddBackward0>) tensor([4., 7.])
True False
'''
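
The note in the docstring about "errors in correctness checks" can be made concrete. The following is a minimal sketch, assuming an operation such as sigmoid that saves its output for the backward pass; the variable names are illustrative only. Modifying the detached tensor in place also changes the saved output, and autograd refuses to run backward on it.

```python
import torch

x = torch.tensor([1., 2., 3.], requires_grad=True)
out = x.sigmoid()            # sigmoid keeps its output for the backward pass
out_detached = out.detach()  # shares storage with `out`
out_detached.zero_()         # in-place change, visible through `out` as well

try:
    out.sum().backward()
    backward_failed = False
except RuntimeError as err:
    # autograd notices that a tensor it needs was modified in place
    backward_failed = True
    print("backward failed:", err)

print(out)  # the values of `out` are now all zeros
```

If the detached tensor has to be modified, calling clone() on it first gives it separate memory and avoids this error.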

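In PyTorch, a natural first idea for this is to copy the tensor just before the point where the graph should be cut. Tensors offer two copy functions, clone() and copy_(), but neither cuts the graph. clone() returns a new tensor with its own memory (so it is not the same as '=', which only binds another name to the same tensor), yet it stays in the computation graph: gradients that reach the clone flow back to the original, so during backpropagation the clone and the original are equivalent. copy_() copies values in place into an existing tensor, and autograd records that copy too when the source requires grad.
For cutting the graph, PyTorch has a dedicated function: detach().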
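
The difference between copying and detaching can be checked directly. This is a small sketch with made-up variable names: clone() gets new memory but still passes gradients back to the source, while detach() shares memory and leaves the graph.

```python
import torch

a = torch.tensor([1., 2.], requires_grad=True)

# clone(): separate memory, still connected to `a` in the graph
b = a.clone()
(b * 3).sum().backward()
grad_via_clone = a.grad.clone()
print(grad_via_clone)                 # tensor([3., 3.])
print(b.data_ptr() == a.data_ptr())   # False

# detach(): same memory, no longer part of the graph
c = a.detach()
out = (c * 3).sum()
print(out.requires_grad)              # False
print(c.data_ptr() == a.data_ptr())   # True
```

When both properties are wanted (separate memory and no gradient), the two calls are commonly chained as a.detach().clone().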

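Nodes created directly by the user are leaf nodes (such as a, b and c in the figure); they do not depend on any other variable. An in-place operation cannot be applied to a leaf node that requires grad (outside of torch.no_grad()). The root node is the final target of the computation graph (y in the figure). By the chain rule, the gradient of the root with respect to every node can be computed, and calling root.backward() does exactly that.
So what detach does is declare a new variable that points to the storage of the original one but has requires_grad set to False. Looking a little deeper, the computation graph is cut at the detached variable, which becomes a leaf node. Even if its requires_grad is later set back to True, gradients only accumulate on the detached tensor itself and never flow back into the graph that produced the original.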
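
A short sketch of the cut, using hypothetical names a, b, c and y: after detach, c is a leaf, and turning requires_grad back on lets c collect a gradient without anything reaching a.

```python
import torch

a = torch.tensor([1., 2.], requires_grad=True)
b = a * 2                 # non-leaf node, part of a's graph
c = b.detach()            # the graph is cut here
c.requires_grad_(True)    # c now tracks gradients as a fresh leaf

y = (c * 3).sum()         # root of the new, separate graph
y.backward()

print(c.is_leaf)          # True
print(c.grad)             # tensor([3., 3.])
print(a.grad)             # None: nothing flowed back past the cut
```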

pytorch gradients

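(Since 0.4) Tensor and Variable are merged, and a tensor carries attributes such as grad and grad_fn.
A tensor created with the defaults has requires_grad=False. If a tensor does not require grad, no gradient is propagated back through it; if other branches do require grad, gradients are propagated only along those branches.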

# x is created with the default requires_grad=False
x = torch.ones(1)
print(x.requires_grad)
# out: False

# y is another tensor with requires_grad=False
y = torch.ones(1)
# both inputs have requires_grad=False, so the output z does too
z = x + y
print(z.requires_grad)
# out: False

# autograd does not track this computation, so backward() would fail
# z.backward()
# out: RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

# now create a tensor with requires_grad=True
w = torch.ones(1, requires_grad=True)
print(w.requires_grad)
# out: True

# add it to the previous result z, which has requires_grad=False;
# because the input w requires grad, total supports backpropagation
total = w + z
print(total.requires_grad)
# out: True

# autograd can compute the gradients as well
total.backward()
print(w.grad)
# out: tensor([1.])

# no computation is wasted on gradients for x, y and z, which don't require grad
print(z.grad is None and x.grad is None and y.grad is None)
# out: True
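
The same branch behaviour shows up when detach() is applied to only one branch of a computation. In this sketch (branch_a and branch_b are illustrative names), the detached branch still contributes its value to the result, but only the tracked branch contributes to the gradient.

```python
import torch

w = torch.tensor([2.], requires_grad=True)
x = torch.tensor([3.])

branch_a = w * x             # tracked: d(branch_a)/dw = x = 3
branch_b = (w * w).detach()  # cut: treated as the constant 4 by autograd

total = (branch_a + branch_b).sum()
total.backward()

print(total.item())  # 10.0
print(w.grad)        # tensor([3.])
```

Without the detach() call, the second branch would add 2 * w = 4 to the gradient, giving tensor([7.]).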

nn.Parameter

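import torch
import torch.nn.functional as F

# With square kernels and equal stride
# nn.Parameter wraps a tensor and sets requires_grad=True by default
weights = torch.nn.Parameter(torch.randn(8, 4, 3, 3))

inputs = torch.randn(1, 4, 5, 5)

# functional form: the kernel is passed in explicitly
out = F.conv2d(inputs, weights, stride=2, padding=1)
print(out.shape)    # torch.Size([1, 8, 3, 3])

# module form: Conv2d creates and holds its own weight Parameter
conv2d = torch.nn.Conv2d(4, 8, 3, stride=2, padding=1)

out_2 = conv2d(inputs)

print(out_2.shape)  # torch.Size([1, 8, 3, 3])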
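
What sets nn.Parameter apart from a plain tensor is easiest to see inside a module. The following is a minimal sketch; the Scale module and its attribute names are hypothetical and exist only for this illustration. A Parameter assigned as a module attribute is registered and returned by named_parameters(), while a plain tensor attribute is not.

```python
import torch
import torch.nn as nn

class Scale(nn.Module):  # hypothetical module, for illustration only
    def __init__(self):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(3))  # registered as a parameter
        self.offset = torch.zeros(3)               # plain tensor: not registered

    def forward(self, x):
        return x * self.weight + self.offset

m = Scale()
names = [name for name, _ in m.named_parameters()]
print(names)                    # ['weight']
print(m.weight.requires_grad)   # True

m(torch.tensor([1., 2., 3.])).sum().backward()
print(m.weight.grad)            # tensor([1., 2., 3.])
```

Because only registered parameters are returned by parameters(), only they are handed to an optimizer built from m.parameters().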