PyTorch: using .detach(), .detach_() and .data to cut off backpropagation

weixin_33913332

Tags: artificial intelligence, python

Original link: http://www.cnblogs.com/wanghui-garcia/p/10677071.html

Reference: https://pytorch-cn.readthedocs.io/zh/latest/package_references/torch-autograd/#detachsource

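When training a network, we may want to keep some of its parameters fixed and adjust only the rest, or train only a branch network without letting its gradients affect the main network. In these cases we use detach() to cut off backpropagation through certain branches.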
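The branch-training case can be sketched as follows. This is a minimal illustration, not code from the original article: the two-layer setup and the names `backbone` and `branch` are hypothetical.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical two-part model: a main network and a side branch.
backbone = nn.Linear(4, 3)
branch = nn.Linear(3, 1)

x = torch.randn(2, 4)
features = backbone(x)

# detach() cuts the graph here, so the branch loss cannot reach the backbone.
branch_out = branch(features.detach())
branch_loss = branch_out.sum()
branch_loss.backward()

# Only the branch received gradients; the backbone was left untouched.
backbone_has_grad = backbone.weight.grad is not None
branch_has_grad = branch.weight.grad is not None
print(backbone_has_grad, branch_has_grad)
```

After the backward pass, only the parameters of `branch` hold gradients, so an optimizer step would leave `backbone` unchanged.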

1 detach()

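Returns a new Variable that is detached from the current computation graph but still points to the same storage as the original. The only difference is that its requires_grad is False: no gradient is computed for the returned Variable, and it has no grad. (Since PyTorch 0.4, Variable has been merged into Tensor, so the same applies to tensors.)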

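Even if its requires_grad is later set back to True, gradients still do not flow through it into the original graph. It only becomes a new leaf that can accumulate a grad of its own.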
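A small check of what happens when requires_grad is switched back on, assuming a current PyTorch release where Variable and Tensor are the same type. The variable names are illustrative only.

```python
import torch

x = torch.ones(3, requires_grad=True)
y = x * 2

d = y.detach()
d.requires_grad_(True)   # d becomes a new leaf, still disconnected from x

loss = (d * 3).sum()
loss.backward()

d_grad = d.grad          # gradient accumulates on the new leaf
x_grad = x.grad          # nothing flowed back to the original graph
print(d_grad, x_grad)
```

Here `d` collects its own gradient, while `x.grad` stays `None`: the link to the original graph is not restored.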

We then carry on computing with this new Variable. During the later backward pass, propagation stops at the Variable on which detach() was called and goes no further back.
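To make the stopping point concrete, the sketch below compares the gradient of the same expression with and without detach(). The expression is an arbitrary example chosen for this illustration.

```python
import torch

# With detach(): y is treated as a constant in the product.
x = torch.tensor(3.0, requires_grad=True)
y = x ** 2                 # y = 9, still part of the graph
z = y.detach() * x         # backward stops at the detached y
z.backward()
grad_with_detach = x.grad.item()   # d(9 * x)/dx = 9

# Without detach(): the full expression x**3 is differentiated.
x2 = torch.tensor(3.0, requires_grad=True)
z2 = (x2 ** 2) * x2
z2.backward()
grad_full = x2.grad.item()         # d(x**3)/dx = 3 * x**2 = 27

print(grad_with_detach, grad_full)
```

The detached factor contributes its value to the forward result but nothing to the gradient.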

The source code (from an older PyTorch release, when Variable was a separate class) is:

def detach(self):
    """Returns a new Variable, detached from the current graph.
    Result will never require gradient. If the input is volatile, the output
    will be volatile too.
    .. note::
      Returned Variable uses the same data tensor, as the original one, and
      in-place modifications on either of them will be seen, and may trigger
      errors in correctness checks.
    """
    result = NoGrad()(self)  # this is needed, because it merges version counters
    result._grad_fn = None
    return result

As the code shows, the function does the following:

Sets grad_fn to None
Sets the Variable's requires_grad to False
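Both effects can be observed directly on a tensor in current PyTorch. This is a quick check rather than a reproduction of the old source code above.

```python
import torch

x = torch.ones(2, requires_grad=True)
y = x * 2          # non-leaf tensor with a grad_fn
d = y.detach()

original_has_grad_fn = y.grad_fn is not None
detached_grad_fn = d.grad_fn
detached_requires_grad = d.requires_grad
detached_is_leaf = d.is_leaf

print(original_has_grad_fn, detached_grad_fn, detached_requires_grad, detached_is_leaf)
```

The original `y` keeps its grad_fn; only the returned tensor has grad_fn set to None and requires_grad set to False.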

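If the input has volatile=True (meaning no history needs to be recorded; this was set to speed up computation when only the result was needed and no parameters would be updated), the returned Variable also has volatile=True. (volatile is deprecated.)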
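Since PyTorch 0.4 the volatile flag no longer has any effect, and the usual replacement is the torch.no_grad() context manager. A minimal sketch of that replacement, which is not part of the original article:

```python
import torch

x = torch.ones(2, requires_grad=True)

# Inside no_grad(), operations are not recorded in the graph.
with torch.no_grad():
    y = x * 2

no_grad_requires_grad = y.requires_grad
no_grad_grad_fn = y.grad_fn

# Outside the block, recording resumes as usual.
z = x * 2
tracked_requires_grad = z.requires_grad

print(no_grad_requires_grad, no_grad_grad_fn, tracked_requires_grad)
```

Like a detached result, `y` has no grad_fn and does not require gradient, which suits inference code where only the output is needed.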

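Note:

The returned Variable and the original Variable share the same data tensor. In-place modifications show up in both Variables (because they share the data tensor), and may cause an error when backward() is later called on them.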
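The following sketch shows both halves of this note: the in-place change is visible through the original, and the later backward() call fails. It uses sigmoid because its backward pass needs the saved output; that choice is an assumption made for this illustration.

```python
import torch

a = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
out = a.sigmoid()      # sigmoid's backward needs the value of out
c = out.detach()
c.zero_()              # in-place change on the detached tensor

# The change is visible through out as well, since storage is shared.
shared_zeroed = bool((out.detach() == 0).all())

try:
    out.sum().backward()
    error_message = None
except RuntimeError as err:
    # autograd detects that a saved tensor was modified in place
    error_message = str(err)

print(shared_zeroed, error_message is not None)
```

Calling `c = out.detach().clone()` instead would give `c` its own storage, so modifying it would not affect `out` or the backward pass.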
