Verifying that in PyTorch, after a variable is detach()-ed, other variables generated from it still have gradients

In contrastive learning, SimSiam applies stop-gradient to the variable z, while the variable p generated from z still has a gradient. The verification is below (no idea why I felt the need to verify this...):
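(For reference, the loss in the SimSiam paper is the negative cosine similarity with a stop-gradient on z; a rough sketch of the paper's pseudocode is shown here, while the d() in the verification script below drops the L2 normalization so the numbers are easy to check by hand.)

import torch.nn.functional as F

def D(p, z):
    # SimSiam loss: negative cosine similarity, gradient blocked on z
    z = z.detach()
    p = F.normalize(p, dim=1)
    z = F.normalize(z, dim=1)
    return -(p * z).sum(dim=1).mean()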

import torch

def f(x):
    # stand-in for the encoder: z = f(x)
    return x + 1.

def h(z):
    # stand-in for the predictor: p = h(z)
    return z + 1.

def d(p, z):
    # simplified SimSiam-style loss (no normalization), stop-gradient on z
    z = z.detach()
    return -(p * z).sum(dim=1).mean()

def save_grad(name):
    # returns a hook that stores the gradient of the hooked tensor in grads[name]
    def hook(grad):
        grads[name] = grad
    return hook

if __name__ == '__main__':
    grads = {}

    x = torch.tensor([[2., 4, 6],
                      [1, 3, 5]], requires_grad=True)
    x1 = x.sigmoid()
    x2 = x.relu()
    print(x1, x2)

    z1 = f(x1)
    z2 = f(x2)

    p1 = h(z1)
    p2 = h(z2)

    p1.register_hook(save_grad('p1'))
    z1.register_hook(save_grad('z1'))
    l = d(p1, z1) * 0.5 + d(p2, z2) * 0.5
    # hooks fire at backward time, so registering after the loss is built still works
    z2.register_hook(save_grad('z2'))
    # p2 = p2.detach()
    # p2.register_hook(save_grad('p2'))

    print(x.grad, x1.grad, z1.grad, p1.grad)   # before backward: all None
    l.backward()
    print(x.grad, x1.grad, z1.grad, p1.grad, grads['p1'], grads['z1'], grads['z2'])

To save memory, PyTorch only retains gradients for the leaf variables of the computation graph during backpropagation. If we want to inspect the gradient of an intermediate variable, we have to use the tensor's register_hook interface. That is why, in the output below, only x.grad has a value and the .grad of all the other variables prints as None; the gradients of p1, z1 and z2 were saved through register_hook.

tensor([[0.8808, 0.9820, 0.9975],
        [0.7311, 0.9526, 0.9933]], grad_fn=<SigmoidBackward>) tensor([[2., 4., 6.],
        [1., 3., 5.]], grad_fn=<ReluBackward0>)
None None None None
tensor([[-0.7994, -1.2588, -1.7512],
        [-0.5851, -1.0221, -1.5033]]) None None None tensor([[-0.4702, -0.4955, -0.4994],
        [-0.4328, -0.4881, -0.4983]]) tensor([[-0.4702, -0.4955, -0.4994],
        [-0.4328, -0.4881, -0.4983]]) tensor([[-0.7500, -1.2500, -1.7500],
        [-0.5000, -1.0000, -1.5000]])
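As a side note, besides register_hook, calling Tensor.retain_grad() on an intermediate (non-leaf) variable also makes PyTorch keep its .grad after backward; a minimal sketch:

import torch

x = torch.tensor([[2., 4, 6]], requires_grad=True)
y = x.sigmoid()      # non-leaf: y.grad would normally stay None
y.retain_grad()      # ask autograd to populate y.grad during backward
y.sum().backward()
print(y.grad)        # now a tensor of ones instead of None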

If we add the following to the program:

p2 = p2.detach()
p2.register_hook(save_grad('p2'))

it raises an error:

raise RuntimeError("cannot register a hook on a tensor that "
RuntimeError: cannot register a hook on a tensor that doesn't require gradient
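The reason is that detach() returns a tensor with requires_grad=False (it is cut off from the graph), and hooks can only be registered on tensors that require gradients. A quick check:

import torch

x = torch.tensor([1., 2.], requires_grad=True)
p = x * 2
print(p.requires_grad)            # True
print(p.detach().requires_grad)   # False, so register_hook on it raises RuntimeError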

Or take the following example:

import torch
import torch.nn as nn

class test(nn.Module):
    def __init__(self, in_dim, hid_dim, out_dim):
        super(test, self).__init__()
        # encoder f
        self.mlp = nn.Sequential(nn.Linear(in_dim, hid_dim, bias=False),
                                 nn.ReLU(inplace=True),
                                 nn.Linear(hid_dim, out_dim))
        # predictor h
        self.predictor = nn.Sequential(nn.Linear(out_dim, out_dim, bias=False),
                                       nn.ReLU(inplace=True))

    def forward(self, x1, x2):
        z1 = self.mlp(x1)
        z2 = self.mlp(x2)

        p1 = self.predictor(z1)
        p2 = self.predictor(z2)

        # stop-gradient: the returned z1/z2 are detached copies
        return p1, p2, z1.detach(), z2.detach()

def save_grad(name):
    def hook(grad):
        grads[name] = grad
    return hook

if __name__ == '__main__':
    grads = {}

    x = torch.tensor([[2., 4, 6],
                      [1, 3, 5]], requires_grad=True)
    x1 = x.sigmoid()
    x2 = x.relu()
    print(x1, x2)
    t = test(x1.size(1), 8, 4)
    p1, p2, z1, z2 = t(x1, x2)
    p1.register_hook(save_grad('p1'))
    # z1.register_hook(save_grad('z1'))   # would raise: the returned z1 is detached

    l = -(p1*z1).sum(dim=1).mean() * 0.5 - (p2*z2).sum(dim=1).mean() * 0.5

    # print(z1.grad, p1.grad)
    l.backward()
    # print(z1.grad, p1.grad, grads['p1'], grads['z1'])
    print(z1.grad, p1.grad, grads['p1'])


The output is:

tensor([[0.8808, 0.9820, 0.9975],
        [0.7311, 0.9526, 0.9933]], grad_fn=<SigmoidBackward>) tensor([[2., 4., 6.],
        [1., 3., 5.]], grad_fn=<ReluBackward0>)
None None tensor([[0.0126, 0.1019, 0.0038, 0.0216],
        [0.0064, 0.0897, 0.0021, 0.0195]])

If we add the following to the code above:

z1.register_hook(save_grad('z1'))

it raises the same error:

raise RuntimeError("cannot register a hook on a tensor that "
RuntimeError: cannot register a hook on a tensor that doesn't require gradient
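Same reason as before: the z1 returned from forward is a detached copy and no longer requires grad. If we really want to look at z1's gradient in this module, one workaround (just a sketch, not the original code) is to register the hook inside forward on the not-yet-detached z1, which still receives a gradient through p1 = predictor(z1):

    def forward(self, x1, x2):
        z1 = self.mlp(x1)
        z2 = self.mlp(x2)
        # hook the live z1 before detaching; save_grad/grads are the ones defined in the script
        z1.register_hook(save_grad('z1'))
        p1 = self.predictor(z1)
        p2 = self.predictor(z2)
        return p1, p2, z1.detach(), z2.detach()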

If anything here is wrong, please point it out. I spent a whole day looking at this stop-gradient operation and I'm still somewhat confused.
