In contrastive learning, SimSiam stops the gradient on the variable z, while the variable p produced from z still carries a gradient. The verification is below (no idea why I felt the need to verify this...):
import torch
import torch.nn as nn

def f(x):
    # stand-in for the encoder / projection: z = f(x)
    return x + 1.

def h(z):
    # stand-in for the predictor: p = h(z)
    return z + 1.

def d(p, z):
    # negative similarity with stop-gradient on z
    z = z.detach()
    return -(p * z).sum(dim=1).mean()

def save_grad(name):
    # hook factory: stores the gradient of an intermediate tensor in grads
    def hook(grad):
        grads[name] = grad
    return hook

if __name__ == '__main__':
    grads = {}
    x = torch.tensor([[2., 4, 6],
                      [1, 3, 5]], requires_grad=True)
    x1 = x.sigmoid()
    x2 = x.relu()
    print(x1, x2)
    z1 = f(x1)
    z2 = f(x2)
    p1 = h(z1)
    p2 = h(z2)
    p1.register_hook(save_grad('p1'))
    z1.register_hook(save_grad('z1'))
    l = d(p1, z1) * 0.5 + d(p2, z2) * 0.5
    z2.register_hook(save_grad('z2'))
    # p2 = p2.detach()
    # p2.register_hook(save_grad('p2'))
    # print(x.grad, x1.grad, z1.grad, p1.grad)
    l.backward()
    print(x.grad, x1.grad, z1.grad, p1.grad, grads['p1'], grads['z1'], grads['z2'])
To save memory, PyTorch only keeps gradient values for the leaf variables of the computation graph during backpropagation. If we want to inspect the gradient of an intermediate variable, we have to use the tensor's register_hook interface. So in the output, only x.grad prints an actual tensor, the .grad of every other variable is None, while the gradients of p1, z1 and z2 are captured through register_hook. (The lone line of four Nones in the output below comes from calling print(x.grad, x1.grad, z1.grad, p1.grad) before backward(); that print is commented out in the listing above.)
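Side note: register_hook is not the only way to look at an intermediate gradient. Calling retain_grad() on a non-leaf tensor before backward() also makes PyTorch keep its .grad. A minimal sketch, separate from the experiment above:

import torch

a = torch.tensor([1., 2., 3.], requires_grad=True)   # leaf tensor
b = a * 2                                             # intermediate (non-leaf) tensor
b.retain_grad()                                       # keep b.grad after backward
b.sum().backward()
print(a.grad)   # tensor([2., 2., 2.])
print(b.grad)   # tensor([1., 1., 1.]); would be None without retain_grad()

Back to the experiment, the output of the first script is: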
tensor([[0.8808, 0.9820, 0.9975],
        [0.7311, 0.9526, 0.9933]], grad_fn=<SigmoidBackward>) tensor([[2., 4., 6.],
        [1., 3., 5.]], grad_fn=<ReluBackward0>)
None None None None
tensor([[-0.7994, -1.2588, -1.7512],
        [-0.5851, -1.0221, -1.5033]]) None None None tensor([[-0.4702, -0.4955, -0.4994],
        [-0.4328, -0.4881, -0.4983]]) tensor([[-0.4702, -0.4955, -0.4994],
        [-0.4328, -0.4881, -0.4983]]) tensor([[-0.7500, -1.2500, -1.7500],
        [-0.5000, -1.0000, -1.5000]])
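The numbers line up with what you would compute by hand: d() treats z as a constant, the batch has 2 samples, so dl/dp1 = -0.5 * z1 / 2 = -0.25 * z1, and since p1 = h(z1) = z1 + 1 the gradient reaching z1 through the p-branch is identical. A quick check, assuming the variables of the script above are still in scope:

print(torch.allclose(grads['p1'], -0.25 * z1.detach()))   # True: dl/dp1 = -0.25 * z1
print(torch.allclose(grads['z1'], grads['p1']))           # True: p1 = z1 + 1, so the same gradient
print(torch.allclose(grads['z2'], -0.25 * z2.detach()))   # True: dl/dp2 = dl/dz2 = -0.25 * z2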
If you add the following to the program:

p2 = p2.detach()
p2.register_hook(save_grad('p2'))

it raises:
raise RuntimeError("cannot register a hook on a tensor that "
RuntimeError: cannot register a hook on a tensor that doesn't require gradient
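The error makes sense: detach() returns a new tensor with requires_grad=False that no longer lives in the graph, and hooks can only go on tensors that take part in backprop. If you want p2's gradient, register the hook before detaching (and in SimSiam the stop-gradient is applied to z, not to p, so there is no reason to detach p2 at all). A small sketch of the distinction, reusing h, z2 and save_grad from the first script:

p2 = h(z2)
print(p2.requires_grad)              # True  -> register_hook is allowed here
p2.register_hook(save_grad('p2'))    # fine: the hook sits on the graph-connected tensor
p2_detached = p2.detach()
print(p2_detached.requires_grad)     # False -> register_hook on this one raises the RuntimeError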
Or take the following example:
import torch
import torch.nn as nn

class test(nn.Module):
    def __init__(self, in_dim, hid_dim, out_dim):
        super(test, self).__init__()
        # encoder / projection MLP
        self.mlp = nn.Sequential(nn.Linear(in_dim, hid_dim, bias=False),
                                 nn.ReLU(inplace=True),
                                 nn.Linear(hid_dim, out_dim))
        # prediction head
        self.predictor = nn.Sequential(nn.Linear(out_dim, out_dim, bias=False),
                                       nn.ReLU(inplace=True))

    def forward(self, x1, x2):
        z1 = self.mlp(x1)
        z2 = self.mlp(x2)
        p1 = self.predictor(z1)
        p2 = self.predictor(z2)
        # stop-gradient: the returned z1/z2 are cut off from the graph
        return p1, p2, z1.detach(), z2.detach()

def save_grad(name):
    def hook(grad):
        grads[name] = grad
    return hook

if __name__ == '__main__':
    grads = {}
    x = torch.tensor([[2., 4, 6],
                      [1, 3, 5]], requires_grad=True)
    x1 = x.sigmoid()
    x2 = x.relu()
    print(x1, x2)
    t = test(x1.size(1), 8, 4)
    p1, p2, z1, z2 = t(x1, x2)
    p1.register_hook(save_grad('p1'))
    # z1.register_hook(save_grad('z1'))
    l = -(p1 * z1).sum(dim=1).mean() * 0.5 - (p2 * z2).sum(dim=1).mean() * 0.5
    # print(z1.grad, p1.grad)
    l.backward()
    # print(z1.grad, p1.grad, grads['p1'], grads['z1'])
    print(z1.grad, p1.grad, grads['p1'])
The output is:
tensor([[0.8808, 0.9820, 0.9975],
        [0.7311, 0.9526, 0.9933]], grad_fn=<SigmoidBackward>) tensor([[2., 4., 6.],
        [1., 3., 5.]], grad_fn=<ReluBackward0>)
None None tensor([[0.0126, 0.1019, 0.0038, 0.0216],
                  [0.0064, 0.0897, 0.0021, 0.0195]])
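One more thing worth checking in this module version: even though z1 and z2 are returned detached, the encoder self.mlp still receives gradients, because p1 and p2 are computed from the non-detached z1 and z2 inside forward(). That is exactly the point of SimSiam's stop-gradient: the target branch is a constant, but the online branch (encoder plus predictor) keeps training. A quick check, assuming t and the backward call from the script above:

print(t.mlp[0].weight.grad is not None)        # True: gradient reaches the encoder via p1/p2
print(t.predictor[0].weight.grad is not None)  # True: the predictor is trained as usual
# the detached z1/z2 only serve as constant targets inside the loss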
If you add the following to the above code:
z1.register_hook(save_grad('z1'))
it raises:
raise RuntimeError("cannot register a hook on a tensor that "
RuntimeError: cannot register a hook on a tensor that doesn't require gradient
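Same cause as before: the z1 returned from forward() is already the result of .detach(), so it no longer requires grad and cannot take a hook. If you want to watch the gradient flowing into z1 through the predictor branch, the hook has to be registered inside forward() on the original, graph-connected z1, before .detach() is called. A hypothetical variant of the forward above:

    def forward(self, x1, x2):
        z1 = self.mlp(x1)
        z2 = self.mlp(x2)
        z1.register_hook(save_grad('z1'))    # hook the graph-connected z1 here
        p1 = self.predictor(z1)
        p2 = self.predictor(z2)
        return p1, p2, z1.detach(), z2.detach()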
If anything here is wrong, please point it out. I spent a whole day on this stop-gradient operation and it still leaves me a bit confused.