retain_graph
Only the gradients of leaf nodes are retained after backward().
Which tensors count as leaf nodes in PyTorch? Tensors created directly by the user rather than produced by an operation: if the input is created with requires_grad=True, the input itself is a leaf node; otherwise only the network's parameters are leaf nodes.
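A quick way to check is the is_leaf attribute (a minimal sketch; the Linear layer here just stands in for any network):

import torch

x = torch.ones(2, 2, requires_grad=True)
y = x + 2
print(x.is_leaf)  # True  -- created directly by the user
print(y.is_leaf)  # False -- produced by an operation

net = torch.nn.Linear(2, 2)
print(all(p.is_leaf for p in net.parameters()))  # True -- parameters are leaves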
Leaf-node gradients accumulate: if you call backward() several times without zeroing them, the gradients keep adding up, whereas the gradients of intermediate nodes are computed fresh on every pass. Note that calling backward() more than once also requires retain_graph=True on all but the last call, otherwise the graph's buffers are freed. An example:
import torch

# A leaf tensor, created directly by the user
x = torch.ones(2, 2, requires_grad=True)
y = x + 2            # intermediate node
# y.register_hook(print)
z = y * y * 3
out = z.mean()

out.backward(retain_graph=True)  # keep the graph for another backward pass
print(x.grad)                    # tensor of 4.5s
out.backward(retain_graph=True)
print(x.grad)                    # tensor of 9.0s -- accumulated
out.backward()                   # last pass, the graph may now be freed
print(x.grad)                    # tensor of 13.5s
As the output shows, the leaf node's gradient accumulates: 4.5 per element after the first pass, then 9.0, then 13.5.
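To see why retain_graph=True is needed above: by default backward() frees the graph's intermediate buffers once the pass finishes, so a second call fails (a minimal sketch of the failure):

import torch

x = torch.ones(2, 2, requires_grad=True)
out = (3 * (x + 2) ** 2).mean()
out.backward()  # intermediate buffers are freed after this pass
out.backward()  # RuntimeError: trying to backward through the graph a second time

Now the same computation again, this time with a hook on the intermediate node y: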
import torch

x = torch.ones(2, 2, requires_grad=True)
y = x + 2
y.register_hook(print)  # prints y's gradient on every backward pass
z = y * y * 3
out = z.mean()

out.backward(retain_graph=True)  # hook prints a tensor of 4.5s
out.backward(retain_graph=True)  # prints 4.5s again -- not accumulated
out.backward()                   # and again: 4.5s
The gradients of intermediate nodes are fresh on every pass; they do not accumulate with the previous call.
optimizer.zero_grad() and model.zero_grad() only clear the gradients of the parameters; they do not clear the gradients of other leaf nodes. (The input is often another leaf node, but its gradient is usually of no interest, so we can ignore it; if you do need to zero the input's gradient, just write a separate line of code for it.) Gradients of intermediate nodes can only be printed through hooks, as in the code above.
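For instance, zeroing the input's gradient by hand takes a couple of lines (a minimal sketch, reusing the leaf tensor x from the examples above):

# .grad is None before the first backward(), so guard for that
if x.grad is not None:
    x.grad.zero_()  # in-place zeroing, mirroring what zero_grad() does for parameters
# or simply drop the accumulated gradient altogether:
x.grad = None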
The effect of detach() and clone() on gradients
detach() blocks the gradient, while clone() does not: during backprop, a clone simply passes its gradient back to the source.
import torch

x = torch.ones(2, 2, requires_grad=True)
y = x + 2
y.register_hook(print)
h1 = y.clone()   # clone stays in the graph; gradients flow back through it to y
z = h1 * h1 * 3
out = z.mean()

out.backward(retain_graph=True)
out.backward(retain_graph=True)
out.backward()
Result: the hook prints the same gradient on y each pass (a tensor of 4.5s), exactly as if z had been built from y directly; clone() is transparent to backprop.
import torch

x = torch.ones(2, 2, requires_grad=True)
y = x + 2
y.register_hook(print)
h1 = y.clone()
h2 = y.clone()   # a second copy: y now feeds two downstream paths
z = h1 * h1 * 3 + h2 * h2 * 3
out = z.mean()

out.backward(retain_graph=True)
out.backward(retain_graph=True)
out.backward()
This time the gradient arriving at y doubles (a tensor of 9.0s): y was copied twice, each copy sends back its own gradient, and the two are summed at y.
detach(), by contrast, cuts off gradient propagation:
import torch

x = torch.ones(2, 2, requires_grad=True)
y = x + 2
y.register_hook(print)
h1 = y.detach()   # h1 shares y's data but is severed from the graph
z = h1 * h1 * 3
out = z.mean()    # out.requires_grad is False

out.backward(retain_graph=True)  # raises RuntimeError: out does not require grad
out.backward(retain_graph=True)
out.backward()
This time the gradient path is cut: out no longer requires grad, so the first backward() call raises a RuntimeError.
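A quick sanity check under the same setup makes the failure obvious:

print(h1.requires_grad)   # False -- detach() removed it from the graph
print(out.requires_grad)  # False -- nothing for autograd to differentiate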
Printing parameter gradients
for f in net.parameters():   # net is any torch.nn.Module
    print(f.grad)            # None until backward() has been called
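A self-contained version (a minimal sketch; the small Sequential net is just a placeholder, any nn.Module behaves the same):

import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(2, 4), nn.ReLU(), nn.Linear(4, 1))
loss = net(torch.ones(1, 2)).sum()
loss.backward()

for f in net.parameters():
    print(f.grad)   # gradient tensors, one per parameter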