Using PyTorch hooks
What a hook is
A hook is essentially a plugin: it adds extra functionality without modifying the main code. The extra features are implemented separately and "hung" onto the main code, hence the rather vivid name "hook".
Accessing the gradient of an intermediate (non-leaf) variable raises a warning:
UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at aten\src\ATen/core/TensorBody.h:417.)
return self._grad
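A minimal sketch that reproduces the warning and fixes it with .retain_grad() (tensor names here are illustrative):

    import torch

    x = torch.ones(3, requires_grad=True)  # leaf tensor: .grad is populated
    y = x * 2                              # non-leaf intermediate result
    y.retain_grad()                        # ask autograd to keep y's gradient
    y.sum().backward()
    print(x.grad)                          # tensor([2., 2., 2.])
    print(y.grad)                          # tensor([1., 1., 1.]); without retain_grad() this is None plus the warning above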
register_hook takes a function as its argument; for a tensor with a hook registered, the gradient is passed to that function during backpropagation, so the function can inspect or operate on the gradient value. A hook function should not modify its inputs or outputs, and it should be removed promptly once it is no longer needed, so that running the hook on every pass does not add overhead. Hooks are mainly used to capture intermediate results, such as the output of some intermediate layer or the gradient of a layer.
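For example, a minimal sketch of register_hook on a tensor (the hook only records the gradient, as recommended):

    import torch

    grads = {}
    def save_grad(grad):
        grads['y'] = grad              # record the gradient without modifying it

    x = torch.ones(3, requires_grad=True)
    y = x * 2
    handle = y.register_hook(save_grad)  # called with y's gradient on backward
    y.sum().backward()
    print(grads['y'])                  # tensor([1., 1., 1.])
    handle.remove()                    # delete the hook once it is no longer needed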
register_hook applies to Tensor (formerly Variable) objects; register_forward_hook and register_backward_hook apply to nn.Module objects.
Usage example:
https://blog.csdn.net/DCGJ666/article/details/121638159
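In addition, a short sketch of a module-level forward hook (the model and layer choice here are made up for illustration):

    import torch
    import torch.nn as nn

    model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))

    activations = {}
    def forward_hook(module, inputs, output):
        activations['relu'] = output.detach()  # capture this layer's output

    handle = model[1].register_forward_hook(forward_hook)
    out = model(torch.randn(5, 4))
    print(activations['relu'].shape)           # torch.Size([5, 8])
    handle.remove()

register_backward_hook works analogously for gradients; recent PyTorch versions recommend register_full_backward_hook instead.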
Network freezing
Backpropagation in PyTorch is driven by Tensor (formerly Variable) objects, each of which has a requires_grad attribute; with requires_grad=False, no gradient is computed for that tensor, so the corresponding layer is not updated. When you create a tensor by hand, requires_grad defaults to False, whereas the parameters of layers defined inside an nn.Module default to requires_grad=True.
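A quick check of these defaults:

    import torch
    import torch.nn as nn

    t = torch.ones(3)                  # manually created tensor
    print(t.requires_grad)             # False
    layer = nn.Linear(3, 2)
    print(layer.weight.requires_grad)  # True: module parameters default to True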
Static freezing: modify the network directly
    import torch.nn as nn

    class Net(nn.Module):
        def __init__(self):
            super(Net, self).__init__()
            self.conv1 = nn.Conv2d(1, 6, 5)
            self.conv2 = nn.Conv2d(6, 16, 5)
            for p in self.parameters():
                p.requires_grad = False   # freezes everything defined so far
            self.fc1 = nn.Linear(16 * 5 * 5, 120)
            self.fc2 = nn.Linear(120, 84)
            self.fc3 = nn.Linear(84, 10)
The requires_grad=False loop can be inserted anywhere in __init__: only the parameters defined before the insertion point are frozen (here conv1 and conv2), while the fc layers defined afterwards keep requires_grad=True.
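A sketch to verify which parameters end up frozen:

    net = Net()
    for name, p in net.named_parameters():
        print(name, p.requires_grad)
    # conv1.* and conv2.* print False; fc1.*, fc2.*, fc3.* print True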
Dynamic freezing
First, inspect the module names to decide what to freeze:
    for name, module in model._modules.items():
        print(name)    # e.g. layerbase, yourlayername
        print(module)  # e.g. Sequential((0): Linear, (1): ReLU)
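Equivalently, the public API named_children() iterates the same name/module pairs without touching the private _modules dict:

    for name, module in model.named_children():
        print(name, module)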
Then build the list of parameters to optimize, skipping the layer to be frozen:
    import torch

    model = MLP().cuda()   # MLP is your own model class
    param_to_optim = []
    for name, module in model._modules.items():
        if name != "yourlayername":          # layers that stay trainable
            for p in module.parameters():
                param_to_optim.append(p)
        else:                                # the layer to freeze
            for p in module.parameters():
                p.requires_grad = False
    opt = torch.optim.Adam(param_to_optim, lr=1e-4)
    from tqdm import tqdm

    loss = nn.MSELoss()
    for i in tqdm(range(200)):
        ypred = model(x)        # x, yb: your training inputs and targets
        l = loss(ypred, yb)
        opt.zero_grad()
        l.backward()
        opt.step()
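An equivalent, commonly used idiom (a sketch; model.yourlayername stands in for the real attribute name of the frozen layer) is to freeze first and then filter on requires_grad when constructing the optimizer:

    for p in model.yourlayername.parameters():
        p.requires_grad = False
    opt = torch.optim.Adam(
        filter(lambda p: p.requires_grad, model.parameters()), lr=1e-4)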
To alternate which part of the network is trained, rebuild the optimizer with a different parameter group every 100 iterations (note that recreating Adam discards its momentum state):

    for i in tqdm(range(200)):
        if i == 0:
            opt = torch.optim.Adam(model.parameters(), lr=1e-4)
        if int((i / 100) % 2) != 0:
            # iterations 100-199: train only the "layers" module
            param_to_optim = []
            for name, module in model._modules.items():
                if name == "layers":
                    for p in module.parameters():
                        param_to_optim.append(p)
            opt = torch.optim.Adam(param_to_optim, lr=1e-4)
        else:
            # iterations 0-99: train everything except "layers"
            param_to_optim = []
            for name, module in model._modules.items():
                if name != "layers":
                    for p in module.parameters():
                        param_to_optim.append(p)
            opt = torch.optim.Adam(param_to_optim, lr=1e-4)
        ypred = model(x, i)
        l = loss(ypred, yb)
        opt.zero_grad()
        l.backward()
        opt.step()