PyTorch multi-GPU training: what to do when an intermediate output captured with a hook is not on the same GPU as the model input
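For a model trained on a single GPU, a forward hook is a convenient way to capture the output of an intermediate layer.
For example, to get the output of the Conv2d layer inside the custom model CBL, first print the network to find where that layer sits in the model, then loop over the modules to locate it and register a hook on it.
Reference: https://www.jianshu.com/p/0a270d63aca9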
import torch
import torch.nn as nn


class CBL(nn.Module):
    def __init__(self, in_channels, out_channels, kernel_size, groups):
        super(CBL, self).__init__()
        pad = (kernel_size - 1) // 2
        self.conv = nn.Sequential(
            nn.Conv2d(in_channels=in_channels, out_channels=out_channels,
                      kernel_size=kernel_size, stride=1, padding=pad,
                      groups=groups, bias=False),
            nn.BatchNorm2d(out_channels, momentum=0.9, eps=1e-5),
            nn.LeakyReLU(0.1),
        )
        # Hook-related code
        self.mid_fea = []
        # named_modules() order: 0 = CBL itself, 1 = self.conv (Sequential),
        # 2 = conv.0 (Conv2d)
        for index_i, (name, module) in enumerate(self.named_modules()):
            if index_i == 2:  # the Conv2d layer is at index 2
                # The hook must be registered before the forward pass
                module.register_forward_hook(hook=self.layer_hook)
                break

    def layer_hook(self, module, fea_in, fea_out):
        # Called after the hooked layer runs; fea_out is that layer's output
        self.mid_fea.append(fea_out)

    def forward(self, x):
        out = self.conv(x)
        return out


if __name__ == "__main__":
    # No GPU is used here, for simplicity
    model = CBL(8, 16, 3, 1)
    x = torch.ones(1, 8, 10, 10)
    out = model(x)
    print(model.mid_fea[0])
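To check which index a layer occupies before registering a hook, one option is to list what named_modules() yields. The sketch below uses a stand-in model with the same layout as CBL; the names TinyCBL and list_modules are made up for illustration and are not part of the original post.

```python
import torch.nn as nn


class TinyCBL(nn.Module):
    """Hypothetical stand-in with the same layer layout as CBL."""

    def __init__(self):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(8, 16, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(16),
            nn.LeakyReLU(0.1),
        )

    def forward(self, x):
        return self.conv(x)


def list_modules(model):
    """Return (index, name, class name) for every entry of named_modules()."""
    return [(i, name, type(m).__name__)
            for i, (name, m) in enumerate(model.named_modules())]


# The first entry is the model itself (empty name), then its children in order.
for row in list_modules(TinyCBL()):
    print(row)
```

For a model laid out like this, the listing starts with the model itself, then the Sequential container, then the layers inside it, so the index to hook depends on how deeply the target layer is nested.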
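However, when the model is trained on multiple GPUs, the intermediate output obtained this way may end up on a different GPU than the model input, which can make later computations fail. Even calling to(device) did not seem to move the intermediate output to the intended GPU. A likely cause is that nn.DataParallel runs one replica of the model per GPU and every replica's hook appends to the same list, so the first element of the list may come from another GPU's replica. After a lot of searching, I found a solution on the PyTorch forum: store the intermediate outputs in a dictionary instead of a list, keyed by device, so that each device's outputs are kept separately. An example follows.
Reference: https://discuss.pytorch.org/t/register-forward-hook-with-multiple-gpus/12115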
import torch
import torch.nn as nn


class CBL(nn.Module):
    def __init__(self, in_channels, out_channels, kernel_size, groups):
        super(CBL, self).__init__()
        pad = (kernel_size - 1) // 2
        self.conv = nn.Sequential(
            nn.Conv2d(in_channels=in_channels, out_channels=out_channels,
                      kernel_size=kernel_size, stride=1, padding=pad,
                      groups=groups, bias=False),
            nn.BatchNorm2d(out_channels, momentum=0.9, eps=1e-5),
            nn.LeakyReLU(0.1),
        )
        # Hook-related code: one list per device instead of a single shared list
        self.mid_fea = {}
        # named_modules() order: 0 = CBL itself, 1 = self.conv (Sequential),
        # 2 = conv.0 (Conv2d)
        for index_i, (name, module) in enumerate(self.named_modules()):
            if index_i == 2:  # the Conv2d layer is at index 2
                # The hook must be registered before the forward pass
                module.register_forward_hook(hook=self.layer_hook)
                break

    def layer_hook(self, module, fea_in, fea_out):
        # fea_in is a tuple of inputs; use the device of the first input as the key
        self.mid_fea[fea_in[0].device].append(fea_out)

    def forward(self, x):
        # Reset this device's list at the start of every forward pass
        self.mid_fea[x.device] = []
        out = self.conv(x)
        # Return the model output together with the intermediate feature
        return out, self.mid_fea[x.device][0]


if __name__ == "__main__":
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model = CBL(8, 16, 3, 1).to(device)
    model = nn.DataParallel(model)  # use multiple GPUs
    x = torch.ones(2, 8, 10, 10)
    out, mid_fea = model(x)
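A quick way to sanity-check the device-keyed approach is to run one forward pass and compare the devices of the two returned tensors. The following is a minimal sketch, not the forum's code: HookedNet and check_devices are hypothetical names, and the hook is registered directly on self.conv[0] rather than by counting indices, which is an assumption made to keep the example short. It also runs on a machine without a GPU, where nn.DataParallel simply calls the wrapped module.

```python
import torch
import torch.nn as nn


class HookedNet(nn.Module):
    """Hypothetical model using the same per-device dictionary pattern as CBL."""

    def __init__(self):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(8, 16, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(16),
            nn.LeakyReLU(0.1),
        )
        self.mid_fea = {}  # maps device -> list of captured tensors
        # Register the hook on the Conv2d layer directly (assumption: by attribute,
        # not by index as in the post)
        self.conv[0].register_forward_hook(self.layer_hook)

    def layer_hook(self, module, fea_in, fea_out):
        self.mid_fea[fea_in[0].device].append(fea_out)

    def forward(self, x):
        self.mid_fea[x.device] = []  # fresh list for this device on every call
        out = self.conv(x)
        return out, self.mid_fea[x.device][0]


def check_devices(model, x):
    """Run one forward pass; report device agreement and the two output shapes."""
    out, mid = model(x)
    return out.device == mid.device, tuple(out.shape), tuple(mid.shape)


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = nn.DataParallel(HookedNet().to(device))
same_device, out_shape, mid_shape = check_devices(model, torch.ones(2, 8, 10, 10))
print(same_device, out_shape, mid_shape)
```

Because the intermediate feature is returned from forward, nn.DataParallel gathers it together with the model output, so both tensors should arrive on the same device and cover the full batch.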