PyTorch multi-GPU training: what to do when an intermediate output captured by a hook is not on the same GPU as the model input

浩浩乎@

Article link: https://blog.csdn.net/qq7835144/article/details/122454868

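For a model trained on a single GPU, a forward hook is a convenient way to extract the output of an intermediate layer.
For example, to get the output of the conv2d layer in the custom model CBL, first print the network to find where conv2d sits in the module order, then locate it with a for loop and register a hook on it.
Reference: https://www.jianshu.com/p/0a270d63aca9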

import torch
import torch.nn as nn

class CBL(nn.Module):
    def __init__(self, in_channels, out_channels, kernel_size, groups):
        super(CBL, self).__init__()
        pad = (kernel_size - 1) // 2
        self.conv = nn.Sequential(
            nn.Conv2d(in_channels=in_channels, out_channels=out_channels,
                      kernel_size=kernel_size, stride=1, padding=pad,
                      groups=groups, bias=False),
            nn.BatchNorm2d(out_channels, momentum=0.9, eps=1e-5),
            nn.LeakyReLU(0.1),
        )

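        ## hook-related code
        self.mid_fea = []
        for index_i, (name, module) in enumerate(self.named_modules()):
            # named_modules() yields the CBL itself (0), self.conv (1), then the Conv2d (2)
            if index_i == 2:
                module.register_forward_hook(hook=self.layer_hook)
                # the hook must be registered before the forward pass
                break
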
    def layer_hook(self, module, fea_in, fea_out):
        self.mid_fea.append(fea_out)

    def forward(self, x):
        out = self.conv(x)
        return out

if __name__ == "__main__":
    # the GPU is not used here, for simplicity
    model = CBL(8, 16, 3, 1)
    x = torch.ones(1, 8, 10, 10)
    out = model(x)
    print(model.mid_fea[0])
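
The position lookup described above can be checked directly. The following is a minimal sketch, assuming a stand-in module named TinyBlock (a hypothetical name, laid out like CBL) rather than the author's exact class. It prints the order that named_modules() yields, which shows that index 0 is the module itself and that the Conv2d comes at index 2. It then registers the hook on the layer by attribute instead of by index, which is one way to avoid miscounting.

```python
import torch
import torch.nn as nn


class TinyBlock(nn.Module):
    """Hypothetical stand-in with the same layout as CBL."""

    def __init__(self):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(8, 16, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(16),
            nn.LeakyReLU(0.1),
        )

    def forward(self, x):
        return self.conv(x)


block = TinyBlock()

# List every submodule with the index that enumerate(named_modules()) gives it.
names = [name for name, _ in block.named_modules()]
for index, (name, module) in enumerate(block.named_modules()):
    print(index, repr(name), type(module).__name__)

# Register the hook on the Conv2d directly and keep the handle for cleanup.
captured = []
handle = block.conv[0].register_forward_hook(
    lambda module, inputs, output: captured.append(output.detach())
)
block(torch.ones(1, 8, 10, 10))
handle.remove()  # stop capturing once the feature is no longer needed
print(tuple(captured[0].shape))
```

Keeping the handle returned by register_forward_hook and calling remove() on it avoids a list that grows on every forward pass.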

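With multiple GPUs, however, the intermediate output captured this way may end up on a different GPU from the model input, which can make later computation fail. Calling to(device) on it did not seem to move it to the intended GPU either. A likely reason is that nn.DataParallel runs one replica of the model per GPU and every replica's hook appends to the same list, so mid_fea[0] can come from any replica. After some searching, a forum thread gave a fix: store the intermediate outputs in a dictionary instead of a list, keyed by device, so that each device's features are kept separately. An example follows.
Reference: https://discuss.pytorch.org/t/register-forward-hook-with-multiple-gpus/12115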

import torch
import torch.nn as nn


class CBL(nn.Module):
    def __init__(self, in_channels, out_channels, kernel_size, groups):
        super(CBL, self).__init__()
        pad = (kernel_size - 1) // 2
        self.conv = nn.Sequential(
            nn.Conv2d(in_channels=in_channels, out_channels=out_channels,
                      kernel_size=kernel_size, stride=1, padding=pad,
                      groups=groups, bias=False),
            nn.BatchNorm2d(out_channels, momentum=0.9, eps=1e-5),
            nn.LeakyReLU(0.1),
        )

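        ## hook-related code
        self.mid_fea = {}       # maps device -> list of features captured on that device
        for index_i, (name, module) in enumerate(self.named_modules()):
            # named_modules() yields the CBL itself (0), self.conv (1), then the Conv2d (2)
            if index_i == 2:
                module.register_forward_hook(hook=self.layer_hook)
                # the hook must be registered before the forward pass
                break

    def layer_hook(self, module, fea_in, fea_out):
        # file the feature under the device this replica runs on
        self.mid_fea[fea_in[0].device].append(fea_out)

    def forward(self, x):
        self.mid_fea[x.device] = []     # reset this device's list on every forward pass
        out = self.conv(x)
        return out, self.mid_fea[x.device][0]   # return the model output and the intermediate feature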


if __name__ == "__main__":
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model = CBL(8, 16, 3, 1).to(device)
    model = nn.DataParallel(model)          # use multiple GPUs

    x = torch.ones(2, 8, 10, 10)
    out, mid_fea = model(x)
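
When the model's forward cannot be changed to return the feature, the same device-keyed idea can be applied from outside the module. The sketch below is not the author's method: attach_device_keyed_hook and gather_features are hypothetical helper names, and the code assumes that DataParallel splits the batch across GPUs in ascending device-index order (its default when device_ids is not given). Each replica files its output under its own device, and the pieces are then moved to the output's device and concatenated.

```python
import torch
import torch.nn as nn


def attach_device_keyed_hook(layer, store):
    """Register a forward hook that files each output under its own device."""
    def hook(module, inputs, output):
        # Every DataParallel replica runs on a different device, so keys never clash.
        store[output.device] = output
    return layer.register_forward_hook(hook)


def gather_features(store, target_device):
    """Move each per-device feature to target_device and concatenate on the batch dim."""
    # Assumption: the batch was split in ascending device-index order.
    devices = sorted(store, key=lambda d: -1 if d.index is None else d.index)
    return torch.cat([store[d].to(target_device) for d in devices], dim=0)


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
net = nn.Sequential(
    nn.Conv2d(8, 16, kernel_size=3, padding=1, bias=False),
    nn.BatchNorm2d(16),
    nn.LeakyReLU(0.1),
).to(device)

features = {}
handle = attach_device_keyed_hook(net[0], features)  # register before wrapping
model = nn.DataParallel(net)  # falls back to a plain call when no GPU is present

x = torch.ones(2, 8, 10, 10)
features.clear()              # drop stale entries from any earlier forward pass
out = model(x)
mid_fea = gather_features(features, out.device)
handle.remove()

print(out.device, mid_fea.device, tuple(mid_fea.shape))
```

Returning the feature from forward, as in the example above, lets DataParallel do this gathering itself; the external dictionary is mainly useful for models whose forward signature must stay unchanged.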
