Fixing RuntimeError: module must have its parameters and buffers on device cuda:0 (device_ids[0])

qq_37088976

Category: pytorch. Tags: pytorch

Article link: https://blog.csdn.net/qq_37088976/article/details/98186931

Fixing the error "RuntimeError: module must have its parameters and buffers on device cuda:0 (device_ids[0]) but found one of them on device: cuda:1"

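I hit the error above when calling PyTorch's DataParallel() on a model that had been trained and saved on multiple GPUs. I searched through a lot of material online and none of it offered a correct fix.
My guess at the cause: the parameters of the loaded multi-GPU model sit on more than one card, whereas the forward source of DataParallel (excerpted below) requires every parameter and buffer of the model to be on device_ids[0], which is cuda:0 here. Any tensor found on another device, such as cuda:1, triggers the error.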
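One way to test this guess is to list the devices that actually hold the module's parameters and buffers, which is the same set of tensors the check in forward walks over. The snippet is a minimal sketch, not the author's code: `tensor_devices` is a hypothetical helper name, the small `nn.Sequential` stands in for the loaded model, and moving the module to `device_ids[0]` before wrapping is an assumed workaround.

```python
from itertools import chain

import torch
import torch.nn as nn


def tensor_devices(module: nn.Module) -> set:
    """Return the set of devices holding the module's parameters and buffers."""
    return {t.device for t in chain(module.parameters(), module.buffers())}


# Stand-in for the loaded model; BatchNorm1d contributes buffers as well as parameters.
model = nn.Sequential(nn.Linear(4, 4), nn.BatchNorm1d(4))
print(tensor_devices(model))  # more than one entry here would trigger the error

if torch.cuda.is_available():
    # Assumed workaround: put every tensor on device_ids[0] before wrapping.
    device_ids = list(range(torch.cuda.device_count()))
    model = model.to(torch.device("cuda", device_ids[0]))
    model = nn.DataParallel(model, device_ids=device_ids)
    print(tensor_devices(model))  # should now contain a single device
```

If the set holds any device other than device_ids[0], the check in the DataParallel forward excerpt that follows raises the RuntimeError.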

def forward(self, *inputs, **kwargs):
    if not self.device_ids:
        return self.module(*inputs, **kwargs)

    for t in chain(self.module.parameters(), self.module.buffers()):
        if t.device != self.src_device_obj:
            raise RuntimeError("module must have its parameters and buffers "
                               "on device {} (device_ids[0]) but found one of "
                               "them on device: {}".format(self.src_device_obj, t.device))

    inputs, kwargs = self.scatter(inputs, kwargs, self.device_ids)
    if len(self.device_ids) == 1: