RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and

Believe yourself!!!

Tags: deep learning, pytorch

Article link: https://blog.csdn.net/qq_58011370/article/details/134640160

Problem description:

Traceback (most recent call last):
  File "train_cls.py", line 445, in <module>
    train(cfg=cfg)
  File "train_cls.py", line 288, in train
    cls1, cam1, cls2, cam2, cls3, cam3, cls4, cam4, attns = wetr(inputs, txt_features)
  File "/home/pengzhang/anaconda3/envs/TPRO/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/data5/pengzhang/TPRO_nochange/cls_network/model.py", line 155, in forward
    k_fea4 = k_fc4(know_feature)
  File "/home/pengzhang/anaconda3/envs/TPRO/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/data5/pengzhang/TPRO_nochange/cls_network/model.py", line 34, in forward
    x = self.fc1(x)
  File "/home/pengzhang/anaconda3/envs/TPRO/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/pengzhang/anaconda3/envs/TPRO/lib/python3.8/site-packages/torch/nn/modules/linear.py", line 103, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:1! (when checking argument for argument mat1 in method wrapper_addmm)

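The error message says that, during training, tensors ended up on different devices, here the CPU and a GPU. Anyone who has run deep learning code knows that the inputs and the model must all be placed on the same device, GPU or CPU, before training. At first I also assumed this was an ordinary device conflict, but after checking my code carefully I still could not find the cause. Eventually I turned to the last frame of the traceback above: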

File "/home/pengzhang/anaconda3/envs/TPRO/lib/python3.8/site-packages/torch/nn/modules/linear.py", line 103, in forward
    return F.linear(input, self.weight, self.bias)

This is where I found the problem. Opening linear.py shows the following source:

    def forward(self, input: Tensor) -> Tensor:
        return F.linear(input, self.weight, self.bias)

From this I concluded that the input variable `input` was on the GPU while `self.weight` and `self.bias` were still on the CPU. Moving `self.weight` and `self.bias` to the same GPU as `input` resolves the error, namely:

    def forward(self, input: Tensor) -> Tensor:
        device = input.device
        return F.linear(input, self.weight.to(device), self.bias.to(device))
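The conclusion above can also be checked, and the layer moved, from your own code instead of editing the installed PyTorch source. The following is a minimal sketch under stated assumptions: `report_devices` is a hypothetical helper (not part of PyTorch), and the `nn.Linear(4, 2)` layer and random input only stand in for the real `k_fc4` and `know_feature`, whose definitions are not shown in the post.

```python
import torch
import torch.nn as nn


def report_devices(module: nn.Module, *tensors: torch.Tensor) -> dict:
    """Hypothetical helper: map each parameter, buffer, and input to its device."""
    devices = {}
    for name, param in module.named_parameters():
        devices[f"param:{name}"] = str(param.device)
    for name, buf in module.named_buffers():
        devices[f"buffer:{name}"] = str(buf.device)
    for i, tensor in enumerate(tensors):
        devices[f"input:{i}"] = str(tensor.device)
    return devices


# Use the GPU when one is available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

layer = nn.Linear(4, 2)               # stand-in layer; parameters start on the CPU
x = torch.randn(3, 4, device=device)  # stand-in input, created on the chosen device

before = report_devices(layer, x)     # shows any CPU/GPU mismatch
layer.to(x.device)                    # nn.Module.to() moves the parameters in place
after = report_devices(layer, x)

out = layer(x)                        # no device error once everything matches
print(before)
print(after)
```

Moving the layer once at the call site leaves site-packages untouched. By contrast, the patched `forward` above copies the weight and bias to the input's device on every call for as long as they live on a different device.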

Open question:

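linear.py is part of the official PyTorch source. I have used this layer before without ever hitting this problem: the program placed everything on the GPU automatically, and I only had to specify the device for the model and the input. I do not know why the error appeared this time. If you know the reason, please leave a comment below!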
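One possible explanation, offered only as an assumption because model.py is not shown: `model.to(device)` moves only the parameters of submodules that are registered on the model, meaning layers assigned directly as attributes or held in `nn.ModuleList` / `nn.ModuleDict`. Layers kept in a plain Python list, or created outside the model, are not registered and stay on the CPU. The traceback calls `k_fc4(know_feature)` as a bare name rather than `self.k_fc4`, which would be consistent with this but does not prove it. The sketch below uses two hypothetical classes, `PlainListModel` and `ModuleListModel`, to show the difference in registration.

```python
import torch.nn as nn


class PlainListModel(nn.Module):
    """Hypothetical model: layers in a plain Python list are NOT registered."""

    def __init__(self):
        super().__init__()
        self.heads = [nn.Linear(4, 2) for _ in range(2)]


class ModuleListModel(nn.Module):
    """Hypothetical model: layers in nn.ModuleList ARE registered."""

    def __init__(self):
        super().__init__()
        self.heads = nn.ModuleList([nn.Linear(4, 2) for _ in range(2)])


# parameters() only yields registered parameters, and these are exactly
# the ones that model.to(device) / model.cuda() will move.
plain_count = len(list(PlainListModel().parameters()))
registered_count = len(list(ModuleListModel().parameters()))

print(plain_count)       # 0: nothing here would follow model.to(device)
print(registered_count)  # 4: weight and bias of each of the two layers
```

If the model's `parameters()` does not include the layer that raises the error, registering that layer (or moving it explicitly with `.to(device)`) is a less invasive fix than changing linear.py.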
