pytorch测试报错:RuntimeError: cuda runtime error (10) : invalid device ordinal at torch/csrc/cuda/Module

模型在服务器多gpu上训练,测试在自己台式机上进行,只有一块gpu,测试报错:

  File "/home/fuxueping/sdb/PycharmProjects/face_recognition/test_face_recognition_pytorc
h.py", line 563, in <module>
    test(method)
  File "/home/fuxueping/sdb/PycharmProjects/face_recognition/test_face_recognition_pytorc
h.py", line 547, in test
    regfeat = extractFeature_mobileFace(net, regimglist, batch)
  File "/home/fuxueping/sdb/PycharmProjects/face_recognition/test_face_recognition_pytorc
h.py", line 190, in extractFeature_mobileFace
    checkpoint = torch.load(model)
  File "/home/fuxueping/anaconda2/envs/pytorch_yolov3/lib/python3.6/site-
packages/torch/serialization.py", line 303, in load
    return _load(f, map_location, pickle_module)
  File "/home/fuxueping/anaconda2/envs/pytorch_yolov3/lib/python3.6/site-
packages/torch/serialization.py", line 469, in _load
    result = unpickler.load()
  File "/home/fuxueping/anaconda2/envs/pytorch_yolov3/lib/python3.6/site-
packages/torch/serialization.py", line 437, in persistent_load
    data_type(size), location)
  File "/home/fuxueping/anaconda2/envs/pytorch_yolov3/lib/python3.6/site-
packages/torch/serialization.py", line 88, in default_restore_location

    result = fn(storage, location)
  File "/home/fuxueping/anaconda2/envs/pytorch_yolov3/lib/python3.6/site-
packages/torch/serialization.py", line 70, in _cuda_deserialize
    return obj.cuda(device)
  File "/home/fuxueping/anaconda2/envs/pytorch_yolov3/lib/python3.6/site-
packages/torch/_utils.py", line 68, in _cuda
    with torch.cuda.device(device):
  File "/home/fuxueping/anaconda2/envs/pytorch_yolov3/lib/python3.6/site-
packages/torch/cuda/__init__.py", line 227, in __enter__
    torch._C._cuda_setDevice(self.idx)
RuntimeError: cuda runtime error (10) : invalid device ordinal at torch/csrc/cuda/Module.cpp:32

出错场景:我是在服务器上用显卡4,5,6,7上训练我的模型,但是模型还在继续跑,所以我在自己的台式机上测试训练模型的中间结果,台式机只有gpu0。在pytorch上重新load训练好的深度学习模型时报错:RuntimeError: cuda runtime error (10) : invalid device ordinal at torch/csrc/cuda/Module.cpp:32。

出错原因;引起这种报错的原因是因为pytorch在save模型的时候会把显卡的信息也保存,当重新load的时候,发现不是同一一块显卡就报错invalid device ordinal。

解决方法:

未修改前代码:

 #gpu 0
    checkpoint = torch.load(model)
    state_dict = {k.replace("module.", ""): v for k, v in checkpoint.items()}
    net.load_state_dict(state_dict)

修改后:

    #gpu 0
    checkpoint = torch.load(model,map_location=lambda storage, loc: storage.cuda(0))
    state_dict = {k.replace("module.", ""): v for k, v in checkpoint.items()}
    net.load_state_dict(state_dict)

详细说明,请参考:https://blog.csdn.net/yinhui_zhang/article/details/86572232

 

评论 2
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包

打赏作者

猫猫与橙子

你的鼓励将是我创作的最大动力

¥1 ¥2 ¥4 ¥6 ¥10 ¥20
扫码支付:¥1
获取中
扫码支付

您的余额不足,请更换扫码支付或充值

打赏作者

实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值