return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None [The Most Complete Set of Solutions]

黄狗操作员

Last modified 2024-03-23 15:35:26


Tags: deep learning, artificial intelligence, computer vision

First published 2024-03-23 15:34:58

Article link: https://blog.csdn.net/HJKHUKB/article/details/136967942


Error message:

return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
RuntimeError: CUDA error: invalid device ordinal
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

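Solution 1: Follow the hint at the end of the message and set the environment variable CUDA_LAUNCH_BLOCKING=1. This makes kernel launches synchronous, so the stack trace points at the call that actually failed, which helps locate the cause.

The simplest way is to set the variable on the command line when you run the script: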

CUDA_LAUNCH_BLOCKING=1 python train.py
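
If you launch training from an IDE or a notebook, where prefixing the command is awkward, the same variable can be set from Python. This is a minimal sketch, assuming the assignment runs before any CUDA work starts; placing it above the torch import is the safest spot.

```python
import os

# Set the variable for this process before CUDA is initialized.
# Assumption: nothing has touched the GPU yet at this point.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

# import torch   # import torch and run your training code only after this line
print(os.environ["CUDA_LAUNCH_BLOCKING"])
```

Remember to remove the setting once the bug is found, since synchronous launches slow training down.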

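Solution 2: The author of the code you copied trained on multiple GPUs, and the device indices in that code do not match your machine. Check every device reference in the code carefully.

Quick search: press Ctrl+F and search for cuda or device to see whether a specific index is hard-coded.

For example, the code you copied may use cuda:2.

But on a laptop or a single-GPU desktop, you only have cuda:0.

Replacing cuda:2 with cuda:0 resolves the error.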
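
The search-and-replace above can also be sketched in code. The two helpers below are hypothetical names introduced for illustration: `find_hardcoded_devices` lists every `cuda:N` string in a piece of source text, and `resolve_device` falls back to `cuda:0` (or `cpu`) when the requested index does not exist. The GPU count is passed in as a plain integer so the sketch runs without a GPU.

```python
import re

# Matches hard-coded device strings such as "cuda:2" and captures the index.
DEVICE_PATTERN = re.compile(r"cuda:(\d+)")


def find_hardcoded_devices(source: str) -> list:
    """Return (line_number, gpu_index) for every "cuda:N" found in `source`."""
    hits = []
    for line_number, line in enumerate(source.splitlines(), start=1):
        for match in DEVICE_PATTERN.finditer(line):
            hits.append((line_number, int(match.group(1))))
    return hits


def resolve_device(requested: str, gpu_count: int) -> str:
    """Return `requested` if this machine can serve it, else a safe fallback."""
    if not requested.startswith("cuda"):
        return requested      # "cpu" and other backends pass through unchanged
    if gpu_count == 0:
        return "cpu"          # no usable GPU at all
    # A bare "cuda" means the default GPU, treated here as index 0.
    index = int(requested.split(":", 1)[1]) if ":" in requested else 0
    return requested if index < gpu_count else "cuda:0"


copied_code = 'model = model.to("cuda:2")\nbatch = 16\n'
print(find_hardcoded_devices(copied_code))
print(resolve_device("cuda:2", gpu_count=1))
```

With PyTorch installed, `torch.device(resolve_device("cuda:2", torch.cuda.device_count()))` would give a device that exists on the current machine.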

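Solution 3: Your libraries may be incompatible with your GPU driver version. This is relatively rare, and changing the driver is not something to do lightly because it can take a lot of time. Rule out the other causes first.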
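
Before touching the driver, it helps to see which versions are actually in play. The sketch below uses a hypothetical helper, `cuda_report`, which gathers what PyTorch itself reports; it returns a placeholder if PyTorch is not installed, and it skips the GPU names when CUDA is not available.

```python
def cuda_report() -> dict:
    """Collect the version facts needed to judge library/driver compatibility."""
    try:
        import torch
    except ImportError:
        return {"torch": None}  # PyTorch is not installed in this environment

    available = torch.cuda.is_available()
    count = torch.cuda.device_count() if available else 0
    return {
        "torch": torch.__version__,
        "built_for_cuda": torch.version.cuda,  # None for CPU-only builds
        "cuda_available": available,
        "gpu_count": count,
        "gpu_names": [torch.cuda.get_device_name(i) for i in range(count)],
    }


for key, value in cuda_report().items():
    print(f"{key}: {value}")
```

If `built_for_cuda` is newer than the CUDA version that `nvidia-smi` reports for your driver, a version mismatch is a plausible cause; if `gpu_count` is smaller than the index used in the code, Solution 2 applies instead.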

Solution 4: Reboot (I know this may sound absurd, but some blog posts do report that it works).

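Solution 5: Memory problems

1. Switch to a more powerful GPU

2. Maximize the memory available to the job

3. Free memory you no longer need
import torch
# Return cached, unused blocks held by PyTorch's allocator to the GPU
if hasattr(torch.cuda, 'empty_cache'):
    torch.cuda.empty_cache()

4. You may need to kill some processes that are occupying the GPU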
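
To decide which of the memory steps above is worth trying, it can help to look at the numbers first. This is a rough sketch with a hypothetical helper, `gpu_memory_summary`, assuming a reasonably recent PyTorch that provides `torch.cuda.mem_get_info`; it returns an empty list when PyTorch or CUDA is unavailable.

```python
def gpu_memory_summary() -> list:
    """Return one dict per visible GPU with memory figures in MiB."""
    try:
        import torch
    except ImportError:
        return []                      # PyTorch is not installed
    if not torch.cuda.is_available():
        return []                      # no usable GPU on this machine

    mib = 1024 ** 2
    summary = []
    for index in range(torch.cuda.device_count()):
        free, total = torch.cuda.mem_get_info(index)  # bytes, device-wide
        summary.append({
            "device": f"cuda:{index}",
            "free_mib": free // mib,
            "total_mib": total // mib,
            # Memory used by tensors of this process
            "allocated_by_torch_mib": torch.cuda.memory_allocated(index) // mib,
            # Memory held by this process's caching allocator
            "reserved_by_torch_mib": torch.cuda.memory_reserved(index) // mib,
        })
    return summary


for entry in gpu_memory_summary():
    print(entry)
```

If little memory is free while this process has reserved almost none, other processes are probably holding the GPU, which points to step 4; if the reserved figure is much larger than the allocated one, `torch.cuda.empty_cache()` from step 3 may help.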

Follow-on problems you may hit while debugging:

ImportError: cannot import name 'COMMON_SAFE_ASCII_CHARACTERS' from 'charset_normalizer.constant'

Fix:
pip install chardet

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument mat2 in method wrapper_mm)
or
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument mat1 in method wrapper_mm)

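This happens because tensors in your code live on different devices. In the messages above, one operand is on cuda:0 and the other is on the CPU; the same error also appears when the operands sit on different GPUs.

Fix:

device = input.device
Then append .to(device) to the output, so that every tensor in the operation is on that device.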
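
The fix above can be wrapped in a small helper. `align_to` is a hypothetical name introduced for illustration; it relies only on the `.device` attribute and the `.to()` method that PyTorch tensors provide, so the block itself does not import torch.

```python
def align_to(reference, *others):
    """Move every tensor in `others` to the device of `reference`."""
    return tuple(t.to(reference.device) for t in others)


# Usage with PyTorch (assumes torch is installed and a GPU is present):
#   x = torch.randn(4, 8, device="cuda:0")
#   w = torch.randn(8, 2)        # created on the CPU by default
#   (w,) = align_to(x, w)        # both operands are now on cuda:0
#   y = x @ w                    # no device mismatch
```

Taking the device from an existing tensor, instead of hard-coding a string, keeps the code working on machines with a different number of GPUs.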

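In most cases the first three solutions are enough. The most likely cause is that the code you copied or adapted was written for the original author's devices, which do not match yours. Check the device settings carefully, or use .to(device) to move the computation onto your own GPU or GPUs.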

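I read many blog posts while working on this and hope this write-up helps you solve the problem in one place. The posts I referred to are listed below.

If anything here infringes your rights, please contact me and I will remove it.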

https://blog.51cto.com/u_15717393/5471457#_22

Debug Pytorch: RuntimeError: CUDA error: device-side assert triggered (CSDN blog)

https://wenku.csdn.net/answer/5b75c85ae73511edbcb5fa163eeb3507

Problem solved | RuntimeError: CUDA error: invalid device ordinal CUDA kernel errors (CSDN blog)

[PyTorch] CUDA error: device-side assert triggered (CSDN blog)

https://www.cnblogs.com/urahyou/p/17832397.html

Solving RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda: (CSDN blog)