A reference fix for RuntimeError: CUDA error: initialization error

Preface

My test environment:
Ubuntu 20.04, Python 3.8

1. Problem description

Running a Python program that trains models in a multiprocessing pool produced the following error:

Traceback (most recent call last):
  File "/home/wong/ProgramFiles/anaconda3/envs/pytorch_env/lib/python3.8/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/home/wong/ProgramFiles/anaconda3/envs/pytorch_env/lib/python3.8/multiprocessing/pool.py", line 51, in starmapstar
    return list(itertools.starmap(args[0], args[1]))
  File "/home/wong/Datum/workspace_demo/packagetest_backbone_vs_pooling_ws/src/packagetest_backbone_vs_pooling/backbone_vs_pooling/train.py", line 103, in train_test
    backbone.to(device)
  File "/home/wong/ProgramFiles/anaconda3/envs/pytorch_env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1160, in to
    return self._apply(convert)
  File "/home/wong/ProgramFiles/anaconda3/envs/pytorch_env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 810, in _apply
    module._apply(fn)
  File "/home/wong/ProgramFiles/anaconda3/envs/pytorch_env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 810, in _apply
    module._apply(fn)
  File "/home/wong/ProgramFiles/anaconda3/envs/pytorch_env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 810, in _apply
    module._apply(fn)
  [Previous line repeated 1 more time]
  File "/home/wong/ProgramFiles/anaconda3/envs/pytorch_env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 833, in _apply
    param_applied = fn(param)
  File "/home/wong/ProgramFiles/anaconda3/envs/pytorch_env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1158, in convert
    return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
  File "/home/wong/ProgramFiles/anaconda3/envs/pytorch_env/lib/python3.8/site-packages/torch/cuda/__init__.py", line 298, in _lazy_init
    torch._C._cuda_init()
RuntimeError: CUDA error: initialization error
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "main.py", line 138, in <module>
    main(args)
  File "main.py", line 109, in main
    all_results = pool.starmap(train_test, params)
  File "/home/wong/ProgramFiles/anaconda3/envs/pytorch_env/lib/python3.8/multiprocessing/pool.py", line 372, in starmap
    return self._map_async(func, iterable, starmapstar, chunksize).get()
  File "/home/wong/ProgramFiles/anaconda3/envs/pytorch_env/lib/python3.8/multiprocessing/pool.py", line 771, in get
    raise self._value
RuntimeError: CUDA error: initialization error
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions


2. Solution

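The error means that CUDA failed to initialize inside a worker process of the pool. On Linux the default multiprocessing start method is fork, and the CUDA runtime cannot be used in a forked child once the parent process has already initialized CUDA.
The fix below is the one that worked in my own tests: change the start method to spawn, so that each worker starts in a fresh interpreter and initializes CUDA on its own.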

Add the following to the main program:

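import torch.multiprocessing as mp

if __name__ == '__main__':
    # Set the start method to spawn before any pool or process is created.
    mp.set_start_method('spawn', force=True)
    main(args)  # main and args are defined elsewhere in main.py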

Then run the program again; it should now run successfully.
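
If changing the start method globally is not desirable, a spawn context can be requested for just one pool. The sketch below is only an illustration under a few assumptions: it uses the standard `multiprocessing` module (which `torch.multiprocessing` wraps with the same API), the `train_test` body is a hypothetical stand-in for the real worker from the traceback, and `run_all` is a name made up for this example.

```python
import multiprocessing as mp  # torch.multiprocessing offers the same interface


def train_test(run_id, scale):
    """Hypothetical stand-in for the real train_test worker.

    In the real program this is where the model would be moved to the GPU
    (for example backbone.to(device)), so CUDA is first touched inside the
    worker rather than in the parent process.
    """
    return run_id * scale


def run_all(params, processes=2):
    # A spawn context starts every worker in a fresh interpreter, so workers
    # do not inherit any CUDA state from the parent process.
    ctx = mp.get_context("spawn")
    with ctx.Pool(processes=processes) as pool:
        # starmap unpacks each tuple in params into train_test's arguments.
        return pool.starmap(train_test, params)


if __name__ == "__main__":
    # The guard is required with spawn: each worker re-imports this module,
    # and unguarded code would try to create a new pool in every worker.
    print(run_all([(1, 10), (2, 10), (3, 10)]))
```

With spawn, the worker function and its arguments are pickled and sent to the workers, so `train_test` has to be defined at module level and its parameters must be picklable.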
