CUDA unknown error - this may be due to an incorrectly set up environment, e.g. changing env variabl

Problem description

When installing OpenPCDet with
python setup.py develop
the following error is reported:

UserWarning: CUDA initialization: CUDA unknown error - this may be due to an incorrectly set up environment, e.g. changing env variable CUDA_VISIBLE_DEVICES after program start. Setting the available devices to be zero. (Triggered internally at  /opt/conda/conda-bld/pytorch_1616554790289/work/c10/cuda/CUDAFunctions.cpp:109.)
  return torch._C._cuda_getDeviceCount() > 0
No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda-10.2'
running develop
running egg_info
writing pcdet.egg-info/PKG-INFO
writing dependency_links to pcdet.egg-info/dependency_links.txt
writing requirements to pcdet.egg-info/requires.txt
writing top-level names to pcdet.egg-info/top_level.txt
reading manifest file 'pcdet.egg-info/SOURCES.txt'
writing manifest file 'pcdet.egg-info/SOURCES.txt'
running build_ext
building 'pcdet.ops.iou3d_nms.iou3d_nms_cuda' extension
Traceback (most recent call last):
  File "setup.py", line 125, in <module>
    'src/sampling_gpu.cu',
  File "/home/gengruixu/anaconda3/envs/pytorch18/lib/python3.6/site-packages/setuptools/__init__.py", line 153, in setup
    return distutils.core.setup(**attrs)
  File "/home/gengruixu/anaconda3/envs/pytorch18/lib/python3.6/distutils/core.py", line 148, in setup
    dist.run_commands()
  File "/home/gengruixu/anaconda3/envs/pytorch18/lib/python3.6/distutils/dist.py", line 955, in run_commands
    self.run_command(cmd)
  File "/home/gengruixu/anaconda3/envs/pytorch18/lib/python3.6/distutils/dist.py", line 974, in run_command
    cmd_obj.run()
  File "/home/gengruixu/anaconda3/envs/pytorch18/lib/python3.6/site-packages/setuptools/command/develop.py", line 34, in run
    self.install_for_development()
  File "/home/gengruixu/anaconda3/envs/pytorch18/lib/python3.6/site-packages/setuptools/command/develop.py", line 136, in install_for_development
    self.run_command('build_ext')
  File "/home/gengruixu/anaconda3/envs/pytorch18/lib/python3.6/distutils/cmd.py", line 313, in run_command
    self.distribution.run_command(command)
  File "/home/gengruixu/anaconda3/envs/pytorch18/lib/python3.6/distutils/dist.py", line 974, in run_command
    cmd_obj.run()
  File "/home/gengruixu/anaconda3/envs/pytorch18/lib/python3.6/site-packages/setuptools/command/build_ext.py", line 79, in run
    _build_ext.run(self)
  File "/home/gengruixu/anaconda3/envs/pytorch18/lib/python3.6/distutils/command/build_ext.py", line 339, in run
    self.build_extensions()
  File "/home/gengruixu/anaconda3/envs/pytorch18/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 708, in build_extensions
    build_ext.build_extensions(self)
  File "/home/gengruixu/anaconda3/envs/pytorch18/lib/python3.6/distutils/command/build_ext.py", line 448, in build_extensions
    self._build_extensions_serial()
  File "/home/gengruixu/anaconda3/envs/pytorch18/lib/python3.6/distutils/command/build_ext.py", line 473, in _build_extensions_serial
    self.build_extension(ext)
  File "/home/gengruixu/anaconda3/envs/pytorch18/lib/python3.6/site-packages/setuptools/command/build_ext.py", line 196, in build_extension
    _build_ext.build_extension(self, ext)
  File "/home/gengruixu/anaconda3/envs/pytorch18/lib/python3.6/distutils/command/build_ext.py", line 533, in build_extension
    depends=ext.depends)
  File "/home/gengruixu/anaconda3/envs/pytorch18/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 524, in unix_wrap_ninja_compile
    cuda_post_cflags = unix_cuda_flags(cuda_post_cflags)
  File "/home/gengruixu/anaconda3/envs/pytorch18/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 423, in unix_cuda_flags
    cflags + _get_cuda_arch_flags(cflags))
  File "/home/gengruixu/anaconda3/envs/pytorch18/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 1561, in _get_cuda_arch_flags
    arch_list[-1] += '+PTX'
IndexError: list index out of range

Solution

As the log shows, the first error reported is:

CUDA unknown error - this may be due to an incorrectly set up environment, e.g. changing env variable CUDA_VISIBLE_DEVICES after program start. Setting the available devices to be zero.
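
The link between this warning and the final IndexError can be illustrated with a simplified sketch. It assumes, based on the last frame of the traceback, that the architecture list is built from the CUDA devices PyTorch can see; `arch_list_sketch` is a hypothetical stand-in written for illustration, not PyTorch's actual code.

```python
from typing import List, Tuple


def arch_list_sketch(capabilities: List[Tuple[int, int]]) -> List[str]:
    """Hypothetical stand-in for how an arch list might be built from visible devices."""
    arch_list: List[str] = []
    for major, minor in capabilities:
        arch = f"{major}.{minor}"
        if arch not in arch_list:
            arch_list.append(arch)
    arch_list = sorted(arch_list)
    # With zero visible devices the list is empty, so indexing [-1] raises
    # IndexError, which matches the last line of the traceback.
    arch_list[-1] += "+PTX"
    return arch_list


if __name__ == "__main__":
    print(arch_list_sketch([(7, 5)]))  # one visible device
    try:
        arch_list_sketch([])  # no visible devices, as after the CUDA warning
    except IndexError as exc:
        print("IndexError:", exc)
```

Under that assumption, the IndexError is only a symptom: once the device count is set to zero, the build step has no architecture to target, so the CUDA initialization problem has to be fixed first.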


The following fix was found at https://github.com/pytorch/pytorch/issues/49081#issuecomment-766793705:

yurunsheng1 commented on 25 Jan
apt-get install nvidia-modprobe

This works for me.

This also worked for me.

The nvidia-modprobe utility is used by user-space NVIDIA driver components to make sure the NVIDIA kernel module is loaded and that the NVIDIA character device files are present. These facilities are normally provided by Linux distribution configuration systems such as udev.
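
The description above suggests a quick check that can be run before and after installing nvidia-modprobe. The following is a minimal sketch using only the standard library; the function name `nvidia_device_report` is hypothetical, and the default paths `/dev/nvidia*` and `/proc/modules` are assumptions about a typical Linux setup rather than something stated in the post.

```python
import glob
import shutil


def nvidia_device_report(dev_glob: str = "/dev/nvidia*",
                         modules_path: str = "/proc/modules") -> dict:
    """Report NVIDIA device files, kernel module state, and nvidia-modprobe location."""
    device_files = sorted(glob.glob(dev_glob))
    try:
        with open(modules_path) as fh:
            # The first field of each line in /proc/modules is the module name.
            module_loaded = any(
                line.split()[0] == "nvidia" for line in fh if line.strip()
            )
    except OSError:
        module_loaded = False
    return {
        "device_files": device_files,
        "module_loaded": module_loaded,
        # None if the nvidia-modprobe binary is not on PATH.
        "nvidia_modprobe": shutil.which("nvidia-modprobe"),
    }


if __name__ == "__main__":
    for key, value in nvidia_device_report().items():
        print(f"{key}: {value}")
```

If the device file list is empty or the module is reported as not loaded, that would be consistent with the situation the quoted fix addresses, although other causes are possible.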

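This error usually means that the program could not find a usable GPU device when CUDA was initialized, for example after the CUDA_VISIBLE_DEVICES environment variable was set. It may be caused by one of the following:

1. The CUDA driver is not installed correctly. Make sure the driver that matches your GPU is installed; the latest driver for your GPU can be downloaded from the official NVIDIA website.
2. The CUDA version is incompatible with the PyTorch version. Make sure the installed PyTorch build supports your CUDA version; the compatibility matrix in the official PyTorch documentation lists which PyTorch versions fit which CUDA versions.
3. A hardware problem. If the computer or server has no usable GPU device, the program cannot run on a GPU. Make sure the hardware includes a GPU suitable for deep learning.

To resolve the problem, try the following:

1. Make sure the CUDA driver is installed correctly and its version is compatible with PyTorch.
2. Check the CUDA_VISIBLE_DEVICES environment variable. If you set it manually, make sure the value is correct and that it is set before the program starts.
3. If no GPU device is available, switch the program to CPU mode. The following code forces PyTorch to use the CPU:

```python
import os

# Set this before CUDA is initialized (ideally before importing torch).
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"
```

This hides all available GPU devices, so computation runs on the CPU.

If the problem persists after trying the steps above, check the environment settings, the hardware configuration, and the installation process for errors.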
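
Both the warning text and the advice above stress that CUDA_VISIBLE_DEVICES should be set before the program starts. One way to guarantee that, sketched here under the assumption that the workload can be launched as a child process, is to put the variable into the child's environment at launch time. The helper name `run_with_visible_devices` is hypothetical.

```python
import os
import subprocess
import sys


def run_with_visible_devices(devices: str, code: str) -> str:
    """Run a Python snippet in a child process with CUDA_VISIBLE_DEVICES preset."""
    # Copy the current environment and set the variable before the child starts,
    # so it is already in place when any CUDA initialization happens there.
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=devices)
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    snippet = "import os; print(os.environ['CUDA_VISIBLE_DEVICES'])"
    print(run_with_visible_devices("0", snippet))
```

The same effect can be had from a shell by prefixing the command with the variable assignment; the point of the sketch is only that the value is fixed before the process that uses CUDA begins.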
