Why installing CUDA fails with ModuleNotFoundError: No module named 'Quirks'

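Not long ago, while experimenting with the video action recognition model at https://github.com/StanfordVL/rubiksnet, I found that it requires Python 3.7 or later, so I took the chance to try Python 3.9.6 and built it from source: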

wget https://www.python.org/ftp/python/3.9.6/Python-3.9.6.tgz
tar xf Python-3.9.6.tgz
cd Python-3.9.6
sudo apt-get install build-essential python3-dev python3-setuptools python3-pip libncursesw5-dev libgdbm-dev libc6-dev zlib1g-dev libsqlite3-dev tk-dev libssl-dev openssl libffi-dev
./configure --with-ssl --prefix=/usr/local/python3
sudo make
sudo make install

Then I manually changed the python3 symlink so that it points to python3.9 instead of the original python3.6:

cd /usr/bin
rm python3
ln -s /usr/local/python3/bin/python3.9 python3

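Next came CUDA 11.1.1. My server uses an RTX 3090, which needs at least this version to work properly. The newest CUDA 11.4 was not an option either, because PyTorch 1.9, the latest release at the time, only supports up to CUDA 11.1; with CUDA 11.4 installed, any code that uses CUDA is bound to fail with RuntimeError: CUDA error: no kernel image is available for execution on the device. The commands were: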

wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-ubuntu1804.pin
sudo mv cuda-ubuntu1804.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget https://developer.download.nvidia.com/compute/cuda/11.1.1/local_installers/cuda-repo-ubuntu1804-11-1-local_11.1.1-455.32.00-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu1804-11-1-local_11.1.1-455.32.00-1_amd64.deb
sudo apt-key add /var/cuda-repo-ubuntu1804-11-1-local/7fa2af80.pub
sudo apt-get update
sudo apt-get -y install cuda

The installation failed with the following error:

Traceback (most recent call last):
  File "/usr/bin/quirks-handler", line 26, in <module>
    import Quirks.quirkapplier
ModuleNotFoundError: No module named 'Quirks'
dpkg: error processing package nvidia-dkms-470 (--configure):
 installed nvidia-dkms-470 package post-installation script subprocess returned error exit status 1
dpkg: dependency problems prevent configuration of cuda-drivers-470:
 cuda-drivers-470 depends on nvidia-dkms-470 (>= 470.57.02); however:
  Package nvidia-dkms-470 is not configured yet.
...
update-initramfs: Generating /boot/initrd.img-5.4.0-72-generic
W: Possible missing firmware /lib/firmware/rtl_nic/rtl8125a-3.fw for module r8169
W: Possible missing firmware /lib/firmware/rtl_nic/rtl8168fp-3.fw for module r8169
Errors were encountered while processing:
 nvidia-dkms-470
 cuda-drivers-470
 nvidia-driver-470
 cuda-drivers
 cuda-runtime-11-4
 cuda-11-4
 cuda-demo-suite-11-4
 cuda
E: Sub-process /usr/bin/dpkg returned an error code (1)

I tried reinstalling ubuntu-drivers-common (the package that contains Quirks) on its own:

sudo apt-get install --reinstall ubuntu-drivers-common

/usr/bin/quirks-handler was updated, but the same error about the missing Quirks module still appeared, along with this hint:

you can either revert the python3 link to the previous version, or change the python3 executable specified in /usr/bin/quirks-handler to the previous version executable(ex: python3.5).
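
To see which interpreter the hint is talking about, it can help to read the script's shebang line and test whether that interpreter can import Quirks. The following is a minimal sketch, not part of the original troubleshooting: the helper name shebang_interpreter is made up for illustration, and the explanation it probes (that Quirks is installed only for the distribution's own Python, so a self-built Python under /usr/local/python3 does not find it) is an assumption to verify on the affected machine.

```shell
#!/bin/sh
# shebang_interpreter FILE
# Hypothetical helper: print the interpreter named on FILE's "#!" line,
# or nothing if the first line is not a shebang.
shebang_interpreter() {
    head -n 1 "$1" | sed -n 's/^#![[:space:]]*//p'
}

script=/usr/bin/quirks-handler   # path taken from the traceback above
interp=""
if [ -r "$script" ]; then
    interp=$(shebang_interpreter "$script")
fi

if [ -n "$interp" ]; then
    echo "shebang interpreter: $interp"
    # Follow symlinks to the real binary (first word only, in case of "env python3").
    readlink -f "${interp%% *}"
    # $interp is unquoted on purpose so "/usr/bin/env python3" splits into two words.
    if $interp -c 'import Quirks' 2>/dev/null; then
        echo "Quirks is importable with this interpreter"
    else
        echo "Quirks is NOT importable with this interpreter"
    fi
fi
```

If the resolved binary turns out to be the self-built python3.9 and the import fails, that would be consistent with the hint: the script itself is fine, but python3 no longer leads to the interpreter that has Quirks installed.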

It looked as if Python 3.9.6 was too new for the CUDA installation to support, so I pointed the python3 link back to python3.6:

cd /usr/bin
rm python3
ln -s python3.6 python3
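
If Python 3.9 is still needed for the model once the link is restored, one option is to call the self-built interpreter by its full path and give it a virtual environment of its own, so that /usr/bin/python3 never has to change. This is only a sketch under assumptions: make_venv is a hypothetical helper, the environment directory is an arbitrary choice, and the interpreter path follows from the --prefix used in the build above.

```shell
#!/bin/sh
# make_venv PYTHON DIR
# Hypothetical helper: create a virtual environment in DIR with the given
# interpreter, refusing to continue if that interpreter does not exist.
make_venv() {
    py=$1
    dir=$2
    if [ ! -x "$py" ]; then
        echo "interpreter not found: $py" >&2
        return 1
    fi
    "$py" -m venv "$dir"
}

PY39=/usr/local/python3/bin/python3.9   # from --prefix=/usr/local/python3
VENV_DIR="$HOME/venvs/rubiksnet"        # assumed location, pick any directory

if make_venv "$PY39" "$VENV_DIR"; then
    # Inside the environment, "python" is 3.9; the system python3 is untouched.
    . "$VENV_DIR/bin/activate"
    python --version
fi
```

With this layout, system tools such as quirks-handler keep using the distribution's Python, while the model code runs on 3.9 inside the environment.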

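After that, reinstalling CUDA succeeded. Remember to reboot after installing CUDA so that the GPU driver takes effect; otherwise you may still see the following error: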

GeForce RTX 3090 with CUDA capability sm_86 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70.
If you want to use the GeForce RTX 3090 GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/

  warnings.warn(incompatible_device_warn.format(device_name, capability, " ".join(arch_list), device_name))
>>> print(a)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/python3/lib/python3.9/site-packages/torch/tensor.py", line 193, in __repr__
    return torch._tensor_str._str(self)
  File "/usr/local/python3/lib/python3.9/site-packages/torch/_tensor_str.py", line 383, in _str
    return _str_intern(self)
  File "/usr/local/python3/lib/python3.9/site-packages/torch/_tensor_str.py", line 358, in _str_intern
    tensor_str = _tensor_str(self, indent)
  File "/usr/local/python3/lib/python3.9/site-packages/torch/_tensor_str.py", line 242, in _tensor_str
    formatter = _Formatter(get_summarized_data(self) if summarize else self)
  File "/usr/local/python3/lib/python3.9/site-packages/torch/_tensor_str.py", line 90, in __init__
    nonzero_finite_vals = torch.masked_select(tensor_view, torch.isfinite(tensor_view) & tensor_view.ne(0))
RuntimeError: CUDA error: no kernel image is available for execution on the device

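This error means either that the CUDA version supported by the installed PyTorch build does not match the CUDA version now installed, or that the versions match but the machine has not been rebooted since CUDA was installed.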
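
The warning above already carries what is needed to tell the two cases apart: the GPU's capability (sm_86) and the list of architectures the PyTorch build was compiled for. A minimal sketch of that comparison follows. arch_supported is a hypothetical helper, and the optional query at the end relies on torch.version.cuda and torch.cuda.get_arch_list(), which may not be available in older PyTorch releases.

```shell
#!/bin/sh
# arch_supported ARCH LIST...
# Hypothetical helper: succeed if ARCH appears among the remaining arguments.
arch_supported() {
    want=$1
    shift
    for arch in "$@"; do
        if [ "$arch" = "$want" ]; then
            return 0
        fi
    done
    return 1
}

# Values copied from the warning above.
gpu_arch=sm_86
built_for="sm_37 sm_50 sm_60 sm_70"

# $built_for is unquoted on purpose so each architecture becomes one argument.
if arch_supported "$gpu_arch" $built_for; then
    echo "$gpu_arch is covered by this PyTorch build"
else
    echo "$gpu_arch is missing from this PyTorch build"
fi

# Optional: ask the installed PyTorch directly (skipped if torch is absent).
if python3 -c 'import torch' 2>/dev/null; then
    python3 -c 'import torch; print(torch.version.cuda, torch.cuda.get_arch_list())'
fi
```

If the GPU's architecture is missing from the list, a reboot alone will not help and a PyTorch build made for a newer CUDA is the thing to look for; if it is present and the error persists, the pending reboot is the more likely cause.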
