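The driver setup I spent so long on last time has stopped working again, and the machine has fallen back to eddy_openmp. Running nvidia-smi gives the following error: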
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
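I recently read a post from an expert saying that Ubuntu updates the kernel automatically and that the new kernel takes effect after a reboot. Running journalctl | grep "Linux version" confirmed that my kernel had indeed been updated. The catch is that my driver was already at 560, so the driver version should not normally lag behind a kernel update.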
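The check above can be made a little more systematic. This is only a rough sketch: extract_kernel_versions is a hypothetical helper name, and it assumes the journal lines contain the usual "Linux version <release>" boot banner. It lists every kernel release the machine has booted, then prints the running kernel and whatever DKMS reports, so the two can be compared by eye.

```shell
# Extract the distinct kernel releases from "Linux version ..." banner lines on stdin.
# (Hypothetical helper, not part of any tool.)
extract_kernel_versions() {
    grep -o 'Linux version [^ ]*' | awk '{print $3}' | sort -u
}

# Only query the journal and DKMS when the tools exist on this machine.
if command -v journalctl >/dev/null 2>&1; then
    journalctl 2>/dev/null | extract_kernel_versions || true
fi

echo "Running kernel: $(uname -r)"

if command -v dkms >/dev/null 2>&1; then
    # Shows which modules DKMS has built, and for which kernels
    dkms status || true
fi
```

If the running kernel is missing from the dkms status output, the nvidia module was never built for it, which would match the nvidia-smi error.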
apt-get purge nvidia* # After uninstalling the driver, reinstalling it failed with the following error:
Error! Bad return status for module build on kernel: 5.15.0-119-generic (x86_64)
Consult /var/lib/dkms/nvidia/550.90.07/build/make.log for more information.
dpkg: error processing package nvidia-dkms-550-server-open (--configure):
installed nvidia-dkms-550-server-open package post-installation script subprocess returned error exit status 10
Setting up libnvidia-encode-550-server:amd64 (550.90.07-0ubuntu0.20.04.2) ...
Setting up libnvidia-encode-550-server:i386 (550.90.07-0ubuntu0.20.04.2) ...
dpkg: dependency problems prevent configuration of nvidia-driver-550-server-open:
nvidia-driver-550-server-open depends on nvidia-dkms-550-server-open (<= 550.90.07-1); however:
Package nvidia-dkms-550-server-open is not configured yet.
nvidia-driver-550-server-open depends on nvidia-dkms-550-server-open (>= 550.90.07); however:
Package nvidia-dkms-550-server-open is not configured yet.
dpkg: error processing package nvidia-driver-550-server-open (--configure):
dependency problems - leaving unconfigured
Processing triggers for libc-bin (2.31-0ubuntu9.16) ...
No apport report written because the error message indicates its a followup error from a previous failure.
Processing triggers for man-db (2.9.1-1) ...
Processing triggers for desktop-file-utils (0.24-1ubuntu3) ...
Processing triggers for mime-support (3.64ubuntu1) ...
Processing triggers for gnome-menus (3.36.0-1ubuntu1) ...
Processing triggers for initramfs-tools (0.136ubuntu6.7) ...
update-initramfs: Generating /boot/initrd.img-5.15.0-119-generic
I: The initramfs will attempt to resume from /dev/nvme0n1p3
I: (UUID=ef6d7681-b7ff-4ae4-bb31-92f4be832ba1)
I: Set the RESUME variable to override this.
Errors were encountered while processing:
nvidia-dkms-550-server-open
nvidia-driver-550-server-open
E: Sub-process /usr/bin/dpkg returned an error code (1)
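In the log above, the first failure is the DKMS module build. The dpkg dependency errors that follow are consequences of it, so the real cause should be in the make.log that the message points to. A minimal sketch for pulling the relevant lines out of that file (show_build_errors is a hypothetical helper name, and matching on the word "error" is just a heuristic):

```shell
# Print the last 20 lines mentioning "error" (case-insensitive) from a build log,
# prefixed with their line numbers. Hypothetical helper for illustration.
show_build_errors() {
    log="$1"
    if [ ! -r "$log" ]; then
        echo "cannot read $log" >&2
        return 1
    fi
    grep -n -i 'error' "$log" | tail -n 20
}

# Example call, using the path reported by dpkg above:
# show_build_errors /var/lib/dkms/nvidia/550.90.07/build/make.log
```

Compiler-related failures usually show up near the first match, for example as an option that the compiler does not recognize.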
I then tried installing through the graphical interface, and that failed as well.
############################################################
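Some people online suggested it was a compiler version problem. I strongly suspected the same, because I had installed a very old CUDA and compiler back then to accommodate eddy_cuda. The odd part, though, is this:
Checking the gcc version with gcc --version shows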
gcc (conda-forge gcc 12.3.0-10) 12.3.0
Copyright (C) 2022 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
whereas checking the system default with sudo update-alternatives --config gcc shows something different:
There are 7 choices for the alternative gcc (providing /usr/bin/gcc).
  Selection    Path             Priority   Status
------------------------------------------------------------
  0            /usr/bin/gcc-5    90        auto mode
  1            /usr/bin/g++-6    9         manual mode
  2            /usr/bin/g++-7    1         manual mode
  3            /usr/bin/g++-9    1         manual mode
  4            /usr/bin/gcc-5    90        manual mode
* 5            /usr/bin/gcc-6    1         manual mode
  6            /usr/bin/gcc-7    9         manual mode
  7            /usr/bin/gcc-9    50        manual mode
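The two outputs disagree most likely because they describe different binaries: an activated conda environment puts its own bin directory ahead of /usr/bin on PATH, so plain gcc resolves to the conda-forge build, while update-alternatives only manages the /usr/bin/gcc symlink. A small sketch to make this visible (list_on_path is a hypothetical helper name, written without bash-only features):

```shell
# List every executable called $1 on PATH, in lookup order.
# The first line printed is the one the shell actually runs.
list_on_path() {
    name="$1"
    old_ifs=$IFS
    IFS=:
    for dir in $PATH; do
        if [ -f "$dir/$name" ] && [ -x "$dir/$name" ]; then
            echo "$dir/$name"
        fi
    done
    IFS=$old_ifs
}

list_on_path gcc

# Resolve the alternatives symlink to the real system compiler
readlink -f /usr/bin/gcc || true
```

Package installation under sudo normally does not see the conda PATH (Ubuntu's default sudoers resets it), so the DKMS build presumably used /usr/bin/gcc, i.e. gcc-6 here, and not the conda gcc 12.3.0.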
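I don't really understand why, but after running sudo update-alternatives --config gcc and selecting gcc-9 as the system default, the driver suddenly installed and worked normally. Quite magical.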
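A plausible explanation, which I have not confirmed from make.log, is that DKMS compiles the nvidia kernel module with the system gcc, and that gcc-6 is too old for the 5.15 kernel headers while gcc-9 works. The same fix can be sketched non-interactively as below. The run wrapper, DRY_RUN variable and switch_gcc_and_retry are hypothetical names for illustration, and retrying with dpkg --configure -a is my assumption about how to finish the half-installed packages, not necessarily the only way.

```shell
# DRY_RUN=1 (the default here) only prints the commands; set DRY_RUN=0 to execute them.
DRY_RUN="${DRY_RUN:-1}"

# Print or execute a command depending on DRY_RUN.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

switch_gcc_and_retry() {
    compiler="$1"
    # Point /usr/bin/gcc at the chosen compiler without the interactive menu
    run sudo update-alternatives --set gcc "$compiler" &&
    # Re-run the post-installation scripts of packages left unconfigured,
    # which triggers the DKMS build again
    run sudo dpkg --configure -a
}

switch_gcc_and_retry /usr/bin/gcc-9
```

After it finishes, nvidia-smi should be able to talk to the driver again, possibly only after a reboot or after the module is loaded.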