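After a server update, running nvidia-smi produced the following error:
Reference for the fix:
Run this command to check the version of the installed NVIDIA driver packages: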
dpkg -l | grep nvidia
Then run this command to check the version of the loaded NVIDIA kernel module:
cat /proc/driver/nvidia/version
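The NVIDIA driver packages installed on the system are at version 470.256.02, but the kernel module reports version 535.183.01. The installed driver packages and the kernel module in use do not match, and that mismatch is what breaks the GPU driver.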
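The same comparison can be sketched as a small script. The helper names first_version and compare_driver_versions are made up for illustration, and the sketch assumes that the first dotted number in /proc/driver/nvidia/version and in the package version string is the driver version.

```shell
#!/usr/bin/env bash

# Print the first dotted version number (e.g. 535.183.01) found on stdin.
first_version() {
    grep -oE '[0-9]+\.[0-9]+(\.[0-9]+)?' | head -n 1
}

# Compare the package version ($1) with the kernel module version ($2).
# Returns non-zero when they differ.
compare_driver_versions() {
    if [ "$1" = "$2" ]; then
        echo "match: $1"
    else
        echo "mismatch: package $1, kernel module $2"
        return 1
    fi
}

# Usage on the affected machine (assumes the nvidia module is loaded):
#   pkg=$(dpkg-query -W -f='${Version}\n' 'nvidia-driver-*' | first_version)
#   mod=$(first_version < /proc/driver/nvidia/version)
#   compare_driver_versions "$pkg" "$mod"
```

If several nvidia-driver-* packages are known to dpkg, this only looks at the first version it finds, so the dpkg -l listing is still worth reading by eye.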
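After updating the driver, some packages still failed to install:
Packages that are not fully installed (status iU, meaning marked for installation and unpacked but not yet configured): these are core components of the NVIDIA driver and must be fully installed for it to work.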
nvidia-dkms-535
nvidia-driver-535
nvidia-kernel-common-535
xserver-xorg-video-nvidia-535
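To pull such packages out of a long dpkg -l listing, a filter along these lines may help. The function name list_incomplete_nvidia is hypothetical; it reads dpkg -l output on stdin and prints every nvidia package whose status flags are anything other than ii.

```shell
#!/usr/bin/env bash

# Read `dpkg -l` output on stdin and print "<flags> <package>" for each
# nvidia package that is not in the fully installed state "ii".
# The flag pattern (desired state, current state, optional error flag)
# also filters out the header lines of the listing.
list_incomplete_nvidia() {
    awk '$1 ~ /^[uirph][nicUFHWt]R?$/ && $1 != "ii" && $2 ~ /nvidia/ { print $1, $2 }'
}

# Usage:
#   dpkg -l | list_incomplete_nvidia
```

Note that this also lists packages in states such as rc (removed, config files remaining), not only iU.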
Try to repair the partially installed packages:
sudo apt --fix-broken install
This failed with a pile of dependency errors:
You might want to run 'apt --fix-broken install' to correct these.
The following packages have unmet dependencies:
 nvidia-dkms-535 : Depends: nvidia-firmware-535-535.183.01 but it is not going to be installed
 nvidia-driver-535 : Depends: libnvidia-compute-535 (= 535.183.01-0ubuntu0.20.04.1) but it is not going to be installed
                     Depends: libnvidia-extra-535 (= 535.183.01-0ubuntu0.20.04.1) but it is not going to be installed
                     Depends: nvidia-compute-utils-535 (= 535.183.01-0ubuntu0.20.04.1) but it is not going to be installed
                     Depends: libnvidia-decode-535 (= 535.183.01-0ubuntu0.20.04.1) but it is not going to be installed
                     Depends: libnvidia-encode-535 (= 535.183.01-0ubuntu0.20.04.1) but it is not going to be installed
                     Depends: nvidia-utils-535 (= 535.183.01-0ubuntu0.20.04.1) but it is not going to be installed
                     Depends: libnvidia-cfg1-535 (= 535.183.01-0ubuntu0.20.04.1) but it is not going to be installed
                     Recommends: libnvidia-compute-535:i386 (= 535.183.01-0ubuntu0.20.04.1) but it is not installable
                     Recommends: libnvidia-decode-535:i386 (= 535.183.01-0ubuntu0.20.04.1) but it is not installable
                     Recommends: libnvidia-encode-535:i386 (= 535.183.01-0ubuntu0.20.04.1) but it is not installable
                     Recommends: libnvidia-fbc1-535:i386 (= 535.183.01-0ubuntu0.20.04.1) but it is not installable
                     Recommends: libnvidia-gl-535:i386 (= 535.183.01-0ubuntu0.20.04.1) but it is not installable
 nvidia-kernel-common-535 : Depends: nvidia-firmware-535-535.183.01 but it is not going to be installed
 xserver-xorg-video-nvidia-535 : Depends: libnvidia-cfg1-535 (= 535.183.01-0ubuntu0.20.04.1) but it is not going to be installed
E: Unmet dependencies. Try 'apt --fix-broken install' with no packages (or specify a solution).
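An error wall like this is easier to read once it is boiled down to the distinct packages apt refuses to install. The sketch below does that; the function name missing_deps is hypothetical, and it assumes the message wording shown above ("Depends: ... but it is not going to be installed").

```shell
#!/usr/bin/env bash

# Read apt's "unmet dependencies" message on stdin and print each distinct
# package named in a "Depends: ... but it is not going to be installed" line.
# "Recommends:" lines are ignored on purpose.
missing_deps() {
    sed -n 's/.*Depends: \([^ ]*\).* but it is not going to be installed.*/\1/p' |
        LC_ALL=C sort -u
}

# Usage:
#   sudo apt --fix-broken install 2>&1 | missing_deps
```

Each name in the result can then be checked with apt-cache policy to see whether a candidate version is available at all.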
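Next I tried removing the problematic packages and got the same errors. I was stuck in a loop: repairing the packages requires their dependencies to be satisfied, and so does removing them, so every operation failed. With the dependency state this tangled, the only way out was to reinstall the driver:
Use dpkg to force-remove the NVIDIA driver, ignoring dependencies: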
sudo dpkg -r --force-depends nvidia-driver-535
After that, clean up the system and refresh the package lists:
sudo apt-get autoremove
sudo apt-get clean
sudo apt-get update
Then reinstall the driver:
sudo apt-get install nvidia-driver-535
(These steps are blunt. Proceed carefully to avoid damaging the system!)
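Given how blunt these steps are, one cautious option is to wrap them in a script that only prints the commands unless told otherwise. This is a minimal sketch, not a hardened tool: the run helper, the DRY_RUN variable, and the reinstall_nvidia_driver function are all names invented here, and the commands themselves are exactly the five used above.

```shell
#!/usr/bin/env bash

# Run a command, or only print it when DRY_RUN is 1 (the default here).
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# The reinstall sequence from this post, stopping at the first failure.
# $1: driver package name (defaults to nvidia-driver-535).
reinstall_nvidia_driver() {
    local pkg="${1:-nvidia-driver-535}"
    run sudo dpkg -r --force-depends "$pkg" &&
    run sudo apt-get autoremove &&
    run sudo apt-get clean &&
    run sudo apt-get update &&
    run sudo apt-get install "$pkg"
}

# Usage:
#   reinstall_nvidia_driver              # prints the five commands only
#   DRY_RUN=0 reinstall_nvidia_driver    # actually runs them
```

apt-get also has its own simulation flag (-s), which can be used to preview what the final install step would do before running it for real.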