The GPU version information shown by nvidia-smi is incomplete; run nvidia-smi -q to see the full details.
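The full nvidia-smi -q output is long, so here is a minimal sketch for pulling a single field out of it. The helper name smi_field is my own (hypothetical), and it assumes the usual "Key : Value" layout of the -q output.

```shell
#!/usr/bin/env bash
# Print the value of one "Key : Value" field from `nvidia-smi -q` style text on stdin.
# smi_field is a hypothetical helper name, not part of nvidia-smi.
smi_field() {
    local key="$1"
    awk -F':' -v key="$key" '
        {
            k = $1
            gsub(/^[ \t]+|[ \t]+$/, "", k)            # trim the key
            if (k == key) {
                v = substr($0, index($0, ":") + 1)    # everything after the first colon
                gsub(/^[ \t]+|[ \t]+$/, "", v)        # trim the value
                print v
                exit
            }
        }'
}

# Usage on a machine with the driver installed:
#   nvidia-smi -q | smi_field "Driver Version"
#   nvidia-smi -q | smi_field "CUDA Version"
```

Only the first matching field is printed, which is enough for the top-level driver and CUDA versions.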
Error reported when the driver version is too old:
RuntimeError: [address=0.0.0.0:33981, pid=1754380] The NVIDIA driver on your system is too old (found version 11040).
Please update your GPU driver by downloading and installing a new version
from the URL: http://www.nvidia.com/Download/index.aspx Alternatively,
go to: https://pytorch.org to install a PyTorch version that has been compiled with your version of the CUDA driver.
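The "found version 11040" in the message is the CUDA version the installed driver supports, encoded as 1000*major + 10*minor. A small sketch for decoding it (decode_cuda_version is a name I made up for illustration):

```shell
# Decode a CUDA version integer (1000*major + 10*minor) into "major.minor".
# decode_cuda_version is a hypothetical helper name.
decode_cuda_version() {
    local v="$1"
    echo "$(( v / 1000 )).$(( (v % 1000) / 10 ))"
}

decode_cuda_version 11040   # the value from the error message above
```

So the old driver only supports up to CUDA 11.4, which is why a PyTorch build for a newer CUDA refuses to run.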
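Download the latest matching driver from the official NVIDIA site.
I checked, and the highest CUDA version PyTorch supported at the time of writing was 12.4, so I picked the 12.4 download.
Select Driver Version 550.127.05 and download it.
Downloaded file: NVIDIA-Linux-x86_64-550.127.05.run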
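To double-check that a driver version is new enough for a given CUDA release, the version strings can be compared. This is only a sketch: the minimum of 550.54.14 for CUDA 12.4 on Linux is my assumption and should be verified against NVIDIA's CUDA release notes, and version_ge is a hypothetical helper that relies on GNU sort -V.

```shell
# Return success if version $1 >= version $2 (dot-separated numeric versions).
# Relies on GNU sort's -V (version sort), which is available on Ubuntu.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

# Assumed minimum Linux driver for CUDA 12.4; verify against NVIDIA's release notes.
MIN_DRIVER_FOR_CUDA_12_4="550.54.14"

if version_ge "550.127.05" "$MIN_DRIVER_FOR_CUDA_12_4"; then
    echo "driver is new enough"
else
    echo "driver is too old"
fi
```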
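If the download is inconvenient for you, you can use the link I shared. If it helps, a like or a comment is appreciated, thanks.
File shared via Baidu Netdisk: NVIDIA-Linux-x86_64-550.127.05.run
Link: https://pan.baidu.com/s/1EjpEZCV8K8i2hztbBPUM4Q?pwd=xcew Extraction code: xcew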
Then upload the driver to any directory on the target machine and make it executable:
chmod +x NVIDIA-Linux-x86_64-550.127.05.run
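Uninstall the old version before installing. Do not skip this step, otherwise you will run into the errors shown further down.
sudo apt-get purge 'nvidia-*'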
ls /usr/bin | grep nvidia
ls /lib/modules/$(uname -r)/kernel/drivers/video/ | grep nvidia
sudo rm -rf /usr/local/cuda*
sudo rm -rf /usr/bin/nvidia*
sudo rm -rf /lib/modules/$(uname -r)/kernel/drivers/video/nvidia*
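After the cleanup it is worth confirming that nothing is left behind. A minimal sketch, assuming the same two directories checked above; list_nvidia_leftovers is a hypothetical helper name:

```shell
# List entries whose names start with "nvidia" directly inside the given directories.
# Missing directories are skipped silently.
list_nvidia_leftovers() {
    local dir
    for dir in "$@"; do
        [ -d "$dir" ] || continue
        find "$dir" -maxdepth 1 -name 'nvidia*' -print
    done
}

leftovers=$(list_nvidia_leftovers /usr/bin "/lib/modules/$(uname -r)/kernel/drivers/video")
if [ -n "$leftovers" ]; then
    echo "Still present:"
    echo "$leftovers"
else
    echo "No nvidia* files left in the checked directories."
fi
```

This only reports what it finds and deletes nothing, so it is safe to run repeatedly.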
sudo ./NVIDIA-Linux-x86_64-550.127.05.run
Press Enter to start the installation.
WARNING: Continuing installation despite the presence of a loaded NVIDIA kernel module. Some sanity checks will not be performed. It is
strongly recommended that you reboot your computer after installation is complete. If the installation is not successful after
rebooting the computer, you can run `nvidia-uninstall` to attempt to remove the NVIDIA driver.
OK
WARNING: Your driver installation has been altered since it was initially installed; this may happen, for example, if you have since
installed the NVIDIA driver through a mechanism other than nvidia-installer (such as your distribution's native package management
system). nvidia-installer will attempt to uninstall as best it can. Please see the file '/var/log/nvidia-installer.log' for
details.
OK
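Confirm each prompt along the way.
When the installation finishes, the installer reminds you to reboot. Running nvidia-smi before rebooting fails with: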
Failed to initialize NVML: Driver/library version mismatch
NVML library version: 550.127
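This mismatch usually means the old kernel module is still loaded while the user-space library is already 550.127. A rough sketch for checking that, assuming the loaded module exposes its version in /sys/module/nvidia/version (versions_mismatch is a hypothetical helper name):

```shell
# Succeed when the loaded kernel module version differs from the library version.
# NVML prints a shortened version (550.127), so the comparison is by prefix.
versions_mismatch() {
    local loaded="$1" library="$2"
    case "$loaded" in
        "$library"|"$library".*) return 1 ;;   # same release: no mismatch
        *) return 0 ;;
    esac
}

# Assumption: the loaded nvidia module exposes its version under /sys/module.
if [ -r /sys/module/nvidia/version ]; then
    loaded=$(cat /sys/module/nvidia/version)
    if versions_mismatch "$loaded" "550.127"; then
        echo "Loaded module is $loaded; reboot to pick up 550.127"
    else
        echo "Loaded module $loaded matches the library"
    fi
else
    echo "nvidia kernel module is not loaded"
fi
```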
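At this point the newly installed kernel module is the latest version, and the driver libraries need to be updated to match. Look up the matching version on the official site; my operating system is Ubuntu 20.04.
Find the matching file on that site, then download and install it.
If you cannot download the package, you can get it here:
File shared via Baidu Netdisk: cuda-compat-12-4_550.127.05-0ubuntu1_amd64.deb
Link: https://pan.baidu.com/s/1h8ycdIpGJsPf9oC7cFT9YQ?pwd=p2i4 Extraction code: p2i4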
Install it:
sudo apt install ./cuda-compat-12-4_550.127.05-0ubuntu1_amd64.deb
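Note: the installation may fail at this point. If it does, uninstall the old version as the prompt suggests.
Once the package is installed, apt reports: cuda-compat-12-4 is already the newest version (550.127.05-0ubuntu1).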
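To confirm that the installed package version matches the .deb file, the version can be read from the file name and compared with what dpkg knows. This is a sketch that assumes the standard name_version_arch.deb naming; deb_version_from_filename is a hypothetical helper.

```shell
# Extract the version field from a Debian package file name (name_version_arch.deb).
deb_version_from_filename() {
    local base
    base=$(basename "$1" .deb)
    base=${base#*_}      # drop "name_"
    echo "${base%_*}"    # drop "_arch"
}

expected=$(deb_version_from_filename ./cuda-compat-12-4_550.127.05-0ubuntu1_amd64.deb)
# dpkg-query prints nothing (and fails) if the package is not installed.
installed=$(dpkg-query -W -f='${Version}' cuda-compat-12-4 2>/dev/null || true)
if [ "$installed" = "$expected" ]; then
    echo "cuda-compat-12-4 $installed is installed"
else
    echo "expected $expected, found '${installed:-nothing}'"
fi
```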
nvcc --version
Then reboot the server:
sudo reboot
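Result after a successful upgrade
If nvidia-smi still reports the error below after rebooting, the cleanup above was incomplete and the old version is still in effect. Go back and clean everything up as described above.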
Failed to initialize NVML: Driver/library version mismatch
NVML library version: 550.127
/proc/driver/nvidia/version
shows the version of the NVIDIA kernel driver that is currently loaded.
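A small sketch for reading just the version number out of that file, so it can be compared with the library version in the NVML error. It assumes the version appears on the line starting with "NVRM version:"; kernel_driver_version is a hypothetical helper name.

```shell
# Extract the driver version from the "NVRM version:" line of
# /proc/driver/nvidia/version style text on stdin.
kernel_driver_version() {
    grep -m1 '^NVRM version:' | grep -oE '[0-9]+\.[0-9]+(\.[0-9]+)?' | head -n1
}

if [ -r /proc/driver/nvidia/version ]; then
    kernel_driver_version < /proc/driver/nvidia/version
else
    echo "/proc/driver/nvidia/version not found (kernel module not loaded?)"
fi
```

If the number printed here is not 550.127.05, the old kernel module is still the one in use.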