NVIDIA-SMI has failed because it couldn’t communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
The steps are as follows:
Step 1: open a terminal and run nvidia-smi. It fails with the following error:
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver.
Make sure that the latest NVIDIA driver is installed and running.
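Step 2: run nvcc -V to check the CUDA installation.
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Wed_Jun__2_19:15:15_PDT_2021
Cuda compilation tools, release 11.4, V11.4.48
Build cuda_11.4.r11.4/compiler.30033411_0
The CUDA toolkit (release 11.4) is present, so the software has not been removed. Note that nvcc reports the toolkit version only, not the state of the kernel driver, so move on to the next step.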
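Since nvcc -V describes the CUDA toolkit rather than the kernel module, a separate check can show whether the module is actually loaded. The following is a minimal sketch, assuming a Linux system where /proc/modules lists the loaded modules; nvidia_module_loaded is a hypothetical helper name, and its optional file argument exists only so the check can be tried against a sample file.

```shell
# Sketch: report whether the nvidia kernel module is currently loaded.
# nvidia_module_loaded is a hypothetical helper, not an NVIDIA tool.
nvidia_module_loaded() {
    local modules_file="${1:-/proc/modules}"
    # Each line of /proc/modules starts with the module name followed by a space,
    # so '^nvidia ' matches the core module but not nvidia_drm or nvidia_uvm.
    grep -q '^nvidia ' "$modules_file" 2>/dev/null
}

if nvidia_module_loaded; then
    echo "nvidia kernel module is loaded"
else
    echo "nvidia kernel module is NOT loaded"
fi
```

If the module is reported as not loaded while the toolkit is installed, that is consistent with the nvidia-smi error above.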
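Step 3: find the version of the installed driver.
ls /usr/src | grep nvidia
For example, on this server the entry is nvidia-470.74, so the driver version is 470.74.
cat /proc/driver/nvidia/version
(This second command only works while the driver's kernel module is loaded, so it may fail at this point.)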
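The version lookup above can be sketched as a small function that strips the nvidia- prefix from the directory name. This is an illustration only: list_nvidia_versions is a hypothetical helper name, and it assumes the driver sources follow the nvidia-<version> naming seen above.

```shell
# Sketch: print the version part of every nvidia-<version> directory.
# list_nvidia_versions is a hypothetical helper; the directory defaults to /usr/src.
list_nvidia_versions() {
    local src_dir="${1:-/usr/src}"
    local dir
    for dir in "$src_dir"/nvidia-*/; do
        [ -d "$dir" ] || continue      # the glob stays literal when nothing matches
        dir="${dir%/}"                 # drop the trailing slash
        echo "${dir##*/nvidia-}"       # keep only the version, e.g. 470.74
    done
}
```

Calling list_nvidia_versions with no argument inspects /usr/src; if it prints more than one version, several driver source trees are installed and the right one has to be chosen by hand.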
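Step 4: run the following commands in order, replacing 470.74 with the version found in step 3. dkms rebuilds and installs the nvidia kernel module for the running kernel.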
sudo yum install dkms
sudo dkms install -m nvidia -v 470.74
Once the installation finishes, run nvidia-smi again. It should now show the GPU status.
The error is resolved.
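The dkms step can also be wrapped in a small helper so that the version is passed as an argument instead of being hard-coded. This is a sketch under assumptions: rebuild_nvidia_module and the DRY_RUN variable are hypothetical names introduced here, not part of dkms. Only the dkms and nvidia-smi commands come from the steps above.

```shell
# Sketch: rebuild the nvidia module for a given driver version, then re-check.
# rebuild_nvidia_module and DRY_RUN are hypothetical, for illustration only.
rebuild_nvidia_module() {
    local version="$1"
    if [ -z "$version" ]; then
        echo "usage: rebuild_nvidia_module <driver-version>" >&2
        return 2
    fi
    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Print the command instead of running it.
        echo "sudo dkms install -m nvidia -v $version"
    else
        # Run nvidia-smi only if the dkms install succeeded.
        sudo dkms install -m nvidia -v "$version" && nvidia-smi
    fi
}
```

Running DRY_RUN=1 rebuild_nvidia_module 470.74 first shows the exact command that would be executed, which is a cheap way to confirm the version string before touching the system.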