nvidia-smi error: NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
After the server was shut down and restarted, CUDA was no longer available. Running "nvidia-smi" produced the following error:
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running
I fixed it with the following steps:
- Run nvcc -V, which prints the following:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Mon_Nov_30_19:08:53_PST_2020
Cuda compilation tools, release 11.2, V11.2.67
Build cuda_11.2.r11.2/compiler.29373293_0
- Install dkms
sudo apt-get install dkms
- Check the version of the installed driver that cannot be reached
ls -l /usr/src/
This prints the following:
total 20
drwxr-xr-x 25 root root 4096 4月 19 18:12 linux-headers-4.15.0-175
drwxr-xr-x 8 root root 4096 4月 19 18:12 linux-headers-4.15.0-175-generic
drwxr-xr-x 25 root root 4096 4月 21 06:08 linux-headers-4.15.0-176
drwxr-xr-x 8 root root 4096 4月 21 06:08 linux-headers-4.15.0-176-generic
drwxr-xr-x 8 root root 4096 4月 19 21:43 nvidia-510.60.02
The listing shows nvidia-510.60.02, so the driver version is 510.60.02.
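If you would rather not read the version off the listing by eye, the lookup can be scripted. The following is a minimal sketch, assuming the driver source sits in a directory named nvidia-<version> directly under /usr/src, as in the listing above. nvidia_src_versions is a hypothetical helper name, not part of dkms or the NVIDIA driver.

```shell
# Hypothetical helper: print the version suffix of every nvidia-<version>
# directory under a source root (defaults to /usr/src).
nvidia_src_versions() {
    local src_root="${1:-/usr/src}"
    local dir
    for dir in "$src_root"/nvidia-*/; do
        # Skip the unexpanded pattern when nothing matches.
        [ -d "$dir" ] || continue
        dir="${dir%/}"                      # drop the trailing slash
        printf '%s\n' "${dir##*/nvidia-}"   # keep only the version part
    done
}

nvidia_src_versions /usr/src
```

If more than one version is printed, several driver sources are present and you need to decide which one you actually want to rebuild.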
- Use dkms to rebuild and reinstall the matching driver module
sudo dkms install -m nvidia -v 510.60.02
The value after -v must be the NVIDIA driver version on your machine. Mine is 510.60.02, taken from the third step.
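Because the version string is the only part of the command that changes between machines, it can help to build the command from a variable and look at it before running it. This is only a sketch: dkms_install_cmd is a hypothetical helper, and it prints the command rather than executing it.

```shell
# Hypothetical helper: build the dkms command for a given driver version.
# It only prints the command so it can be reviewed before being run.
dkms_install_cmd() {
    version="$1"
    # Reject an empty value or anything that is not digits and dots.
    case "$version" in
        ''|*[!0-9.]*)
            echo "invalid driver version: '$version'" >&2
            return 1
            ;;
    esac
    printf 'sudo dkms install -m nvidia -v %s\n' "$version"
}

dkms_install_cmd 510.60.02   # prints: sudo dkms install -m nvidia -v 510.60.02
```

Once the printed command looks right, copy it into the terminal and run it.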
- Run nvidia-smi again
It now runs successfully and shows:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.60.02    Driver Version: 510.60.02    CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:06:00.0 Off |                  N/A |
| 32%   50C    P0    61W / 250W |      0MiB / 11264MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce ...  Off  | 00000000:09:00.0 Off |                  N/A |
| 23%   34C    P0    57W / 250W |      0MiB / 11264MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   2  NVIDIA GeForce ...  Off  | 00000000:0A:00.0 Off |                  N/A |
| 34%   28C    P0    53W / 250W |      0MiB / 11264MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
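To confirm from a script that the driver is reachable again, one option is to pull the driver version out of the header line of the output. This is a rough sketch, assuming the header keeps the "Driver Version: <number>" form shown above. driver_version_from_smi is a hypothetical helper name.

```shell
# Hypothetical helper: read nvidia-smi output on stdin and print the
# driver version found in the header line (nothing if there is none).
driver_version_from_smi() {
    sed -n 's/.*Driver Version: *\([0-9.]*\).*/\1/p' | head -n 1
}

# Only query the driver when nvidia-smi is on PATH.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi | driver_version_from_smi
fi
```

If nothing is printed, nvidia-smi still cannot talk to the driver. In my case the printed version should match the 510.60.02 passed to dkms.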