Disabling an NVIDIA GPU
1. nvidia-smi
yongqiang@deepnorth-amax:~$ nvidia-smi
Mon Sep 9 15:40:51 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 430.14 Driver Version: 430.14 CUDA Version: 10.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce RTX 208... Off | 00000000:1B:00.0 Off | N/A |
| 22% 42C P0 63W / 250W | 0MiB / 11019MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 1 GeForce RTX 208... Off | 00000000:1C:00.0 Off | N/A |
| 23% 43C P0 57W / 250W | 0MiB / 11019MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 2 GeForce RTX 208... Off | 00000000:1D:00.0 Off | N/A |
| 23% 43C P0 62W / 250W | 0MiB / 11019MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 3 GeForce RTX 208... Off | 00000000:1E:00.0 Off | N/A |
| 22% 42C P0 49W / 250W | 0MiB / 11019MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 4 GeForce RTX 208... Off | 00000000:89:00.0 Off | N/A |
| 22% 41C P0 75W / 250W | 0MiB / 11019MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 5 GeForce RTX 208... Off | 00000000:8A:00.0 Off | N/A |
| 22% 43C P0 63W / 250W | 0MiB / 11019MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 6 GeForce RTX 208... Off | 00000000:8B:00.0 Off | N/A |
| 22% 41C P0 50W / 250W | 0MiB / 11019MiB | 1% Default |
+-------------------------------+----------------------+----------------------+
| 7 GeForce RTX 208... Off | 00000000:8C:00.0 Off | N/A |
| 20% 41C P0 54W / 250W | 0MiB / 11019MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
yongqiang@deepnorth-amax:~$
2. echo 1 > /sys/bus/pci/devices/0000:8a:00.0/remove (from a root shell)
yongqiangh@deepnorth-amax:~$ sudo echo 1 > /sys/bus/pci/devices/0000\:8a\:00.0/remove
-bash: /sys/bus/pci/devices/0000:8a:00.0/remove: Permission denied
yongqiangh@deepnorth-amax:~$
yongqiangh@deepnorth-amax:~$ sudo su
[sudo] password for deepnorth:
root@deepnorth-amax:/home/deepnorth#
root@deepnorth-amax:/home/deepnorth# sudo echo 1 > /sys/bus/pci/devices/0000\:8a\:00.0/remove
root@deepnorth-amax:/home/deepnorth#
root@deepnorth-amax:/home/deepnorth# exit
exit
yongqiangh@deepnorth-amax:~$
yongqiangh@deepnorth-amax:~$ nvidia-smi
Mon Sep 9 15:53:20 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 430.14 Driver Version: 430.14 CUDA Version: 10.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce RTX 208... Off | 00000000:1B:00.0 Off | N/A |
| 22% 42C P0 63W / 250W | 0MiB / 11019MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 1 GeForce RTX 208... Off | 00000000:1C:00.0 Off | N/A |
| 24% 44C P0 55W / 250W | 0MiB / 11019MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 2 GeForce RTX 208... Off | 00000000:1D:00.0 Off | N/A |
| 23% 43C P0 61W / 250W | 0MiB / 11019MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 3 GeForce RTX 208... Off | 00000000:1E:00.0 Off | N/A |
| 22% 42C P0 50W / 250W | 0MiB / 11019MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 4 GeForce RTX 208... Off | 00000000:89:00.0 Off | N/A |
| 25% 44C P2 77W / 250W | 10527MiB / 11019MiB | 12% Default |
+-------------------------------+----------------------+----------------------+
| 5 GeForce RTX 208... Off | 00000000:8B:00.0 Off | N/A |
| 22% 40C P0 49W / 250W | 0MiB / 11019MiB | 1% Default |
+-------------------------------+----------------------+----------------------+
| 6 GeForce RTX 208... Off | 00000000:8C:00.0 Off | N/A |
| 19% 40C P0 54W / 250W | 0MiB / 11019MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 4 6149 C python 10517MiB |
+-----------------------------------------------------------------------------+
yongqiangh@deepnorth-amax:~$
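The Permission denied error in the transcript above comes from the redirection: the unprivileged shell opens the file for > before sudo starts, so only echo runs as root. A minimal sketch of a helper that avoids opening a root shell is shown here. The function name remove_pci_device and the variable SYSFS_PCI are hypothetical and introduced only for illustration; SYSFS_PCI exists so the function can be pointed at a scratch directory for a dry run.

```shell
# Hypothetical helper: detach one PCI device by writing 1 to its sysfs "remove" node.
# SYSFS_PCI defaults to the real sysfs tree; override it to dry-run against a scratch directory.
SYSFS_PCI="${SYSFS_PCI:-/sys/bus/pci/devices}"

remove_pci_device() {
    rpd_node="${SYSFS_PCI}/$1/remove"      # $1 is a PCI address such as 0000:8a:00.0
    if [ ! -e "$rpd_node" ]; then
        echo "no such device node: $rpd_node" >&2
        return 1
    fi
    if [ -w "$rpd_node" ]; then
        # Already privileged (or dry run): a plain redirection works.
        echo 1 > "$rpd_node"
    else
        # tee runs under sudo, so the file is opened with root privileges.
        echo 1 | sudo tee "$rpd_node" > /dev/null
    fi
}
```

For the card removed above, the call would be remove_pci_device 0000:8a:00.0. The device normally stays removed until the next reboot or PCI bus rescan.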
3. Bus Id
lspci | grep VGA
yongqiang@deepnorth-amax:~$ lspci | grep VGA
03:00.0 VGA compatible controller: ASPEED Technology, Inc. ASPEED Graphics Family (rev 41)
1b:00.0 VGA compatible controller: NVIDIA Corporation Device 1e04 (rev a1)
1c:00.0 VGA compatible controller: NVIDIA Corporation Device 1e04 (rev a1)
1d:00.0 VGA compatible controller: NVIDIA Corporation Device 1e04 (rev a1)
1e:00.0 VGA compatible controller: NVIDIA Corporation Device 1e04 (rev a1)
89:00.0 VGA compatible controller: NVIDIA Corporation Device 1e04 (rev a1)
8b:00.0 VGA compatible controller: NVIDIA Corporation Device 1e04 (rev a1)
8c:00.0 VGA compatible controller: NVIDIA Corporation Device 1e04 (rev a1)
yongqiang@deepnorth-amax:~$
nvidia-smi -a | grep "Bus Id"
yongqiang@deepnorth-amax:~$ nvidia-smi -a | grep "Bus Id"
Bus Id : 00000000:1B:00.0
Bus Id : 00000000:1C:00.0
Bus Id : 00000000:1D:00.0
Bus Id : 00000000:1E:00.0
Bus Id : 00000000:89:00.0
Bus Id : 00000000:8B:00.0
Bus Id : 00000000:8C:00.0
yongqiang@deepnorth-amax:~$
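nvidia-smi prints the Bus Id with an eight-digit domain in upper case (00000000:8A:00.0), while the sysfs path used in step 2 has a four-digit domain in lower case (0000:8a:00.0). A small sketch of the conversion follows; to_sysfs_addr is a hypothetical helper name, not part of any NVIDIA tool.

```shell
# Convert an nvidia-smi Bus Id (e.g. 00000000:8A:00.0) into the address
# form used under /sys/bus/pci/devices (e.g. 0000:8a:00.0).
to_sysfs_addr() {
    printf '%s\n' "$1" |
        tr 'A-F' 'a-f' |                           # sysfs uses lower-case hex digits
        sed 's/^0000\([0-9a-f]\{4\}:\)/\1/'        # trim the 8-digit domain to 4 digits
}
```

Calling to_sysfs_addr 00000000:8A:00.0 prints 0000:8a:00.0; an address that is already in sysfs form passes through unchanged.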
lspci | grep -i nvidia
yongqiang@deepnorth-amax:~$ lspci | grep -i nvidia
1b:00.0 VGA compatible controller: NVIDIA Corporation Device 1e04 (rev a1)
1b:00.1 Audio device: NVIDIA Corporation Device 10f7 (rev a1)
1b:00.2 USB controller: NVIDIA Corporation Device 1ad6 (rev a1)
1b:00.3 Serial bus controller [0c80]: NVIDIA Corporation Device 1ad7 (rev a1)
1c:00.0 VGA compatible controller: NVIDIA Corporation Device 1e04 (rev a1)
1c:00.1 Audio device: NVIDIA Corporation Device 10f7 (rev a1)
1c:00.2 USB controller: NVIDIA Corporation Device 1ad6 (rev a1)
1c:00.3 Serial bus controller [0c80]: NVIDIA Corporation Device 1ad7 (rev a1)
1d:00.0 VGA compatible controller: NVIDIA Corporation Device 1e04 (rev a1)
1d:00.1 Audio device: NVIDIA Corporation Device 10f7 (rev a1)
1d:00.2 USB controller: NVIDIA Corporation Device 1ad6 (rev a1)
1d:00.3 Serial bus controller [0c80]: NVIDIA Corporation Device 1ad7 (rev a1)
1e:00.0 VGA compatible controller: NVIDIA Corporation Device 1e04 (rev a1)
1e:00.1 Audio device: NVIDIA Corporation Device 10f7 (rev a1)
1e:00.2 USB controller: NVIDIA Corporation Device 1ad6 (rev a1)
1e:00.3 Serial bus controller [0c80]: NVIDIA Corporation Device 1ad7 (rev a1)
89:00.0 VGA compatible controller: NVIDIA Corporation Device 1e04 (rev a1)
89:00.1 Audio device: NVIDIA Corporation Device 10f7 (rev a1)
89:00.2 USB controller: NVIDIA Corporation Device 1ad6 (rev a1)
89:00.3 Serial bus controller [0c80]: NVIDIA Corporation Device 1ad7 (rev a1)
8a:00.1 Audio device: NVIDIA Corporation Device 10f7 (rev a1)
8a:00.2 USB controller: NVIDIA Corporation Device 1ad6 (rev a1)
8a:00.3 Serial bus controller [0c80]: NVIDIA Corporation Device 1ad7 (rev a1)
8b:00.0 VGA compatible controller: NVIDIA Corporation Device 1e04 (rev a1)
8b:00.1 Audio device: NVIDIA Corporation Device 10f7 (rev a1)
8b:00.2 USB controller: NVIDIA Corporation Device 1ad6 (rev a1)
8b:00.3 Serial bus controller [0c80]: NVIDIA Corporation Device 1ad7 (rev a1)
8c:00.0 VGA compatible controller: NVIDIA Corporation Device 1e04 (rev a1)
8c:00.1 Audio device: NVIDIA Corporation Device 10f7 (rev a1)
8c:00.2 USB controller: NVIDIA Corporation Device 1ad6 (rev a1)
8c:00.3 Serial bus controller [0c80]: NVIDIA Corporation Device 1ad7 (rev a1)
yongqiang@deepnorth-amax:~$
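While NVIDIA GPUs are in use, some cards may become invisible to nvidia-smi; after a system reboot they show up in nvidia-smi again. If this happens repeatedly, check whether the invisible card has the same Bus Id every time. If it does, the card is probably faulty.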
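The check described above can be sketched as a comparison between a baseline list of Bus Ids, saved while every card is visible, and the current list. The function names bus_ids and missing_bus_ids and the file names in the usage note are hypothetical; the sketch assumes one Bus Id per line in each file.

```shell
# Extract the Bus Id values from "nvidia-smi -a" output read on stdin.
bus_ids() {
    awk '/Bus Id/ { print $NF }'
}

# Print every Bus Id found in the baseline file ($1) but not in the current file ($2).
missing_bus_ids() {
    awk -v cur="$2" '
        FILENAME == cur { seen[$0] = 1; next }   # remember Bus Ids visible now
        !($0 in seen)                            # print baseline entries that vanished
    ' "$2" "$1"
}
```

A possible workflow: save nvidia-smi -a | bus_ids to baseline.txt on a healthy boot, save the same command to current.txt when a card disappears, and run missing_bus_ids baseline.txt current.txt. If the same Bus Id is printed on every occurrence, that points at one specific card.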