After installing nvidia-docker, I ran the following command to check GPU utilization:
nvidia-docker run --rm nvidia/cuda:10.1-devel nvidia-smi
It failed with the following error:
docker: Error response from daemon: OCI runtime create failed: container_linux.go:349: starting container process caused "exec: \"nvidia-smi\": executable file not found in $PATH": unknown.
Solution:
First, make sure the following are in place:
- The NVIDIA driver is installed
Check with:
nvidia-smi
Output:
Tue Aug 25 13:48:25 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.39       Driver Version: 418.39       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla P100-PCIE...  Off  | 00000000:3B:00.0 Off |                    0 |
| N/A   29C    P0    25W / 250W |     10MiB / 16280MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla P100-PCIE...  Off  | 00000000:AF:00.0 Off |                    0 |
| N/A   31C    P0    25W / 250W |     10MiB / 16280MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla P100-PCIE...  Off  | 00000000:D8:00.0 Off |                    0 |
| N/A   35C    P0    26W / 250W |     10MiB / 16280MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
If the driver is missing, download it from the NVIDIA driver download page.
- The CUDA toolkit is installed correctly
Check with:
nvcc -V
Output:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Fri_Feb__8_19:08:17_PST_2019
Cuda compilation tools, release 10.1, V10.1.105
- The Docker service is enabled and running
systemctl enable docker # start Docker automatically at boot
systemctl start docker # start Docker now
systemctl restart docker # restart Docker
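The three checks above can be rolled into one small script. This is only a sketch: the helper names `have_cmd`, `cuda_release` and `check_prereqs` are made up for illustration, and it assumes a systemd host where the service is named `docker`. It reports each prerequisite instead of stopping at the first failure.

```shell
#!/bin/sh
# Sketch: report the three prerequisites (driver, CUDA toolkit, Docker service).
# Helper names are hypothetical; they are not part of nvidia-docker.

# Succeed if the named command is found on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Read `nvcc -V` output on stdin and print the toolkit release, e.g. "10.1".
cuda_release() {
    sed -n 's/.*release \([0-9][0-9.]*\),.*/\1/p'
}

check_prereqs() {
    if have_cmd nvidia-smi; then
        # One line per GPU; the driver version is the same for all, so keep the first.
        echo "driver: $(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)"
    else
        echo "driver: nvidia-smi not found"
    fi

    if have_cmd nvcc; then
        echo "cuda: $(nvcc -V | cuda_release)"
    else
        echo "cuda: nvcc not found"
    fi

    # Assumes systemd and a unit named "docker".
    if have_cmd systemctl && systemctl is-active --quiet docker; then
        echo "docker: active"
    else
        echo "docker: not active (or systemctl unavailable)"
    fi
}

check_prereqs
```

If any line reports "not found" or "not active", fix that item before looking at the volume.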
Second, the error above occurs mainly because the volume path does not exist, so you have to create it yourself. List the existing volumes first:
nvidia-docker volume ls
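If no nvidia_driver_* volume is listed, create a new one. The name must include the version; 418.39 here is the NVIDIA driver version reported by nvidia-smi (the CUDA version is 10.1).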
nvidia-docker volume create nvidia_driver_418.39
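To avoid typing the version by hand, the volume name can be derived from the driver version that nvidia-smi reports. The following is a rough sketch, not an official procedure: `driver_volume_name` and `has_driver_volume` are hypothetical helpers, and it assumes that `nvidia-docker volume ls -q` is forwarded to `docker volume ls -q` and prints one volume name per line.

```shell
#!/bin/sh
# Sketch: create the nvidia_driver_<version> volume only if it is missing.
# Helper names are hypothetical.

# Print the volume name expected for driver version $1.
driver_volume_name() {
    printf 'nvidia_driver_%s\n' "$1"
}

# Read volume names on stdin, one per line; succeed if the volume for
# driver version $1 is among them. -F keeps the dots literal, -x matches whole lines.
has_driver_volume() {
    grep -Fqx "$(driver_volume_name "$1")"
}

if command -v nvidia-smi >/dev/null 2>&1 && command -v nvidia-docker >/dev/null 2>&1; then
    version=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)
    # Assumption: -q is passed through to `docker volume ls`.
    if nvidia-docker volume ls -q | has_driver_volume "$version"; then
        echo "volume $(driver_volume_name "$version") already exists"
    else
        nvidia-docker volume create "$(driver_volume_name "$version")"
    fi
else
    echo "nvidia-smi or nvidia-docker not found; nothing to do"
fi
```

On the machine above, the derived name would be nvidia_driver_418.39, matching the manual command.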
Then run the original command again:
nvidia-docker run --rm nvidia/cuda:10.1-devel nvidia-smi