This write-up follows NVIDIA's tutorial.
I.
docker: Error response from daemon: could not select device driver "" with capabilities: [[gpu]].
ERRO[0018] error waiting for container: context canceled
I first tried solution 1 from the GitHub issue: splitting sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
into two separate commands:
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
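That made no difference. sudo apt-get update kept reporting deepin-related errors, so I next tried to remove deepin. After several attempts, the following two steps stopped sudo apt-get update from showing wine-related errors:
1. Run sudo apt remove deepin*
to uninstall the installed wine software.
2. Run find / -name '*wine*'
to locate the leftover wine files on disk, then delete them.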
(base) wlj@wlj-OUC:~$ sudo find / -name '*wine*'
find: ‘/run/user/1000/gvfs’: Permission denied
/etc/apt/sources.list.d/deepin-wine.i-m.dev.list.save
/etc/apt/sources.list.d/deepin-wine.i-m.dev.list
/etc/apt/preferences.d/deepin-wine.i-m.dev.pref
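The cleanup above could be sketched roughly as follows. This is only an illustration under a few assumptions: the helper names list_wine_apt_files and quarantine_wine_apt_files are hypothetical, the search is limited to the two APT directories that appear in the find output, and the files are moved to a backup directory rather than deleted so the step can be undone.

```shell
# Hypothetical helper: list APT source/preference files whose names mention wine.
# apt_root defaults to /etc/apt; pass another directory to try it out safely.
list_wine_apt_files() {
    local apt_root="${1:-/etc/apt}"
    find "$apt_root/sources.list.d" "$apt_root/preferences.d" \
        -maxdepth 1 -type f -name '*wine*' 2>/dev/null | sort
}

# Hypothetical helper: move those files into a backup directory instead of
# deleting them outright (an assumption; the original steps simply delete them).
quarantine_wine_apt_files() {
    local apt_root="${1:-/etc/apt}"
    local backup_dir="$2"
    mkdir -p "$backup_dir"
    list_wine_apt_files "$apt_root" | while IFS= read -r f; do
        mv -- "$f" "$backup_dir/"
    done
}

# Example (needs root for the real /etc/apt):
#   list_wine_apt_files
#   sudo sh -c '...'   # or run quarantine_wine_apt_files from a root shell
```

Moving rather than deleting is a design choice for this sketch: if apt-get update still fails afterwards, the files can be put back and another cause investigated.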
After that, sudo apt-get update
ran cleanly. I then went through the NVIDIA tutorial above once more, and the test passed:
sudo docker run --gpus 2 nvidia/cuda:10.0-base nvidia-smi
Wed May  4 02:11:20 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.87.00    Driver Version: 418.87.00    CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce RTX 2070    Off  | 00000000:01:00.0 Off |                  N/A |
| 29%   61C    P0    45W / 175W |      0MiB /  7952MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce RTX 2070    Off  | 00000000:03:00.0 Off |                  N/A |
| 38%   41C    P0     1W / 175W |      0MiB /  7952MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
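For a check that is easier to script than reading the table, one option is to count the devices the container can see. The sketch below assumes nvidia-smi -L prints one line per device starting with "GPU <index>:"; count_gpus and expect_gpus are hypothetical helper names, not part of any NVIDIA tool.

```shell
# Hypothetical helper: count device lines in `nvidia-smi -L` output read from stdin.
# Assumes each device is listed on a line starting with "GPU <index>:".
count_gpus() {
    # grep -c exits non-zero when the count is 0; keep the function's status at 0.
    grep -c -E '^GPU [0-9]+:' || true
}

# Hypothetical helper: succeed only if stdin lists exactly the expected number of GPUs.
expect_gpus() {
    local want="$1"
    local got
    got="$(count_gpus)"
    [ "$got" = "$want" ]
}

# Example, matching the --gpus 2 request used in the test above:
#   sudo docker run --rm --gpus 2 nvidia/cuda:10.0-base nvidia-smi -L | expect_gpus 2 \
#       && echo "container sees 2 GPUs"
```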
However, running nvidia-docker still failed at this point.
II.
(base) wlj@wlj-OUC:~$ nvidia-docker
nvidia-docker: command not found
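"command not found" only means the shell could not find an executable of that name on PATH. A small check like the one below can be run before and after installing to confirm whether the wrapper is present; have_cmd and report_cmd are hypothetical helper names used for illustration.

```shell
# Hypothetical helper: succeed if the given command name resolves in the current shell.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Hypothetical helper: print where a command resolves, or that it is missing.
report_cmd() {
    if have_cmd "$1"; then
        printf '%s: found at %s\n' "$1" "$(command -v "$1")"
    else
        printf '%s: not on PATH\n' "$1"
    fi
}

# Example:
#   report_cmd docker
#   report_cmd nvidia-docker
```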
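It turned out that the guide linked above is incomplete. I found the official nvidia-docker Git repository, which links to an installation guide. Running only the following three commands was enough to make nvidia-docker usable.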
1.
curl https://get.docker.com | sh \
&& sudo systemctl --now enable docker
2.
distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
&& curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
&& curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.list | \
sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
3.
sudo apt-get install -y nvidia-docker2
At this point, nvidia-docker is usable.
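The second command does two things that are easy to miss: it builds a distribution string from /etc/os-release, and it uses sed to add a signed-by option to every deb line before writing the list file. The sketch below isolates those two pieces so they can be inspected without touching the system; distribution_from and add_signed_by are hypothetical names, and nothing here downloads or installs anything.

```shell
# Hypothetical helper: print ID followed by VERSION_ID from an os-release file.
# Sourcing happens in a subshell so the variables do not leak into the caller.
distribution_from() {
    local os_release="${1:-/etc/os-release}"
    ( . "$os_release" && printf '%s%s\n' "$ID" "$VERSION_ID" )
}

# Hypothetical helper: rewrite "deb https://..." lines from stdin so that they
# carry a signed-by option pointing at the given keyring (same sed as in step 2).
add_signed_by() {
    local keyring="$1"
    sed "s#deb https://#deb [signed-by=$keyring] https://#g"
}

# Example:
#   distribution_from
#   echo 'deb https://nvidia.github.io/libnvidia-container/stable/ubuntu18.04/$(ARCH) /' \
#       | add_signed_by /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
```

Printing the distribution string first is a cheap way to see which repository path the curl in step 2 will request.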
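4. When I installed on the server, Part II produced an error that had not occurred on my local machine. After running the commands in II.1 and II.2, I ran an update again and got the following error: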
(base) ouc@ouc-Super-Server:~$ sudo apt-get update
E: Conflicting values set for option Signed-By regarding source https://nvidia.github.io/libnvidia-container/stable/ubuntu18.04/amd64/ /: /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg !=
E: The list of sources could not be read.
Following the linked solution, I deleted some files:
(base) ouc@ouc-Super-Server:/etc/apt/sources.list.d$ ls
mysql.list rvm-ubuntu-smplayer-bionic.list vscode.list
mysql.list.save rvm-ubuntu-smplayer-bionic.list.save vscode.list.save
nvidia-container-toolkit.list teamviewer.list
nvidia-docker.list teamviewer.list.save
(base) ouc@ouc-Super-Server:/etc/apt/sources.list.d$ sudo rm nvidia-*
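The error message suggests that the same repository is declared more than once, with a signed-by keyring in one place and none in another. That reading is my assumption from the message and the directory listing, not something confirmed further. Before removing files, a sketch like the following can show which list files mention the repository host and whether each one carries a signed-by option; lists_mentioning and signed_by_report are hypothetical helper names.

```shell
# Hypothetical helper: print every .list file in a sources directory that
# mentions the given host, so duplicate declarations are easy to spot.
lists_mentioning() {
    local sources_dir="${1:-/etc/apt/sources.list.d}"
    local needle="${2:-nvidia.github.io}"
    grep -l -F -- "$needle" "$sources_dir"/*.list 2>/dev/null | sort
}

# Hypothetical helper: for each of those files, report whether it contains a
# signed-by option at all.
signed_by_report() {
    lists_mentioning "$@" | while IFS= read -r f; do
        if grep -q -F -- 'signed-by=' "$f"; then
            printf '%s\tsigned-by\n' "$f"
        else
            printf '%s\tno signed-by\n' "$f"
        fi
    done
}

# Example:
#   signed_by_report /etc/apt/sources.list.d nvidia.github.io
```

If two files show up for the same repository with different results, keeping only one of them (or removing both and regenerating the list, as done above) should remove the conflict.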
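Then I went through everything again from the beginning. When I reached step 2, I installed nvidia-docker2 directly and it succeeded, which is a bit odd.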
(base) ouc@ouc-Super-Server:/etc/apt/sources.list.d$ sudo apt-get install -y nvidia-docker2
Reading package lists... Done
Building dependency tree
Reading state information... Done
(Reading database ... 248522 files and directories currently installed.)
Preparing to unpack .../nvidia-docker2_2.10.0-1_all.deb ...
Unpacking nvidia-docker2 (2.10.0-1) ...
Setting up nvidia-docker2 (2.10.0-1) ...
OK, nvidia-docker is now installed on the server as well.