References:
https://docs.docker.com/engine/install/ubuntu/
https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html#installing-on-ubuntu-and-debian
I. Install Docker (reference: https://docs.docker.com/engine/install/ubuntu/)
1.1 Install the packages that allow apt to use repositories over HTTPS
sudo apt-get update && sudo apt-get install -y \
apt-transport-https ca-certificates curl software-properties-common gnupg2
1.2 Add Docker's official GPG key
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key --keyring /etc/apt/trusted.gpg.d/docker.gpg add -
1.3 Add the Docker apt repository
sudo add-apt-repository \
"deb [arch=amd64] https://download.docker.com/linux/ubuntu \
$(lsb_release -cs) \
stable"
1.4 Install Docker CE
sudo apt-get update && sudo apt-get install -y \
containerd.io=1.2.13-2 \
docker-ce=5:19.03.11~3-0~ubuntu-$(lsb_release -cs) \
docker-ce-cli=5:19.03.11~3-0~ubuntu-$(lsb_release -cs)
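The pinned version strings in step 1.4 follow the pattern 5:<version>~3-0~ubuntu-<codename>. The sketch below only assembles that string so it can be reused for both docker-ce packages. The function name docker_ce_version is hypothetical and not part of any tool.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the docker-ce version string used in step 1.4.
# $1 = Docker version (e.g. 19.03.11), $2 = Ubuntu codename (e.g. focal).
docker_ce_version() {
    printf '5:%s~3-0~ubuntu-%s\n' "$1" "$2"
}

# Example: print the string for the version pinned above on Ubuntu "focal".
docker_ce_version 19.03.11 focal
```

In practice the codename would come from $(lsb_release -cs), as in the command above. A given version may not be published for every Ubuntu release; apt-cache madison docker-ce lists the versions the repository actually offers.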
1.5 Create /etc/docker (the -p flag avoids an error if the directory already exists)
sudo mkdir -p /etc/docker
1.6 Configure the Docker daemon
cat <<EOF | sudo tee /etc/docker/daemon.json
{
  "default-runtime": "nvidia",
  "runtimes": {
    "nvidia": {
      "path": "nvidia-container-runtime",
      "runtimeArgs": []
    }
  },
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "200m",
    "max-file": "3"
  }
}
EOF
If "default-runtime": "nvidia" is not set (that is, nvidia-docker is not the default runtime),
run containers with nvidia-docker run ..... instead.
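A syntax error in daemon.json can keep the Docker daemon from starting, so it may be worth checking the file before the restart in step 1.8. The following is a minimal sketch, assuming python3 is installed; the function name check_daemon_json is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if the given file parses as JSON.
# Assumes python3 is on the PATH; json.tool is part of its standard library.
check_daemon_json() {
    python3 -m json.tool "$1" > /dev/null 2>&1
}

# Check the file written in step 1.6 (skipped if it does not exist).
conf=/etc/docker/daemon.json
if [ -f "$conf" ]; then
    if check_daemon_json "$conf"; then
        echo "$conf: valid JSON"
    else
        echo "$conf: invalid JSON" >&2
    fi
fi
```

This only checks JSON syntax. It does not confirm that Docker accepts the individual keys.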
1.7 Create docker.service.d
sudo mkdir -p /etc/systemd/system/docker.service.d
1.8 Restart Docker
sudo systemctl daemon-reload
sudo systemctl restart docker
1.9 Enable Docker to start at boot
sudo systemctl enable docker
II. Install nvidia-docker
2.1 Edit /etc/hosts
Add the following entries:
# nvidia.github.io
185.199.108.153 nvidia.github.io
185.199.109.153 nvidia.github.io
185.199.110.153 nvidia.github.io
185.199.111.153 nvidia.github.io
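If these entries are added by script rather than by hand, it helps to avoid appending duplicates on repeated runs. The sketch below shows one way to do that; the function name add_host_entry is hypothetical, and the demonstration writes to a scratch file rather than the real /etc/hosts.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append "IP HOSTNAME" to a hosts file unless that
# exact line is already present. $1 = file, $2 = IP address, $3 = hostname.
add_host_entry() {
    local file=$1 entry="$2 $3"
    grep -qxF -- "$entry" "$file" 2>/dev/null || echo "$entry" >> "$file"
}

# Demonstrate on a scratch copy rather than the real /etc/hosts.
hosts_copy=$(mktemp)
for ip in 185.199.108.153 185.199.109.153 185.199.110.153 185.199.111.153; do
    add_host_entry "$hosts_copy" "$ip" nvidia.github.io
done
add_host_entry "$hosts_copy" 185.199.108.153 nvidia.github.io  # no duplicate
cat "$hosts_copy"
```

To apply it to the real file, the function would need to run with root privileges and be given /etc/hosts as its first argument. The addresses are the ones listed in this article and may change over time.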
2.2 Set up the repository and GPG key
distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
&& curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add - \
&& curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
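The distribution variable set above is the ID and VERSION_ID fields of /etc/os-release joined together, and it selects the repository path in the URL. The sketch below shows how that value is derived; the function name distribution_from is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print ID followed by VERSION_ID from an os-release
# style file, the same way step 2.2 builds $distribution.
# The subshell keeps the sourced variables out of the calling shell.
distribution_from() {
    ( . "$1" && echo "$ID$VERSION_ID" )
}

# On the machine itself this prints a value such as ubuntu20.04.
if [ -r /etc/os-release ]; then
    distribution_from /etc/os-release
fi
```

Because the variable lives only in the current shell session, the command in step 2.3 has to run in the same shell as step 2.2, or the variable has to be set again first.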
2.3 Add the experimental repository
curl -s -L https://nvidia.github.io/nvidia-container-runtime/experimental/$distribution/nvidia-container-runtime.list | sudo tee /etc/apt/sources.list.d/nvidia-container-runtime.list
2.4 Update the package list
sudo apt-get update
2.5 Install nvidia-docker
sudo apt-get install -y nvidia-docker2
2.6 Restart Docker
sudo systemctl restart docker
2.7 Run a basic CUDA container to verify the setup
sudo docker run --rm --gpus all nvidia/cuda:11.1.1-base nvidia-smi
- The console output should look similar to the following:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.51.06    Driver Version: 450.51.06    CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            On   | 00000000:00:1E.0 Off |                    0 |
| N/A   34C    P8     9W /  70W |      0MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
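
For an automated check, it is usually enough to confirm that the output contains the nvidia-smi header line. The sketch below extracts the driver version from that line; the function name driver_version is hypothetical, and the sample input is the header from the output above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read nvidia-smi output on stdin and print the driver
# version from its header line; fails if no such line is present.
driver_version() {
    sed -n 's/.*Driver Version: \([0-9.]*\).*/\1/p' | grep -m1 .
}

# Example with the header line from the output above.
echo '| NVIDIA-SMI 450.51.06 Driver Version: 450.51.06 CUDA Version: 11.0 |' \
    | driver_version
```

With a working setup, the same function could be fed from the test command in step 2.7 by piping its output into driver_version. A non-zero exit status then indicates that the container did not produce the expected header.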