Official NVIDIA documentation: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html#installation
https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/index.html
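I. What is nvidia-container-toolkit?
The NVIDIA Container Toolkit lets users build and run GPU-accelerated containers. It includes a container runtime library and utilities that automatically configure containers to use NVIDIA GPUs.
II. Installing nvidia-container-toolkit
Docker and the NVIDIA driver must already be installed on the host.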
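Before going further, it can help to confirm that both prerequisites are actually on the PATH. The following is a minimal sketch, not part of the official guide; `check_cmd` is a hypothetical helper name, and it assumes `docker` is provided by Docker Engine and `nvidia-smi` by the NVIDIA driver package.

```shell
#!/bin/sh
# check_cmd (hypothetical helper): report whether a command is on PATH.
check_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "ok: $1"
    else
        echo "missing: $1"
        return 1
    fi
}

# Assumption: docker comes from Docker Engine, nvidia-smi from the NVIDIA driver.
for cmd in docker nvidia-smi; do
    check_cmd "$cmd" || echo "install $cmd before continuing" >&2
done
```

If either line reports "missing", install that component first; the toolkit only wires the two together and does not install them.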
1. Configure the repository
Configuration for the official NVIDIA repository:
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
&& curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
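The official repository is hosted on GitHub, which is slow to reach from mainland China, so this guide uses a domestic mirror instead: the one run by USTC (University of Science and Technology of China).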
curl -fsSL https://mirrors.ustc.edu.cn/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
&& curl -s -L https://mirrors.ustc.edu.cn/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
sed 's#deb https://nvidia.github.io#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://mirrors.ustc.edu.cn#g' | \
sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
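To see what the `sed` step in the mirror command does, the same expression can be applied to a single sample line without touching `/etc/apt`. This is only an illustrative sketch: `rewrite_repo_line` is a hypothetical helper, and the sample entry is an assumption about what the downloaded `.list` file contains.

```shell
#!/bin/sh
# Same substitution as in the mirror command above, wrapped in a function
# so it can be tried on sample input.
KEYRING=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg

# rewrite_repo_line (hypothetical helper): read repo lines on stdin,
# add the signed-by option and point the host at the USTC mirror.
rewrite_repo_line() {
    sed "s#deb https://nvidia.github.io#deb [signed-by=${KEYRING}] https://mirrors.ustc.edu.cn#g"
}

# Assumed sample entry; the real .list file may contain different lines.
printf '%s\n' 'deb https://nvidia.github.io/libnvidia-container/stable/deb/$(ARCH) /' | rewrite_repo_line
```

The rewritten line carries a `signed-by` option, so apt verifies packages from this source only against the keyring written by the `gpg --dearmor` step.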
2. Update the package list
sudo apt-get update
3. Install nvidia-container-toolkit
sudo apt-get install -y nvidia-container-toolkit
4. Verify the installation
nvidia-container-cli --version
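III. Configuration
1. Configure Docker
Use the nvidia-ctk command to configure the container runtime:
This command configures Docker to use the NVIDIA container runtime. Specifically, it edits /etc/docker/daemon.json and registers nvidia as an additional runtime. It does not make nvidia the default runtime unless the --set-as-default flag is also passed, which is why the run command later in this guide selects it with --runtime=nvidia.
Configuring Docker to use the NVIDIA container runtime: this lets Docker containers access NVIDIA GPUs and run GPU-accelerated workloads.
Modifying /etc/docker/daemon.json: the command writes the NVIDIA container runtime's settings into Docker's configuration file.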
$ sudo nvidia-ctk runtime configure --runtime=docker
INFO[0000] Loading config from /etc/docker/daemon.json
INFO[0000] Wrote updated config to /etc/docker/daemon.json
INFO[0000] It is recommended that docker daemon be restarted.
Restart Docker
sudo systemctl restart docker
$ cat /etc/docker/daemon.json
{
    "runtimes": {
        "nvidia": {
            "args": [],
            "path": "nvidia-container-runtime"
        }
    }
}
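The same check can be scripted rather than read by eye. The sketch below is a rough text match with `grep`, not a real JSON parse, and `has_nvidia_runtime` is a hypothetical helper name; it assumes the default config path shown above.

```shell
#!/bin/sh
# has_nvidia_runtime (hypothetical helper): return 0 if the given file is
# readable and contains a key named "nvidia". Plain-text match, not a JSON parse.
has_nvidia_runtime() {
    [ -r "$1" ] && grep -q '"nvidia"[[:space:]]*:' "$1"
}

config=/etc/docker/daemon.json
if has_nvidia_runtime "$config"; then
    echo "nvidia runtime is registered in $config"
else
    echo "nvidia runtime not found in $config"
fi
```

A text match is enough for a quick sanity check; a stricter script would parse the file with a JSON tool instead.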
Check whether nvidia appears among the runtimes Docker supports
$ docker info | grep Runtimes
Runtimes: nvidia runc io.containerd.runc.v2
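For use in a provisioning script, the `Runtimes:` line can be checked programmatically instead of by eye. This is a small sketch under the assumption that `docker info` prints the runtimes as a space-separated list on one line, as in the output above; `has_runtime` is a hypothetical helper name.

```shell
#!/bin/sh
# has_runtime (hypothetical helper): read `docker info` output on stdin and
# succeed if the "Runtimes:" line lists the runtime named in $1.
has_runtime() {
    awk -v want="$1" '
        $1 == "Runtimes:" { for (i = 2; i <= NF; i++) if ($i == want) found = 1 }
        END { exit (found ? 0 : 1) }
    '
}

# Sample line taken from the output above.
if echo "Runtimes: nvidia runc io.containerd.runc.v2" | has_runtime nvidia; then
    echo "nvidia runtime available"
fi
```

On a real host the input would come from Docker itself, for example `docker info | has_runtime nvidia`.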
IV. Start a container and run nvidia-smi to check the result
$ sudo docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi
- --runtime=nvidia: selects the NVIDIA container runtime for this container
- --gpus all: requests all available GPUs
- nvidia-smi: shows the status of the NVIDIA GPUs, including utilization and memory usage
$ sudo docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi
Thu Feb 13 09:05:55 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.120 Driver Version: 550.120 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA A800 80GB PCIe Off | 00000000:34:00.0 Off | 0 |
| N/A 35C P0 51W / 300W | 1MiB / 81920MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 1 NVIDIA A800 80GB PCIe Off | 00000000:35:00.0 Off | 0 |
| N/A 36C P0 52W / 300W | 1MiB / 81920MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 2 NVIDIA A800 80GB PCIe Off | 00000000:36:00.0 Off | 0 |
| N/A 36C P0 50W / 300W | 1MiB / 81920MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 3 NVIDIA A800 80GB PCIe Off | 00000000:37:00.0 Off | 0 |
| N/A 36C P0 52W / 300W | 1MiB / 81920MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 4 NVIDIA A800 80GB PCIe Off | 00000000:9B:00.0 Off | 0 |
| N/A 34C P0 50W / 300W | 1MiB / 81920MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 5 NVIDIA A800 80GB PCIe Off | 00000000:9C:00.0 Off | 0 |
| N/A 35C P0 51W / 300W | 1MiB / 81920MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 6 NVIDIA A800 80GB PCIe Off | 00000000:9D:00.0 Off | 0 |
| N/A 35C P0 49W / 300W | 1MiB / 81920MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 7 NVIDIA A800 80GB PCIe Off | 00000000:9E:00.0 Off | 0 |
| N/A 35C P0 53W / 300W | 1MiB / 81920MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+
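The full table is handy for a person but awkward for scripts. `nvidia-smi` also offers a CSV query mode (`--query-gpu=... --format=csv,noheader`) that is easier to post-process. The sketch below summarizes such rows; `summarize_gpus` is a hypothetical helper, and the sample rows are assumptions modeled on the table above rather than captured output.

```shell
#!/bin/sh
# summarize_gpus (hypothetical helper): read CSV rows of the form
# "index, name, memory.total" on stdin, print one line per GPU and a total.
summarize_gpus() {
    awk -F', ' '
        NF >= 3 { count++; printf "GPU %s: %s (%s)\n", $1, $2, $3 }
        END { printf "total: %d\n", count }
    '
}

# Assumed sample rows modeled on the table above, not captured output.
printf '%s\n' \
    '0, NVIDIA A800 80GB PCIe, 81920 MiB' \
    '1, NVIDIA A800 80GB PCIe, 81920 MiB' | summarize_gpus
```

Against a real container, the input could come from `sudo docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi --query-gpu=index,name,memory.total --format=csv,noheader | summarize_gpus`; on the host shown above, the total would be expected to match the eight GPUs in the table.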