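Installing Docker and nvidia-container-runtime
To run GPU-enabled Docker containers, install Docker 19.03 or later, the first release that supports the `--gpus` option used in step 7.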
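The version requirement can be checked from a script. The following is a minimal sketch, assuming GNU `sort -V` is available and the `docker` CLI is on the PATH; the helper names `version_ge` and `docker_supports_gpus` are hypothetical.

```shell
#!/usr/bin/env bash
# Succeed if version $1 is greater than or equal to version $2.
# Relies on `sort -V` (version sort from GNU coreutils).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Succeed if the Docker server version is at least 19.03.
# Assumes the docker CLI is installed and the daemon is reachable.
docker_supports_gpus() {
    local current
    current="$(docker version --format '{{.Server.Version}}' 2>/dev/null)" || return 1
    [ -n "$current" ] && version_ge "$current" "19.03"
}
```

For example, `docker_supports_gpus || echo "Docker is missing or older than 19.03"` could run before the remaining steps.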
1. Uninstall Docker and nvidia-container-runtime.
```bash
yum remove -y docker-ce docker-ce-cli containerd
# Quote the globs so the shell passes them to yum unexpanded.
yum remove -y 'nvidia-container-runtime*' 'libnvidia-container*'
```
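To confirm that the packages are really gone, something like the following could be used. It is a sketch that assumes an RPM-based node (as the yum commands imply); `still_installed` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Print each of the given packages that is still installed.
# `rpm -q` exits non-zero when a package is not installed.
still_installed() {
    local pkg
    for pkg in "$@"; do
        if rpm -q "$pkg" >/dev/null 2>&1; then
            echo "$pkg"
        fi
    done
}
```

Calling `still_installed docker-ce docker-ce-cli nvidia-container-runtime` should print nothing once step 1 has succeeded.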
2. Create the daemon.json file if it does not exist yet, then back it up and remove it from /etc/docker.
```bash
mkdir -p /etc/docker
vi /etc/docker/daemon.json
```
Enter the following content:
```json
{
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "path": "/usr/bin/nvidia-container-runtime",
            "runtimeArgs": []
        }
    },
    "exec-opts": ["native.cgroupdriver=systemd"],
    "log-driver": "json-file",
    "log-opts": {
        "max-size": "100m",
        "max-file": "10"
    },
    "bip": "169.254.123.1/24",
    "oom-score-adjust": -1000,
    "storage-driver": "overlay2",
    "storage-opts": ["overlay2.override_kernel_check=true"],
    "live-restore": true
}
```
Back up the configuration:
```bash
mv /etc/docker/daemon.json /tmp/daemon.json
```
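A plain `mv` fails when daemon.json is absent. A guarded variant of the backup could look like the sketch below; `backup_config` is a hypothetical helper, not part of Docker.

```shell
#!/usr/bin/env bash
# Move a config file into a backup directory if it exists.
# Prints the backup path on success; returns 1 when there is nothing to back up.
backup_config() {
    local src="$1" dest_dir="$2" dest
    [ -f "$src" ] || return 1
    dest="$dest_dir/$(basename "$src")"
    mv -- "$src" "$dest" && echo "$dest"
}
```

For this step the call would be `backup_config /etc/docker/daemon.json /tmp`.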
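To store Docker's data in a custom location, add the `graph` parameter to daemon.json. Since Docker 17.05, `graph` is deprecated in favor of `data-root`.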
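3. Install Docker.
Download the Docker package on the node where Docker is being installed or upgraded, then install the RPMs it contains.
```bash
VERSION=19.03.5
URL=http://aliacs-k8s-cn-beijing.oss-cn-beijing.aliyuncs.com/public/pkg/docker/docker-${VERSION}.tar.gz
# -sS: hide the progress bar but still report errors; -L: follow redirects.
curl -sSL "$URL" -o "/tmp/docker-${VERSION}.tar.gz"
cd /tmp
tar -xf "docker-${VERSION}.tar.gz"
cd "/tmp/pkg/docker/${VERSION}/rpm"
yum localinstall -y ./*.rpm
```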
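The download can be made a little more defensive. The sketch below reuses the URL pattern from this step and adds curl's `-f` flag so that an HTTP error fails the command instead of saving an error page; the function names `docker_pkg_url` and `fetch_docker_pkg` are hypothetical.

```shell
#!/usr/bin/env bash
# Build the download URL for a given Docker version (pattern taken from step 3).
docker_pkg_url() {
    local version="$1"
    echo "http://aliacs-k8s-cn-beijing.oss-cn-beijing.aliyuncs.com/public/pkg/docker/docker-${version}.tar.gz"
}

# Download the archive and check that tar can list it before unpacking.
# Prints the archive path on success.
fetch_docker_pkg() {
    local version="$1" dest_dir="${2:-/tmp}" archive
    archive="${dest_dir}/docker-${version}.tar.gz"
    curl -fsSL "$(docker_pkg_url "$version")" -o "$archive" || return 1
    tar -tf "$archive" >/dev/null || return 1
    echo "$archive"
}
```

A call such as `fetch_docker_pkg 19.03.5 /tmp` would replace the `curl` line, with the `tar -xf` and `yum localinstall` commands unchanged.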
4. Install nvidia-container-runtime on the node.
```bash
cd /tmp
yum install -y unzip
wget http://kubeflow.oss-cn-beijing.aliyuncs.com/nvidia.zip
unzip nvidia.zip
yum -y -q --nogpgcheck localinstall /tmp/nvidia/*
```
5. Restore daemon.json.
Move the daemon.json backed up in step 2 to /etc/docker/daemon.json so that the original configuration takes effect.
```bash
mv /tmp/daemon.json /etc/docker/daemon.json
```
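Docker will not start if daemon.json names a runtime binary that is missing, so it can help to check the configured paths before restarting. The following is a rough sketch that uses `grep` and `sed` rather than a real JSON parser, which is an assumption that holds for simple files like the one in step 2; `runtime_paths` and `missing_runtimes` are hypothetical names.

```shell
#!/usr/bin/env bash
# Extract every "path" value from a daemon.json-style file.
# Simple text matching, not a full JSON parser.
runtime_paths() {
    grep -oE '"path"[[:space:]]*:[[:space:]]*"[^"]+"' "$1" | sed -E 's/.*"([^"]+)"$/\1/'
}

# Print each configured runtime path that is not an executable file.
missing_runtimes() {
    local p
    runtime_paths "$1" | while IFS= read -r p; do
        [ -x "$p" ] || echo "$p"
    done
}
```

If `missing_runtimes /etc/docker/daemon.json` prints `/usr/bin/nvidia-container-runtime`, step 4 did not install the runtime where the configuration expects it.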
6. Restart Docker and enable it at boot.
```bash
# restart also starts the service if it is not running yet
systemctl restart docker
systemctl enable docker
```
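After the restart, one way to confirm that the restored daemon.json was picked up is to ask the daemon for its default runtime. This is a minimal sketch, assuming the daemon is running; `default_runtime_is` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Succeed if the Docker daemon reports the expected default runtime.
default_runtime_is() {
    local expected="$1" actual
    actual="$(docker info --format '{{.DefaultRuntime}}' 2>/dev/null)" || return 1
    [ "$actual" = "$expected" ]
}
```

With the configuration from step 2, `default_runtime_is nvidia` is expected to succeed.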
7. Test that Docker can use the NVIDIA GPU.
```bash
docker run --gpus all nvidia/cuda:10.2-base nvidia-smi
```
Output like the following indicates that the installation succeeded:
```
[root@bogon tmp]# docker run --gpus all nvidia/cuda:10.2-base nvidia-smi
Unable to find image 'nvidia/cuda:10.2-base' locally
10.2-base: Pulling from nvidia/cuda
f08d8e2a3ba1: Pull complete
3baa9cb2483b: Pull complete
94e5ff4c0b15: Pull complete
1860925334f9: Pull complete
4e133546ace1: Pull complete
db5ee31a93b2: Pull complete
34fd2c745ff0: Pull complete
Digest: sha256:4918ce9269396c77da6b45ee97e227fb56f37093c498a2a0bbebfda785a3b69e
Status: Downloaded newer image for nvidia/cuda:10.2-base
Thu Aug 20 05:01:57 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.33.01    Driver Version: 440.33.01    CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce RTX 208...  Off  | 00000000:04:00.0 Off |                  N/A |
| 20%   33C    P0    11W / 250W |      0MiB / 11019MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
```
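For automated checks, reading the table by eye is impractical. A sketch of a scriptable variant follows; it uses `nvidia-smi --list-gpus`, which prints one line per GPU, and the helper name `gpu_count_in_container` is hypothetical.

```shell
#!/usr/bin/env bash
# Print the number of GPUs visible inside a container.
# Returns non-zero if no GPU line is found or the container fails to run.
gpu_count_in_container() {
    local image="${1:-nvidia/cuda:10.2-base}"
    docker run --rm --gpus all "$image" nvidia-smi --list-gpus | grep -c '^GPU '
}
```

A provisioning script could then run `gpu_count_in_container >/dev/null || echo "no GPU visible in container"`.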
8. A simple daemon.json configuration looks like this:
```json
{
    "registry-mirrors": [
        "https://registry.docker-cn.com",
        "http://hub-mirror.c.163.com",
        "https://docker.mirrors.ustc.edu.cn"
    ],
    "insecure-registries": ["harbor.test.com", "registry.cn-shenzhen.aliyuncs.com"],
    "max-concurrent-downloads": 10
}
```
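Registry mirror settings only take effect after Docker is restarted. Whether a mirror is active can be checked roughly as follows; this is a sketch that assumes the daemon is running, and `mirror_configured` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Succeed if the given mirror URL appears in the daemon's active mirror list.
mirror_configured() {
    local mirror="$1"
    docker info --format '{{json .RegistryConfig.Mirrors}}' 2>/dev/null | grep -qF "$mirror"
}
```

For example, `mirror_configured https://docker.mirrors.ustc.edu.cn` tests the third mirror from the configuration above.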
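References:
1. Download location for nvidia-docker offline installation packages:
http://mirror.cs.uchicago.edu/nvidia-docker/nvidia-container-runtime/stable/ubuntu16.04/amd64/
Choose the packages that match your operating system and requirements.
See also: Part 8, "Server [Ubuntu] GPU Tesla P100 deployment", which uses these packages.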