1. Install the NVIDIA dependencies
CentOS 7
yum install -y nvidia-container-toolkit nvidia-container-runtime
2. Configure the dockerd startup command
Modify the systemd service configuration file
mkdir -p /etc/systemd/system/docker.service.d
tee /etc/systemd/system/docker.service.d/override.conf <<EOF
[Service]
ExecStart=
ExecStart=/usr/bin/dockerd --host=fd:// --add-runtime=nvidia=/usr/bin/nvidia-container-runtime
EOF
systemctl daemon-reload
systemctl restart docker
Modify the Docker daemon configuration file
tee /etc/docker/daemon.json <<EOF
{
    "runtimes": {
        "nvidia": {
            "path": "/usr/bin/nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
EOF
pkill -SIGHUP dockerd
You can also make nvidia the default runtime by adding the following setting to /etc/docker/daemon.json:
"default-runtime": "nvidia"
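Alternatively, register the runtime on the dockerd command line (append --default-runtime=nvidia to also make it the default):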
dockerd --add-runtime=nvidia=/usr/bin/nvidia-container-runtime
Restart dockerd
systemctl restart docker
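To show where the "default-runtime" key sits relative to the "runtimes" block, here is a minimal sketch of a complete daemon.json. The helper name write_daemon_json and its target-path argument are assumptions made for illustration, not part of Docker; writing to a scratch path first lets you review the file before it replaces /etc/docker/daemon.json.

```shell
# Hypothetical helper: writes a daemon.json that registers the nvidia runtime
# and makes it the default. The output path is passed as the first argument.
write_daemon_json() {
    target="$1"
    cat > "$target" <<'EOF'
{
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "path": "/usr/bin/nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
EOF
}
```

For example, run write_daemon_json /tmp/daemon.json, review the result, copy it to /etc/docker/daemon.json, and then restart docker. If your daemon.json already contains other settings, merge the two keys by hand instead of overwriting the file.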
3. Test
docker run --runtime=nvidia --rm nvidia/cuda nvidia-smi
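If the test container fails to start, it can help to first confirm that dockerd actually picked up the runtime. The sketch below assumes the plain-text output of docker info contains a "Runtimes:" line listing the registered runtime names; the function name has_runtime is hypothetical.

```shell
# Hypothetical helper: reads `docker info` text on stdin and succeeds only if
# the runtime named in $1 appears on the "Runtimes:" line.
has_runtime() {
    # Keep the "Runtimes:" line, split it into one word per line,
    # then look for an exact, literal match of the requested name.
    grep -E '^[[:space:]]*Runtimes:' | tr -s ' ' '\n' | grep -qxF "$1"
}
```

A possible invocation is docker info | has_runtime nvidia && echo "nvidia runtime registered". Reading from stdin keeps the check independent of how the docker info text was obtained.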
4. Troubleshooting notes
- Error 1: After starting Docker and running the script, the open3d-python==0.3.0.0 dependency fails with:
ImportError: libGL.so.1: cannot open shared object file: No such file or directory
Install the libgl1-mesa-glx library:
apt-get update && apt-get install libgl1-mesa-glx
If the update prints the following "is not signed" error, first move the NVIDIA apt-get source lists out of the way, then update again:
Ign:10 https://developer.download.nvidia.cn/compute/machine-learning/repos/ubuntu1804/x86_64 Release.gpg
Reading package lists... Done
W: GPG error: https://developer.download.nvidia.cn/compute/machine-learning/repos/ubuntu1804/x86_64 Release: The following signatures were invalid: BADSIG F60F4B3D7FA2AF80 cudatools <cudatools@nvidia.com>
E: The repository 'https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64 Release' is not signed.
N: Updating from such a repository can't be done securely, and is therefore disabled by default.
N: See apt-secure(8) manpage for repository creation and user configuration details.
Run:
mv /etc/apt/sources.list.d/cuda.list /etc/apt/sources.list.d/nvidia-ml.list /tmp
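The mv command above fails if either file is missing, which can happen on images that ship only one of the two lists. A slightly more forgiving variant might look like the sketch below; the function name park_nvidia_sources and its two optional directory arguments are assumptions made for illustration.

```shell
# Hypothetical helper: moves the NVIDIA apt source lists from SRC_DIR to
# DEST_DIR, skipping any list that is not present.
park_nvidia_sources() {
    src_dir="${1:-/etc/apt/sources.list.d}"
    dest_dir="${2:-/tmp}"
    for name in cuda.list nvidia-ml.list; do
        if [ -f "$src_dir/$name" ]; then
            # Stop at the first failed move so the error is not hidden.
            mv "$src_dir/$name" "$dest_dir/" || return 1
            echo "moved $name"
        fi
    done
}
```

After calling park_nvidia_sources, rerun apt-get update && apt-get install libgl1-mesa-glx. The files stay in the destination directory, so they can be moved back once the package is installed.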