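Environment:
Operating system: CentOS 7 (minimal install)
docker-ce version: 19.03+
Installing Docker is straightforward: configure the docker-ce repo and install it online with yum.
In an offline (intranet) environment, first download the package and all of its dependencies on a machine with internet access (repotrack is provided by yum-utils), then copy the RPMs over and install them: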
repotrack docker-ce
yum install ./*.rpm
systemctl enable docker --now
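Since the rest of this guide relies on Docker 19.03 or newer, it can be worth confirming the installed version before going further. The following is a minimal sketch, not part of the original steps: the helper names version_ge and parse_docker_version are made up for illustration, and the comparison assumes GNU sort with -V support.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds (exit 0) when version $1 >= version $2.
# Assumes GNU sort, whose -V option sorts version strings.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Hypothetical helper: extracts the version number from a line such as
# "Docker version 19.03.13, build 4484c46d9d" read on stdin.
parse_docker_version() {
    sed -n 's/^Docker version \([0-9][0-9.]*\).*/\1/p'
}

# Only query Docker when the client is actually installed.
if command -v docker >/dev/null 2>&1; then
    installed=$(docker --version | parse_docker_version)
    if version_ge "$installed" "19.03"; then
        echo "docker $installed: OK"
    else
        echo "docker $installed is older than 19.03" >&2
    fi
fi
```

The check only reads the client version string, so it works even before the daemon has been started.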
Configure the nvidia-docker repo
# Access from mainland China is slow; the download may need several attempts
distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
   && curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.repo | sudo tee /etc/yum.repos.d/nvidia-docker.repo
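The repo URL embeds the value of $distribution, which is simply ID and VERSION_ID from /etc/os-release joined together (centos7 on CentOS 7). As a rough sketch of how that value is derived, assuming a hypothetical helper named distribution_of:

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints $ID$VERSION_ID from an os-release style file
# (default: /etc/os-release). The file is sourced in a subshell so its
# variables do not leak into the calling shell.
distribution_of() {
    local file="${1:-/etc/os-release}"
    ( . "$file" && printf '%s%s\n' "$ID" "$VERSION_ID" )
}

# Show the URL that would be fetched on this machine.
if [ -r /etc/os-release ]; then
    distribution=$(distribution_of)
    echo "repo URL: https://nvidia.github.io/nvidia-docker/${distribution}/nvidia-docker.repo"
fi
```

Printing the URL first makes it easy to see whether a failed download is caused by an unexpected distribution string or by the slow connection.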
# cat /etc/yum.repos.d/nvidia-docker.repo
[libnvidia-container]
name=libnvidia-container
baseurl=https://nvidia.github.io/libnvidia-container/stable/centos7/$basearch
repo_gpgcheck=1
gpgcheck=0
enabled=1
gpgkey=https://nvidia.github.io/libnvidia-container/gpgkey
sslverify=1
sslcacert=/etc/pki/tls/certs/ca-bundle.crt
[libnvidia-container-experimental]
name=libnvidia-container-experimental
baseurl=https://nvidia.github.io/libnvidia-container/experimental/centos7/$basearch
repo_gpgcheck=1
gpgcheck=0
enabled=0
gpgkey=https://nvidia.github.io/libnvidia-container/gpgkey
sslverify=1
sslcacert=/etc/pki/tls/certs/ca-bundle.crt
[nvidia-container-runtime]
name=nvidia-container-runtime
baseurl=https://nvidia.github.io/nvidia-container-runtime/stable/centos7/$basearch
repo_gpgcheck=1
gpgcheck=0
enabled=1
gpgkey=https://nvidia.github.io/nvidia-container-runtime/gpgkey
sslverify=1
sslcacert=/etc/pki/tls/certs/ca-bundle.crt
[nvidia-container-runtime-experimental]
name=nvidia-container-runtime-experimental
baseurl=https://nvidia.github.io/nvidia-container-runtime/experimental/centos7/$basearch
repo_gpgcheck=1
gpgcheck=0
enabled=0
gpgkey=https://nvidia.github.io/nvidia-container-runtime/gpgkey
sslverify=1
sslcacert=/etc/pki/tls/certs/ca-bundle.crt
[nvidia-docker]
name=nvidia-docker
baseurl=https://nvidia.github.io/nvidia-docker/centos7/$basearch
repo_gpgcheck=1
gpgcheck=0
enabled=1
gpgkey=https://nvidia.github.io/nvidia-docker/gpgkey
sslverify=1
sslcacert=/etc/pki/tls/certs/ca-bundle.crt
Download the offline RPM packages and install them
repotrack nvidia-docker2
repotrack nvidia-container-toolkit
yum install ./*.rpm
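Upgrade the kernel
# Upgrade the kernel and install the matching headers (the glob is quoted so the shell does not expand it against local files)
# Reboot afterwards so that $(uname -r) in the later steps refers to the upgraded kernel
yum install kernel-headers kernel-devel 'kernel*'
Disable the default display module (nouveau)
# Blacklist the module by adding the two lines below to this file
vim /etc/modprobe.d/blacklist-nouveau.conf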
blacklist nouveau
options nouveau modeset=0
# Rebuild the initramfs (the current image is backed up first)
mv /boot/initramfs-$(uname -r).img /boot/initramfs-$(uname -r).img.bak
sudo dracut -v /boot/initramfs-$(uname -r).img $(uname -r)
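The blacklist step can also be done without an editor, and after the reboot it is useful to confirm that nouveau is really gone before running the driver installer. A minimal sketch under those assumptions follows; write_nouveau_blacklist and nouveau_loaded are hypothetical helper names, not part of any package.

```shell
#!/usr/bin/env bash
# Hypothetical helper: writes the two blacklist lines to the file given as $1.
write_nouveau_blacklist() {
    printf '%s\n' 'blacklist nouveau' 'options nouveau modeset=0' > "$1"
}

# Hypothetical helper: reads `lsmod` output on stdin and succeeds only if a
# module named exactly "nouveau" is listed in the first column.
nouveau_loaded() {
    awk '$1 == "nouveau" { found = 1 } END { exit !found }'
}

# Real use (as root):
#   write_nouveau_blacklist /etc/modprobe.d/blacklist-nouveau.conf
# After rebuilding the initramfs and rebooting, check the result:
if command -v lsmod >/dev/null 2>&1; then
    if lsmod | nouveau_loaded; then
        echo "nouveau is still loaded" >&2
    else
        echo "nouveau is not loaded"
    fi
fi
```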
Install the GPU driver (required)
# Download the driver that matches your GPU model
wget https://cn.download.nvidia.cn/tesla/450.156.00/NVIDIA-Linux-x86_64-450.156.00.run
chmod +x NVIDIA-Linux-x86_64-450.156.00.run
./NVIDIA-Linux-x86_64-450.156.00.run # follow the prompts to complete the installation
After the installation finishes, reboot the system and verify with:
nvidia-smi
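Beyond eyeballing the nvidia-smi table, the driver version can be checked in a script using nvidia-smi's CSV query output. This is only a sketch: all_gpus_on_driver is a hypothetical helper, and 450.156.00 is taken from the installer file name used above, so adjust it to whatever driver you actually installed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads "driver_version, name" CSV lines on stdin and
# succeeds only if at least one GPU is listed and every GPU reports the
# driver version passed as $1.
all_gpus_on_driver() {
    awk -F', ' -v want="$1" '
        NF >= 2 { n++; if ($1 != want) bad = 1 }
        END { exit (n == 0 || bad) }
    '
}

# Only run the query when the driver tools are installed.
if command -v nvidia-smi >/dev/null 2>&1; then
    if nvidia-smi --query-gpu=driver_version,name --format=csv,noheader \
        | all_gpus_on_driver "450.156.00"; then
        echo "all GPUs report driver 450.156.00"
    else
        echo "unexpected driver version or no GPU detected" >&2
    fi
fi
```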
Configure Docker to support GPUs:
cat /etc/docker/daemon.json
{
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
# Restart docker
systemctl daemon-reload && systemctl restart docker
# Verify that the GPU can be used from inside a container
docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi
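If the test container fails, one thing worth checking is whether the daemon actually picked up the nvidia runtime from /etc/docker/daemon.json. The sketch below inspects the output of docker info for that; has_runtime is a hypothetical helper, and the plain grep match is an assumption that is good enough for a quick check but is not a real JSON parser.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads the JSON printed by
# `docker info --format '{{json .Runtimes}}'` on stdin and succeeds if a
# runtime with the name given as $1 appears as a key.
has_runtime() {
    grep -q "\"$1\"[[:space:]]*:"
}

# Only query the daemon when the docker client is installed.
if command -v docker >/dev/null 2>&1; then
    if docker info --format '{{json .Runtimes}}' 2>/dev/null | has_runtime nvidia; then
        echo "nvidia runtime is registered"
        # The registered runtime can then be selected explicitly, e.g.:
        #   docker run --rm --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=all nvidia/cuda:11.0-base nvidia-smi
    else
        echo "nvidia runtime not found; check /etc/docker/daemon.json" >&2
    fi
fi
```

The same helper can be pointed at the daemon.json file itself, since the runtime name appears there as a key too.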
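With GPU support enabled in Docker, AI workloads can run in containers and share the GPU, which improves GPU utilization and wastes fewer resources than running only a handful of programs directly on the host.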