Uninstalling Docker
Docker 18.03 is currently installed.
The service is running normally.
Commands used to check, stop, and start the Docker service:
systemctl status docker
systemctl stop docker
systemctl start docker
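In a script, the one-word state from `systemctl is-active` is easier to test than the full `status` output. The following is a minimal sketch; the helper name `service_state` is hypothetical, and it prints `unknown` on hosts without systemd.

```shell
#!/bin/sh
# Hypothetical helper: print the one-word state of a systemd unit.
# Prints "unknown" when systemctl is missing or returns nothing.
service_state() {
    unit=$1
    if ! command -v systemctl >/dev/null 2>&1; then
        echo "unknown"
        return 0
    fi
    # is-active exits non-zero for anything other than "active",
    # so the exit code is ignored and only the printed word is used.
    state=$(systemctl is-active "$unit" 2>/dev/null || true)
    echo "${state:-unknown}"
}

service_state docker
```

On a host where Docker is running this is expected to print `active`.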
List the Docker-related packages installed on Ubuntu:
dpkg -l | grep docker
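The first column of `dpkg -l` is the package status: `ii` means installed, while `rc` means removed with configuration files left behind. A small sketch that keeps only the installed Docker packages; the function name `installed_docker_packages` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read `dpkg -l` output on stdin and print the names of
# packages whose status is "ii" (installed) and whose name contains "docker".
installed_docker_packages() {
    awk '$1 == "ii" && $2 ~ /docker/ { print $2 }'
}

# Usage on a real system:
#   dpkg -l | installed_docker_packages
```

An empty result after the uninstall step suggests no Docker package is still installed.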
Uninstall commands:
sudo apt-get remove --auto-remove 'docker*'
sudo apt-get remove --purge 'docker*'
Delete Docker images, containers, volumes, and other data files:
sudo rm -rf /var/lib/docker
Docker is now uninstalled.
Installing nvidia-docker
Reference: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html#docker
First install docker-ce:
curl https://get.docker.com | sh \
&& sudo systemctl --now enable docker
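The command above is Docker's official quick-install script, which installs the latest available release. To upgrade later, use the package manager (apt) rather than re-running the script; Docker's documentation advises against re-running it.
Enable Docker at boot: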
sudo /lib/systemd/systemd-sysv-install enable docker
Installation is complete; check the version:
docker --version
Once that works, edit /etc/docker/daemon.json so that it contains the following:
{
    "registry-mirrors": ["https://docker.mirrors.ustc.edu.cn/"],
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
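JSON only accepts straight double quotes as string delimiters, and text copied from a web page can carry typographic quotes that stop the daemon from parsing the file. A minimal sketch of a check to run before restarting Docker; the helper name `has_only_straight_quotes` is hypothetical, and it conservatively flags any typographic double quote anywhere in the file.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the file is readable and contains no
# typographic double quotes (U+201C or U+201D).
has_only_straight_quotes() {
    [ -r "$1" ] || return 2
    # -F matches the quote characters as fixed strings.
    ! grep -qF -e '“' -e '”' "$1"
}

# Usage on a real system:
#   has_only_straight_quotes /etc/docker/daemon.json && echo "quotes look fine"
```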
Then run:
systemctl daemon-reload
systemctl restart docker
Now install nvidia-docker itself:
distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
&& curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add - \
&& curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
curl -s -L https://nvidia.github.io/nvidia-container-runtime/experimental/$distribution/nvidia-container-runtime.list | sudo tee /etc/apt/sources.list.d/nvidia-container-runtime.list
sudo apt-get update
sudo apt-get install -y nvidia-docker2
sudo systemctl restart docker
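The `distribution` variable in the commands above is built from the `ID` and `VERSION_ID` fields of /etc/os-release, and it selects which repository list is downloaded. A small sketch for inspecting that value first; the function name `distribution_from` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print ID followed by VERSION_ID from an os-release file,
# the same value the `distribution` variable is set to.
distribution_from() {
    # Source the file in a subshell so its variables do not leak into the caller.
    ( . "$1" && printf '%s%s\n' "$ID" "$VERSION_ID" )
}

# Usage on a real system:
#   distribution_from /etc/os-release
```

On Ubuntu 18.04 this is expected to give `ubuntu18.04`, which matches the path segment in the error message quoted in this post.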
Updating the package sources produced an error:
E: Conflicting values set for option Signed-By regarding source https://nvidia.github.io/libnvidia-container/stable/ubuntu18.04/amd64/ /: /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg !=
E: The list of sources could not be read.
Fix:
cd /etc/apt/sources.list.d
sudo rm nvidia-*
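The error means the same APT source is declared more than once with different Signed-By values, so it can help to see which files reference the NVIDIA repositories before deleting them. A minimal sketch; the function name `nvidia_source_files` is hypothetical, and matching on the host name `nvidia.github.io` is an assumption based on the URLs used in this post.

```shell
#!/bin/sh
# Hypothetical helper: list the files in an APT sources directory that mention
# the NVIDIA repository host, so duplicates can be reviewed before deletion.
nvidia_source_files() {
    dir=${1:-/etc/apt/sources.list.d}
    # grep exits non-zero when nothing matches; treat that as an empty result.
    grep -l 'nvidia\.github\.io' "$dir"/* 2>/dev/null || true
}

# Usage on a real system:
#   nvidia_source_files
```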
Finally, test the setup.
First check the CUDA version; you will need it in a moment:
cat /usr/local/cuda/version.txt
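Image tags are keyed by the CUDA major.minor version, so only the first two components of the output matter. A small sketch that extracts them, assuming the file holds a line of the form `CUDA Version X.Y.Z`; the function name `cuda_major_minor` is hypothetical, and some newer CUDA releases ship a version.json instead, in which case this does not apply.

```shell
#!/bin/sh
# Hypothetical helper: read a line such as "CUDA Version 11.0.228" on stdin
# and print the major.minor part. Prints nothing if the line does not match.
cuda_major_minor() {
    sed -n 's/^CUDA Version \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}

# Usage on a real system:
#   cuda_major_minor < /usr/local/cuda/version.txt
```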
Then look up the image tag that matches your CUDA version here: https://gitlab.com/nvidia/container-images/cuda/blob/master/doc/supported-tags.md
sudo docker run --rm --gpus all nvidia/cuda:11.0.3-cudnn8-devel-ubuntu18.04 nvidia-smi
You can list the downloaded images:
sudo docker images -a
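This completes the nvidia-docker installation.
Permissions
Create a group named docker. If the group already exists, the command reports an error, which can be ignored: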
sudo groupadd docker
Add the current user to the docker group:
sudo gpasswd -a ${USER} docker
Restart the Docker service (use with caution in production):
sudo systemctl restart docker
Grant all users read and write access to the Docker socket:
sudo chmod a+rw /var/run/docker.sock
Reboot:
sudo reboot
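Group membership only takes effect in a new login session, so after the reboot it is worth confirming that the docker group is active for the current user. A minimal sketch; the helper name `in_group_list` is hypothetical, and it matches whole group names only.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the space-separated group list in $1
# contains the group named in $2 as a whole word.
in_group_list() {
    case " $1 " in
        *" $2 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage after logging back in:
#   in_group_list "$(id -nG)" docker && echo "docker group is active"
```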
References: https://blog.csdn.net/Harry_Jack/article/details/120415593
https://www.jianshu.com/p/01e1b6172603
https://www.freesion.com/article/44041118266/