Environment: Ubuntu 16.04, TITAN V, CUDA 9.0
Install Docker
sudo apt-get install apt-transport-https ca-certificates curl software-properties-common
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update
sudo apt-get install docker-ce
sudo apt-get install -y nvidia-docker2
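The distribution variable above is built by sourcing /etc/os-release and joining ID and VERSION_ID; on Ubuntu 16.04 this gives ubuntu16.04, which selects the matching nvidia-docker repository list. A minimal sketch for checking that value before adding the repository. The helper name distribution_id and its optional file argument are illustrative only and not part of any tool:

```shell
# Hypothetical helper: print ID followed by VERSION_ID from an os-release style file.
# Defaults to /etc/os-release, the file sourced in the command above.
distribution_id() {
    os_release_file="${1:-/etc/os-release}"
    # Source in a subshell so the variables do not leak into the current shell.
    ( . "$os_release_file" && printf '%s%s\n' "$ID" "$VERSION_ID" )
}

# Print the value for the current machine, if the file exists.
if [ -r /etc/os-release ]; then
    distribution_id
fi
```

If the printed value is not one of the distributions published under https://nvidia.github.io/nvidia-docker/, the curl command for nvidia-docker.list will not return a usable list.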
Configure the Alibaba Cloud registry mirror
sudo vi /etc/docker/daemon.json
Find your own registry mirror address at https://cr.console.aliyun.com and put it into the file:
{
"runtimes": {
"nvidia": {
"path": "nvidia-container-runtime",
"runtimeArgs": []
}
},
"registry-mirrors":["https://<your id>.mirror.aliyuncs.com"]
}
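The same file can be generated from the shell instead of being edited in vi. A minimal sketch, assuming the two settings shown above are all that is needed. write_daemon_json is a hypothetical helper, and the mirror URL passed to it is a placeholder that has to be replaced with the address from the Alibaba Cloud console:

```shell
# Hypothetical helper: write a daemon.json with the nvidia runtime and one registry mirror.
# $1 = mirror URL, $2 = output path (write to a scratch path first, then copy into /etc/docker).
write_daemon_json() {
    mirror_url="$1"
    out_path="$2"
    cat > "$out_path" <<EOF
{
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    },
    "registry-mirrors": ["$mirror_url"]
}
EOF
}

# Generate into a scratch file and show it for review.
write_daemon_json "https://<your id>.mirror.aliyuncs.com" /tmp/daemon.json.new
cat /tmp/daemon.json.new
```

After reviewing the result, it can be copied into place with sudo cp /tmp/daemon.json.new /etc/docker/daemon.json.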
Configure a proxy (optional)
/etc/systemd/system/docker.service.d/http-proxy.conf
[Service]
Environment="HTTP_PROXY=http://xx.xx.xx.xx:xx"
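A sketch of creating this drop-in file from the shell. write_proxy_dropin is a hypothetical helper, and the HTTPS_PROXY line is an addition that is not in the original notes; it assumes that registries are reached over HTTPS through the same proxy:

```shell
# Hypothetical helper: write a systemd drop-in that sets proxy variables for the Docker daemon.
# $1 = proxy URL, $2 = output path.
write_proxy_dropin() {
    proxy_url="$1"
    out_path="$2"
    cat > "$out_path" <<EOF
[Service]
Environment="HTTP_PROXY=$proxy_url"
Environment="HTTPS_PROXY=$proxy_url"
EOF
}

# Generate into a scratch file and show it for review.
write_proxy_dropin "http://xx.xx.xx.xx:xx" /tmp/http-proxy.conf
cat /tmp/http-proxy.conf
```

To install it, create the directory with sudo mkdir -p /etc/systemd/system/docker.service.d and copy the file there with sudo cp.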
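For the proxy inside containers, it seems that adding the proxy to ~/.bashrc in a single container is enough for it to take effect in every container: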
export http_proxy="http://xx.xx.xx.xx:xx"
Start Docker
sudo systemctl daemon-reload
sudo systemctl restart docker
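After the restart it can be useful to confirm that the daemon picked up the nvidia runtime from daemon.json. A sketch that scans the text output of docker info for the runtime name. has_nvidia_runtime is a hypothetical helper, and it assumes that docker info prints a line beginning with Runtimes:

```shell
# Hypothetical helper: read `docker info` text on stdin and succeed if the
# "Runtimes:" line mentions nvidia.
has_nvidia_runtime() {
    grep -i '^ *Runtimes:' | grep -qw 'nvidia'
}

# Only query the daemon when the docker CLI is present.
if command -v docker >/dev/null 2>&1; then
    if docker info 2>/dev/null | has_nvidia_runtime; then
        echo "nvidia runtime is registered"
    else
        echo "nvidia runtime not found (or docker info needs sudo)"
    fi
fi
```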
Possible problems at runtime
too many open files
This message may have no effect.
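If the message does turn out to matter, the usual suspect is the limit on open file descriptors (nofile). A sketch for inspecting the current limits on the host. The docker run line in the comment uses the --ulimit flag, and the value 65536 is an arbitrary example rather than a setting taken from the original notes:

```shell
# Show the soft and hard limits on open file descriptors for the current shell.
soft_limit=$(ulimit -Sn)
hard_limit=$(ulimit -Hn)
echo "open files: soft=$soft_limit hard=$hard_limit"

# Assumed remedy, not verified against the original error:
# raise the limit for a single container when starting it, e.g.
#   sudo docker run --ulimit nofile=65536:65536 <other options> <image>
```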
Pull the PyTorch image
Look up the image you need on https://hub.docker.com/. For example, I needed CUDA 9.0 with PyTorch 0.4.1:
sudo docker pull pytorch/pytorch:0.4.1-cuda9-cudnn7-devel
Create and enter the container
sudo nvidia-docker run --name example --gpus all -ti -v /path/to/some/folder:/workspace --ipc=host 7b329a33d981 /bin/bash
Adding the --rm flag removes the container automatically after it exits.
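A sketch that assembles and prints the run command, so the effect of --rm can be seen without starting a container. print_run_cmd is a hypothetical helper. It refers to the image by its tag rather than by a local image ID, and it uses --runtime=nvidia, which assumes the nvidia runtime from daemon.json is registered:

```shell
# Hypothetical helper: print a docker run command for an interactive GPU container.
# $1 = container name, $2 = host folder to mount at /workspace, $3 = image,
# $4 = "rm" to delete the container automatically on exit (optional).
print_run_cmd() {
    rm_flag=""
    if [ "${4:-}" = "rm" ]; then
        rm_flag=" --rm"
    fi
    printf 'sudo docker run%s --runtime=nvidia --name %s -ti -v %s:/workspace --ipc=host %s /bin/bash\n' \
        "$rm_flag" "$1" "$2" "$3"
}

# Print the command for a throwaway container based on the PyTorch image.
print_run_cmd example /path/to/some/folder pytorch/pytorch:0.4.1-cuda9-cudnn7-devel rm
```

Without --rm the container is kept after exit, which is what the docker commit step needs.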
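Package a new image
After making changes inside the container, commit it as a new image and export that image to a tar archive: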
sudo docker commit -a "yfraquelle" <container_id> ioid_env:v1
sudo docker save -o yfraquelle_ioid_env_v1.tar ioid_env:v1
Load the new image
sudo docker load -i yfraquelle_ioid_env_v1.tar
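docker save writes an uncompressed tar archive, so for large images it can help to compress the stream while saving; docker load reads gzip-compressed archives directly. A sketch under those assumptions. save_image_gz is a hypothetical helper and the .tar.gz file name is only an example:

```shell
# Hypothetical helper: save an image and gzip-compress the archive in one step.
# $1 = image reference, $2 = output file (conventionally ending in .tar.gz).
# Calls docker without sudo; run it as a user with access to the Docker daemon.
save_image_gz() {
    docker save "$1" | gzip > "$2"
}

# Example (commented out so that pasting this block does not start a long export):
#   save_image_gz ioid_env:v1 yfraquelle_ioid_env_v1.tar.gz
#   docker load -i yfraquelle_ioid_env_v1.tar.gz
```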
Reference links:
https://www.jianshu.com/p/5b99bc9b0c64
https://zhuanlan.zhihu.com/p/31742065
https://www.pythonf.cn/read/112154
https://discuss.pytorch.org/t/how-to-use-pytorch-docker-image/15929
https://blog.csdn.net/sunmingyang1987/article/details/104555190