1. Install Docker
- Remove any older Docker packages that were installed through apt:
sudo apt-get remove docker docker-engine docker-ce docker.io
- Update the apt package index:
sudo apt-get update
- Install the packages that let apt use a repository over HTTPS:
sudo apt-get install -y apt-transport-https ca-certificates curl software-properties-common
- Add Docker's official GPG key:
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
- Set up the stable repository:
sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"
- Update the apt package index again:
sudo apt-get update
- Install the latest Docker CE:
sudo apt-get install -y docker-ce
- Check whether the Docker service is running:
systemctl status docker
- If it is not running, start Docker:
sudo systemctl start docker
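To make the repository step easier to inspect, here is a minimal sketch that builds the same apt source line from explicit arguments instead of expanding `$(lsb_release -cs)` inline. The function name `docker_repo_line` is hypothetical and only for illustration.

```shell
# Build the apt source line used in the "stable repository" step.
# arch and codename are passed in explicitly so the result can be checked
# without touching the system. On a real host they would typically come from
# `dpkg --print-architecture` and `lsb_release -cs`.
docker_repo_line() {
    arch="$1"
    codename="$2"
    printf 'deb [arch=%s] https://download.docker.com/linux/ubuntu %s stable\n' "$arch" "$codename"
}

# Example:
#   docker_repo_line amd64 "$(lsb_release -cs)"
```

Printing the line first is a cheap way to confirm the codename is one Docker actually publishes packages for before handing it to add-apt-repository.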
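2. Install nvidia-docker
Out of the box, Docker only runs CPU programs. To use the host's GPUs from inside a container, you need the nvidia-docker tooling provided by NVIDIA.
Official repository: https://github.com/NVIDIA/nvidia-docker
If your Docker version is 19.03 or later, you do not need nvidia-docker itself; installing nvidia-container-toolkit is enough.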
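The version condition can be checked in a script. The sketch below is one way to do it, assuming the version string starts with numeric major and minor components (for example 19.03.8). The function name `docker_supports_gpus` is hypothetical.

```shell
# Return 0 if the given Docker version is 19.03 or newer, 1 otherwise.
# Only the major and minor components are compared.
docker_supports_gpus() {
    version="$1"
    major="${version%%.*}"
    rest="${version#*.}"
    minor="${rest%%.*}"
    # Strip leading zeros ("03" -> "3") so the comparison is plainly decimal,
    # but keep a lone "0" intact.
    while [ "${minor#0}" != "$minor" ] && [ -n "${minor#0}" ]; do
        minor="${minor#0}"
    done
    [ "$major" -gt 19 ] || { [ "$major" -eq 19 ] && [ "$minor" -ge 3 ]; }
}

# Example:
#   docker_supports_gpus "$(docker version --format '{{.Server.Version}}')" \
#       && echo "install nvidia-container-toolkit only"
```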
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
sudo systemctl restart docker
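The `distribution` variable above decides which repository list gets downloaded, so it helps to see what it expands to. A minimal sketch, with the hypothetical helper names `distribution_from` and `nvidia_list_url`:

```shell
# Print $ID$VERSION_ID from an os-release style file (default: /etc/os-release).
# The file is sourced in a subshell so its variables do not leak into the caller.
distribution_from() {
    file="${1:-/etc/os-release}"
    ( . "$file" && printf '%s%s\n' "$ID" "$VERSION_ID" )
}

# Build the repository list URL used by the curl command above.
nvidia_list_url() {
    printf 'https://nvidia.github.io/nvidia-docker/%s/nvidia-docker.list\n' "$1"
}

# Example:
#   nvidia_list_url "$(distribution_from)"
```

On Ubuntu 18.04, for instance, the value is ubuntu18.04. If the URL returns an error page instead of a list file, the distribution is probably not one NVIDIA publishes packages for.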
Test whether the installation succeeded. This pulls an image from the official Docker registry.
# Test nvidia-smi with the latest official CUDA image
docker run --gpus all nvidia/cuda:10.0-base nvidia-smi
# Start a GPU enabled container on two GPUs
docker run --gpus 2 nvidia/cuda:10.0-base nvidia-smi
# Starting a GPU enabled container on specific GPUs
docker run --gpus '"device=1,2"' nvidia/cuda:10.0-base nvidia-smi
docker run --gpus '"device=UUID-ABCDEF,1"' nvidia/cuda:10.0-base nvidia-smi
# Specifying a capability (graphics, compute, ...) for my container
# Note this is rarely if ever used this way
docker run --gpus all,capabilities=utility nvidia/cuda:10.0-base nvidia-smi
If GPU information is printed, the installation succeeded.
Tue Apr 24 18:58:50 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.25 Driver Version: 390.25 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 108... Off | 00000000:01:00.0 Off | N/A |
| 0% 53C P5 27W / 280W | 0MiB / 11177MiB | 3% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
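Rather than reading the table by eye, a script can count the GPUs the container sees. The sketch below parses the output of `nvidia-smi -L`, which prints one line per GPU starting with "GPU <index>:". The function name `count_gpus` is hypothetical.

```shell
# Count GPUs in `nvidia-smi -L` output read from stdin.
# Each GPU is listed on its own line starting with "GPU <index>:".
count_gpus() {
    awk '/^GPU [0-9]+:/ { n++ } END { print n + 0 }'
}

# Example:
#   docker run --rm --gpus all nvidia/cuda:10.0-base nvidia-smi -L | count_gpus
```

A result of 0 means the container started but no GPU was exposed to it, which usually points at the toolkit installation or the Docker restart step.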
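Pulling images from the official registry is very slow from mainland China, so configure a domestic registry mirror (skip this part if you already have unrestricted access).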
sudo vim /etc/docker/daemon.json
The file initially looks like this:
{
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
Change it to the following. This uses an Aliyun mirror, which I have tested myself; other mirrors include https://registry.docker-cn.com and http://hub-mirror.c.163.com.
{
    "registry-mirrors": ["https://3laho3y3.mirror.aliyuncs.com"],
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
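Hand-editing JSON makes it easy to misplace a quote or comma, and Docker will not start with a malformed daemon.json. As a sketch, the file can be generated from the mirror URL instead. The function name `render_daemon_json` is hypothetical.

```shell
# Print a daemon.json that keeps the nvidia runtime and adds one registry mirror.
render_daemon_json() {
    mirror="$1"
    cat <<EOF
{
  "registry-mirrors": ["$mirror"],
  "runtimes": {
    "nvidia": {
      "path": "nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}
EOF
}

# Example:
#   render_daemon_json https://3laho3y3.mirror.aliyuncs.com | sudo tee /etc/docker/daemon.json
#   sudo systemctl restart docker
```

After the restart, `docker info` should list the mirror under "Registry Mirrors".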
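3. Download the nvidia/cuda Ubuntu image
Docker Hub: https://hub.docker.com/
Search for nvidia/cuda on Docker Hub.
Open the Tags tab and find 10.1-cudnn7-devel-ubuntu16.04 (it includes the Ubuntu system libraries, CUDA 10.1, and cuDNN 7). If you do not want the system libraries, pick a different image.
Pull the image: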
sudo docker pull nvidia/cuda:10.1-cudnn7-devel-ubuntu16.04
Wait for the download to finish, then run docker images to check that the image is present.
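The check can also be scripted. The sketch below relies on `docker image inspect`, which exits non-zero when the image is not present locally. The helper names `has_image` and `pull_if_missing` are hypothetical, and you may need to prefix docker with sudo, as in the pull command above.

```shell
# Return 0 if the image reference exists in the local image store.
has_image() {
    docker image inspect "$1" >/dev/null 2>&1
}

# Pull the image only when it is not already present locally.
pull_if_missing() {
    has_image "$1" || docker pull "$1"
}

# Example:
#   pull_if_missing nvidia/cuda:10.1-cudnn7-devel-ubuntu16.04
```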
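Because the image can be very large, you may need to increase the size of the local Docker image store. This is configured in docker.service.
docker.service normally lives in /usr/lib/systemd/system/, but on my test machine it was in /lib/systemd/system/, so check both.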
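Since the location varies between systems, a small search over the candidate directories avoids guessing. This is a minimal sketch; the function name `find_docker_unit` is hypothetical.

```shell
# Print the first docker.service found in the given directories.
# Returns 1 if none of them contains the unit file.
find_docker_unit() {
    for dir in "$@"; do
        if [ -f "$dir/docker.service" ]; then
            printf '%s\n' "$dir/docker.service"
            return 0
        fi
    done
    return 1
}

# Example:
#   find_docker_unit /usr/lib/systemd/system /lib/systemd/system /etc/systemd/system
```

On a systemd host, `systemctl show -p FragmentPath docker` reports the path of the unit file that is actually loaded.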
Open docker.service:
# cat /usr/lib/systemd/system/docker.service
[Unit]
Description=Docker Application Container Engine
Documentation=http://docs.docker.com
After=network.target
Wants=docker-storage-setup.service
Requires=docker-cleanup.timer
[Service]
Type=notify
NotifyAccess=all
EnvironmentFile=-/run/containers/registries.conf
EnvironmentFile=-/etc/sysconfig/docker
EnvironmentFile=-/etc/sysconfig/docker-storage
EnvironmentFile=-/etc/sysconfig/docker-network
Environment=GOTRACEBACK=crash
Environment=DOCKER_HTTP_HOST_COMPAT=1
Environment=PATH=/usr/libexec/docker:/usr/bin:/usr/sbin
ExecStart=/usr/bin/dockerd-current \
--add-runtime docker-runc=/usr/libexec/docker/docker-runc-current \
--default-runtime=docker-runc \
--exec-opt native.cgroupdriver=systemd \
--userland-proxy-path=/usr/libexec/docker/docker-proxy-current \
$OPTIONS \
$DOCKER_STORAGE_OPTIONS \
$DOCKER_NETWORK_OPTIONS \
$ADD_REGISTRY \
$BLOCK_REGISTRY \
$INSECURE_REGISTRY\
$REGISTRIES
ExecReload=/bin/kill -s HUP $MAINPID
LimitNOFILE=1048576
LimitNPROC=1048576
LimitCORE=infinity
TimeoutStartSec=0
Restart=on-abnormal
MountFlags=slave
KillMode=process
[Install]
WantedBy=multi-user.target
Change the container size:
[Service]
...
ExecStart=/usr/bin/dockerd \
    --storage-driver devicemapper --storage-opt dm.loopdatasize=100G --storage-opt dm.loopmetadatasize=10G --storage-opt dm.fs=ext4 --storage-opt dm.basesize=30G
...
With these settings, Docker's storage pool is capped at 100G and each container at 30G.
After editing, reload the unit files and restart Docker:
systemctl daemon-reload
# Restart Docker
service docker restart
Change the Docker image storage path
sudo docker info
Output:
Containers: 1
Running: 0
Paused: 0
Stopped: 1
Images: 1
Server Version: 1.13.1
Storage Driver: overlay2
Backing Filesystem: xfs
Supports d_type: true
Native Overlay Diff: true
Logging Driver: journald
Cgroup Driver: systemd
Plugins:
Volume: local
Network: bridge host macvlan null overlay
Swarm: inactive
Runtimes: docker-runc runc
Default Runtime: docker-runc
Init Binary: /usr/libexec/docker/docker-init-current
containerd version: (expected: aa8187dbd3b7ad67d8e5e3a15115d3eef43a7ed1)
runc version: df5c38a9167e87f53a9894d77c0950e178a745e7 (expected: 9df8b306d01f59d3a8029be411de015b7304dd8f)
init version: fec3683b971d9c3ef73f284f176672c44b448662 (expected: 949e6facb77383876aeff8a6944dde66b3089574)
Security Options:
seccomp
WARNING: You're not using the default seccomp profile
Profile: /etc/docker/seccomp.json
Kernel Version: 3.10.0-862.14.4.el7.x86_64
Operating System: CentOS Linux 7 (Core)
OSType: linux
Architecture: x86_64
Number of Docker Hooks: 3
CPUs: 1
Total Memory: 991.7 MiB
Name: fuqiang
ID: F2MD:SKQC:HSZG:LN7H:L3KI:7SN2:JHRP:HMQI:3KK2:4RTO:TPTJ:UCYZ
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
Registries: docker.io (secure)
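You can see that Docker Root Dir is /var/lib/docker, the default storage location for images and container instances. When images are large, this directory often runs out of space and has to be moved elsewhere.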
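To pull just that one value out of the long `docker info` listing, a short filter is enough. A minimal sketch, with the hypothetical function name `docker_root_dir`:

```shell
# Extract the "Docker Root Dir" value from `docker info` output on stdin.
# Leading spaces are tolerated because newer Docker versions indent the field.
docker_root_dir() {
    sed -n 's/^ *Docker Root Dir: *//p'
}

# Example: show how much space is left where Docker stores its data.
#   df -h "$(sudo docker info 2>/dev/null | docker_root_dir)"
```

`docker info --format '{{.DockerRootDir}}'` prints the same value directly.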
Target location for the images: /home/docker
Stop the Docker service:
systemctl stop docker
Migrate the data:
sudo cp -r /var/lib/docker/ /home/docker
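Two details of the copy are worth watching. If /home/docker already exists, `cp -r /var/lib/docker/ /home/docker` puts the data in /home/docker/docker, and `-r` alone does not preserve ownership or permissions. The sketch below is a hedged alternative that copies the contents of the source directory with `cp -a`. The function name `copy_tree` is hypothetical, and this is not the command the original steps used.

```shell
# Copy the contents of src into dst, creating dst if needed.
# -a preserves ownership, permissions, timestamps, and symlinks.
# The trailing "/." copies what is inside src rather than src itself.
copy_tree() {
    src="$1"
    dst="$2"
    mkdir -p "$dst" && cp -a "$src/." "$dst/"
}

# Example (run as root, with the Docker service stopped):
#   copy_tree /var/lib/docker /home/docker
```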
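Add --graph to the ExecStart line in docker.service, pointing it at the new location (/home/docker here). On Docker 17.05 and later, --graph is deprecated in favor of --data-root.
[Service]
...
ExecStart=/usr/bin/dockerd --graph=your_docker_image_path \
    --storage-driver devicemapper --storage-opt dm.loopdatasize=100G --storage-opt dm.loopmetadatasize=10G --storage-opt dm.fs=ext4 --storage-opt dm.basesize=30G
...
Start the Docker service: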
systemctl start docker
systemctl status docker
If the service comes up as active, the move succeeded.
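4. Display GUI applications from a Docker container on the Ubuntu host
This works over the network, so the host needs a running X server.
A. On the host
Find the host IP:
$ ifconfig ## assume it is xxx.xxx.xxx.xx
Check the current value of the DISPLAY environment variable:
$ echo $DISPLAY ## run this in a terminal on the physical display, not from an ssh session; assume it is :0
Or infer it from the socket file:
$ ll /tmp/.X11-unix/ ## assume it lists X0=, which corresponds to :0
Install the X server utilities: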
$ sudo apt install x11-xserver-utils
$ sudo vim /etc/lightdm/lightdm.conf
Add the following to allow network connections:
[SeatDefaults]
xserver-allow-tcp=true
Restart the X server:
$ sudo systemctl restart lightdm
Allow all users to access the X server (this turns off access control, so only do it on a trusted network):
xhost +
B. Inside the Docker container
# export DISPLAY=xxx.xxx.xxx.xx:0
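The DISPLAY value is the host IP followed by the display number, and the display number can be read off the socket name (X0 means :0). The sketch below spells that out. The helper names `display_from_socket` and `remote_display` are hypothetical, and 192.0.2.10 is a placeholder address.

```shell
# Turn an X11 socket name such as "X0", "X0=" (as printed by ll), or
# "/tmp/.X11-unix/X1" into a display number such as ":0" or ":1".
display_from_socket() {
    name="${1##*/}"    # drop any directory part
    name="${name%=}"   # drop the "=" that ll appends to sockets
    printf ':%s\n' "${name#X}"
}

# Join a host IP and a display number into a value for DISPLAY.
remote_display() {
    printf '%s%s\n' "$1" "$2"
}

# Example: pass the value in when starting the container instead of
# exporting it afterwards (replace 192.0.2.10 with the host IP).
#   docker run -e DISPLAY="$(remote_display 192.0.2.10 "$(display_from_socket X0)")" ...
```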
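Pitfalls I ran into:
1. I built a custom Ubuntu image and installed CUDA and cuDNN successfully, but calling the cuDNN API from C++ failed. With the nvidia/cuda image the same calls worked. The cause is unknown.
2. If a container runs out of space, increase the container size.
3. If the local image store runs out of space, move it to a new location. Before switching, copy all files from the source directory to the target directory.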
Comments and private messages are welcome.