Configuring Ubuntu 16.04 + CUDA 10.1 + cuDNN 7 with Docker: A Detailed Guide

1. Install Docker

  • Remove any older Docker versions installed through apt:
sudo apt-get remove docker docker-engine docker-ce docker.io
  • Update the apt package index:
sudo apt-get update
  • Install the following packages so that apt can use a repository over HTTPS:
sudo apt-get install -y apt-transport-https ca-certificates curl software-properties-common
  • Add Docker's official GPG key:
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
  • Set up the stable repository:
sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"
  • Update the apt package index again:
sudo apt-get update
  • Install the latest Docker CE:
sudo apt-get install -y docker-ce
  • Check whether the Docker service is running:
systemctl status docker
  • If it is not running, start Docker:
sudo systemctl start docker
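
The repository step relies on $(lsb_release -cs) expanding to the Ubuntu codename (xenial on 16.04). As a minimal sketch, the hypothetical helper below (docker_repo_line is not part of Docker or apt) prints the source line so it can be inspected before being passed to add-apt-repository.

```shell
# Build the apt source line for Docker's stable channel.
# $1: Ubuntu codename (e.g. xenial for 16.04)
# $2: CPU architecture, defaults to amd64 as in the command above
docker_repo_line() {
    codename="$1"
    arch="${2:-amd64}"
    printf 'deb [arch=%s] https://download.docker.com/linux/ubuntu %s stable\n' "$arch" "$codename"
}

# Example (not executed here): preview the line for the current machine.
# docker_repo_line "$(lsb_release -cs)"
```

If the printed codename is not the release you expect, fix that before adding the repository, since apt will otherwise look for packages built for a different release.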

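2. Install nvidia-docker

Out of the box, Docker only supports running CPU programs. To use the host's GPUs from a container, you need to install NVIDIA's official nvidia-docker.

Official repository: https://github.com/NVIDIA/nvidia-docker

If your Docker version is 19.03 or later, you do not need nvidia-docker itself; installing nvidia-container-toolkit is enough.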
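
A minimal sketch of that version check, assuming GNU sort with the -V (version sort) option is available. The function names version_ge and gpu_package_for are hypothetical helpers introduced here for illustration.

```shell
# Succeed (exit 0) if version $1 is greater than or equal to version $2.
# Uses `sort -V`, which orders dotted version strings numerically.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Print which NVIDIA component applies for a given Docker server version:
# nvidia-container-toolkit from 19.03 on, nvidia-docker before that.
gpu_package_for() {
    if version_ge "$1" "19.03"; then
        echo "nvidia-container-toolkit"
    else
        echo "nvidia-docker"
    fi
}

# Example (not executed here): query the running daemon.
# gpu_package_for "$(docker version --format '{{.Server.Version}}')"
```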

distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list

sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
sudo systemctl restart docker

Test whether the installation succeeded. This pulls an image from Docker's official registry.

# Test nvidia-smi with the latest official CUDA image
docker run --gpus all nvidia/cuda:10.0-base nvidia-smi

# Start a GPU enabled container on two GPUs
docker run --gpus 2 nvidia/cuda:10.0-base nvidia-smi

# Starting a GPU enabled container on specific GPUs
docker run --gpus '"device=1,2"' nvidia/cuda:10.0-base nvidia-smi
docker run --gpus '"device=UUID-ABCDEF,1"' nvidia/cuda:10.0-base nvidia-smi

# Specifying a capability (graphics, compute, ...) for my container
# Note this is rarely if ever used this way
docker run --gpus all,capabilities=utility nvidia/cuda:10.0-base nvidia-smi

If GPU information is printed, the installation succeeded.

Tue Apr 24 18:58:50 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.25                 Driver Version: 390.25                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 108...  Off  | 00000000:01:00.0 Off |                  N/A |
|  0%   53C    P5    27W / 280W |      0MiB / 11177MiB |      3%      Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
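
The --gpus '"device=..."' form shown earlier accepts GPU indices or UUIDs. A minimal sketch for collecting them follows, assuming nvidia-smi -L prints one line per GPU in the form "GPU 0: <name> (UUID: GPU-...)". The helper name list_gpu_uuids is hypothetical.

```shell
# Read `nvidia-smi -L` style lines on stdin and print "<index> <UUID>" per GPU.
# Lines that do not match the assumed format are silently skipped.
list_gpu_uuids() {
    sed -n 's/^GPU \([0-9][0-9]*\): .*(UUID: \(GPU-[^)]*\)).*$/\1 \2/p'
}

# Example (not executed here): list the GPUs on the host.
# nvidia-smi -L | list_gpu_uuids
```

Either column of the output can then be used as a device value for --gpus.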

 

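Pulling images from the official registry is slow (skip this part if you already have a fast route to it, for example through a proxy), so a registry mirror inside China needs to be configured.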

sudo vim /etc/docker/daemon.json

When opened, the file looks like this:

{
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}

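Change it to the following. (This example uses an Alibaba Cloud mirror, which I have tested and found to work. Other mirrors include https://registry.docker-cn.com and http://hub-mirror.c.163.com.)

{
    "registry-mirrors": ["https://3laho3y3.mirror.aliyuncs.com"],
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}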
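
A malformed daemon.json (a misplaced quote, for instance) can prevent the Docker daemon from starting, so it is worth checking the file before restarting the service. The sketch below assumes python3 is installed on the host and uses its standard json.tool module. The function name validate_json_file is a hypothetical helper.

```shell
# Succeed (exit 0) only if the given file parses as JSON.
# Assumes python3 is on PATH; json.tool exits non-zero on a parse error.
validate_json_file() {
    python3 -m json.tool "$1" > /dev/null 2>&1
}

# Example (not executed here): restart Docker only when the file is valid.
# validate_json_file /etc/docker/daemon.json && sudo systemctl restart docker
```

After the restart, docker info should list the mirror under its registry mirror section if the setting was picked up.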

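3. Download the nvidia/cuda Ubuntu image

Docker image registry: https://hub.docker.com/

Go to the site and search for nvidia/cuda.

Select Tags and find 10.1-cudnn7-devel-ubuntu16.04 (it includes the Ubuntu system libraries, CUDA 10.1, and cuDNN 7). If you do not want the system libraries included, choose a different image.

Pull the image.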

sudo docker pull nvidia/cuda:10.1-cudnn7-devel-ubuntu16.04

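Wait for the download to finish, then run docker images to check that the image is present.

Because the image can be large, you may need to increase the size of the local Docker image store. This is configured in docker.service.

Usually docker.service is in /usr/lib/systemd/system/, but on my test machine it was in /lib/systemd/system/, so watch out for this.

Open docker.service.

# cat /usr/lib/systemd/system/docker.service
[Unit]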
Description=Docker Application Container Engine
Documentation=http://docs.docker.com
After=network.target
Wants=docker-storage-setup.service
Requires=docker-cleanup.timer

[Service]
Type=notify
NotifyAccess=all
EnvironmentFile=-/run/containers/registries.conf
EnvironmentFile=-/etc/sysconfig/docker
EnvironmentFile=-/etc/sysconfig/docker-storage
EnvironmentFile=-/etc/sysconfig/docker-network
Environment=GOTRACEBACK=crash
Environment=DOCKER_HTTP_HOST_COMPAT=1
Environment=PATH=/usr/libexec/docker:/usr/bin:/usr/sbin
ExecStart=/usr/bin/dockerd-current \
          --add-runtime docker-runc=/usr/libexec/docker/docker-runc-current \
          --default-runtime=docker-runc \
          --exec-opt native.cgroupdriver=systemd \
          --userland-proxy-path=/usr/libexec/docker/docker-proxy-current \
          $OPTIONS \
          $DOCKER_STORAGE_OPTIONS \
          $DOCKER_NETWORK_OPTIONS \
          $ADD_REGISTRY \
          $BLOCK_REGISTRY \
          $INSECURE_REGISTRY \
          $REGISTRIES
ExecReload=/bin/kill -s HUP $MAINPID
LimitNOFILE=1048576
LimitNPROC=1048576
LimitCORE=infinity
TimeoutStartSec=0
Restart=on-abnormal
MountFlags=slave
KillMode=process

[Install]
WantedBy=multi-user.target

Change the container size

[Service]
...
ExecStart=/usr/bin/dockerd \
          --storage-driver devicemapper --storage-opt dm.loopdatasize=100G --storage-opt dm.loopmetadatasize=10G --storage-opt dm.fs=ext4 --storage-opt dm.basesize=30G
...

With these options, Docker's total storage is capped at 100G and each container at 30G.

After editing, reload the unit files and restart Docker.

systemctl daemon-reload

# Restart Docker
service docker restart

Change the Docker image storage path

sudo docker info

The output looks like this:


Containers: 1
 Running: 0
 Paused: 0
 Stopped: 1
Images: 1
Server Version: 1.13.1
Storage Driver: overlay2
 Backing Filesystem: xfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: journald
Cgroup Driver: systemd
Plugins: 
 Volume: local
 Network: bridge host macvlan null overlay
Swarm: inactive
Runtimes: docker-runc runc
Default Runtime: docker-runc
Init Binary: /usr/libexec/docker/docker-init-current
containerd version:  (expected: aa8187dbd3b7ad67d8e5e3a15115d3eef43a7ed1)
runc version: df5c38a9167e87f53a9894d77c0950e178a745e7 (expected: 9df8b306d01f59d3a8029be411de015b7304dd8f)
init version: fec3683b971d9c3ef73f284f176672c44b448662 (expected: 949e6facb77383876aeff8a6944dde66b3089574)
Security Options:
 seccomp
  WARNING: You're not using the default seccomp profile
  Profile: /etc/docker/seccomp.json
Kernel Version: 3.10.0-862.14.4.el7.x86_64
Operating System: CentOS Linux 7 (Core)
OSType: linux
Architecture: x86_64
Number of Docker Hooks: 3
CPUs: 1
Total Memory: 991.7 MiB
Name: fuqiang
ID: F2MD:SKQC:HSZG:LN7H:L3KI:7SN2:JHRP:HMQI:3KK2:4RTO:TPTJ:UCYZ
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false
Registries: docker.io (secure)

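Note the line Docker Root Dir: /var/lib/docker. This is the default location where images and container instances are stored. When images are large, this directory often runs out of space and has to be moved.

Target location for the images: /home/docker

Stop the Docker service:

systemctl stop docker

Migrate the data (-a preserves ownership, permissions, timestamps, and symlinks, which a plain -r copy does not):

sudo cp -a /var/lib/docker/ /home/docker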
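
One thing to watch with this copy: if /home/docker already exists, cp places the data in /home/docker/docker rather than in /home/docker itself. A minimal sketch that avoids the nesting and adds a rough sanity check is shown below. The function name copy_tree_checked is a hypothetical helper, and comparing entry counts is only a coarse check, not a full verification.

```shell
# Copy the contents of directory $1 into directory $2, preserving
# ownership, modes, timestamps, and symlinks, then compare entry counts.
copy_tree_checked() {
    src="$1"
    dst="$2"
    mkdir -p "$dst" || return 1
    # "$src/." copies what is inside src into dst instead of nesting src under dst.
    cp -a "$src/." "$dst/" || return 1
    src_count=$(find "$src" | wc -l)
    dst_count=$(find "$dst" | wc -l)
    [ "$src_count" -eq "$dst_count" ]
}

# Example (not executed here; run as root with Docker stopped):
# copy_tree_checked /var/lib/docker /home/docker
```

Keeping the original directory until Docker has started successfully from the new location leaves a way back if something goes wrong.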

 

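Add --graph to docker.service (Docker 17.05 and later deprecate --graph in favor of --data-root, so use that flag instead if your version rejects --graph):

[Service]
...
ExecStart=/usr/bin/dockerd --graph=your_docker_image_path \
          --storage-driver devicemapper --storage-opt dm.loopdatasize=100G --storage-opt dm.loopmetadatasize=10G --storage-opt dm.fs=ext4 --storage-opt dm.basesize=30G
...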

 

Start the Docker service:

systemctl start docker
systemctl status docker

If Docker starts normally, the storage path has been changed successfully.

 

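4. Display GUI applications from a Docker container on the Ubuntu host

This uses the network approach, so the host needs an X server installed.

A. On the host
Find the host's IP address
$ ifconfig                          ## assume it is xxx.xxx.xxx.xx
Check the current value of the DISPLAY environment variable
$ echo $DISPLAY                     ## run this in a terminal on the physical display, not over SSH; assume it is :0
Or work it out from the socket file:
$ ll /tmp/.X11-unix/                ## assume it shows X0, which corresponds to :0

Install the X server utilities
$ sudo apt install x11-xserver-utils
$ sudo vim /etc/lightdm/lightdm.conf 
Add the setting that permits network (TCP) connections
[SeatDefaults]
xserver-allow-tcp=true
Restart the X server
$ sudo systemctl restart lightdm
Allow all users to access the X server (note that this turns off X access control entirely)
xhost +


B. Inside the Docker container
# export DISPLAY=xxx.xxx.xxx.xx:0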
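
The DISPLAY value is simply the host address followed by the display number, and an X server accepting TCP connections listens on port 6000 plus that number. The sketch below builds both values. The function names are hypothetical helpers, and 192.0.2.10 is a placeholder documentation address, not a real host.

```shell
# Build a DISPLAY value for an X server reached over TCP.
# $1: host IP address; $2: display number (defaults to 0, i.e. ":0" on the host)
x_display_for() {
    printf '%s:%s\n' "$1" "${2:-0}"
}

# TCP port an X server uses for a given display number (6000 + n).
x_tcp_port() {
    echo $((6000 + ${1:-0}))
}

# Example (not executed here): pass the value when starting a container.
# docker run -e DISPLAY="$(x_display_for 192.0.2.10)" ...
```

If a GUI program in the container cannot open the display, checking that the port from x_tcp_port is reachable from the container is a reasonable first step.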

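Pitfalls I ran into:

1. I built a custom Ubuntu image and installed CUDA and cuDNN in it successfully, but calling the cuDNN API from C++ failed. With the downloaded nvidia/cuda image the same calls succeeded. The cause is unknown.

2. The container ran out of space, so the container size had to be increased.

3. The local image store ran out of space, so it had to be moved. Before switching, copy all files from the source directory to the target directory.

 

Comments and private messages are welcome.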

 

 

 

 

 
