**1. Building directly from the Dockerfile fails**
Error:
E: Unable to locate package libcudnn7
E: Version '2.4.8-1+cuda10.1' for 'libnccl2' was not found
E: Version '2.4.8-1+cuda10.1' for 'libnccl-dev' was not found
Analysis:
First, the base Docker image I pulled should be fine:
FROM nvidia/cuda:10.1-devel-ubuntu18.04
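I found the tag matching my system at https://hub.docker.com/r/nvidia/cuda and pulled it.
Then I went through the Dockerfile line by line and found that the step installing libcudnn7 is the one that fails: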
RUN apt-get update && apt-get install -y --allow-downgrades --allow-change-held-packages --no-install-recommends \
build-essential \
cmake \
g++-4.8 \
git \
curl \
docker.io \
vim \
wget \
ca-certificates \
libcudnn7=${CUDNN_VERSION} \
My guess was that the image has no package source providing libcudnn7, so I went into the image to check:
docker ps (nothing is running)
docker images (find the image)
REPOSITORY TAG IMAGE ID CREATED SIZE
<none> <none> bea2518b9b5d 15 hours ago 2.99GB
<none> <none> 6b7222d063f7 22 hours ago 2.88GB
nvidia/cuda 10.1-devel-ubuntu18.04 00e5c760566b 3 weeks ago 2.94GB
<none> <none> efbcebcd213a 6 weeks ago 2.8GB
nvidia/cuda <none> 95b9fad2b2bf 2 months ago 2.8GB
nvidia/cuda 10.1-devel-ubuntu16.04 a7223cfec628 5 months ago 2.85GB
hello-world latest feb5d9fea6a5 9 months ago 13.3kB
<none> <none> c665322da80e 13 months ago 7.56GB
docker run -d --name cuda nvidia/cuda:10.1-devel-ubuntu18.04 /bin/sh
Wrong way (with only -d, /bin/sh has no stdin attached and exits immediately):
docker run -d --name cuda nvidia/cuda:10.1-devel-ubuntu18.04 /bin/sh
---2ff5e4726288122258cccda865914ffd7ab35b3a9ea42e7e4ad47edaf902bc18
docker ps
---CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
docker ps -l
---CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
---2ff5e4726288 nvidia/cuda:10.1-devel-ubuntu18.04 "/bin/sh" 12 seconds ago Exited (0) 12 seconds ago cuda
docker rm 2ff
---2ff
Correct way (-it keeps stdin open with a TTY, so the shell stays alive in the background):
docker run -itd --name cuda nvidia/cuda:10.1-devel-ubuntu18.04 /bin/sh (run)
---9d394cbab53a595902c2bca7f647957882e2f138ad39d0e81ecaa021df4e15e1
docker ps (check)
---CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
---9d394cbab53a nvidia/cuda:10.1-devel-ubuntu18.04 "/bin/sh" 4 seconds ago Up 3 seconds cuda
docker exec -it 9d3 /bin/sh (enter the container)
exit (leave the container)
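The two runs above differ only in whether the shell has something to read from: without `-i`, `/bin/sh` sees end-of-file on stdin and exits at once, which is why `docker ps` shows nothing. A minimal sketch for checking a container's state without reading the `docker ps` table by eye follows. The helper names `container_status` and `is_running` are hypothetical; `docker inspect -f` with a Go template is the standard CLI form.

```shell
# container_status NAME: print the container's state (e.g. running, exited).
# The function name is illustrative and not part of the Docker CLI.
container_status() {
    docker inspect -f '{{.State.Status}}' "$1"
}

# is_running NAME: succeed only if the container is currently running.
is_running() {
    [ "$(container_status "$1" 2>/dev/null)" = "running" ]
}
```

For example, `is_running cuda && docker exec -it cuda /bin/sh` would only try to enter the container when it is actually up.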
Verification:
docker exec -it 9d3 /bin/sh (enter the container)
apt install libcudnn7=7.6.5.32-1+cuda10.1
Reading package lists... Done
Building dependency tree
Reading state information... Done
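E: Unable to locate package libcudnn7 (confirmed: the image really has no package source for it)
The package source has to be added to the image manually; I will look into this later and update the post.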
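Before adding a source by hand, it can help to see which repositories the image actually has configured. The sketch below is meant to be run inside the container. The helper name `has_apt_source` and the search patterns are assumptions for illustration, and which repository ships `libcudnn7` for this image is not confirmed here.

```shell
# has_apt_source PATTERN [APT_DIR]: succeed if any apt source list under
# APT_DIR (default /etc/apt) mentions PATTERN. Hypothetical helper name.
has_apt_source() {
    pattern="$1"
    apt_dir="${2:-/etc/apt}"
    # -r: recurse into sources.list.d, -q: quiet, -s: ignore missing files
    grep -rqs -- "$pattern" "$apt_dir/sources.list" "$apt_dir/sources.list.d"
}
```

A call such as `has_apt_source nvidia || echo "no NVIDIA apt source configured"` gives a quick yes/no answer. Once a source has been added and `apt-get update` has run, `apt-cache policy libcudnn7` lists the versions apt can see.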
2. Loading an existing Docker image fails
I needed to use an image that had already been built, but loading it directly with docker load failed:
docker load < dro-sfm-image.tar
Got permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock: Post http://%2Fvar%2Frun%2Fdocker.sock/v1.39/images/load?quiet=0: dial unix /var/run/docker.sock: connect: permission denied
Solution:
sudo chmod a+rw /var/run/docker.sock
Success:
Loaded image ID: sha256:c665322da80e67af8dfea9f2ca38595838e6dd4eae3a9180eb2b1c0348b82d82
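Note that `chmod a+rw` opens the socket to every user on the machine, and the permissions are typically reset when the Docker daemon restarts. A commonly used alternative, described in Docker's post-installation steps for Linux, is to add the user to the `docker` group with `sudo usermod -aG docker "$USER"` and then log in again. Either way, a quick check like the sketch below shows whether the current user can reach the socket; the helper name `can_use_socket` is hypothetical.

```shell
# can_use_socket [PATH]: succeed if PATH exists and is both readable and
# writable by the current user. Defaults to the usual Docker socket path.
can_use_socket() {
    sock="${1:-/var/run/docker.sock}"
    [ -r "$sock" ] && [ -w "$sock" ]
}
```

For example, `can_use_socket || echo "no access to the Docker socket"` can be run before `docker load` to catch the permission problem early.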