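Notes on an nvidia-docker installation error

Recently, for business reasons, many of our services need to be deployed with Docker, so I started looking into how to use it.

The code is Python and the deep learning framework is TensorFlow. According to the official instructions, Docker and nvidia-docker have to be installed first. Installing Docker is fairly simple; I basically followed this document: https://yeasy.gitbooks.io/docker_practice/install/centos.html

Installing nvidia-docker, however, had a few small pitfalls, which I record here.

For the nvidia-docker installation, I basically followed the official instructions: https://github.com/NVIDIA/nvidia-docker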

# If you have nvidia-docker 1.0 installed: we need to remove it and all existing GPU containers
docker volume ls -q -f driver=nvidia-docker | xargs -r -I{} -n1 docker ps -q -a -f volume={} | xargs -r docker rm -f
sudo yum remove nvidia-docker

# Add the package repositories
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.repo | \
  sudo tee /etc/yum.repos.d/nvidia-docker.repo

# Install nvidia-docker2 and reload the Docker daemon configuration
sudo yum install -y nvidia-docker2
sudo pkill -SIGHUP dockerd

# Test nvidia-smi with the latest official CUDA image
docker run --runtime=nvidia --rm nvidia/cuda:9.0-base nvidia-smi

The first few steps went smoothly. Only the last command,

docker run --runtime=nvidia --rm nvidia/cuda:9.0-base nvidia-smi

kept failing.

The error was:

docker: Error response from daemon: OCI runtime create failed: container_linux.go:344: starting container process caused "process_linux.go:424: container init caused \"process_linux.go:407: running prestart hook 1 caused \\\"error running hook: exit status 1, stdout: , stderr: exec command: [/usr/bin/nvidia-container-cli --load-kmods configure --ldconfig=@/sbin/ldconfig --device=all --compute --utility --require=cuda>=9.0 --pid=15997 /var/lib/docker/overlay2/5e678ed1c028293c3a8d9edc227307b89239e8c41672174811378c40e2dbbec9/merged]\\\\nnvidia-container-cli: requirement error: unsatisfied condition: cuda >= 9.0\\\\n\\\"\"": unknown.

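Looking closely at the last part, "requirement error: unsatisfied condition: cuda >= 9.0", I guessed it was a CUDA version problem.

Check the CUDA version on the host:

cat /usr/local/cuda/version.txt

Output: CUDA Version 8.0.61

Sure enough, the version is too old.

So I changed the command above to: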
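The error message is long and heavily escaped, so the useful clause is easy to miss. As a minimal sketch, the condition can be pulled out with grep. The function name extract_requirement is hypothetical (it is not part of any NVIDIA or Docker tool), and the pattern assumes the condition has the simple "name operator version" shape seen in this post.

```shell
# extract_requirement: read an error message on stdin and print the
# unsatisfied condition, e.g. "cuda >= 9.0".
# Hypothetical helper; assumes a simple "name op version" condition.
extract_requirement() {
  grep -o 'unsatisfied condition: [a-z]* *[<>=]* *[0-9.]*' |
    sed 's/^unsatisfied condition: //'
}

# Example using the key part of the message from this post.
printf '%s\n' 'nvidia-container-cli: requirement error: unsatisfied condition: cuda >= 9.0\\n' |
  extract_requirement
```

In practice the input would be the saved stderr of the failing docker run command rather than a literal string.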
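The comparison done by eye here can also be scripted. The sketch below uses two hypothetical helpers, cuda_version_from_text and version_ge, and follows this post's approach of reading the toolkit's version.txt. As far as I understand, the check that nvidia-container-cli itself performs is against the CUDA version supported by the host driver, so treat this only as a quick sanity check under that assumption.

```shell
# Hypothetical helpers, not part of any NVIDIA tool.

# cuda_version_from_text: read text on stdin and print the version from
# a line such as "CUDA Version 8.0.61".
cuda_version_from_text() {
  awk '/CUDA Version/ { print $3; exit }'
}

# version_ge A B: succeed when dotted version A >= B, comparing the
# numeric fields left to right (missing fields count as 0).
version_ge() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    na = split(a, x, "."); nb = split(b, y, ".")
    n = (na > nb) ? na : nb
    for (i = 1; i <= n; i++) {
      xi = (i <= na) ? x[i] + 0 : 0
      yi = (i <= nb) ? y[i] + 0 : 0
      if (xi > yi) exit 0
      if (xi < yi) exit 1
    }
    exit 0
  }'
}

# Example with the value shown above; on a real host, pipe in
# /usr/local/cuda/version.txt instead of the literal string.
host_version=$(printf '%s\n' 'CUDA Version 8.0.61' | cuda_version_from_text)
if version_ge "$host_version" 9.0; then
  echo "host CUDA $host_version satisfies cuda >= 9.0"
else
  echo "host CUDA $host_version does not satisfy cuda >= 9.0"
fi
```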

docker run --runtime=nvidia --rm nvidia/cuda:8.0-base nvidia-smi

It still failed:

Unable to find image 'nvidia/cuda:8.0-base' locally
docker: Error response from daemon: manifest for nvidia/cuda:8.0-base not found.

Apparently no image with this tag exists.

Finally, a Baidu search showed that the correct command is:

docker run --runtime=nvidia --rm nvidia/cuda:8.0-devel nvidia-smi

This succeeded and printed:

Thu Feb 14 07:56:19 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.66                 Driver Version: 375.66                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla P100-PCIE...  Off  | 0000:00:0C.0     Off |                    0 |
| N/A   42C    P0    28W / 250W |      0MiB / 16276MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

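Summary: the test command has to match the CUDA version actually available on the host. This host has CUDA 8.0.61 (driver 375.66), so an image that requires cuda >= 9.0 cannot start.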
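To avoid hard-coding the tag, the image name can be derived from the host version. This is only a sketch: cuda_image_tag is a hypothetical helper, the default flavor "devel" is simply the one that worked in this post, and a tag built this way is not guaranteed to exist (8.0-base, for example, did not).

```shell
# cuda_image_tag VERSION [FLAVOR]: build an nvidia/cuda image name from
# a full CUDA version such as 8.0.61, keeping only major.minor.
# Hypothetical helper; FLAVOR defaults to "devel" as used in this post.
cuda_image_tag() {
  major_minor=$(printf '%s\n' "$1" | cut -d. -f1,2)
  printf 'nvidia/cuda:%s-%s\n' "$major_minor" "${2:-devel}"
}

# Example with the host version from this post.
cuda_image_tag 8.0.61
```

The result could then be passed to the test command, for instance docker run --runtime=nvidia --rm "$(cuda_image_tag 8.0.61)" nvidia-smi.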
