The error is as follows:
docker run -it --gpus all nvcr.io/nvidia/pytorch:24.08-py3 /bin/bash
docker: Error response from daemon: could not select device driver "" with capabilities: [[gpu]].
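Before applying the fix, the GPU host node must already recognize its graphics cards:
Make sure the NVIDIA driver is loaded and confirm it with nvidia-smi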
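The host-side precondition above can be checked with a small script. This is a minimal sketch, assuming a Linux host where loaded kernel modules are listed in /proc/modules; the function name nvidia_module_loaded is hypothetical and only used for illustration.

```shell
#!/bin/sh
# Hypothetical helper: returns 0 if the given module listing
# (default: /proc/modules) contains the "nvidia" kernel module.
nvidia_module_loaded() {
    modules_file="${1:-/proc/modules}"
    grep -q '^nvidia ' "$modules_file" 2>/dev/null
}

if nvidia_module_loaded; then
    echo "nvidia kernel module is loaded"
else
    echo "nvidia kernel module is NOT loaded; fix the driver before changing docker"
fi
```

If the module is reported as loaded, running nvidia-smi on the host should then list the cards.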
Solution:
The nvidia-container-toolkit dependency is missing (it has to be installed with yum).
First add the nvidia-container-toolkit repository:
yum config-manager --add-repo https://nvidia.github.io/nvidia-docker/centos8/nvidia-docker.repo
Check the newly added repository file (run from /etc/yum.repos.d):
cat nvidia-docker.repo
Install nvidia-container-toolkit:
yum install nvidia-container-toolkit
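To confirm the installation took effect, something like the following could be run on the host. It is a sketch, assuming an RPM-based system; have_cmd is a hypothetical helper name, and the check only tests that the runtime binary referenced in daemon.json is on the PATH.

```shell
#!/bin/sh
# Hypothetical helper: returns 0 if the named command is on the PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Query the package database if rpm is available on this host.
if have_cmd rpm; then
    rpm -q nvidia-container-toolkit || echo "package nvidia-container-toolkit is not installed"
fi

# The docker runtime entry points at this binary, so it must be resolvable.
if have_cmd nvidia-container-runtime; then
    echo "nvidia-container-runtime found"
else
    echo "nvidia-container-runtime missing from PATH"
fi
```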
The Docker daemon configuration on this host, with the nvidia runtime registered:
[root@asc2-gn01 yum.repos.d]# cat /etc/docker/daemon.json
{
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    },
    "registry-mirrors": [
        "https://docker.nju.edu.cn",
        "https://mirror.baidubce.com",
        "https://hub-mirror.c.163.com",
        "https://docker.mirrors.ustc.edu.cn"
    ],
    "data-root": "/gpfs/docker"
}
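A quick way to check that a daemon.json contains the nvidia runtime entry is sketched below. It uses a plain grep rather than a real JSON parser, so it is only a rough check; has_nvidia_runtime is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: returns 0 if the file (default: /etc/docker/daemon.json)
# contains a key named "nvidia". A grep-based approximation, not a JSON parse.
has_nvidia_runtime() {
    config_file="${1:-/etc/docker/daemon.json}"
    grep -q '"nvidia"[[:space:]]*:' "$config_file" 2>/dev/null
}

if has_nvidia_runtime; then
    echo "nvidia runtime entry present in daemon.json"
else
    echo "nvidia runtime entry not found in daemon.json"
fi
```

Newer toolkit releases also ship nvidia-ctk runtime configure --runtime=docker, which can write this entry; whether the version installed from the repository used here includes it has not been verified.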
Restart the docker service:
systemctl restart docker
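After the restart, the daemon should list nvidia among its runtimes. The sketch below filters the output of docker info for that; runtime_registered is a hypothetical helper name, and the match assumes the usual "Runtimes:" line in the human-readable docker info output.

```shell
#!/bin/sh
# Hypothetical helper: reads `docker info` text on stdin and returns 0
# if the "Runtimes:" line mentions nvidia.
runtime_registered() {
    grep -qi 'runtimes:.*nvidia'
}

if command -v docker >/dev/null 2>&1; then
    if docker info 2>/dev/null | runtime_registered; then
        echo "docker reports the nvidia runtime"
    else
        echo "docker does not report the nvidia runtime"
    fi
else
    echo "docker not found; skipping check"
fi
```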
Start the PyTorch container with the --gpus all option so that it can see the host GPUs:
docker run -it --gpus all nvcr.io/nvidia/pytorch:24.08-py3 /bin/bash
Inside the container, verify that the host GPUs are visible:
nvidia-smi
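The same verification can be done without an interactive shell. This is a sketch, assuming the image provides python and PyTorch (as the image name suggests); run_in_gpu_container is a hypothetical wrapper name.

```shell
#!/bin/sh
IMAGE="nvcr.io/nvidia/pytorch:24.08-py3"

# Hypothetical wrapper: run a one-off command in the image with all
# host GPUs attached, removing the container afterwards.
run_in_gpu_container() {
    docker run --rm --gpus all "$IMAGE" "$@"
}

if command -v docker >/dev/null 2>&1; then
    # List the GPUs visible inside the container.
    run_in_gpu_container nvidia-smi -L || echo "GPU check failed"
    # Ask PyTorch whether it can use CUDA.
    run_in_gpu_container python -c 'import torch; print(torch.cuda.is_available())' \
        || echo "PyTorch check failed"
else
    echo "docker not found; skipping container checks"
fi
```

If the original "could not select device driver" error comes back here, the toolkit installation or the docker restart did not take effect.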