1. Error message
After changing the driver on Ubuntu, starting the Docker container fails with the following error:
Error response from daemon: could not select device driver "" with capabilities: [[gpu]]
Error: failed to start containers: cu111
2. Verify the NVIDIA driver installation
Make sure the NVIDIA driver is installed correctly and running. Check with the following command:
nvidia-smi
If the driver information is displayed normally, the driver is installed.
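The same check can be scripted. The following is a minimal sketch, assuming a POSIX shell; the function name check_nvidia_driver is hypothetical, and it relies only on whether nvidia-smi exists and exits successfully.

```shell
# Hypothetical helper: report whether nvidia-smi is on PATH and exits successfully.
check_nvidia_driver() {
    if ! command -v nvidia-smi >/dev/null 2>&1; then
        echo "nvidia-smi not found"
        return 1
    fi
    if nvidia-smi >/dev/null 2>&1; then
        echo "driver OK"
    else
        echo "nvidia-smi failed"
        return 1
    fi
}

# Usage:
# check_nvidia_driver || echo "fix the driver before continuing"
```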
3. Install the NVIDIA Docker runtime
Make sure the NVIDIA Docker runtime is installed. Install it with the following commands:
# Add the NVIDIA Docker GPG key and package repository
distribution=$(. /etc/os-release; echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
# Update the package index and install NVIDIA Docker
sudo apt update
sudo apt install -y nvidia-docker2
The commands above may fail with two errors:
Error 1:
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
Running the command above fails to fetch the key:
gpg: no valid OpenPGP data found.
Solution:
Download the file manually on your local machine: open the link below in a browser and the download starts automatically:
https://nvidia.github.io/nvidia-docker/gpgkey
Upload the file to the server and run the following command to add the key. Remember to replace the path with your own.
sudo apt-key add "/home/gy77/gpgkey"
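The "no valid OpenPGP data found" message typically means the input was not a key at all, for example an empty response or an HTML error page. Before adding the uploaded file, it may help to confirm that it looks like a key. The following is a rough sketch, assuming the key file is ASCII-armored; the function name looks_like_pgp_key is hypothetical.

```shell
# Hypothetical helper: succeed only if the file contains an ASCII-armored
# public key header. This is a text-level check, not a signature validation.
looks_like_pgp_key() {
    grep -q -e '-----BEGIN PGP PUBLIC KEY BLOCK-----' "$1"
}

# Usage (path taken from the example above; replace it with your own):
# looks_like_pgp_key "/home/gy77/gpgkey" && sudo apt-key add "/home/gy77/gpgkey"
```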
Error 2:
sudo apt install -y nvidia-docker2
Running the command above fails while fetching the nvidia-docker2 packages:
Err:1 https://nvidia.github.io/libnvidia-container/stable/deb/amd64 nvidia-container-toolkit-base 1.15.0-1
Could not handshake: Error in the pull function. [IP: 185.xxx.xxx.xxx 443]
Err:2 https://nvidia.github.io/libnvidia-container/stable/deb/amd64 nvidia-container-toolkit 1.15.0-1
Could not handshake: Error in the pull function. [IP: 185.xxx.xxx.xxx 443]
Err:3 https://nvidia.github.io/libnvidia-container/stable/deb/amd64 nvidia-docker2 2.14.0-1
Could not handshake: Error in the pull function. [IP: 185.xxx.xxx.xxx 443]
E: Failed to fetch https://nvidia.github.io/libnvidia-container/stable/deb/amd64/./nvidia-container-toolkit-base_1.15.0-1_amd64.deb Could not handshake: Error in the pull function. [IP: 185.xxx.xxx.xxx 443]
E: Failed to fetch https://nvidia.github.io/libnvidia-container/stable/deb/amd64/./nvidia-container-toolkit_1.15.0-1_amd64.deb Could not handshake: Error in the pull function. [IP: 185.xxx.xxx.xxx 443]
E: Failed to fetch https://nvidia.github.io/libnvidia-container/stable/deb/amd64/./nvidia-docker2_2.14.0-1_all.deb Could not handshake: Error in the pull function. [IP: 185.xxx.xxx.xxx 443]
E: Unable to fetch some archives, maybe run apt-get update or try with --fix-missing?
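Solution:
The error means that the three packages listed above could not be fetched (for the well-known network reasons). Using the three URLs shown in the "Failed to fetch" lines, download the three packages on your local machine, upload them to the server, and install them from the local files: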
sudo apt install -y ./nvidia-*
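Copying the three URLs by hand is easy to get wrong. The following is a small sketch that pulls them out of the apt error output, assuming the output has been saved to a file (apt-errors.log is a hypothetical name) and uses the English "E: Failed to fetch" wording shown above. The function name extract_failed_urls is also hypothetical.

```shell
# Hypothetical helper: read apt error output on stdin and print the .deb URL
# from each "E: Failed to fetch" line.
extract_failed_urls() {
    sed -n 's/^E: Failed to fetch \(https:[^ ]*\.deb\) .*/\1/p'
}

# Usage on the local machine (apt-errors.log is an assumed file name):
# extract_failed_urls < apt-errors.log | xargs -n 1 curl -L -O
```

The downloaded .deb files can then be uploaded to the server and installed with the command above.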
4. Configure Docker to use the NVIDIA runtime
Edit the Docker configuration file /etc/docker/daemon.json and make sure it contains the following:
{
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    },
    "default-runtime": "nvidia"
}
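After editing, a quick check that the setting actually landed in the file can catch typos. The following is a rough sketch with a hypothetical function name, has_nvidia_default_runtime. It is a text-level check only and does not validate the JSON syntax.

```shell
# Hypothetical helper: succeed if the file sets "default-runtime" to "nvidia".
# This only greps the text; it does not parse or validate the JSON.
has_nvidia_default_runtime() {
    grep -Eq '"default-runtime"[[:space:]]*:[[:space:]]*"nvidia"' "$1"
}

# Usage:
# has_nvidia_default_runtime /etc/docker/daemon.json || echo "default-runtime is not nvidia"
```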
5. Restart the Docker service
After changing the configuration, restart the Docker service:
sudo systemctl restart docker
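Once Docker is back up, one way to confirm the change took effect is to look at the default runtime reported by docker info and then retry the container that failed. The following is a sketch under the assumption that docker info prints a "Default Runtime:" line, which may vary between Docker versions. The function name default_runtime_is_nvidia is hypothetical, and cu111 is the container name from the error message in section 1.

```shell
# Hypothetical helper: read `docker info` output on stdin and succeed if the
# reported default runtime is nvidia.
default_runtime_is_nvidia() {
    grep -Eq '^[[:space:]]*Default Runtime:[[:space:]]*nvidia[[:space:]]*$'
}

# Usage:
# docker info 2>/dev/null | default_runtime_is_nvidia && docker start cu111
```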