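This tutorial sets up a deep learning environment on a machine with an NVIDIA GPU, using nvidia-docker.
nvidia-docker requires both the NVIDIA driver and Docker to be installed; see the official site for details.
1. Install the NVIDIA driver
1.1 Add the NVIDIA repository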
sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt update
Note: if adding the repository fails, look up the fix for the reported error.
1.2 Choose a driver version and install it
ubuntu-drivers devices
This lists the available driver versions, for example:
driver : nvidia-410 - third-party free
driver : nvidia-415 - third-party free
driver : nvidia-418 - third-party free
driver : nvidia-384 - distro non-free
driver : nvidia-430 - third-party free recommended
driver : xserver-xorg-video-nouveau - distro free builtin
To install version 430, run:
sudo apt install nvidia-430
The package names in this list can vary. For example, some listings include the word "driver" in the middle:
driver : nvidia-driver-410 - third-party free
driver : nvidia-driver-415 - third-party free
driver : nvidia-driver-440 - third-party free recommended
driver : nvidia-driver-430 - third-party free
driver : nvidia-driver-390 - third-party free
driver : nvidia-driver-435 - third-party free
driver : xserver-xorg-video-nouveau - distro free builtin
Install the matching package in the same way, for example:
sudo apt install nvidia-driver-430
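Because the package naming differs between the two listings above (nvidia-NNN versus nvidia-driver-NNN), it can help to read the recommended package straight from the listing instead of typing it by hand. The following is a minimal sketch: the function name recommended_driver is hypothetical, and it assumes the column layout shown in the sample output above.

```shell
# Hypothetical helper (not part of ubuntu-drivers): read the listing on
# stdin and print the package name from the first line that ends in
# "recommended".
# Assumes lines of the form:
#   driver : nvidia-driver-440 - third-party free recommended
recommended_driver() {
    awk '$1 == "driver" && $NF == "recommended" { print $3; exit }'
}

# Usage (not run here, it needs the ubuntu-drivers tool):
#   pkg=$(ubuntu-drivers devices | recommended_driver)
#   [ -n "$pkg" ] && sudo apt install "$pkg"
```

If no line is marked recommended, the helper prints nothing, so the guarded install in the usage comment does not run.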
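Note: if Secure Boot is turned on in the BIOS, the installation may prompt you to set a Secure Boot password. If your security requirements are not strict, consider turning Secure Boot off.
1.3 Reboot after installation
Check that the driver installed successfully: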
nvidia-smi
Note: if nvidia-smi is reported as not found, turn Secure Boot off as described above.
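The check after reboot can be wrapped so that a missing nvidia-smi produces a hint instead of a bare "command not found". This is only a sketch: the function name check_driver_tool and the wording of the hint are made up for illustration, and the mokutil call in the comments assumes the mokutil package is installed.

```shell
# Hypothetical helper: succeed if the given tool (default: nvidia-smi)
# is on PATH; otherwise print a hint to stderr and fail.
check_driver_tool() {
    tool=${1:-nvidia-smi}
    if command -v "$tool" >/dev/null 2>&1; then
        return 0
    fi
    echo "$tool not found: the driver may not be installed or loaded; check the Secure Boot setting" >&2
    return 1
}

# Usage (not run here, it needs the NVIDIA driver):
#   check_driver_tool && nvidia-smi
#   mokutil --sb-state    # reports whether Secure Boot is currently enabled
```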
2. Install Docker
2.1 Uninstall old versions
Older versions of Docker were called docker, docker.io, or docker-engine. If any of them are installed, remove them:
sudo apt-get remove docker docker-engine docker.io containerd runc
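2.2 Install from the Docker repository
Before installing Docker Engine-Community on a new host for the first time, you need to set up the Docker repository. After that, you can install and update Docker from the repository.
A. Set up the repository
Update the apt package index: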
sudo apt-get update
Install the packages that let apt fetch from a repository over HTTPS:
sudo apt-get install \
apt-transport-https \
ca-certificates \
curl \
gnupg-agent \
software-properties-common
Add Docker's official GPG key:
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
Set up the stable repository with the following command:
sudo add-apt-repository \
"deb [arch=amd64] https://download.docker.com/linux/ubuntu \
$(lsb_release -cs) \
stable"
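The multi-line command above expands to a single apt source line in which only the release codename varies. As a rough sketch, the helper below (docker_repo_line is a hypothetical name) builds that line for a given codename, which makes it easy to see what will be added before running add-apt-repository.

```shell
# Hypothetical helper: print the apt source line for the Docker stable
# repository, given an Ubuntu codename such as "xenial" or "bionic".
docker_repo_line() {
    printf 'deb [arch=amd64] https://download.docker.com/linux/ubuntu %s stable\n' "$1"
}

# Usage (not run here, it modifies the system's apt sources):
#   sudo add-apt-repository "$(docker_repo_line "$(lsb_release -cs)")"
```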
B. Install Docker Engine-Community
Update the apt package index:
sudo apt-get update
Install the latest version of Docker Engine-Community and containerd, or skip to the next step to install a specific version:
sudo apt-get install docker-ce docker-ce-cli containerd.io
To install a specific version of Docker Engine-Community, list the versions available in your repository, then pick one and install it:
$ apt-cache madison docker-ce
docker-ce | 5:18.09.1~3-0~ubuntu-xenial | https://download.docker.com/linux/ubuntu xenial/stable amd64 Packages
docker-ce | 5:18.09.0~3-0~ubuntu-xenial | https://download.docker.com/linux/ubuntu xenial/stable amd64 Packages
docker-ce | 18.06.1~ce~3-0~ubuntu | https://download.docker.com/linux/ubuntu xenial/stable amd64 Packages
docker-ce | 18.06.0~ce~3-0~ubuntu | https://download.docker.com/linux/ubuntu xenial/stable amd64 Packages
Install a specific version using the version string from the second column, for example 5:18.09.1~3-0~ubuntu-xenial:
sudo apt-get install docker-ce=<VERSION_STRING> docker-ce-cli=<VERSION_STRING> containerd.io
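Copying the version string by hand is error-prone because of the colons and tildes in it. A small sketch of pulling it out of the listing instead: the function name madison_versions is hypothetical, and it assumes the pipe-separated layout of the apt-cache madison output shown above.

```shell
# Hypothetical helper: read `apt-cache madison` output on stdin and print
# the second column (the version string) of each line, with the
# surrounding whitespace trimmed.
madison_versions() {
    awk -F'|' 'NF >= 2 { gsub(/^[ \t]+|[ \t]+$/, "", $2); print $2 }'
}

# Usage (not run here, it needs the Docker apt repository):
#   version=$(apt-cache madison docker-ce | madison_versions | head -n 1)
#   sudo apt-get install docker-ce="$version" docker-ce-cli="$version" containerd.io
```

The usage comment takes the first listed version; replace head -n 1 with a grep for the release you want if that is not the one needed.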
3. Configure nvidia-docker
Ubuntu 16.04/18.04, Debian Jessie/Stretch/Buster
# Add the package repositories
$ distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
$ curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
$ curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
$ sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
$ sudo systemctl restart docker
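The distribution variable in the commands above is built by sourcing /etc/os-release and joining ID and VERSION_ID, which gives a value such as ubuntu18.04 that selects the matching nvidia-docker.list file. The sketch below shows the same step as a function so the value can be inspected before it is used in the URL; distribution_id is a hypothetical name, and the optional file argument exists only to make the function easy to try out.

```shell
# Hypothetical helper: print "$ID$VERSION_ID" from an os-release style
# file (default: /etc/os-release). The file is sourced inside a subshell
# so its variables do not leak into the calling shell.
distribution_id() {
    file=${1:-/etc/os-release}
    ( . "$file" && printf '%s%s\n' "$ID" "$VERSION_ID" )
}

# Usage:
#   distribution=$(distribution_id)
#   echo "$distribution"
```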
Upgrading with nvidia-docker2 (Deprecated)
# On debian based distributions: Ubuntu / Debian
$ sudo apt-get update
$ sudo apt-get --only-upgrade install docker-ce nvidia-docker2 # check the output to see whether nvidia-docker2 was skipped
$ sudo systemctl restart docker
# All of the following options will continue working
$ sudo docker run --gpus all nvidia/cuda:9.0-base nvidia-smi
$ sudo docker run --runtime nvidia nvidia/cuda:9.0-base nvidia-smi
$ sudo nvidia-docker run nvidia/cuda:9.0-base nvidia-smi
Test:
nvidia-docker
Please credit the source when reposting: https://blog.csdn.net/tbl1234567. Author: 陶表犁