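Environment: Ubuntu 18.04
The goal is to let applications running inside a container call the CUDA API to operate the GPU.
Requirements
1. Applications inside the container can call the CUDA Runtime API and the CUDA libraries.
2. The CUDA Driver libraries are usable inside the container. The CUDA Runtime API is essentially a wrapper around the CUDA Driver API, so every call ultimately goes through the Driver API.
3. The GPU devices can be operated from inside the container.
Steps
Install the GPU driver on the host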
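A minimal sketch of how items 2 and 3 could be checked from a shell inside the container. The function names has_driver_lib and has_device_nodes are hypothetical helpers, not part of any NVIDIA tool. The sketch assumes the driver library appears as libcuda.so in the dynamic linker cache and that the GPU is exposed as /dev/nvidia* device nodes. Item 1 depends on what the image itself ships, so it is not covered here.

```shell
#!/bin/sh
# Hypothetical helpers for illustration only.

# has_driver_lib: reads a library listing (for example from `ldconfig -p`)
# on stdin and succeeds if it mentions libcuda.so, the CUDA Driver API library.
has_driver_lib() {
    grep -q 'libcuda\.so'
}

# has_device_nodes [DEV_DIR]: succeeds if at least one nvidia* entry exists
# in DEV_DIR (defaults to /dev), e.g. /dev/nvidia0 or /dev/nvidiactl.
has_device_nodes() {
    dev_dir="${1:-/dev}"
    for node in "$dev_dir"/nvidia*; do
        # An unmatched glob stays literal, so test that the path really exists.
        [ -e "$node" ] && return 0
    done
    return 1
}
```

Possible usage inside the container: ldconfig -p | has_driver_lib && has_device_nodes && echo "driver library and device nodes found"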
sudo sh NVIDIA-Linux-x86_64-465.27.run --no-x-check --no-nouveau-check --no-opengl-files --ui=none --no-questions 2>&1
Verify on the host: nvidia-smi
Install docker and docker-compose
First refresh the package index
sudo apt-get update
Install Docker (the engine itself is provided by the docker.io package)
sudo apt-get install docker
sudo apt-get install docker.io
sudo apt-get install docker-registry
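Start, stop, or restart Docker (run whichever one you need)
sudo systemctl start docker.service
sudo systemctl stop docker.service
sudo systemctl restart docker.service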
Install docker-compose: sudo apt-get install docker-compose
Check the versions
sudo docker -v
sudo docker-compose -v
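After confirming that both versions are reported as in the screenshot above, continue by adding the NVIDIA container package repository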
curl -s -L https://nvidia.github.io/nvidia-docker/ubuntu18.04/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-keys DDCAE044F796ECB0
sudo apt-get update
Download the packages (the .deb files are saved to the current directory)
sudo apt-get download nvidia-container-toolkit-base
sudo apt-get download libnvidia-container1
sudo apt-get download libnvidia-container-tools
sudo apt-get download nvidia-container-toolkit
sudo apt-get download nvidia-container-runtime
sudo apt-get download nvidia-docker2
Note: if a download fails, the packages can also be fetched from this network drive
Link: https://pan.baidu.com/s/18WidNQ3l7iGzUtBorTqNPA?pwd=5rbq Extraction code: 5rbq
Install the packages in dependency order (adjust the file names if apt downloaded different versions)
sudo dpkg -i nvidia-container-toolkit-base_1.11.0-1_amd64.deb
sudo dpkg -i libnvidia-container1_1.11.0-1_amd64.deb
sudo dpkg -i libnvidia-container-tools_1.11.0-1_amd64.deb
sudo dpkg -i nvidia-container-toolkit_1.11.0-1_amd64.deb
sudo dpkg -i nvidia-container-runtime_3.11.0-1_all.deb
sudo dpkg -i nvidia-docker2_2.11.0-1_all.deb
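The order matters because each package depends on the ones before it. As a rough sketch, the same sequence could be scripted so that the version numbers do not have to be typed by hand. The function install_debs and the DRY_RUN variable are hypothetical names introduced here for illustration. The sketch assumes each .deb file is named with the package name followed by an underscore, as in the commands above.

```shell
#!/bin/sh
# Package names in the dependency order used in the install step.
PACKAGES="nvidia-container-toolkit-base libnvidia-container1 libnvidia-container-tools nvidia-container-toolkit nvidia-container-runtime nvidia-docker2"

# install_debs [DIR]: install one .deb per package from DIR (default: current dir).
# Set DRY_RUN=1 to print the dpkg commands instead of running them.
install_debs() {
    dir="${1:-.}"
    for pkg in $PACKAGES; do
        deb=""
        # The trailing underscore keeps nvidia-container-toolkit from
        # matching nvidia-container-toolkit-base_*.deb.
        for candidate in "$dir"/"$pkg"_*.deb; do
            if [ -e "$candidate" ]; then
                deb="$candidate"
                break
            fi
        done
        if [ -z "$deb" ]; then
            echo "missing package file for $pkg in $dir" >&2
            return 1
        fi
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "sudo dpkg -i $deb"
        else
            sudo dpkg -i "$deb" || return 1
        fi
    done
}
```

Possible usage: run DRY_RUN=1 install_debs . first to review the commands, then install_debs . to perform the installation.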
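Edit /etc/docker/daemon.json so that it contains the following. The insecure-registries addresses are specific to the author's environment; replace them with your own registries or omit that line.
{
    "insecure-registries": ["123.56.140.4:808","172.16.4.21:808","172.16.1.132:808"],
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
Then run the reload
sudo systemctl daemon-reload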
Signal dockerd to reload its configuration, then reboot the machine
sudo kill -SIGHUP $(pidof dockerd)
sudo reboot
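Once the machine is back up, it may be worth confirming that Docker actually picked up the new default runtime before starting any containers. The sketch below relies on the "Default Runtime:" line that docker info prints. The function name default_runtime_is_nvidia is a hypothetical helper; it reads the docker info output on stdin so it can be tried without a GPU.

```shell
#!/bin/sh
# default_runtime_is_nvidia: reads `docker info` output on stdin and succeeds
# only if the "Default Runtime:" line names the nvidia runtime.
default_runtime_is_nvidia() {
    grep -Eq '^[[:space:]]*Default Runtime:[[:space:]]*nvidia[[:space:]]*$'
}
```

Possible usage: sudo docker info 2>/dev/null | default_runtime_is_nvidia && echo "nvidia is the default runtime"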
Verification:
Start a container:
sudo docker run -itd --name XXXX -e NVIDIA_VISIBLE_DEVICES=all cfc916d36a5b /bin/bash
sudo docker exec -it container-id /bin/sh
Once inside the container, run nvidia-smi
If the output matches the screenshot below, the containerized deployment succeeded.