VIII. Deploying a Tesla P100 GPU on an Ubuntu Server


I. Initial setup

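1. Disable nouveau

Check whether the nouveau driver is loaded:

lsmod | grep nouveau

If nothing is printed, nouveau is already disabled and no further setup is needed.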

1.1 [Ubuntu] Steps:

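1.1.1 Run sudo vim /etc/modprobe.d/blacklist.conf and append the line blacklist nouveau to the end of the file.
1.1.2 Run sudo update-initramfs -u, then reboot.
1.1.3 After the reboot, run lsmod | grep nouveau again. If there is no output, nouveau has been disabled successfully.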

1.2 [CentOS] See: https://sixiangdefairy.blog.csdn.net/article/details/108118951


II. NVIDIA driver

1. Driver download page: https://www.nvidia.cn/Download/index.aspx?lang=cn

2. Direct download link:

wget https://cn.download.nvidia.com/tesla/460.32.03/nvidia-driver-local-repo-ubuntu1604-460.32.03_1.0-1_amd64.deb
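If you need a different driver version or Ubuntu release, the link can be rebuilt from its two variable parts. This is a sketch only. The URL pattern is inferred from the single link above, and the `_1.0-1` package suffix in particular may differ for other driver versions, so check the result against NVIDIA's download page. driver_repo_url is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rebuild the Tesla driver local-repo .deb URL from a
# driver version (e.g. 460.32.03) and an Ubuntu release without the dot
# (e.g. 1604). The pattern is ASSUMED from the one link in this article.
driver_repo_url() {
  local version="$1" release="$2"
  local deb="nvidia-driver-local-repo-ubuntu${release}-${version}_1.0-1_amd64.deb"
  printf 'https://cn.download.nvidia.com/tesla/%s/%s\n' "$version" "$deb"
}
```

Usage: `wget "$(driver_repo_url 460.32.03 1604)"` reproduces the download above.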

3. Install

Reference: https://www.nvidia.cn/Download/driverResults.aspx/169718/cn

i) `dpkg -i nvidia-driver-local-repo-ubuntu1604-460.32.03_1.0-1_amd64.deb` for Ubuntu
ii) `apt-get update`
iii) `apt-get install cuda-drivers`
iv) `reboot`


3.1 The screenshot below shows a successful installation of driver version 440 on Ubuntu 18.04 with a Tesla P100:

[Figure: successful driver 440 installation on Ubuntu 18.04 + Tesla P100]
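Instead of comparing screenshots, you can confirm the installation from the command line. `nvidia-smi --query-gpu=name,driver_version --format=csv,noheader` prints one "name, driver_version" line per GPU. The sketch below picks out the driver version reported for a Tesla P100. p100_driver_version is a hypothetical helper, and it assumes the model name contains the string P100.

```shell
#!/usr/bin/env bash
# Read "name, driver_version" CSV lines (as printed by
#   nvidia-smi --query-gpu=name,driver_version --format=csv,noheader)
# from stdin and print the driver version of the first GPU whose name
# contains "P100". Exit status is 1 if no such GPU is listed.
p100_driver_version() {
  awk -F', ' '$1 ~ /P100/ { print $2; found = 1; exit }
              END { exit !found }'
}
```

Usage: `nvidia-smi --query-gpu=name,driver_version --format=csv,noheader | p100_driver_version`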

III. CUDA [not required]

1. Download page: https://developer.nvidia.com/cuda-toolkit-archive

2. Download and install commands:

wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/cuda-ubuntu1604.pin
sudo mv cuda-ubuntu1604.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget https://developer.download.nvidia.com/compute/cuda/11.2.1/local_installers/cuda-repo-ubuntu1604-11-2-local_11.2.1-460.32.03-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu1604-11-2-local_11.2.1-460.32.03-1_amd64.deb
sudo apt-key add /var/cuda-repo-ubuntu1604-11-2-local/7fa2af80.pub
sudo apt-get update
sudo apt-get -y install cuda

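3. Install: the commands above already perform the installation. The final sudo apt-get -y install cuda installs CUDA from the local repository.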
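After installation, you can check which toolkit release you have with `nvcc --version`. By default nvcc is installed under /usr/local/cuda/bin, which is not on PATH. Its output contains a line such as "Cuda compilation tools, release 11.2, V…". The sketch below extracts the release number from that line. cuda_release is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Print the "release X.Y" number found in `nvcc --version` output on stdin.
cuda_release() {
  sed -n 's/.*release \([0-9][0-9.]*\),.*/\1/p'
}
```

Usage: `/usr/local/cuda/bin/nvcc --version | cuda_release`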

IV. cuDNN [not required]

1. Download page: https://developer.nvidia.com/rdp/cudnn-archive

2. Direct download link (requires signing in with an NVIDIA Developer account):

wget https://developer.nvidia.com/compute/machine-learning/cudnn/secure/8.1.0.77/11.2_20210127/cudnn-11.2-linux-x64-v8.1.0.77.tgz

3. Install

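If the downloaded file does not end in .tgz, rename it so that it does, then extract it with tar -zxvf file.tgz.
The archive unpacks into a cuda folder. Open a terminal in that directory and run the commands below. cuDNN 8 splits its API across several cudnn_*.h headers, so copy all of them. The -P flag keeps the libcudnn*.so symlinks as symlinks.

sudo cp cuda/include/cudnn*.h /usr/local/cuda/include/
sudo cp -P cuda/lib64/libcudnn* /usr/local/cuda/lib64/
sudo chmod a+r /usr/local/cuda/include/cudnn*.h
sudo chmod a+r /usr/local/cuda/lib64/libcudnn*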
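The extract-and-copy steps above can be wrapped in one function that cleans up after itself. This is a sketch assuming bash. install_cudnn is a hypothetical name, and the prefix argument (default /usr/local/cuda) lets you try it against a scratch directory first.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper for the cuDNN install steps: unpack a cuDNN .tgz into
# a temporary directory, then copy headers and libraries into a CUDA prefix.
# The body runs in a subshell, so `exit` only leaves the function and the
# EXIT trap removes the scratch directory.
install_cudnn() (
  tgz="$1"
  prefix="${2:-/usr/local/cuda}"
  tmp="$(mktemp -d)" || exit 1
  trap 'rm -rf "$tmp"' EXIT

  tar -xzf "$tgz" -C "$tmp" || exit 1              # unpacks to ./cuda/{include,lib64}
  mkdir -p "$prefix/include" "$prefix/lib64" || exit 1
  cp "$tmp"/cuda/include/cudnn*.h "$prefix/include/" || exit 1
  cp -P "$tmp"/cuda/lib64/libcudnn* "$prefix/lib64/" || exit 1   # keep .so symlinks
  chmod a+r "$prefix"/include/cudnn*.h "$prefix"/lib64/libcudnn*
)
```

To use it against /usr/local/cuda, save it as a script with a final line `install_cudnn "$@"` and run it with sudo, for example `sudo bash install_cudnn.sh cudnn-11.2-linux-x64-v8.1.0.77.tgz`.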

4. Test:

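Check the cuDNN version. From cuDNN 8 onward, the version macros are in cudnn_version.h rather than cudnn.h:

cat /usr/local/cuda/include/cudnn_version.h | grep CUDNN_MAJOR -A 2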
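The grep above prints the raw #define lines. The sketch below reduces them to a single MAJOR.MINOR.PATCHLEVEL string, which is easier to compare in scripts. cudnn_version is a hypothetical helper, and it assumes the header uses the usual `#define CUDNN_MAJOR 8` style of macro definitions.

```shell
#!/usr/bin/env bash
# Print MAJOR.MINOR.PATCHLEVEL from cuDNN header text on stdin
# (cudnn_version.h for cuDNN 8, cudnn.h for older releases).
# Exit status is 1 if no CUDNN_MAJOR definition is found.
cudnn_version() {
  awk '$1 == "#define" && $2 == "CUDNN_MAJOR"      { major = $3 }
       $1 == "#define" && $2 == "CUDNN_MINOR"      { minor = $3 }
       $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { patch = $3 }
       END { if (major == "") exit 1; print major "." minor "." patch }'
}
```

Usage: `cudnn_version < /usr/local/cuda/include/cudnn_version.h`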


V. Docker installation

1. Switch to the Aliyun mirror

Reference: https://blog.csdn.net/Bankeey/article/details/106478513
Back up the current sources list, then edit it:
sudo cp /etc/apt/sources.list /etc/apt/sources.list.bak
sudo vim /etc/apt/sources.list

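Fill in the following content. xenial is the codename for Ubuntu 16.04; on Ubuntu 18.04, use bionic instead: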

deb http://mirrors.aliyun.com/ubuntu/ xenial main restricted universe multiverse
deb http://mirrors.aliyun.com/ubuntu/ xenial-security main restricted universe multiverse
deb http://mirrors.aliyun.com/ubuntu/ xenial-updates main restricted universe multiverse
deb http://mirrors.aliyun.com/ubuntu/ xenial-proposed main restricted universe multiverse
deb http://mirrors.aliyun.com/ubuntu/ xenial-backports main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ xenial main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ xenial-security main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ xenial-updates main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ xenial-proposed main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ xenial-backports main restricted universe multiverse

deb http://archive.ubuntu.com/ubuntu/ xenial main restricted universe multiverse
deb http://archive.ubuntu.com/ubuntu/ xenial-security main restricted universe multiverse
deb http://archive.ubuntu.com/ubuntu/ xenial-updates main restricted universe multiverse
deb http://archive.ubuntu.com/ubuntu/ xenial-proposed main restricted universe multiverse
deb http://archive.ubuntu.com/ubuntu/ xenial-backports main restricted universe multiverse


Then refresh the package index:

sudo apt-get update
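Typing the ten mirror lines by hand is error-prone, especially when the codename has to change. The sketch below prints the same Aliyun entries listed above for any codename. aliyun_sources is a hypothetical helper. It does not emit the archive.ubuntu.com fallback lines.

```shell
#!/usr/bin/env bash
# Print the Aliyun deb and deb-src entries shown above for one Ubuntu
# codename (xenial = 16.04, bionic = 18.04), in the same order.
aliyun_sources() {
  local codename="$1" type pocket
  for type in deb deb-src; do
    for pocket in "" -security -updates -proposed -backports; do
      printf '%s http://mirrors.aliyun.com/ubuntu/ %s%s main restricted universe multiverse\n' \
        "$type" "$codename" "$pocket"
    done
  done
}
```

Usage, after making the backup: `aliyun_sources "$(lsb_release -cs)" | sudo tee /etc/apt/sources.list`. Passing the output of `lsb_release -cs` keeps the codename in sync with the installed release.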

2. Install Docker

Reference: https://blog.csdn.net/qq_27731689/article/details/92969266
# Installing on Ubuntu is simple: Docker provides an official convenience script.

sudo apt install curl
curl -fsSL get.docker.com -o get-docker.sh
sudo sh get-docker.sh --mirror Aliyun

3. Start Docker

Reference: https://blog.csdn.net/qq_27731689/article/details/92969266

sudo systemctl enable docker
sudo systemctl start docker
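The daemon may take a moment to accept connections after `systemctl start`. A small retry loop avoids racing it in setup scripts. The sketch below assumes bash. wait_for is a hypothetical helper, and `docker info` is used only as the readiness probe in the usage line.

```shell
#!/usr/bin/env bash
# Run a command up to TIMEOUT times, one second apart, until it succeeds.
# Returns 0 on the first success and 1 if every attempt fails.
wait_for() {
  local timeout="$1"; shift
  local i
  for ((i = 0; i < timeout; i++)); do
    "$@" >/dev/null 2>&1 && return 0
    sleep 1
  done
  return 1
}
```

Usage: `wait_for 30 sudo docker info && echo "docker is up"`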

4. Troubleshooting (skip this if you do not run into the problem):

Problem:

root@ubuntu:/pro_setup/software/nvidia# sudo curl -s -L https://nvidia.github.io/nvidia-docker/ubuntu16.04/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo: unable to resolve host ubuntu
sudo: unable to resolve host ubuntu

Solution:

vi /etc/hosts

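Add the following line. Here ubuntu is this machine's hostname; substitute the output of hostname if yours differs: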

127.0.1.1 ubuntu
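The same fix can be applied without opening an editor, and without adding a second entry if the hostname is already mapped. This is a sketch. ensure_host_entry is a hypothetical helper that takes the hosts file path as an argument, so it can be tested on a copy first.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append "127.0.1.1 <name>" to a hosts file unless
# <name> already appears as a host name or alias on a non-comment line.
# Real use (as root): ensure_host_entry /etc/hosts "$(hostname)"
ensure_host_entry() {
  local hosts="$1" name="$2"
  if ! awk -v n="$name" '$1 !~ /^#/ { for (i = 2; i <= NF; i++) if ($i == n) f = 1 }
                         END { exit !f }' "$hosts"; then
    printf '127.0.1.1 %s\n' "$name" >> "$hosts"
  fi
}
```

Usage: `sudo bash -c "$(declare -f ensure_host_entry); ensure_host_entry /etc/hosts $(hostname)"`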

5. nvidia-docker offline installation (tested and working)

1) Offline packages: http://mirror.cs.uchicago.edu/nvidia-docker/nvidia-container-runtime/stable/ubuntu16.04/amd64/
2) Installation reference: https://blog.51cto.com/dldxzjr/2541070

3) Installation
Prepare the following packages:

libnvidia-container1_1.0.1-1_amd64.deb
libnvidia-container-tools_1.0.1-1_amd64.deb
nvidia-container-runtime_3.1.4-1_amd64.deb
nvidia-container-toolkit_1.0.5-1_amd64.deb

Install them:

sudo apt install   ./lib*   ./nvidia*
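The globs `./lib*` and `./nvidia*` install whatever happens to be in the current directory. A quick check that all four packages listed above are actually present makes a partial download easier to spot. check_debs is a hypothetical helper with the file names copied from the list.

```shell
#!/usr/bin/env bash
# Check that every offline package listed above exists in DIR.
# Prints "missing: <file>" for each absent package; returns 1 if any are missing.
check_debs() {
  local dir="$1" deb missing=0
  for deb in \
    libnvidia-container1_1.0.1-1_amd64.deb \
    libnvidia-container-tools_1.0.1-1_amd64.deb \
    nvidia-container-runtime_3.1.4-1_amd64.deb \
    nvidia-container-toolkit_1.0.5-1_amd64.deb; do
    if [ ! -f "$dir/$deb" ]; then
      echo "missing: $deb"
      missing=1
    fi
  done
  return "$missing"
}
```

Usage: `check_debs . && sudo apt install ./lib* ./nvidia*`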

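Update daemon.json. Note that tee overwrites any existing /etc/docker/daemon.json, so merge in any settings you already have first: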

sudo tee /etc/docker/daemon.json <<EOF
{
    "default-runtime":"nvidia",
    "runtimes": {
        "nvidia": {
            "path": "/usr/bin/nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
EOF
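Because the tee command replaces the whole file, a variant that first keeps a backup can save settings such as registry mirrors from being silently lost. This is a sketch. write_nvidia_daemon_json is a hypothetical helper that writes the same JSON as above and copies any existing file to PATH.bak.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write the nvidia-runtime daemon.json shown above to
# PATH, copying any existing file to PATH.bak first.
# Real use (as root): write_nvidia_daemon_json /etc/docker/daemon.json
write_nvidia_daemon_json() {
  local path="$1"
  if [ -e "$path" ]; then
    cp "$path" "$path.bak" || return 1
  fi
  cat > "$path" <<'EOF'
{
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "path": "/usr/bin/nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
EOF
}
```

Compare PATH.bak with the new file afterwards and copy back any keys you still need before restarting Docker.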

Restart Docker:

sudo systemctl daemon-reload
sudo systemctl restart docker
sudo pkill -SIGHUP dockerd

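Test. The first command uses the nvidia runtime registered in daemon.json; the second uses the --gpus flag, available since Docker 19.03.
The nvidia/cuda image no longer provides a latest tag, so you may need to append an explicit tag. Available versions can be looked up at https://hub.docker.com/.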

docker run --runtime=nvidia --rm nvidia/cuda nvidia-smi
docker run --gpus all --rm nvidia/cuda nvidia-smi
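For a scripted check, `nvidia-smi -L` prints one line per visible GPU in the form "GPU 0: <name> (UUID: …)", which is easier to count than the full table. count_gpus is a hypothetical helper. In the usage line, `<tag>` is a placeholder for a tag you pick on Docker Hub.

```shell
#!/usr/bin/env bash
# Count the GPUs listed in `nvidia-smi -L` output read from stdin
# (one "GPU <n>: ..." line per device). Prints 0 when none are listed.
count_gpus() {
  grep -c '^GPU [0-9][0-9]*:'
}
```

Usage: `docker run --gpus all --rm nvidia/cuda:<tag> nvidia-smi -L | count_gpus` should print the number of Tesla P100 cards visible inside the container.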