CentOS 7: Installing the NVIDIA Tesla T4 Driver, CUDA and cuDNN

GPU: NVIDIA Tesla T4

Prerequisites

Install the gcc build toolchain and the kernel-related packages

# Add the Aliyun mirror repositories

curl -o /etc/yum.repos.d/epel.repo http://mirrors.aliyun.com/repo/epel-7.repo

curl -o /etc/yum.repos.d/CentOS-Base.repo https://mirrors.aliyun.com/repo/Centos-7.repo

sed -i -e '/mirrors.cloud.aliyuncs.com/d' -e '/mirrors.aliyuncs.com/d' /etc/yum.repos.d/CentOS-Base.repo

  

# Install the base packages

yum -y install apr autoconf automake bash bash-completion bind-utils bzip2 bzip2-devel chrony cmake coreutils curl curl-devel dbus dbus-libs dhcp-common dos2unix e2fsprogs e2fsprogs-devel file file-libs freetype freetype-devel gcc gcc-c++ gdb glib2 glib2-devel glibc glibc-devel gmp gmp-devel gnupg iotop kernel kernel-devel kernel-doc kernel-firmware kernel-headers krb5-devel libaio-devel libcurl libcurl-devel libevent libevent-devel libffi-devel libidn libidn-devel libjpeg libjpeg-devel libmcrypt libmcrypt-devel libpng libpng-devel libxml2 libxml2-devel libxslt libxslt-devel libzip libzip-devel lrzsz lsof make microcode_ctl mysql mysql-devel ncurses ncurses-devel net-snmp net-snmp-libs net-snmp-utils net-tools nfs-utils nss nss-sysinit nss-tools openldap-clients openldap-devel openssh openssh-clients openssh-server openssl openssl-devel patch policycoreutils polkit procps readline-devel rpm rpm-build rpm-libs rsync sos sshpass strace sysstat tar tmux tree unzip uuid uuid-devel vim wget yum-utils zip zlib* jq

  


# Time synchronization

systemctl start chronyd && systemctl enable chronyd

  


# Reboot

reboot

  

# Update all installed packages

yum update -y

  


# Reboot again

reboot

  

Check

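Note: before installing the kernel packages, check that the running kernel version matches the version of the kernel-devel/kernel-doc/kernel-headers packages you are about to install. The two must match, or the driver build later on will fail.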


[root@localhost opt]# uname -a

Linux localhost 3.10.0-1160.31.1.el7.x86_64 #1 SMP Thu Jun 10 13:32:12 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

[root@localhost opt]# yum list | grep kernel-

kernel-devel.x86_64                         3.10.0-1160.31.1.el7       @updates

kernel-doc.noarch                           3.10.0-1160.31.1.el7       @updates

kernel-headers.x86_64                       3.10.0-1160.31.1.el7       @updates

kernel-tools.x86_64                         3.10.0-1160.31.1.el7       @updates

kernel-tools-libs.x86_64                    3.10.0-1160.31.1.el7       @updates

kernel-abi-whitelists.noarch                3.10.0-1160.31.1.el7       updates

kernel-debug.x86_64                         3.10.0-1160.31.1.el7       updates

kernel-debug-devel.x86_64                   3.10.0-1160.31.1.el7       updates

kernel-tools-libs-devel.x86_64              3.10.0-1160.31.1.el7       updates

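There are two ways to resolve a version mismatch:

Option 1: upgrade the kernel (look up the upgrade procedure for your system separately). The new kernel does not have to be set as the default boot kernel.

Option 2: install the kernel-devel/kernel-doc/kernel-headers packages that match the running kernel version, for example: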

yum install "kernel-devel-uname-r == $(uname -r)"
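The check above can also be scripted. The following is a minimal sketch. It assumes the CentOS 7 layout, where kernel-devel installs its build tree under /usr/src/kernels/<kernel release>. The function name kernel_devel_matches is hypothetical and is not part of any package.

```shell
#!/usr/bin/env bash
# Sketch: report whether kernel-devel files exist for a kernel release.
# Assumption: CentOS 7 kernel-devel installs into /usr/src/kernels/<release>.
# Arguments (both optional): $1 = kernel release, $2 = kernels directory.
kernel_devel_matches() {
    local release="${1:-$(uname -r)}"
    local kernels_dir="${2:-/usr/src/kernels}"
    if [ -d "${kernels_dir}/${release}" ]; then
        echo "ok: kernel-devel found for ${release}"
        return 0
    fi
    echo "mismatch: ${kernels_dir}/${release} is missing" >&2
    echo "try: yum install \"kernel-devel-uname-r == ${release}\"" >&2
    return 1
}
```

On the server, call kernel_devel_matches with no arguments to check the running kernel.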

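Install the GPU driver

Download

NVIDIA Driver Downloads

On that page, find the latest driver version for your GPU and download it. The file has a .run extension. Upload it to any location on the server.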

Preparation

Disable the nouveau driver that the system loads by default

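# Blacklist nouveau in the modprobe configuration

echo -e "blacklist nouveau\noptions nouveau modeset=0" >> /etc/modprobe.d/blacklist.conf

# Back up the current initramfs image

cp /boot/initramfs-$(uname -r).img /boot/initramfs-$(uname -r).img.bak

# Rebuild the initramfs image

sudo dracut --force

# Reboot

reboot

# Check whether nouveau is loaded; empty output means it has been disabled

lsmod | grep nouveau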
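If this step is repeated, the echo command appends the same lines again. Below is a minimal sketch of an idempotent variant. blacklist_nouveau is a hypothetical helper. It only edits the file, so dracut --force and the reboot are still needed afterwards.

```shell
#!/usr/bin/env bash
# Sketch: add the nouveau blacklist lines only if they are not present yet,
# so running the step twice does not duplicate them.
# $1 (optional) = config file, defaulting to the path used above.
blacklist_nouveau() {
    local conf="${1:-/etc/modprobe.d/blacklist.conf}"
    local line
    touch "$conf" || return 1
    # If the file is non-empty and lacks a trailing newline, add one first
    if [ -s "$conf" ] && [ -n "$(tail -c 1 "$conf")" ]; then
        printf '\n' >> "$conf"
    fi
    for line in "blacklist nouveau" "options nouveau modeset=0"; do
        # -q quiet, -x whole-line match, -F fixed string
        grep -qxF "$line" "$conf" || printf '%s\n' "$line" >> "$conf"
    done
}
```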

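Install DKMS

DKMS (Dynamic Kernel Module Support) maintains driver modules that live outside the kernel tree. It can rebuild them automatically after the kernel version changes. The dkms package comes from the EPEL repository added earlier.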

yum -y install dkms

  

Installation

Run the following command, replacing the file name with your own.

sudo sh NVIDIA-Linux-x86_64-410.129-diagnostic.run --no-x-check --no-nouveau-check --no-opengl-files

# --no-x-check         do not check for a running X server during installation

# --no-nouveau-check   skip the installer's nouveau check

# --no-opengl-files    install only the driver files, not the OpenGL files

Follow the installer prompts, answering yes / OK at each step.

After installation, run nvidia-smi. The output should look like this:

[root@localhost opt]# nvidia-smi

Wed Jul  7 11:11:33 2021

+-----------------------------------------------------------------------------+

| NVIDIA-SMI 410.129      Driver Version: 410.129      CUDA Version: 10.0     |

|-------------------------------+----------------------+----------------------+

| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |

| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |

|===============================+======================+======================|

|   0  Tesla T4            Off  | 00000000:41:00.0 Off |                    0 |

| N/A   94C    P0    36W /  70W |      0MiB / 15079MiB |      0%      Default |

+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+

| Processes:                                                       GPU Memory |

|  GPU       PID   Type   Process name                             Usage      |

|=============================================================================|

|  No running processes found                                                 |

+-----------------------------------------------------------------------------+
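To read the driver version in a script, nvidia-smi can print it directly with nvidia-smi --query-gpu=driver_version --format=csv,noheader. When only saved banner output like the one above is available, a small parser such as this sketch works. driver_version_from_smi is a hypothetical name. It assumes the banner keeps the "Driver Version: X" wording shown above.

```shell
#!/usr/bin/env bash
# Sketch: print the driver version from nvidia-smi banner text read on stdin.
# Assumes the banner contains "Driver Version: <x.y>" as in the output above.
driver_version_from_smi() {
    sed -n 's/.*Driver Version: *\([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Usage on the server:
#   nvidia-smi | driver_version_from_smi
```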

  

Install CUDA

Pre-installation checks

1. Confirm that an NVIDIA GPU is installed

[root@localhost opt]# lspci | grep -i nvidia

41:00.0 3D controller: NVIDIA Corporation TU104GL [Tesla T4] (rev a1)

2. Confirm that gcc is installed, and install it if it is not

[root@localhost opt]# gcc --version

gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-44)

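Copyright (C) 2015 Free Software Foundation, Inc.

This is free software; see the source for copying conditions.  There is NO

warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

# If gcc is missing, install it:

yum -y install gcc gcc-c++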

3. Disable nouveau

# No output means nouveau is already disabled

# If it is still loaded, follow the nouveau-disabling steps earlier in this document

[root@localhost opt]# lsmod | grep nouveau

[root@localhost opt]#

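4. Set the default boot target

The CUDA driver cannot be installed while the nouveau driver is loaded or a graphical desktop is running. Set the text-mode multi-user.target as the default; the change takes effect on the next boot.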

[root@localhost opt]# systemctl set-default multi-user.target

Removed symlink /etc/systemd/system/default.target.

Created symlink from /etc/systemd/system/default.target to /usr/lib/systemd/system/multi-user.target.

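Installation

This installation was done offline, so the CUDA installer has to be downloaded beforehand. Download the build for your OS from the official archive: CUDA Toolkit Archive | NVIDIA Developer

Choose the CUDA version that fits your needs. The installer type chosen here is runfile (local).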

wget https://developer.download.nvidia.com/compute/cuda/10.1/Prod/local_installers/cuda_10.1.243_418.87.00_linux.run

sudo sh cuda_10.1.243_418.87.00_linux.run

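The installer screen then appears; type accept.

On the second screen, select Install. This runfile bundles driver 418.87.00 (see the file name). The summary below shows that this bundled driver was installed too, replacing the 410.129 driver installed earlier. The 410 driver supports only up to CUDA 10.0, as the CUDA Version field in the nvidia-smi output above shows.

After installation the script prints a summary. Save a copy, as it is needed later: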

===========

= Summary =

===========

Driver:   Installed

Toolkit:  Installed in /usr/local/cuda-10.1/

Samples:  Installed in /root/, but missing recommended libraries

Please make sure that

 -   PATH includes /usr/local/cuda-10.1/bin

 -   LD_LIBRARY_PATH includes /usr/local/cuda-10.1/lib64, or, add /usr/local/cuda-10.1/lib64 to /etc/ld.so.conf and run ldconfig as root

To uninstall the CUDA Toolkit, run cuda-uninstaller in /usr/local/cuda-10.1/bin

To uninstall the NVIDIA Driver, run nvidia-uninstall

Please see CUDA_Installation_Guide_Linux.pdf in /usr/local/cuda-10.1/doc/pdf for detailed information on setting up CUDA.

Logfile is /var/log/cuda-installer.log
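The paths in the next step must follow this summary, so a saved copy of it can be parsed instead of read by eye. Below is a minimal sketch. It assumes the summary was saved verbatim to a file and that the Toolkit line reads as shown above. The helper name toolkit_dir_from_summary and the file name cuda-summary.txt are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: print the toolkit directory from a saved CUDA installer summary (stdin).
# Assumes the line format "Toolkit:  Installed in <dir>" shown above.
toolkit_dir_from_summary() {
    sed -n 's/^Toolkit: *Installed in \([^ ]*\).*/\1/p' | head -n 1
}

# Usage (cuda-summary.txt is a hypothetical saved copy of the summary):
#   toolkit_dir_from_summary < cuda-summary.txt
```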

Add CUDA to the environment variables

# Adjust the paths to match your own CUDA installer output

[root@localhost cuda-10.1]# tail -5 /etc/profile

PATH=$PATH:/usr/local/cuda-10.1/bin/

LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-10.1/lib64/

export PATH

export LD_LIBRARY_PATH

[root@localhost cuda-10.1]# source /etc/profile
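After sourcing /etc/profile, it is worth confirming that both variables now contain the CUDA directories. Below is a minimal sketch with a hypothetical helper, path_has_dir. It ignores a trailing slash, since the profile lines above end in /bin/ and /lib64/ while the installer summary writes the paths without one.

```shell
#!/usr/bin/env bash
# Sketch: succeed if directory $2 is an entry of the colon-separated list $1
# (e.g. PATH or LD_LIBRARY_PATH); a trailing "/" is ignored on both sides.
path_has_dir() {
    local want="${2%/}" entry
    local -a parts
    IFS=':' read -r -a parts <<< "$1"
    for entry in "${parts[@]}"; do
        [ "${entry%/}" = "$want" ] && return 0
    done
    return 1
}

# Usage after "source /etc/profile":
#   path_has_dir "$PATH" /usr/local/cuda-10.1/bin && echo "PATH ok"
#   path_has_dir "$LD_LIBRARY_PATH" /usr/local/cuda-10.1/lib64 && echo "LD_LIBRARY_PATH ok"
```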

Verification

[root@localhost cuda-10.1]# nvcc -V

nvcc: NVIDIA (R) Cuda compiler driver

Copyright (c) 2005-2019 NVIDIA Corporation

Built on Sun_Jul_28_19:07:16_PDT_2019

Cuda compilation tools, release 10.1, V10.1.243

[root@localhost cuda-10.1]# cd /root/NVIDIA_CUDA-10.1_Samples

[root@localhost NVIDIA_CUDA-10.1_Samples]# make

[root@localhost NVIDIA_CUDA-10.1_Samples]# cd 1_Utilities/deviceQuery

[root@localhost deviceQuery]# ./deviceQuery

./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "Tesla T4"

  CUDA Driver Version / Runtime Version          10.1 / 10.1

  CUDA Capability Major/Minor version number:    7.5

  Total amount of global memory:                 15080 MBytes (15812263936 bytes)

  (40) Multiprocessors, ( 64) CUDA Cores/MP:     2560 CUDA Cores

  GPU Max Clock rate:                            1590 MHz (1.59 GHz)

  Memory Clock rate:                             5001 Mhz

  Memory Bus Width:                              256-bit

  L2 Cache Size:                                 4194304 bytes

  Maximum Texture Dimension Size (x,y,z)         1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)

  Maximum Layered 1D Texture Size, (num) layers  1D=(32768), 2048 layers

  Maximum Layered 2D Texture Size, (num) layers  2D=(32768, 32768), 2048 layers

  Total amount of constant memory:               65536 bytes

  Total amount of shared memory per block:       49152 bytes

  Total number of registers available per block: 65536

  Warp size:                                     32

  Maximum number of threads per multiprocessor:  1024

  Maximum number of threads per block:           1024

  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)

  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)

  Maximum memory pitch:                          2147483647 bytes

  Texture alignment:                             512 bytes

  Concurrent copy and kernel execution:          Yes with 3 copy engine(s)

  Run time limit on kernels:                     No

  Integrated GPU sharing Host Memory:            No

  Support host page-locked memory mapping:       Yes

  Alignment requirement for Surfaces:            Yes

  Device has ECC support:                        Enabled

  Device supports Unified Addressing (UVA):      Yes

  Device supports Compute Preemption:            Yes

  Supports Cooperative Kernel Launch:            Yes

  Supports MultiDevice Co-op Kernel Launch:      Yes

  Device PCI Domain ID / Bus ID / location ID:   0 / 65 / 0

  Compute Mode:

     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 10.1, CUDA Runtime Version = 10.1, NumDevs = 1

Result = PASS

The line to check is Result = PASS, which means the test passed.
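Rather than scanning the long output by eye, the final lines can be checked in a script. Below is a minimal sketch with a hypothetical helper, check_devicequery. It assumes the summary-line format shown above. It prints the driver and runtime versions and returns success only when the Result = PASS line is present.

```shell
#!/usr/bin/env bash
# Sketch: read deviceQuery output on stdin, print the CUDA driver/runtime
# versions from its summary line, and succeed only if "Result = PASS" is present.
check_devicequery() {
    local out driver runtime
    out=$(cat)
    driver=$(printf '%s\n' "$out" | sed -n 's/.*CUDA Driver Version = \([0-9.]*\),.*/\1/p')
    runtime=$(printf '%s\n' "$out" | sed -n 's/.*CUDA Runtime Version = \([0-9.]*\),.*/\1/p')
    echo "driver=${driver:-unknown} runtime=${runtime:-unknown}"
    printf '%s\n' "$out" | grep -qx 'Result = PASS'
}

# Usage, from the deviceQuery directory:
#   ./deviceQuery | check_devicequery
```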

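Install cuDNN

Download

Download the required cuDNN version from the official site (cuDNN Archive | NVIDIA Developer). An NVIDIA developer account is required.

Note: pick the build that matches your CUDA version.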

Installation

Upload the archive and extract it

[root@localhost opt]# cd /opt/

[root@localhost opt]# tar xzvf cudnn-10.1-linux-x64-v7.6.5.32.tgz

cuda/include/cudnn.h

cuda/NVIDIA_SLA_cuDNN_Support.txt

cuda/lib64/libcudnn.so

cuda/lib64/libcudnn.so.7

cuda/lib64/libcudnn.so.7.6.5

cuda/lib64/libcudnn_static.a

[root@localhost opt]# cp cuda/include/cudnn.h /usr/local/cuda/include

[root@localhost opt]# cp cuda/lib64/libcudnn* /usr/local/cuda/lib64

[root@localhost opt]# chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib64/libcudnn*
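To confirm which cuDNN version ended up in /usr/local/cuda, read back the version macros in cudnn.h. Below is a minimal sketch with a hypothetical helper, cudnn_version. It assumes cuDNN 7.x, where CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL are defined in cudnn.h. cuDNN 8 moved these macros to cudnn_version.h, so pass that file there instead.

```shell
#!/usr/bin/env bash
# Sketch: print "MAJOR.MINOR.PATCHLEVEL" from the cuDNN version macros.
# $1 (optional) = header path; defaults to the cudnn.h copied above (cuDNN 7.x).
cudnn_version() {
    local header="${1:-/usr/local/cuda/include/cudnn.h}"
    local name value parts=()
    for name in CUDNN_MAJOR CUDNN_MINOR CUDNN_PATCHLEVEL; do
        # Match lines such as "#define CUDNN_MAJOR 7" and keep the number
        value=$(sed -n "s/^#define[[:space:]][[:space:]]*${name}[[:space:]][[:space:]]*\([0-9][0-9]*\).*/\1/p" "$header" | head -n 1)
        if [ -z "$value" ]; then
            echo "${name} not found in ${header}" >&2
            return 1
        fi
        parts+=("$value")
    done
    # Join the three numbers with "."
    local IFS=.
    echo "${parts[*]}"
}
```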

