1. Install the NVIDIA Driver
1.1 Identify the GPU model
Install lshw, then query the display adapter information:
yum install -y lshw
lshw -numeric -C display
The output is:
*-display
description: 3D controller
product: GK180GL [Tesla K40c] [10DE:1024]
vendor: NVIDIA Corporation [10DE]
physical id: 0
bus info: pci@0000:02:00.0
logical name: /dev/fb0
version: a1
width: 64 bits
clock: 33MHz
capabilities: pm msi pciexpress bus_master cap_list fb
configuration: depth=32 driver=nouveau latency=0 mode=1024x768 visual=truecolor xres=1024 yres=768
resources: iomemory:383f0-383ef iomemory:383f0-383ef irq:99 memory:d2000000-d2ffffff memory:383fe0000000-383fefffffff memory:383ff0000000-383ff1ffffff
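The two fields that matter here are product (the GPU model, which decides the driver to download) and the driver= entry on the configuration line, which shows that nouveau is currently bound to the card. A minimal awk sketch that extracts both from lshw output on stdin, one line per display device; gpu_summary is a hypothetical helper name:

```shell
# gpu_summary (hypothetical helper): reads `lshw -numeric -C display` output on
# stdin and prints "<product> | driver=<driver>" for each device block.
gpu_summary() {
  awk '
    # Remember the most recent product line, stripped of its label.
    /^[ \t]*product:/ {
      line = $0
      sub(/^[ \t]*product:[ \t]*/, "", line)
      product = line
    }
    # Each device block has one configuration line; report when it appears.
    /^[ \t]*configuration:/ {
      driver = "none"
      if (match($0, /driver=[^ ]+/)) driver = substr($0, RSTART + 7, RLENGTH - 7)
      print product " | driver=" driver
    }
  '
}
# Usage: lshw -numeric -C display | gpu_summary
```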
1.2 Download the driver
Download the matching driver from the NVIDIA website.
URL: https://www.nvidia.com/Download/index.aspx?lang=en-us
My card is reported as [Tesla K40c], so I downloaded the driver NVIDIA-Linux-x86_64-460.91.03.run.
1.3 Disable the built-in nouveau driver
Edit the dist-blacklist.conf file:
vim /lib/modprobe.d/dist-blacklist.conf
## Comment out this line
#blacklist nvidiafb
## Add these lines
blacklist nouveau
options nouveau modeset=0
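The same edit can be scripted so that running it twice does not duplicate lines. A minimal sketch, assuming GNU sed (the default on CentOS 7); ensure_line and blacklist_nouveau are hypothetical helper names, and the target file is passed as an argument rather than hard-coded:

```shell
# ensure_line FILE LINE: append LINE to FILE unless an identical line exists.
ensure_line() {
  local file=$1 line=$2
  touch "$file"
  grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# blacklist_nouveau FILE (hypothetical): apply the changes shown above to FILE.
blacklist_nouveau() {
  local conf="${1:?usage: blacklist_nouveau FILE}"
  touch "$conf"
  # Comment out an active "blacklist nvidiafb" line; no-op if already commented.
  sed -i 's/^blacklist nvidiafb$/#blacklist nvidiafb/' "$conf"
  ensure_line "$conf" 'blacklist nouveau'
  ensure_line "$conf" 'options nouveau modeset=0'
}
# Usage (as root): blacklist_nouveau /lib/modprobe.d/dist-blacklist.conf
```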
Save the file, then run reboot to restart the system.
1.4 Rebuild the initramfs image
mv /boot/initramfs-$(uname -r).img /boot/initramfs-$(uname -r).img.bak
dracut /boot/initramfs-$(uname -r).img $(uname -r)
# Set the default runlevel to text mode
systemctl set-default multi-user.target
After completing the steps above, run reboot to restart the system.
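After this reboot, nouveau should no longer be loaded, i.e. lsmod | grep nouveau should print nothing. A sketch of the same check that reads /proc/modules, where the first field of each line is the module name; module_loaded is a hypothetical helper, and its optional file argument exists only so it can be tested:

```shell
# module_loaded NAME [MODULES_FILE]: succeed if kernel module NAME is listed in
# MODULES_FILE (default /proc/modules; one module per line, name first).
module_loaded() {
  local name=$1 file=${2:-/proc/modules}
  awk -v m="$name" '$1 == m { found = 1 } END { exit !found }' "$file"
}
# Usage:
#   if module_loaded nouveau; then echo "nouveau is still loaded"; fi
```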
1.5 Install the driver
First install the dependencies and make the installer executable:
yum install -y gcc gcc-c++ make kernel-devel kernel-headers
chmod a+x NVIDIA-Linux-x86_64-460.91.03.run
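Run the installer. --kernel-source-path must point at the kernel-devel tree for the running kernel (on my machine this was /usr/src/kernels/3.10.0-1160.42.2.el7.x86_64):
./NVIDIA-Linux-x86_64-460.91.03.run --kernel-source-path=/usr/src/kernels/$(uname -r) -k $(uname -r)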
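If yum installed a kernel-devel version that differs from the running kernel, the driver's kernel module build will typically fail. A pre-flight check, sketched under the assumption that kernel-devel installs into /usr/src/kernels/&lt;version&gt; as on CentOS 7; kernel_src_ok is a hypothetical helper, and its optional root argument exists only for testing:

```shell
# kernel_src_ok [ROOT] [RELEASE]: print ROOT/usr/src/kernels/RELEASE and succeed
# if that directory contains a Makefile. Defaults: ROOT=/, RELEASE=$(uname -r).
kernel_src_ok() {
  local root=${1:-/} release=${2:-$(uname -r)}
  local dir="${root%/}/usr/src/kernels/$release"
  if [ -f "$dir/Makefile" ]; then
    echo "$dir"
  else
    echo "missing kernel source: $dir (try: yum install kernel-devel-$release)" >&2
    return 1
  fi
}
# Usage:
#   src=$(kernel_src_ok) && ./NVIDIA-Linux-x86_64-460.91.03.run --kernel-source-path="$src" -k "$(uname -r)"
```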
1.6 Verify the driver
After installation, check the driver with nvidia-smi. The output is:
Tue Oct 5 22:20:52 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.91.03 Driver Version: 460.91.03 CUDA Version: 11.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Tesla K40c Off | 00000000:02:00.0 Off | 0 |
| 23% 36C P0 66W / 235W | 0MiB / 11441MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 1 Tesla K40c Off | 00000000:03:00.0 Off | 0 |
| 23% 35C P0 66W / 235W | 0MiB / 11441MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 2 Tesla K40c Off | 00000000:83:00.0 Off | 0 |
| 23% 34C P0 64W / 235W | 0MiB / 11441MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 3 Tesla K40c Off | 00000000:84:00.0 Off | 0 |
| 23% 37C P0 68W / 235W | 0MiB / 11441MiB | 39% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
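For scripted checks, nvidia-smi also has a machine-readable query mode (--query-gpu=... --format=csv,noheader). A sketch that counts the GPUs and lists the distinct driver versions in that CSV, read from stdin; gpu_inventory is a hypothetical helper:

```shell
# gpu_inventory: read `nvidia-smi --query-gpu=index,name,driver_version
# --format=csv,noheader` output on stdin; print GPU count and driver version(s).
gpu_inventory() {
  awk -F', ' '
    NF >= 3 { count++; drivers[$3] = 1 }
    END {
      list = ""
      for (d in drivers) list = list (list == "" ? "" : " ") d
      printf "gpus=%d driver=%s\n", count, list
    }
  '
}
# Usage:
#   nvidia-smi --query-gpu=index,name,driver_version --format=csv,noheader | gpu_inventory
```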
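2. Install CUDA
2.1 Download the CUDA Toolkit
CUDA Toolkit archive: https://developer.nvidia.com/cuda-toolkit-archive
Since nvidia-smi reports CUDA Version: 11.2 (the highest CUDA version the installed driver supports), I installed that version.
The download and installation commands are:
## Download the installer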
wget https://developer.download.nvidia.com/compute/cuda/11.2.2/local_installers/cuda_11.2.2_460.32.03_linux.run
2.2 Install the toolkit
## Make the installer executable
chmod a+x cuda_11.2.2_460.32.03_linux.run
## Run the installer (deselect Driver, since driver 460.91.03 is already installed)
sudo sh cuda_11.2.2_460.32.03_linux.run
## Installation summary
===========
= Summary =
===========
Driver: Not Selected
Toolkit: Installed in /usr/local/cuda-11.2/
Samples: Installed in /root/, but missing recommended libraries
Please make sure that
- PATH includes /usr/local/cuda-11.2/bin
- LD_LIBRARY_PATH includes /usr/local/cuda-11.2/lib64, or, add /usr/local/cuda-11.2/lib64 to /etc/ld.so.conf and run ldconfig as root
To uninstall the CUDA Toolkit, run cuda-uninstaller in /usr/local/cuda-11.2/bin
***WARNING: Incomplete installation! This installation did not install the CUDA Driver. A driver of version at least 460.00 is required for CUDA 11.2 functionality to work.
To install the driver using this installer, run the following command, replacing <CudaInstaller> with the name of this run file:
sudo <CudaInstaller>.run --silent --driver
Logfile is /var/log/cuda-installer.log
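The warning says CUDA 11.2 needs a driver of at least 460.00; the 460.91.03 driver installed in step 1 meets that, which is why skipping the bundled driver is fine here. A sketch of that comparison using GNU sort -V (version sort); version_ge is a hypothetical helper:

```shell
# version_ge HAVE NEED: succeed if version HAVE >= NEED, using GNU sort -V ordering.
# The smaller of the two sorts first; if that is NEED, then HAVE >= NEED.
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}
# Usage:
#   drv=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)
#   version_ge "$drv" 460.00 && echo "driver $drv is new enough for CUDA 11.2"
```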
2.3 Configure environment variables
vim ~/.bashrc
## Append the following lines to the end of the file
export CUDA_HOME=/usr/local/cuda-11.2
export PATH=$CUDA_HOME/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=$CUDA_HOME/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
## Apply immediately
source ~/.bashrc
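The ${VAR:+:${VAR}} expansion adds a colon plus the old value only when the variable is already non-empty, so the result never ends in a stray colon (an empty PATH entry is treated as the current directory). A tiny sketch showing the idiom in isolation; prepend_dir is a hypothetical helper:

```shell
# prepend_dir DIR CURRENT: print DIR, followed by ":CURRENT" only if CURRENT is non-empty.
prepend_dir() {
  local dir=$1 current=$2
  printf '%s\n' "$dir${current:+:${current}}"
}
# prepend_dir /usr/local/cuda-11.2/lib64 ""        -> /usr/local/cuda-11.2/lib64
# prepend_dir /usr/local/cuda-11.2/lib64 /opt/lib  -> /usr/local/cuda-11.2/lib64:/opt/lib
```

As the installer summary notes, an alternative to LD_LIBRARY_PATH is adding /usr/local/cuda-11.2/lib64 to /etc/ld.so.conf and running ldconfig as root.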
2.4 Verify the installation
nvcc -V
## Version information
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Sun_Feb_14_21:12:58_PST_2021
Cuda compilation tools, release 11.2, V11.2.152
Build cuda_11.2.r11.2/compiler.29618528_0
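To check in a script that the nvcc on PATH is the expected release, the release X.Y field can be pulled out of nvcc -V. A sketch; nvcc_release is a hypothetical helper that reads the nvcc -V text on stdin:

```shell
# nvcc_release: print the "release X.Y" number from `nvcc -V` output on stdin.
nvcc_release() {
  sed -n 's/.*, release \([0-9][0-9.]*\),.*/\1/p'
}
# Usage:
#   [ "$(nvcc -V | nvcc_release)" = "11.2" ] && echo "CUDA 11.2 toolkit found"
```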
2.5 Test CUDA
Build and run the deviceQuery sample:
cd /usr/local/cuda-11.2/samples/1_Utilities/deviceQuery
sudo make
./deviceQuery
The output is:
./deviceQuery Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
Detected 4 CUDA Capable device(s)
Device 0: "Tesla K40c"
CUDA Driver Version / Runtime Version 11.2 / 11.2
CUDA Capability Major/Minor version number: 3.5
Total amount of global memory: 11441 MBytes (11996954624 bytes)
(15) Multiprocessors, (192) CUDA Cores/MP: 2880 CUDA Cores
GPU Max Clock rate: 745 MHz (0.75 GHz)
Memory Clock rate: 3004 Mhz
Memory Bus Width: 384-bit
...
...
...
Device PCI Domain ID / Bus ID / location ID: 0 / 132 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
> Peer access from Tesla K40c (GPU0) -> Tesla K40c (GPU1) : Yes
> Peer access from Tesla K40c (GPU0) -> Tesla K40c (GPU2) : No
> Peer access from Tesla K40c (GPU0) -> Tesla K40c (GPU3) : No
> Peer access from Tesla K40c (GPU1) -> Tesla K40c (GPU0) : Yes
> Peer access from Tesla K40c (GPU1) -> Tesla K40c (GPU2) : No
> Peer access from Tesla K40c (GPU1) -> Tesla K40c (GPU3) : No
> Peer access from Tesla K40c (GPU2) -> Tesla K40c (GPU0) : No
> Peer access from Tesla K40c (GPU2) -> Tesla K40c (GPU1) : No
> Peer access from Tesla K40c (GPU2) -> Tesla K40c (GPU3) : Yes
> Peer access from Tesla K40c (GPU3) -> Tesla K40c (GPU0) : No
> Peer access from Tesla K40c (GPU3) -> Tesla K40c (GPU1) : No
> Peer access from Tesla K40c (GPU3) -> Tesla K40c (GPU2) : Yes
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 11.2, CUDA Runtime Version = 11.2, NumDevs = 4
Result = PASS
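The Peer access lines show which GPU pairs support direct peer-to-peer access; in this output GPU0/GPU1 and GPU2/GPU3 form two such pairs. A sketch that reduces those lines to the pairs marked Yes, reading deviceQuery output on stdin; peer_pairs is a hypothetical helper:

```shell
# peer_pairs: from deviceQuery output on stdin, print "GPUa -> GPUb" for each
# "Peer access ... : Yes" line.
peer_pairs() {
  awk '/^> Peer access from/ && / : Yes[ \t]*$/ {
    # Collect the "(GPUn)" fields: the first is the source, the second the target.
    n = 0
    for (i = 1; i <= NF; i++)
      if ($i ~ /^\(GPU[0-9]+\)$/) gpu[++n] = substr($i, 2, length($i) - 2)
    if (n == 2) print gpu[1] " -> " gpu[2]
  }'
}
# Usage: ./deviceQuery | peer_pairs
```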
Build and run the bandwidthTest sample:
cd ../bandwidthTest
sudo make
./bandwidthTest
The output is:
[CUDA Bandwidth Test] - Starting...
Running on...
Device 0: Tesla K40c
Quick Mode
Host to Device Bandwidth, 1 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes) Bandwidth(GB/s)
32000000 7.3
Device to Host Bandwidth, 1 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes) Bandwidth(GB/s)
32000000 6.5
Device to Device Bandwidth, 1 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes) Bandwidth(GB/s)
32000000 184.9
Result = PASS
NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.
If both tests end with Result = PASS, CUDA is installed correctly.
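Both checks can be run in one go. A sketch, assuming the samples were built in the directories used above; run_sample is a hypothetical helper that runs a sample, echoes its output, and succeeds only if the command exits with status 0 and prints a line starting with Result = PASS:

```shell
# run_sample CMD [ARGS...]: run CMD, echo its output, and succeed only if it exits 0
# and its output contains a line starting with "Result = PASS".
run_sample() {
  local out status
  out=$("$@" 2>&1); status=$?
  printf '%s\n' "$out"
  [ "$status" -eq 0 ] && printf '%s\n' "$out" | grep -q '^Result = PASS'
}
# Usage (after building both samples as above):
#   cd /usr/local/cuda-11.2/samples/1_Utilities
#   run_sample ./deviceQuery/deviceQuery && run_sample ./bandwidthTest/bandwidthTest && echo "CUDA OK"
```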
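3. Install cuDNN
Official download page: https://developer.nvidia.com/rdp/cudnn-archive#a-collapse810-111
Download the package that matches your CUDA version and OS, then extract it, copy the files into the CUDA directory, and make them readable.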
## Extract
tar -xzvf cudnn-11.2-linux-x64-v8.1.0.77.tgz
## Copy (cuDNN 8 splits its API across several cudnn*.h headers; -P keeps the library symlinks)
cp cuda/include/cudnn*.h /usr/local/cuda-11.2/include/
cp -P cuda/lib64/libcudnn* /usr/local/cuda-11.2/lib64/
## Grant read permission
sudo chmod a+r /usr/local/cuda-11.2/include/cudnn*.h /usr/local/cuda-11.2/lib64/libcudnn*
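To confirm which cuDNN version is installed, read the CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL macros; in cuDNN 8 they are defined in cudnn_version.h (older releases define them in cudnn.h). A sketch; cudnn_version is a hypothetical helper that takes the header path:

```shell
# cudnn_version HEADER: print MAJOR.MINOR.PATCH from the CUDNN_* #defines in HEADER;
# fail if CUDNN_MAJOR is not found.
cudnn_version() {
  awk '
    $1 == "#define" && $2 == "CUDNN_MAJOR"      { major = $3 }
    $1 == "#define" && $2 == "CUDNN_MINOR"      { minor = $3 }
    $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { patch = $3 }
    END {
      if (major == "") exit 1
      print major "." minor "." patch
    }
  ' "$1"
}
# Usage: cudnn_version /usr/local/cuda-11.2/include/cudnn_version.h
```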