Querying GPU, CUDA, cuDNN, and Driver Information

花花少年

Last modified: 2024-11-17 10:07:30

First published: 2021-09-05 10:34:12

Article link: https://blog.csdn.net/m0_37605642/article/details/120101966

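This article explains how to query the GPU model, driver version, CUDA version, cuDNN version, and GPU specifications on Linux (Ubuntu) with command-line tools such as lspci, nvidia-smi, and cat, and points to references for matching CUDA and cuDNN versions.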

I. References

Checking the GPU model on Linux (Ubuntu)

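II. GPU Queries

1. Query the GPU model

Method 1 (recommended)

Query the GPU model:

lspci | grep -i vga

If the local PCI ID database does not know the card, the output shows only a hexadecimal device ID instead of a model name: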

01:00.0 VGA compatible controller: NVIDIA Corporation Device 1f82 (rev a1)

Look up the hexadecimal ID:

Lookup site: http://pci-ids.ucw.cz/mods/PC/10de?action=help?help=pci

Result: Name: TU117 [GeForce GTX 1650]
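The device ID can also be extracted from the lspci line with a small filter, which is handy when scripting the lookup. The following is only a sketch: the helper name extract_device_id is made up for this example, and it assumes the line has the "NVIDIA Corporation Device XXXX" form shown above.

```shell
# Hypothetical helper (not part of pciutils): print the 4-digit hex device ID
# from lspci lines such as "... NVIDIA Corporation Device 1f82 (rev a1)".
extract_device_id() {
    sed -n 's/.*NVIDIA Corporation Device \([0-9a-fA-F]\{4\}\).*/\1/p'
}

# Demo on the exact line quoted in the article.
printf '%s\n' '01:00.0 VGA compatible controller: NVIDIA Corporation Device 1f82 (rev a1)' \
    | extract_device_id

# On a real machine (requires pciutils):
#   lspci | grep -i -E 'vga|3d controller' | extract_device_id
```

Some NVIDIA GPUs are listed as "3D controller" rather than "VGA compatible controller", which is why the commented usage line greps for both. Running lspci -nn prints the numeric [vendor:device] pair directly, and sudo update-pciids refreshes the local ID database so that lspci can print the model name itself.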


Method 2

Query the GPU model:

nvidia-smi
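The banner at the top of the nvidia-smi table also reports the driver version and a CUDA version. A minimal sketch for extracting both is shown below; parse_smi_banner is a hypothetical helper, and the sample banner line is illustrative only (it was not captured on the article's machine).

```shell
# Hypothetical helper: extract "Driver Version" and "CUDA Version" from the
# banner line printed at the top of the nvidia-smi table.
parse_smi_banner() {
    sed -n 's/.*Driver Version: *\([0-9.]*\).*CUDA Version: *\([0-9.]*\).*/driver=\1 cuda_max=\2/p'
}

# Illustrative banner line (sample values, an assumption for this demo).
sample='| NVIDIA-SMI 470.57.02    Driver Version: 470.57.02    CUDA Version: 11.4     |'
printf '%s\n' "$sample" | parse_smi_banner

# On a machine with the NVIDIA driver installed:
#   nvidia-smi | parse_smi_banner
#   nvidia-smi --query-gpu=name,driver_version,memory.total --format=csv,noheader
```

The CUDA version printed by nvidia-smi is the highest version the driver supports, not necessarily the toolkit that is installed. The deviceQuery logs in this article show the same distinction as "CUDA Driver Version / Runtime Version 11.4 / 10.2".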

2. GPU Specs Database

GPU Specs Database

3. Query GPU compute capability

GPU compute capability table

4. Query the driver version

cat /proc/driver/nvidia/version
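If only the version number is needed, for example in a script, it can be filtered out of that file. The sketch below assumes the file contains an "NVRM version" line with a dotted version number; nvrm_version is a hypothetical helper and the sample line is illustrative, not output from the article's machine.

```shell
# Hypothetical helper: print the first dotted version number found on the
# "NVRM version" line of /proc/driver/nvidia/version.
nvrm_version() {
    grep 'NVRM version' | grep -oE '[0-9]+\.[0-9]+(\.[0-9]+)?' | head -n 1
}

# Illustrative sample line (an assumption for this demo).
sample='NVRM version: NVIDIA UNIX x86_64 Kernel Module  470.57.02  Tue Jul 13 16:14:05 UTC 2021'
printf '%s\n' "$sample" | nvrm_version

# On a machine with the NVIDIA kernel module loaded:
#   nvrm_version < /proc/driver/nvidia/version
```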

III. CUDA/cuDNN Queries

1. Query the CUDA version

Method 1

nvcc -V

Output:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Mon_Oct_12_20:09:46_PDT_2020
Cuda compilation tools, release 11.1, V11.1.105
Build cuda_11.1.TC455_06.29190527_0
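The toolkit release is the number after "release" on the fourth line. A small filter for it might look like the sketch below; nvcc_release is a hypothetical helper name, and the demo feeds it the line quoted above.

```shell
# Hypothetical helper: print the toolkit release (e.g. 11.1) from nvcc -V output.
nvcc_release() {
    sed -n 's/.*release \([0-9][0-9.]*\),.*/\1/p'
}

# Demo on the line from the output quoted in the article.
printf '%s\n' 'Cuda compilation tools, release 11.1, V11.1.105' | nvcc_release

# On a machine where nvcc is on the PATH:
#   nvcc -V | nvcc_release
```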

Method 2
Supported by older versions; newer versions may not have a version.txt file.

cat /usr/local/cuda/version.txt
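Because version.txt may be missing, a script can check for it before reading. The sketch below rests on an assumption that is not stated in the article: that newer toolkits place a version.json file in the same directory. The helper name cuda_version_file and the sample file content are also made up for this example.

```shell
# Hypothetical helper: print the CUDA version file found under a toolkit root.
# Assumption: newer toolkits provide version.json where older ones had version.txt.
cuda_version_file() {
    root="${1:-/usr/local/cuda}"
    if [ -f "$root/version.txt" ]; then
        cat "$root/version.txt"
    elif [ -f "$root/version.json" ]; then
        cat "$root/version.json"
    else
        echo "no version file under $root" >&2
        return 1
    fi
}

# Demo against a temporary directory with an illustrative version.txt.
demo_root=$(mktemp -d)
printf 'CUDA Version 10.2.89\n' > "$demo_root/version.txt"
cuda_version_file "$demo_root"
rm -rf "$demo_root"

# On a real machine:
#   cuda_version_file /usr/local/cuda
```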

2. Query the cuDNN version

Checking the cuDNN version on Ubuntu 18.04

Method 1

cat /usr/local/cuda/include/cudnn.h | grep CUDNN_MAJOR -A 2

Method 2

If Method 1 prints nothing, try Method 2.

Step 1: Inspect cudnn.h.

cat /usr/local/cuda/include/cudnn.h | grep cudnn

/*   cudnn : Neural Networks Library
#include "cudnn_version.h"
#include "cudnn_ops_infer.h"
#include "cudnn_ops_train.h"
#include "cudnn_adv_infer.h"
#include "cudnn_adv_train.h"
#include "cudnn_cnn_infer.h"
#include "cudnn_cnn_train.h"
#include "cudnn_backend.h"

Step 2: cudnn.h does not define the cuDNN version itself (it only includes cudnn_version.h), so locate cudnn_version.h.

find / -name cudnn_version.h 2>/dev/null

/home/yoyo/Downloads/cuda/include/cudnn_version.h

Step 3: View the cuDNN version information.

cat /home/yoyo/Downloads/cuda/include/cudnn_version.h

#define CUDNN_MAJOR 8
#define CUDNN_MINOR 0
#define CUDNN_PATCHLEVEL 5

#define CUDNN_VERSION (CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL)

So the cuDNN version is 8.0.5.
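Reading the three defines by eye works, but the same result can be produced in one step. The sketch below uses a hypothetical helper, cudnn_version_from_header, and assumes the header uses plain "#define NAME value" lines as in the excerpt above.

```shell
# Hypothetical helper: assemble MAJOR.MINOR.PATCHLEVEL from cudnn_version.h.
# Reads the file(s) given as arguments, or stdin when none are given.
cudnn_version_from_header() {
    awk '
        $1 == "#define" && $2 == "CUDNN_MAJOR"      { major = $3 }
        $1 == "#define" && $2 == "CUDNN_MINOR"      { minor = $3 }
        $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { patch = $3 }
        END {
            if (major == "" || minor == "" || patch == "") exit 1
            printf "%s.%s.%s\n", major, minor, patch
        }
    ' "$@"
}

# Demo on the defines quoted in the article.
cudnn_version_from_header <<'EOF'
#define CUDNN_MAJOR 8
#define CUDNN_MINOR 0
#define CUDNN_PATCHLEVEL 5
EOF

# The CUDNN_VERSION macro for the same values: 8*1000 + 0*100 + 5
echo $(( 8 * 1000 + 0 * 100 + 5 ))   # prints 8005

# On a real machine, pass the path found in Step 2:
#   cudnn_version_from_header /path/to/cudnn_version.h
```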

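3. Query CUDA compute capability and CUDA core count

cd /usr/local/cuda/samples/1_Utilities/deviceQuery

# Remove previously built files
sudo make clean

# Rebuild; -j8 runs 8 parallel jobs to speed up the build
sudo make -j8

./deviceQuery
# If the last line reads Result = PASS, CUDA is installed correctly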

yoyo@yoyo:/usr/local/cuda/samples/1_Utilities/deviceQuery$ ./deviceQuery 
./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "NVIDIA GeForce GTX 1650"
  CUDA Driver Version / Runtime Version          11.4 / 10.2
  CUDA Capability Major/Minor version number:    7.5
  Total amount of global memory:                 3904 MBytes (4093444096 bytes)
  (14) Multiprocessors, ( 64) CUDA Cores/MP:     896 CUDA Cores
  GPU Max Clock rate:                            1680 MHz (1.68 GHz)
  Memory Clock rate:                             4001 Mhz
  Memory Bus Width:                              128-bit
  L2 Cache Size:                                 1048576 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
  Maximum Layered 1D Texture Size, (num) layers  1D=(32768), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(32768, 32768), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  1024
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 3 copy engine(s)
  Run time limit on kernels:                     Yes
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Device supports Compute Preemption:            Yes
  Supports Cooperative Kernel Launch:            Yes
  Supports MultiDevice Co-op Kernel Launch:      Yes
  Device PCI Domain ID / Bus ID / location ID:   0 / 1 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 11.4, CUDA Runtime Version = 10.2, NumDevs = 1
Result = PASS

The lines that answer this query are:

CUDA Capability Major/Minor version number:    7.5
Total amount of global memory:                 3904 MBytes (4093444096 bytes)
(14) Multiprocessors, ( 64) CUDA Cores/MP:     896 CUDA Cores
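To print only these lines instead of scanning the full report, the deviceQuery output can be piped through grep. This is a minimal sketch; device_query_summary is a hypothetical helper name, and the demo input is copied from the log above.

```shell
# Hypothetical helper: keep only the compute capability, global memory,
# and CUDA core lines from deviceQuery output.
device_query_summary() {
    grep -E 'CUDA Capability Major/Minor|Total amount of global memory|CUDA Cores/MP'
}

# Demo on a few lines copied from the article's log.
printf '%s\n' \
    'Device 0: "NVIDIA GeForce GTX 1650"' \
    '  CUDA Capability Major/Minor version number:    7.5' \
    '  Total amount of global memory:                 3904 MBytes (4093444096 bytes)' \
    '  (14) Multiprocessors, ( 64) CUDA Cores/MP:     896 CUDA Cores' \
    '  Warp size:                                     32' \
    | device_query_summary

# On a real machine, from the deviceQuery directory:
#   ./deviceQuery | device_query_summary
```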

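IV. Testing CUDA

1. Test whether CUDA is installed correctly (runfile installation)

cd /usr/local/cuda/samples/1_Utilities/deviceQuery

# Remove previously built files
sudo make clean

# Rebuild; -j8 runs 8 parallel jobs to speed up the build
sudo make -j8

./deviceQuery
# If the last line reads Result = PASS, CUDA is installed correctly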

First run, with CUDA Runtime Version 10.2:

./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "NVIDIA GeForce RTX 3060"
  CUDA Driver Version / Runtime Version          11.4 / 10.2
  CUDA Capability Major/Minor version number:    8.6
  Total amount of global memory:                 12051 MBytes (12636061696 bytes)
MapSMtoCores for SM 8.6 is undefined.  Default to use 64 Cores/SM
MapSMtoCores for SM 8.6 is undefined.  Default to use 64 Cores/SM
  (28) Multiprocessors, ( 64) CUDA Cores/MP:     1792 CUDA Cores
  GPU Max Clock rate:                            1777 MHz (1.78 GHz)
  Memory Clock rate:                             7501 Mhz
  Memory Bus Width:                              192-bit
  L2 Cache Size:                                 2359296 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
  Maximum Layered 1D Texture Size, (num) layers  1D=(32768), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(32768, 32768), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  1536
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 2 copy engine(s)
  Run time limit on kernels:                     Yes
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Device supports Compute Preemption:            Yes
  Supports Cooperative Kernel Launch:            Yes
  Supports MultiDevice Co-op Kernel Launch:      Yes
  Device PCI Domain ID / Bus ID / location ID:   0 / 10 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 11.4, CUDA Runtime Version = 10.2, NumDevs = 1
Result = PASS

Second run, with CUDA Runtime Version 11.1:

./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "NVIDIA GeForce RTX 3060"
  CUDA Driver Version / Runtime Version          11.4 / 11.1
  CUDA Capability Major/Minor version number:    8.6
  Total amount of global memory:                 12051 MBytes (12636061696 bytes)
  (28) Multiprocessors, (128) CUDA Cores/MP:     3584 CUDA Cores
  GPU Max Clock rate:                            1777 MHz (1.78 GHz)
  Memory Clock rate:                             7501 Mhz
  Memory Bus Width:                              192-bit
  L2 Cache Size:                                 2359296 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
  Maximum Layered 1D Texture Size, (num) layers  1D=(32768), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(32768, 32768), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total shared memory per multiprocessor:        102400 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  1536
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 2 copy engine(s)
  Run time limit on kernels:                     Yes
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Device supports Managed Memory:                Yes
  Device supports Compute Preemption:            Yes
  Supports Cooperative Kernel Launch:            Yes
  Supports MultiDevice Co-op Kernel Launch:      Yes
  Device PCI Domain ID / Bus ID / location ID:   0 / 10 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 11.4, CUDA Runtime Version = 11.1, NumDevs = 1
Result = PASS

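Important note

The RTX 3060 is an Ampere-architecture GPU (compute capability 8.6). CUDA 11.1 and later support it; older toolkits cannot make full use of it. In the first run, deviceQuery was built with CUDA 10.2, which does not recognize SM 8.6 and falls back to 64 cores per SM ("MapSMtoCores for SM 8.6 is undefined"), so it reports half the real core count.

First result:
(28) Multiprocessors, ( 64) CUDA Cores/MP:     1792 CUDA Cores

Second result: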
(28) Multiprocessors, (128) CUDA Cores/MP:     3584 CUDA Cores
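The reported total is simply the number of multiprocessors times the cores per multiprocessor, which makes the two results easy to compare: 28 x 64 = 1792 versus 28 x 128 = 3584. The sketch below recomputes the product from a deviceQuery line; check_core_count is a hypothetical helper and assumes the line format shown above.

```shell
# Hypothetical helper: parse "(SMs) Multiprocessors, (cores) CUDA Cores/MP: total"
# and recompute SMs x cores next to the total that deviceQuery reported.
check_core_count() {
    fields=$(printf '%s\n' "$1" | sed -n \
        's/^[[:space:]]*(\([0-9]*\)) Multiprocessors, ([[:space:]]*\([0-9]*\)) CUDA Cores\/MP:[[:space:]]*\([0-9]*\) CUDA Cores.*/\1 \2 \3/p')
    [ -n "$fields" ] || return 1
    # Split the three numbers into $1 (SMs), $2 (cores per SM), $3 (reported total).
    set -- $fields
    echo "$1 x $2 = $(( $1 * $2 )) (reported: $3)"
}

# Demo on the two lines from the article.
check_core_count '  (28) Multiprocessors, ( 64) CUDA Cores/MP:     1792 CUDA Cores'
check_core_count '  (28) Multiprocessors, (128) CUDA Cores/MP:     3584 CUDA Cores'
```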

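2. Test whether cuDNN is installed correctly (deb installation)

With a deb installation, cuDNN samples are placed in /usr/src/cudnn_samples_v7; build mnistCUDNN to verify the install.

# Copy the cuDNN samples to the home directory
cp -r /usr/src/cudnn_samples_v7 "$HOME"

# Enter the sample directory
cd "$HOME/cudnn_samples_v7/mnistCUDNN/"

# Build mnistCUDNN
sudo make clean
sudo make

# Run mnistCUDNN
# If "Test passed!" appears, cuDNN is installed correctly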
sudo ./mnistCUDNN
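The samples directory name above ends in v7, which matches the cuDNN major version used in that step. On other installs the suffix may differ, so a script can look the directory up instead of hard-coding it. This is a sketch under that assumption; find_cudnn_samples is a hypothetical helper, and the demo uses a temporary directory rather than /usr/src.

```shell
# Hypothetical helper: print the first cudnn_samples_v* directory under a root.
# Assumption: the deb packages name the directory after the cuDNN major version.
find_cudnn_samples() {
    root="${1:-/usr/src}"
    for d in "$root"/cudnn_samples_v*; do
        [ -d "$d" ] && { printf '%s\n' "$d"; return 0; }
    done
    return 1
}

# Demo against a temporary directory that mimics the layout.
demo_root=$(mktemp -d)
mkdir "$demo_root/cudnn_samples_v7"
find_cudnn_samples "$demo_root"
rm -rf "$demo_root"

# On a real machine:
#   samples=$(find_cudnn_samples) && cp -r "$samples" "$HOME"
```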

V. Matching CUDA and cuDNN Versions

cuDNN Support Matrix
cuDNN-Support-Matrix

VI. GPU Quantization Support

Supported hardware
compute-capabilities

Table 4. Supported hardware

CUDA Compute Capability	Example Device	TF32	FP32	FP16	INT8	FP16 Tensor Cores	INT8 Tensor Cores	DLA
8.6	NVIDIA A10	Yes	Yes	Yes	Yes	Yes	Yes	No
8.0	NVIDIA A100/GA100 GPU	Yes	Yes	Yes	Yes	Yes	Yes	No
7.5	Tesla T4	No	Yes	Yes	Yes	Yes	Yes	No
7.2	Jetson AGX Xavier	No	Yes	Yes	Yes	Yes	Yes	Yes
7.0	Tesla V100	No	Yes	Yes	Yes	Yes	No	No
6.2	Jetson TX2	No	Yes	Yes	No	No	No	No
6.1	Tesla P4	No	Yes	No	Yes	No	No	No
6.0	Tesla P100	No	Yes	Yes	No	No	No	No
5.3	Jetson TX1	No	Yes	Yes	No	No	No	No
5.2	Tesla M4	No	Yes	No	No	No	No	No
5.0	Quadro K2200	No	Yes	No	No	No	No	No
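Combined with the compute capability reported by deviceQuery, the table can be turned into a quick lookup. The sketch below embeds only the TF32, FP16, and INT8 columns of Table 4 as copied above; precision_support is a hypothetical helper, and the device name it prints is the table's example device for that capability, not necessarily the card in the machine.

```shell
# Hypothetical helper: look up TF32/FP16/INT8 support by compute capability.
# Rows are "capability|example device|TF32|FP16|INT8", copied from Table 4.
precision_support() {
    awk -F'|' -v cc="$1" '
        $1 == cc { printf "%s: TF32=%s FP16=%s INT8=%s\n", $2, $3, $4, $5; found = 1 }
        END { exit !found }
    ' <<'EOF'
8.6|NVIDIA A10|Yes|Yes|Yes
8.0|NVIDIA A100/GA100 GPU|Yes|Yes|Yes
7.5|Tesla T4|No|Yes|Yes
7.2|Jetson AGX Xavier|No|Yes|Yes
7.0|Tesla V100|No|Yes|Yes
6.2|Jetson TX2|No|Yes|No
6.1|Tesla P4|No|No|Yes
6.0|Tesla P100|No|Yes|No
5.3|Jetson TX1|No|Yes|No
5.2|Tesla M4|No|No|No
5.0|Quadro K2200|No|No|No
EOF
}

# Compute capability 7.5 is what deviceQuery reported for the GTX 1650 above.
precision_support 7.5
# Compute capability 8.6 is what deviceQuery reported for the RTX 3060.
precision_support 8.6
```

The Tensor Core and DLA columns are left out of this sketch, so the full table remains the reference for those features.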