GPU/CUDA/cuDNN/Driver Queries

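This article shows how to query the GPU model, CUDA version, cuDNN version, and GPU specifications on Linux (Ubuntu) with command-line tools such as lspci, nvidia-smi, and cat. It also points to references for matching CUDA and cuDNN versions.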


I. References

Checking the GPU model on Linux (Ubuntu)

II. GPU queries

1. Query the GPU model

Method 1 (recommended)

Query the GPU model:

lspci | grep -i vga

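If the local PCI ID database does not know the card, the output shows only a hexadecimal device ID (1f82 here) instead of the model name: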

01:00.0 VGA compatible controller: NVIDIA Corporation Device 1f82 (rev a1)

Look up the hexadecimal ID:

Lookup site (10de is NVIDIA's PCI vendor ID): http://pci-ids.ucw.cz/mods/PC/10de?action=help?help=pci

Result: Name: TU107 [GeForce GTX 1650]
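The website lookup can also be done offline against a file in the pci.ids format (on Ubuntu typically /usr/share/misc/pci.ids, which `sudo update-pciids` refreshes). A minimal Python sketch, assuming that format: vendor lines start at column 0, device lines are indented by one tab, and the ID and name are separated by two spaces. The two-line excerpt is hypothetical sample data that reuses the lookup result above; the function names are illustrative.

```python
import re
from typing import Optional

# Hypothetical two-line excerpt in pci.ids format: a vendor line at column 0,
# then device lines indented by one tab; ID and name are separated by two spaces.
PCI_IDS_EXCERPT = "10de  NVIDIA Corporation\n\t1f82  TU107 [GeForce GTX 1650]\n"

def device_id_from_lspci(line: str) -> Optional[str]:
    """Return the 4-digit hex ID from an unresolved lspci line ('... Device 1f82 ...')."""
    match = re.search(r"\bDevice ([0-9a-f]{4})\b", line)
    return match.group(1) if match else None

def lookup_device(pci_ids: str, vendor: str, device: str) -> Optional[str]:
    """Find the name of vendor:device in pci.ids-formatted text."""
    in_vendor = False
    for line in pci_ids.splitlines():
        if not line.strip() or line.startswith("#"):
            continue                      # skip blank lines and comments
        if not line.startswith("\t"):     # vendor (or device class) line
            in_vendor = line.split(None, 1)[0].lower() == vendor.lower()
        elif in_vendor and not line.startswith("\t\t"):  # device line, not a subsystem line
            dev_id, _, name = line.strip().partition("  ")
            if dev_id.lower() == device.lower():
                return name
    return None

dev = device_id_from_lspci("01:00.0 VGA compatible controller: NVIDIA Corporation Device 1f82 (rev a1)")
print(lookup_device(PCI_IDS_EXCERPT, "10de", dev))  # TU107 [GeForce GTX 1650]
```

To use the full local database, pass its contents instead of the excerpt, e.g. `Path("/usr/share/misc/pci.ids").read_text(errors="replace")` (path assumed, see above).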


Method 2

Query the GPU model (shown in the Name column of the table that nvidia-smi prints):

nvidia-smi
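nvidia-smi can also print selected fields in machine-readable form through `--query-gpu` and `--format=csv`. A minimal Python sketch wrapping that; the field selection (index, name, driver_version, memory.total) is only an example, and `nvidia-smi --help-query-gpu` lists the valid names. The function names are hypothetical.

```python
import csv
import io
import subprocess
from typing import Dict, List

# Example field selection; `nvidia-smi --help-query-gpu` lists all valid names
FIELDS = ["index", "name", "driver_version", "memory.total"]

def parse_query_output(text: str) -> List[Dict[str, str]]:
    """Turn `--format=csv,noheader,nounits` output into one dict per GPU."""
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    return [dict(zip(FIELDS, row)) for row in reader if row]

def query_gpus() -> List[Dict[str, str]]:
    """Run nvidia-smi (needs a working NVIDIA driver) and parse its CSV output."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=" + ",".join(FIELDS), "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_query_output(result.stdout)
```

On a machine with the driver loaded, `query_gpus()` returns one dict per GPU; with `nounits`, memory.total is a plain number in MiB.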

2. GPU Specs Database

GPU Specs Database

3. Query the GPU compute capability

GPU compute capability table

4. Query the driver version

cat /proc/driver/nvidia/version
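The driver version sits on the first line of this file, the one starting with `NVRM version:`. A minimal Python sketch that extracts it, assuming the usual `NVRM version: NVIDIA UNIX ... Kernel Module  <version>  <build date>` layout; the regex is an assumption about that layout, not a documented format, and the function names are hypothetical.

```python
import re
from pathlib import Path
from typing import Optional

PROC_PATH = Path("/proc/driver/nvidia/version")

def parse_driver_version(text: str) -> Optional[str]:
    """Return the driver version from the NVRM line, or None if it is not found."""
    # Assumed layout: "NVRM version: NVIDIA UNIX ... Kernel Module [for <arch>]  <version>  <date>"
    match = re.search(r"^NVRM version:.*?Kernel Module(?: for \S+)?\s+(\d+(?:\.\d+)+)",
                      text, re.MULTILINE)
    return match.group(1) if match else None

def read_driver_version(path: Path = PROC_PATH) -> Optional[str]:
    """The file only exists while the NVIDIA kernel module is loaded."""
    return parse_driver_version(path.read_text()) if path.exists() else None
```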

III. CUDA/cuDNN queries

1. Query the CUDA version

Method 1

nvcc -V

Output:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Mon_Oct_12_20:09:46_PDT_2020
Cuda compilation tools, release 11.1, V11.1.105
Build cuda_11.1.TC455_06.29190527_0
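nvcc reports the version of the toolkit it belongs to, which can differ from the "CUDA Version" in the nvidia-smi header (the highest version the driver supports). A minimal Python sketch that extracts the release number from output like the above, assuming nvcc is on PATH (e.g. via /usr/local/cuda/bin); the function names are hypothetical.

```python
import re
import subprocess
from typing import Optional, Tuple

def parse_nvcc_release(text: str) -> Optional[Tuple[int, int]]:
    """Extract (major, minor) from the 'release X.Y' part of `nvcc -V` output."""
    match = re.search(r"release (\d+)\.(\d+)", text)
    return (int(match.group(1)), int(match.group(2))) if match else None

def nvcc_release(nvcc: str = "nvcc") -> Optional[Tuple[int, int]]:
    """Run `nvcc -V` and parse its output; raises if nvcc cannot be run."""
    out = subprocess.run([nvcc, "-V"], capture_output=True, text=True, check=True).stdout
    return parse_nvcc_release(out)
```

For the output above, `parse_nvcc_release` returns `(11, 1)`.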

Method 2
Only works on older releases; newer CUDA toolkits may not ship a version.txt file (they provide version.json instead).

cat /usr/local/cuda/version.txt
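A minimal Python sketch that tries version.txt first and falls back to version.json. The version.txt line format (`CUDA Version 10.2.89`) is the one older toolkits use; the `{"cuda": {"version": ...}}` layout of version.json is an assumption, as are the function names.

```python
import json
import re
from pathlib import Path
from typing import Optional

def parse_version_txt(text: str) -> Optional[str]:
    """Older toolkits: version.txt holds a line like 'CUDA Version 10.2.89'."""
    match = re.search(r"CUDA Version (\S+)", text)
    return match.group(1) if match else None

def parse_version_json(text: str) -> Optional[str]:
    """Newer toolkits: version.json; the {"cuda": {"version": ...}} layout is an assumption."""
    return json.loads(text).get("cuda", {}).get("version")

def cuda_version_from_files(cuda_home: Path = Path("/usr/local/cuda")) -> Optional[str]:
    """Try version.txt first, then version.json; None if neither file exists."""
    txt, js = cuda_home / "version.txt", cuda_home / "version.json"
    if txt.is_file():
        return parse_version_txt(txt.read_text())
    if js.is_file():
        return parse_version_json(js.read_text())
    return None
```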

2. Query the cuDNN version

Checking the cuDNN version on Ubuntu 18.04

Method 1

cat /usr/local/cuda/include/cudnn.h | grep CUDNN_MAJOR -A 2

Method 2

If Method 1 prints nothing (newer cudnn.h files no longer contain the version macros), try Method 2.

Step 1: inspect cudnn.h.

cat /usr/local/cuda/include/cudnn.h | grep cudnn
/*   cudnn : Neural Networks Library
#include "cudnn_version.h"
#include "cudnn_ops_infer.h"
#include "cudnn_ops_train.h"
#include "cudnn_adv_infer.h"
#include "cudnn_adv_train.h"
#include "cudnn_cnn_infer.h"
#include "cudnn_cnn_train.h"
#include "cudnn_backend.h"

Step 2: cudnn.h itself does not define the cuDNN version; it includes cudnn_version.h, so locate that file.

find / -name cudnn_version.h 2>/dev/null
/home/yoyo/Downloads/cuda/include/cudnn_version.h

Step 3: read the cuDNN version information.

cat /home/yoyo/Downloads/cuda/include/cudnn_version.h
#define CUDNN_MAJOR 8
#define CUDNN_MINOR 0
#define CUDNN_PATCHLEVEL 5

#define CUDNN_VERSION (CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL)

So the cuDNN version is 8.0.5.
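The same reading can be scripted. A minimal Python sketch that pulls the three `#define` values out of the header text and recomputes `CUDNN_VERSION` with the formula shown in the header above (other cuDNN releases may use a different formula); the function names are hypothetical.

```python
import re
from typing import Dict, Optional, Tuple

def parse_cudnn_version(header_text: str) -> Optional[Tuple[int, int, int]]:
    """Read CUDNN_MAJOR/MINOR/PATCHLEVEL from cudnn_version.h (or an older cudnn.h)."""
    values: Dict[str, int] = {}
    for name in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        match = re.search(rf"^#define\s+{name}\s+(\d+)", header_text, re.MULTILINE)
        if not match:
            return None  # macro missing: probably the wrong header
        values[name] = int(match.group(1))
    return values["CUDNN_MAJOR"], values["CUDNN_MINOR"], values["CUDNN_PATCHLEVEL"]

def cudnn_version_number(major: int, minor: int, patch: int) -> int:
    """Mirrors the CUDNN_VERSION macro in the header above."""
    return major * 1000 + minor * 100 + patch
```

For the header above, `parse_cudnn_version(Path("/home/yoyo/Downloads/cuda/include/cudnn_version.h").read_text())` (with `from pathlib import Path`) returns `(8, 0, 5)`, and `cudnn_version_number(8, 0, 5)` gives 8005.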

3. Query the compute capability and the number of CUDA cores

cd /usr/local/cuda/samples/1_Utilities/deviceQuery

# Remove previous build output
sudo make clean

# Rebuild; -j8 runs 8 parallel build jobs to speed things up
sudo make -j8

./deviceQuery
# If the last line reads "Result = PASS", CUDA is installed correctly
yoyo@yoyo:/usr/local/cuda/samples/1_Utilities/deviceQuery$ ./deviceQuery 
./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "NVIDIA GeForce GTX 1650"
  CUDA Driver Version / Runtime Version          11.4 / 10.2
  CUDA Capability Major/Minor version number:    7.5
  Total amount of global memory:                 3904 MBytes (4093444096 bytes)
  (14) Multiprocessors, ( 64) CUDA Cores/MP:     896 CUDA Cores
  GPU Max Clock rate:                            1680 MHz (1.68 GHz)
  Memory Clock rate:                             4001 Mhz
  Memory Bus Width:                              128-bit
  L2 Cache Size:                                 1048576 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
  Maximum Layered 1D Texture Size, (num) layers  1D=(32768), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(32768, 32768), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  1024
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 3 copy engine(s)
  Run time limit on kernels:                     Yes
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Device supports Compute Preemption:            Yes
  Supports Cooperative Kernel Launch:            Yes
  Supports MultiDevice Co-op Kernel Launch:      Yes
  Device PCI Domain ID / Bus ID / location ID:   0 / 1 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 11.4, CUDA Runtime Version = 10.2, NumDevs = 1
Result = PASS
Key lines from the output above:
CUDA Capability Major/Minor version number:    7.5
Total amount of global memory:                 3904 MBytes (4093444096 bytes)
(14) Multiprocessors, ( 64) CUDA Cores/MP:     896 CUDA Cores
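To read these values from a script instead of by eye, the key lines can be matched with regular expressions. A minimal Python sketch, assuming the deviceQuery layout shown above and a single device (other sample versions may format lines differently). The driver/runtime check encodes the usual rule that the driver's CUDA version should be at least the runtime's; treat it as a rule of thumb. All names are hypothetical.

```python
import re
from typing import Any, Dict, Tuple

def _version(text: str) -> Tuple[int, int]:
    major, minor = text.split(".")
    return int(major), int(minor)

def parse_device_query(text: str) -> Dict[str, Any]:
    """Extract a few key fields from deviceQuery output (assumes a single device)."""
    info: Dict[str, Any] = {}
    m = re.search(r"CUDA Driver Version / Runtime Version\s+(\d+\.\d+) / (\d+\.\d+)", text)
    if m:
        info["driver_cuda"], info["runtime_cuda"] = _version(m.group(1)), _version(m.group(2))
    m = re.search(r"CUDA Capability Major/Minor version number:\s+(\d+\.\d+)", text)
    if m:
        info["compute_capability"] = _version(m.group(1))
    m = re.search(r"Total amount of global memory:\s+(\d+) MBytes", text)
    if m:
        info["global_mem_mb"] = int(m.group(1))
    m = re.search(r"\(\s*(\d+)\) Multiprocessors, \(\s*(\d+)\) CUDA Cores/MP:\s+(\d+) CUDA Cores", text)
    if m:
        # deviceQuery reports the total as SM count x cores per SM
        info["sm_count"], info["cores_per_sm"], info["cuda_cores"] = map(int, m.groups())
    return info

def driver_supports_runtime(info: Dict[str, Any]) -> bool:
    """Rule of thumb: the driver's CUDA version should be >= the runtime version."""
    return info["driver_cuda"] >= info["runtime_cuda"]
```

For the GTX 1650 output above, this yields compute capability (7, 5), 14 SMs x 64 cores = 896 cores, and driver 11.4 >= runtime 10.2.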

IV. Testing CUDA

1. Test whether CUDA is installed correctly (runfile installation)

cd /usr/local/cuda/samples/1_Utilities/deviceQuery

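# Remove previous build output
sudo make clean

# Rebuild; -j8 runs 8 parallel build jobs to speed things up
sudo make -j8

./deviceQuery
# If the last line reads "Result = PASS", CUDA is installed correctly

First run (CUDA runtime 10.2):

./deviceQuery Starting...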

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "NVIDIA GeForce RTX 3060"
  CUDA Driver Version / Runtime Version          11.4 / 10.2
  CUDA Capability Major/Minor version number:    8.6
  Total amount of global memory:                 12051 MBytes (12636061696 bytes)
MapSMtoCores for SM 8.6 is undefined.  Default to use 64 Cores/SM
MapSMtoCores for SM 8.6 is undefined.  Default to use 64 Cores/SM
  (28) Multiprocessors, ( 64) CUDA Cores/MP:     1792 CUDA Cores
  GPU Max Clock rate:                            1777 MHz (1.78 GHz)
  Memory Clock rate:                             7501 Mhz
  Memory Bus Width:                              192-bit
  L2 Cache Size:                                 2359296 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
  Maximum Layered 1D Texture Size, (num) layers  1D=(32768), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(32768, 32768), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  1536
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 2 copy engine(s)
  Run time limit on kernels:                     Yes
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Device supports Compute Preemption:            Yes
  Supports Cooperative Kernel Launch:            Yes
  Supports MultiDevice Co-op Kernel Launch:      Yes
  Device PCI Domain ID / Bus ID / location ID:   0 / 10 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 11.4, CUDA Runtime Version = 10.2, NumDevs = 1
Result = PASS
Second run (CUDA runtime 11.1):

./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "NVIDIA GeForce RTX 3060"
  CUDA Driver Version / Runtime Version          11.4 / 11.1
  CUDA Capability Major/Minor version number:    8.6
  Total amount of global memory:                 12051 MBytes (12636061696 bytes)
  (28) Multiprocessors, (128) CUDA Cores/MP:     3584 CUDA Cores
  GPU Max Clock rate:                            1777 MHz (1.78 GHz)
  Memory Clock rate:                             7501 Mhz
  Memory Bus Width:                              192-bit
  L2 Cache Size:                                 2359296 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
  Maximum Layered 1D Texture Size, (num) layers  1D=(32768), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(32768, 32768), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total shared memory per multiprocessor:        102400 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  1536
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 2 copy engine(s)
  Run time limit on kernels:                     Yes
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Device supports Managed Memory:                Yes
  Device supports Compute Preemption:            Yes
  Supports Cooperative Kernel Launch:            Yes
  Supports MultiDevice Co-op Kernel Launch:      Yes
  Device PCI Domain ID / Bus ID / location ID:   0 / 10 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 11.4, CUDA Runtime Version = 11.1, NumDevs = 1
Result = PASS

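Important note

The RTX 3060 is an Ampere GPU (compute capability 8.6), which CUDA 11.1 and later support; with a CUDA version older than 11.1, the RTX 3060 cannot be used to its full potential. This also shows in deviceQuery: the 10.2 build has no cores-per-SM entry for SM 8.6, prints "MapSMtoCores for SM 8.6 is undefined. Default to use 64 Cores/SM", and therefore reports only half of the card's CUDA cores.

First run (runtime 10.2):
(28) Multiprocessors, ( 64) CUDA Cores/MP:     1792 CUDA Cores

Second run (runtime 11.1):
(28) Multiprocessors, (128) CUDA Cores/MP:     3584 CUDA Cores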
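The difference between the two runs can be reproduced with a small sketch: the core count is the SM count times a per-architecture cores-per-SM value looked up by compute capability, with 64 used when the table has no entry. The table below is a partial, hand-written one (not the samples' source code), and `OLD_TABLE` is a hypothetical stand-in for a helper that predates SM 8.6.

```python
from typing import Dict, Tuple

# Partial, hand-written table: CUDA cores per SM by compute capability
CORES_PER_SM: Dict[Tuple[int, int], int] = {
    (6, 0): 64,   # Pascal GP100
    (6, 1): 128,  # Pascal GP10x
    (7, 0): 64,   # Volta
    (7, 5): 64,   # Turing, e.g. GTX 1650
    (8, 0): 64,   # Ampere GA100
    (8, 6): 128,  # Ampere GA10x, e.g. RTX 3060
}
DEFAULT_CORES_PER_SM = 64  # fallback, as in "Default to use 64 Cores/SM"

def cuda_cores(cc: Tuple[int, int], sm_count: int,
               table: Dict[Tuple[int, int], int] = CORES_PER_SM) -> int:
    """SM count x cores per SM; unknown compute capabilities fall back to 64 cores/SM."""
    return sm_count * table.get(cc, DEFAULT_CORES_PER_SM)

# Hypothetical older table without an SM 8.6 entry, like the 10.2 samples used above
OLD_TABLE = {cc: n for cc, n in CORES_PER_SM.items() if cc != (8, 6)}

print(cuda_cores((7, 5), 14))             # 896, GTX 1650
print(cuda_cores((8, 6), 28, OLD_TABLE))  # 1792, first RTX 3060 run
print(cuda_cores((8, 6), 28))             # 3584, second RTX 3060 run
```

The hardware is the same in both runs; only the lookup used to compute the reported number changes.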

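2. Test whether cuDNN is installed correctly (deb installation)

With the deb installation, cuDNN samples are placed under /usr/src/cudnn_samples_v7 (the suffix follows the cuDNN major version, e.g. cudnn_samples_v8 for cuDNN 8). Build mnistCUDNN to verify the installation.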

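# Copy the cuDNN samples to the home directory
cp -r /usr/src/cudnn_samples_v7 $HOME

# Enter the mnistCUDNN sample directory
cd $HOME/cudnn_samples_v7/mnistCUDNN/

# Build mnistCUDNN (the copy is owned by the current user, so sudo is not needed)
make clean
make

# Run mnistCUDNN
# If it prints "Test passed!", cuDNN is installed correctly
./mnistCUDNN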
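Both checks in this section come down to finding a success marker in the program output ("Result = PASS" for deviceQuery, "Test passed!" for mnistCUDNN). A minimal Python sketch that runs an already-built sample from its own directory, as the shell steps above do, and looks for the matching marker; the names are hypothetical.

```python
import subprocess
from pathlib import Path

# Success markers used in this section for each sample binary
MARKERS = {"deviceQuery": "Result = PASS", "mnistCUDNN": "Test passed!"}

def output_passed(output: str, marker: str) -> bool:
    """True if the sample's output contains its success marker."""
    return marker in output

def run_sample(binary: Path) -> bool:
    """Run an already-built sample from its own directory and check stdout for its marker."""
    binary = binary.resolve()  # absolute path, so changing cwd does not break the lookup
    result = subprocess.run([str(binary)], cwd=binary.parent, capture_output=True, text=True)
    return output_passed(result.stdout, MARKERS[binary.name])
```

For example, `run_sample(Path("~/cudnn_samples_v7/mnistCUDNN/mnistCUDNN").expanduser())` returns True when the output contains "Test passed!".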

V. Matching CUDA and cuDNN versions

cuDNN Support Matrix
cuDNN-Support-Matrix

VI. GPU quantization support

Supported hardware
compute-capabilities

Table 4. Supported hardware
CUDA Compute Capability  Example Device         TF32  FP32  FP16  INT8  FP16 Tensor Cores  INT8 Tensor Cores  DLA
8.6                      NVIDIA A10             Yes   Yes   Yes   Yes   Yes                Yes                No
8.0                      NVIDIA A100/GA100 GPU  Yes   Yes   Yes   Yes   Yes                Yes                No
7.5                      Tesla T4               No    Yes   Yes   Yes   Yes                Yes                No
7.2                      Jetson AGX Xavier      No    Yes   Yes   Yes   Yes                Yes                Yes
7.0                      Tesla V100             No    Yes   Yes   Yes   Yes                No                 No
6.2                      Jetson TX2             No    Yes   Yes   No    No                 No                 No
6.1                      Tesla P4               No    Yes   No    Yes   No                 No                 No
6.0                      Tesla P100             No    Yes   Yes   No    No                 No                 No
5.3                      Jetson TX1             No    Yes   Yes   No    No                 No                 No
5.2                      Tesla M4               No    Yes   No    No    No                 No                 No
5.0                      Quadro K2200           No    Yes   No    No    No                 No                 No
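Because the table is keyed by compute capability (which deviceQuery reports), checking a precision can be sketched as a dictionary lookup. The rows below are transcribed from Table 4; the function names are hypothetical. The table describes the listed example devices: another GPU with the same compute capability is not guaranteed to have the same units (the GeForce GTX 1650 above has compute capability 7.5 like the Tesla T4, but no Tensor Cores).

```python
from typing import List

COLUMNS = ("TF32", "FP32", "FP16", "INT8",
           "FP16 Tensor Cores", "INT8 Tensor Cores", "DLA")

# Table 4 transcribed: compute capability -> (example device, Y/N flags in COLUMNS order)
ROWS = {
    "8.6": ("NVIDIA A10",            "YYYYYYN"),
    "8.0": ("NVIDIA A100/GA100 GPU", "YYYYYYN"),
    "7.5": ("Tesla T4",              "NYYYYYN"),
    "7.2": ("Jetson AGX Xavier",     "NYYYYYY"),
    "7.0": ("Tesla V100",            "NYYYYNN"),
    "6.2": ("Jetson TX2",            "NYYNNNN"),
    "6.1": ("Tesla P4",              "NYNYNNN"),
    "6.0": ("Tesla P100",            "NYYNNNN"),
    "5.3": ("Jetson TX1",            "NYYNNNN"),
    "5.2": ("Tesla M4",              "NYNNNNN"),
    "5.0": ("Quadro K2200",          "NYNNNNN"),
}

def supports(cc: str, feature: str) -> bool:
    """Look up one cell; raises KeyError/ValueError for an unknown row or column."""
    return ROWS[cc][1][COLUMNS.index(feature)] == "Y"

def supported_features(cc: str) -> List[str]:
    """All columns marked Yes for a compute capability."""
    return [col for col, flag in zip(COLUMNS, ROWS[cc][1]) if flag == "Y"]

print(supported_features("7.0"))  # ['FP32', 'FP16', 'INT8', 'FP16 Tensor Cores']
```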