CUDA/cuDNN/GPU/Driver Queries

I. Check the GPU model

Reference: Checking the GPU model on Linux (Ubuntu)

Method 1 (recommended)

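  1. List the graphics controller
lspci | grep -i vga

# The device appears as a hexadecimal ID (1f82) rather than a model name
01:00.0 VGA compatible controller: NVIDIA Corporation Device 1f82 (rev a1)
  2. Look up the hexadecimal device ID
# Lookup site (10de is NVIDIA's PCI vendor ID)
http://pci-ids.ucw.cz/mods/PC/10de?action=help?help=pci

# Lookup result for device 1f82
Name: TU107 [GeForce GTX 1650]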
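The lookup above can be scripted. This minimal sketch assumes lspci prints the unresolved form "NVIDIA Corporation Device 1f82", as in the output above. That form usually means the local PCI ID database is older than the card. pci_device_id is a hypothetical helper name. As alternatives, lspci -nn prints the numeric [vendor:device] pair directly, and sudo update-pciids refreshes the local database so lspci can show the model name itself.

```shell
#!/usr/bin/env bash
# Extract the 4-digit hex device ID from an lspci line of the form
# "... NVIDIA Corporation Device 1f82 (rev a1)". Prints nothing if absent.
pci_device_id() {
  printf '%s\n' "$1" | sed -n 's/.*Device \([0-9a-fA-F]\{4\}\).*/\1/p'
}

# Only query real hardware when lspci is installed; keep NVIDIA VGA lines only.
if command -v lspci >/dev/null 2>&1; then
  lspci | grep -i vga | grep -i nvidia | while IFS= read -r line; do
    id=$(pci_device_id "$line")
    if [ -n "$id" ]; then
      # Look up 10de:<id> in the PCI ID database.
      echo "vendor 10de, device $id"
    fi
  done
fi
```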

Method 2

  1. Check the GPU model (shown in the Name column of the table)
nvidia-smi
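In scripts, nvidia-smi can print selected fields instead of the full table. The sketch below uses the --query-gpu and --format=csv,noheader options. It assumes each output line looks like "<name>, <driver_version>". format_gpu_csv is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Turn nvidia-smi CSV lines "<name>, <driver_version>" into "name | driver".
format_gpu_csv() {
  while IFS=, read -r name driver; do
    # Drop the single space nvidia-smi places after each comma.
    printf '%s | %s\n' "$name" "${driver# }"
  done
}

if command -v nvidia-smi >/dev/null 2>&1; then
  nvidia-smi --query-gpu=name,driver_version --format=csv,noheader | format_gpu_csv
fi
```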

Check the driver version

cat /proc/driver/nvidia/version
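To extract only the version number, the sketch below reads the line that starts with NVRM. It assumes that line has the usual form "NVRM version: NVIDIA UNIX x86_64 Kernel Module  <version>  <build date>". The exact wording can vary between driver releases. driver_version is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Print the first dotted version number (e.g. 470.82.00) found on the NVRM
# line of /proc/driver/nvidia/version, read from stdin.
driver_version() {
  grep '^NVRM' | grep -oE '[0-9]+\.[0-9]+(\.[0-9]+)?' | head -n 1
}

if [ -r /proc/driver/nvidia/version ]; then
  driver_version < /proc/driver/nvidia/version
fi
```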

Check the GPU compute capability

GPU compute capability table
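For a quick offline reference, the hypothetical helper below maps a GPU name, as printed by nvidia-smi, to its compute capability. It covers only the cards that appear in this article: the GTX 1650 and RTX 3060 from the deviceQuery outputs, plus the example devices from the table in section VII. Any other name returns failure. For other GPUs, consult the compute capability table.

```shell
#!/usr/bin/env bash
# Map a GPU name to its compute capability. Only GPUs mentioned in this
# article are covered; unknown names print nothing and return status 1.
compute_capability() {
  case "$1" in
    *"GTX 1650"*) echo 7.5 ;;
    *"RTX 3060"*) echo 8.6 ;;
    *A100*)       echo 8.0 ;;
    *A10)         echo 8.6 ;;
    *"Tesla T4")  echo 7.5 ;;
    *V100*)       echo 7.0 ;;
    *P100*)       echo 6.0 ;;
    *"Tesla P4")  echo 6.1 ;;
    *)            return 1 ;;
  esac
}

if command -v nvidia-smi >/dev/null 2>&1; then
  nvidia-smi --query-gpu=name --format=csv,noheader | while IFS= read -r name; do
    echo "$name: $(compute_capability "$name" || echo unknown)"
  done
fi
```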

II. Check the CUDA version

  1. Method 1
nvcc -V
  2. Method 2 (older toolkits; CUDA 11.1 and later ship version.json instead)
cat /usr/local/cuda/version.txt
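When only the version number is needed, for example in a setup script, the "release X.Y" field of nvcc -V can be extracted as sketched below. nvcc_release is a hypothetical helper name. If nvcc is not on PATH, it usually lives at /usr/local/cuda/bin/nvcc.

Note that nvcc reports the installed toolkit (runtime) version. The "CUDA Version" in the nvidia-smi header is instead the newest CUDA version the driver supports. The two can differ, as in the deviceQuery output below (driver 11.4, runtime 10.2).

```shell
#!/usr/bin/env bash
# Print "X.Y" from the "release X.Y, VX.Y.Z" line of `nvcc -V` (stdin).
nvcc_release() {
  sed -nE 's/.*release ([0-9]+\.[0-9]+).*/\1/p'
}

# Fall back to the default toolkit location when nvcc is not on PATH.
NVCC=$(command -v nvcc || echo /usr/local/cuda/bin/nvcc)
if [ -x "$NVCC" ]; then
  "$NVCC" -V | nvcc_release
fi
```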

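Check the CUDA compute capability / number of CUDA cores

cd /usr/local/cuda/samples/1_Utilities/deviceQuery

# Remove files from any previous build
sudo make clean

# Rebuild; -j8 runs 8 parallel make jobs to speed up compilation
sudo make -j8

./deviceQuery
# If the last line reads Result = PASS, CUDA is installed correctly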
yichao@yichao:/usr/local/cuda/samples/1_Utilities/deviceQuery$ ./deviceQuery 
./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "NVIDIA GeForce GTX 1650"
  CUDA Driver Version / Runtime Version          11.4 / 10.2
  CUDA Capability Major/Minor version number:    7.5
  Total amount of global memory:                 3904 MBytes (4093444096 bytes)
  (14) Multiprocessors, ( 64) CUDA Cores/MP:     896 CUDA Cores
  GPU Max Clock rate:                            1680 MHz (1.68 GHz)
  Memory Clock rate:                             4001 Mhz
  Memory Bus Width:                              128-bit
  L2 Cache Size:                                 1048576 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
  Maximum Layered 1D Texture Size, (num) layers  1D=(32768), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(32768, 32768), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  1024
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 3 copy engine(s)
  Run time limit on kernels:                     Yes
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Device supports Compute Preemption:            Yes
  Supports Cooperative Kernel Launch:            Yes
  Supports MultiDevice Co-op Kernel Launch:      Yes
  Device PCI Domain ID / Bus ID / location ID:   0 / 1 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 11.4, CUDA Runtime Version = 10.2, NumDevs = 1
Result = PASS
Key fields: compute capability 7.5, 3904 MB of global memory, and 14 multiprocessors (SMs) × 64 CUDA cores per SM = 896 CUDA cores.
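The CUDA core count is the number of multiprocessors (SMs) times the cores per SM. The cores-per-SM figure depends on the compute capability.

The sketch below hard-codes a subset of that mapping, mirroring the lookup table in the CUDA samples' helper_cuda.h (_ConvertSMVer2Cores). cores_per_sm and cuda_cores are hypothetical helper names. Compute capabilities outside the list return failure rather than a guess.

```shell
#!/usr/bin/env bash
# Cores per SM by compute capability (subset of the samples' lookup table).
cores_per_sm() {
  case "$1" in
    5.0|5.2|5.3|6.1|6.2|8.6) echo 128 ;;
    6.0|7.0|7.2|7.5|8.0)     echo 64 ;;
    *) return 1 ;;  # unknown capability: do not guess
  esac
}

# Total CUDA cores = number of SMs x cores per SM.
cuda_cores() {
  local per_sm
  per_sm=$(cores_per_sm "$2") || return 1
  echo $(( $1 * per_sm ))
}

cuda_cores 14 7.5   # GTX 1650 from the output above: 14 x 64
```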

III. Check the cuDNN version

Reference: Checking the cuDNN version on Ubuntu 18.04

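  1. Method 1
# Works when cudnn.h itself defines the version macros (cuDNN 7 and earlier)
cat /usr/local/cuda/include/cudnn.h | grep CUDNN_MAJOR -A 2
  2. Method 2
# If Method 1 prints nothing, use Method 2
# Step 1: inspect cudnn.h
cat /usr/local/cuda/include/cudnn.h | grep cudnn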

Output:
/*   cudnn : Neural Networks Library
#include "cudnn_version.h"
#include "cudnn_ops_infer.h"
#include "cudnn_ops_train.h"
#include "cudnn_adv_infer.h"
#include "cudnn_adv_train.h"
#include "cudnn_cnn_infer.h"
#include "cudnn_cnn_train.h"
#include "cudnn_backend.h"

# Step 2: cudnn.h does not define the cuDNN version, so locate cudnn_version.h
# Discard "Permission denied" errors (stderr) so only matching paths are printed
find / -name cudnn_version.h 2>/dev/null

Output:
/home/yichao/Downloads/cuda/include/cudnn_version.h

# Step 3: print the cuDNN version macros
cat /home/yichao/Downloads/cuda/include/cudnn_version.h

Output:
#define CUDNN_MAJOR 8
#define CUDNN_MINOR 0
#define CUDNN_PATCHLEVEL 5

#define CUDNN_VERSION (CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL)
The cuDNN version is therefore 8.0.5 (CUDNN_VERSION = 8 × 1000 + 0 × 100 + 5 = 8005).
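The three macros can also be combined into a version string automatically. The sketch below (cudnn_version is a hypothetical helper) reads MAJOR, MINOR and PATCHLEVEL from a header. The candidate paths in the loop are assumptions. Replace them with wherever find located cudnn_version.h on your system. For cuDNN 7 and earlier, pass cudnn.h instead.

```shell
#!/usr/bin/env bash
# Print MAJOR.MINOR.PATCHLEVEL from the #define lines of a cuDNN header.
cudnn_version() {
  awk '
    $1 == "#define" && $2 == "CUDNN_MAJOR"      { major = $3 }
    $1 == "#define" && $2 == "CUDNN_MINOR"      { minor = $3 }
    $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { patch = $3 }
    END { if (major != "") printf "%s.%s.%s\n", major, minor, patch }
  ' "$1"
}

# Assumed candidate locations; adjust to the path reported by find.
for h in /usr/local/cuda/include/cudnn_version.h /usr/include/cudnn_version.h; do
  if [ -r "$h" ]; then
    cudnn_version "$h"
    break
  fi
done
```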

IV. Verify the CUDA installation

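cd /usr/local/cuda/samples/1_Utilities/deviceQuery

# Remove files from any previous build
sudo make clean

# Rebuild; -j8 runs 8 parallel make jobs to speed up compilation
sudo make -j8

./deviceQuery
# If the last line reads Result = PASS, CUDA is installed correctly
First run (samples built with CUDA 10.2):
./deviceQuery Starting...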

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "NVIDIA GeForce RTX 3060"
  CUDA Driver Version / Runtime Version          11.4 / 10.2
  CUDA Capability Major/Minor version number:    8.6
  Total amount of global memory:                 12051 MBytes (12636061696 bytes)
MapSMtoCores for SM 8.6 is undefined.  Default to use 64 Cores/SM
MapSMtoCores for SM 8.6 is undefined.  Default to use 64 Cores/SM
  (28) Multiprocessors, ( 64) CUDA Cores/MP:     1792 CUDA Cores
  GPU Max Clock rate:                            1777 MHz (1.78 GHz)
  Memory Clock rate:                             7501 Mhz
  Memory Bus Width:                              192-bit
  L2 Cache Size:                                 2359296 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
  Maximum Layered 1D Texture Size, (num) layers  1D=(32768), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(32768, 32768), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  1536
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 2 copy engine(s)
  Run time limit on kernels:                     Yes
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Device supports Compute Preemption:            Yes
  Supports Cooperative Kernel Launch:            Yes
  Supports MultiDevice Co-op Kernel Launch:      Yes
  Device PCI Domain ID / Bus ID / location ID:   0 / 10 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 11.4, CUDA Runtime Version = 10.2, NumDevs = 1
Result = PASS

Second run (samples rebuilt with CUDA 11.1):
./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "NVIDIA GeForce RTX 3060"
  CUDA Driver Version / Runtime Version          11.4 / 11.1
  CUDA Capability Major/Minor version number:    8.6
  Total amount of global memory:                 12051 MBytes (12636061696 bytes)
  (28) Multiprocessors, (128) CUDA Cores/MP:     3584 CUDA Cores
  GPU Max Clock rate:                            1777 MHz (1.78 GHz)
  Memory Clock rate:                             7501 Mhz
  Memory Bus Width:                              192-bit
  L2 Cache Size:                                 2359296 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
  Maximum Layered 1D Texture Size, (num) layers  1D=(32768), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(32768, 32768), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total shared memory per multiprocessor:        102400 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  1536
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 2 copy engine(s)
  Run time limit on kernels:                     Yes
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Device supports Managed Memory:                Yes
  Device supports Compute Preemption:            Yes
  Supports Cooperative Kernel Launch:            Yes
  Supports MultiDevice Co-op Kernel Launch:      Yes
  Device PCI Domain ID / Bus ID / location ID:   0 / 10 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 11.4, CUDA Runtime Version = 11.1, NumDevs = 1
Result = PASS

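Important note:

  1. The RTX 3060 uses the Ampere architecture (compute capability 8.6), which CUDA supports from version 11.1 onward. With older CUDA versions the RTX 3060 cannot be used to its full performance. The two deviceQuery runs above show the effect. The samples built with CUDA 10.2 do not recognize SM 8.6 (note the "MapSMtoCores for SM 8.6 is undefined" warning) and fall back to 64 cores/SM. As a result they report only half of the card's 3584 CUDA cores.
First run (CUDA runtime 10.2):
(28) Multiprocessors, ( 64) CUDA Cores/MP:     1792 CUDA Cores

Second run (CUDA runtime 11.1):
(28) Multiprocessors, (128) CUDA Cores/MP:     3584 CUDA Cores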
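The comparison above can be turned into a quick check. This sketch extracts the runtime version from deviceQuery's summary line and compares it with 11.1, the minimum version for the RTX 3060 given in the note above. runtime_version and version_ge are hypothetical helper names, and version_ge only handles X.Y versions.

```shell
#!/usr/bin/env bash
# Extract the "X.Y" runtime version from deviceQuery's summary line (stdin).
runtime_version() {
  sed -nE 's/.*CUDA Runtime Version = ([0-9]+\.[0-9]+).*/\1/p'
}

# Succeed if version $1 >= version $2 (both in X.Y form).
version_ge() {
  local a_major=${1%%.*} a_minor=${1#*.}
  local b_major=${2%%.*} b_minor=${2#*.}
  (( a_major > b_major || (a_major == b_major && a_minor >= b_minor) ))
}

# Example: the summary line from the first RTX 3060 run above.
line='deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 11.4, CUDA Runtime Version = 10.2, NumDevs = 1'
rt=$(printf '%s\n' "$line" | runtime_version)
if version_ge "$rt" 11.1; then
  echo "runtime $rt supports compute capability 8.6"
else
  echo "runtime $rt is older than 11.1; RTX 3060 (8.6) is not fully supported"
fi
```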

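V. Verify the cuDNN installation (deb package install)

When cuDNN is installed from the deb packages, sample programs are placed under /usr/src/cudnn_samples_v7. The directory name follows the cuDNN major version, e.g. cudnn_samples_v8 for cuDNN 8. Build and run the mnistCUDNN sample to verify the installation.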

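# Copy the cuDNN samples to the home directory (no sudo needed from here on)
cp -r /usr/src/cudnn_samples_v7 "$HOME"

# Enter the mnistCUDNN sample directory
cd "$HOME"/cudnn_samples_v7/mnistCUDNN/

# Build mnistCUDNN
make clean
make

# Run mnistCUDNN
# If the output includes "Test passed!", cuDNN is installed correctly
./mnistCUDNN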
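Both verification steps in this article come down to finding a success marker in the program output. The marker is "Result = PASS" for deviceQuery and "Test passed!" for mnistCUDNN. A minimal sketch of that check follows. has_marker is a hypothetical helper name. grep reads the whole output (no -q), so the tested program is not cut off early by a closed pipe.

```shell
#!/usr/bin/env bash
# Succeed if the text on stdin contains the literal success marker $1.
has_marker() {
  grep -F -- "$1" >/dev/null
}

# Usage from the built mnistCUDNN directory (skipped if the binary is absent).
if [ -x ./mnistCUDNN ]; then
  if ./mnistCUDNN | has_marker "Test passed!"; then
    echo "cuDNN check passed"
  else
    echo "cuDNN check failed: no 'Test passed!' in output" >&2
  fi
fi
```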

VI. Matching CUDA and cuDNN versions

See the cuDNN Support Matrix for the CUDA versions each cuDNN release supports.

VII. GPU support for quantization

Reference: Supported hardware (compute capabilities)

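Table 4. Supported hardware
CUDA Compute Capability | Example Device        | TF32 | FP32 | FP16 | INT8 | FP16 Tensor Cores | INT8 Tensor Cores | DLA
8.6                     | NVIDIA A10            | Yes  | Yes  | Yes  | Yes  | Yes               | Yes               | No
8.0                     | NVIDIA A100/GA100 GPU | Yes  | Yes  | Yes  | Yes  | Yes               | Yes               | No
7.5                     | Tesla T4              | No   | Yes  | Yes  | Yes  | Yes               | Yes               | No
7.2                     | Jetson AGX Xavier     | No   | Yes  | Yes  | Yes  | Yes               | Yes               | Yes
7.0                     | Tesla V100            | No   | Yes  | Yes  | Yes  | Yes               | No                | No
6.2                     | Jetson TX2            | No   | Yes  | Yes  | No   | No                | No                | No
6.1                     | Tesla P4              | No   | Yes  | No   | Yes  | No                | No                | No
6.0                     | Tesla P100            | No   | Yes  | Yes  | No   | No                | No                | No
5.3                     | Jetson TX1            | No   | Yes  | Yes  | No   | No                | No                | No
5.2                     | Tesla M4              | No   | Yes  | No   | No   | No                | No                | No
5.0                     | Quadro K2200          | No   | Yes  | No   | No   | No                | No                | No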
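To use the table programmatically, the hypothetical helper below transcribes Table 4 row by row. One use is deciding whether INT8 or FP16 Tensor Core paths are worth trying on a given GPU. Keep in mind that the table is keyed by compute capability and lists only example devices, so individual cards can differ. For instance, GeForce GTX 16-series GPUs such as the GTX 1650 have compute capability 7.5 but no Tensor Cores.

```shell
#!/usr/bin/env bash
# Print the precisions marked "Yes" for a compute capability in Table 4.
# Column order: TF32, FP32, FP16, INT8, FP16 Tensor Cores, INT8 Tensor Cores, DLA.
supported_precisions() {
  local row
  case "$1" in
    8.6|8.0)     row="Yes Yes Yes Yes Yes Yes No" ;;
    7.5)         row="No Yes Yes Yes Yes Yes No" ;;
    7.2)         row="No Yes Yes Yes Yes Yes Yes" ;;
    7.0)         row="No Yes Yes Yes Yes No No" ;;
    6.2|6.0|5.3) row="No Yes Yes No No No No" ;;
    6.1)         row="No Yes No Yes No No No" ;;
    5.2|5.0)     row="No Yes No No No No No" ;;
    *)           return 1 ;;  # not listed in the table
  esac
  local names=(TF32 FP32 FP16 INT8 FP16_TC INT8_TC DLA)
  local flags=() out=() i
  read -r -a flags <<< "$row"
  for i in "${!names[@]}"; do
    if [ "${flags[$i]}" = Yes ]; then
      out+=("${names[$i]}")
    fi
  done
  echo "${out[*]}"
}

supported_precisions 7.5   # Tesla T4 class; check your card for Tensor Cores
```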