GPU运算能力对比

计算能力换算

理论峰值 = GPU芯片数量*GPU Boost主频*核心数量*单个时钟周期内能处理的浮点计算次数

只不过在GPU里单精度和双精度的浮点计算能力需要分开计算,以最新的Tesla P100为例:

双精度理论峰值 = FP64 Cores * GPU Boost Clock * 2 = 1792 *1.48GHz*2 = 5.3 TFlops

单精度理论峰值 = FP32 cores * GPU Boost Clock * 2 = 3584 * 1.58GHz * 2 = 10.6 TFlop

信息

# 1080TI
Total amount of global memory:                 11172 MBytes (11715084288 bytes)
  (28) Multiprocessors, (128) CUDA Cores/MP:     3584 CUDA Cores
  GPU Max Clock rate:                            1582 MHz (1.58 GHz)
  Memory Clock rate:                             5505 Mhz
  Memory Bus Width:                              352-bit
  L2 Cache Size:                                 2883584 bytes
# 1080
Total amount of global memory:                 8111 MBytes (8504868864 bytes)
  (20) Multiprocessors, (128) CUDA Cores/MP:     2560 CUDA Cores
  GPU Max Clock rate:                            1734 MHz (1.73 GHz)
  Memory Clock rate:                             5005 Mhz
  Memory Bus Width:                              256-bit
  L2 Cache Size:                                 2097152 bytes

1080TI

~/NVIDIA_CUDA-8.0_Samples/7_CUDALibraries/batchCUBLAS$ export CUDA_VISIBLE_DEVICES=0
~/NVIDIA_CUDA-8.0_Samples/7_CUDALibraries/batchCUBLAS$ ./batchCUBLAS -m1024 -n1024 -k1024

batchCUBLAS Starting...

GPU Device 0: "GeForce GTX 1080 Ti" with compute capability 6.1


 ==== Running single kernels ==== 

Testing sgemm
#### args: ta=0 tb=0 m=1024 n=1024 k=1024  alpha = (0xbf800000, -1) beta= (0x40000000, 2)
#### args: lda=1024 ldb=1024 ldc=1024
^^^^ elapsed = 0.00037980 sec  GFLOPS=5654.24
@@@@ sgemm test OK
Testing dgemm
#### args: ta=0 tb=0 m=1024 n=1024 k=1024  alpha = (0x0000000000000000, 0) beta= (0x0000000000000000, 0)
#### args: lda=1024 ldb=1024 ldc=1024
^^^^ elapsed = 0.00894690 sec  GFLOPS=240.026
@@@@ dgemm test OK

 ==== Running N=10 without streams ==== 

Testing sgemm
#### args: ta=0 tb=0 m=1024 n=1024 k=1024  alpha = (0xbf800000, -1) beta= (0x00000000, 0)
#### args: lda=1024 ldb=1024 ldc=1024
^^^^ elapsed = 0.00294209 sec  GFLOPS=7299.19
@@@@ sgemm test OK
Testing dgemm
#### args: ta=0 tb=0 m=1024 n=1024 k=1024  alpha = (0xbff0000000000000, -1) beta= (0x0000000000000000, 0)
#### args: lda=1024 ldb=1024 ldc=1024
^^^^ elapsed = 0.07993412 sec  GFLOPS=268.657
@@@@ dgemm test OK

 ==== Running N=10 with streams ==== 

Testing sgemm
#### args: ta=0 tb=0 m=1024 n=1024 k=1024  alpha = (0x40000000, 2) beta= (0x40000000, 2)
#### args: lda=1024 ldb=1024 ldc=1024
^^^^ elapsed = 0.00224590 sec  GFLOPS=9561.78
@@@@ sgemm test OK
Testing dgemm
#### args: ta=0 tb=0 m=1024 n=1024 k=1024  alpha = (0xbff0000000000000, -1) beta= (0x0000000000000000, 0)
#### args: lda=1024 ldb=1024 ldc=1024
^^^^ elapsed = 0.05540895 sec  GFLOPS=387.57
@@@@ dgemm test OK

 ==== Running N=10 batched ==== 

Testing sgemm
#### args: ta=0 tb=0 m=1024 n=1024 k=1024  alpha = (0x3f800000, 1) beta= (0xbf800000, -1)
#### args: lda=1024 ldb=1024 ldc=1024
^^^^ elapsed = 0.00197387 sec  GFLOPS=10879.6
@@@@ sgemm test OK
Testing dgemm
#### args: ta=0 tb=0 m=1024 n=1024 k=1024  alpha = (0xbff0000000000000, -1) beta= (0x4000000000000000, 2)
#### args: lda=1024 ldb=1024 ldc=1024
^^^^ elapsed = 0.05372214 sec  GFLOPS=399.739
@@@@ dgemm test OK

Test Summary
0 error(s)

1080


liu@iridescent:~/NVIDIA_CUDA-8.0_Samples/7_CUDALibraries/batchCUBLAS$ export CUDA_VISIBLE_DEVICES=1
liu@iridescent:~/NVIDIA_CUDA-8.0_Samples/7_CUDALibraries/batchCUBLAS$ ./batchCUBLAS -m1024 -n1024 -k1024
batchCUBLAS Starting...

GPU Device 0: "GeForce GTX 1080" with compute capability 6.1


 ==== Running single kernels ==== 

Testing sgemm
#### args: ta=0 tb=0 m=1024 n=1024 k=1024  alpha = (0xbf800000, -1) beta= (0x40000000, 2)
#### args: lda=1024 ldb=1024 ldc=1024
^^^^ elapsed = 0.00060892 sec  GFLOPS=3526.7
@@@@ sgemm test OK
Testing dgemm
#### args: ta=0 tb=0 m=1024 n=1024 k=1024  alpha = (0x0000000000000000, 0) beta= (0x0000000000000000, 0)
#### args: lda=1024 ldb=1024 ldc=1024
^^^^ elapsed = 0.00993085 sec  GFLOPS=216.244
@@@@ dgemm test OK

 ==== Running N=10 without streams ==== 

Testing sgemm
#### args: ta=0 tb=0 m=1024 n=1024 k=1024  alpha = (0xbf800000, -1) beta= (0x00000000, 0)
#### args: lda=1024 ldb=1024 ldc=1024
^^^^ elapsed = 0.00369406 sec  GFLOPS=5813.35
@@@@ sgemm test OK
Testing dgemm
#### args: ta=0 tb=0 m=1024 n=1024 k=1024  alpha = (0xbff0000000000000, -1) beta= (0x0000000000000000, 0)
#### args: lda=1024 ldb=1024 ldc=1024
^^^^ elapsed = 0.09741306 sec  GFLOPS=220.451
@@@@ dgemm test OK

 ==== Running N=10 with streams ==== 

Testing sgemm
#### args: ta=0 tb=0 m=1024 n=1024 k=1024  alpha = (0x40000000, 2) beta= (0x40000000, 2)
#### args: lda=1024 ldb=1024 ldc=1024
^^^^ elapsed = 0.00317717 sec  GFLOPS=6759.12
@@@@ sgemm test OK
Testing dgemm
#### args: ta=0 tb=0 m=1024 n=1024 k=1024  alpha = (0xbff0000000000000, -1) beta= (0x0000000000000000, 0)
#### args: lda=1024 ldb=1024 ldc=1024
^^^^ elapsed = 0.07991505 sec  GFLOPS=268.721
@@@@ dgemm test OK

 ==== Running N=10 batched ==== 

Testing sgemm
#### args: ta=0 tb=0 m=1024 n=1024 k=1024  alpha = (0x3f800000, 1) beta= (0xbf800000, -1)
#### args: lda=1024 ldb=1024 ldc=1024
^^^^ elapsed = 0.00302100 sec  GFLOPS=7108.51
@@@@ sgemm test OK
Testing dgemm
#### args: ta=0 tb=0 m=1024 n=1024 k=1024  alpha = (0xbff0000000000000, -1) beta= (0x4000000000000000, 2)
#### args: lda=1024 ldb=1024 ldc=1024
^^^^ elapsed = 0.07566714 sec  GFLOPS=283.807
@@@@ dgemm test OK

Test Summary
0 error(s)

Jteson


$ ./batchCUBLAS -m1024 -n1024 -k1024
batchCUBLAS Starting...

GPU Device 0: "NVIDIA Tegra X2" with compute capability 6.2


 ==== Running single kernels ====

Testing sgemm
#### args: ta=0 tb=0 m=1024 n=1024 k=1024  alpha = (0xbf800000, -1) beta= (0x40000000, 2)
#### args: lda=1024 ldb=1024 ldc=1024
^^^^ elapsed = 0.00372291 sec  GFLOPS=576.83
@@@@ sgemm test OK
Testing dgemm
#### args: ta=0 tb=0 m=1024 n=1024 k=1024  alpha = (0x0000000000000000, 0) beta= (0x0000000000000000, 0)
#### args: lda=1024 ldb=1024 ldc=1024
^^^^ elapsed = 0.10940003 sec  GFLOPS=19.6296
@@@@ dgemm test OK

 ==== Running N=10 without streams ====

Testing sgemm
#### args: ta=0 tb=0 m=1024 n=1024 k=1024  alpha = (0xbf800000, -1) beta= (0x00000000, 0)
#### args: lda=1024 ldb=1024 ldc=1024
^^^^ elapsed = 0.03462315 sec  GFLOPS=620.245
@@@@ sgemm test OK
Testing dgemm
#### args: ta=0 tb=0 m=1024 n=1024 k=1024  alpha = (0xbff0000000000000, -1) beta= (0x0000000000000000, 0)
#### args: lda=1024 ldb=1024 ldc=1024
^^^^ elapsed = 1.09212208 sec  GFLOPS=19.6634
@@@@ dgemm test OK

 ==== Running N=10 with streams ====

Testing sgemm
#### args: ta=0 tb=0 m=1024 n=1024 k=1024  alpha = (0x40000000, 2) beta= (0x40000000, 2)
#### args: lda=1024 ldb=1024 ldc=1024
^^^^ elapsed = 0.03504515 sec  GFLOPS=612.776
@@@@ sgemm test OK
Testing dgemm
#### args: ta=0 tb=0 m=1024 n=1024 k=1024  alpha = (0xbff0000000000000, -1) beta= (0x0000000000000000, 0)
#### args: lda=1024 ldb=1024 ldc=1024
^^^^ elapsed = 1.09177494 sec  GFLOPS=19.6697
@@@@ dgemm test OK

 ==== Running N=10 batched ====

Testing sgemm
#### args: ta=0 tb=0 m=1024 n=1024 k=1024  alpha = (0x3f800000, 1) beta= (0xbf800000, -1)
#### args: lda=1024 ldb=1024 ldc=1024
^^^^ elapsed = 0.03766394 sec  GFLOPS=570.17
@@@@ sgemm test OK
Testing dgemm
#### args: ta=0 tb=0 m=1024 n=1024 k=1024  alpha = (0xbff0000000000000, -1) beta= (0x4000000000000000, 2)
#### args: lda=1024 ldb=1024 ldc=1024
^^^^ elapsed = 1.09389901 sec  GFLOPS=19.6315
@@@@ dgemm test OK

Test Summary
0 error(s)

对比

      1080ti                          1080                   Jetson Tx2
  GFLOPS=5654.24                 GFLOPS=3526.7             GFLOPS=576.83
  GFLOPS=7299.19                 GFLOPS=5813.35            GFLOPS=620.245

这里写图片描述

发布了84 篇原创文章 · 获赞 130 · 访问量 41万+
展开阅读全文

没有更多推荐了,返回首页

©️2019 CSDN 皮肤主题: 大白 设计师: CSDN官方博客

分享到微信朋友圈

×

扫一扫,手机浏览