I. Hardware Specifications
Model | 2080Ti | 3090 | 3090Ti | 4090 | 4090D | 5090 | 5090D |
GPU die | TU102-300A | GA102-300 | GA102-350 | AD102-300 | AD102-250 | GB202-300 | GB202-250 |
Architecture | Turing | Ampere | Ampere | Ada Lovelace | Ada Lovelace | Blackwell | Blackwell |
SM | 68 | 82 | 84 | 128 | 114 | 170 | 170 |
CUDA Cores / SM | 64 | 128 | 128 | 128 | 128 | 128 | 128 |
CUDA Cores / GPU | 4352 | 10496 | 10752 | 16384 | 14592 | 21760 | 21760 |
Tensor Core | 2nd | 3rd | 3rd | 4th | 4th | 5th | 5th |
Tensor Cores / SM | 8 | 4 | 4 | 4 | 4 | 4 | 4 |
Tensor Cores / GPU | 544 | 328 | 336 | 512 | 456 | 680 | 680 |
GPU boost clock (MHz) | 1545 | 1695 | 1860 | 2520 | 2520 | 2407 | 2407 |
Memory | 11 / 22 GB (GDDR6)* | 24 GB (GDDR6X) | 24 GB (GDDR6X) | 24 GB (GDDR6X) | 24 GB (GDDR6X) | 32 GB (GDDR7) | 32 GB (GDDR7) |
Memory bus width (bit) | 352 | 384 | 384 | 384 | 384 | 512 | 512 |
Memory data rate (Gbps) | 14 | 19.5 | 21 | 21 | 21 | 28 | 28 |
Memory bandwidth (GB/s) | 616 | 936.2 | 1008 | 1008 | 1008 | 1792 | 1792 |
L1 cache (KB per SM) | 64 | 128 | 128 | 128 | 128 | 128 | 128 |
L2 cache (MB) | 5.5 | 6 | 6 | 72 | 72 | 96 | 96 |
TGP (W) | 250 | 350 | 450 | 450 | 425 | 575 | 575 |
Process node | TSMC 12nm FFN | Samsung 8N (8nm) | Samsung 8N (8nm) | TSMC 4N (5nm) | TSMC 4N (5nm) | TSMC 4N (5nm) | TSMC 4N (5nm) |
* 22 GB refers to the common aftermarket-modified cards whose memory has been manually upgraded.
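Several rows of the table above can be recomputed from the others: the per-GPU core counts are the SM count times the per-SM count, and the memory bandwidth is bus width (bit) × data rate (Gbps) / 8. The following is a minimal Python sketch of that arithmetic using values copied from the table; the function names `per_gpu` and `memory_bandwidth_gb_s` are illustrative and not part of any library.

```python
def per_gpu(sm_count, units_per_sm):
    """Total units on the GPU = number of SMs * units per SM."""
    return sm_count * units_per_sm


def memory_bandwidth_gb_s(bus_width_bits, data_rate_gbps):
    """Peak bandwidth in GB/s: bus width (bit) * per-pin rate (Gbps) / 8 bits per byte."""
    return bus_width_bits * data_rate_gbps / 8


# name: (SMs, CUDA cores per SM, Tensor Cores per SM, bus width in bit, data rate in Gbps)
# All values are copied from the table above.
CARDS = {
    "2080Ti": (68, 64, 8, 352, 14),
    "3090": (82, 128, 4, 384, 19.5),
    "4090": (128, 128, 4, 384, 21),
    "5090": (170, 128, 4, 512, 28),
}

for name, (sms, cuda_per_sm, tc_per_sm, width, rate) in CARDS.items():
    print(
        f"{name}: {per_gpu(sms, cuda_per_sm)} CUDA cores, "
        f"{per_gpu(sms, tc_per_sm)} Tensor Cores, "
        f"{memory_bandwidth_gb_s(width, rate):.1f} GB/s"
    )
```

For the 3090 the formula gives 936.0 GB/s, marginally below the 936.2 listed in the table; the other columns match exactly.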
II. Compute Throughput
1. CUDA Core Throughput
Floating point: TFLOPS
Integer: TIOPS
Percentages are relative to the 4090 (4090 = 100%).
Model | 2080Ti | 3090 | 3090Ti | 4090 | 4090D | 5090 | 5090D |
FP32 | 13.45 | 35.58 | 40.00 | 82.6 | 73.5 | 104.8 | 104.8 |
FP16 | 26.9 | 35.58 | 40.00 | 82.6 | 73.5 | 104.8 | 104.8 |
FP64 | 0.4202 | 0.556 | 0.625 | 1.29 | 1.149 | 1.64 | 1.64 |
BF16 | NA | 35.58 | 40.00 | 82.6 | 73.5 | 104.8 | 104.8 |
INT32 | 13.45 | 17.79 | 20.00 | 41.3 | 36.8 | 104.8 | 104.8 |
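The FP32 row appears to follow the usual peak-throughput estimate: CUDA cores × 2 FLOPs per clock (one fused multiply-add) × boost clock. The sketch below reproduces it in Python from the hardware table. The function name `fp32_tflops` is illustrative, and the FP64 divisors are an assumption inferred from the ratio between the FP32 and FP64 rows, not taken from vendor documentation.

```python
def fp32_tflops(cuda_cores, boost_mhz):
    """Peak FP32 TFLOPS, assuming one FMA (2 FLOPs) per CUDA core per clock."""
    return cuda_cores * 2 * boost_mhz / 1e6


# name: (CUDA cores, boost clock in MHz), copied from the hardware table
SPECS = {
    "2080Ti": (4352, 1545),
    "3090": (10496, 1695),
    "4090": (16384, 2520),
    "5090": (21760, 2407),
}

# FP64 throughput as a fraction of FP32, inferred from the table rows (assumption)
FP64_DIVISOR = {"2080Ti": 32, "3090": 64, "4090": 64, "5090": 64}

for name, (cores, clock) in SPECS.items():
    fp32 = fp32_tflops(cores, clock)
    fp64 = fp32 / FP64_DIVISOR[name]
    print(f"{name}: FP32 {fp32:.2f} TFLOPS, FP64 {fp64:.3f} TFLOPS")
```

These are theoretical peaks at the listed boost clock; sustained throughput depends on the workload and on the clocks the card actually holds.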
Model | 2080Ti | 3090 | 3090Ti | 4090 | 4090D | 5090 | 5090D |
FP32 | 16.3% | 43.1% | 48.4% | 100% | 89.0% | 126.9% | 126.9% |
FP16 | 32.6% | 43.1% | 48.4% | 100% | 89.0% | 126.9% | 126.9% |
FP64 | 32.6% | 43.1% | 48.4% | 100% | 89.0% | 126.9% | 126.9% |
BF16 | NA | 43.1% | 48.4% | 100% | 89.0% | 126.9% | 126.9% |
INT32 | 32.6% | 43.1% | 48.4% | 100% | 89.0% | 253.7% | 253.7% |
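The percentage rows are each card's throughput divided by the 4090 value for the same data type. A minimal Python sketch of that normalization, using the FP32 row as input; the helper name `relative_to` is illustrative.

```python
# FP32 TFLOPS per card, copied from the CUDA Core table
FP32 = {
    "2080Ti": 13.45,
    "3090": 35.58,
    "3090Ti": 40.00,
    "4090": 82.6,
    "4090D": 73.5,
    "5090": 104.8,
    "5090D": 104.8,
}


def relative_to(baseline, values):
    """Express every entry as a percentage of the baseline entry, rounded to 0.1."""
    base = values[baseline]
    return {name: round(100 * v / base, 1) for name, v in values.items()}


for name, pct in relative_to("4090", FP32).items():
    print(f"{name}: {pct}%")
```

Because the inputs are already rounded, the last digit of a percentage can differ by 0.1 from one computed from the unrounded core counts and clocks.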
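2. Tensor Core Throughput
Floating point: TFLOPS
Integer: TIOPS
Values are listed as dense / sparse; a single value is dense only.
Percentages are relative to the 4090 (4090 = 100%).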
Model | 2080Ti | 3090 | 3090Ti | 4090 | 4090D | 5090 | 5090D* |
FP4 | NA | NA | NA | NA | NA | 1676 / 3352 | NA / 2375 |
FP8 | NA | NA | NA | 660.6 / 1321.2 | 588.4 / 1176.8 | 838 / 1676 | NA / NA |
FP16 | 107.6 | 142 / 284 | 160 / 320 | 330.3 / 660.6 | 294.2 / 588.4 | 419 / 838 | NA / NA |
BF16 | NA | 71 / 142 | 80 / 160 | 165.2 / 330.4 | 147.1 / 294.2 | 209.5 / 419 | NA / NA |
TF32 | NA | 35.6 / 71 | 40 / 80 | 82.6 / 165.2 | 73.5 / 147.1 | 104.8 / 209.5 | NA / NA |
INT8 | 215.2 | 284 / 568 | 320 / 640 | 660.6 / 1321.2 | 588.4 / 1176.8 | 838 / 1676 | NA / NA |
INT4 | 430.3 | 568 / 1136 | 640 / 1280 | 1321.2 / 2642.4 | 1176.8 / 2353.6 | 1676 / 3352 | NA / NA |
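In the 3090 through 5090 columns above, the dense Tensor Core figures are fixed multiples of the card's peak FP32 CUDA Core throughput, and each sparse figure is twice the dense one. The Python sketch below encodes that pattern. The multipliers are read off the table (an observed pattern, not a vendor formula), the 2080Ti column follows different ratios, and the sketch does not model which formats a given architecture actually supports. `DENSE_MULTIPLIER` and `tensor_throughput` are illustrative names.

```python
# Dense Tensor Core throughput as a multiple of peak FP32 CUDA Core throughput,
# read off the 3090 / 4090 / 5090 columns of the table (assumption, not a vendor formula).
DENSE_MULTIPLIER = {
    "TF32": 1,
    "BF16": 2,
    "FP16": 4,
    "FP8": 8,
    "INT8": 8,
    "INT4": 16,
    "FP4": 16,
}


def tensor_throughput(fp32_tflops, fmt):
    """Return (dense, sparse) throughput for a format; sparse is 2x dense."""
    dense = fp32_tflops * DENSE_MULTIPLIER[fmt]
    return dense, 2 * dense


# Unrounded 4090 FP32 peak: 16384 cores * 2 FLOPs * 2520 MHz
fp32_4090 = 16384 * 2 * 2520 / 1e6

for fmt in ("TF32", "BF16", "FP16", "FP8", "INT8", "INT4"):
    dense, sparse = tensor_throughput(fp32_4090, fmt)
    print(f"4090 {fmt}: {dense:.1f} / {sparse:.1f}")
```

Starting from the unrounded FP32 peak rather than the rounded 82.6 keeps the results aligned with the table to one decimal place.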
Model | 2080Ti | 3090 | 3090Ti | 4090 | 4090D | 5090 | 5090D* |
FP4 | NA | NA | NA | NA | NA | NA | NA |
FP8 | NA | NA | NA | 100% | 89.0% | 126.9% | NA |
FP16 | 32.6% | 43.1% | 48.4% | 100% | 89.0% | 126.9% | NA |
BF16 | NA | 43.1% | 48.4% | 100% | 89.0% | 126.9% | NA |
TF32 | NA | 43.1% | 48.4% | 100% | 89.0% | 126.9% | NA |
INT8 | 32.6% | 43.1% | 48.4% | 100% | 89.0% | 126.9% | NA |
INT4 | 32.6% | 43.1% | 48.4% | 100% | 89.0% | 126.9% | NA |
* The Tensor Core figures for the 5090D have yet to be verified.