Huawei vs. NVIDIA: AI Compute Comparison
Spec / Model | A800 | A100 | Huawei Ascend 910B | H800 | H100 | H200 |
---|---|---|---|---|---|---|
Year | 2022 | 2020 | 2023 | 2022 | 2022 | 2024 |
Manufacturing | 7nm | 7nm | 7+nm | 4nm | 4nm | 4nm |
Architecture | Ampere | Ampere | HUAWEI Da Vinci | Hopper | Hopper | Hopper |
Max Power | 300/400 W | 300/400 W | 400 W | 350/700 W | 350/700 W | 700 W |
GPU Mem | 80 GB HBM2e | 80 GB HBM2e | 64 GB HBM2e | 80 GB HBM3 | 80 GB HBM3 | 141 GB HBM3e |
GPU Mem BW | 1935/2039 GB/s | 1935/2039 GB/s | | 2/3.35 TB/s | 2/3.35 TB/s | 4.8 TB/s |
GPU Interconnect (one-to-one max bw) | PCIe Gen4; NVLink 400 GB/s | PCIe Gen4; NVLink 600 GB/s | HCCS 56 GB/s | PCIe Gen5; NVLink 400 GB/s | PCIe Gen5; NVLink 900 GB/s | PCIe Gen5; NVLink 900 GB/s |
GPU Interconnect (one-to-many total bw) | PCIe Gen4; NVLink 400 GB/s | PCIe Gen4; NVLink 600 GB/s | HCCS 392 GB/s | PCIe Gen5; NVLink 400 GB/s | PCIe Gen5; NVLink 900 GB/s | PCIe Gen5; NVLink 900 GB/s |
FP32 TFLOPS | 19.5 | 19.5 | | 51 / 67* | 51 / 67* | |
TF32 TFLOPS | 156 / 312* | 156 / 312* | | 756 / 989* | 756 / 989* | |
BF16 TFLOPS | 312 / 624* | 312 / 624* | | 1513 / 1979* | 1513 / 1979* | |
FP16 TFLOPS | 312 / 624* | 312 / 624* | 328 | 1513 / 1979* | 1513 / 1979* | |
FP8 TFLOPS | Not supported | Not supported | | 3026 / 3958* | 3026 / 3958* | |
FP6 TFLOPS | Not supported | Not supported | | | | |
FP4 TFLOPS | Not supported | Not supported | | | | |
INT8 TOPS | 624 / 1248* | 624 / 1248* | 648 | 3026 / 3958* | 3026 / 3958* | |

The Huawei Ascend 910B is a high-performance AI processor that Huawei released in 2023; it is positioned against the NVIDIA A100/A800.
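
To put the A100/A800 positioning in numbers, the sketch below divides the Ascend 910B figures from the table by the A100 ones. It is only back-of-the-envelope arithmetic on the listed peak values. The dictionary keys and the `ratios` helper are made up for illustration, the A100 compute figures are the unstarred entries, and the A100 uses its NVLink value for both interconnect fields.

```python
# Figures copied from the comparison table above.
# Assumption: the unstarred Ampere entries are the ones comparable to the 910B.
ascend_910b = {
    "fp16_tflops": 328,
    "int8_tops": 648,
    "mem_gb": 64,
    "link_one_to_one_gb_s": 56,     # HCCS, one-to-one
    "link_one_to_many_gb_s": 392,   # HCCS, one-to-many total
}
a100 = {
    "fp16_tflops": 312,
    "int8_tops": 624,
    "mem_gb": 80,
    "link_one_to_one_gb_s": 600,    # NVLink
    "link_one_to_many_gb_s": 600,   # NVLink
}


def ratios(candidate, baseline):
    """Return candidate/baseline for every key, rounded to two decimals."""
    return {key: round(candidate[key] / baseline[key], 2) for key in baseline}


ratio_910b_vs_a100 = ratios(ascend_910b, a100)

# How many one-to-one HCCS links add up to the one-to-many total.
hccs_links = (ascend_910b["link_one_to_many_gb_s"]
              / ascend_910b["link_one_to_one_gb_s"])

for key, value in ratio_910b_vs_a100.items():
    print(f"{key}: {value}")
print(f"one-to-one HCCS links in the one-to-many total: {hccs_links:g}")
```

On these numbers the 910B is within a few percent of the A100 on FP16 and INT8 peak throughput and has 0.8x the memory capacity, while its interconnect bandwidth is clearly lower. The one-to-many HCCS total equals seven one-to-one links, which would be consistent with each device linking to seven peers; that is an inference from the arithmetic, not something the table states.
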
NVIDIA China-Specific Variants
Spec | HGX H20 | L20 PCIe (Ada Lovelace) | L2 PCIe (Ada Lovelace) |
---|---|---|---|
GPU Architecture | NVIDIA Hopper | NVIDIA Ada Lovelace | NVIDIA Ada Lovelace |
GPU Memory | 96 GB HBM3 | 48 GB GDDR6 w/ ECC | 24 GB GDDR6 w/ ECC |
GPU Memory Bandwidth | 4.0 TB/s | 864 GB/s | 300 GB/s |
INT8 / FP8 Tensor Core* | 296 TFLOPS | 239 TFLOPS | 193 TFLOPS |
BF16 / FP16 Tensor Core* | 148 TFLOPS | 119.5 TFLOPS | 96.5 TFLOPS |
TF32 Tensor Core* | 74 TFLOPS | 59.8 TFLOPS | 48.3 TFLOPS |
FP32 | 44 TFLOPS | 59.8 TFLOPS | 24.1 TFLOPS |
FP64 | 1 TFLOPS | N/A | N/A |
RT Core | N/A | Yes | Yes |
MIG | Up to 7 MIG | N/A | N/A |
L2 Cache | 60 MB | 96 MB | 36 MB |
Media Engine | 7 NVDEC 7 NVJPEG | 3 NVENC (+AV1) 3 NVDEC 4 NVJPEG | 2 NVENC (+AV1) 4 NVDEC 4 NVJPEG |
Power | 400 W | 275 W | TBD |
Form Factor | 8-way HGX | 2-slot FHFL | 1-slot LP |
Interconnect | PCIe Gen5 x16: 128 GB/s, NVLink: 900 GB/s | PCIe Gen4 x16: 64 GB/s | PCIe Gen4 x16: 64 GB/s |
Availability | PS: Nov 2023 MP: Dec 2023 | PS: Nov 2023 MP: Dec 2023 | PS: Dec 2023 MP: Jan 2024 |
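
One way to read the table is to relate peak compute to memory bandwidth. The sketch below divides the BF16/FP16 Tensor Core row by the GPU Memory Bandwidth row to get peak FLOPs per byte of memory traffic. The `china_skus` dictionary and the `flops_per_byte` helper are hypothetical names, and the figures are used exactly as listed (the asterisk in the row label is not defined in the table), so the result is indicative only.

```python
# (BF16/FP16 Tensor Core TFLOPS, memory bandwidth in TB/s), copied from the table.
china_skus = {
    "HGX H20": (148.0, 4.0),
    "L20 PCIe": (119.5, 0.864),
    "L2 PCIe": (96.5, 0.300),
}


def flops_per_byte(tflops, tb_per_s):
    """Peak FLOPs per byte of memory traffic (TFLOPS divided by TB/s)."""
    return tflops / tb_per_s


balance = {
    name: round(flops_per_byte(tflops, bandwidth), 1)
    for name, (tflops, bandwidth) in china_skus.items()
}

for name, value in balance.items():
    print(f"{name}: {value} FLOP/byte")
```

A lower ratio means more memory bandwidth per unit of peak compute. On the listed figures the HGX H20 sits far below the two Ada Lovelace cards.
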
NVIDIA Blackwell Architecture
NVIDIA GB200 Superchip
The GB200 Superchip provides up to 192 GB of HBM3e memory for each Blackwell GPU.
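- Composition: two NVIDIA Blackwell GPUs and one NVIDIA Grace CPU, connected through the NVIDIA NVLink-C2C interconnect
- Power: each Blackwell GPU has a full-configuration TDP of 1200 W, and the whole superchip has a TDP of 2700 W
- Fifth-generation NVIDIA NVLink: provides 1.8 TB/s of GPU-to-GPU interconnect bandwidth
- Memory: each GB200 Superchip carries 864 GB of memory (480 GB LPDDR5x and 384 GB HBM3e)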
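
The memory and power figures in the list above can be cross-checked with simple arithmetic. The sketch below uses only numbers quoted in this section. The constant names are illustrative, and reading the power remainder as the budget for the Grace CPU and everything else on the module is an assumption.

```python
# Numbers quoted in this section for one GB200 Superchip.
GPUS_PER_SUPERCHIP = 2
HBM3E_PER_GPU_GB = 192
LPDDR5X_GB = 480
GPU_TDP_W = 1200
SUPERCHIP_TDP_W = 2700

# HBM3e across both Blackwell GPUs.
hbm_total_gb = GPUS_PER_SUPERCHIP * HBM3E_PER_GPU_GB

# HBM3e plus the LPDDR5x capacity.
memory_total_gb = hbm_total_gb + LPDDR5X_GB

# Power left after both GPUs at full TDP
# (assumption: Grace CPU and the rest of the module).
non_gpu_power_w = SUPERCHIP_TDP_W - GPUS_PER_SUPERCHIP * GPU_TDP_W

print(f"HBM3e total: {hbm_total_gb} GB")
print(f"Memory total: {memory_total_gb} GB")
print(f"Non-GPU power budget: {non_gpu_power_w} W")
```

The totals agree with the list: 384 GB of HBM3e and 864 GB overall, with 300 W of the 2700 W not accounted for by the two GPUs.
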
Compute Metrics
GPU Spec | GB200 Grace Blackwell Superchip |
---|---|
Configuration | 1 Grace CPU : 2 Blackwell GPUs |
FP4 Tensor Core Dense/Sparse | 20 / 40 petaFLOPS |
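
Given the 1:2 configuration, the superchip figure can be split per GPU. The sketch below assumes the FP4 Tensor Core throughput comes entirely from the two Blackwell GPUs and divides evenly between them; the variable names are illustrative.

```python
# FP4 Tensor Core figures for one GB200 Superchip, from the table above.
fp4_dense_pflops = 20
fp4_sparse_pflops = 40
blackwell_gpus = 2  # 1 Grace CPU : 2 Blackwell GPUs

# Assumption: throughput splits evenly across the two GPUs.
per_gpu_dense_pflops = fp4_dense_pflops / blackwell_gpus
per_gpu_sparse_pflops = fp4_sparse_pflops / blackwell_gpus

# Ratio of the sparse figure to the dense figure.
sparse_over_dense = fp4_sparse_pflops / fp4_dense_pflops

print(f"Per-GPU FP4 dense:  {per_gpu_dense_pflops:g} petaFLOPS")
print(f"Per-GPU FP4 sparse: {per_gpu_sparse_pflops:g} petaFLOPS")
print(f"Sparse / dense:     {sparse_over_dense:g}x")
```
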