NVIDIA A100 TENSOR CORE GPU

Computer Vision Research Institute Column

Author: Edison_G

NVIDIA® GPUs are the primary compute engines driving the AI revolution, providing tremendous acceleration for AI training and inference workloads. In addition, NVIDIA GPUs accelerate many types of HPC and data analytics applications and systems, enabling customers to effectively analyze and visualize data and turn it into insight. NVIDIA's accelerated computing platform is at the core of many of the world's most important and fastest-growing industries.


1. Unprecedented Acceleration at Every Scale

The NVIDIA A100 Tensor Core GPU delivers unprecedented acceleration at every scale for AI, data analytics, and HPC to tackle the world’s toughest computing challenges. As the engine of the NVIDIA data center platform, A100 can efficiently scale up to thousands of GPUs or, using new Multi-Instance GPU (MIG) technology, can be partitioned into seven isolated GPU instances to accelerate workloads of all sizes. A100’s third-generation Tensor Core technology now accelerates more levels of precision for diverse workloads, speeding time to insight as well as time to market.
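As a quick illustration of how those precision levels are typically exercised in practice, here is a minimal PyTorch sketch (an illustrative example, not from the datasheet; it assumes PyTorch 1.10 or newer with CUDA on an Ampere-class GPU such as the A100, and the matrix sizes are arbitrary). Float32 matmuls can be routed through TF32 Tensor Cores via backend flags, while FP16/BF16 Tensor Cores are usually reached through automatic mixed precision:

```python
import torch

# Assumes PyTorch >= 1.10 with CUDA available on an Ampere-class GPU (e.g., A100).

# Allow float32 matmuls/convolutions to execute on TF32 Tensor Cores.
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.allow_tf32 = True

device = torch.device("cuda")

# Arbitrary example sizes; large GEMMs are where Tensor Cores pay off.
a = torch.randn(4096, 4096, device=device)   # still float32 from the user's view
b = torch.randn(4096, 4096, device=device)
c = a @ b                                     # may run on TF32 Tensor Cores

# FP16 Tensor Cores are typically reached through automatic mixed precision:
with torch.autocast(device_type="cuda", dtype=torch.float16):
    d = a @ b                                 # matmul executes in FP16 under autocast

print(c.dtype, d.dtype)   # torch.float32 torch.float16
```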

2. SYSTEM SPECIFICATIONS (PEAK PERFORMANCE)

3. GROUNDBREAKING INNOVATIONS

The NVIDIA A100 Tensor Core GPU is the flagship product of the NVIDIA data center platform for deep learning, HPC, and data analytics. The platform accelerates over 700 HPC applications and every major deep learning framework. It’s available everywhere, from desktops to servers to cloud services, delivering both dramatic performance gains and cost-saving opportunities.

To learn more about the NVIDIA A100 Tensor Core GPU, visit www.nvidia.com/a100

1  BERT pre-training throughput using PyTorch, including (2/3) Phase 1 and (1/3) Phase 2 | Phase 1 Seq Len = 128, Phase 2 Seq Len = 512 | V100: NVIDIA DGX-1™ server with 8x NVIDIA V100 Tensor Core GPUs using FP32 precision | A100: NVIDIA DGX™ A100 server with 8x A100 using TF32 precision.

2  BERT-Large inference | NVIDIA T4 Tensor Core GPU: NVIDIA TensorRT™ (TRT) 7.1, precision = INT8, batch size 256 | V100: TRT 7.1, precision = FP16, batch size 256 | A100 with 7 MIG instances of 1g.5gb: pre-production TRT, batch size 94, precision = INT8 with sparsity.

3  V100 results use a single V100 SXM2; A100 results use a single A100 SXM4. AMBER based on PME-Cellulose, LAMMPS with Atomic Fluid LJ-2.5, FUN3D with dpw, Chroma with szscl21_24_128.

SPECIFICATIONS

                            NVIDIA A100 for HGX            NVIDIA A100 for PCIe
Peak FP64                   9.7 TF                         9.7 TF
Peak FP64 Tensor Core       19.5 TF                        19.5 TF
Peak FP32                   19.5 TF                        19.5 TF
Peak TF32 Tensor Core       156 TF | 312 TF*               156 TF | 312 TF*
Peak BFLOAT16 Tensor Core   312 TF | 624 TF*               312 TF | 624 TF*
Peak FP16 Tensor Core       312 TF | 624 TF*               312 TF | 624 TF*
Peak INT8 Tensor Core       624 TOPS | 1,248 TOPS*         624 TOPS | 1,248 TOPS*
Peak INT4 Tensor Core       1,248 TOPS | 2,496 TOPS*       1,248 TOPS | 2,496 TOPS*
GPU Memory                  40 GB                          40 GB
GPU Memory Bandwidth        1,555 GB/s                     1,555 GB/s
Interconnect                NVIDIA NVLink 600 GB/s**       NVIDIA NVLink 600 GB/s**
                            PCIe Gen4 64 GB/s              PCIe Gen4 64 GB/s
Multi-Instance GPU          Various instance sizes with    Various instance sizes with
                            up to 7 MIGs @ 5 GB            up to 7 MIGs @ 5 GB
Form Factor                 4/8 SXM on NVIDIA HGX™ A100    PCIe
Max TDP Power               400 W                          250 W
Delivered Performance of    100%                           90%
Top Apps

* With sparsity
** SXM GPUs via HGX A100 server boards; PCIe GPUs via NVLink Bridge for up to 2 GPUs
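
To relate the table above to a live system, the short sketch below (again an illustrative example assuming a CUDA-enabled PyTorch build, not part of the datasheet) queries the device properties; on an A100 it should report roughly 40 GB of memory, compute capability 8.0, and 108 SMs.

```python
import torch

# Minimal device check; assumes a CUDA-capable PyTorch build with a visible GPU.
assert torch.cuda.is_available(), "No CUDA device visible"

props = torch.cuda.get_device_properties(0)
print("Name:               ", props.name)                              # e.g. "NVIDIA A100-SXM4-40GB"
print("Total memory (GiB): ", round(props.total_memory / 1024**3, 1))  # ~40 for the 40 GB part
print("Compute capability: ", f"{props.major}.{props.minor}")          # 8.0 on A100
print("Multiprocessors:    ", props.multi_processor_count)             # 108 SMs on A100
```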

We launched the "Computer Vision Association" Knowledge Planet community more than a year ago, and it has been well received by many members. We regularly share hands-on content, and members can ask questions and raise requests at any time; we respond and follow up promptly.

If you would like to join the "Computer Vision Research Institute", please scan the QR code. We will add you to the study group that matches your interests!

The Computer Vision Research Institute focuses on deep learning, with research directions including face detection, face recognition, multi-object detection, object tracking, and image segmentation. Going forward, the institute will continue to share the latest papers, algorithms, and new frameworks. What sets this round of changes apart is our emphasis on "research": we will share the hands-on practice behind each topic, so that everyone experiences real-world scenarios beyond pure theory and builds the habit of coding and thinking actively!

