A Deep Dive into the NVIDIA A100 Ampere GPU Architecture

This article takes a deep look at NVIDIA's A100 GPU, built on the Ampere architecture, which delivers up to 20x the performance of Volta. It walks through the A100's highlights, including the enhanced Tensor Cores, new features, and HPC acceleration, along with key performance features such as structural sparsity and compute data compression. It also discusses the implementation of Elastic GPU, NVLink, and NVSwitch, as well as the Multi-Instance GPU (MIG) concept.





1. NVIDIA A100 Highlights

[Figure: NVIDIA A100 specs table]



1.1 NVIDIA A100 delivers up to 20x the performance of Volta

[Figure: A100 vs. Volta performance comparison]



1.2 Five new features of the NVIDIA A100

  • World's largest 7nm chip: 54 billion transistors, HBM2 memory
  • 3rd-generation Tensor Cores: faster, more flexible, easier to use, with 20x AI performance via TF32 (see the sketch after this list)
  • New sparsity acceleration: harnesses sparsity in AI models for 2x AI performance
  • New Multi-Instance GPU (MIG): optimal utilization with right-sized GPU slices, up to 7 simultaneous instances per GPU
  • 3rd-generation NVLink and NVSwitch: efficient scaling into a "super GPU" with 2x more bandwidth

[Figure: the five key features of the A100]
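To make the TF32 feature concrete, below is a minimal PyTorch sketch of how third-generation Tensor Cores get used for FP32 matrix math. It assumes a CUDA build of PyTorch running on an Ampere-class GPU such as the A100; the flags and matrix sizes are illustrative, not part of the A100 specification itself.

```python
import torch

# TF32 keeps FP32's 8-bit exponent but uses a 10-bit mantissa, so FP32 matmuls
# and convolutions can run on Ampere Tensor Cores with minimal accuracy impact.
# In PyTorch, these flags opt FP32 matmuls and cuDNN convolutions into TF32.
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.allow_tf32 = True

a = torch.randn(4096, 4096, device="cuda")
b = torch.randn(4096, 4096, device="cuda")
c = a @ b  # on an A100 this matmul is dispatched to third-gen Tensor Cores in TF32
```

No model or kernel changes are needed beyond the flags; that is the "easier to use" part of the third-generation Tensor Core story.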


1.3 AI acceleration: training and inference with BERT-Large

[Figure: BERT-Large training and inference speedups on A100]



1.4 A100 HPC acceleration

HPC application speedups of the A100 GPU compared with the NVIDIA Tesla V100.

[Figure: A100 vs. V100 HPC application speedups]

HPC application details:

  • AMBER based on PME-Cellulose
  • GROMACS with STMV (h-bond)
  • LAMMPS with Atomic Fluid LJ-2.5
  • NAMD with v3.0a1 STMV_NVE
  • Chroma with szscl21_24_128
  • FUN3D with dpw
  • RTM with Isotropic Radius 4 1024^3
  • SPECFEM3D with Cartesian four-material model
  • BerkeleyGW based on Chi Sum


1.5 GA100 architecture diagram

The NVIDIA GA100 GPU is composed of multiple GPU Processing Clusters (GPCs), Texture Processing Clusters (TPCs), Streaming Multiprocessors (SMs), and HBM2 memory controllers.

[Figure: GA100 full-GPU block diagram]

The A100 GPU's architecture is named GA100. A full GA100 implementation includes the following units:

  • 8 GPCs, 8 TPCs/GPC, 2 SMs/TPC, 16 SMs/GPC, 128 SMs per full GPU
  • 64 FP32 CUDA Cores/SM, 8192 FP32 CUDA Cores per full GPU
  • 4 third-generation Tensor Cores/SM, 512 third-generation Tensor Cores per full GPU
  • 6 HBM2 stacks, 12 512-bit memory controllers

The A100 GPU based on the GA100 architecture includes the following units:

  • 7 GPCs, 7 or 8 TPCs/GPC, 2 SMs/TPC, up to 16 SMs/GPC, 108 SMs
  • 64 FP32 CUDA Cores/SM, 6912 FP32 CUDA Cores per GPU
  • 4 third-generation Tensor Cores/SM, 432 third-generation Tensor Cores per GPU
  • 5 HBM2 stacks, 10 512-bit memory controllers

The specific design details of the A100 GPU are as follows:

  • The GA100 architecture has 6 HBM2 stacks in total, with each HBM2 stack paired with two memory-controller blocks. The A100 as actually shipped has 40 GB of memory, with only 5 HBM2 stacks enabled and therefore 10 memory controllers.
  • Compared with the V100, the A100's L2 cache is split into two partitions, providing more than 2x the L2 cache bandwidth of the V100.
  • GA100 has 8 GPCs, each containing 8 TPCs (GPC: GPU Processing Cluster; TPC: Texture Processing Cluster), and each TPC contains 2 SMs, so a full GA100 chip has 8 * 8 * 2 = 128 SMs. The released A100 spec, however, exposes only 108 SMs, so the current A100 is not a full GA100 chip (see the query sketch below).
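As a quick sanity check of the unit counts above, the SM count and memory size of an installed GPU can be read back from the CUDA device properties. The sketch below uses PyTorch and assumes a machine with an A100; the 64-CUDA-cores-per-SM multiplier is taken from the list above, not from the query itself.

```python
import torch

props = torch.cuda.get_device_properties(0)
print(props.name)                        # e.g. "NVIDIA A100-SXM4-40GB"
print(props.multi_processor_count)       # 108 SMs on A100 (a full GA100 would have 128)
print(props.total_memory / 2**30)        # ~40 GiB, served by 5 active HBM2 stacks

# FP32 CUDA cores = SMs x 64 cores per SM (per the unit counts above)
print(props.multi_processor_count * 64)  # 108 * 64 = 6912
```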


1.6 GA100 SM architecture

The new A100 SM significantly increases performance. It builds on features introduced in the Volta and Turing SM architectures and adds many new capabilities and enhancements.

The A100 SM architecture is shown in the figure below. Volta and Turing have eight Tensor Cores per SM, while each A100 SM has four larger third-generation Tensor Cores that together deliver twice the FP16 FMA throughput of a Volta or Turing SM.
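To connect these per-SM figures to the headline throughput numbers, here is a back-of-the-envelope calculation of peak dense FP16 Tensor Core throughput. It assumes two values from NVIDIA's published A100 specifications rather than from this article: 256 FP16/FP32 FMA operations per third-generation Tensor Core per clock, and a 1410 MHz boost clock.

```python
# Back-of-the-envelope peak dense FP16 Tensor Core throughput for A100.
# Assumed inputs: 256 FP16 FMAs per Tensor Core per clock, 1410 MHz boost clock.
sms = 108                    # SMs in the shipping A100
tensor_cores_per_sm = 4      # third-generation Tensor Cores per SM
fmas_per_tc_per_clock = 256  # FP16/FP32 mixed-precision FMAs per Tensor Core per clock
boost_clock_hz = 1.41e9      # 1410 MHz boost clock

# Each FMA counts as 2 floating-point operations (multiply + add).
flops = sms * tensor_cores_per_sm * fmas_per_tc_per_clock * 2 * boost_clock_hz
print(f"{flops / 1e12:.0f} TFLOPS")  # ~312 TFLOPS dense FP16; 2x that with structured sparsity
```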

NVIDIA A100 Tensor Core GPU Architecture: Unprecedented Acceleration at Every Scale

Introduction

The diversity of compute-intensive applications running in modern cloud data centers has driven the explosion of NVIDIA GPU-accelerated cloud computing. Such intensive applications include AI deep learning training and inference, data analytics, scientific computing, genomics, edge video analytics and 5G services, graphics rendering, cloud gaming, and many more. From scaling-up AI training and scientific computing, to scaling-out inference applications, to enabling real-time conversational AI, NVIDIA GPUs provide the necessary horsepower to accelerate numerous complex and unpredictable workloads running in today's cloud data centers.

NVIDIA® GPUs are the leading computational engines powering the AI revolution, providing tremendous speedups for AI training and inference workloads. In addition, NVIDIA GPUs accelerate many types of HPC and data analytics applications and systems, allowing customers to effectively analyze, visualize, and turn data into insights. NVIDIA's accelerated computing platforms are central to many of the world's most important and fastest-growing industries.

HPC has grown beyond supercomputers running computationally-intensive applications such as weather forecasting, oil & gas exploration, and financial modeling. Today, millions of NVIDIA GPUs are accelerating many types of HPC applications running in cloud data centers, servers, systems at the edge, and even deskside workstations, servicing hundreds of industries and scientific domains.

AI networks continue to grow in size, complexity, and diversity, and the usage of AI-based applications and services is rapidly expanding. NVIDIA GPUs accelerate numerous AI systems and applications including: deep learning recommendation systems, autonomous machines (self-driving cars, factory robots, etc.), natural language processing (conversational AI, real-time language translation, etc.), smart city video analytics, software-defined 5G networks (that can deliver AI-based services at the Edge), molecular simulations, drone control, medical image analysis, and more.