CMP334 | 介绍基于 Amazon Inferentia2 的 Amazon EC2 Inf2 实例

本文链接：https://blog.csdn.net/just2gooo/article/details/134617575

CMP334 | 介绍基于 Amazon Inferentia2 的 Amazon EC2 Inf2 实例

关键字: [Amazon Web Services re:Invent 2023, Inferentia, Large Language Models, Distributed Inference, Inferentia 2, Model Parallelism, Tensor Parallelism, Accelerator Memory Bandwidth]

本文字数: 1300, 阅读完需: 6 分钟

视频

导读

介绍新的 Amazon EC2 Inf2 实例，其中包含Amazon Inferentia2，这是由亚马逊云科技构建的第三个 ML 加速器，并针对 ML 推理进行了优化。参加本节课程，首先了解 Inf2 实例如何在云中为客户最苛刻的 1000 b+ 参数深度学习模型提供最低的每次推理成本。深入探讨 Amazon Inferentia2 是如何构建的，以在 Amazon EC2 上部署下一代 100B+ 参数深度学习模型进行推理。

演讲精华

以下是小编为您整理的本次演讲的精华，共1000字，阅读时间大约是5分钟。如果您想进一步了解演讲内容或者观看演讲全文，请观看演讲完整视频或者下面的演讲原文。

乔·西罗什（Joe Sirosh）作为亚马逊EC2团队的产品经理，满怀热情地向观众介绍了全新的亚马逊EC2 M2实例。这款实例搭载了亚马逊云科技研发的最新机器学习加速器芯片Inferentia 2。在评估观众对新Inferentia产品线的了解程度后，乔强调，他最关注的是搭载Inferentia 2的M2实例在机器学习推断工作负载方面的性能和效率的重大进步。

为了让客户分享使用Inferentia的体验，乔邀请了Qualtrics的高级机器学习工程师Samir和Amazon CodeWhisperer的高级软件开发经理Shrini上台。他们将在稍后时刻分享关于如何使用Inferentia芯片提高机器学习应用程序性能及降低成本的内容。

乔简要介绍了会议议程，涵盖了驱动新应用程序和服务创新的人工智能和机器学习方法、亚马逊云科技如何通过如Inferentia等芯片让每个开发者都能访问机器学习、Qualtrics如何利用Inferentia进行自然语言处理、Inferentia 2的架构创新，以及CodeWhisperer如何利用Inferentia对分布式推理的支持来服务大型语言模型等内容。

接着，乔开始谈论人工智能和机器学习创新的快速发展，这些创新正在各个行业中推动新的智能服务和功能。他还列举了诸如Alexa和Amazon Go等使用机器学习来提供神奇体验的例子。他强调了亚马逊云科技的使命，那就是将机器学习交给每一个开发者，而不仅仅是拥有计算机科学博士学位的专家。如同Inferentia这样的芯片，通过提供优化成本和提高效率的高性能推理，有助于实现机器学习民主化。

Next, Joe invited Samir from Qualtrics to introduce how they utilize the first-generation Inferentia chips to accelerate survey analysis in natural language processing. Samir explained that the Inferentia allows them to boost their processing capacity by 2-3 times while reducing costs, enabling them to expand smartly. He believed that the simplicity of Inferentia and its compatibility with frameworks like PyTorch and TensorFlow make it easy to deploy.

Back on stage, Joe delved into the innovative architecture of Inferentia 2, which makes it a more powerful and flexible computing chip. He explained how Inferentia 2 combines a traditional CPU with a highly optimized data path specifically targeted at machine learning workloads, achieving a balance between performance and flexibility. The on-chip vector processor has up to a 10-fold increase in memory bandwidth compared to the CPU, allowing custom operations to run directly on the accelerator.

Joe also highlighted the support for six numerical data types in Inferentia 2, ranging from high-precision floating-point 32 to efficient floating-point 16 and half-precision 16. This allows developers to optimize performance based on throughput, latency, and accuracy. Additionally, M2 instances are the first to support distributed inference via direct high-speed connections among accelerators, enabling models with billions of parameters to be split across multiple chips.

To demonstrate the performance boost, Joe showed benchmark tests showing that M2 instances can achieve 3-times-higher throughput than GPUs on massive 300 billion parameter models. Moreover, M2 can support models with up to 660 billion parameters before hitting the memory limit, while a GPU reaches its limit at just 300 billion parameters.

Joe explained how the Amazon Web Services (亚马逊云科技) Neuron SDK simplifies the deployment of models on M2 by integrating frameworks like PyTorch and TensorFlow. The SDK's compiler, runtime, and diagnostic tools can optimize performance without any need for model conversion.

Finally, Joe introduced Shrini, who would discuss how they utilize Inferentia's distributed inference capabilities to provide lower latency services for large language models, enabling new dialogue AI applications.

在总结中，Joe阐述了Inferentia 2芯片以及M2实例家族的创新发展如何为机器学习推理带来了突破性的价格性能比。通过支持具有数十亿参数的大型模型并简化部署过程，Inferentia 2成为了亚马逊云科技实现将机器学习普及到每位开发者手中这一愿景的重要里程碑。

下面是一些演讲现场的精彩瞬间：

亚马逊云科技近期推出了搭载Inferentia 2的Amazon EC2 M2实例，这是一款具有高性价比的机器学习推理加速器。

借助类似于Inferentia的新型定制硅芯片以及M2实例的分布式功能，亚马逊云科技正努力实现机器学习的普及化。

相较于传统的CPU，Graviton3处理器提供了一个数量级的内存带宽优势，使得自定义操作可以直接在计算单元上高效执行，避免了数据传输瓶颈。

亚马逊云科技的Inferential 2支持六种独特数据类型，并通过灵活的浮点精度优化机器学习工作负载。

Trainium和M2能够为需要跨多个芯片分布处理的大型AI模型提供高度可扩展的训练和推理能力。

M2能够在不引发类似GPU的内存问题的情况下为大型语言模型提供高性能。

此外，Neuron为边缘基础设施和工作负载提供了实时的可视化和监控功能。

总结

亚马逊近期宣布推出了Amazon EC2 M2实例，这款实例搭载了最新的机器学习加速器Inferentia 2，以实现高性能的自然语言处理。这一新型实例是基于现有的Inferentia产品线进行升级，使得大规模语言模型的成本效益和可扩展性部署成为可能。

Inferentia 2的关键创新之处在于其支持多种数字格式、深度嵌入的灵活向量处理器以及环形拓扑结构，这些特性使其能够高效地服务大型模型。这些特点共同提供了高吞吐量和足够的内存容量来服务具有超过100亿参数的模型。

内部早期测试表明，与GPU实例相比，M2的吞吐量提高了3倍以上。这种提高的性能和可扩展性对于过去由于规模过大或成本过高而无法部署的大模型团队来说将产生巨大的影响。总的来说，新的M2实例通过经济实惠且易于使用的云基础设施，极大地推动了亚马逊通过民主化访问强大AI的目标。

演讲原文

想了解更多精彩完整内容吗？立即访问re:Invent 官网中文网站！

2023亚马逊云科技re:Invent全球大会 - 官方网站

点击此处，一键获取亚马逊云科技全球最新产品/服务资讯！

点击此处，一键获取亚马逊云科技中国区最新产品/服务资讯！

即刻注册亚马逊云科技账户，开启云端之旅！

【免费】亚马逊云科技“100 余种核心云服务产品免费试用”

【免费】亚马逊云科技中国区“40 余种核心云服务产品免费试用”

亚马逊云科技是谁？

亚马逊云科技（Amazon Web Services）是全球云计算的开创者和引领者，自 2006 年以来一直以不断创新、技术领先、服务丰富、应用广泛而享誉业界。亚马逊云科技可以支持几乎云上任意工作负载。亚马逊云科技目前提供超过 200 项全功能的服务，涵盖计算、存储、网络、数据库、数据分析、机器人、机器学习与人工智能、物联网、移动、安全、混合云、虚拟现实与增强现实、媒体，以及应用开发、部署与管理等方面；基础设施遍及 31 个地理区域的 99 个可用区，并计划新建 4 个区域和 12 个可用区。全球数百万客户，从初创公司、中小企业，到大型企业和政府机构都信赖亚马逊云科技，通过亚马逊云科技的服务强化其基础设施，提高敏捷性，降低成本，加快创新，提升竞争力，实现业务成长和成功。