The GPUs for the Zion, ZionEX, and now Grand Teton servers all make use of the OCP Accelerator Module (OAM) form factor, created by Facebook and Microsoft in 2019. The prior GPU-accelerated AI machines – which include Big Sur from 2016, Big Basin from 2017, and Big Basin 2 from 2018 – all used PCI-Express GPU accelerators and did not make use of the Nvidia custom SXM sockets with their NVLink networking that Nvidia reserves for its highest performing systems.
ZionEX
A more detailed breakdown of the ZionEX interconnect:
- OAMs (GPU/ASIC) are interconnected through the backplane;
- CPUs: the Cooper Lake Xeon SPs in Zion are gluelessly linked by Intel's UltraPath Interconnect (UPI) in a twisted hypercube topology.
- In a multi-socket system, each CPU is connected to the others either directly over UPI links or indirectly through an intermediate Scalable Memory Interconnect (SMI).
- UPI supports multiple links per socket (typically 2–3), each running at 10.4 GT/s or higher, depending on the CPU generation.
- Each UPI link can dynamically route packets, keeping inter-socket communication efficient.
- The scale-out network relies on the Clear Creek layer: each OAM can communicate with any other OAM in the cluster (including those within the same rack) through its matched RDMA NIC behind the switch.
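The UPI figures above translate to a simple per-link bandwidth estimate. A minimal sketch, assuming the commonly cited Xeon SP parameters (10.4 GT/s transfer rate, 16 data bits per transfer per direction; these numbers are assumptions based on public Intel documentation, not taken from the Zion spec):

```python
# Back-of-the-envelope UPI link bandwidth.
# Assumed parameters (not from the Zion spec): 10.4 GT/s transfer rate
# and 16 data bits per transfer per direction, as commonly cited for
# Skylake/Cascade Lake/Cooper Lake Xeon SPs.

def upi_link_bandwidth_gbs(transfer_rate_gt_s: float = 10.4,
                           data_bits_per_transfer: int = 16) -> float:
    """Per-direction bandwidth of one UPI link in GB/s."""
    return transfer_rate_gt_s * data_bits_per_transfer / 8

per_link = upi_link_bandwidth_gbs()
print(f"{per_link:.1f} GB/s per UPI link per direction")  # 20.8 GB/s
```

With 2–3 such links per socket, the glueless twisted-hypercube fabric gives each CPU roughly 40–60 GB/s of aggregate inter-socket bandwidth per direction.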
Photo of the 8-OAM + Clear Creek layer interconnect:
From bottom to top there are three layers: the AL, CC, and EP layer boxes; the CC and EP layers are interconnected via Whisper cables.
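The traffic paths through this three-layer stack can be sketched as a toy model. This is purely illustrative (the hop names, 1:1 OAM-to-NIC pairing, and `scaleout_path` helper are my assumptions, not from the Zion spec): intra-chassis traffic stays on the OAM backplane, while cross-chassis traffic goes out through the OAM's matched RDMA NIC in the Clear Creek tray.

```python
# Illustrative model of a ZionEX-style scale-out path (hypothetical naming).
# Assumption: each OAM in the EP chassis is paired 1:1 with an RDMA NIC
# in the CC tray, so any OAM can reach any OAM in the cluster.
from dataclasses import dataclass

@dataclass(frozen=True)
class OAM:
    host: str   # chassis identifier
    index: int  # 0..7 within the 8-OAM backplane

def scaleout_path(src: OAM, dst: OAM) -> list:
    """List the hops a message traverses between two OAMs."""
    if src.host == dst.host:
        # Intra-chassis traffic stays on the OAM backplane.
        return [f"{src.host}/oam{src.index}",
                f"{src.host}/backplane",
                f"{dst.host}/oam{dst.index}"]
    # Inter-chassis traffic exits via the matched RDMA NIC in the CC tray.
    return [f"{src.host}/oam{src.index}",
            f"{src.host}/nic{src.index}",
            "fabric",
            f"{dst.host}/nic{dst.index}",
            f"{dst.host}/oam{dst.index}"]

print(scaleout_path(OAM("hostA", 0), OAM("hostB", 5)))
```

The point of the 1:1 OAM/NIC pairing is that GPU traffic never has to cross the CPU sockets to reach the fabric, which is why the Clear Creek tray sits directly between the Angels Landing and Emerald Pools boxes.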
Angels Landing Layer
Clear Creek Layer
Grand Teton
References
- https://www.nextplatform.com/2022/10/20/the-iron-that-will-drive-ai-at-meta-platforms/
- https://www.opencompute.org/documents/facebook-zion-system-spec-1-0-pdf