[Domestic AI Server] Application Examples of a Domestic Alternative to the Broadcom PCIe 5.0 Switch, with 众星微 Customization Support

Application examples of a domestic alternative to the Broadcom PCIe 5.0 switch

1. Fanout Switch

Fanout switch

2. Graphics Processing Unit

Graphics applications typically use x16-width links. The switch hardware supports peer-to-peer communication, which gives the shortest possible path between the peers. As with the fanout switch, the links can run at any supported PCIe speed and any supported width.

As shown in the figure above:

GPU1 can communicate with GPU2 through Switch1.

GPU1 can communicate with GPU3 through Switch1, which connects over a fabric link to Switch2.

Note: P2P requires software support.
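The two peer-to-peer paths described above can be traced with a small graph search. This is only an illustrative sketch: the node names come from the text, but the adjacency table and the `shortest_path` helper are assumptions made for the example, not part of any switch API.

```python
from collections import deque

# Hypothetical topology matching the description above:
# GPU1 and GPU2 hang off Switch1, GPU3 hangs off Switch2,
# and the two switches are joined by a fabric link.
LINKS = {
    "GPU1": ["Switch1"],
    "GPU2": ["Switch1"],
    "GPU3": ["Switch2"],
    "Switch1": ["GPU1", "GPU2", "Switch2"],
    "Switch2": ["GPU3", "Switch1"],
}

def shortest_path(src, dst, links=LINKS):
    """Breadth-first search; returns the node list from src to dst, or None."""
    queue = deque([[src]])
    seen = {src}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == dst:
            return path
        for nxt in links.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

print(shortest_path("GPU1", "GPU2"))
print(shortest_path("GPU1", "GPU3"))
```

The first path stays inside Switch1, while the second crosses the fabric link, which is why the inter-switch link matters for GPU1 to GPU3 traffic.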

3. NVMe Hot Add and Hot Remove

This switch supports a system in which a PCIe device can be added to an empty slot, either in a server or in the host, without disrupting the rest of the PCIe tree.

In base switch (BSw) mode, system software must preallocate resources for empty slots at host boot time. See the figure that follows.

The challenge is to reserve PCIe resources at configuration time so that devices can be added later. In synthetic switch (sSw) mode, management software creates a placeholder endpoint to reserve the PCIe resources. See the figure that follows.

See Operation Modes for mode information. You can remove devices from either the server or the host at any time without disrupting other ports. Use DPC to keep endpoint errors from spreading to the rest of the tree. Enable read tracking to prevent completion timeouts.
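To make the boot-time preallocation that BSw mode requires more concrete, here is a minimal sketch of reserving bus numbers for empty slots. The function name, the slot list, and the choice to reserve a fixed number of buses per empty slot are all assumptions for illustration; real system software also reserves memory and I/O windows, which this sketch ignores.

```python
def preallocate_buses(first_bus, slots, reserve_for_empty=1):
    """Assign a contiguous bus range to each downstream slot.

    slots is a list of (name, buses_needed) pairs; buses_needed is None
    for an empty slot, in which case reserve_for_empty buses are set aside
    so a device can be hot-added later without renumbering the tree.
    """
    plan = {}
    next_bus = first_bus
    for name, buses_needed in slots:
        count = reserve_for_empty if buses_needed is None else buses_needed
        last_bus = next_bus + count - 1
        if last_bus > 255:  # PCIe bus numbers are 8 bits wide
            raise ValueError("out of PCIe bus numbers")
        plan[name] = (next_bus, last_bus)
        next_bus = last_bus + 1
    return plan

# Hypothetical example: one populated slot and two empty slots.
plan = preallocate_buses(
    4, [("slot0", 1), ("slot1", None), ("slot2", None)], reserve_for_empty=2
)
print(plan)
```

Reserving more buses per empty slot makes room for larger hot-added devices (for example, a device that contains its own switch) at the cost of consuming bus numbers that may never be used.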

4. High Availability Storage Use Case

The switch works with dual-ported NVMe drives to provide high availability storage, that is, storage with no single point of failure. The topology that follows achieves high availability through redundancy: there are two paths to each NVMe device. Each of the two hosts has its own independent switch hierarchy.

If the switches are in sSw mode, the link between switches connects two synthetic switch ports.

If the switches are in BSw mode, the link between switches connects a downstream port to an upstream port.

The topology in the preceding figure addresses the following possible failures:

If one host fails, the other host can still get to every NVMe drive.

If a switch fails, an alternate path exists to access the data on the endpoint.

If any link fails, the upstream side considers the downstream side to have failed.

  • If the link to an upstream port of a switch fails, the switch appears as failed to the host.
  • If the link to an NVMe drive fails, the endpoint appears as failed to the host that uses that path.

If an endpoint (an NVMe drive in the preceding figure) fails, the data on that drive is no longer accessible.

In this example, the upstream switches (on both the left and right sides) should be synthetic switches to give the most robust support for surprise add and remove of NVMe devices.
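The failure cases listed above can be checked with a small reachability model. This is a deliberately simplified sketch: it assumes one switch per host and one port of every dual-ported drive attached to each switch, and it leaves out the inter-switch link. All names (`HostA`, `SwitchA`, `NVMe0`, `reachable_drives`) are hypothetical.

```python
from collections import deque

def build_links(num_drives=2):
    """Each host has its own switch; each drive connects to both switches."""
    links = [("HostA", "SwitchA"), ("HostB", "SwitchB")]
    for i in range(num_drives):
        links.append(("SwitchA", f"NVMe{i}"))
        links.append(("SwitchB", f"NVMe{i}"))
    return links

def reachable_drives(host, links, failed_nodes=(), failed_links=()):
    """Return the set of drives the host can still reach after the failures."""
    failed_nodes = set(failed_nodes)
    failed_links = {frozenset(link) for link in failed_links}
    if host in failed_nodes:
        return set()
    adjacency = {}
    for a, b in links:
        if a in failed_nodes or b in failed_nodes:
            continue
        if frozenset((a, b)) in failed_links:
            continue
        adjacency.setdefault(a, []).append(b)
        adjacency.setdefault(b, []).append(a)
    seen = {host}
    queue = deque([host])
    while queue:
        node = queue.popleft()
        # Only the starting host and switches forward traffic;
        # drives and the other host are treated as endpoints.
        if node != host and not node.startswith("Switch"):
            continue
        for neighbor in adjacency.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return {n for n in seen if n.startswith("NVMe")}

links = build_links(2)
print(reachable_drives("HostB", links, failed_nodes=["HostA"]))
print(reachable_drives("HostA", links, failed_links=[("SwitchA", "NVMe0")]))
```

In this model, a failed host or switch leaves every drive reachable from the surviving side, while a failed drive link only hides that drive from the host on the affected path.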

5. Redundant Storage

Previous generation mirrored systems might have identical boards connected with back-to-back non-transparent (NT) bridges connected through a crosslink. The PEX89104 switch model replaces the NT-NT link with a fabric link. In all such mirrored systems, each host finds a single NT endpoint through which it communicates to the other host.

The fabric link provides a (hidden-to-host) data path between the two host systems. When one host fails, the other host can take over the endpoints of the first switch.

In this use case, the switch must be in synthetic mode to enable the NT endpoint. A fabric link connects the two switches. Each switch should configure a synthetic switch port to connect to the fabric link.

6. Multidomain General-Purpose GPU

The switch supports general-purpose GPU (GPGPU) peer-to-peer transactions within a domain and between domains. In the example that follows, the switches are in sSw mode and are connected by a fabric link.


Notes on PCIe switch fabric topologies:

The standard PCIe tree topology has some limitations. In sSw mode, the switch supports alternate topologies that address those limitations. The link between two PCIe switches in sSw mode can connect two fabric ports, on which all TLPs are routed by destination ID. This approach lets any source reach any destination over a programmable fabric path.

A global ID (GID) routes a TLP across a fabric. If there is more than one path available, the destination GID indexes a destination lookup table (D-LUT) to get a choice vector. From the choice vector, a particular fabric port choice is made.

Global ID composition: the GID is {domain[7:0], ID[15:0]}, where the ID value is a standard 16-bit PCIe ID made up of {bus[7:0], device[4:0], function[2:0]}.
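As a quick illustration of the GID layout, the sketch below packs and unpacks a 24-bit value, assuming the standard PCIe 16-bit ID layout of an 8-bit bus, 5-bit device, and 3-bit function number below an 8-bit domain. The helper names `pack_gid` and `unpack_gid` are hypothetical and do not come from any vendor SDK.

```python
def pack_gid(domain, bus, device, function):
    """Build a 24-bit GID: {domain[7:0], bus[7:0], device[4:0], function[2:0]}."""
    fields = (
        ("domain", domain, 8),
        ("bus", bus, 8),
        ("device", device, 5),
        ("function", function, 3),
    )
    for name, value, bits in fields:
        if not 0 <= value < (1 << bits):
            raise ValueError(f"{name} does not fit in {bits} bits")
    pcie_id = (bus << 8) | (device << 3) | function
    return (domain << 16) | pcie_id

def unpack_gid(gid):
    """Split a 24-bit GID back into its four fields."""
    return {
        "domain": (gid >> 16) & 0xFF,
        "bus": (gid >> 8) & 0xFF,
        "device": (gid >> 3) & 0x1F,
        "function": gid & 0x7,
    }

gid = pack_gid(domain=0x02, bus=0x3A, device=0x1F, function=0x7)
print(hex(gid))
print(unpack_gid(gid))
```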

Management software sets up address traps, ID traps, or both in the hardware to take a normal PCIe TLP and create a destination GID. Management software also programs the route table in the D-LUT and the choice mapping registers.
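The lookup flow described above (destination GID, then D-LUT, then choice vector, then fabric port) might be modeled roughly as follows. The text does not say how the table is indexed in hardware or how a port is picked from the choice vector, so this sketch makes two loud assumptions: the table is a plain dictionary keyed by the full GID, and the choice vector is a bitmask of eligible fabric ports from which a caller-supplied `selector` picks one. None of these names reflect the real register interface.

```python
def eligible_ports(choice_vector):
    """Return the fabric port numbers whose bit is set in the choice vector."""
    return [
        bit
        for bit in range(choice_vector.bit_length())
        if (choice_vector >> bit) & 1
    ]

def route(dest_gid, dlut, selector=0):
    """Pick a fabric port for dest_gid.

    dlut maps a destination GID to a choice vector (assumed bitmask).
    selector is a hypothetical tie-breaker, for example a flow hash,
    used to spread traffic across the eligible ports.
    """
    choice_vector = dlut[dest_gid]  # KeyError means no route is programmed
    ports = eligible_ports(choice_vector)
    if not ports:
        raise LookupError("choice vector has no eligible fabric port")
    return ports[selector % len(ports)]

# Hypothetical table: GID 0x010500 is reachable through fabric ports 1 and 2.
dlut = {0x010500: 0b0110}
print(route(0x010500, dlut, selector=0))
print(route(0x010500, dlut, selector=1))
```

Keeping the selection step separate from the table lookup mirrors the two-stage description in the text, and it makes it easy to swap in a different selection policy.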

Contact your Broadcom field applications engineer for more details.

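NOTE: Fabric topologies require firmware support.

This passage describes how switches in sSw mode enable different topologies. It explains how the GID is used to route TLPs and stresses that firmware support is required.

The following examples show how fabric ports are used: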

Simple Tree Topology

In the PCIe tree topology example that follows, the root complex connects on the upstream side to a host port on the A0 PCIe switch. The A0 PCIe switch connects through fabric (f) ports to B0, B1, and B2. Downstream (d) ports connect to standard PCIe devices (not shown in the figure). A standard PCIe device could be an endpoint or another set of switches that lead to a set of endpoints.

One benefit of using fabric ports to connect switches in a standard tree is scalability. The bus numbers consumed by the switches can be hidden from the host, which leaves more bus numbers for downstream endpoints.

Dual Tree Topology

A common topology for redundancy is to connect two identical systems with a fabric link. In this topology, if one host goes down, the other host can take over the failed host's endpoints. The figure that follows shows two fabric links between the switches to add another point of redundancy.

Additionally, the fabric link could provide a higher performing path (more throughput and less latency) between peer endpoints, one on A0 and one on A1, instead of using a connection (the dashed line in the figure) between the two root complexes.

The Dual Tree example is shown in the figure that follows:

Mesh Topology

A mesh topology connects every node to every other node. The dual switch topology shown in Dual Tree Topology is a 2-node mesh. The PCIe switch supports up to a 13-node mesh. The figure that follows shows a 4-node mesh topology.
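The cost of a full mesh grows quickly with node count, which a short calculation makes visible. The sketch assumes exactly one fabric link between each pair of nodes (the dual tree example above uses two links for extra redundancy, which would double these figures). The function name `full_mesh` is made up for the example.

```python
def full_mesh(nodes):
    """Fabric ports per node and total links for a full mesh of switches."""
    if nodes < 2:
        raise ValueError("a mesh needs at least two nodes")
    return {
        "fabric_ports_per_node": nodes - 1,        # one link to every other node
        "total_links": nodes * (nodes - 1) // 2,   # each link is shared by two nodes
    }

for n in (2, 4, 13):
    print(n, full_mesh(n))
```

Under that assumption, a 4-node mesh needs 3 fabric ports per switch and 6 links in total, while a 13-node mesh needs 12 fabric ports per switch and 78 links.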

Application examples of a domestic alternative to the Broadcom PCIe 5.0 switch. 深圳信迈 supports custom 众星微 PCIe switch board designs.
