Ethernet Test Case Support for AI Compute Chips (ASIC/GPGPU/XPU): 400G 56G PAM4 and 800G 112G PAM4 SerDes, QP, RDMA, PFC, DCQCN, and the NCCL Communication Library

Contents

AI-Specific Compute Chips

ASIC/GPU/XPU Form Factors

Chassis and Cluster Interconnects Between AI Compute Cards/Modules

NVLink, InfiniBand, Ethernet

Reintroducing Xena

Formerly: Xena Networks

Valkyrie Product Series

Vulcan Product Series

Chimera Product Series

Vantage Product Series

Safire Product Series

Virtual Product Series

Now: Teledyne LeCroy Xena

Teledyne to Acquire Xena Networks

Xena Product Line

Z Series

Z01 Odin

Z10 Odin

Z100 Loki

Z400 Thor

Z800 Freya

Z1600 Sif

E Series

E100

E400

E800

Xena Test Cases for High-Speed Ethernet Interconnects

Layer 1 Test Cases

RS-FEC

Auto-Negotiation

Link Training

AN/LT Debug

Tx Taps / Rx Taps

PPM Adjustment

I2C R/W

SIV (Signal Integrity View)

Link Flap

Error Injection

FEC Error Injection

PMA Error Injection

Layer 2/3 Test Cases

Wire-speed Traffic

50/100/200/400G: Z400q Thor, Z800q Freya, Z800o Freya

100/200/400/800G: Z800q Freya, Z800o Freya

MAC/IP/TCP/RDMA Traffic

Protocol Field Variation

Packet Error Simulation

PFC

Layer 3+ Test Cases

RDMA QP

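User-Defined Protocol

NCCL Communication Library

Impairment and Negative Stress Emulation

Link Flap

PCS/PMA Error Injection

Packet Delay/Jitter Injection

Packet Duplication

Bandwidth Limiting

Targeting Impairments at Specific Packets

APIs for Custom Development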

CLI

XOA

HLAPI

Compatibility and Interoperability Assurance

Ethernet Alliance

UNH-IOL


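AI-Specific Compute Chips

ASIC/GPU/XPU Form Factors

AI-specific chips come in several product forms: ASIC, GPU, XPU, FPGA, and so on.

Chassis and Cluster Interconnects Between AI Compute Cards/Modules

NVLink, InfiniBand, Ethernet

Cheap and generous: only Ethernet fits the bill.

At small scale, AI accelerator cards talk to each other inside a single server chassis, and PCIe/CXL is enough. But as training models grow and AIGC takes off, cross-chassis communication has become a hurdle that can't be sidestepped. The combined demand for high bandwidth, low latency, and high reliability is genuinely hard to meet.

NVLink is sold bundled into the full vendor stack; there is no standalone offering. InfiniBand is not really an open technology, and its cost is very high, so I'll set it aside for now. Ethernet is cheap and generous, with an open ecosystem. It is fairly old and has assorted quirks to work around, but with some patching it still holds its own.

Especially given the requirements of China's domestic Xinchuang (IT innovation and localization) industry, Ethernet looks like the bright way forward for high-speed and cluster interconnect between AI compute cards and modules.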

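Reintroducing Xena

Formerly: Xena Networks

A Danish vendor focused on wired Ethernet test instruments, offering complete Layer 2-7 test solutions. If it has an Ethernet port, we can test it.

Product performance could go head to head with Spirent and Ixia, at prices close to those of domestic Chinese traffic generators.

Those two lines are what we used to say when visiting customers. There is a bit of exaggeration in them, but they are basically true.

Back then, several product series covered different test scenarios: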

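Valkyrie Product Series

Layer 2/3 performance test products: Odin 10/100/1000M (Gigabit), Odin 100M/1/2.5/5/10G (10 Gigabit), Loki 10/25/40/50/100G (QSFP28, 100G), and Thor 10/25/40/50/100/200/400G (QSFP-DD, 400G), plus versions with TSN features for automotive Ethernet and AE versions with industrial Ethernet features.

Mainly used to verify throughput, latency, jitter, and packet loss, along with Ethernet service performance and quality.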

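Vulcan Product Series

Layer 4-7 performance test products: concurrency metrics such as CC (concurrent connections), CPS (connections per second), and TPS (transactions per second), plus emulation of realistic application traffic loads. Mainly used to test the functionality and performance of security devices such as firewalls, DPI appliances, and network isolation gateways.

Chimera Product Series

A negative stress test tool for 10/25/40/50/100G high-speed Ethernet. It offers latency injection, jitter injection, bit-error injection, packet duplication, bit modification, and more, at 10 ns granularity.

Vantage Product Series

A network performance acceptance test tool for production lines in the Asia-Pacific market. It can integrate with the customer's MES (manufacturing execution system) to run network performance acceptance tests.

Safire Product Series

A firewall evaluation tool for enterprise IT operations, SD-WAN, NGFW assessment, and similar scenarios.

Virtual Product Series

For deployment and testing in cloud-native environments, providing complete Layer 2/3 and Layer 4-7 performance test and validation.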

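Now: Teledyne LeCroy Xena

Then, in October of last year (2023), we were merged into Teledyne LeCroy, and Xena became one of its sub-brands.

The product lines were cut where they needed cutting and consolidated where they needed consolidating. The Layer 4-7 firewall test products were dropped.

Everything that remained was folded into the Xena product line, so the former company name became the name of the product family.

To align better with Teledyne LeCroy's other product families, the naming scheme changed too.

The Z series is the Traffic Generator & Analyzer line, and the E series is the Emulator line.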

Teledyne to Acquire Xena Networks

October 3, 2023

Will Expand Teledyne LeCroy’s Protocol Test Portfolio to include Terabit Ethernet Traffic Generation and Network Emulation

THOUSAND OAKS, Calif. – October 3, 2023 – Teledyne Technologies Incorporated (NYSE:TDY) (“Teledyne”) announced today that it has entered into an agreement to acquire Xena Networks ApS (“Xena Networks”). Xena Networks, headquartered outside of Copenhagen, Denmark, is a leading provider of high-speed Terabit Ethernet validation, quality assurance, and production test solutions.

“The Xena Networks acquisition will further establish our leadership in the protocol test market, extending our reach to Ethernet system validation engineers, quality assurance labs, and production lines for test, evaluation, and acceptance of Ethernet components and systems,” said Robert Mehrabian, Chairman, President, and Chief Executive Officer of Teledyne. “The acquisition of LeCroy in 2012 provided a healthy and growing portfolio of protocol test businesses focused on PCI Express, USB as well as storage and networking technologies. Since then, we expanded Teledyne LeCroy’s protocol test business with multiple acquisitions, including Quantum Data (for video), Frontline (for Bluetooth and WiFi) and OakGate (for storage devices test solutions). Xena Networks will be a powerful addition to this strong and growing protocol test portfolio.”

Artificial intelligence and machine learning, high-performance computing and 5G all require ever higher speeds of data transmission and fuel the need for new solutions to test next generation Terabit Ethernet network components. The same network equipment manufacturers that today use Teledyne LeCroy’s network protocol analysis and error injection tools also require high-performance Ethernet traffic generation and network emulation test tools to validate product designs. Xena Networks test tools offer in-depth Ethernet link training and auto negotiation test capabilities, which next generation Terabit Ethernet products need to ensure that expected performance is achieved. “Combining the traffic generation and network emulation capabilities of Xena Networks with the protocol analysis functionality of Teledyne LeCroy will deliver a unique value proposition in support of semiconductor and network equipment manufacturers, network service providers, and hyperscale and cloud computing providers,” said Kevin Prusso, Vice President and General Manager of Teledyne LeCroy.

Jacob Vestergaard Nielsen, Xena Networks Chief Executive Officer, said, “We’re excited to join Teledyne LeCroy and leverage its wide coverage of protocols technologies. In particular, Teledyne’s network protocol analysis solutions complement well our traffic generation, physical layer, and network emulation products with support of up to 800Gbps Terabit Ethernet data rates, to the benefit of our customers.”

About Teledyne LeCroy
Teledyne LeCroy is a leading manufacturer of advanced protocol analyzers, oscilloscopes, and other test instruments that verify performance, validate compliance, and debug complex electronic systems quickly and thoroughly. Since its founding in 1964, the Company has focused on incorporating powerful tools into innovative products that enhance “Time-to-Insight.” Faster time to insight enables users to rapidly find and fix defects in complex electronic systems, dramatically improving time-to-market for a wide variety of applications and end markets. Teledyne LeCroy is based in Chestnut Ridge, New York. For more information, visit Teledyne LeCroy’s website at teledynelecroy.com.

Forward-Looking Statements Cautionary Notice
This press release contains forward-looking statements, as defined in the Private Securities Litigation Reform Act of 1995, relating to a pending acquisition of a company subject to customary closing conditions. Actual results could differ materially from these forward-looking statements. Many factors, as well as market and economic conditions beyond either company’s control, could change anticipated results. There are additional risks associated with operating businesses internationally, including those arising from United States and foreign government policy and regulatory changes or actions and exchange rate fluctuations.

Xena Product Line

Z Series

Z01 Odin

10/100/1000M; BASE-T, RJ45, SFP, BASE-T1

Z10 Odin

100M/1G/2.5G/5G/10G; SFP+, BASE-T

Z100 Loki

10/25/40/50/100G; QSFP28, NRZ

Z400 Thor

10/25/40/50/100/200/400G; QSFP-DD, 56G PAM4, 28G NRZ

Z800 Freya

50/100/200/400G (56G PAM4); 100/200/400/800G (56G PAM4, 112G PAM4); QSFP-DD, OSFP

Z1600 Sif

200/400/800/1600G; OSFP, 224G PAM4

E Series

E100

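10/25/40/50/100G QSFP28 network impairment emulator, impairing traffic at up to 100G of bandwidth

E400

50/100/200/400G QSFP-DD network impairment emulator, impairing traffic at up to 400G of bandwidth per port

E800

100/200/400/800G QSFP-DD/OSFP network impairment emulator, impairing traffic at up to 800G of bandwidth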

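Xena Test Cases for High-Speed Ethernet Interconnects

Layer 1 Test Cases

RS-FEC

This is roughly what the figure below summarizes: error statistics for FEC blocks on each lane (down to the virtual lane), showing the distribution of how many symbol errors were corrected and how many were uncorrectable.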
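As a rough illustration of the statistic described above, the sketch below bins FEC codewords by symbol-error count. It assumes the RS(544,514) code used on PAM4 Ethernet links, which can correct up to 15 symbol errors per codeword. The input list and the function name `summarize_fec` are hypothetical and are not part of any Xena API.

```python
from collections import Counter

# RS(544,514) corrects up to t = (544 - 514) // 2 = 15 symbols per codeword.
RS_N, RS_K = 544, 514
MAX_CORRECTABLE = (RS_N - RS_K) // 2


def summarize_fec(symbol_errors_per_codeword):
    """Bin codewords by symbol-error count; split corrected vs. uncorrectable."""
    histogram = Counter(symbol_errors_per_codeword)
    corrected_cw = sum(
        n for errs, n in histogram.items() if 0 < errs <= MAX_CORRECTABLE
    )
    uncorrectable_cw = sum(
        n for errs, n in histogram.items() if errs > MAX_CORRECTABLE
    )
    corrected_symbols = sum(
        errs * n for errs, n in histogram.items() if errs <= MAX_CORRECTABLE
    )
    return {
        "histogram": dict(sorted(histogram.items())),
        "corrected_codewords": corrected_cw,
        "uncorrectable_codewords": uncorrectable_cw,
        "corrected_symbols": corrected_symbols,
    }


# Hypothetical per-codeword symbol-error counts observed on one lane.
sample = [0, 0, 1, 3, 0, 15, 16, 2, 0, 20]
stats = summarize_fec(sample)
print(stats)
```

A histogram that is heavy near 15 errors per codeword suggests the link is running close to the FEC limit even if no uncorrectable codewords have been seen yet.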

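Auto-Negotiation

Port auto-negotiation. The relevant specifications are IEEE 802.3 Clause 73 and the Ethernet 400G/800G specifications.

Put simply, on copper links this verifies whether both ends can link up at 800G. If not, they fall back to 400G; if that fails too, to 200G, and then 100G/50G. It is basically the same concept as the older 10/100/1000M port auto-negotiation.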
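The fallback behaviour described above amounts to picking the highest speed that both ends advertise. Here is a minimal sketch, assuming each side's abilities are given as a plain set of speeds in Gb/s. Real Clause 73 negotiation exchanges ability bits in pages and resolves them through a priority table, so this only shows the outcome, and `resolve_speed` is a hypothetical name.

```python
def resolve_speed(local_abilities, partner_abilities):
    """Return the highest speed (Gb/s) advertised by both ends, or None."""
    common = set(local_abilities) & set(partner_abilities)
    return max(common) if common else None


local = {800, 400, 200, 100, 50}
print(resolve_speed(local, {800, 400}))       # partner also supports 800G
print(resolve_speed(local, {400, 200, 100}))  # partner tops out at 400G
print(resolve_speed(local, {25, 10}))         # no speed in common
```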

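Link Training

Link training. The relevant specifications are IEEE 802.3 Clause 136 and Clause 161.

AN/LT Debug

Records the complete exchange of every AN/LT trace, with detailed low-level logs for analysis. Supports emulating the end node for single-step debugging.

Tx Taps / Rx Taps

Adjusts the PHY/SerDes equalization settings to test how well the link partner matches and adapts.

PPM Adjustment

Adjustment within ±400 ppm, applied either in steps or as a smooth curve.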
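To put the ±400 ppm range in perspective, the sketch below converts a ppm offset into an absolute frequency offset and generates a stepped sweep. The 53.125 GBd nominal rate is an assumption (one 112G PAM4 lane), and both function names are hypothetical.

```python
def ppm_to_hz(nominal_hz, ppm):
    """Frequency offset in Hz for a given ppm deviation from nominal."""
    return nominal_hz * ppm / 1e6


def stepped_sweep(start_ppm, stop_ppm, step_ppm):
    """Return ppm setpoints from start to stop (inclusive) in fixed steps."""
    count = int(abs(stop_ppm - start_ppm) // step_ppm)
    direction = 1 if stop_ppm >= start_ppm else -1
    return [start_ppm + direction * i * step_ppm for i in range(count + 1)]


NOMINAL_BAUD = 53.125e9  # assumed: one 112G PAM4 lane running at 53.125 GBd
for ppm in stepped_sweep(-400, 400, 200):
    offset_mhz = ppm_to_hz(NOMINAL_BAUD, ppm) / 1e6
    print(f"{ppm:+5d} ppm -> {offset_mhz:+.3f} MHz")
```

A smooth sweep would use the same conversion with a much smaller step, or with setpoints sampled from a continuous curve.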

I2C R/W

Reads and writes optical module registers over I2C. It can be used to verify compliance with specifications such as CMIS 4.0 and CMIS 5.0.
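To give a feel for what such a register check involves, the sketch below decodes the first two bytes of a module's lower memory. It assumes the bytes have already been read over I2C (no instrument API is shown): byte 0 is the SFF-8024 identifier, and byte 1 is the CMIS revision with the major version in the upper nibble and the minor version in the lower nibble. The function name and the sample dumps are hypothetical.

```python
# Subset of SFF-8024 identifier codes relevant to this article.
IDENTIFIERS = {0x18: "QSFP-DD", 0x19: "OSFP"}


def decode_cmis_header(lower_memory):
    """Decode the identifier (byte 0) and CMIS revision (byte 1)."""
    ident_code, revision = lower_memory[0], lower_memory[1]
    identifier = IDENTIFIERS.get(ident_code, f"unknown (0x{ident_code:02X})")
    major, minor = revision >> 4, revision & 0x0F
    return identifier, f"{major}.{minor}"


# Hypothetical register dumps: a QSFP-DD at CMIS 4.0 and an OSFP at CMIS 5.0.
print(decode_cmis_header(bytes([0x18, 0x40])))
print(decode_cmis_header(bytes([0x19, 0x50])))
```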

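SIV (Signal Integrity View)

Eye diagram and signal integrity testing.

Link Flap

Link flap (brief link drop) testing.

Error Injection

Error insertion for validation.

FEC Error Injection

Error injection targeting individual FEC codewords.

PMA Error Injection

Error injection at the PCS/PMA layer. On the instrument side these are PMA errors; after the subsequent encoding, they show up on the DUT side as PCS errors.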

Layer 2/3 Test Cases

Wire-speed Traffic

Wire-speed Ethernet traffic stress testing, building MAC/IP/TCP/UDP/RDMA and other kinds of traffic.
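"Wire speed" translates into a concrete frame rate once the per-frame overhead is counted. The sketch below computes the theoretical maximum frames per second, assuming the standard 8 bytes of preamble and start-of-frame delimiter plus a 12-byte minimum inter-frame gap. The function name is hypothetical.

```python
PREAMBLE_SFD = 8  # bytes of preamble plus start-of-frame delimiter
MIN_IFG = 12      # bytes of minimum inter-frame gap


def line_rate_fps(link_bps, frame_bytes):
    """Maximum frames per second for a given frame size (FCS included)."""
    bits_on_wire = (frame_bytes + PREAMBLE_SFD + MIN_IFG) * 8
    return link_bps / bits_on_wire


for speed_g in (400, 800):
    for size in (64, 1518):
        fps = line_rate_fps(speed_g * 1e9, size)
        print(f"{speed_g}G, {size}-byte frames: {fps / 1e6:.2f} Mfps")
```

At 400G with 64-byte frames this works out to roughly 595 million frames per second, which is the load a tester has to generate and analyze to claim wire speed at the smallest frame size.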

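50/100/200/400G: Z400q Thor, Z800q Freya, Z800o Freya

QSFP-DD interface: 400G based on 56G PAM4

OSFP interface: 400G based on 56G PAM4

100/200/400/800G: Z800q Freya, Z800o Freya

QSFP-DD interface: 800G based on 112G PAM4, backward compatible with 56G SerDes

OSFP interface: 800G based on 112G PAM4, backward compatible with 56G SerDes

MAC/IP/TCP/RDMA Traffic

Protocol Field Variation

Fields of protocols such as MAC, IP, and TCP can be varied from packet to packet. Combining streams with field variation produces tens of millions of concurrent flows.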
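The flow count grows multiplicatively: the number of streams times the number of values each varied field can take. The sketch below shows the arithmetic and lazily enumerates the first few flows. The field choices (source IP and destination port) and all names are illustrative assumptions, not the tester's actual configuration model.

```python
import ipaddress
from itertools import islice, product
from math import prod


def flow_count(num_streams, modifier_ranges):
    """Flows = streams x product of the value counts of each varied field."""
    return num_streams * prod(modifier_ranges)


def flows(base_src_ip, src_ip_count, base_dst_port, dst_port_count):
    """Lazily enumerate (src IP, dst port) pairs by incrementing both fields."""
    base = int(ipaddress.IPv4Address(base_src_ip))
    for ip_off, port_off in product(range(src_ip_count), range(dst_port_count)):
        yield str(ipaddress.IPv4Address(base + ip_off)), base_dst_port + port_off


# 16 streams, each varying 65,536 source IPs and 16 destination ports.
print(flow_count(16, [65536, 16]))
print(list(islice(flows("10.0.0.0", 65536, 5000, 16), 3)))
```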

Packet Error Simulation

FCS errors, CRC errors, and checksum errors.
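As one concrete case, an FCS error is simply a frame whose trailing CRC-32 does not match its contents. The sketch below appends a correct or deliberately corrupted FCS using Python's `zlib.crc32`, which implements the same CRC-32 as Ethernet. The helper names and the all-zero frame body are hypothetical.

```python
import struct
import zlib


def append_fcs(frame_without_fcs, corrupt=False):
    """Append the Ethernet FCS; optionally flip one bit to force an FCS error."""
    fcs = zlib.crc32(frame_without_fcs)
    if corrupt:
        fcs ^= 0x1
    # The FCS is transmitted least significant byte first.
    return frame_without_fcs + struct.pack("<I", fcs)


def fcs_ok(frame):
    """Recompute the CRC over the frame body and compare with the trailing FCS."""
    body, received = frame[:-4], struct.unpack("<I", frame[-4:])[0]
    return zlib.crc32(body) == received


body = bytes(60)  # hypothetical 60-byte frame body (all zeros)
print(fcs_ok(append_fcs(body)))
print(fcs_ok(append_fcs(body, corrupt=True)))
```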

PFC

Priority-based Flow Control testing.
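For readers who have not looked at one on the wire, a PFC frame is a MAC Control frame with a per-priority enable vector and eight pause timers. The sketch below builds one without the FCS, following the IEEE 802.1Qbb layout (destination 01:80:C2:00:00:01, EtherType 0x8808, opcode 0x0101). The function name and the source MAC are hypothetical.

```python
import struct

PFC_DST_MAC = bytes.fromhex("0180c2000001")
MAC_CONTROL_ETHERTYPE = 0x8808
PFC_OPCODE = 0x0101


def build_pfc_frame(src_mac, pause_quanta):
    """Build a PFC frame without FCS; pause_quanta maps priority 0-7 to quanta."""
    enable_vector = 0
    timers = [0] * 8
    for prio, quanta in pause_quanta.items():
        enable_vector |= 1 << prio
        timers[prio] = quanta
    body = struct.pack(
        "!HHH8H", MAC_CONTROL_ETHERTYPE, PFC_OPCODE, enable_vector, *timers
    )
    frame = PFC_DST_MAC + src_mac + body
    return frame.ljust(60, b"\x00")  # pad to the 64-byte minimum minus the FCS


# Pause priority 3 for the maximum time (0xFFFF quanta of 512 bit times each).
frame = build_pfc_frame(bytes.fromhex("020000000001"), {3: 0xFFFF})
print(frame.hex())
```

Sending the same frame with a timer value of 0 for that priority resumes traffic on it immediately.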

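Layer 3+ Test Cases

TCP-based and UDP-based service testing

RDMA QP

Emulation of QP connection setup at the scale of millions, validation of concurrent connection maintenance, concurrent performance evaluation, and testing of latency, jitter, and session count per QP scenario.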
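Each emulated QP is identified on the wire by the destination QP number in the Base Transport Header (BTH). The sketch below packs the 12-byte BTH that follows the UDP header in RoCEv2 (UDP destination port 4791). It is a simplified illustration under that assumption: it leaves the solicited-event, migration, pad-count, and congestion bits at zero, and the function name and QP numbers are hypothetical.

```python
import struct

ROCEV2_UDP_PORT = 4791
OPCODE_RC_SEND_ONLY = 0x04


def build_bth(opcode, dest_qp, psn, pkey=0xFFFF, ack_req=False):
    """Pack the 12-byte Base Transport Header carried after UDP in RoCEv2."""
    flags = 0  # SE = 0, M = 0, PadCnt = 0, TVer = 0
    word1 = dest_qp & 0xFFFFFF  # top byte is reserved and left at zero here
    word2 = (int(ack_req) << 31) | (psn & 0xFFFFFF)
    return struct.pack("!BBHII", opcode, flags, pkey, word1, word2)


# One header per emulated QP; the QP numbers are arbitrary.
headers = [
    build_bth(OPCODE_RC_SEND_ONLY, dest_qp=qp, psn=0) for qp in range(0x100, 0x104)
]
print(len(headers[0]), headers[0].hex())
```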

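DCQCN Algorithm Support

Manipulates and verifies the ECN field to emulate and validate congestion control at the level of individual traffic flows.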
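For context on what the device under test is expected to do, here is a simplified sketch of the DCQCN sender (reaction point): cut the rate when a Congestion Notification Packet (CNP) arrives, then recover toward the previous rate while no CNPs are seen. It follows the published algorithm's rate-decrease and fast-recovery formulas but merges the alpha timer and the rate-increase timer into one step and omits the additive and hyper increase stages. The class name and the 400 Gb/s starting rate are assumptions.

```python
class DcqcnSender:
    """Simplified DCQCN reaction point: rate cut on CNP, then fast recovery."""

    def __init__(self, line_rate_gbps, g=1 / 256):
        self.rc = line_rate_gbps  # current sending rate
        self.rt = line_rate_gbps  # target rate to recover toward
        self.alpha = 1.0          # running estimate of congestion severity
        self.g = g                # gain used when updating alpha

    def on_cnp(self):
        """CNP received: remember the current rate as target, then cut the rate."""
        self.rt = self.rc
        self.rc *= 1 - self.alpha / 2
        self.alpha = (1 - self.g) * self.alpha + self.g

    def on_quiet_period(self):
        """No CNP in the timer window: decay alpha, move halfway to the target."""
        self.alpha *= 1 - self.g
        self.rc = (self.rt + self.rc) / 2


sender = DcqcnSender(400.0)
sender.on_cnp()
print(f"after CNP: {sender.rc:.1f} Gb/s")
for _ in range(3):
    sender.on_quiet_period()
print(f"after 3 quiet periods: {sender.rc:.1f} Gb/s")
```

A tester that marks ECN or injects CNPs at controlled intervals can then check whether the sender's measured rate follows this kind of curve.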

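User-Defined Protocol

Fully user-defined protocol testing, for validating and emulating proprietary protocols.

NCCL Communication Library

NVIDIA Collective Communications Library (NCCL): emulation and validation of the library's collective operations (all-gather, reduce, broadcast).

Large-scale communication pattern emulation, including communication for GPGPU clusters of thousands or tens of thousands of cards.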
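To show the kind of traffic pattern such an emulation has to reproduce, the sketch below simulates an all-gather over a logical ring and counts the bytes carried by each link. The ring schedule is an assumption chosen for illustration (a collective library may pick other algorithms internally), and nothing here calls NCCL itself.

```python
def ring_all_gather(chunks):
    """Simulate ring all-gather; return per-rank buffers and bytes per link."""
    n = len(chunks)
    buffers = [{rank: chunks[rank]} for rank in range(n)]
    bytes_per_link = [0] * n  # link i carries rank i -> rank (i + 1) % n
    for step in range(n - 1):
        # Each rank forwards the chunk that originated `step` hops behind it.
        sends = [
            ((rank - step) % n, buffers[rank][(rank - step) % n])
            for rank in range(n)
        ]
        for rank, (origin, data) in enumerate(sends):
            buffers[(rank + 1) % n][origin] = data
            bytes_per_link[rank] += len(data)
    return buffers, bytes_per_link


chunks = [bytes([rank]) * 1024 for rank in range(4)]  # 4 ranks, 1 KiB each
buffers, link_bytes = ring_all_gather(chunks)
print(all(len(b) == 4 for b in buffers), link_bytes)
```

Every link carries (N - 1) chunks regardless of N, which is why the per-port load stays flat while the number of simultaneous flows grows with the cluster size.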

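Impairment and Negative Stress Emulation

Link Flap

Emulates abnormal optical module behavior, with flaps as short as 10 ns.

PCS/PMA Error Injection

Injection of various physical-layer bit errors and anomalies.

Packet Delay/Jitter Injection

Impairment at the packet/frame level.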
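Conceptually, delay and jitter injection shifts each packet's departure time by a fixed amount plus a random component. Here is a minimal sketch, assuming uniformly distributed jitter and nanosecond timestamps; the function name and sample values are hypothetical.

```python
import random


def impair_delay(timestamps_ns, fixed_delay_ns, jitter_ns, seed=0):
    """Add a fixed delay plus uniform jitter in [0, jitter_ns] to each packet."""
    rng = random.Random(seed)  # seeded so a test run is repeatable
    return [t + fixed_delay_ns + rng.randint(0, jitter_ns) for t in timestamps_ns]


arrivals = [0, 1000, 2000, 3000]  # hypothetical ingress timestamps in ns
departures = impair_delay(arrivals, fixed_delay_ns=50_000, jitter_ns=200)
print([out - inn for inn, out in zip(arrivals, departures)])
```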

Packet Duplication

Impairment at the packet/frame level.

Bandwidth Limiting

Impairment at the packet/frame level, used to emulate congestion.
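One common way to model bandwidth limiting is a token-bucket policer, sketched below. This is an illustrative assumption about the mechanism, not a description of how the emulator implements it, and the class name and parameters are hypothetical.

```python
class TokenBucket:
    """Policer: forward a packet only if enough byte tokens have accumulated."""

    def __init__(self, rate_bps, burst_bytes):
        self.rate_bytes_per_ns = rate_bps / 8 / 1e9
        self.burst = burst_bytes
        self.tokens = float(burst_bytes)
        self.last_ns = 0

    def allow(self, now_ns, packet_bytes):
        """Refill tokens for the elapsed time, then forward or drop the packet."""
        elapsed = now_ns - self.last_ns
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate_bytes_per_ns)
        self.last_ns = now_ns
        if self.tokens >= packet_bytes:
            self.tokens -= packet_bytes
            return True
        return False


# Offer 1500-byte packets every 100 ns (120 Gb/s) to a 10 Gb/s limiter.
bucket = TokenBucket(rate_bps=10e9, burst_bytes=3000)
passed = sum(bucket.allow(t, 1500) for t in range(0, 100_000, 100))
print(passed, "of 1000 packets forwarded")
```

Roughly one packet in twelve gets through, matching the 10:120 ratio between the configured limit and the offered load.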

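Targeting Impairments at Specific Packets

With filter support, impairments can be locked onto specific MAC addresses, IP addresses, protocol fields, and other characteristics, so that only the matching packets are impaired.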
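The filter-then-impair idea can be sketched as a match rule plus an impairment function that is applied only to matching packets. Packets are modelled here as plain dictionaries, and all field names, the rule format, and the `duplicate` impairment are hypothetical.

```python
def matches(packet, rule):
    """True if every field named in the rule has the required value."""
    return all(packet.get(field) == value for field, value in rule.items())


def apply_impairment(packets, rule, impair):
    """Run `impair` on matching packets only; pass the rest through untouched."""
    out = []
    for pkt in packets:
        out.extend(impair(pkt) if matches(pkt, rule) else [pkt])
    return out


def duplicate(pkt):
    """Example impairment: emit the packet twice."""
    return [pkt, dict(pkt)]


packets = [
    {"dst_ip": "10.0.0.2", "udp_dport": 4791, "seq": 1},
    {"dst_ip": "10.0.0.3", "udp_dport": 53, "seq": 2},
]
result = apply_impairment(packets, {"udp_dport": 4791}, duplicate)
print([p["seq"] for p in result])
```

Swapping `duplicate` for a function that returns an empty list models packet drop, so the same structure covers several of the impairments listed above.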

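APIs for Custom Development

CLI

Script-based custom development.

XOA

A Python API library that wraps the underlying interface.

HLAPI

Other higher-level API wrappers for C#, C++, LabVIEW, and so on.

Compatibility and Interoperability Assurance

Ethernet Alliance

Participation in Ethernet Alliance activities.

UNH-IOL

Plugfest events at the University of New Hampshire InterOperability Laboratory (UNH-IOL).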


To Be Added

March 25, 2024, 20:29:15. More details and figures will be added later.
