Introduction to mlx RDMA NIC Counters and Parameters

Overview

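The mlx5 driver exposes a set of mlx NIC parameters and counters through Linux sysfs, under /sys/class/infiniband/mlx5_x/ports/1/counters and /sys/class/infiniband/mlx5_x/ports/1/hw_counters. Each counter records how many times a particular kind of event has occurred, such as a specific error or the number of packets received. Understanding these counters gives a clearer picture of how the NIC is behaving, and monitoring them makes it faster to locate the root cause of RDMA errors.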
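Since each counter is a plain file holding one integer, both directories can be read with a few lines of Python. The following is a minimal sketch, not a reference tool: the helper name `read_counters` is hypothetical, and the device name `mlx5_0` and port `1` are assumptions that need to be adjusted to the local system.

```python
import os


def read_counters(directory):
    """Return {counter_name: int_value} for every integer-valued file in directory."""
    counters = {}
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            continue
        try:
            with open(path) as f:
                counters[name] = int(f.read().strip())
        except (OSError, ValueError):
            # Skip files that cannot be read or do not hold a plain integer.
            continue
    return counters


if __name__ == "__main__":
    # Assumption: device mlx5_0, port 1. Adjust both to match the local system.
    base = "/sys/class/infiniband/mlx5_0/ports/1"
    for sub in ("counters", "hw_counters"):
        directory = os.path.join(base, sub)
        if os.path.isdir(directory):
            for name, value in read_counters(directory).items():
                print(f"{sub}/{name} = {value}")
```

On a machine without that device the script prints nothing, because it checks that each directory exists before reading it.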

hw_counter

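  • rnr_nak_retry_err: With the local host as the sender, the number of RNR NAK packets received from the peer. This counter increases when the SRQ of the receiving QP has no free entries left.
  • out_of_buffer: With the local host as the receiver, the number of packets that arrived when no receive buffer was available. This counter increases when the SRQ of the local QP has run out of posted receive WQEs.
  • out_of_sequence: The number of packets received out of order.
  • local_ack_timeout_err: The number of RDMA requests sent by the local host that timed out waiting for an ACK.
  • packet_seq_err: The number of NAK (sequence error) packets received by the local host.
  • req_cqe_error: The number of CQEs that completed with an error on the local host, acting as requester.
  • duplicate_request: The number of duplicate request packets received by the local host.
  • np_ecn_marked_roce_packets: The number of ECN-marked RoCE packets received by the local host, acting as notification point.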
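These counters are cumulative, so what matters for monitoring is how much each one grew between two reads rather than its absolute value. A minimal sketch of that comparison follows; the function name `counter_deltas` is hypothetical and the sample values are made up for illustration.

```python
def counter_deltas(before, after):
    """Return {name: increase} for counters that grew between two snapshots."""
    deltas = {}
    for name, new_value in after.items():
        old_value = before.get(name)
        if old_value is None:
            # Counter was not present in the first snapshot; nothing to compare.
            continue
        if new_value > old_value:
            deltas[name] = new_value - old_value
    return deltas


# Made-up snapshots of three hw_counters taken some time apart.
before = {"rnr_nak_retry_err": 10, "out_of_buffer": 0, "out_of_sequence": 5}
after = {"rnr_nak_retry_err": 14, "out_of_buffer": 0, "out_of_sequence": 5}
print(counter_deltas(before, after))  # {'rnr_nak_retry_err': 4}
```

In this example only rnr_nak_retry_err grew, which by the description above would point at the peer's receive queue running out of entries.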

counter

  • port_rcv_data: Total number of data octets, divided by 4 (lanes), received on all VLs. This is a 64-bit counter.
  • port_rcv_packets: Total number of packets received (this may include packets containing errors). This is a 64-bit counter.
  • port_xmit_data: Total number of data octets, divided by 4 (lanes), transmitted on all VLs. This is a 64-bit counter.
  • port_xmit_packets: Total number of packets transmitted on all VLs from this port. This may include packets with errors.
  • unicast_rcv_packets: Total number of unicast packets, including unicast packets containing errors.
  • unicast_xmit_packets: Total number of unicast packets transmitted on all VLs from the port. This may include unicast packets with errors.
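Because port_rcv_data and port_xmit_data are reported in octets divided by 4, a raw value has to be multiplied by 4 to get bytes. The sketch below turns two samples of such a counter into an average throughput; the function names and the sample numbers are illustrative assumptions, not part of the driver.

```python
def data_counter_to_bytes(value):
    """Convert a port_rcv_data / port_xmit_data value (octets / 4) to bytes."""
    return value * 4


def throughput_gbps(first, second, interval_seconds):
    """Average throughput in Gbit/s between two samples of a data counter."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    delta_bytes = data_counter_to_bytes(second - first)
    return delta_bytes * 8 / interval_seconds / 1e9


# Made-up samples: the counter grew by 312,500,000 units in one second.
print(throughput_gbps(0, 312_500_000, 1.0))  # 10.0
```

A growth of 312,500,000 units is 1.25 GB, which over one second is 10 Gbit/s.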

References

  1. Understanding mlx5 Linux Counters and Status Parameters
  2. Understanding mlx5 ethtool Counters
  3. Nak Errors