A Brief Look at RDMA Technology (Part 3)

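This post records an attempt to configure a Mellanox ConnectX-4 Lx NIC for RDMA testing on CentOS 7.9. The OFED driver was installed and the firmware updated, but the NIC turned out to support only Ethernet mode, not InfiniBand. Further reading confirmed that the ConnectX-4 Lx does not support IB mode, so the planned RDMA test could not be carried out.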

Environment

Enough armchair theory; let's actually run an RDMA test. The company happens to have some Mellanox NICs. The card is:

[root@localhost ~]# lspci -vvv |grep Eth
01:00.0 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
01:00.1 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]

Linux version:

[root@localhost ~]# cat /etc/redhat-release
CentOS Linux release 7.9.2009 (Core)
[root@localhost ~]# uname -r
3.10.0-1160.el7.x86_64

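Firmware version (the card is running 14.23.1020, shown as the current version on flash in the flint output from the firmware update described later):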

[root@localhost bak]# flint -d /dev/mst/mt4117_pciconf0 -i fw-ConnectX4Lx-rel-14_31_1014-MCX4121A-ACA_Ax-UEFI-14.24.13-FlexBoot-3.6.403.bin  burn
Current FW version on flash:  14.23.1020
New FW version:               14.31.1014

FSMST_INITIALIZE -   OK
Writing Boot image component -   OK
-I- To load new FW run mlxfwreset or reboot machine.

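Installing OFED

The Mellanox OFED package can be downloaded from:

https://network.nvidia.com/products/infiniband-drivers/linux/mlnx_ofed/

Download the version that matches your operating system, then unpack and install it: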

tar xvf MLNX_OFED_SRC-5.5-1.0.3.2.tgz
cd MLNX_OFED_SRC-5.5-1.0.3.2/
./install.pl

After installation, the self test reports the GUIDs and a number of PASS results:

[root@localhost MLNX_OFED_SRC-5.5-1.0.3.2]# hca_self_test.ofed
---- Performing Adapter Device Self Test ----
Number of CAs Detected ................. 2
PCI Device Check ....................... PASS
Kernel Arch ............................ x86_64
Host Driver Version .................... OFED-internal-5.5-1.0.3: 3.10.0-1160.el7.x86_64
Host Driver RPM Check .................. PASS
Firmware on CA #0 NIC .................. v14.23.1020
Firmware on CA #1 NIC .................. v14.23.1020
Host Driver Initialization ............. PASS
Number of CA Ports Active .............. 0
Port State of Port #1 on CA #0 (NIC)..... DOWN (Ethernet)
Port State of Port #1 on CA #1 (NIC)..... DOWN (Ethernet)
Error Counter Check on CA #0 (NIC)...... PASS
Error Counter Check on CA #1 (NIC)...... PASS
Kernel Syslog Check .................... PASS
Node GUID on CA #0 (NIC) ............... 98:03:9b:03:00:48:bd:c8
Node GUID on CA #1 (NIC) ............... 98:03:9b:03:00:48:bd:c9

A few commands can be used to check the IB state:

[root@localhost MLNX_OFED_SRC-5.5-1.0.3.2]# ibdev2netdev        // show the mapping between IB devices/ports and Ethernet interfaces
mlx5_0 port 1 ==> eth1 (Down)
mlx5_1 port 1 ==> eth2 (Down)
[root@localhost MLNX_OFED_SRC-5.5-1.0.3.2]# ibv_devinfo
hca_id: mlx5_0
transport:                      InfiniBand (0)                    // IB protocol
fw_ver:                         14.23.1020
node_guid:                      9803:9b03:0048:bdc8
sys_image_guid:                 9803:9b03:0048:bdc8
vendor_id:                      0x02c9
vendor_part_id:                 4117
hw_ver:                         0x0
board_id:                       MT_2420110034
phys_port_cnt:                  1
port:   1
state:                  PORT_DOWN (1)
max_mtu:                4096 (5)
active_mtu:             1024 (3)
sm_lid:                 0
port_lid:               0
port_lmc:               0x00
link_layer:             Ethernet
hca_id: mlx5_1
transport:                      InfiniBand (0)
fw_ver:                         14.23.1020
node_guid:                      9803:9b03:0048:bdc9
sys_image_guid:                 9803:9b03:0048:bdc8
vendor_id:                      0x02c9
vendor_part_id:                 4117
hw_ver:                         0x0
board_id:                       MT_2420110034
phys_port_cnt:                  1
port:   1
state:                  PORT_DOWN (1)
max_mtu:                4096 (5)
active_mtu:             1024 (3)
sm_lid:                 0
port_lid:               0
port_lmc:               0x00
link_layer:             Ethernet

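From the output above, the port state is still PORT_DOWN, and link_layer is Ethernet rather than InfiniBand. According to posts online, LINK_TYPE_P1 has to be set to 1 (1 is IB mode, 2 is Ethernet mode).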

[root@localhost ~]# mlxconfig -d /dev/mst/mt4117_pciconf0 query |grep LINK

But the query returns no LINK_TYPE_P1 option.
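
The check above can be wrapped in a small helper. This is only a sketch: the function name has_link_type is made up for illustration, and it assumes that the mlxconfig query output lists one parameter per line with the parameter name as the first field. On a card that does expose the parameter, the usual way to switch port 1 to InfiniBand would be `mlxconfig -d <device> set LINK_TYPE_P1=1`; that cannot work here because the parameter is absent.

```shell
# Hypothetical helper: reads `mlxconfig -d <device> query` output on stdin and
# reports whether a LINK_TYPE_P<port> parameter is exposed.
# Assumption: each parameter sits on its own line, name in the first field.
has_link_type() {
    port="${1:-1}"
    if awk -v name="LINK_TYPE_P${port}" \
        '$1 == name { found = 1 } END { exit !found }'; then
        echo "LINK_TYPE_P${port} present: link type is configurable"
    else
        echo "LINK_TYPE_P${port} missing: port type is fixed"
        return 1
    fi
}

# Usage on the test machine (needs MFT installed and `mst start` run first):
#   mlxconfig -d /dev/mst/mt4117_pciconf0 query | has_link_type 1
```

Matching on the exact field name instead of grepping for "LINK" avoids false positives from unrelated parameters whose names merely contain that string.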

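Could it be a firmware version problem?

Let's try updating the firmware.

A search online shows that the MFT (firmware tools) package, which provides the mst tool, has to be downloaded first: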

https://network.nvidia.com/products/adapter-software/firmware-tools/

tar xvf mft-4.18.0-106-x86_64-rpm.tgz
cd mft-4.18.0-106-x86_64-rpm/
./install.sh
mst start
service mst status

Download the latest firmware:

https://network.nvidia.com/support/firmware/connectx4lxen/

[root@localhost bak]# flint -d /dev/mst/mt4117_pciconf0 -i fw-ConnectX4Lx-rel-14_31_1014-MCX4121A-ACA_Ax-UEFI-14.24.13-FlexBoot-3.6.403.bin  burn
Current FW version on flash:  14.23.1020
New FW version:               14.31.1014

FSMST_INITIALIZE -   OK
Writing Boot image component -   OK
-I- To load new FW run mlxfwreset or reboot machine.
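
As the last line of the flint output says, the new image is only loaded after mlxfwreset or a reboot, so it is worth comparing version strings before concluding anything. The following is a minimal sketch; the helper name fw_older_than is hypothetical, the two version strings are copied from the flint output above, and it relies on GNU `sort -V` for version ordering.

```shell
# Hypothetical helper: succeeds when firmware version $1 is older than $2.
# Relies on GNU `sort -V` (version sort) from coreutils.
fw_older_than() {
    [ "$1" != "$2" ] &&
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

current="14.23.1020"   # "Current FW version on flash" from the flint output
burned="14.31.1014"    # "New FW version" from the flint output

if fw_older_than "$current" "$burned"; then
    echo "firmware $current is older than $burned: run mlxfwreset or reboot to load the new image"
else
    echo "firmware $current is not older than $burned"
fi
```

After the reset, the same comparison can be repeated with the fw_ver value reported by ibv_devinfo in place of the first argument.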

It made no difference.

Replacing the 5.5 driver with an older one, 5.1, did not help either.

Later I came across the following note on this page:

https://access.redhat.com/articles/3082811

Note that the card in the example output is an Ethernet-only card, so there is no port type setting.

So this says that the ConnectX-4 Lx does not support IB. But then why does ibv_devinfo report the transport as InfiniBand? Very strange.
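
In the ibv_devinfo output shown earlier, transport and link_layer are two separate fields: the first reads InfiniBand while the second reads Ethernet. As far as I can tell, transport names the verbs transport type (InfiniBand-style as opposed to iWARP), while link_layer names what actually runs on the wire, which would explain the apparent mismatch. The sketch below prints the two fields side by side for each port. The helper name summarize_devinfo is hypothetical, and it assumes the field labels look exactly like those in the output above.

```shell
# Hypothetical helper: condenses `ibv_devinfo` output read from stdin into one
# line per port: device name, transport, port state, link layer.
# Assumption: field labels match the ibv_devinfo output shown in this post.
summarize_devinfo() {
    awk '
        $1 == "hca_id:"     { dev = $2 }
        $1 == "transport:"  { transport = $2 }
        $1 == "state:"      { state = $2 }
        $1 == "link_layer:" {
            printf "%s transport=%s state=%s link_layer=%s\n", dev, transport, state, $2
        }
    '
}

# Usage on the test machine:
#   ibv_devinfo | summarize_devinfo
```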

It looks like this test cannot be done.

transport: InfiniBand (0)

Besides, the ConnectX-4 Lx and the ConnectX-4 are both mlx5 chips and ought to support IB natively, so why build a board that drops IB support?

https://mymellanox.force.com/mellanoxcommunity/s/question/0D51T00008dGyJMSA0/how-to-use-mellanox-connectx4-lx

This page mentions the same problem:

Unfortunately, I'm starting to think that I have the wrong card (and that this only works for Ethernet), because I am unable to change the link type of this card to infiniband. I have followed all the instructions, but it says that the option (LINK_TYPE) isn't found when I try via the command line.
