Configuring the InfiniBand NIC driver on Ubuntu 22.04 and setting up an InfiniBand switch

Background

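As part of planning the training-node architecture, this post records how the InfiniBand (IB) NIC driver was installed on the hosts and how the IB switch was configured.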

Implementation

Host-side IB NIC configuration

OS version

cat /etc/os-release 
PRETTY_NAME="Ubuntu 22.04.4 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.4 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy

Kernel version

uname -a
Linux hostname 5.15.0-118-generic #128-Ubuntu SMP Fri Jul 5 09:28:59 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

Hardware information

lspci |grep -i mellanox
83:00.0 Infiniband controller: Mellanox Technologies MT28908 Family [ConnectX-6]
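On nodes with several adapters it can help to pull just the PCI addresses out of the lspci listing. The following is a minimal sketch; the function name mellanox_pci_addrs is illustrative and not part of any Mellanox tooling.

```shell
# Read `lspci` output on stdin and print the PCI address (first column)
# of every line that mentions Mellanox, case-insensitively.
mellanox_pci_addrs() (
    awk 'tolower($0) ~ /mellanox/ { print $1 }'
)
```

Typical use would be: lspci | mellanox_pci_addrs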

Download the IB NIC driver

curl -x socks5://xxx:xx -LO https://content.mellanox.com/ofed/MLNX_OFED-24.04-0.7.0.0/MLNX_OFED_LINUX-24.04-0.7.0.0-ubuntu22.04-x86_64.tgz
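The tarball name encodes the OFED version, the distribution, and the architecture, so the URL can be derived from /etc/os-release instead of being typed by hand. The sketch below assumes that other releases follow the same naming pattern as the URL above, which is not guaranteed. The function name mlnx_ofed_url and its arguments are illustrative.

```shell
# Build the MLNX_OFED download URL from an OFED version and an os-release file.
# Assumption: the naming pattern of the URL above also applies to other releases.
# Arguments: <ofed-version> [os-release path] [architecture]
mlnx_ofed_url() (
    version="$1"
    os_release="${2:-/etc/os-release}"
    arch="${3:-$(uname -m)}"
    # os-release is a shell-compatible file, so sourcing it sets ID and VERSION_ID.
    # The function body is a subshell, so the caller's environment is untouched.
    . "$os_release"
    printf 'https://content.mellanox.com/ofed/MLNX_OFED-%s/MLNX_OFED_LINUX-%s-%s%s-%s.tgz\n' \
        "$version" "$version" "$ID" "$VERSION_ID" "$arch"
)
```

On the host described here, curl -LO "$(mlnx_ofed_url 24.04-0.7.0.0)" should request the same file as the command above (add the -x proxy option if a proxy is needed).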

Unpack the archive

tar zxf MLNX_OFED_LINUX-24.04-0.7.0.0-ubuntu22.04-x86_64.tgz
cd MLNX_OFED_LINUX-24.04-0.7.0.0-ubuntu22.04-x86_64/

Install

./mlnxofedinstall
Logs dir: /tmp/MLNX_OFED_LINUX.17643.logs
General log file: /tmp/MLNX_OFED_LINUX.17643.logs/general.log

Below is the list of MLNX_OFED_LINUX packages that you have chosen
(some may have been added by the installer due to package dependencies):

ofed-scripts
mlnx-tools
mlnx-ofed-kernel-utils
mlnx-ofed-kernel-dkms
iser-dkms
isert-dkms
srp-dkms
rdma-core
libibverbs1
ibverbs-utils
ibverbs-providers
libibverbs-dev
libibverbs1-dbg
libibumad3
libibumad-dev
ibacm
librdmacm1
rdmacm-utils
librdmacm-dev
mstflint
ibdump
libibmad5
libibmad-dev
libopensm
opensm
opensm-doc
libopensm-devel
libibnetdisc5
infiniband-diags
mft
kernel-mft-dkms
perftest
ibutils2
ibsim
ibsim-doc
ucx
sharp
hcoll
knem-dkms
knem
openmpi
mpitests
dpcp
srptools
mlnx-ethtool
mlnx-iproute2
rshim
ibarr

This program will install the MLNX_OFED_LINUX package on your machine.
Note that all other Mellanox, OEM, OFED, RDMA or Distribution IB packages will be removed.
Those packages are removed due to conflicts with MLNX_OFED_LINUX, do not reinstall them.

Do you want to continue?[y/N]:y

Checking SW Requirements...
One or more required packages for installing MLNX_OFED_LINUX are missing.
Attempting to install the following missing packages:
swig quilt libltdl-dev autotools-dev automake flex libfuse2 debhelper libnl-route-3-200 libgfortran5 autoconf libnl-3-dev chrpath pkg-config bison tk m4 libnl-route-3-dev graphviz gfortran dkms
Removing old packages...
Installing new packages
Installing ofed-scripts-24.04.OFED.24.04.0.7.0...
Installing mlnx-tools-24.04.0.2404066...
Installing mlnx-ofed-kernel-utils-24.04.OFED.24.04.0.7.0.1...
Installing mlnx-ofed-kernel-dkms-24.04.OFED.24.04.0.7.0.1...
Installing iser-dkms-24.04.OFED.24.04.0.7.0.1...
Installing isert-dkms-24.04.OFED.24.04.0.7.0.1...
Installing srp-dkms-24.04.OFED.24.04.0.7.0.1...
Installing rdma-core-2404mlnx51...
Installing libibverbs1-2404mlnx51...
Installing ibverbs-utils-2404mlnx51...
Installing ibverbs-providers-2404mlnx51...
Installing libibverbs-dev-2404mlnx51...
Installing libibverbs1-dbg-2404mlnx51...
Installing libibumad3-2404mlnx51...
Installing libibumad-dev-2404mlnx51...
Installing ibacm-2404mlnx51...
Installing librdmacm1-2404mlnx51...
Installing rdmacm-utils-2404mlnx51...
Installing librdmacm-dev-2404mlnx51...
Installing mstflint-4.16.1...
Installing ibdump-6.0.0...
Installing libibmad5-2404mlnx51...
Installing libibmad-dev-2404mlnx51...
Installing libopensm-5.19.0.MLNX20240421.b7c161a9...
Installing opensm-5.19.0.MLNX20240421.b7c161a9...
Installing opensm-doc-5.19.0.MLNX20240421.b7c161a9...
Installing libopensm-devel-5.19.0.MLNX20240421.b7c161a9...
Installing libibnetdisc5-2404mlnx51...
Installing infiniband-diags-2404mlnx51...
Installing mft-4.28.0...
Installing kernel-mft-dkms-4.28.0.92...
Installing perftest-24.04.0...
Installing ibutils2-2.1.1...
Installing ibsim-0.12...
Installing ibsim-doc-0.12...
Installing ucx-1.17.0...
Installing sharp-3.7.0.MLNX20240421.48444036...
Installing hcoll-4.8.3227...
Installing knem-dkms-1.1.4.90mlnx3...
Installing knem-1.1.4.90mlnx3...
Installing openmpi-4.1.7a1...
Installing mpitests-3.2.23...
Installing dpcp-1.1.48...
Installing srptools-2404mlnx51...
Installing mlnx-ethtool-6.7...
Installing mlnx-iproute2-6.7.0...
Installing rshim-2.0.28...
Installing ibarr-0.1.3...
Selecting previously unselected package mlnx-fw-updater.
(Reading database ... 126356 files and directories currently installed.)
Preparing to unpack .../mlnx-fw-updater_24.04-0.7.0.0_amd64.deb ...
Unpacking mlnx-fw-updater (24.04-0.7.0.0) ...
Setting up mlnx-fw-updater (24.04-0.7.0.0) ...

Added 'RUN_FW_UPDATER_ONBOOT=no to /etc/infiniband/openib.conf

Initializing...
Attempting to perform Firmware update...
Querying Mellanox devices firmware ...

Device #1:
----------

  Device Type:      ConnectX6
  Part Number:      MCX653105A-HDA_Ax
  Description:      ConnectX-6 VPI adapter card; HDR IB (200Gb/s) and 200GbE; single-port QSFP56; PCIe4.0 x16; tall bracket; ROHS R6
  PSID:             MT_00003
  PCI Device Name:  83:00.0
  Base GUID:        58a2e652
  Versions:         Current        Available     
     FW             20.39.1002     20.41.1000    
     PXE            3.7.0201       3.7.0400      
     UEFI           14.32.0012     14.34.0012    

  Status:           Update required

---------
Found 1 device(s) requiring firmware update...

Device #1: Updating FW ...     
FSMST_INITIALIZE -   OK          
Writing Boot image component -   OK
Done

Restart needed for updates to take effect.
Log File: /tmp/s59xQUEIEL
Real log file: /tmp/MLNX_OFED_LINUX.17643.logs/fw_update.log
Device (83:00.0):
	83:00.0 Infiniband controller: Mellanox Technologies MT28908 Family [ConnectX-6]
	Link Width: x16
	PCI Link Speed: 16GT/s

Installation passed successfully
To load the new driver, run:
/etc/init.d/openibd restart
root@host:~/MLNX_OFED_LINUX-24.04-0.7.0.0-ubuntu22.04-x86_64# /etc/init.d/openibd restart
Unloading HCA driver:                                      [  OK  ]
Loading HCA driver and Access Layer:                       [  OK  ]
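
After the restart, the installed release can be compared against the tarball version. This sketch assumes that ofed_info -s prints a single line of the form MLNX_OFED_LINUX-<version>: and extracts the version from it. The function name ofed_version is illustrative.

```shell
# Read `ofed_info -s` output on stdin and print the bare version string.
# Assumed input form: "MLNX_OFED_LINUX-24.04-0.7.0.0:" (trailing colon optional).
ofed_version() (
    sed -n 's/^MLNX_OFED_LINUX-\([0-9][0-9.-]*\).*$/\1/p' | head -n 1
)
```

For example, [ "$(ofed_info -s | ofed_version)" = "24.04-0.7.0.0" ] could serve as a check in a provisioning script.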

Test

root@hostname:~# hca_self_test.ofed

---- Performing Adapter Device Self Test ----
Number of CAs Detected ................. 1
PCI Device Check ....................... PASS
Kernel Arch ............................ x86_64
Host Driver Version .................... MLNX_OFED_LINUX-24.04-0.7.0.0 (OFED-24.04-0.7.0): 5.15.0-118-generic
Host Driver RPM Check .................. PASS
Firmware on CA #0 HCA .................. v20.41.1000
20.39.1002
Host Driver Initialization ............. PASS
Number of CA Ports Active .............. 1
Port State of Port #1 on CA #0 (HCA)..... UP 4X HDR (InfiniBand)
Error Counter Check on CA #0 (HCA)...... PASS
Kernel Syslog Check .................... PASS
Node GUID on CA #0 (HCA) ............... 58:a2:d6:52
------------------ DONE ---------------------

Verify

root@host:~# ibstat
CA 'mlx5_0'
	CA type: MT4123
	Number of ports: 1
	Firmware version: 20.41.1000
	Hardware version: 0
	Node GUID: 0x58a2e103052
	System image GUID: 0x58a2e10300a9d652
	Port 1:
		State: Active
		Physical state: LinkUp
		Rate: 200
		Base lid: 9
		LMC: 0
		SM lid: 1
		Capability mask: 0xa6518
		Port GUID: 0x58a2e0a9d652
		Link layer: InfiniBand
root@host:~# ibstatus
Infiniband device 'mlx5_0' port 1 status:
	default gid:	 fe80:0000:0000:0000:58a2:00a9:d652
	base lid:	 0x9
	sm lid:		 0x1
	state:		 4: ACTIVE
	phys state:	 5: LinkUp
	rate:		 200 Gb/sec (4X HDR)
	link_layer:	 InfiniBand
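
When many nodes need checking, the same verification can be scripted. The sketch below reads ibstat output and succeeds only when the port is Active with a physical state of LinkUp, printing the rate. It is written for a single-port adapter like the one above; with several ports it reports the last one listed. The function name ib_port_ok is illustrative.

```shell
# Read `ibstat` output on stdin. Exit 0 and print the rate if the port
# reports "State: Active" and "Physical state: LinkUp"; exit 1 otherwise.
ib_port_ok() (
    awk '
        /^[ \t]*State:/          { state = $2 }
        /^[ \t]*Physical state:/ { phys = $3 }
        /^[ \t]*Rate:/           { rate = $2 }
        END {
            if (state == "Active" && phys == "LinkUp") { print rate; exit 0 }
            exit 1
        }
    '
)
```

Usage: ibstat | ib_port_ok || echo "IB port is not up"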

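Assigning an IP address to the IB interface works the same way as for an ordinary NIC.

Note that only the IP address and netmask need to be configured; no gateway is required.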

cat /etc/netplan/00-installer-config.yaml
# This is the network config written by 'subiquity'
network:
  ethernets:
    ens1f0:
      dhcp4: false
      addresses: [xxx]
      optional: true
      routes:
        - to: default
          via: xxx
      nameservers:
        addresses:
          - xxx
    ibs108: # configuration for the IB NIC
      dhcp4: false
      addresses: [xxx]
      nameservers:
        addresses:
          - xxx
    ens1f1:
      dhcp4: true
    ens5f0:
      dhcp4: false
    ens5f1:
      dhcp4: false
    ens5f2:
      dhcp4: false
    ens5f3:
      dhcp4: false
    enxda5cac3f11ab:
      dhcp4: true
  version: 2

Then run netplan apply to activate the configuration.
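
For a host where only the IB interface needs to be added, the relevant stanza is small. The sketch below prints a standalone netplan document with just an address and prefix length and no gateway. The interface name ibs108 comes from the configuration above; the address 192.0.2.10/24 and the file name in the usage note are placeholders, and the function name render_ib_netplan is illustrative.

```shell
# Print a minimal netplan document for one IB interface.
# Only an address with prefix length is set; no routes or gateway.
# Arguments: <interface> <address/prefix>
render_ib_netplan() (
    iface="$1"
    cidr="$2"
    cat <<EOF
network:
  version: 2
  ethernets:
    ${iface}:
      dhcp4: false
      addresses: [${cidr}]
EOF
)
```

For example: render_ib_netplan ibs108 192.0.2.10/24 > /etc/netplan/60-ib.yaml, followed by netplan apply.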

The IB NIC can be tuned further with an optimization script, for example by increasing the MTU.
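
As one example of such tuning: in IPoIB datagram mode the interface MTU is the IB link MTU minus the 4-byte IPoIB encapsulation header, so a 4096-byte IB MTU allows an interface MTU of 4092. The sketch below computes that value and prints the ip link command rather than running it. It assumes datagram mode and that the fabric is configured to permit the larger IB MTU, neither of which this post confirms. The function names are illustrative.

```shell
# IPoIB datagram-mode MTU = IB link MTU minus the 4-byte IPoIB header.
ipoib_mtu() (
    ib_mtu="$1"
    echo $((ib_mtu - 4))
)

# Print (do not run) the command that would set the MTU on an interface.
# Arguments: <interface> <IB link MTU in bytes>
ipoib_mtu_cmd() (
    iface="$1"
    ib_mtu="$2"
    printf 'ip link set dev %s mtu %s\n' "$iface" "$(ipoib_mtu "$ib_mtu")"
)
```

Printing the command first makes it easy to review before applying it on a production node.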

Test parameters will be added later.

IB switch configuration

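When connecting to the IB switch over a serial console cable, set the baud rate to 115200. Check this against the specific model, since some models use 9600. Nothing else needs special attention.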
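
From a Linux workstation, one common way to open the console is screen(1). The sketch below only builds the command line. The device path /dev/ttyUSB0 is an assumption (a typical name for a USB serial adapter) and the function name console_cmd is illustrative.

```shell
# Print the screen(1) command for a serial console session.
# Arguments: [device] [baud]; defaults are /dev/ttyUSB0 (assumed) and 115200.
# Only the two baud rates mentioned in this post are accepted.
console_cmd() (
    dev="${1:-/dev/ttyUSB0}"
    baud="${2:-115200}"
    case "$baud" in
        9600|115200) ;;
        *) echo "unexpected baud rate: $baud" >&2; exit 1 ;;
    esac
    printf 'screen %s %s\n' "$dev" "$baud"
)
```

If the console shows garbled characters at 115200, trying console_cmd /dev/ttyUSB0 9600 is a reasonable next step.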

Configuration is much like that of an ordinary switch, but confirm that the subnet manager (SM) is enabled and running.
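
From a host attached to the fabric, a rough way to confirm that a subnet manager is reachable is the SM lid field of ibstat, which was 1 in the host output earlier. A value of 0 typically means no SM has configured the port yet. The sketch below extracts that field; the function name sm_lid_present is illustrative.

```shell
# Read `ibstat` output on stdin and print the first "SM lid" value.
# Exit non-zero if the field is missing or 0 (no subnet manager seen).
sm_lid_present() (
    lid="$(awk -F': *' '/^[ \t]*SM lid:/ { print $2; exit }')"
    [ -n "$lid" ] && [ "$lid" != "0" ] && printf '%s\n' "$lid"
)
```

Usage: ibstat | sm_lid_present || echo "no subnet manager detected"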

Configure an IP address for the management port.

Set the SSH login password.

<hostc>ssh <IP address>
Username: admin
Press CTRL+C to abort.
Connecting to <IP address> port 22.
Mellanox MLNX-OS Switch Management
Password: 
Enter a character ~ and a dot to abort.
Last login: Wed Aug 21 13:00:24 UTC 2024 from xxx on pts/0
Number of total successful connections since last 1 days: 15
Mellanox Switch
switch-4a128e [standalone: master] > 

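Enter privileged mode: enable
Enter configuration mode: conf t
Show the running configuration: show running-config
Check the port status, including whether the IB protocol is enabled: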

switch-4a128e [standalone: master] (config) # show interfaces ib status 
---------------------------------------------------------------------------------------------------------------------------------------------------------------
Interface      Description                                IB Subnet            Speed           Current line rate   Logical port state   Physical port state   
---------------------------------------------------------------------------------------------------------------------------------------------------------------
IB1/1                                                     infiniband-default   hdr             200.0 Gbps          Active               LinkUp                
IB1/2                                                     infiniband-default   hdr             200.0 Gbps          Active               LinkUp                
IB1/3                                                     infiniband-default   hdr             200.0 Gbps          Active               LinkUp                
IB1/4                                                     infiniband-default   -               -                   Down                 Polling               
IB1/5                                                     infiniband-default   -               -                   Down                 Polling               
IB1/6                                                     infiniband-default   -               -                   Down                 Polling               
IB1/7                                                     infiniband-default   -               -                   Down                 Polling               
IB1/8                                                     infiniband-default   -               -                   Down                 Polling               
IB1/9                                                     infiniband-default   hdr             200.0 Gbps          Active               LinkUp                
IB1/10                                                    infiniband-default   hdr             200.0 Gbps          Active               LinkUp                
IB1/11                                                    infiniband-default   -               -                   Down                 Polling               
IB1/12                                                    infiniband-default   hdr             200.0 Gbps          Active               LinkUp                
IB1/13                                                    infiniband-default   hdr             200.0 Gbps          Active               LinkUp                
IB1/14                                                    infiniband-default   -               -                   Down                 Polling               
IB1/15                                                    infiniband-default   -               -                   Down                 Polling               
IB1/16                                                    infiniband-default   -               -                   Down                 Polling               
IB1/17                                                    infiniband-default   hdr             200.0 Gbps          Active               LinkUp                
IB1/18                                                    infiniband-default   hdr             200.0 Gbps          Active               LinkUp                
IB1/19                                                    infiniband-default   hdr             200.0 Gbps          Active               LinkUp                
IB1/20                                                    infiniband-default   -               -                   Down                 Polling               
(remaining output omitted)
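
On a switch with many ports, a saved copy of this table can be summarized on a workstation. The sketch below assumes the output was captured to a text file and keeps the column layout shown above, where the logical port state is the second-to-last field. The function name ib_port_summary is illustrative.

```shell
# Read `show interfaces ib status` output on stdin and print
# "<active> <total>" port counts. Data rows are recognized by an
# interface name of the form IB<n>/<m> in the first column.
ib_port_summary() (
    awk '
        $1 ~ /^IB[0-9]+\/[0-9]+/ {
            total++
            if ($(NF-1) == "Active") active++
        }
        END { printf "%d %d\n", active, total }
    '
)
```

Usage: ib_port_summary < ib_status.txt, where ib_status.txt is a hypothetical capture file.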

Save the configuration
switch-4a128e [standalone: master] # write memory
Show the serial number (SN)
switch-4a128e [standalone: master] # show inventory

References

IB NIC configuration
https://blog.csdn.net/laijianzong/article/details/127545152

Introduction to Mellanox Technologies Ltd
https://36kr.com/p/2368067301091716

IB switch configuration reference
https://www.hua-hang.cn/case/269.html

IB architecture learning reference
https://blog.csdn.net/sz_woshishazi/category_12032159.html
