Official reference: https://community.mellanox.com/s/article/understanding-subnet-manager–sm–high-availability–ha–on-mellanox-infiniband-switches
I. Mellanox SM HA Solution (Mellanox InfiniBand Switches)
- When enabling SM HA (configuration synchronization) on Mellanox IB switches, the SM database is synchronized with all the switches enabled with SM.
- The synchronization is done out-of-band over an Ethernet management network. All switches participating in SM HA should be connected to the same management subnet (same network), with no router between them, because the switches send multicast control frames that normally do not cross routers.
- All the switches that participate in the Mellanox SM HA are joined to the InfiniBand subnet ID. Once joined, the synchronized SMs are launched. One of the nodes is elected as SM Master and the others are Slaves.
- SM HA lets the system administrator enter and modify the InfiniBand SM configuration of all subnet managers from a single location using a Virtual IP (VIP). All subnet managers can be controlled, started, or stopped from this VIP address. SM configuration is expected to be done through the VIP; configuring SM parameters through an individual master or slave IP is disabled.
II. Lab environment
IB switch | IP |
---|---|
SF6036-01 | 172.16.0.251 |
SF6036-02 | 172.16.0.252 |
III. Configuration
1. Configure the cluster VIP
SF6036-01 [standalone: master] > enable
SF6036-01 [standalone: master] # config terminal
SF6036-01 [standalone: master] (config) # ib ha cluster ip 172.16.0.253 255.255.240.0
SF6036-01 [cluster: master] (config) #
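Because SM HA requires every participating switch to sit on the same management subnet, it can help to sanity-check the addressing before joining nodes. The following is a minimal sketch using Python's standard `ipaddress` module and the lab addresses from this article; the helper names `management_subnet` and `all_in_same_subnet` are made up for illustration and are not part of any Mellanox tooling.

```python
import ipaddress

# Addresses taken from the lab setup in this article.
NETMASK = "255.255.240.0"
VIP = "172.16.0.253"
SWITCH_IPS = {"SF6036-01": "172.16.0.251", "SF6036-02": "172.16.0.252"}


def management_subnet(ip, netmask):
    """Return the subnet that contains `ip` under `netmask` (hypothetical helper)."""
    # strict=False lets us pass a host address instead of the network address.
    return ipaddress.ip_network(f"{ip}/{netmask}", strict=False)


def all_in_same_subnet(vip, netmask, switch_ips):
    """True if every switch management IP falls inside the VIP's subnet."""
    subnet = management_subnet(vip, netmask)
    return all(ipaddress.ip_address(ip) in subnet for ip in switch_ips.values())


subnet = management_subnet(VIP, NETMASK)
print(subnet)  # 172.16.0.0/20
print(all_in_same_subnet(VIP, NETMASK, SWITCH_IPS))  # True
```

The netmask 255.255.240.0 is the same as the /20 prefix that the switch reports for the HA IP address in the `show ib ha` output.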
2. Add the second switch to the cluster
SF6036-02 [standalone: master] (config) # ib ha cluster
SF6036-02 [cluster: standby] (config) #
3. Enable the SM on both cluster nodes
SF6036-01 [cluster: master] (config) # ib smnode SF6036-01 enable
SF6036-01 [cluster: master] (config) # ib smnode SF6036-02 enable
4. Set the SM priority (0-15)
SF6036-01 [cluster: master] (config) # ib smnode SF6036-01 sm-priority 1
SF6036-01 [cluster: master] (config) # ib smnode SF6036-02 sm-priority 2
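IV. Check the cluster
To test failover, cut power to one of the IB switches: the Master role moves to the other switch and running workloads are not affected.
1. Check the IB high-availability state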
SF6036-01 [cluster: master] (config) # show ib ha
Global HA state
==================
IB Subnet HA name: cluster
HA IP address: 172.16.0.253/20
Active HA nodes: 2
HA node local information
Name: SF6036-01 (active) <--- (local node)
SM-HA state: master
IP: 172.16.0.251
Virtual switch membership: infiniband-default
HA node local information
Name: SF6036-02 (active)
SM-HA state: standby
IP: 172.16.0.252
Virtual switch membership: infiniband-default
SF6036-01 [cluster: master] (config) # show ib ha brief
Global HA state
==================
IB Subnet HA name: cluster
HA IP address: 172.16.0.253/20
Active HA nodes: 2
ID SM-HA state IP Virtual switch membership
--------------------------------------------------------------------------------
*SF6036-01 master 172.16.0.251 infiniband-default
SF6036-02 standby 172.16.0.252 infiniband-default
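If the brief output is captured to text (for example over an SSH session to the VIP), a small script can confirm that exactly one node reports `master`. This is only a sketch: it assumes the rows are whitespace-separated with four columns as shown above, and that the leading `*` marks the local node. `parse_ha_brief` is a hypothetical helper, not a Mellanox API.

```python
# Output copied from the `show ib ha brief` example in this article.
SAMPLE = """\
Global HA state
==================
IB Subnet HA name: cluster
HA IP address: 172.16.0.253/20
Active HA nodes: 2
ID SM-HA state IP Virtual switch membership
--------------------------------------------------------------------------------
*SF6036-01 master 172.16.0.251 infiniband-default
SF6036-02 standby 172.16.0.252 infiniband-default
"""


def parse_ha_brief(text):
    """Return one dict per node row found after the dashed separator line."""
    nodes = []
    in_table = False
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("---"):
            in_table = True  # node rows start after the separator
            continue
        if not in_table or not line:
            continue
        fields = line.split()
        if len(fields) != 4:
            continue  # skip anything that does not look like a node row
        name, state, ip, vswitch = fields
        nodes.append({
            "name": name.lstrip("*"),
            "local": name.startswith("*"),  # assumption: '*' marks the local node
            "state": state,
            "ip": ip,
            "vswitch": vswitch,
        })
    return nodes


nodes = parse_ha_brief(SAMPLE)
masters = [n["name"] for n in nodes if n["state"] == "master"]
print(masters)  # ['SF6036-01']
```

Running the same check before and after a failover test shows whether the master role actually moved.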
2. Check the IB SM state
SF6036-01 [cluster: master] (config) # show ib smnodes
HA state of switch infiniband-default
========================================
IB Subnet HA name: cluster
HA IP address: 172.16.0.253/20
Active HA nodes: 2
HA node local information
Name: SF6036-01 (active) <--- (local node)
SM-HA state: master
SM Licensed: yes
SM Running: running
SM Enabled: enabled - master
SM Priority: 1
IP: 172.16.0.251
HA node local information
Name: SF6036-02 (active)
SM-HA state: standby
SM Licensed: yes
SM Running: running
SM Enabled: enabled
SM Priority: 2
IP: 172.16.0.252
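The per-node `Key: value` layout of this output is easy to turn into a structure for scripted checks, such as flagging any node whose SM is not running. The sketch below assumes each node block starts at a `Name:` line, as in the sample above; `parse_smnodes` is a hypothetical helper.

```python
# Output copied from the `show ib smnodes` example in this article.
SAMPLE = """\
HA state of switch infiniband-default
========================================
IB Subnet HA name: cluster
HA IP address: 172.16.0.253/20
Active HA nodes: 2
HA node local information
Name: SF6036-01 (active) <--- (local node)
SM-HA state: master
SM Licensed: yes
SM Running: running
SM Enabled: enabled - master
SM Priority: 1
IP: 172.16.0.251
HA node local information
Name: SF6036-02 (active)
SM-HA state: standby
SM Licensed: yes
SM Running: running
SM Enabled: enabled
SM Priority: 2
IP: 172.16.0.252
"""


def parse_smnodes(text):
    """Group 'Key: value' lines into one dict per node, keyed off 'Name:'."""
    nodes = []
    current = None
    for line in text.splitlines():
        key, sep, value = line.strip().partition(":")
        if not sep:
            continue  # headings and separators carry no key/value pair
        key, value = key.strip(), value.strip()
        if key == "Name":
            # Keep only the hostname, dropping "(active)" and the local marker.
            current = {"Name": value.split()[0]}
            nodes.append(current)
        elif current is not None:
            current[key] = value  # global lines before the first node are skipped
    return nodes


nodes = parse_smnodes(SAMPLE)
not_running = [n["Name"] for n in nodes if n.get("SM Running") != "running"]
print(not_running)  # []
```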
3. Connect and check the state
At this point the cluster can be reached through the VIP, 172.16.0.253.
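A quick way to confirm from a management host that the VIP answers is a plain TCP connection attempt. The sketch below assumes SSH is enabled on the switches and listening on the default port 22; `is_reachable` is a hypothetical helper. It only checks the Ethernet management path and says nothing about traffic on the InfiniBand fabric itself.

```python
import socket


def is_reachable(host, port=22, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable networks.
        return False
```

Calling `is_reachable("172.16.0.253")` in a loop during a power-off test gives a rough view of how long the VIP is unavailable while the master role moves.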
Save the configuration (alternatively, click Save in the web interface):
SF6036-01 [cluster: master] (config) # write memory
SF6036-02 [cluster: standby] (config) # write memory