SUSE HA: resolving the "partition WITHOUT quorum" and corosync "[TOTEM ] Digest does not match" errors

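After a node in a SUSE 15 HA cluster was rebooted, the cluster services did not come back up properly. The cluster was in the "partition WITHOUT quorum" state, and each of the two nodes reported the other node as "UNCLEAN (offline)", as shown below: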

crm status output on node 1:

hanadb01:~ # crm status
Status of pacemakerd: 'Pacemaker is running' (last updated 2024-05-29 09:18:06 +08:00)
Cluster Summary:
  * Stack: corosync
  * Current DC: hanadb01 (version 2.1.5+20221208.a3f44794f-150500.4.9-2.1.5+20221208.a3f44794f) - partition WITHOUT quorum
  * Last updated: Wed May 29 09:18:07 2024
  * Last change:  Wed May 29 09:11:22 2024 by hacluster via crmd on hanadb01
  * 2 nodes configured
  * 7 resource instances configured

Node List:
  * Node hanadb02: UNCLEAN (offline)
  * Online: [ hanadb01 ]

Full List of Resources:
  * stonith-sbd (stonith:external/sbd):  Stopped
  * Clone Set: cln_SAPHanaTopology_HA1_HDB10 [rsc_SAPHanaTopology_HA1_HDB10]:
    * Stopped: [ hanadb01 hanadb02 ]
  * Clone Set: msl_SAPHana_HA1_HDB10 [rsc_SAPHana_HA1_HDB10] (promotable):
    * Stopped: [ hanadb01 hanadb02 ]
  * rsc_ip_HA1_HDB10    (ocf::heartbeat:IPaddr2):        Stopped
  * rsc_ip_HA1_HDB10_readenabled        (ocf::heartbeat:IPaddr2):        Stopped

crm status output on node 2:

hanadb02:~ # crm status
Status of pacemakerd: 'Pacemaker is running' (last updated 2024-05-29 09:26:45 +08:00)
Cluster Summary:
  * Stack: corosync
  * Current DC: hanadb02 (version 2.1.5+20221208.a3f44794f-150500.4.9-2.1.5+20221208.a3f44794f) - partition WITHOUT quorum
  * Last updated: Wed May 29 09:26:46 2024
  * Last change:  Wed May 29 09:21:57 2024 by root via cibadmin on hanadb02
  * 2 nodes configured
  * 7 resource instances configured

Node List:
  * Node hanadb01: UNCLEAN (offline)
  * Online: [ hanadb02 ]

Full List of Resources:
  * stonith-sbd (stonith:external/sbd):  Stopped
  * Clone Set: cln_SAPHanaTopology_HA1_HDB10 [rsc_SAPHanaTopology_HA1_HDB10]:
    * Stopped: [ hanadb01 hanadb02 ]
  * Clone Set: msl_SAPHana_HA1_HDB10 [rsc_SAPHana_HA1_HDB10] (promotable):
    * Stopped: [ hanadb01 hanadb02 ]
  * rsc_ip_HA1_HDB10    (ocf::heartbeat:IPaddr2):        Stopped
  * rsc_ip_HA1_HDB10_readenabled        (ocf::heartbeat:IPaddr2):        Stopped

Checking the status of the corosync service shows the "[TOTEM ] Digest does not match" error:

hanadb02:~ # systemctl status corosync
● corosync.service - Corosync Cluster Engine
     Loaded: loaded (/usr/lib/systemd/system/corosync.service; disabled; vendor preset: disabled)
     Active: active (running) since Wed 2024-05-29 09:23:34 CST; 4min 27s ago
       Docs: man:corosync
             man:corosync.conf
             man:corosync_overview
    Process: 1954 ExecStart=/usr/share/corosync/corosync start (code=exited, status=0/SUCCESS)
   Main PID: 2166 (corosync)
      Tasks: 2
     CGroup: /system.slice/corosync.service
             └─ 2166 corosync

May 29 09:28:00 hanadb02 corosync[2166]:   [TOTEM ] Invalid packet data
May 29 09:28:00 hanadb02 corosync[2166]:   [TOTEM ] Digest does not match
May 29 09:28:00 hanadb02 corosync[2166]:   [TOTEM ] Received message has invalid digest... ignoring.
May 29 09:28:00 hanadb02 corosync[2166]:   [TOTEM ] Invalid packet data
May 29 09:28:01 hanadb02 corosync[2166]:   [TOTEM ] Digest does not match
May 29 09:28:01 hanadb02 corosync[2166]:   [TOTEM ] Received message has invalid digest... ignoring.
May 29 09:28:01 hanadb02 corosync[2166]:   [TOTEM ] Invalid packet data
May 29 09:28:01 hanadb02 corosync[2166]:   [TOTEM ] Digest does not match
May 29 09:28:01 hanadb02 corosync[2166]:   [TOTEM ] Received message has invalid digest... ignoring.
May 29 09:28:01 hanadb02 corosync[2166]:   [TOTEM ] Invalid packet data

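Inspection showed that the contents of /etc/corosync/authkey differed between the two cluster hosts. corosync uses this key to authenticate totem messages, so with mismatched keys each node discards the other's packets and forms its own partition without quorum. Copying the file from one node to the other and then restarting the corosync service restored the cluster.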
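
The check and fix described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the helper name same_digest and the file names authkey.hanadb01 / authkey.hanadb02 are hypothetical, the host names are taken from the output above, and root SSH access between the nodes is assumed. Comparing digests avoids printing the binary key itself.

```shell
#!/bin/sh
# Sketch: decide whether two copies of the corosync authkey are identical
# by comparing their SHA-256 digests. same_digest is a hypothetical helper.
# Returns 0 if the digests match, 1 if they differ, 2 if a file is unreadable.
same_digest() {
    [ -r "$1" ] && [ -r "$2" ] || return 2
    a=$(sha256sum "$1" | cut -d' ' -f1)
    b=$(sha256sum "$2" | cut -d' ' -f1)
    [ "$a" = "$b" ]
}

# Manual equivalent on the cluster (assumes root SSH between the nodes):
#   sha256sum /etc/corosync/authkey                  # on hanadb01
#   ssh hanadb02 sha256sum /etc/corosync/authkey     # peer digest
#
# If the digests differ, copy the key from the node whose key should be kept
# (-p preserves the file mode) and restart corosync on the receiving node:
#   scp -p /etc/corosync/authkey hanadb02:/etc/corosync/authkey
#   ssh hanadb02 systemctl restart corosync
```

For example, with both keys collected into one directory, `same_digest authkey.hanadb01 authkey.hanadb02 && echo "keys match"` prints the message only when the two files are byte-identical. Which node's key to keep is a judgment call; in a two-node cluster either works as long as both ends use the same file afterwards.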


Reference:
corosync:[TOTEM] Digest does not match
