Removing a Powered-Off Node's Information from a Corosync+Pacemaker Cluster

Problem Statement

A node was powered off after a failure, and investigation showed that it cannot be repaired and rejoined to the cluster for the time being. The four-node cluster therefore has to become a three-node cluster: all information about the powered-off node must be removed so that the remaining three nodes keep running normally.
This article uses node3 as the powered-off node. The cluster status while node3 is down looks like this:

[huai@node0 ~]# pcs status
Cluster name: my_cluster
Cluster Summary:
  * Stack: corosync
  * Current DC: node2 (version 2.0.7-1.oe2205-ba59ce7147) - partition with quorum
  * Last updated: Wed Jan 11 17:55:03 2023
  * Last change:  Wed Jan 11 17:13:13 2023 by hacluster via crmd on node1
  * 4 nodes configured
  * 6 resource instances configured

Node List:
  * Online: [ node0 node1 node2 ]
  * OFFLINE: [ node3 ]

Full List of Resources:
  * Resource Group: skl_data_group:
    * skl_shared       (ocf::heartbeat:Filesystem):     Started node0
    * skl_metadata     (ocf::heartbeat:Filesystem):     Started node0
  * Resource Group: skl_service:
    * skl_mysql        (ocf::heartbeat:mysql):     Started node0
    * skl_tomcat        (ocf::heartbeat:tomcat):         Started node0
    * webip    (ocf::heartbeat:IPaddr2):        Started node0
  * httpd    (lsb:apache2):  Started node0

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled
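
Before removing anything, it can be useful to check how corosync currently counts votes: going from four configured nodes to three changes the quorum calculation, and the removal command below even warns that it cannot determine the effect on quorum. A minimal check, assuming the standard votequorum tooling shipped with corosync and pcs:

# Expected votes, total votes and quorum state as corosync sees them
corosync-quorumtool -s

# The same information, queried through pcs
pcs quorum status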

Analysis

Since the task is to remove a node, the obvious first choice is the remove subcommand of pcs. But is a plain remove enough on its own?

[huai@node0 ~]# pcs cluster node remove node3 --force
Error: Unable to connect to node3 (Failed to connect to node3 port 2224 after 2005 ms: No route to host), use --skip-offline to override
Warning: Unable to connect to node3 (Failed to connect to node3 port 2224 after 3066 ms: No route to host)
Warning: Unable to determine whether this action will cause a loss of the quorum
Error: Errors have occurred, therefore pcs is unable to continue

The attempt above makes the problem clear: even when removing the powered-off node3, the remove command still tries to contact node3. Since node3 is shut down, that connection can never succeed and the whole removal fails. In other words, a plain remove cannot work here.
So, can the cluster be told not to contact the powered-off node while removing it? It can.
A closer look at the error output reveals the key hint "use --skip-offline to override". In other words, when removing a node with remove, the --skip-offline option skips the attempt to contact the powered-off node, so the node's information can still be deleted from the cluster and the updated corosync.conf can be pushed to every remaining node.

Solution

[huai@node0 ~]# pcs cluster node remove node3 --force --skip-offline
Warning: Omitting node 'node3'
Warning: Unable to connect to node3 (Failed to connect to node3 port 2224 after 3130 ms: No route to host)
Warning: Unable to determine whether this action will cause a loss of the quorum
Destroying cluster on hosts: 'node3'...
Warning: Unable to connect to node3 (Failed to connect to node3 port 2224 after 3071 ms: No route to host)
Warning: Removed node 'node3' could not be reached and subsequently deconfigured. Run 'pcs cluster destroy' on the unreachable node.
Sending updated corosync.conf to nodes...
node2: Succeeded
node0: Succeeded
node1: Succeeded
node0: Corosync configuration reloaded
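
Before looking at the cluster status, it can also be worth confirming that each remaining node really received the updated configuration, since node3 itself could never be contacted. A minimal check, assuming the pcs cluster corosync subcommand is available in this pcs release:

# Print the corosync.conf currently in use on each remaining node;
# node3 should no longer appear in any nodelist section
pcs cluster corosync node0
pcs cluster corosync node1
pcs cluster corosync node2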

As the output shows, this command removes node3's information from the cluster cleanly. After node3 has been removed, the Pacemaker cluster status looks like this:

[huai@node0 ~]# pcs status
Cluster name: my_cluster
Cluster Summary:
  * Stack: corosync
  * Current DC: node0 (version 2.0.7-1.oe2205-ba59ce7147) - partition with quorum
  * Last updated: Wed Jan 11 16:17:49 2023
  * Last change:  Wed Jan 11 16:17:39 2023 by hacluster via crm_node on node2
  * 3 nodes configured
  * 6 resource instances configured

Node List:
  * Online: [ node0 node1 node2 ]

Full List of Resources:
  * Resource Group: skl_data_group:
    * skl_shared       (ocf::heartbeat:Filesystem):     Started node0
    * skl_metadata     (ocf::heartbeat:Filesystem):     Started node0
  * Resource Group: skl_service:
    * skl_mysql        (ocf::heartbeat:mysql):     Started node0
    * skl_tomcat        (ocf::heartbeat:tomcat):         Started node0
    * webip    (ocf::heartbeat:IPaddr2):        Started node0
  * httpd    (lsb:apache2):  Started node0

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled

After node3 has been removed, corosync.conf contains the following:

[huai@node0 ~]# cat /etc/corosync/corosync.conf
totem {
    version: 2
    cluster_name: my_cluster
    secauth: on
    transport: knet
    rrp_mode: passive
    crypto_cipher: aes256
    crypto_hash: sha256
}

nodelist {
    node {
        ring0_addr: 192.168.20.5
        ring1_addr: 192.168.21.5
        name: node0
        nodeid: 1
    }

    node {
        ring0_addr: 192.168.20.6
        ring1_addr: 192.168.21.6
        name: node1
        nodeid: 2
    }

    node {
        ring0_addr: 192.168.20.7
        ring1_addr: 192.168.21.7
        name: node2
        nodeid: 3
    }
}

quorum {
    provider: corosync_votequorum
}

logging {
    to_logfile: yes
    logfile: /var/log/cluster/corosync.log
    to_syslog: yes
    timestamp: on
}
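
One loose end remains: as the removal output warned ("Run 'pcs cluster destroy' on the unreachable node"), node3 itself was never deconfigured. A rough sketch of the follow-up steps once node3 can be powered on again; the addr values below are placeholders, not addresses taken from this cluster:

# On node3, once it is reachable again: wipe its stale cluster configuration
pcs cluster destroy

# On an active cluster node, only if the repaired node3 should rejoin later
# (re-authenticate first if this pcs release requires it, then re-add;
#  replace the addr placeholders with node3's real ring0/ring1 addresses)
pcs host auth node3
pcs cluster node add node3 addr=<ring0_addr> addr=<ring1_addr>
pcs cluster start node3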