KingbaseES V8R6 Cluster Operations Case Study: Using repmgr standby promote

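Case description:
In a disaster-recovery deployment, a remote standby located in another region does not promote itself to primary automatically. When the primary fails, or when a switchover is required for administrative reasons, the switch has to be performed manually. If the primary has already failed and the remote standby should become the new primary, run the following from the bin directory:
$bin/repmgr standby promote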

Applicable version: KingbaseES V8R6

Cluster node information:

 ID | Name    | Role    | Status    | Upstream | repmgrd | PID   | Paused? | Upstream last seen
----+---------+---------+-----------+----------+---------+-------+---------+--------------------
 1  | node101 | standby |   running | node102  | running | 19312 | no      | 1 second(s) ago
 2  | node102 | primary | * running |          | running | 20658 | no      | n/a

Primary/standby streaming replication status:

test=# select * from sys_stat_replication;
  pid  | usesysid | usename | application_name |  client_addr  | client_hostname | client_port |         backend_start         | backend_xmin |   state   |  sent_lsn  | write_lsn  | flush_lsn  | replay_lsn | write_lag | flush_lag | replay_lag | sync_priority | sync_state |          reply_time
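-------+----------+---------+------------------+---------------+-----------------+-------------+-------------------------------+--------------+-----------+------------+------------+------------+------------+-----------+-----------+------------+---------------+------------+-------------------------------
 20165 |       10 | system  | node101          | 192.168.1.101 |                 |       10747 | 2022-09-08 11:12:59.798843+08 |              | streaming | 4/4C0018A0 | 4/4C0018A0 | 4/4C0018A0 | 4/4C0018A0 |           |           |            |             1 | sync       | 2022-09-08 11:16:45.423742+08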
(1 row)
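
The sent_lsn, write_lsn, flush_lsn and replay_lsn columns use the PostgreSQL-style "high/low" hexadecimal LSN notation. The Python sketch below assumes that KingbaseES encodes a WAL position the same way as PostgreSQL (high 32 bits before the slash, low 32 bits after it) and converts these values to byte offsets so the replication lag can be computed; lsn_to_int and lag_bytes are hypothetical helpers, not KingbaseES functions.

```python
def lsn_to_int(lsn: str) -> int:
    """Convert an LSN such as '4/4C0018A0' into an absolute WAL byte position.

    Assumes the PostgreSQL encoding: high 32 bits / low 32 bits, both in hex.
    """
    high, low = lsn.split("/")
    return (int(high, 16) << 32) + int(low, 16)


def lag_bytes(ahead: str, behind: str) -> int:
    """Number of WAL bytes by which the `behind` LSN trails the `ahead` LSN."""
    return lsn_to_int(ahead) - lsn_to_int(behind)


# LSN values copied from the sys_stat_replication row above.
row = {"sent_lsn": "4/4C0018A0", "replay_lsn": "4/4C0018A0"}
print(lag_bytes(row["sent_lsn"], row["replay_lsn"]))  # 0: replay has caught up
```

A lag of 0 between sent_lsn and replay_lsn, together with state = streaming and sync_state = sync, indicates that the standby had replayed everything the primary had sent at the time of the query.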

Automatic failover is disabled (failover='manual' in repmgr.conf):

[kingbase@node102 bin]$ cat ../etc/repmgr.conf|grep failover
#failover='automatic'
failover='manual'
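
Because commented-out lines are ignored, only failover='manual' is in effect here. A minimal Python sketch of the same check follows; effective_failover is a hypothetical helper, and it assumes, purely for illustration and not as a statement about repmgr's own parser, that when several uncommented failover lines exist the last one wins.

```python
import re
from typing import Optional

# Matches an uncommented assignment such as: failover='manual'
FAILOVER_RE = re.compile(r"^\s*failover\s*=\s*'?(\w+)'?\s*(#.*)?$")


def effective_failover(conf_text: str) -> Optional[str]:
    """Return the failover mode set in repmgr.conf text, or None if unset."""
    mode = None
    for line in conf_text.splitlines():
        if line.lstrip().startswith("#"):
            continue  # commented out, e.g. #failover='automatic'
        match = FAILOVER_RE.match(line)
        if match:
            mode = match.group(1)  # assumption: a later line overrides an earlier one
    return mode


conf = "#failover='automatic'\nfailover='manual'\n"
print(effective_failover(conf))  # manual
```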

I. Simulate a crash of the primary database service

[kingbase@node102 bin]$ ./sys_ctl stop -D /data/kingbase/r6ha/data
waiting for server to shut down....... done

II. Check the standby status

1. Database processes (still running as a standby)

kingbase 19132     1  0 11:13 ?        00:00:00 /home/kingbase/cluster/R6HA/kha/kingbase/bin/kingbase -D /data/kingbase/r6ha/data
kingbase 19133 19132  0 11:13 ?        00:00:00 kingbase: logger
kingbase 19134 19132  0 11:13 ?        00:00:00 kingbase: startup   recovering 00000018000000040000004D
kingbase 19135 19132  0 11:13 ?        00:00:00 kingbase: checkpointer
kingbase 19136 19132  0 11:13 ?        00:00:00 kingbase: background writer
kingbase 19137 19132  0 11:13 ?        00:00:00 kingbase: stats collector
kingbase 19310 19132  0 11:13 ?        00:00:00 kingbase: system esrep 192.168.1.101(15211) idle
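
The startup process title shows the WAL segment currently being replayed. Assuming KingbaseES keeps the PostgreSQL naming scheme for WAL segment files (24 hexadecimal digits: timeline ID, then the high 32 bits of the WAL position, then the segment number within it), the timeline can be read straight from the name. The Python helper below is a hypothetical illustration, not a KingbaseES tool.

```python
from typing import Tuple


def parse_wal_segment(name: str) -> Tuple[int, int, int]:
    """Split a WAL segment file name into (timeline ID, log ID, segment number)."""
    if len(name) != 24:
        raise ValueError(f"unexpected WAL segment name: {name!r}")
    timeline = int(name[0:8], 16)
    log_id = int(name[8:16], 16)
    segment = int(name[16:24], 16)
    return timeline, log_id, segment


# Segment named in the standby's "startup recovering" process title above.
print(parse_wal_segment("00000018000000040000004D"))  # (24, 4, 77)
```

Under that assumption the standby is replaying timeline 24; after the promotion in step III, repmgr cluster show reports node101 on timeline 25.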

2. Check the standby's hamgr.log (to trace the failover handling)

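# The repmgrd process on the standby monitors the primary's database service with PQping(). Once PQping() returns "PQPING_NO_RESPONSE", repmgrd keeps retrying the connection, and after the configured number of attempts is exhausted it starts failover handling.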

[2022-09-08 11:18:10] [INFO] sleeping 6 seconds until next reconnection attempt
[2022-09-08 11:18:16] [INFO] checking state of node 2, 1 of 10 attempts
[2022-09-08 11:18:16] [DEBUG] is_server_available_params(): ping status for "user=system connect_timeout=10 dbname=esrep host=192.168.1.102 port=54321 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3 fallback_application_name=repmgr" is PQPING_NO_RESPONSE
[2022-09-08 11:18:16] [WARNING] unable to ping "user=system connect_timeout=10 dbname=esrep host=192.168.1.102 port=54321 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3 fallback_application_name=repmgr"
[2022-09-08 11:18:16] [DETAIL] PQping() returned "PQPING_NO_RESPONSE"
[2022-09-08 11:18:16] [INFO] sleeping 6 seconds until next reconnection attempt
........

[2022-09-08 11:19:10] [INFO] checking state of node 2, 10 of 10 attempts
[2022-09-08 11:19:10] [DEBUG] is_server_available_params(): ping status for "user=system connect_timeout=10 dbname=esrep host=192.168.1.102 port=54321 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3 fallback_application_name=repmgr" is PQPING_NO_RESPONSE
.......

# repmgrd is ready to start failover, but because failover='manual' is set in repmgr.conf, no automatic failover is performed.

[2022-09-08 11:19:10] [DEBUG] do_election(): electoral term is 1
[2022-09-08 11:19:10] [NOTICE] this node is not configured for automatic failover so will not be considered as promotion candidate, and will not follow the new primary
[2022-09-08 11:19:10] [DETAIL] "failover" is set to "manual" in repmgr.conf
[2022-09-08 11:19:10] [HINT] manually execute "repmgr standby follow" to have this node follow the new primary
[2022-09-08 11:19:10] [DEBUG] election result: NOT CANDIDATE
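
The log above shows a fixed retry budget (10 attempts, about 6 seconds apart) followed by an election in which a failover='manual' node declines to stand. In upstream repmgr these limits are set by the reconnect_attempts and reconnect_interval parameters; whether the KingbaseES build uses the same names is an assumption here. The Python sketch below only models the decision flow visible in hamgr.log and is not repmgrd's implementation; primary_lost and is_promotion_candidate are hypothetical names.

```python
import time
from typing import Callable


def primary_lost(ping: Callable[[], bool], attempts: int = 10,
                 interval: float = 6.0,
                 sleep: Callable[[float], None] = time.sleep) -> bool:
    """Return True if `ping` fails on every one of `attempts` tries.

    `ping` stands in for a PQping()-style check (True = server answered).
    """
    for attempt in range(1, attempts + 1):
        if ping():
            return False  # primary is reachable again; no failover
        if attempt < attempts:
            sleep(interval)  # wait before the next reconnection attempt
    return True


def is_promotion_candidate(failover_mode: str) -> bool:
    """With failover='manual' the node never stands in the election."""
    return failover_mode == "automatic"


# Simulate a primary that never answers, without really sleeping.
waits = []
if primary_lost(lambda: False, sleep=waits.append):
    print(len(waits), is_promotion_candidate("manual"))  # 9 False
```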

III. Promote the standby manually

1. Run the manual promotion

[kingbase@node101 bin]$ ./repmgr standby promote
DEBUG: connecting to: "user=system connect_timeout=10 dbname=esrep host=192.168.1.101 port=54321 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3 fallback_application_name=repmgr"
DEBUG: connecting to: "user=system connect_timeout=10 dbname=esrep host=192.168.1.102 port=54321 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3 fallback_application_name=repmgr"
DEBUG: connecting to: "user=system connect_timeout=10 dbname=esrep host=192.168.1.101 port=54321 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3 fallback_application_name=repmgr"
NOTICE: promoting standby to primary
DETAIL: promoting server "node101" (ID: 1) using sys_promote()
NOTICE: waiting up to 60 seconds (parameter "promote_check_timeout") for promotion to complete
DEBUG: setting node 1 as primary and marking existing primary as failed
NOTICE: STANDBY PROMOTE successful
DETAIL: server "node101" (ID: 1) was successfully promoted to primary
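
The NOTICE line shows that, after calling sys_promote(), repmgr waits up to promote_check_timeout (60 seconds) for the server to leave recovery. A minimal Python sketch of such a bounded polling loop follows. wait_for_promotion is hypothetical, and in_recovery stands in for whatever recovery-status query you would run against the node (for example a PostgreSQL-style pg_is_in_recovery() call; the exact function name in KingbaseES is not confirmed here). The 1-second poll interval is an assumption modeled on upstream repmgr's promote_check_interval.

```python
import time
from typing import Callable


def wait_for_promotion(in_recovery: Callable[[], bool],
                       timeout: float = 60.0,
                       interval: float = 1.0,
                       clock: Callable[[], float] = time.monotonic,
                       sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll until the server leaves recovery; give up after `timeout` seconds."""
    deadline = clock() + timeout
    while True:
        if not in_recovery():
            return True  # no longer in recovery: promotion completed
        if clock() >= deadline:
            return False  # still in recovery when the timeout expired
        sleep(interval)


# Fake clock so the example runs instantly: recovery ends on the third check.
now = [0.0]
states = iter([True, True, False])
ok = wait_for_promotion(lambda: next(states),
                        clock=lambda: now[0],
                        sleep=lambda s: now.__setitem__(0, now[0] + s))
print(ok, now[0])  # True 2.0
```

Injecting clock and sleep keeps the loop testable without real waiting; in production the defaults use time.monotonic and time.sleep.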

2. Check the database processes after promotion (now running as the primary)

[kingbase@node101 bin]$ ps -ef |grep kingbase

kingbase 19132     1  0 11:13 ?        00:00:00 /home/kingbase/cluster/R6HA/kha/kingbase/bin/kingbase -D /data/kingbase/r6ha/data
kingbase 19133 19132  0 11:13 ?        00:00:00 kingbase: logger
kingbase 19135 19132  0 11:13 ?        00:00:00 kingbase: checkpointer
kingbase 19136 19132  0 11:13 ?        00:00:00 kingbase: background writer
kingbase 19137 19132  0 11:13 ?        00:00:00 kingbase: stats collector
kingbase 19780 19132  0 11:13 ?        00:00:00 kingbase: system test ::1(33354) idle
kingbase 19784 19132  0 11:13 ?        00:00:00 kingbase: system esrep 192.168.1.101(15243) idle
kingbase 20826 19132  0 11:20 ?        00:00:00 kingbase: walwriter
kingbase 20827 19132  0 11:20 ?        00:00:00 kingbase: autovacuum launcher
kingbase 20828 19132  1 11:20 ?        00:00:00 kingbase: archiver   archiving 0000001500000002000000FC
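
Comparing the two ps listings: before promotion the instance runs a startup process in recovery; afterwards the startup process is gone and walwriter, autovacuum launcher and archiver processes have appeared. The Python heuristic below encodes that comparison. infer_role is hypothetical, and because it only inspects process titles it is a quick sanity check, not a substitute for repmgr cluster show.

```python
def infer_role(ps_output: str) -> str:
    """Guess an instance's role from `ps -ef | grep kingbase` output."""
    # Keep the process title after "kingbase:", e.g. "startup   recovering ...".
    titles = [line.split("kingbase:", 1)[1].strip()
              for line in ps_output.splitlines() if "kingbase:" in line]
    if any(t.startswith("startup") for t in titles):
        return "standby"  # startup process is still replaying WAL
    if any(t.startswith("walwriter") for t in titles):
        return "primary"  # walwriter appears only after promotion in the listings above
    return "unknown"


standby_ps = ("kingbase 19134 19132  0 11:13 ?  00:00:00 "
              "kingbase: startup   recovering 00000018000000040000004D")
primary_ps = "kingbase 20826 19132  0 11:20 ?  00:00:00 kingbase: walwriter"
print(infer_role(standby_ps), infer_role(primary_ps))  # standby primary
```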

IV. Rejoin the former primary to the cluster as a new standby

1. Create the standby signal file

[kingbase@node102 log]$ touch /data/kingbase/r6ha/data/standby.signal

2. Start the database service

[kingbase@node102 bin]$ ./sys_ctl start -D /data/kingbase/r6ha/data

3. Register the standby node

[kingbase@node102 bin]$ ./repmgr standby register --force
INFO: connecting to local node "node102" (ID: 2)
DEBUG: connecting to: "user=system connect_timeout=10 dbname=esrep host=192.168.1.102 port=54321 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3 fallback_application_name=repmgr"
INFO: connecting to primary database
DEBUG: connecting to: "user=system connect_timeout=10 dbname=esrep host=192.168.1.102 port=54321 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3 fallback_application_name=repmgr"
DEBUG: connecting to: "user=system connect_timeout=10 dbname=esrep host=192.168.1.101 port=54321 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3 fallback_application_name=repmgr"
DEBUG: remote_command():
  ssh -o Batchmode=yes -q -o ConnectTimeout=10 -o StrictHostKeyChecking=no -o ServerAliveInterval=2 -o ServerAliveCountMax=5 -p 22 192.168.1.101 /home/kingbase/cluster/R6HA/kha/kingbase/bin/kbha -A updateinfo
INFO: standby registration complete
NOTICE: standby node "node102" (ID: 2) successfully registered
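
Steps 1 to 3 above can be scripted. The Python sketch below is hypothetical: it replays exactly the three commands from this case (using the paths shown here as defaults), prints them, and executes them only when dry_run is False, running from the bin directory as the ./sys_ctl and ./repmgr calls above do and stopping at the first failure. It adds no checks of its own, so verify the node state as in step 4 afterwards.

```python
import subprocess
from typing import List

# Paths from this case study; adjust them for your own installation.
BIN_DIR = "/home/kingbase/cluster/R6HA/kha/kingbase/bin"
DATA_DIR = "/data/kingbase/r6ha/data"


def rejoin_commands(data_dir: str = DATA_DIR) -> List[List[str]]:
    """Return steps 1-3 (signal file, start, register) as argument lists."""
    return [
        ["touch", f"{data_dir}/standby.signal"],
        ["./sys_ctl", "start", "-D", data_dir],
        ["./repmgr", "standby", "register", "--force"],
    ]


def run(commands: List[List[str]], bin_dir: str = BIN_DIR,
        dry_run: bool = True) -> None:
    """Print each command; execute it from bin_dir only when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError, aborting the remaining steps.
            subprocess.run(cmd, cwd=bin_dir, check=True)


run(rejoin_commands())  # dry run: only prints the three commands
```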

4. Check the cluster node status

[kingbase@node102 bin]$ ./repmgr cluster show

 ID | Name    | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string                               
----+---------+---------+-----------+----------+----------+----------+----------+----------------------------------------------------------------------------------------------------------------------------------------------------
 1  | node101 | primary | * running |          | default  | 100      | 25       | host=192.168.1.101 user=system dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
 2  | node102 | standby |   running | node101  | default  | 100      | 24       | host=192.168.1.102 user=system dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
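
The expected result is one primary (node101) and node102 as a standby whose Upstream is node101. A small Python check of that condition is sketched below; parse_cluster_show and check_topology are hypothetical helpers that assume the pipe-delimited layout shown above (header row, dashed separator, one row per node).

```python
from typing import Dict, List


def parse_cluster_show(text: str) -> List[Dict[str, str]]:
    """Turn the table printed by `repmgr cluster show` into a list of dicts."""
    lines = [line for line in text.splitlines() if line.strip()]
    header = [cell.strip() for cell in lines[0].split("|")]
    # lines[1] is the ----+---- separator; every later line is one node.
    return [dict(zip(header, (cell.strip() for cell in line.split("|"))))
            for line in lines[2:]]


def check_topology(rows: List[Dict[str, str]]) -> List[str]:
    """Return a list of problems; an empty list means the topology looks right."""
    primaries = [r for r in rows if r["Role"] == "primary"]
    if len(primaries) != 1:
        return [f"expected exactly one primary, found {len(primaries)}"]
    primary = primaries[0]["Name"]
    return [f"{r['Name']} follows {r['Upstream'] or 'nothing'}, not {primary}"
            for r in rows
            if r["Role"] == "standby" and r["Upstream"] != primary]


table = """\
 ID | Name    | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string
----+---------+---------+-----------+----------+----------+----------+----------+------------------
 1  | node101 | primary | * running |          | default  | 100      | 25       | host=192.168.1.101
 2  | node102 | standby |   running | node101  | default  | 100      | 24       | host=192.168.1.102
"""
print(check_topology(parse_cluster_show(table)))  # []
```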

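V. Summary
In a KingbaseES V8R6 cluster, if the standby is not promoted automatically during failover, or if failover fails after the primary goes down, 'repmgr standby promote' can be used to promote the standby to primary manually and restore service access.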
