金仓数据库KingbaseES V8R6 命令行部署repmgr管理集群+switchover测试（四）switchover切换测试

本文链接：https://blog.csdn.net/arthemis_14/article/details/130807873

四、switchover切换测试：

1. 主节点查看集群状态

[kes86@node1 ~]$ repmgr cluster show
 ID | Name  | Role    | Status    | Upstream | Location  | Priority | Timeline | LSN_Lag | Connection string                                                                                                                                          
----+-------+---------+-----------+----------+-----------+----------+----------+---------+-------------------------------------------------------------------------------------------------------------------------------------------------------------
 1  | node1 | primary | * running |          | location1 | 100      | 7        |         | host=node1 user=repmgr password=repmgr dbname=repmgr port=54322 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
 2  | node2 | standby |   running | node1    | location1 | 100      | 7        | 0 bytes | host=node2 user=repmgr password=repmgr dbname=repmgr port=54322 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3

主节点查看服务状态

[kes86@node1 ~]$ repmgr service status
 ID | Name  | Role    | Status    | Upstream | repmgrd | PID  | Paused? | Upstream last seen
----+-------+---------+-----------+----------+---------+------+---------+--------------------
 1  | node1 | primary | * running |          | running | 2730 | no      | n/a                
 2  | node2 | standby |   running | node1    | running | 2017 | no      | 0 second(s) ago

2. 备节点查看集群状态

[kes86@node2 ~]$ repmgr cluster show
 ID | Name  | Role    | Status    | Upstream | Location  | Priority | Timeline | LSN_Lag | Connection string                                                                                                                                          
----+-------+---------+-----------+----------+-----------+----------+----------+---------+-------------------------------------------------------------------------------------------------------------------------------------------------------------
 1  | node1 | primary | * running |          | location1 | 100      | 7        |         | host=node1 user=repmgr password=repmgr dbname=repmgr port=54322 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
 2  | node2 | standby |   running | node1    | location1 | 100      | 7        | 0 bytes | host=node2 user=repmgr password=repmgr dbname=repmgr port=54322 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3


备节点查看服务状态

[kes86@node2 ~]$ repmgr service status
 ID | Name  | Role    | Status    | Upstream | repmgrd | PID  | Paused? | Upstream last seen
----+-------+---------+-----------+----------+---------+------+---------+--------------------
 1  | node1 | primary | * running |          | running | 2730 | no      | n/a                
 2  | node2 | standby |   running | node1    | running | 2017 | no      | 0 second(s) ago

3. 在备节点执行switchover切换命令：

repmgr standby switchover --dry-run
过程中没有有warning/error信息说明可以正常切换

[kes86@node2 ~]$ repmgr standby switchover  --dry-run
[NOTICE] checking switchover on node "node2" (ID: 2) in --dry-run mode
[INFO] ES connection to host "node1" succeeded
[INFO] able to execute "repmgr" on remote host "node1"
[INFO] 1 walsenders required, 32 available
[INFO] demotion candidate is able to make replication connection to promotion candidate
[INFO] The output from primary check cmd "repmgr node check --terse -LERROR --archive-ready --optformat" is: "--status=OK --files=14
"
[INFO] 14 pending archive files
[INFO] replication lag on this standby is 0 seconds
[INFO] 1 replication slots required, 32 available
[NOTICE] attempting to pause repmgrd on 2 nodes
[INFO] would pause repmgrd on node "node1" (ID: 1)
[INFO] would pause repmgrd on node "node2" (ID: 2)
[NOTICE] local node "node2" (ID: 2) would be promoted to primary; current primary "node1" (ID: 1) would be demoted to standby
[INFO] following shutdown command would be run on node "node1":
  "/home/kes86/kes86/cluster/bin/sys_ctl  -D '/home/kes86/data' -l /home/kes86/kes86/cluster/bin/logfile -W -m fast stop"
[INFO] parameter "shutdown_check_timeout" is set to 60 seconds
[INFO] prerequisites for executing STANDBY SWITCHOVER are met

正式切换

4. 在备节点执行：repmgr standby switchover

执行过程中会删除掉老主库的复制槽。

[kes86@node2 ~]$ repmgr standby switchover
[NOTICE] executing switchover on node "node2" (ID: 2)
[INFO] The output from primary check cmd "repmgr node check --terse -LERROR --archive-ready --optformat" is: "--status=OK --files=14
"
[NOTICE] attempting to pause repmgrd on 2 nodes
[INFO] pausing repmgrd on node "node1" (ID 1)
[INFO] pausing repmgrd on node "node2" (ID 2)
[NOTICE] local node "node2" (ID: 2) will be promoted to primary; current primary "node1" (ID: 1) will be demoted to standby
[NOTICE] stopping current primary node "node1" (ID: 1)
[NOTICE] issuing CHECKPOINT on node "node1" (ID: 1) 
[DETAIL] executing server command "/home/kes86/kes86/cluster/bin/sys_ctl  -D '/home/kes86/data' -l /home/kes86/kes86/cluster/bin/logfile -W -m fast stop"
[INFO] checking for primary shutdown; 1 of 60 attempts ("shutdown_check_timeout")
[INFO] checking for primary shutdown; 2 of 60 attempts ("shutdown_check_timeout")
[INFO] checking for primary shutdown; 3 of 60 attempts ("shutdown_check_timeout")
[INFO] checking for primary shutdown; 4 of 60 attempts ("shutdown_check_timeout")
[INFO] checking for primary shutdown; 5 of 60 attempts ("shutdown_check_timeout")
[INFO] checking for primary shutdown; 6 of 60 attempts ("shutdown_check_timeout")
[NOTICE] current primary has been cleanly shut down at location 0/D000028
[NOTICE] promoting standby to primary
[DETAIL] promoting server "node2" (ID: 2) using sys_promote()
[NOTICE] waiting for promotion to complete, replay lsn: 0/D0000A0
[INFO] SET synchronous TO "async" on primary host 
[NOTICE] STANDBY PROMOTE successful
[DETAIL] server "node2" (ID: 2) was successfully promoted to primary
[NOTICE] issuing CHECKPOINT
[NOTICE] node "node2" (ID: 2) promoted to primary, node "node1" (ID: 1) demoted to standby
[NOTICE] switchover was successful
[DETAIL] node "node2" is now primary and node "node1" is attached as standby
[INFO] unpausing repmgrd on node "node1" (ID 1)
[INFO] unpause node "node1" (ID 1) successfully
[INFO] unpausing repmgrd on node "node2" (ID 2)
[INFO] unpause node "node2" (ID 2) successfully
[NOTICE] STANDBY SWITCHOVER has completed successfully

执行成功后，查看集群状态

[kes86@node1 ~]$ repmgr cluster show
 ID | Name  | Role    | Status    | Upstream | Location  | Priority | Timeline | LSN_Lag | Connection string                                                                                                                                          
----+-------+---------+-----------+----------+-----------+----------+----------+---------+-------------------------------------------------------------------------------------------------------------------------------------------------------------
 1  | node1 | standby |   running | node2    | location1 | 100      | 7        | 0 bytes | host=node1 user=repmgr password=repmgr dbname=repmgr port=54322 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
 2  | node2 | primary | * running |          | location1 | 100      | 8        |         | host=node2 user=repmgr password=repmgr dbname=repmgr port=54322 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3

继续通过switchover切换回原来的主备角色：

[kes86@node1 ~]$ repmgr standby switchover
[NOTICE] executing switchover on node "node1" (ID: 1)
[INFO] The output from primary check cmd "repmgr node check --terse -LERROR --archive-ready --optformat" is: "--status=OK --files=12
"
[NOTICE] attempting to pause repmgrd on 2 nodes
[INFO] pausing repmgrd on node "node1" (ID 1)
[INFO] pausing repmgrd on node "node2" (ID 2)
[NOTICE] local node "node1" (ID: 1) will be promoted to primary; current primary "node2" (ID: 2) will be demoted to standby
[NOTICE] stopping current primary node "node2" (ID: 2)
[NOTICE] issuing CHECKPOINT on node "node2" (ID: 2) 
[DETAIL] executing server command "/home/kes86/kes86/cluster/bin/sys_ctl  -D '/home/kes86/data' -l /home/kes86/kes86/cluster/bin/logfile -W -m fast stop"
[INFO] checking for primary shutdown; 1 of 60 attempts ("shutdown_check_timeout")
[INFO] checking for primary shutdown; 2 of 60 attempts ("shutdown_check_timeout")
[INFO] checking for primary shutdown; 3 of 60 attempts ("shutdown_check_timeout")
[NOTICE] current primary has been cleanly shut down at location 0/E000028
[NOTICE] promoting standby to primary
[DETAIL] promoting server "node1" (ID: 1) using sys_promote()
[NOTICE] waiting for promotion to complete, replay lsn: 0/E0000A0
[INFO] SET synchronous TO "async" on primary host 
[NOTICE] STANDBY PROMOTE successful
[DETAIL] server "node1" (ID: 1) was successfully promoted to primary
[NOTICE] issuing CHECKPOINT
[NOTICE] node "node1" (ID: 1) promoted to primary, node "node2" (ID: 2) demoted to standby
[NOTICE] switchover was successful
[DETAIL] node "node1" is now primary and node "node2" is attached as standby
[INFO] unpausing repmgrd on node "node1" (ID 1)
[INFO] unpause node "node1" (ID 1) successfully
[INFO] unpausing repmgrd on node "node2" (ID 2)
[INFO] unpause node "node2" (ID 2) successfully
[NOTICE] STANDBY SWITCHOVER has completed successfully

5. 切换成功后查看数据库集群状态：

[kes86@node1 ~]$ repmgr cluster show
 ID | Name  | Role    | Status    | Upstream | Location  | Priority | Timeline | LSN_Lag | Connection string                                                                                                                                          
----+-------+---------+-----------+----------+-----------+----------+----------+---------+-------------------------------------------------------------------------------------------------------------------------------------------------------------
 1  | node1 | primary | * running |          | location1 | 100      | 9        |         | host=node1 user=repmgr password=repmgr dbname=repmgr port=54322 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
 2  | node2 | standby |   running | node1    | location1 | 100      | 8        | 0 bytes | host=node2 user=repmgr password=repmgr dbname=repmgr port=54322 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3

6. ksql登陆数据库

查看复制槽状态是否正常

[kes86@node1 ~]$ ksql -Usystem -dtest
ksql (V8.0)
输入 "help" 来获取帮助信息.

test=# \x
扩展显示已打开.
test=# select * from pg_replication_slots ;
-[ RECORD 1 ]-------+--------------
slot_name           | repmgr_slot_2
plugin              | 
slot_type           | physical
datoid              | 
database            | 
temporary           | f
active              | t
active_pid          | 7121
xmin                | 635
catalog_xmin        | 
restart_lsn         | 0/E001BC8
confirmed_flush_lsn | 

test=# select * from pg_stat_replication ;
-[ RECORD 1 ]----+------------------------------
pid              | 7121
usesysid         | 16384
usename          | repmgr
application_name | node2
client_addr      | 192.168.57.43
client_hostname  | 
client_port      | 45080
backend_start    | 2022-12-21 17:19:52.395706+08
backend_xmin     | 
state            | streaming
sent_lsn         | 0/E001BC8
write_lsn        | 0/E001BC8
flush_lsn        | 0/E001BC8
replay_lsn       | 0/E001BC8
write_lag        | 
flush_lag        | 
replay_lag       | 
sync_priority    | 1
sync_state       | quorum
reply_time       | 2022-12-21 17:24:53.675362+08

到此switchover切换测试验证通过。

建议：如果不使用autofailover，建议不要开启kbha服务。只需启动repmgrd守护进程就可以。

启动repmgrd守护进程命令(如果使用sys_monitor.sh管理Kingbase数据库启停,不用额外执行命令启动repmgrd服务,sys_monitor.sh脚本会自动启动repmgrd服务)

repmgrd -d -v -f /home/kes86/kes86/cluster/etc/repmgr.conf

测试使用非root权限sys_monitor.sh脚本启停数据库

--停止数据库

[kes86@node1 ~]$ sys_monitor.sh stop
2022-12-21 17:28:06 Ready to stop all DB ...
There is no service "node_export" running currently.
There is no service "postgres_ex" running currently.
There is no service "node_export" running currently.
There is no service "postgres_ex" running currently.
2022-12-21 17:28:13 begin to stop repmgrd on "[node1]".
2022-12-21 17:28:18 repmgrd on "[node1]" stop success.
2022-12-21 17:28:18 begin to stop repmgrd on "[node2]".
2022-12-21 17:28:19 repmgrd on "[node2]" stop success.
2022-12-21 17:28:19 begin to stop DB on "[node2]".
waiting for server to shut down.... done
server stopped
2022-12-21 17:28:20 DB on "[node2]" stop success.
2022-12-21 17:28:20 begin to stop DB on "[node1]".
waiting for server to shut down.... done
server stopped
2022-12-21 17:28:34 DB on "[node1]" stop success.
2022-12-21 17:28:34 Done. 

--启动数据库

[kes86@node1 ~]$ sys_monitor.sh start
2022-12-21 17:30:29 Ready to start all DB ...
2022-12-21 17:30:29 begin to start DB on "[node1]".
waiting for server to start.... done
server started
2022-12-21 17:30:57 execute to start DB on "[node1]" success, connect to check it.
2022-12-21 17:30:59 DB on "[node1]" start success.
2022-12-21 17:31:00 Try to ping trusted_servers on host node1 ...
2022-12-21 17:31:00 Try to ping trusted_servers on host node2 ...
2022-12-21 17:31:00 begin to start DB on "[node2]".
waiting for server to start.... done
server started
2022-12-21 17:31:02 execute to start DB on "[node2]" success, connect to check it.
2022-12-21 17:31:03 DB on "[node2]" start success.
 ID | Name  | Role    | Status    | Upstream | Location  | Priority | Timeline | LSN_Lag | Connection string                                                                                                                                          
----+-------+---------+-----------+----------+-----------+----------+----------+---------+-------------------------------------------------------------------------------------------------------------------------------------------------------------
 1  | node1 | primary | * running |          | location1 | 100      | 9        |         | host=node1 user=repmgr password=repmgr dbname=repmgr port=54322 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
 2  | node2 | standby |   running | node1    | location1 | 100      | 9        | 0 bytes | host=node2 user=repmgr password=repmgr dbname=repmgr port=54322 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
2022-12-21 17:31:04 The primary DB is started.
2022-12-21 17:31:04 begin to start repmgrd on "[node1]".
[2022-12-21 17:31:12] [NOTICE] using provided configuration file "/home/kes86/kes86/cluster/bin/../etc/repmgr.conf"
[2022-12-21 17:31:12] [NOTICE] redirecting logging output to "/home/kes86/kes86/cluster/log/hamgr.log"

2022-12-21 17:31:19 repmgrd on "[node1]" start success.
2022-12-21 17:31:19 begin to start repmgrd on "[node2]".
[2022-12-21 17:31:51] [NOTICE] using provided configuration file "/home/kes86/kes86/cluster/bin/../etc/repmgr.conf"
[2022-12-21 17:31:51] [NOTICE] redirecting logging output to "/home/kes86/kes86/cluster/log/hamgr.log"

2022-12-21 17:31:22 repmgrd on "[node2]" start success.
 ID | Name  | Role    | Status    | Upstream | repmgrd | PID  | Paused? | Upstream last seen
----+-------+---------+-----------+----------+---------+------+---------+--------------------
 1  | node1 | primary | * running |          | running | 8637 | no      | n/a                
 2  | node2 | standby |   running | node1    | running | 8708 | no      | 1 second(s) ago    
[2022-12-21 17:31:22] [NOTICE] redirecting logging output to "/home/kes86/kes86/cluster/log/kbha.log"

[2022-12-21 17:31:56] [NOTICE] redirecting logging output to "/home/kes86/kes86/cluster/log/kbha.log"

2022-12-21 17:31:26 Done.

更多信息，参见https://help.kingbase.com.cn/v8/index.html