四、switchover切换测试:
1. 主节点查看集群状态
[kes86@node1 ~]$ repmgr cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | LSN_Lag | Connection string
----+-------+---------+-----------+----------+-----------+----------+----------+---------+-------------------------------------------------------------------------------------------------------------------------------------------------------------
1 | node1 | primary | * running | | location1 | 100 | 7 | | host=node1 user=repmgr password=repmgr dbname=repmgr port=54322 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
2 | node2 | standby | running | node1 | location1 | 100 | 7 | 0 bytes | host=node2 user=repmgr password=repmgr dbname=repmgr port=54322 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
主节点查看服务状态
[kes86@node1 ~]$ repmgr service status
ID | Name | Role | Status | Upstream | repmgrd | PID | Paused? | Upstream last seen
----+-------+---------+-----------+----------+---------+------+---------+--------------------
1 | node1 | primary | * running | | running | 2730 | no | n/a
2 | node2 | standby | running | node1 | running | 2017 | no | 0 second(s) ago
2. 备节点查看集群状态
[kes86@node2 ~]$ repmgr cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | LSN_Lag | Connection string
----+-------+---------+-----------+----------+-----------+----------+----------+---------+-------------------------------------------------------------------------------------------------------------------------------------------------------------
1 | node1 | primary | * running | | location1 | 100 | 7 | | host=node1 user=repmgr password=repmgr dbname=repmgr port=54322 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
2 | node2 | standby | running | node1 | location1 | 100 | 7 | 0 bytes | host=node2 user=repmgr password=repmgr dbname=repmgr port=54322 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
备节点查看服务状态
[kes86@node2 ~]$ repmgr service status
ID | Name | Role | Status | Upstream | repmgrd | PID | Paused? | Upstream last seen
----+-------+---------+-----------+----------+---------+------+---------+--------------------
1 | node1 | primary | * running | | running | 2730 | no | n/a
2 | node2 | standby | running | node1 | running | 2017 | no | 0 second(s) ago
3. 在备节点执行switchover切换命令:
repmgr standby switchover --dry-run
过程中没有有warning/error信息说明可以正常切换
[kes86@node2 ~]$ repmgr standby switchover --dry-run
[NOTICE] checking switchover on node "node2" (ID: 2) in --dry-run mode
[INFO] ES connection to host "node1" succeeded
[INFO] able to execute "repmgr" on remote host "node1"
[INFO] 1 walsenders required, 32 available
[INFO] demotion candidate is able to make replication connection to promotion candidate
[INFO] The output from primary check cmd "repmgr node check --terse -LERROR --archive-ready --optformat" is: "--status=OK --files=14
"
[INFO] 14 pending archive files
[INFO] replication lag on this standby is 0 seconds
[INFO] 1 replication slots required, 32 available
[NOTICE] attempting to pause repmgrd on 2 nodes
[INFO] would pause repmgrd on node "node1" (ID: 1)
[INFO] would pause repmgrd on node "node2" (ID: 2)
[NOTICE] local node "node2" (ID: 2) would be promoted to primary; current primary "node1" (ID: 1) would be demoted to standby
[INFO] following shutdown command would be run on node "node1":
"/home/kes86/kes86/cluster/bin/sys_ctl -D '/home/kes86/data' -l /home/kes86/kes86/cluster/bin/logfile -W -m fast stop"
[INFO] parameter "shutdown_check_timeout" is set to 60 seconds
[INFO] prerequisites for executing STANDBY SWITCHOVER are met
正式切换
4. 在备节点执行:repmgr standby switchover
执行过程中会删除掉老主库的复制槽。
[kes86@node2 ~]$ repmgr standby switchover
[NOTICE] executing switchover on node "node2" (ID: 2)
[INFO] The output from primary check cmd "repmgr node check --terse -LERROR --archive-ready --optformat" is: "--status=OK --files=14
"
[NOTICE] attempting to pause repmgrd on 2 nodes
[INFO] pausing repmgrd on node "node1" (ID 1)
[INFO] pausing repmgrd on node "node2" (ID 2)
[NOTICE] local node "node2" (ID: 2) will be promoted to primary; current primary "node1" (ID: 1) will be demoted to standby
[NOTICE] stopping current primary node "node1" (ID: 1)
[NOTICE] issuing CHECKPOINT on node "node1" (ID: 1)
[DETAIL] executing server command "/home/kes86/kes86/cluster/bin/sys_ctl -D '/home/kes86/data' -l /home/kes86/kes86/cluster/bin/logfile -W -m fast stop"
[INFO] checking for primary shutdown; 1 of 60 attempts ("shutdown_check_timeout")
[INFO] checking for primary shutdown; 2 of 60 attempts ("shutdown_check_timeout")
[INFO] checking for primary shutdown; 3 of 60 attempts ("shutdown_check_timeout")
[INFO] checking for primary shutdown; 4 of 60 attempts ("shutdown_check_timeout")
[INFO] checking for primary shutdown; 5 of 60 attempts ("shutdown_check_timeout")
[INFO] checking for primary shutdown; 6 of 60 attempts ("shutdown_check_timeout")
[NOTICE] current primary has been cleanly shut down at location 0/D000028
[NOTICE] promoting standby to primary
[DETAIL] promoting server "node2" (ID: 2) using sys_promote()
[NOTICE] waiting for promotion to complete, replay lsn: 0/D0000A0
[INFO] SET synchronous TO "async" on primary host
[NOTICE] STANDBY PROMOTE successful
[DETAIL] server "node2" (ID: 2) was successfully promoted to primary
[NOTICE] issuing CHECKPOINT
[NOTICE] node "node2" (ID: 2) promoted to primary, node "node1" (ID: 1) demoted to standby
[NOTICE] switchover was successful
[DETAIL] node "node2" is now primary and node "node1" is attached as standby
[INFO] unpausing repmgrd on node "node1" (ID 1)
[INFO] unpause node "node1" (ID 1) successfully
[INFO] unpausing repmgrd on node "node2" (ID 2)
[INFO] unpause node "node2" (ID 2) successfully
[NOTICE] STANDBY SWITCHOVER has completed successfully
执行成功后,查看集群状态
[kes86@node1 ~]$ repmgr cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | LSN_Lag | Connection string
----+-------+---------+-----------+----------+-----------+----------+----------+---------+-------------------------------------------------------------------------------------------------------------------------------------------------------------
1 | node1 | standby | running | node2 | location1 | 100 | 7 | 0 bytes | host=node1 user=repmgr password=repmgr dbname=repmgr port=54322 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
2 | node2 | primary | * running | | location1 | 100 | 8 | | host=node2 user=repmgr password=repmgr dbname=repmgr port=54322 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
继续通过switchover切换回原来的主备角色:
[kes86@node1 ~]$ repmgr standby switchover
[NOTICE] executing switchover on node "node1" (ID: 1)
[INFO] The output from primary check cmd "repmgr node check --terse -LERROR --archive-ready --optformat" is: "--status=OK --files=12
"
[NOTICE] attempting to pause repmgrd on 2 nodes
[INFO] pausing repmgrd on node "node1" (ID 1)
[INFO] pausing repmgrd on node "node2" (ID 2)
[NOTICE] local node "node1" (ID: 1) will be promoted to primary; current primary "node2" (ID: 2) will be demoted to standby
[NOTICE] stopping current primary node "node2" (ID: 2)
[NOTICE] issuing CHECKPOINT on node "node2" (ID: 2)
[DETAIL] executing server command "/home/kes86/kes86/cluster/bin/sys_ctl -D '/home/kes86/data' -l /home/kes86/kes86/cluster/bin/logfile -W -m fast stop"
[INFO] checking for primary shutdown; 1 of 60 attempts ("shutdown_check_timeout")
[INFO] checking for primary shutdown; 2 of 60 attempts ("shutdown_check_timeout")
[INFO] checking for primary shutdown; 3 of 60 attempts ("shutdown_check_timeout")
[NOTICE] current primary has been cleanly shut down at location 0/E000028
[NOTICE] promoting standby to primary
[DETAIL] promoting server "node1" (ID: 1) using sys_promote()
[NOTICE] waiting for promotion to complete, replay lsn: 0/E0000A0
[INFO] SET synchronous TO "async" on primary host
[NOTICE] STANDBY PROMOTE successful
[DETAIL] server "node1" (ID: 1) was successfully promoted to primary
[NOTICE] issuing CHECKPOINT
[NOTICE] node "node1" (ID: 1) promoted to primary, node "node2" (ID: 2) demoted to standby
[NOTICE] switchover was successful
[DETAIL] node "node1" is now primary and node "node2" is attached as standby
[INFO] unpausing repmgrd on node "node1" (ID 1)
[INFO] unpause node "node1" (ID 1) successfully
[INFO] unpausing repmgrd on node "node2" (ID 2)
[INFO] unpause node "node2" (ID 2) successfully
[NOTICE] STANDBY SWITCHOVER has completed successfully
5. 切换成功后查看数据库集群状态:
[kes86@node1 ~]$ repmgr cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | LSN_Lag | Connection string
----+-------+---------+-----------+----------+-----------+----------+----------+---------+-------------------------------------------------------------------------------------------------------------------------------------------------------------
1 | node1 | primary | * running | | location1 | 100 | 9 | | host=node1 user=repmgr password=repmgr dbname=repmgr port=54322 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
2 | node2 | standby | running | node1 | location1 | 100 | 8 | 0 bytes | host=node2 user=repmgr password=repmgr dbname=repmgr port=54322 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
6. ksql登陆数据库
查看复制槽状态是否正常
[kes86@node1 ~]$ ksql -Usystem -dtest
ksql (V8.0)
输入 "help" 来获取帮助信息.
test=# \x
扩展显示已打开.
test=# select * from pg_replication_slots ;
-[ RECORD 1 ]-------+--------------
slot_name | repmgr_slot_2
plugin |
slot_type | physical
datoid |
database |
temporary | f
active | t
active_pid | 7121
xmin | 635
catalog_xmin |
restart_lsn | 0/E001BC8
confirmed_flush_lsn |
test=# select * from pg_stat_replication ;
-[ RECORD 1 ]----+------------------------------
pid | 7121
usesysid | 16384
usename | repmgr
application_name | node2
client_addr | 192.168.57.43
client_hostname |
client_port | 45080
backend_start | 2022-12-21 17:19:52.395706+08
backend_xmin |
state | streaming
sent_lsn | 0/E001BC8
write_lsn | 0/E001BC8
flush_lsn | 0/E001BC8
replay_lsn | 0/E001BC8
write_lag |
flush_lag |
replay_lag |
sync_priority | 1
sync_state | quorum
reply_time | 2022-12-21 17:24:53.675362+08
到此switchover切换测试验证通过。
建议:如果不使用autofailover,建议不要开启kbha服务。只需启动repmgrd守护进程就可以。
启动repmgrd守护进程命令(如果使用sys_monitor.sh管理Kingbase数据库启停,不用额外执行命令启动repmgrd服务,sys_monitor.sh脚本会自动启动repmgrd服务)
repmgrd -d -v -f /home/kes86/kes86/cluster/etc/repmgr.conf
测试使用非root权限sys_monitor.sh脚本启停数据库
--停止数据库
[kes86@node1 ~]$ sys_monitor.sh stop
2022-12-21 17:28:06 Ready to stop all DB ...
There is no service "node_export" running currently.
There is no service "postgres_ex" running currently.
There is no service "node_export" running currently.
There is no service "postgres_ex" running currently.
2022-12-21 17:28:13 begin to stop repmgrd on "[node1]".
2022-12-21 17:28:18 repmgrd on "[node1]" stop success.
2022-12-21 17:28:18 begin to stop repmgrd on "[node2]".
2022-12-21 17:28:19 repmgrd on "[node2]" stop success.
2022-12-21 17:28:19 begin to stop DB on "[node2]".
waiting for server to shut down.... done
server stopped
2022-12-21 17:28:20 DB on "[node2]" stop success.
2022-12-21 17:28:20 begin to stop DB on "[node1]".
waiting for server to shut down.... done
server stopped
2022-12-21 17:28:34 DB on "[node1]" stop success.
2022-12-21 17:28:34 Done.
--启动数据库
[kes86@node1 ~]$ sys_monitor.sh start
2022-12-21 17:30:29 Ready to start all DB ...
2022-12-21 17:30:29 begin to start DB on "[node1]".
waiting for server to start.... done
server started
2022-12-21 17:30:57 execute to start DB on "[node1]" success, connect to check it.
2022-12-21 17:30:59 DB on "[node1]" start success.
2022-12-21 17:31:00 Try to ping trusted_servers on host node1 ...
2022-12-21 17:31:00 Try to ping trusted_servers on host node2 ...
2022-12-21 17:31:00 begin to start DB on "[node2]".
waiting for server to start.... done
server started
2022-12-21 17:31:02 execute to start DB on "[node2]" success, connect to check it.
2022-12-21 17:31:03 DB on "[node2]" start success.
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | LSN_Lag | Connection string
----+-------+---------+-----------+----------+-----------+----------+----------+---------+-------------------------------------------------------------------------------------------------------------------------------------------------------------
1 | node1 | primary | * running | | location1 | 100 | 9 | | host=node1 user=repmgr password=repmgr dbname=repmgr port=54322 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
2 | node2 | standby | running | node1 | location1 | 100 | 9 | 0 bytes | host=node2 user=repmgr password=repmgr dbname=repmgr port=54322 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
2022-12-21 17:31:04 The primary DB is started.
2022-12-21 17:31:04 begin to start repmgrd on "[node1]".
[2022-12-21 17:31:12] [NOTICE] using provided configuration file "/home/kes86/kes86/cluster/bin/../etc/repmgr.conf"
[2022-12-21 17:31:12] [NOTICE] redirecting logging output to "/home/kes86/kes86/cluster/log/hamgr.log"
2022-12-21 17:31:19 repmgrd on "[node1]" start success.
2022-12-21 17:31:19 begin to start repmgrd on "[node2]".
[2022-12-21 17:31:51] [NOTICE] using provided configuration file "/home/kes86/kes86/cluster/bin/../etc/repmgr.conf"
[2022-12-21 17:31:51] [NOTICE] redirecting logging output to "/home/kes86/kes86/cluster/log/hamgr.log"
2022-12-21 17:31:22 repmgrd on "[node2]" start success.
ID | Name | Role | Status | Upstream | repmgrd | PID | Paused? | Upstream last seen
----+-------+---------+-----------+----------+---------+------+---------+--------------------
1 | node1 | primary | * running | | running | 8637 | no | n/a
2 | node2 | standby | running | node1 | running | 8708 | no | 1 second(s) ago
[2022-12-21 17:31:22] [NOTICE] redirecting logging output to "/home/kes86/kes86/cluster/log/kbha.log"
[2022-12-21 17:31:56] [NOTICE] redirecting logging output to "/home/kes86/kes86/cluster/log/kbha.log"
2022-12-21 17:31:26 Done.