一、前提准备,一主一从:
[postgres@fl-prod-pg01 .ssh]$ repmgr -f /home/postgres/repmgr/repmgr.conf cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
-----+---------+---------+-----------+----------+-----------+----------+----------+------------------------------------------------------------------------
134 | node134 | primary | * running | | location1 | 100 | 3 | host=10.51.1.134 port=5432 user=repmgr dbname=repmgr connect_timeout=2
135 | node135 | standby | running | node134 | location1 | 100 | 2 | host=10.51.1.135 port=5432 user=repmgr dbname=repmgr connect_timeout=2
二、在从上执行命令,备库主动切换测试:
–siblings-follow 表示从节点或者witness节点也跟随切换
[postgres@fl-prod-pg02 .ssh]$ repmgr -f /home/postgres/repmgr/repmgr.conf standby switchover --dry-run --siblings-follow
NOTICE: checking switchover on node "node135" (ID: 135) in --dry-run mode
INFO: SSH connection to host "10.51.1.134" succeeded
INFO: able to execute "repmgr" on remote host "10.51.1.134"
WARNING: option "--sibling-nodes" specified, but no sibling nodes exist
INFO: 1 walsenders required, 10 available
INFO: demotion candidate is able to make replication connection to promotion candidate
INFO: archive mode is "off"
INFO: replication lag on this standby is 0 seconds
INFO: 1 replication slots required, 10 available
NOTICE: attempting to pause repmgrd on 2 nodes
NOTICE: local node "node135" (ID: 135) would be promoted to primary; current primary "node134" (ID: 134) would be demoted to standby
INFO: following shutdown command would be run on node "node134":
"/home/postgres/pg15.5/bin/pg_ctl -D '/home/postgres/pg15.5/data' -W -m fast stop"
INFO: parameter "shutdown_check_timeout" is set to 10 seconds
INFO: prerequisites for executing STANDBY SWITCHOVER are met
如果没有报错:
执行:
[postgres@fl-prod-pg02 .ssh]$ repmgr -f /home/postgres/repmgr/repmgr.conf standby switchover
NOTICE: executing switchover on node "node135" (ID: 135)
NOTICE: attempting to pause repmgrd on 2 nodes
NOTICE: local node "node135" (ID: 135) will be promoted to primary; current primary "node134" (ID: 134) will be demoted to standby
NOTICE: stopping current primary node "node134" (ID: 134)
NOTICE: issuing CHECKPOINT on node "node134" (ID: 134)
DETAIL: executing server command "/home/postgres/pg15.5/bin/pg_ctl -D '/home/postgres/pg15.5/data' -W -m fast stop"
INFO: checking for primary shutdown; 1 of 10 attempts ("shutdown_check_timeout")
INFO: checking for primary shutdown; 2 of 10 attempts ("shutdown_check_timeout")
NOTICE: current primary has been cleanly shut down at location 0/302FCE8
NOTICE: promoting standby to primary
DETAIL: promoting server "node135" (ID: 135) using pg_promote()
NOTICE: waiting up to 60 seconds (parameter "promote_check_timeout") for promotion to complete
NOTICE: STANDBY PROMOTE successful
DETAIL: server "node135" (ID: 135) was successfully promoted to primary
NOTICE: node "node135" (ID: 135) promoted to primary, node "node134" (ID: 134) demoted to standby
NOTICE: switchover was successful
DETAIL: node "node135" is now primary and node "node134" is attached as standby
NOTICE: STANDBY SWITCHOVER has completed successfully
查看集群状态
134是从,135是主
[postgres@fl-prod-pg02 .ssh]$ repmgr -f /home/postgres/repmgr/repmgr.conf cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
-----+---------+---------+-----------+----------+-----------+----------+----------+------------------------------------------------------------------------
134 | node134 | standby | running | node135 | location1 | 100 | 3 | host=10.51.1.134 port=5432 user=repmgr dbname=repmgr connect_timeout=2
135 | node135 | primary | * running | | location1 | 100 | 4 | host=10.51.1.135 port=5432 user=repmgr dbname=repmgr connect_timeout=2
验证主从复制是否正常
在主库(135)上查看
postgres=# SELECT * FROM pg_stat_replication;
-[ RECORD 1 ]----+------------------------------
pid | 29385
usesysid | 16388
usename | repmgr
application_name | node134
client_addr | 10.51.1.134
client_hostname |
client_port | 32886
backend_start | 2024-03-09 10:27:54.872892+08
backend_xmin |
state | streaming
sent_lsn | 0/3030D10
write_lsn | 0/3030D10
flush_lsn | 0/3030D10
replay_lsn | 0/3030D10
write_lag |
flush_lag |
replay_lag |
sync_priority | 0
sync_state | async
reply_time | 2024-03-09 10:29:06.111796+08
在standy(134)节点 上查看
postgres=# SELECT * FROM pg_stat_wal_receiver;
-[ RECORD 1 ]---------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
pid | 6331
status | streaming
receive_start_lsn | 0/3000000
receive_start_tli | 4
written_lsn | 0/3030D10
flushed_lsn | 0/3030D10
received_tli | 4
last_msg_send_time | 2024-03-09 10:29:15.989637+08
last_msg_receipt_time | 2024-03-09 10:29:16.126759+08
latest_end_lsn | 0/3030D10
latest_end_time | 2024-03-09 10:28:15.900222+08
slot_name | repmgr_slot_134
sender_host | 10.51.1.135
sender_port | 5432
conninfo | user=repmgr passfile=/home/postgres/.pgpass channel_binding=prefer connect_timeout=2 dbname=replication host=10.51.1.135 port=5432 application_name=node134 fallback_application_name=walreceiver sslmode=prefer sslcompression=0 sslsni=1 ssl_min_protocol_version=TLSv1.2 gssencmode=disable krbsrvname=postgres target_session_attrs=any
测试主从数据同步是否正常:
在新master(135)操作:
postgres=# create database test_db;
CREATE DATABASE
postgres=# create table test_table (varchar name);
CREATE TABLE
postgres=# insert into test_table values('zhw');
INSERT 0 1
新salve(134)操作:
postgres=# \l
List of databases
Name | Owner | Encoding | Collate | Ctype | ICU Locale | Locale Provider | Access privileges
-----------+----------+----------+-------------+-------------+------------+-----------------+-----------------------
postgres | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 | | libc |
repmgr | repmgr | UTF8 | en_US.UTF-8 | en_US.UTF-8 | | libc |
template0 | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 | | libc | =c/postgres +
| | | | | | | postgres=CTc/postgres
template1 | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 | | libc | =c/postgres +
| | | | | | | postgres=CTc/postgres
test_db | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 | | libc |
postgres=# select * from test_table;
-[ RECORD 1 ]
varchar | zhw
删除测试数据
在新master(135)操作:
postgres=# drop database test_db;
DROP DATABASE
postgres=# drop table test_table ;
DROP TABLE
测试完毕,切回134 master, 135 salve
新salve(134)操作,备库主动切换测试:
–siblings-follow 表示从节点或者witness节点也跟随切换
[postgres@fl-prod-pg01 .ssh]$ repmgr -f /home/postgres/repmgr/repmgr.conf standby switchover --dry-run --siblings-follow
NOTICE: checking switchover on node "node134" (ID: 134) in --dry-run mode
INFO: SSH connection to host "10.51.1.135" succeeded
INFO: able to execute "repmgr" on remote host "10.51.1.135"
WARNING: option "--sibling-nodes" specified, but no sibling nodes exist
INFO: 1 walsenders required, 10 available
INFO: demotion candidate is able to make replication connection to promotion candidate
INFO: archive mode is "off"
INFO: replication lag on this standby is 0 seconds
INFO: 1 replication slots required, 10 available
NOTICE: attempting to pause repmgrd on 2 nodes
NOTICE: local node "node134" (ID: 134) would be promoted to primary; current primary "node135" (ID: 135) would be demoted to standby
INFO: following shutdown command would be run on node "node135":
"/home/postgres/pg15.5/bin/pg_ctl -D '/home/postgres/pg15.5/data' -W -m fast stop"
INFO: parameter "shutdown_check_timeout" is set to 10 seconds
INFO: prerequisites for executing STANDBY SWITCHOVER are met
如果没有报错:
执行:
[postgres@fl-prod-pg01 .ssh]$ repmgr -f /home/postgres/repmgr/repmgr.conf standby switchover
NOTICE: executing switchover on node "node134" (ID: 134)
NOTICE: attempting to pause repmgrd on 2 nodes
NOTICE: local node "node134" (ID: 134) will be promoted to primary; current primary "node135" (ID: 135) will be demoted to standby
NOTICE: stopping current primary node "node135" (ID: 135)
NOTICE: issuing CHECKPOINT on node "node135" (ID: 135)
DETAIL: executing server command "/home/postgres/pg15.5/bin/pg_ctl -D '/home/postgres/pg15.5/data' -W -m fast stop"
INFO: checking for primary shutdown; 1 of 10 attempts ("shutdown_check_timeout")
INFO: checking for primary shutdown; 2 of 10 attempts ("shutdown_check_timeout")
NOTICE: current primary has been cleanly shut down at location 0/304A1E0
NOTICE: promoting standby to primary
DETAIL: promoting server "node134" (ID: 134) using pg_promote()
NOTICE: waiting up to 60 seconds (parameter "promote_check_timeout") for promotion to complete
NOTICE: STANDBY PROMOTE successful
DETAIL: server "node134" (ID: 134) was successfully promoted to primary
NOTICE: node "node134" (ID: 134) promoted to primary, node "node135" (ID: 135) demoted to standby
NOTICE: switchover was successful
DETAIL: node "node134" is now primary and node "node135" is attached as standby
NOTICE: STANDBY SWITCHOVER has completed successfully
查看集群状态
134是主,135是从
[postgres@fl-prod-pg01 .ssh]$ repmgr -f /home/postgres/repmgr/repmgr.conf cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
-----+---------+---------+-----------+----------+-----------+----------+----------+------------------------------------------------------------------------
134 | node134 | primary | * running | | location1 | 100 | 5 | host=10.51.1.134 port=5432 user=repmgr dbname=repmgr connect_timeout=2
135 | node135 | standby | running | node134 | location1 | 100 | 4 | host=10.51.1.135 port=5432 user=repmgr dbname=repmgr connect_timeout=2
验证主从复制是否正常
在主库(134)上查看
postgres=# SELECT * FROM pg_stat_replication;
-[ RECORD 1 ]----+------------------------------
pid | 7056
usesysid | 16388
usename | repmgr
application_name | node135
client_addr | 10.51.1.135
client_hostname |
client_port | 46624
backend_start | 2024-03-09 10:36:54.837639+08
backend_xmin |
state | streaming
sent_lsn | 0/304B568
write_lsn | 0/304B568
flush_lsn | 0/304B568
replay_lsn | 0/304B568
write_lag |
flush_lag |
replay_lag |
sync_priority | 0
sync_state | async
reply_time | 2024-03-09 10:37:43.641927+08
在standy(135)节点 上查看
postgres=# SELECT * FROM pg_stat_wal_receiver;
-[ RECORD 1 ]---------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
pid | 30148
status | streaming
receive_start_lsn | 0/3000000
receive_start_tli | 5
written_lsn | 0/304B568
flushed_lsn | 0/304B568
received_tli | 5
last_msg_send_time | 2024-03-09 10:38:13.822603+08
last_msg_receipt_time | 2024-03-09 10:38:13.685812+08
latest_end_lsn | 0/304B568
latest_end_time | 2024-03-09 10:37:13.735467+08
slot_name | repmgr_slot_135
sender_host | 10.51.1.134
sender_port | 5432
conninfo | user=repmgr passfile=/home/postgres/.pgpass channel_binding=prefer connect_timeout=2 dbname=replication host=10.51.1.134 port=5432 application_name=node135 fallback_application_name=walreceiver sslmode=prefer sslcompression=0 sslsni=1 ssl_min_protocol_version=TLSv1.2 gssencmode=disable krbsrvname=postgres target_session_attrs=any
测试主从数据同步是否正常:
在master(134)操作:
postgres=# create database test_db;
CREATE DATABASE
postgres=# create table test_table (varchar name);
CREATE TABLE
postgres=# insert into test_table values('zhw');
INSERT 0 1
salve(135)操作:
\l
List of databases
Name | Owner | Encoding | Collate | Ctype | ICU Locale | Locale Provider | Access privileges
-----------+----------+----------+-------------+-------------+------------+-----------------+-----------------------
postgres | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 | | libc |
repmgr | repmgr | UTF8 | en_US.UTF-8 | en_US.UTF-8 | | libc |
template0 | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 | | libc | =c/postgres +
| | | | | | | postgres=CTc/postgres
template1 | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 | | libc | =c/postgres +
| | | | | | | postgres=CTc/postgres
test_db | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 | | libc |
postgres=# select * from test_table;
-[ RECORD 1 ]
varchar | zhw
删除测试数据
在master(134)操作:
postgres=# drop database test_db;
DROP DATABASE
postgres=# drop table test_table ;
DROP TABLE
在salve(135)上查看:
postgres=# select * from test_table;
ERROR: relation "test_table" does not exist
LINE 1: select * from test_table;
测试完毕,能正常手动切换。