[PostgreSQL] After a repmgr failover

戒掉贪嗔痴(薛双奇)

Published 2024-09-02 16:33:38

Category: Database Operations - PostgreSQL. Tags: postgresql, database

Article link: https://blog.csdn.net/weixin_43346403/article/details/141823380

1. Start the original primary

pg_ctl start  
[pgsql@pg2:/home/pgsql]$repmgr -f /postgresql/app/postgresql/repmgr.conf cluster show
 ID | Name | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string                                                      
----+------+---------+-----------+----------+----------+----------+----------+-------------------------------------------------------------------------
 1  | pg1  | primary | ! running |          | default  | 100      | 4        | host=192.168.1.10 port=5432 user=repmgr dbname=repmgr connect_timeout=2
 2  | pg2  | primary | * running |          | default  | 100      | 5        | host=192.168.1.11 port=5432 user=repmgr dbname=repmgr connect_timeout=2
 3  | pg3  | standby |   running | pg2      | default  | 100      | 5        | host=192.168.1.12 port=5432 user=repmgr dbname=repmgr connect_timeout=2
 4  | pg4  | witness | * running | pg3      | default  | 0        | n/a      | host=192.168.1.13 port=5432 user=repmgr dbname=repmgr connect_timeout=2

WARNING: following issues were detected
  - node "pg1" (ID: 1) is running but the repmgr node record is inactive

-- As the output shows, the cluster is now in a split-brain state: pg1 and pg2 both report the primary role.
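
The split-brain condition above can also be checked mechanically. The following is a minimal sketch, assuming the table layout of `repmgr cluster show` shown above. The function name `count_running_primaries` and the `sample` variable are illustrative and not part of repmgr.

```shell
#!/usr/bin/env bash
# Count rows whose Role column is "primary" and whose Status column
# contains "running". Reads `repmgr cluster show` output from stdin.
# Assumption: columns are separated by "|" as in the table above.
count_running_primaries() {
    awk -F'|' '
        NF >= 4 {
            role = $3; status = $4
            gsub(/[ \t]/, "", role)
            if (role == "primary" && status ~ /running/) n++
        }
        END { print n + 0 }
    '
}

# Sample rows copied from the output above (trailing columns trimmed).
sample=' ID | Name | Role    | Status    | Upstream
----+------+---------+-----------+----------
 1  | pg1  | primary | ! running |
 2  | pg2  | primary | * running |
 3  | pg3  | standby |   running | pg2
 4  | pg4  | witness | * running | pg3'

primaries=$(printf '%s\n' "$sample" | count_running_primaries)
if [ "$primaries" -gt 1 ]; then
    echo "split-brain suspected: $primaries running primaries"
fi
```

On a live node the function could be fed directly from `repmgr -f /postgresql/app/postgresql/repmgr.conf cluster show` instead of the sample text.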

2. Rejoin the original primary to the cluster

(1) Stop the PostgreSQL instance on pg1
pg_ctl stop 
(2) Dry-run check before rejoining
repmgr -f /postgresql/app/postgresql/repmgr.conf node rejoin -d 'host=192.168.1.11 user=repmgr dbname=repmgr connect_timeout=2' --force-rewind --dry-run --verbose
(3) Perform the actual rejoin
repmgr -f /postgresql/app/postgresql/repmgr.conf node rejoin -d 'host=192.168.1.11 user=repmgr dbname=repmgr connect_timeout=2' --force-rewind --verbose
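
Steps (2) and (3) can be chained so that the real rejoin only runs when the dry run exits successfully. This is a sketch, not a definitive procedure: the function name `rejoin_after_dry_run` and the `REPMGR` override variable are hypothetical (the override only exists so the function can be exercised with a stand-in command), while the repmgr arguments are the ones used above. A successful dry run does not guarantee that the real rejoin will succeed.

```shell
#!/usr/bin/env bash
# REPMGR is a hypothetical override; by default the real repmgr binary is used.
REPMGR=${REPMGR:-repmgr}
CONF=/postgresql/app/postgresql/repmgr.conf
TARGET='host=192.168.1.11 user=repmgr dbname=repmgr connect_timeout=2'

rejoin_after_dry_run() {
    # Step (2): dry run. Stop here if repmgr reports a problem.
    if ! "$REPMGR" -f "$CONF" node rejoin -d "$TARGET" \
            --force-rewind --dry-run --verbose; then
        echo "dry run failed; not attempting the rejoin" >&2
        return 1
    fi
    # Step (3): the real rejoin, same arguments without --dry-run.
    "$REPMGR" -f "$CONF" node rejoin -d "$TARGET" --force-rewind --verbose
}
```

The function is only defined here; calling `rejoin_after_dry_run` on pg1 (with PostgreSQL stopped, as in step (1)) would run both commands in order.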

[pgsql@pg1:/home/pgsql]$repmgr -f /postgresql/app/postgresql/repmgr.conf node rejoin -d 'host=192.168.1.11 user=repmgr dbname=repmgr connect_timeout=2' --force-rewind --verbose
NOTICE: using provided configuration file "/postgresql/app/postgresql/repmgr.conf"
NOTICE: rejoin target is node "pg2" (ID: 2)
NOTICE: pg_rewind execution required for this node to attach to rejoin target node 2
DETAIL: rejoin target server's timeline 5 forked off current database system timeline 4 before current recovery point 0/59000028
INFO: prerequisites for using pg_rewind are met
INFO: 0 files copied to "/tmp/repmgr-config-archive-pg1"
NOTICE: executing pg_rewind
DETAIL: pg_rewind command is "/postgresql/app/postgresql/bin/pg_rewind -D '/postgresql/data' --source-server='host=192.168.1.11 port=5432 user=repmgr dbname=repmgr connect_timeout=2'"
ERROR: pg_rewind execution failed
DETAIL: pg_rewind: error: could not find common ancestor of the source and target cluster's timelines

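pg1 is on timeline 4, while the cluster (pg2) is on timeline 5, so the two do not match.
pg_rewind could not find a common ancestor of the two timelines, so pg1 cannot be attached as a standby of pg2 and has to be rebuilt.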
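
The timelines can be read off the `cluster show` table programmatically. Below is a minimal sketch, assuming the column layout shown in section 1. The helper name `node_timeline` is illustrative and not a repmgr command. Note that a timeline difference by itself is normal after a promotion; what forced the rebuild here is the pg_rewind error in the log above.

```shell
#!/usr/bin/env bash
# Print the Timeline value for node $1 from `repmgr cluster show` output on stdin.
# The Timeline column is located from the header row, not hard-coded.
node_timeline() {
    awk -F'|' -v node="$1" '
        function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
        col == 0 {
            for (i = 1; i <= NF; i++) if (trim($i) == "Timeline") col = i
            next
        }
        NF > 1 && trim($2) == node { print trim($col) }
    '
}

# Sample rows copied from section 1 (connection strings trimmed).
sample=' ID | Name | Role    | Status    | Upstream | Location | Priority | Timeline
----+------+---------+-----------+----------+----------+----------+----------
 1  | pg1  | primary | ! running |          | default  | 100      | 4
 2  | pg2  | primary | * running |          | default  | 100      | 5'

old=$(printf '%s\n' "$sample" | node_timeline pg1)
new=$(printf '%s\n' "$sample" | node_timeline pg2)
if [ "$old" != "$new" ]; then
    echo "pg1 is on timeline $old, pg2 is on timeline $new"
fi
```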

3. Clone

(1) Dry-run check before cloning
repmgr -h 192.168.1.11 -U repmgr -d repmgr -f /postgresql/app/postgresql/repmgr.conf standby clone --dry-run
(2) Clone
rm -rf /postgresql/data/* 
repmgr -h 192.168.1.11 -U repmgr -d repmgr -f /postgresql/app/postgresql/repmgr.conf standby clone -F  
[pgsql@pg1:/postgresql/data]$repmgr -h 192.168.1.11 -U repmgr -d repmgr -f /postgresql/app/postgresql/repmgr.conf standby clone -F  
NOTICE: destination directory "/postgresql/data" provided
INFO: connecting to source node
DETAIL: connection string is: host=192.168.1.11 user=repmgr dbname=repmgr
DETAIL: current installation size is 795 MB
NOTICE: checking for available walsenders on the source node (2 required)
NOTICE: checking replication connections can be made to the source server (2 required)
WARNING: directory "/postgresql/data" exists but is not empty
NOTICE: deleting existing directory "/postgresql/data"
NOTICE: starting backup (using pg_basebackup)...
HINT: this may take some time; consider using the -c/--fast-checkpoint option
INFO: executing:
  /postgresql/app/postgresql/bin/pg_basebackup -l "repmgr base backup"  -D /postgresql/data -h 192.168.1.11 -p 5432 -U repmgr -X stream -S repmgr_slot_1 
shell-init: error retrieving current directory: getcwd: cannot access parent directories: No such file or directory
could not identify current directory: No such file or directory
WARNING:  skipping special file "./.s.PGSQL.5432"
WARNING:  skipping special file "./.s.PGSQL.5432"
NOTICE: standby clone (using pg_basebackup) complete
NOTICE: you can now start your PostgreSQL server
HINT: for example: /postgresql/app/postgresql/bin/pg_ctl start -w -D /postgresql/data
HINT: after starting the server, you need to re-register this standby with "repmgr standby register --force" to update the existing node record
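
The two "current directory" errors in the log above appear because the clone was started from inside /postgresql/data (see the shell prompt), and repmgr deleted that directory before running pg_basebackup. The clone still completed, but the messages can be avoided by leaving the data directory first. A minimal sketch of such a guard follows; the helper name `is_inside` and the variable `PGDATA_DIR` are illustrative.

```shell
#!/usr/bin/env bash
# Return 0 if directory $1 is the same as, or located inside, directory $2.
is_inside() {
    case "$1/" in
        "$2"/*) return 0 ;;
        *)      return 1 ;;
    esac
}

PGDATA_DIR=/postgresql/data

# Move out of the data directory before a clone that will delete it.
if is_inside "$PWD" "$PGDATA_DIR"; then
    cd "$HOME" || exit 1
fi
```

With the working directory outside `$PGDATA_DIR`, the `standby clone -F` command from step (2) can then be run unchanged.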

(3) Start PostgreSQL on pg1
pg_ctl -D /postgresql/data -m fast start
(4) Re-register pg1 as a standby
[pgsql@pg1:/postgresql/data]$repmgr -f /postgresql/app/postgresql/repmgr.conf standby register --force
INFO: connecting to local node "pg1" (ID: 1)
INFO: connecting to primary database
INFO: standby registration complete
NOTICE: standby node "pg1" (ID: 1) successfully registered
[pgsql@pg1:/postgresql/data]$repmgr -f /postgresql/app/postgresql/repmgr.conf cluster show
 ID | Name | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string                                                      
----+------+---------+-----------+----------+----------+----------+----------+-------------------------------------------------------------------------
 1  | pg1  | standby |   running | pg2      | default  | 100      | 5        | host=192.168.1.10 port=5432 user=repmgr dbname=repmgr connect_timeout=2
 2  | pg2  | primary | * running |          | default  | 100      | 5        | host=192.168.1.11 port=5432 user=repmgr dbname=repmgr connect_timeout=2
 3  | pg3  | standby |   running | pg2      | default  | 100      | 5        | host=192.168.1.12 port=5432 user=repmgr dbname=repmgr connect_timeout=2
 4  | pg4  | witness | * running | pg3      | default  | 0        | n/a      | host=192.168.1.13 port=5432 user=repmgr dbname=repmgr connect_timeout=2
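
In the table above, pg4 still reports pg3 as its upstream although pg2 is the primary. Such stale entries can be listed with a small filter. This is a sketch under two assumptions: the cluster has exactly one primary, and there is no cascading replication (a cascaded standby legitimately has a non-primary upstream). The function name `stale_upstreams` is illustrative.

```shell
#!/usr/bin/env bash
# Print the names of non-primary nodes whose Upstream column differs
# from the name of the primary. Reads `repmgr cluster show` output from stdin.
stale_upstreams() {
    awk -F'|' '
        function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
        # Data rows start with a numeric node ID.
        NF >= 5 && trim($1) ~ /^[0-9]+$/ {
            name = trim($2)
            if (trim($3) == "primary") primary = name
            else upstream[name] = trim($5)
            order[++n] = name
        }
        END {
            for (i = 1; i <= n; i++) {
                name = order[i]
                if (name in upstream && upstream[name] != primary) print name
            }
        }
    '
}

# Sample rows copied from the output above (trailing columns trimmed).
sample=' ID | Name | Role    | Status    | Upstream
----+------+---------+-----------+----------
 1  | pg1  | standby |   running | pg2
 2  | pg2  | primary | * running |
 3  | pg3  | standby |   running | pg2
 4  | pg4  | witness | * running | pg3'

stale=$(printf '%s\n' "$sample" | stale_upstreams)
echo "nodes with a stale upstream: ${stale:-none}"
```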



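4. Re-register the witness

The witness pg4 still shows pg3 as its upstream in the output above, so unregister it and register it again against the current primary pg2.

repmgr -f /postgresql/app/postgresql/repmgr.conf -h 192.168.1.11 -U repmgr -d repmgr witness unregister 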
repmgr -f /postgresql/app/postgresql/repmgr.conf -h 192.168.1.11 -U repmgr -d repmgr witness register -F

[pgsql@pg4:/home/pgsql]$repmgr -f /postgresql/app/postgresql/repmgr.conf -h 192.168.1.11 -U repmgr -d repmgr witness unregister 
INFO: connecting to node "pg4" (ID: 4)
INFO: unregistering witness node 4
INFO: witness unregistration complete
DETAIL: witness node with ID 4 successfully unregistered
[pgsql@pg4:/home/pgsql]$repmgr -f /postgresql/app/postgresql/repmgr.conf -h 192.168.1.11 -U repmgr -d repmgr witness register -F
INFO: connecting to witness node "pg4" (ID: 4)
INFO: connecting to primary node
INFO: "repmgr" extension is already installed
INFO: witness registration complete
NOTICE: witness node "pg4" (ID: 4) successfully registered

[pgsql@pg4:/home/pgsql]$repmgr -f /postgresql/app/postgresql/repmgr.conf cluster show
 ID | Name | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string                                                      
----+------+---------+-----------+----------+----------+----------+----------+-------------------------------------------------------------------------
 1  | pg1  | standby |   running | pg2      | default  | 100      | 5        | host=192.168.1.10 port=5432 user=repmgr dbname=repmgr connect_timeout=2
 2  | pg2  | primary | * running |          | default  | 100      | 5        | host=192.168.1.11 port=5432 user=repmgr dbname=repmgr connect_timeout=2
 3  | pg3  | standby |   running | pg2      | default  | 100      | 5        | host=192.168.1.12 port=5432 user=repmgr dbname=repmgr connect_timeout=2
 4  | pg4  | witness | * running | pg2      | default  | 0        | n/a      | host=192.168.1.13 port=5432 user=repmgr dbname=repmgr connect_timeout=2
[pgsql@pg