repmgr fails to perform automatic failover

蜡津

Last modified 2023-10-24 11:00:39

Tags: database, repmgr, postgresql, 1024 Programmer's Day

First published 2023-08-18 21:37:20

Article link: https://blog.csdn.net/weixin_43084715/article/details/132369099

Stop the primary node so that the standby takes over automatically

[postgres@db223 ~]$ repmgr -f ~/repmgr/repmgr.conf cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
----+-------+---------+-----------+----------+----------+----------+----------+------------------------------------------------------------------------
1 | db223 | primary | * running | | default | 100 | 15 | host=db223 dbname=repmgr user=repmgr password=repmgr connect_timeout=2
2 | db206 | primary | - failed | ? | default | 100 | | host=db206 dbname=repmgr user=repmgr password=repmgr connect_timeout=2

WARNING: following issues were detected
- unable to connect to node "db206" (ID: 2)

HINT: execute with --verbose option to see connection error messages
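
As a minimal sketch (not part of the original session), the failed node can be picked out of this output with a small shell helper. The function name `failed_nodes` is hypothetical, and it assumes the table layout shown above, where Name is the second and Status the fourth `|`-separated column.

```shell
# Sketch: list node names whose Status column contains "failed".
# Reads `repmgr cluster show` table output on stdin.
# Assumes the column order shown above (Name = field 2, Status = field 4).
failed_nodes() {
    awk -F'|' '$4 ~ /failed/ {
        gsub(/[ \t]/, "", $2)   # strip padding around the node name
        print $2
    }'
}
```

It could be used as `repmgr -f ~/repmgr/repmgr.conf cluster show | failed_nodes`; the header, separator, and WARNING lines are skipped because their fourth field never contains "failed".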

Rejoin the old primary to the cluster

[postgres@db206 data]$ repmgr -f ~/repmgr/repmgr.conf node rejoin -d 'host=db223 port=5432 user=repmgr dbname=repmgr password=repmgr' --force-rewind
NOTICE: rejoin target is node "db223" (ID: 1)
NOTICE: pg_rewind execution required for this node to attach to rejoin target node 1
DETAIL: rejoin target server's timeline 15 forked off current database system timeline 14 before current recovery point 120/9B171E00
NOTICE: executing pg_rewind
DETAIL: pg_rewind command is "/home/postgres/pg14/bin/pg_rewind -D '/home/postgres/pg14/data' --source-server='host=db223 dbname=repmgr user=repmgr password=repmgr connect_timeout=2'"
NOTICE: 0 files copied to /home/postgres/pg14/data
NOTICE: setting node 2's upstream to node 1
WARNING: unable to ping "host=db206 dbname=repmgr user=repmgr password=repmgr connect_timeout=2"
DETAIL: PQping() returned "PQPING_NO_RESPONSE"
NOTICE: starting server using "/home/postgres/pg14/bin/pg_ctl -w -D '/home/postgres/pg14/data' start"
NOTICE: NODE REJOIN successful
DETAIL: node 2 is now attached to node 1

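No matter what I tried, automatic failover would not happen. With no other option left, I rebuilt the standby from scratch, and after that it worked (????)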

[postgres@db206 data]$ repmgr -h db223 -U repmgr -d repmgr -f /home/postgres/repmgr/repmgr.conf standby clone
NOTICE: destination directory "/home/postgres/pg14/data" provided
INFO: connecting to source node
DETAIL: connection string is: host=db223 user=repmgr dbname=repmgr
DETAIL: current installation size is 1752 MB
INFO: replication slot usage not requested; no replication slot will be set up for this standby
NOTICE: checking for available walsenders on the source node (2 required)
NOTICE: checking replication connections can be made to the source server (2 required)
INFO: checking and correcting permissions on existing directory "/home/postgres/pg14/data"
NOTICE: starting backup (using pg_basebackup)...
HINT: this may take some time; consider using the -c/--fast-checkpoint option
INFO: executing:
/home/postgres/pg14/bin/pg_basebackup -l "repmgr base backup" -D /home/postgres/pg14/data -h db223 -p 5432 -U repmgr -X stream
NOTICE: standby clone (using pg_basebackup) complete
NOTICE: you can now start your PostgreSQL server
HINT: for example: pg_ctl -D /home/postgres/pg14/data start
HINT: after starting the server, you need to re-register this standby with "repmgr standby register --force" to update the existing node record

[postgres@db206 data]$ pg_ctl start
waiting for server to start....2023-08-18 21:25:12.514 CST [11178] LOG: redirecting log output to logging collector process
2023-08-18 21:25:12.514 CST [11178] HINT: Future log output will appear in directory "log".
done
server started

[postgres@db206 data]$ repmgr -f ~/repmgr/repmgr.conf standby register -F
INFO: connecting to local node "db206" (ID: 2)
INFO: connecting to primary database
INFO: standby registration complete
NOTICE: standby node "db206" (ID: 2) successfully registered
[postgres@db206 data]$ repmgr -f ~/repmgr/repmgr.conf cluster show
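
The rebuild above can be collected into one script. This is only a sketch of the same four commands, assuming PostgreSQL on the standby is already stopped and its data directory is safe to clone into. The names `run`, `reclone_standby`, and the `DRY_RUN` switch are hypothetical; the host, paths, and repmgr arguments are copied from the session above and will differ on other systems.

```shell
#!/bin/sh
# Sketch: rebuild a standby from the current primary, mirroring the steps above.
set -eu

PRIMARY_HOST="${PRIMARY_HOST:-db223}"                 # current primary (from the session above)
REPMGR_CONF="${REPMGR_CONF:-$HOME/repmgr/repmgr.conf}"
PGDATA="${PGDATA:-/home/postgres/pg14/data}"
DRY_RUN="${DRY_RUN:-1}"                               # hypothetical switch: 1 = only print commands

# Print each command; execute it only when DRY_RUN is not 1.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" != "1" ]; then
        "$@"
    fi
}

reclone_standby() {
    # 1. Clone the data directory from the primary with pg_basebackup.
    run repmgr -h "$PRIMARY_HOST" -U repmgr -d repmgr -f "$REPMGR_CONF" standby clone
    # 2. Start the freshly cloned standby.
    run pg_ctl -D "$PGDATA" start
    # 3. Update the existing node record, as the clone HINT suggests.
    run repmgr -f "$REPMGR_CONF" standby register --force
    # 4. Confirm the node now shows up as a running standby.
    run repmgr -f "$REPMGR_CONF" cluster show
}
```

Calling `reclone_standby` with the default `DRY_RUN=1` only prints the commands for review; setting `DRY_RUN=0` before calling it runs them, and `set -e` stops the sequence at the first failing step.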

Why automatic failover did not happen

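Pausing and resuming the cluster
The repmgrd service can be paused from any node, which is typically done for database maintenance. While paused, the cluster is quiescent: if the primary is stopped during this time, the cluster will not switch over automatically.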

repmgr -f ~/repmgr/repmgr.conf service pause
NOTICE: node 1 (node1) paused
NOTICE: node 2 (node2) paused

Checking the service status on each node at this point shows that the Paused? column has changed to yes

repmgr -f ~/repmgr/repmgr.conf service status
ID | Name | Role | Status | Upstream | repmgrd | PID | Paused? | Upstream last seen
----+-------+---------+-----------+----------+---------+------+---------+--------------------
1 | node1 | primary | * running | | running | 2133 | yes | n/a
2 | node2 | standby | running | node1 | running | 2088 | yes | 0 second(s) ago
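
Since a paused repmgrd silently suppresses failover, it may be worth checking this column before testing one. The helper below is a minimal sketch, not something from the original post: the name `paused_nodes` is hypothetical, and it assumes the table layout shown above, where Name is the second and Paused? the eighth `|`-separated column.

```shell
# Sketch: list node names whose "Paused?" column reads "yes".
# Reads `repmgr service status` table output on stdin.
# Assumes the column order shown above (Name = field 2, Paused? = field 8).
paused_nodes() {
    awk -F'|' '$8 ~ /yes/ {
        gsub(/[ \t]/, "", $2)   # strip padding around the node name
        print $2
    }'
}
```

For example, `repmgr -f ~/repmgr/repmgr.conf service status | paused_nodes` prints one node name per line; empty output suggests that no repmgrd instance is paused.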

Use the following command to lift the pause