PostgreSQL High Availability with repmgr

Author: XuYuchong

Software versions:

PostgreSQL 13.4

repmgr 5.3

Database installation plan:

| node_id | node_name | port | data_directory | role    |
|---------|-----------|------|----------------|---------|
| 1       | node1     | 5432 | /data/n001     | primary |
| 2       | node2     | 5433 | /data/n002     | standby |
| 3       | node3     | 5434 | /data/n003     | standby |

01, Initialize the database, modify its parameters, and start it

# Initialize

initdb -D /data/n001

The files belonging to this database system will be owned by user "postgres".

This user must also own the server process.

The database cluster will be initialized with locale "en_US.utf8".

The default database encoding has accordingly been set to "UTF8".

The default text search configuration will be set to "english".

Data page checksums are disabled.

creating directory /data/n001 ... ok

creating subdirectories ... ok

selecting dynamic shared memory implementation ... posix

selecting default max_connections ... 100

selecting default shared_buffers ... 128MB

selecting default time zone ... Asia/Shanghai

creating configuration files ... ok

running bootstrap script ... ok

performing post-bootstrap initialization ... ok

syncing data to disk ... ok

initdb: warning: enabling "trust" authentication for local connections

You can change this by editing pg_hba.conf or using the option -A, or

--auth-local and --auth-host, the next time you run initdb.

Success. You can now start the database server using:

    pg_ctl -D /data/n001 -l logfile start

# Parameter changes (postgresql.conf)

wal_keep_size = 10GB

max_wal_size = 10GB

min_wal_size = 8000MB

max_wal_senders = 10

max_replication_slots = 10

hot_standby = on

archive_mode = on

archive_command = '/bin/true'

shared_preload_libraries = 'repmgr'

wal_log_hints = on
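The settings above can be appended to the node's postgresql.conf in a single step. A minimal sketch — the temp-directory fallback for PGDATA is only there to make the snippet self-contained; on a real node point it at /data/n001 (and later /data/n002, /data/n003):

```shell
# Append replication-related settings to this node's postgresql.conf.
PGDATA="${PGDATA:-$(mktemp -d)}"        # placeholder; use /data/n001 etc. on a real node
touch "$PGDATA/postgresql.conf"
cat >> "$PGDATA/postgresql.conf" <<'EOF'
wal_keep_size = 10GB
max_wal_size = 10GB
min_wal_size = 8000MB
max_wal_senders = 10
max_replication_slots = 10
hot_standby = on
archive_mode = on
archive_command = '/bin/true'       # archiving on, but WAL files are discarded; replace in production
shared_preload_libraries = 'repmgr'
wal_log_hints = on                  # needed later by pg_rewind during "node rejoin"
EOF
```

A restart (not just a reload) is required for shared_preload_libraries to take effect.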

show wal_keep_size;

show max_wal_size;

show min_wal_size;

show max_wal_senders;

show max_replication_slots;

show hot_standby;

show archive_mode;

show archive_command;

show shared_preload_libraries;

show wal_log_hints;

# Start the database

-bash-4.2$ pg_ctl -D /data/n001/ start

waiting for server to start....2021-10-28 10:58:44.526 CST [31597] LOG:  starting PostgreSQL 13.4 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39), 64-bit

2021-10-28 10:58:44.526 CST [31597] LOG:  listening on IPv6 address "::1", port 5432

2021-10-28 10:58:44.526 CST [31597] LOG:  listening on IPv4 address "127.0.0.1", port 5432

2021-10-28 10:58:44.547 CST [31597] LOG:  listening on Unix socket "/tmp/.s.PGSQL.5432"

2021-10-28 10:58:44.571 CST [31604] LOG:  database system was shut down at 2021-10-28 10:52:48 CST

2021-10-28 10:58:44.602 CST [31597] LOG:  database system is ready to accept connections

done

server started

02, Create the repmgr user and database

# Create a dedicated PostgreSQL superuser account and a database for the repmgr metadata.

createuser -s repmgr

createdb repmgr -O repmgr

03, Create the repmgr configuration files

::::::::::::::

node1.conf

::::::::::::::

node_id=1

node_name='node1'

conninfo='host=127.0.0.1 user=repmgr dbname=repmgr connect_timeout=2'

data_directory='/data/n001'

pg_bindir='/usr/local/postgresql/bin'

::::::::::::::

node2.conf

::::::::::::::

node_id=2

node_name='node2'

conninfo='host=127.0.0.1 port=5433 user=repmgr dbname=repmgr connect_timeout=2'

data_directory='/data/n002'

pg_bindir='/usr/local/postgresql/bin'

::::::::::::::

node3.conf

::::::::::::::

node_id=3

node_name='node3'

conninfo='host=127.0.0.1 port=5434 user=repmgr dbname=repmgr connect_timeout=2'

data_directory='/data/n003'

pg_bindir='/usr/local/postgresql/bin'

04, Register node1 as the primary database

-bash-4.2$ repmgr -f /data/node1.conf  primary register

INFO: connecting to primary database...

NOTICE: attempting to install extension "repmgr"

NOTICE: "repmgr" extension successfully installed

NOTICE: primary node record (ID: 1) registered

-bash-4.2$ repmgr -f /data/node1.conf  cluster show

ID | Name  | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string

----+-------+---------+-----------+----------+----------+----------+----------+------------------------------------------------------------

1  | node1 | primary | * running |          | default  | 100      | 1        | host=127.0.0.1 user=repmgr dbname=repmgr connect_timeout=2

05, Clone the node2 and node3 standby databases

Clone commands:

repmgr -h 127.0.0.1 -U repmgr -d repmgr -f /data/node2.conf standby clone --dry-run

repmgr -h 127.0.0.1 -U repmgr -d repmgr -f /data/node2.conf standby clone

repmgr -h 127.0.0.1 -U repmgr -d repmgr -f /data/node3.conf standby clone --dry-run

repmgr -h 127.0.0.1 -U repmgr -d repmgr -f /data/node3.conf standby clone

Clone log (node2 shown; node3 is analogous):

-bash-4.2$ repmgr -h 127.0.0.1 -U repmgr -d repmgr -f /data/node2.conf standby clone --dry-run

NOTICE: destination directory "/data/n002" provided

INFO: connecting to source node

DETAIL: connection string is: host=127.0.0.1 user=repmgr dbname=repmgr

DETAIL: current installation size is 31 MB

INFO: "repmgr" extension is installed in database "repmgr"

INFO: replication slot usage not requested;  no replication slot will be set up for this standby

INFO: parameter "max_wal_senders" set to 10

NOTICE: checking for available walsenders on the source node (2 required)

INFO: sufficient walsenders available on the source node

DETAIL: 2 required, 10 available

NOTICE: checking replication connections can be made to the source server (2 required)

INFO: required number of replication connections could be made to the source server

DETAIL: 2 replication connections required

NOTICE: standby will attach to upstream node 1

HINT: consider using the -c/--fast-checkpoint option

INFO: would execute:

  /usr/local/postgresql/bin/pg_basebackup -l "repmgr base backup"  -D /data/n002 -h 127.0.0.1 -p 5432 -U repmgr -X stream

INFO: all prerequisites for "standby clone" are met

-bash-4.2$ repmgr -h 127.0.0.1 -U repmgr -d repmgr -f /data/node2.conf standby clone

NOTICE: destination directory "/data/n002" provided

INFO: connecting to source node

DETAIL: connection string is: host=127.0.0.1 user=repmgr dbname=repmgr

DETAIL: current installation size is 31 MB

INFO: replication slot usage not requested;  no replication slot will be set up for this standby

NOTICE: checking for available walsenders on the source node (2 required)

NOTICE: checking replication connections can be made to the source server (2 required)

INFO: creating directory "/data/n002"...

NOTICE: starting backup (using pg_basebackup)...

HINT: this may take some time; consider using the -c/--fast-checkpoint option

INFO: executing:

  /usr/local/postgresql/bin/pg_basebackup -l "repmgr base backup"  -D /data/n002 -h 127.0.0.1 -p 5432 -U repmgr -X stream

NOTICE: standby clone (using pg_basebackup) complete

NOTICE: you can now start your PostgreSQL server

HINT: for example: pg_ctl -D /data/n002 start

HINT: after starting the server, you need to register this standby with "repmgr standby register"

Change the listening ports for node2 and node3:

sed -i "s/#port = 5432/port = 5433/g" /data/n002/postgresql.conf

sed -i "s/#port = 5432/port = 5434/g" /data/n003/postgresql.conf

-bash-4.2$ grep "^port" /data/n002/postgresql.conf

port = 5433                             # (change requires restart)

-bash-4.2$ grep "^port" /data/n003/postgresql.conf

port = 5434                             # (change requires restart)

Start the standbys and register them:

pg_ctl -D /data/n002 start

pg_ctl -D /data/n003 start

repmgr -f /data/node2.conf standby register

repmgr -f /data/node3.conf standby register

-bash-4.2$ pg_ctl -D /data/n002 start

waiting for server to start....2021-10-28 11:13:27.211 CST [5683] LOG:  starting PostgreSQL 13.4 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39), 64-bit

2021-10-28 11:13:27.212 CST [5683] LOG:  listening on IPv6 address "::1", port 5433

2021-10-28 11:13:27.212 CST [5683] LOG:  listening on IPv4 address "127.0.0.1", port 5433

2021-10-28 11:13:27.238 CST [5683] LOG:  listening on Unix socket "/tmp/.s.PGSQL.5433"

2021-10-28 11:13:27.271 CST [5684] LOG:  database system was interrupted; last known up at 2021-10-28 11:09:14 CST

..2021-10-28 11:13:29.396 CST [5684] LOG:  entering standby mode

2021-10-28 11:13:29.418 CST [5684] LOG:  redo starts at 0/2000028

2021-10-28 11:13:29.440 CST [5684] LOG:  consistent recovery state reached at 0/2000138

2021-10-28 11:13:29.440 CST [5683] LOG:  database system is ready to accept read only connections

2021-10-28 11:13:29.482 CST [5718] LOG:  started streaming WAL from primary at 0/3000000 on timeline 1

done

server started

-bash-4.2$ pg_ctl -D /data/n003 start

waiting for server to start....2021-10-28 11:13:33.327 CST [5734] LOG:  starting PostgreSQL 13.4 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39), 64-bit

2021-10-28 11:13:33.328 CST [5734] LOG:  listening on IPv6 address "::1", port 5434

2021-10-28 11:13:33.328 CST [5734] LOG:  listening on IPv4 address "127.0.0.1", port 5434

2021-10-28 11:13:33.350 CST [5734] LOG:  listening on Unix socket "/tmp/.s.PGSQL.5434"

2021-10-28 11:13:33.397 CST [5736] LOG:  database system was interrupted; last known up at 2021-10-28 11:11:11 CST

.2021-10-28 11:13:35.226 CST [5736] LOG:  entering standby mode

.2021-10-28 11:13:35.258 CST [5736] LOG:  redo starts at 0/4000028

2021-10-28 11:13:35.271 CST [5736] LOG:  consistent recovery state reached at 0/4000100

2021-10-28 11:13:35.271 CST [5734] LOG:  database system is ready to accept read only connections

2021-10-28 11:13:35.285 CST [5764] LOG:  started streaming WAL from primary at 0/5000000 on timeline 1

done

server started

-bash-4.2$ repmgr -f /data/node2.conf standby register

INFO: connecting to local node "node2" (ID: 2)

INFO: connecting to primary database

WARNING: --upstream-node-id not supplied, assuming upstream node is primary (node ID: 1)

INFO: standby registration complete

NOTICE: standby node "node2" (ID: 2) successfully registered

-bash-4.2$ repmgr -f /data/node3.conf standby register

INFO: connecting to local node "node3" (ID: 3)

INFO: connecting to primary database

WARNING: --upstream-node-id not supplied, assuming upstream node is primary (node ID: 1)

INFO: standby registration complete

NOTICE: standby node "node3" (ID: 3) successfully registered

-bash-4.2$ repmgr -f /data/node3.conf cluster show

ID | Name  | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string

----+-------+---------+-----------+----------+----------+----------+----------+----------------------------------------------------------------------

1  | node1 | primary | * running |          | default  | 100      | 1        | host=127.0.0.1 user=repmgr dbname=repmgr connect_timeout=2

2  | node2 | standby |   running | node1    | default  | 100      | 1        | host=127.0.0.1 port=5433 user=repmgr dbname=repmgr connect_timeout=2

3  | node3 | standby |   running | node1    | default  | 100      | 1        | host=127.0.0.1 port=5434 user=repmgr dbname=repmgr connect_timeout=2

06, Switchover with repmgr

# Switchover requires passwordless SSH equivalence between the nodes.
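Because every node in this walkthrough runs on the same host, SSH equivalence here just means the postgres OS user can ssh to 127.0.0.1 without a password. A rough sketch (the ed25519 key type and paths are illustrative assumptions, not from the original setup):

```shell
# Generate a key for the current (postgres) user if one does not exist,
# then authorize it locally so "ssh 127.0.0.1" needs no password.
KEY="$HOME/.ssh/id_ed25519"
mkdir -p "$HOME/.ssh" && chmod 700 "$HOME/.ssh"
[ -f "$KEY" ] || ssh-keygen -t ed25519 -N "" -f "$KEY" -q
PUB="$(cat "$KEY.pub")"
# Append the public key only if it is not already authorized.
grep -qsxF "$PUB" "$HOME/.ssh/authorized_keys" || echo "$PUB" >> "$HOME/.ssh/authorized_keys"
chmod 600 "$HOME/.ssh/authorized_keys"
# Verify (should return without prompting for a password):
#   ssh -o StrictHostKeyChecking=accept-new 127.0.0.1 true
```

On a multi-host cluster you would instead distribute the key with ssh-copy-id to each node.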

# Run on the standby; switch node2 to primary

#check:

repmgr standby switchover -f /data/node2.conf --siblings-follow --dry-run

# Perform the switchover:

repmgr standby switchover -f /data/node2.conf

# Point the remaining standby at the new primary:

repmgr  -f /data/node3.conf standby follow

# Switch the primary to node2:

-bash-4.2$ repmgr standby switchover -f /data/node2.conf --siblings-follow --dry-run

NOTICE: checking switchover on node "node2" (ID: 2) in --dry-run mode

INFO: SSH connection to host "127.0.0.1" succeeded

INFO: able to execute "repmgr" on remote host "127.0.0.1"

INFO: all sibling nodes are reachable via SSH

INFO: 2 walsenders required, 10 available

INFO: demotion candidate is able to make replication connection to promotion candidate

INFO: 0 pending archive files

INFO: replication lag on this standby is 0 seconds

NOTICE: attempting to pause repmgrd on 3 nodes

NOTICE: local node "node2" (ID: 2) would be promoted to primary; current primary "node1" (ID: 1) would be demoted to standby

INFO: following shutdown command would be run on node "node1":

  "/usr/local/postgresql/bin/pg_ctl  -D '/data/n001' -W -m fast stop"

INFO: parameter "shutdown_check_timeout" is set to 60 seconds

INFO: prerequisites for executing STANDBY SWITCHOVER are met

-bash-4.2$ repmgr standby switchover -f /data/node2.conf

NOTICE: executing switchover on node "node2" (ID: 2)

WARNING: 1 sibling nodes found, but option "--siblings-follow" not specified

DETAIL: these nodes will remain attached to the current primary:

  node3 (node ID: 3)

NOTICE: attempting to pause repmgrd on 3 nodes

NOTICE: local node "node2" (ID: 2) will be promoted to primary; current primary "node1" (ID: 1) will be demoted to standby

NOTICE: stopping current primary node "node1" (ID: 1)

NOTICE: issuing CHECKPOINT on node "node1" (ID: 1)

DETAIL: executing server command "/usr/local/postgresql/bin/pg_ctl  -D '/data/n001' -W -m fast stop"

INFO: checking for primary shutdown; 1 of 60 attempts ("shutdown_check_timeout")

INFO: checking for primary shutdown; 2 of 60 attempts ("shutdown_check_timeout")

INFO: checking for primary shutdown; 3 of 60 attempts ("shutdown_check_timeout")

NOTICE: current primary has been cleanly shut down at location 0/6000028

NOTICE: promoting standby to primary

DETAIL: promoting server "node2" (ID: 2) using pg_promote()

NOTICE: waiting up to 60 seconds (parameter "promote_check_timeout") for promotion to complete

NOTICE: STANDBY PROMOTE successful

DETAIL: server "node2" (ID: 2) was successfully promoted to primary

NOTICE: node "node2" (ID: 2) promoted to primary, node "node1" (ID: 1) demoted to standby

NOTICE: switchover was successful

DETAIL: node "node2" is now primary and node "node1" is attached as standby

NOTICE: STANDBY SWITCHOVER has completed successfully

# node2 is now the primary:

-bash-4.2$ repmgr  -f /data/node2.conf cluster show

ID | Name  | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string

----+-------+---------+-----------+----------+----------+----------+----------+----------------------------------------------------------------------

1  | node1 | standby |   running | node2    | default  | 100      | 1        | host=127.0.0.1 user=repmgr dbname=repmgr connect_timeout=2

2  | node2 | primary | * running |          | default  | 100      | 2        | host=127.0.0.1 port=5433 user=repmgr dbname=repmgr connect_timeout=2

3  | node3 | standby |   running | node1    | default  | 100      | 1        | host=127.0.0.1 port=5434 user=repmgr dbname=repmgr connect_timeout=2

# Repoint node3 to replicate from node2, the new primary. Before the follow, node3's primary_conninfo still targets the old primary:

-bash-4.2$ cat n003/postgresql.auto.conf

# Do not edit this file manually!

# It will be overwritten by the ALTER SYSTEM command.

primary_conninfo = 'host=127.0.0.1 user=repmgr application_name=node3 connect_timeout=2'

-bash-4.2$ repmgr  -f /data/node3.conf standby follow

NOTICE: attempting to find and follow current primary

INFO: timelines are same, this server is not ahead

DETAIL: local node lsn is 0/6000A28, follow target lsn is 0/6000A28

NOTICE: setting node 3's upstream to node 2

WARNING: node "node3" not found in "pg_stat_replication"

NOTICE: STANDBY FOLLOW successful

DETAIL: standby attached to upstream node "node2" (ID: 2)

-bash-4.2$ repmgr  -f /data/node2.conf cluster show

ID | Name  | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string

----+-------+---------+-----------+----------+----------+----------+----------+----------------------------------------------------------------------

1  | node1 | standby |   running | node2    | default  | 100      | 1        | host=127.0.0.1 user=repmgr dbname=repmgr connect_timeout=2

2  | node2 | primary | * running |          | default  | 100      | 2        | host=127.0.0.1 port=5433 user=repmgr dbname=repmgr connect_timeout=2

3  | node3 | standby |   running | node2    | default  | 100      | 2        | host=127.0.0.1 port=5434 user=repmgr dbname=repmgr connect_timeout=2

# repmgr has rewritten primary_conninfo to point at the new primary:

-bash-4.2$ cat n003/postgresql.auto.conf

# Do not edit this file manually!

# It will be overwritten by the ALTER SYSTEM command.

primary_conninfo = 'user=repmgr connect_timeout=2 host=127.0.0.1 port=5433 application_name=node3'

# Switch the primary to node3:

#check:

repmgr standby switchover -f /data/node3.conf --siblings-follow --dry-run

# Perform the switchover:

repmgr standby switchover -f /data/node3.conf

-bash-4.2$ repmgr  -f /data/node2.conf cluster show

ID | Name  | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string

----+-------+---------+-----------+----------+----------+----------+----------+----------------------------------------------------------------------

1  | node1 | standby |   running | node2    | default  | 100      | 2        | host=127.0.0.1 user=repmgr dbname=repmgr connect_timeout=2

2  | node2 | primary | * running |          | default  | 100      | 2        | host=127.0.0.1 port=5433 user=repmgr dbname=repmgr connect_timeout=2

3  | node3 | standby |   running | node2    | default  | 100      | 2        | host=127.0.0.1 port=5434 user=repmgr dbname=repmgr connect_timeout=2

-bash-4.2$ repmgr standby switchover -f /data/node3.conf --siblings-follow --dry-run

NOTICE: checking switchover on node "node3" (ID: 3) in --dry-run mode

INFO: SSH connection to host "127.0.0.1" succeeded

INFO: able to execute "repmgr" on remote host "127.0.0.1"

INFO: all sibling nodes are reachable via SSH

INFO: 2 walsenders required, 10 available

INFO: demotion candidate is able to make replication connection to promotion candidate

INFO: 0 pending archive files

INFO: replication lag on this standby is 0 seconds

NOTICE: attempting to pause repmgrd on 3 nodes

NOTICE: local node "node3" (ID: 3) would be promoted to primary; current primary "node2" (ID: 2) would be demoted to standby

INFO: following shutdown command would be run on node "node2":

  "/usr/local/postgresql/bin/pg_ctl  -D '/data/n002' -W -m fast stop"

INFO: parameter "shutdown_check_timeout" is set to 60 seconds

INFO: prerequisites for executing STANDBY SWITCHOVER are met

-bash-4.2$ repmgr standby switchover -f /data/node3.conf

NOTICE: executing switchover on node "node3" (ID: 3)

WARNING: 1 sibling nodes found, but option "--siblings-follow" not specified

DETAIL: these nodes will remain attached to the current primary:

  node1 (node ID: 1)

NOTICE: attempting to pause repmgrd on 3 nodes

NOTICE: local node "node3" (ID: 3) will be promoted to primary; current primary "node2" (ID: 2) will be demoted to standby

NOTICE: stopping current primary node "node2" (ID: 2)

NOTICE: issuing CHECKPOINT on node "node2" (ID: 2)

DETAIL: executing server command "/usr/local/postgresql/bin/pg_ctl  -D '/data/n002' -W -m fast stop"

INFO: checking for primary shutdown; 1 of 60 attempts ("shutdown_check_timeout")

INFO: checking for primary shutdown; 2 of 60 attempts ("shutdown_check_timeout")

INFO: checking for primary shutdown; 3 of 60 attempts ("shutdown_check_timeout")

NOTICE: current primary has been cleanly shut down at location 0/7000028

NOTICE: promoting standby to primary

DETAIL: promoting server "node3" (ID: 3) using pg_promote()

NOTICE: waiting up to 60 seconds (parameter "promote_check_timeout") for promotion to complete

NOTICE: STANDBY PROMOTE successful

DETAIL: server "node3" (ID: 3) was successfully promoted to primary

NOTICE: node "node3" (ID: 3) promoted to primary, node "node2" (ID: 2) demoted to standby

NOTICE: switchover was successful

DETAIL: node "node3" is now primary and node "node2" is attached as standby

NOTICE: STANDBY SWITCHOVER has completed successfully

-bash-4.2$ repmgr  -f /data/node2.conf cluster show

ID | Name  | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string

----+-------+---------+-----------+----------+----------+----------+----------+----------------------------------------------------------------------

1  | node1 | standby |   running | node2    | default  | 100      | 2        | host=127.0.0.1 user=repmgr dbname=repmgr connect_timeout=2

2  | node2 | standby |   running | node3    | default  | 100      | 2        | host=127.0.0.1 port=5433 user=repmgr dbname=repmgr connect_timeout=2

3  | node3 | primary | * running |          | default  | 100      | 3        | host=127.0.0.1 port=5434 user=repmgr dbname=repmgr connect_timeout=2

# Switch the primary to node1:

# A candidate whose registered upstream is not the current primary cannot be switched over directly; it must first follow the primary:

repmgr  -f /data/node1.conf standby follow

#check:

repmgr standby switchover -f /data/node1.conf --siblings-follow --dry-run

#执行切换:

repmgr standby switchover -f /data/node1.conf

-bash-4.2$ repmgr  -f /data/node2.conf cluster show

ID | Name  | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string

----+-------+---------+-----------+----------+----------+----------+----------+----------------------------------------------------------------------

1  | node1 | standby |   running | node2    | default  | 100      | 2        | host=127.0.0.1 user=repmgr dbname=repmgr connect_timeout=2

2  | node2 | standby |   running | node3    | default  | 100      | 2        | host=127.0.0.1 port=5433 user=repmgr dbname=repmgr connect_timeout=2

3  | node3 | primary | * running |          | default  | 100      | 3        | host=127.0.0.1 port=5434 user=repmgr dbname=repmgr connect_timeout=2

# node1 is a cascading standby (its registered upstream is node2), so it cannot be promoted to primary directly.

-bash-4.2$ repmgr standby switchover -f /data/node1.conf --siblings-follow --dry-run

NOTICE: checking switchover on node "node1" (ID: 1) in --dry-run mode

ERROR: local node "node1" (ID: 1) is not a downstream of demotion candidate primary "node3" (ID: 3)

DETAIL: registered upstream node ID is 2

HINT: execute "repmgr standby register --force" to update the local node's metadata

# Once node1 is made a standby of node3, the switchover to primary can proceed normally.

-bash-4.2$ repmgr  -f /data/node1.conf standby follow

NOTICE: attempting to find and follow current primary

INFO: timelines are same, this server is not ahead

DETAIL: local node lsn is 0/7000E30, follow target lsn is 0/7000E30

NOTICE: setting node 1's upstream to node 3

WARNING: node "node1" not found in "pg_stat_replication"

NOTICE: STANDBY FOLLOW successful

DETAIL: standby attached to upstream node "node3" (ID: 3)

-bash-4.2$ repmgr  -f /data/node2.conf cluster show

ID | Name  | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string                                               

----+-------+---------+-----------+----------+----------+----------+----------+----------------------------------------------------------------------

1  | node1 | standby |   running | node3    | default  | 100      | 3        | host=127.0.0.1 user=repmgr dbname=repmgr connect_timeout=2      

2  | node2 | standby |   running | node3    | default  | 100      | 3        | host=127.0.0.1 port=5433 user=repmgr dbname=repmgr connect_timeout=2

3  | node3 | primary | * running |          | default  | 100      | 3        | host=127.0.0.1 port=5434 user=repmgr dbname=repmgr connect_timeout=2

# --siblings-follow repoints the other standbys at the new primary:

-bash-4.2$ repmgr standby switchover -f /data/node1.conf --siblings-follow

NOTICE: executing switchover on node "node1" (ID: 1)

NOTICE: attempting to pause repmgrd on 3 nodes

NOTICE: local node "node1" (ID: 1) will be promoted to primary; current primary "node3" (ID: 3) will be demoted to standby

NOTICE: stopping current primary node "node3" (ID: 3)

NOTICE: issuing CHECKPOINT on node "node3" (ID: 3)

DETAIL: executing server command "/usr/local/postgresql/bin/pg_ctl  -D '/data/n003' -W -m fast stop"

INFO: checking for primary shutdown; 1 of 60 attempts ("shutdown_check_timeout")

INFO: checking for primary shutdown; 2 of 60 attempts ("shutdown_check_timeout")

INFO: checking for primary shutdown; 3 of 60 attempts ("shutdown_check_timeout")

NOTICE: current primary has been cleanly shut down at location 0/8000028

NOTICE: promoting standby to primary

DETAIL: promoting server "node1" (ID: 1) using pg_promote()

NOTICE: waiting up to 60 seconds (parameter "promote_check_timeout") for promotion to complete

NOTICE: STANDBY PROMOTE successful

DETAIL: server "node1" (ID: 1) was successfully promoted to primary

NOTICE: node "node1" (ID: 1) promoted to primary, node "node3" (ID: 3) demoted to standby

NOTICE: executing STANDBY FOLLOW on 1 of 1 siblings

INFO: STANDBY FOLLOW successfully executed on all reachable sibling nodes

NOTICE: switchover was successful

DETAIL: node "node1" is now primary and node "node3" is attached as standby

NOTICE: STANDBY SWITCHOVER has completed successfully

-bash-4.2$ repmgr  -f /data/node2.conf cluster show

ID | Name  | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string                                               

----+-------+---------+-----------+----------+----------+----------+----------+----------------------------------------------------------------------

1  | node1 | primary | * running |          | default  | 100      | 4        | host=127.0.0.1 user=repmgr dbname=repmgr connect_timeout=2      

2  | node2 | standby |   running | node1    | default  | 100      | 3        | host=127.0.0.1 port=5433 user=repmgr dbname=repmgr connect_timeout=2

3  | node3 | standby |   running | node1    | default  | 100      | 3        | host=127.0.0.1 port=5434 user=repmgr dbname=repmgr connect_timeout=2

07, Failover (promoting a standby server with repmgr)

# Check cluster status

repmgr  -f /data/node1.conf cluster show

-bash-4.2$ repmgr  -f /data/node1.conf cluster show

ID | Name  | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string                                               

----+-------+---------+-----------+----------+----------+----------+----------+----------------------------------------------------------------------

1  | node1 | primary | * running |          | default  | 100      | 4        | host=127.0.0.1 user=repmgr dbname=repmgr connect_timeout=2      

2  | node2 | standby |   running | node1    | default  | 100      | 3        | host=127.0.0.1 port=5433 user=repmgr dbname=repmgr connect_timeout=2

3  | node3 | standby |   running | node1    | default  | 100      | 3        | host=127.0.0.1 port=5434 user=repmgr dbname=repmgr connect_timeout=2

# Stop the primary to simulate a failure

-bash-4.2$ pg_ctl -D /data/n001/ stop

waiting for server to shut down..... done

server stopped

-bash-4.2$ repmgr  -f /data/node2.conf cluster show

ID | Name  | Role    | Status        | Upstream | Location | Priority | Timeline | Connection string                                           

----+-------+---------+---------------+----------+----------+----------+----------+----------------------------------------------------------------------

1  | node1 | primary | ? unreachable | ?        | default  | 100      |          | host=127.0.0.1 user=repmgr dbname=repmgr connect_timeout=2  

2  | node2 | standby |   running     | ? node1  | default  | 100      | 4        | host=127.0.0.1 port=5433 user=repmgr dbname=repmgr connect_timeout=2

3  | node3 | standby |   running     | ? node1  | default  | 100      | 4        | host=127.0.0.1 port=5434 user=repmgr dbname=repmgr connect_timeout=2

WARNING: following issues were detected

  - unable to connect to node "node1" (ID: 1)

  - node "node1" (ID: 1) is registered as an active primary but is unreachable

  - unable to connect to node "node2" (ID: 2)'s upstream node "node1" (ID: 1)

  - unable to determine if node "node2" (ID: 2) is attached to its upstream node "node1" (ID: 1)

  - unable to connect to node "node3" (ID: 3)'s upstream node "node1" (ID: 1)

  - unable to determine if node "node3" (ID: 3) is attached to its upstream node "node1" (ID: 1)

HINT: execute with --verbose option to see connection error messages

# Promote node2 to primary

-bash-4.2$ repmgr  -f /data/node2.conf standby promote

WARNING: 1 sibling nodes found, but option "--siblings-follow" not specified

DETAIL: these nodes will remain attached to the current primary:

  node3 (node ID: 3)

NOTICE: promoting standby to primary

DETAIL: promoting server "node2" (ID: 2) using pg_promote()

NOTICE: waiting up to 60 seconds (parameter "promote_check_timeout") for promotion to complete

NOTICE: STANDBY PROMOTE successful

DETAIL: server "node2" (ID: 2) was successfully promoted to primary

-bash-4.2$ repmgr  -f /data/node2.conf cluster show

ID | Name  | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string                                               

----+-------+---------+-----------+----------+----------+----------+----------+----------------------------------------------------------------------

1  | node1 | primary | - failed  | ?        | default  | 100      |          | host=127.0.0.1 user=repmgr dbname=repmgr connect_timeout=2      

2  | node2 | primary | * running |          | default  | 100      | 5        | host=127.0.0.1 port=5433 user=repmgr dbname=repmgr connect_timeout=2

3  | node3 | standby |   running | ? node1  | default  | 100      | 4        | host=127.0.0.1 port=5434 user=repmgr dbname=repmgr connect_timeout=2

WARNING: following issues were detected

  - unable to connect to node "node1" (ID: 1)

  - unable to connect to node "node3" (ID: 3)'s upstream node "node1" (ID: 1)

  - unable to determine if node "node3" (ID: 3) is attached to its upstream node "node1" (ID: 1)

HINT: execute with --verbose option to see connection error messages

# Repoint node3 to the new primary, node2

-bash-4.2$ repmgr  -f /data/node3.conf standby follow

NOTICE: attempting to find and follow current primary

INFO: local node 3 can attach to follow target node 2

DETAIL: local node's recovery point: 0/90000A0; follow target node's fork point: 0/90000A0

NOTICE: setting node 3's upstream to node 2

WARNING: node "node3" attached in state "startup"

NOTICE: STANDBY FOLLOW successful

DETAIL: standby attached to upstream node "node2" (ID: 2)
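In the `cluster show` output below, node3's timeline (4) still lags node2's (5) until the timeline switch is replayed. A small sketch of pulling the Name and Timeline columns out of the `|`-separated output (sample rows embedded for illustration; pipe in the real command output on a live cluster):

```shell
#!/bin/bash
# Print node name and timeline for running nodes; repmgr cluster show
# columns are '|'-separated, with Name in field 2 and Timeline in field 8.
awk -F'|' '/running/ { gsub(/ /, "", $2); gsub(/ /, "", $8); print $2, $8 }' <<'EOF'
 2  | node2 | primary | * running |          | default  | 100      | 5        | host=127.0.0.1 port=5433 user=repmgr dbname=repmgr
 3  | node3 | standby |   running | node2    | default  | 100      | 4        | host=127.0.0.1 port=5434 user=repmgr dbname=repmgr
EOF
```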

-bash-4.2$ repmgr  -f /data/node2.conf cluster show

ID | Name  | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string                                               

----+-------+---------+-----------+----------+----------+----------+----------+----------------------------------------------------------------------

1  | node1 | primary | - failed  | ?        | default  | 100      |          | host=127.0.0.1 user=repmgr dbname=repmgr connect_timeout=2      

2  | node2 | primary | * running |          | default  | 100      | 5        | host=127.0.0.1 port=5433 user=repmgr dbname=repmgr connect_timeout=2

3  | node3 | standby |   running | node2    | default  | 100      | 4        | host=127.0.0.1 port=5434 user=repmgr dbname=repmgr connect_timeout=2

WARNING: following issues were detected

  - unable to connect to node "node1" (ID: 1)

HINT: execute with --verbose option to see connection error messages

# Repair node1 and rejoin it to the cluster

# pg_rewind requires that either wal_log_hints or data checksums be enabled (we set wal_log_hints=on earlier).

repmgr node rejoin -f /etc/repmgr.conf -d 'host=node1 dbname=repmgr user=repmgr' --force-rewind --verbose --dry-run
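Before the rejoin it is worth verifying the pg_rewind precondition. A hedged sketch (the helper name `can_rewind` is made up here; on a live server the two inputs would come from `psql -Atc "show wal_log_hints"` and `psql -Atc "show data_checksums"`):

```shell
#!/bin/bash
# pg_rewind can run only if wal_log_hints is on OR data checksums are enabled.
can_rewind() {
    local wal_log_hints="$1" data_checksums="$2"
    if [ "$wal_log_hints" = "on" ] || [ "$data_checksums" = "on" ]; then
        echo "yes"
    else
        echo "no"
    fi
}

can_rewind on off    # our cluster set wal_log_hints=on at the start
```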

-bash-4.2$ repmgr -f /data/node1.conf node rejoin -d 'host=127.0.0.1 port=5433 user=repmgr dbname=repmgr'

NOTICE: rejoin target is node "node2" (ID: 2)

INFO: local node 1 can attach to rejoin target node 2

DETAIL: local node's recovery point: 0/9000028; rejoin target node's fork point: 0/90000A0

NOTICE: setting node 1's upstream to node 2

WARNING: unable to ping "host=127.0.0.1 user=repmgr dbname=repmgr connect_timeout=2"

DETAIL: PQping() returned "PQPING_NO_RESPONSE"

NOTICE: starting server using "/usr/local/postgresql/bin/pg_ctl  -w -D '/data/n001' start"

NOTICE: NODE REJOIN successful

DETAIL: node 1 is now attached to node 2
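The rejoin log above compares the local recovery point (0/9000028) against the target's fork point (0/90000A0). A pg_lsn is `XLOGID/XRECOFF`, two hex numbers; a minimal sketch of comparing two of them numerically (the helper name is hypothetical):

```shell
#!/bin/bash
# Convert a pg_lsn string XLOGID/XRECOFF (both hex) into a single integer.
lsn_to_int() {
    local hi=${1%%/*} lo=${1##*/}
    echo $(( (16#$hi << 32) + 16#$lo ))
}

if [ "$(lsn_to_int 0/9000028)" -lt "$(lsn_to_int 0/90000A0)" ]; then
    echo "local recovery point is behind the fork point"
fi
```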

-bash-4.2$ repmgr  -f /data/node2.conf cluster show

ID | Name  | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string                                               

----+-------+---------+-----------+----------+----------+----------+----------+----------------------------------------------------------------------

1  | node1 | standby |   running | node2    | default  | 100      | 4        | host=127.0.0.1 user=repmgr dbname=repmgr connect_timeout=2      

2  | node2 | primary | * running |          | default  | 100      | 5        | host=127.0.0.1 port=5433 user=repmgr dbname=repmgr connect_timeout=2

3  | node3 | standby |   running | node2    | default  | 100      | 5        | host=127.0.0.1 port=5434 user=repmgr dbname=repmgr connect_timeout=2

*Reproduction prohibited; forwarding permitted (please credit the source when forwarding).
