PostgreSQL automatic and manual failover with repmgr

1. Prepare the machines (1 primary, 2 standbys, 1 witness)

Primary

[root@wcbpg ~]# docker rm -f wcbrepmgr64361
[root@wcbpg ~]# docker run -d --name wcbrepmgr64361 -h wcbrepmgr64361 \
  --net=pg-network --ip 172.72.6.61 \
  -p 64361:5432  \
  -v /sys/fs/cgroup:/sys/fs/cgroup \
  --privileged=true registry.cn-hangzhou.aliyuncs.com/lhrbest/lhrpgall:1.0 \
  /usr/sbin/init
[root@wcbpg ~]# docker network connect bridge wcbrepmgr64361

Standby 1

[root@wcbpg ~]# docker rm -f wcbrepmgr64362
[root@wcbpg ~]# docker run -d --name wcbrepmgr64362 -h wcbrepmgr64362 \
  --net=pg-network --ip 172.72.6.62 \
  -p 64362:5432  \
  -v /sys/fs/cgroup:/sys/fs/cgroup \
  --privileged=true registry.cn-hangzhou.aliyuncs.com/lhrbest/lhrpgall:1.0 \
  /usr/sbin/init
[root@wcbpg ~]# docker network connect bridge wcbrepmgr64362

Standby 2

[root@wcbpg ~]# docker rm -f wcbrepmgr64363
[root@wcbpg ~]# docker run -d --name wcbrepmgr64363 -h wcbrepmgr64363 \
  --net=pg-network --ip 172.72.6.63 \
  -p 64363:5432  \
  -v /sys/fs/cgroup:/sys/fs/cgroup \
  --privileged=true registry.cn-hangzhou.aliyuncs.com/lhrbest/lhrpgall:1.0 \
  /usr/sbin/init
[root@wcbpg ~]# docker network connect bridge wcbrepmgr64363

Witness node

[root@wcbpg ~]# docker rm -f wcbrepmgr64364
[root@wcbpg ~]# docker run -d --name wcbrepmgr64364 -h wcbrepmgr64364 \
  --net=pg-network --ip 172.72.6.64 \
  -p 64364:5432  \
  -v /sys/fs/cgroup:/sys/fs/cgroup \
  --privileged=true registry.cn-hangzhou.aliyuncs.com/lhrbest/lhrpgall:1.0 \
  /usr/sbin/init
[root@wcbpg ~]# docker network connect bridge wcbrepmgr64364

Restart the four containers

[root@wcbpg ~]# docker restart wcbrepmgr64364 wcbrepmgr64363 wcbrepmgr64362 wcbrepmgr64361
[root@wcbpg ~]# docker ps |grep wcbrepmgr
46bd4758ba01        registry.cn-hangzhou.aliyuncs.com/lhrbest/lhrpgall:1.0   "/usr/sbin/init"         2 minutes ago       Up 53 seconds       5433-5435/tcp, 0.0.0.0:64364->5432/tcp                                                         wcbrepmgr64364
bbc64f79cfc7        registry.cn-hangzhou.aliyuncs.com/lhrbest/lhrpgall:1.0   "/usr/sbin/init"         3 minutes ago       Up 42 seconds       5433-5435/tcp, 0.0.0.0:64363->5432/tcp                                                         wcbrepmgr64363
c7c42f5ca4d2        registry.cn-hangzhou.aliyuncs.com/lhrbest/lhrpgall:1.0   "/usr/sbin/init"         4 minutes ago       Up 30 seconds       5433-5435/tcp, 0.0.0.0:64362->5432/tcp                                                         wcbrepmgr64362
577c72695a92        registry.cn-hangzhou.aliyuncs.com/lhrbest/lhrpgall:1.0   "/usr/sbin/init"         6 minutes ago       Up 18 seconds       5433-5435/tcp, 0.0.0.0:64361->5432/tcp                                                         wcbrepmgr64361
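
Optionally, before going further, confirm that every container actually picked up its static address on pg-network. A minimal sketch, run on the Docker host (the container names and network name are the ones used above):

for c in wcbrepmgr64361 wcbrepmgr64362 wcbrepmgr64363 wcbrepmgr64364; do
  # print the container name and its address on the pg-network bridge
  echo -n "$c: "
  docker inspect -f '{{with index .NetworkSettings.Networks "pg-network"}}{{.IPAddress}}{{end}}' "$c"
done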

2. Install repmgr

Official site: https://repmgr.org/

Official documentation: https://repmgr.org/docs/current/index.html

All releases: https://repmgr.org/downloads.html

GitHub: https://github.com/EnterpriseDB/repmgr

Download the package from https://repmgr.org/download/repmgr-5.2.1.tar.gz and upload it to the Linux host.

Copy the tarball into each container

[root@wcbpg /]# docker cp repmgr-5.2.1.tar.gz wcbrepmgr64361:/home/pg13/
[root@wcbpg /]# docker cp repmgr-5.2.1.tar.gz wcbrepmgr64362:/home/pg13/
[root@wcbpg /]# docker cp repmgr-5.2.1.tar.gz wcbrepmgr64363:/home/pg13/
[root@wcbpg /]# docker cp repmgr-5.2.1.tar.gz wcbrepmgr64364:/home/pg13/

Install repmgr on every PG node in turn; the transcript below shows the witness node (.64), and a scripted variant for the remaining nodes is sketched after it.

[root@wcbpg ~]# docker exec -it wcbrepmgr64364 bash
[root@wcbrepmgr64364 /]# su - pg13
[pg13@wcbrepmgr64364 ~]$ ls
repmgr-5.2.1.tar.gz
[pg13@wcbrepmgr64364 ~]$ tar -zxvf repmgr-5.2.1.tar.gz
[pg13@wcbrepmgr64364 ~]$ cd repmgr-5.2.1/
[pg13@wcbrepmgr64364 repmgr-5.2.1]$ ./configure 
[pg13@wcbrepmgr64364 repmgr-5.2.1]$ make -j 8 && make install
[pg13@wcbrepmgr64364 repmgr-5.2.1]$ repmgr --help
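
The build can be repeated on the remaining nodes by hand, or scripted from the Docker host. A minimal sketch, assuming the tarball has already been copied to /home/pg13 on every container (as above) and that the pg13 login shell puts pg_config and the PostgreSQL binaries on the PATH, as it evidently does in this image:

for c in wcbrepmgr64361 wcbrepmgr64362 wcbrepmgr64363; do
  docker exec -u pg13 "$c" bash -lc '
    cd ~ && tar -zxf repmgr-5.2.1.tar.gz &&
    cd repmgr-5.2.1 && ./configure && make -j 8 && make install &&
    repmgr --version
  '
done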

Configure passwordless SSH trust among the four machines

Download the sshUserSetup.sh script and upload it to the Linux host

[root@wcbpg ~]# docker cp sshUserSetup.sh  wcbrepmgr64361:/root/

Run sshUserSetup.sh as root on the primary node (.61) only; when prompted, answer yes, press Enter, and supply the pg13 password for each host.

[root@wcbrepmgr64361 ~]# pwd
/root
[root@wcbrepmgr64361 ~]# ls -lth
total 40K
-rw-r--r-- 1 root root  33K Nov 27 21:59 sshUserSetup.sh
-rw------- 1 root root 3.3K Dec  4  2018 anaconda-ks.cfg
[root@wcbrepmgr64361 ~]# chmod +x sshUserSetup.sh
[root@wcbrepmgr64361 ~]# ./sshUserSetup.sh -user pg13  -hosts "172.72.6.61 172.72.6.62 172.72.6.63 172.72.6.64" -advanced exverify -confirm
The output of this script is also logged into /tmp/sshUserSetup_2022-01-01-11-59-01.log
Hosts are 172.72.6.61 172.72.6.62 172.72.6.63 172.72.6.64
user is pg13
Platform:- Linux 
Checking if the remote hosts are reachable
PING 172.72.6.61 (172.72.6.61) 56(84) bytes of data.
64 bytes from 172.72.6.61: icmp_seq=1 ttl=64 time=0.068 ms
64 bytes from 172.72.6.61: icmp_seq=2 ttl=64 time=0.074 ms
64 bytes from 172.72.6.61: icmp_seq=3 ttl=64 time=0.078 ms
64 bytes from 172.72.6.61: icmp_seq=4 ttl=64 time=0.102 ms
64 bytes from 172.72.6.61: icmp_seq=5 ttl=64 time=0.085 ms

--- 172.72.6.61 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4001ms
rtt min/avg/max/mdev = 0.068/0.081/0.102/0.014 ms
PING 172.72.6.62 (172.72.6.62) 56(84) bytes of data.
64 bytes from 172.72.6.62: icmp_seq=1 ttl=64 time=0.254 ms
64 bytes from 172.72.6.62: icmp_seq=2 ttl=64 time=0.163 ms
64 bytes from 172.72.6.62: icmp_seq=3 ttl=64 time=0.123 ms
64 bytes from 172.72.6.62: icmp_seq=4 ttl=64 time=0.094 ms
64 bytes from 172.72.6.62: icmp_seq=5 ttl=64 time=0.160 ms

--- 172.72.6.62 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4003ms
rtt min/avg/max/mdev = 0.094/0.158/0.254/0.056 ms
PING 172.72.6.63 (172.72.6.63) 56(84) bytes of data.
64 bytes from 172.72.6.63: icmp_seq=1 ttl=64 time=0.447 ms
64 bytes from 172.72.6.63: icmp_seq=2 ttl=64 time=0.154 ms
64 bytes from 172.72.6.63: icmp_seq=3 ttl=64 time=0.134 ms
64 bytes from 172.72.6.63: icmp_seq=4 ttl=64 time=0.147 ms
64 bytes from 172.72.6.63: icmp_seq=5 ttl=64 time=0.170 ms

--- 172.72.6.63 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4008ms
rtt min/avg/max/mdev = 0.134/0.210/0.447/0.119 ms
PING 172.72.6.64 (172.72.6.64) 56(84) bytes of data.
64 bytes from 172.72.6.64: icmp_seq=1 ttl=64 time=0.446 ms
64 bytes from 172.72.6.64: icmp_seq=2 ttl=64 time=0.138 ms
64 bytes from 172.72.6.64: icmp_seq=3 ttl=64 time=0.175 ms
64 bytes from 172.72.6.64: icmp_seq=4 ttl=64 time=0.163 ms
64 bytes from 172.72.6.64: icmp_seq=5 ttl=64 time=0.097 ms

--- 172.72.6.64 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4013ms
rtt min/avg/max/mdev = 0.097/0.203/0.446/0.125 ms
Remote host reachability check succeeded.
The following hosts are reachable: 172.72.6.61 172.72.6.62 172.72.6.63 172.72.6.64.
The following hosts are not reachable: .
All hosts are reachable. Proceeding further...
firsthost 172.72.6.61
numhosts 4
The script will setup SSH connectivity from the host wcbrepmgr64361 to all
the remote hosts. After the script is executed, the user can use SSH to run
commands on the remote hosts or copy files between this host wcbrepmgr64361
and the remote hosts without being prompted for passwords or confirmations.

NOTE 1:
As part of the setup procedure, this script will use ssh and scp to copy
files between the local host and the remote hosts. Since the script does not
store passwords, you may be prompted for the passwords during the execution of
the script whenever ssh or scp is invoked.

NOTE 2:
AS PER SSH REQUIREMENTS, THIS SCRIPT WILL SECURE THE USER HOME DIRECTORY
AND THE .ssh DIRECTORY BY REVOKING GROUP AND WORLD WRITE PRIVILEGES TO THESE
directories.

Do you want to continue and let the script make the above mentioned changes (yes/no)?
Confirmation provided on the command line

The user chose yes
Please specify if you want to specify a passphrase for the private key this script will create for the local host. Passphrase is used to encrypt the private key and makes SSH much more secure. Type 'yes' or 'no' and then press enter. In case you press 'yes', you would need to enter the passphrase whenever the script executes ssh or scp. no 
The estimated number of times the user would be prompted for a passphrase is 8. In addition, if the private-public files are also newly created, the user would have to specify the passphrase on one additional occasion. 
Enter 'yes' or 'no'.
yes

The user chose yes
Creating .ssh directory on local host, if not present already
Creating authorized_keys file on local host
Changing permissions on authorized_keys to 644 on local host
Creating known_hosts file on local host
Changing permissions on known_hosts to 644 on local host
Creating config file on local host
If a config file exists already at /root/.ssh/config, it would be backed up to /root/.ssh/config.backup.
Removing old private/public keys on local host
Running SSH keygen on local host
Enter passphrase (empty for no passphrase): 
Enter same passphrase again: 
Generating public/private rsa key pair.
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:JEwRyjSIrpcJj86qYBxtYPEFcVKipWFbeYGBBme1iBY root@wcbrepmgr64361
The key's randomart image is:
+---[RSA 1024]----+
|oE+@@==o         |
|o*%*=B           |
|+B..= o .        |
|+.o    o         |
|.= =    S        |
|+ B              |
|++               |
|oo               |
|=                |
+----[SHA256]-----+
Creating .ssh directory and setting permissions on remote host 172.72.6.61
THE SCRIPT WOULD ALSO BE REVOKING WRITE PERMISSIONS FOR group AND others ON THE HOME DIRECTORY FOR pg13. THIS IS AN SSH REQUIREMENT.
The script would create ~pg13/.ssh/config file on remote host 172.72.6.61. If a config file exists already at ~pg13/.ssh/config, it would be backed up to ~pg13/.ssh/config.backup.
The user may be prompted for a password here since the script would be running SSH on host 172.72.6.61.
Warning: Permanently added '172.72.6.61' (ECDSA) to the list of known hosts.
pg13@172.72.6.61's password: 
Done with creating .ssh directory and setting permissions on remote host 172.72.6.61.
Creating .ssh directory and setting permissions on remote host 172.72.6.62
THE SCRIPT WOULD ALSO BE REVOKING WRITE PERMISSIONS FOR group AND others ON THE HOME DIRECTORY FOR pg13. THIS IS AN SSH REQUIREMENT.
The script would create ~pg13/.ssh/config file on remote host 172.72.6.62. If a config file exists already at ~pg13/.ssh/config, it would be backed up to ~pg13/.ssh/config.backup.
The user may be prompted for a password here since the script would be running SSH on host 172.72.6.62.
Warning: Permanently added '172.72.6.62' (ECDSA) to the list of known hosts.
pg13@172.72.6.62's password: 
Done with creating .ssh directory and setting permissions on remote host 172.72.6.62.
Creating .ssh directory and setting permissions on remote host 172.72.6.63
THE SCRIPT WOULD ALSO BE REVOKING WRITE PERMISSIONS FOR group AND others ON THE HOME DIRECTORY FOR pg13. THIS IS AN SSH REQUIREMENT.
The script would create ~pg13/.ssh/config file on remote host 172.72.6.63. If a config file exists already at ~pg13/.ssh/config, it would be backed up to ~pg13/.ssh/config.backup.
The user may be prompted for a password here since the script would be running SSH on host 172.72.6.63.
Warning: Permanently added '172.72.6.63' (ECDSA) to the list of known hosts.
pg13@172.72.6.63's password: 
Done with creating .ssh directory and setting permissions on remote host 172.72.6.63.
Creating .ssh directory and setting permissions on remote host 172.72.6.64
THE SCRIPT WOULD ALSO BE REVOKING WRITE PERMISSIONS FOR group AND others ON THE HOME DIRECTORY FOR pg13. THIS IS AN SSH REQUIREMENT.
The script would create ~pg13/.ssh/config file on remote host 172.72.6.64. If a config file exists already at ~pg13/.ssh/config, it would be backed up to ~pg13/.ssh/config.backup.
The user may be prompted for a password here since the script would be running SSH on host 172.72.6.64.
Warning: Permanently added '172.72.6.64' (ECDSA) to the list of known hosts.
pg13@172.72.6.64's password: 
Done with creating .ssh directory and setting permissions on remote host 172.72.6.64.
Copying local host public key to the remote host 172.72.6.61
The user may be prompted for a password or passphrase here since the script would be using SCP for host 172.72.6.61.
pg13@172.72.6.61's password: 
Done copying local host public key to the remote host 172.72.6.61
Copying local host public key to the remote host 172.72.6.62
The user may be prompted for a password or passphrase here since the script would be using SCP for host 172.72.6.62.
pg13@172.72.6.62's password: 
Done copying local host public key to the remote host 172.72.6.62
Copying local host public key to the remote host 172.72.6.63
The user may be prompted for a password or passphrase here since the script would be using SCP for host 172.72.6.63.
pg13@172.72.6.63's password: 
Done copying local host public key to the remote host 172.72.6.63
Copying local host public key to the remote host 172.72.6.64
The user may be prompted for a password or passphrase here since the script would be using SCP for host 172.72.6.64.
pg13@172.72.6.64's password: 
Done copying local host public key to the remote host 172.72.6.64
Creating keys on remote host 172.72.6.61 if they do not exist already. This is required to setup SSH on host 172.72.6.61.
Generating public/private rsa key pair.
Your identification has been saved in .ssh/id_rsa.
Your public key has been saved in .ssh/id_rsa.pub.
The key fingerprint is:
SHA256:bS3FF83jGzcgiGgizS6BggPPLxdb8FI16IpiCdzS4BA pg13@wcbrepmgr64361
The key's randomart image is:
+---[RSA 1024]----+
|E.o . oo+ .   .o |
|** + *.. o o . oo|
|=.X =.o     + + .|
|.= = =.  . o . +.|
|. =.+.  S + .   =|
|.o.o.    . .   . |
|..               |
|                 |
|                 |
+----[SHA256]-----+
Creating keys on remote host 172.72.6.62 if they do not exist already. This is required to setup SSH on host 172.72.6.62.
Generating public/private rsa key pair.
Your identification has been saved in .ssh/id_rsa.
Your public key has been saved in .ssh/id_rsa.pub.
The key fingerprint is:
SHA256:7yZAItuR+w7kSvkADh/Xu3Es1lr9n18FCpaeCjA5034 pg13@wcbrepmgr64362
The key's randomart image is:
+---[RSA 1024]----+
|                 |
|     o     .     |
|    *..   +   .  |
|  . +B.  o o . . |
|o .+o=+ E o .   .|
|oo.*o .* +      .|
| .= o.=.* o     .|
| . + o.B....   ..|
|  . ..+  o. ..o..|
+----[SHA256]-----+
Creating keys on remote host 172.72.6.63 if they do not exist already. This is required to setup SSH on host 172.72.6.63.
Generating public/private rsa key pair.
Your identification has been saved in .ssh/id_rsa.
Your public key has been saved in .ssh/id_rsa.pub.
The key fingerprint is:
SHA256:ialCTxTpd48NKpSv875ORpHxgtr3dOGhx7wkXM+B5eg pg13@wcbrepmgr64363
The key's randomart image is:
+---[RSA 1024]----+
|    ...     .    |
|    .o +   =     |
|   .o.+ . * o    |
|   ++ .*oO = .   |
|  o.oo=oS=E o    |
| . o.+oo.=o.     |
|  . ooo . .      |
|   .oo           |
|     ==.         |
+----[SHA256]-----+
Creating keys on remote host 172.72.6.64 if they do not exist already. This is required to setup SSH on host 172.72.6.64.
Generating public/private rsa key pair.
Your identification has been saved in .ssh/id_rsa.
Your public key has been saved in .ssh/id_rsa.pub.
The key fingerprint is:
SHA256:+UQ4oIINX1y8suEJjRuNeaHkoOoQwmv06vlZ1hwZI8Y pg13@wcbrepmgr64364
The key's randomart image is:
+---[RSA 1024]----+
|.  ..oo          |
|.=..+... .       |
|+++B.E +o .      |
|+oO.B o ++       |
|+.o* = oS .      |
|oo..+ o .o       |
|+ .  o o  .      |
| o. +            |
|.o.o             |
+----[SHA256]-----+
Updating authorized_keys file on remote host 172.72.6.61
Updating known_hosts file on remote host 172.72.6.61
The script will run SSH on the remote machine 172.72.6.61. The user may be prompted for a passphrase here in case the private key has been encrypted with a passphrase.
Updating authorized_keys file on remote host 172.72.6.62
Updating known_hosts file on remote host 172.72.6.62
The script will run SSH on the remote machine 172.72.6.62. The user may be prompted for a passphrase here in case the private key has been encrypted with a passphrase.
Updating authorized_keys file on remote host 172.72.6.63
Updating known_hosts file on remote host 172.72.6.63
The script will run SSH on the remote machine 172.72.6.63. The user may be prompted for a passphrase here in case the private key has been encrypted with a passphrase.
Updating authorized_keys file on remote host 172.72.6.64
Updating known_hosts file on remote host 172.72.6.64
The script will run SSH on the remote machine 172.72.6.64. The user may be prompted for a passphrase here in case the private key has been encrypted with a passphrase.
SSH setup is complete.

------------------------------------------------------------------------
Verifying SSH setup
===================
The script will now run the date command on the remote nodes using ssh
to verify if ssh is setup correctly. IF THE SETUP IS CORRECTLY SETUP,
THERE SHOULD BE NO OUTPUT OTHER THAN THE DATE AND SSH SHOULD NOT ASK FOR
PASSWORDS. If you see any output other than date or are prompted for the
password, ssh is not setup correctly and you will need to resolve the
issue and set up ssh again.
The possible causes for failure could be:
1. The server settings in /etc/ssh/sshd_config file do not allow ssh
for user pg13.
2. The server may have disabled public key based authentication.
3. The client public key on the server may be outdated.
4. ~pg13 or ~pg13/.ssh on the remote host may not be owned by pg13.
5. User may not have passed -shared option for shared remote users or
may be passing the -shared option for non-shared remote users.
6. If there is output in addition to the date, but no password is asked,
it may be a security alert shown as part of company policy. Append the
additional text to the <OMS HOME>/sysman/prov/resources/ignoreMessages.txt file.
------------------------------------------------------------------------
--172.72.6.61:--
Running /usr/bin/ssh -x -l pg13 172.72.6.61 date to verify SSH connectivity has been setup from local host to 172.72.6.61.
IF YOU SEE ANY OTHER OUTPUT BESIDES THE OUTPUT OF THE DATE COMMAND OR IF YOU ARE PROMPTED FOR A PASSWORD HERE, IT MEANS SSH SETUP HAS NOT BEEN SUCCESSFUL. Please note that being prompted for a passphrase may be OK but being prompted for a password is ERROR.
The script will run SSH on the remote machine 172.72.6.61. The user may be prompted for a passphrase here in case the private key has been encrypted with a passphrase.
Sat Jan  1 12:00:07 CST 2022
------------------------------------------------------------------------
--172.72.6.62:--
Running /usr/bin/ssh -x -l pg13 172.72.6.62 date to verify SSH connectivity has been setup from local host to 172.72.6.62.
IF YOU SEE ANY OTHER OUTPUT BESIDES THE OUTPUT OF THE DATE COMMAND OR IF YOU ARE PROMPTED FOR A PASSWORD HERE, IT MEANS SSH SETUP HAS NOT BEEN SUCCESSFUL. Please note that being prompted for a passphrase may be OK but being prompted for a password is ERROR.
The script will run SSH on the remote machine 172.72.6.62. The user may be prompted for a passphrase here in case the private key has been encrypted with a passphrase.
Sat Jan  1 12:00:08 CST 2022
------------------------------------------------------------------------
--172.72.6.63:--
Running /usr/bin/ssh -x -l pg13 172.72.6.63 date to verify SSH connectivity has been setup from local host to 172.72.6.63.
IF YOU SEE ANY OTHER OUTPUT BESIDES THE OUTPUT OF THE DATE COMMAND OR IF YOU ARE PROMPTED FOR A PASSWORD HERE, IT MEANS SSH SETUP HAS NOT BEEN SUCCESSFUL. Please note that being prompted for a passphrase may be OK but being prompted for a password is ERROR.
The script will run SSH on the remote machine 172.72.6.63. The user may be prompted for a passphrase here in case the private key has been encrypted with a passphrase.
Sat Jan  1 12:00:08 CST 2022
------------------------------------------------------------------------
--172.72.6.64:--
Running /usr/bin/ssh -x -l pg13 172.72.6.64 date to verify SSH connectivity has been setup from local host to 172.72.6.64.
IF YOU SEE ANY OTHER OUTPUT BESIDES THE OUTPUT OF THE DATE COMMAND OR IF YOU ARE PROMPTED FOR A PASSWORD HERE, IT MEANS SSH SETUP HAS NOT BEEN SUCCESSFUL. Please note that being prompted for a passphrase may be OK but being prompted for a password is ERROR.
The script will run SSH on the remote machine 172.72.6.64. The user may be prompted for a passphrase here in case the private key has been encrypted with a passphrase.
Sat Jan  1 12:00:08 CST 2022
------------------------------------------------------------------------
------------------------------------------------------------------------
Verifying SSH connectivity has been setup from 172.72.6.61 to 172.72.6.61
IF YOU SEE ANY OTHER OUTPUT BESIDES THE OUTPUT OF THE DATE COMMAND OR IF YOU ARE PROMPTED FOR A PASSWORD HERE, IT MEANS SSH SETUP HAS NOT BEEN SUCCESSFUL.
Bad owner or permissions on /home/pg13/.ssh/config
------------------------------------------------------------------------
------------------------------------------------------------------------
Verifying SSH connectivity has been setup from 172.72.6.61 to 172.72.6.62
IF YOU SEE ANY OTHER OUTPUT BESIDES THE OUTPUT OF THE DATE COMMAND OR IF YOU ARE PROMPTED FOR A PASSWORD HERE, IT MEANS SSH SETUP HAS NOT BEEN SUCCESSFUL.
Bad owner or permissions on /home/pg13/.ssh/config
------------------------------------------------------------------------
------------------------------------------------------------------------
Verifying SSH connectivity has been setup from 172.72.6.61 to 172.72.6.63
IF YOU SEE ANY OTHER OUTPUT BESIDES THE OUTPUT OF THE DATE COMMAND OR IF YOU ARE PROMPTED FOR A PASSWORD HERE, IT MEANS SSH SETUP HAS NOT BEEN SUCCESSFUL.
Bad owner or permissions on /home/pg13/.ssh/config
------------------------------------------------------------------------
------------------------------------------------------------------------
Verifying SSH connectivity has been setup from 172.72.6.61 to 172.72.6.64
IF YOU SEE ANY OTHER OUTPUT BESIDES THE OUTPUT OF THE DATE COMMAND OR IF YOU ARE PROMPTED FOR A PASSWORD HERE, IT MEANS SSH SETUP HAS NOT BEEN SUCCESSFUL.
Bad owner or permissions on /home/pg13/.ssh/config
------------------------------------------------------------------------
-Verification from complete-
SSH verification complete.

Fix the permissions on every host, otherwise you get the error "Bad owner or permissions on /home/pg13/.ssh/config":

[pg13@wcbrepmgr64361 ~]$ chmod 600 /home/pg13/.ssh/config

Verify that SSH works from each host

[pg13@wcbrepmgr64361 ~]$ ssh pg13@wcbrepmgr64361 date
Sat Jan  1 12:11:16 CST 2022
[pg13@wcbrepmgr64361 ~]$ ssh pg13@wcbrepmgr64362 date
Sat Jan  1 12:11:20 CST 2022
[pg13@wcbrepmgr64361 ~]$ ssh pg13@wcbrepmgr64363 date
Sat Jan  1 12:11:23 CST 2022
[pg13@wcbrepmgr64361 ~]$ ssh pg13@wcbrepmgr64364 date
Sat Jan  1 12:11:26 CST 2022
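
To repeat the check between every pair of nodes in one pass, something like the following can be run from any one of them (a sketch; BatchMode makes ssh fail immediately instead of prompting if key authentication is broken anywhere):

hosts="wcbrepmgr64361 wcbrepmgr64362 wcbrepmgr64363 wcbrepmgr64364"
for src in $hosts; do
  for dst in $hosts; do
    echo -n "$src -> $dst : "
    ssh -o BatchMode=yes pg13@"$src" ssh -o BatchMode=yes pg13@"$dst" date
  done
done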

3. Configure repmgr

Preliminary repmgr configuration.

On the primary:

[pg13@wcbrepmgr64361 ~]$ cat >> /pg13/pgdata/pg_hba.conf <<"EOF"

local repmgr repmgr md5
host repmgr repmgr 127.0.0.1/32 md5
host repmgr repmgr 172.72.6.0/24 md5

local replication repmgr md5
host replication repmgr 127.0.0.1/32 md5
host replication repmgr 172.72.6.0/24 md5

EOF

[pg13@wcbrepmgr64361 ~]$ cat >> /pg13/pgdata/postgresql.conf <<"EOF"

# archiving parameters
wal_level='replica'
archive_mode='on'
archive_command='test ! -f /pg13/archive/%f && cp %p /pg13/archive/%f'
restore_command='cp /pg13/archive/%f %p'

# streaming replication
hot_standby=on
max_wal_senders=10
wal_sender_timeout=60s
wal_keep_size=16MB

# switchover parameters: enable PostgreSQL replication slots; since PG 12 the "use_replication_slots=true" parameter is no longer needed.
max_replication_slots=10
wal_log_hints=on

# automatic failover
shared_preload_libraries ='repmgr'

EOF
[pg13@wcbrepmgr64361 ~]$ pg_ctl start
waiting for server to start....2022-01-01 12:17:01.842 CST [3368] LOG:  redirecting log output to logging collector process
2022-01-01 12:17:01.842 CST [3368] HINT:  Future log output will appear in directory "pg_log".
 done
server started
[pg13@wcbrepmgr64361 ~]$ psql
psql (13.2)
Type "help" for help.

postgres=# \x
Expanded display is on.
postgres=# select * from pg_settings where name in ('wal_level','archive_mode','archive_command');

-[ RECORD 1 ]---+------------------------------------------------------------------
name            | archive_command
setting         | test ! -f /pg13/archive/%f && cp %p /pg13/archive/%f
unit            | 
category        | Write-Ahead Log / Archiving
short_desc      | Sets the shell command that will be called to archive a WAL file.
extra_desc      | 
context         | sighup
vartype         | string
source          | configuration file
min_val         | 
max_val         | 
enumvals        | 
boot_val        | 
reset_val       | test ! -f /pg13/archive/%f && cp %p /pg13/archive/%f
sourcefile      | /pg13/pgdata/postgresql.conf
sourceline      | 791
pending_restart | f
-[ RECORD 2 ]---+------------------------------------------------------------------
name            | archive_mode
setting         | on
unit            | 
category        | Write-Ahead Log / Archiving
short_desc      | Allows archiving of WAL files using archive_command.
extra_desc      | 
context         | postmaster
vartype         | enum
source          | configuration file
min_val         | 
max_val         | 
enumvals        | {always,on,off}
boot_val        | off
reset_val       | on
sourcefile      | /pg13/pgdata/postgresql.conf
sourceline      | 790
pending_restart | f
-[ RECORD 3 ]---+------------------------------------------------------------------
name            | wal_level
setting         | replica
unit            | 
category        | Write-Ahead Log / Settings
short_desc      | Set the level of information written to the WAL.
extra_desc      | 
context         | postmaster
vartype         | enum
source          | configuration file
min_val         | 
max_val         | 
enumvals        | {minimal,replica,logical}
boot_val        | replica
reset_val       | replica
sourcefile      | /pg13/pgdata/postgresql.conf
sourceline      | 789
pending_restart | f
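
The same kind of check can be extended to the other parameters repmgr relies on, in particular wal_log_hints (needed by pg_rewind during switchover) and shared_preload_libraries. A short sketch:

psql -c "select name, setting from pg_settings
          where name in ('wal_log_hints','hot_standby','max_wal_senders',
                         'max_replication_slots','shared_preload_libraries');"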

Create the repmgr user and database on the primary

[pg13@wcbrepmgr64361 ~]$  createuser -s repmgr
[pg13@wcbrepmgr64361 ~]$ 
[pg13@wcbrepmgr64361 ~]$ createdb repmgr -O repmgr
[pg13@wcbrepmgr64361 ~]$ 
[pg13@wcbrepmgr64361 ~]$ psql -h 127.0.0.1 -c "alter user repmgr with password 'wcb';"
ALTER ROLE
[pg13@wcbrepmgr64361 ~]$ psql -h 127.0.0.1 -c "alter user repmgr set search_path to repmgr, \"\$user\",public;"
ALTER ROLE
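
Before cloning, it is worth confirming that the repmgr user can actually log in over the network with its password, since the clone and all later repmgr operations depend on it. A quick sketch, runnable from any of the four nodes:

psql "host=172.72.6.61 port=5432 user=repmgr dbname=repmgr password=wcb" -c "select version();"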

Edit repmgr.conf on each node.

Primary:

[pg13@wcbrepmgr64361 ~]$ cat > /pg13/pg13/repmgr.conf << "EOF"
node_id=1
node_name=wcbrepmgr64361
conninfo='host=172.72.6.61 user=repmgr password=wcb dbname=repmgr connect_timeout=2'
data_directory='/pg13/pgdata'
pg_bindir='/pg13/pg13/bin'
EOF

Standby 1

[pg13@wcbrepmgr64362 ~]$ cat > /pg13/pg13/repmgr.conf << "EOF"
node_id=2
node_name=wcbrepmgr64362
conninfo='host=172.72.6.62 user=repmgr password=wcb dbname=repmgr connect_timeout=2'
data_directory='/pg13/pgdata'
pg_bindir='/pg13/pg13/bin'
EOF

Standby 2

[pg13@wcbrepmgr64363 ~]$ cat > /pg13/pg13/repmgr.conf << "EOF"
node_id=3
node_name=wcbrepmgr64363
conninfo='host=172.72.6.63 user=repmgr password=wcb dbname=repmgr connect_timeout=2'
data_directory='/pg13/pgdata'
pg_bindir='/pg13/pg13/bin'
EOF

Witness node

[pg13@wcbrepmgr64364 ~]$ cat > /pg13/pg13/repmgr.conf << "EOF"
node_id=4
node_name=wcbrepmgr64364
conninfo='host=172.72.6.64 user=repmgr password=wcb dbname=repmgr connect_timeout=2'
data_directory='/pg13/pgdata'
pg_bindir='/pg13/pg13/bin'
EOF
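
The repmgr.conf files above contain only the minimum needed for manual operation. If automatic failover through the repmgrd daemon is also wanted, the repmgr documentation describes additional parameters along the following lines; the paths mirror this environment, but treat the values as an illustrative sketch rather than a tested configuration:

failover='automatic'
promote_command='/pg13/pg13/bin/repmgr standby promote -f /pg13/pg13/repmgr.conf --log-to-file'
follow_command='/pg13/pg13/bin/repmgr standby follow -f /pg13/pg13/repmgr.conf --log-to-file --upstream-node-id=%n'
monitoring_history=yes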

Register the primary node

[pg13@wcbrepmgr64361 ~]$ repmgr -f /pg13/pg13/repmgr.conf primary register
INFO: connecting to primary database...

NOTICE: attempting to install extension "repmgr"
NOTICE: "repmgr" extension successfully installed
NOTICE: primary node record (ID: 1) registered
[pg13@wcbrepmgr64361 ~]$ 
[pg13@wcbrepmgr64361 ~]$ repmgr -f /pg13/pg13/repmgr.conf cluster show
 ID | Name           | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string                                                        
----+----------------+---------+-----------+----------+----------+----------+----------+---------------------------------------------------------------------------
 1  | wcbrepmgr64361 | primary | * running |          | default  | 100      | 1        | host=172.72.6.61 user=repmgr password=wcb dbname=repmgr connect_timeout=2

Configure a .pgpass password file on every node

[pg13@wcbrepmgr64361 ~]$ pwd
/home/pg13
[pg13@wcbrepmgr64361 ~]$ echo "#ip:port:db:user:pwd" >> ~/.pgpass
[pg13@wcbrepmgr64361 ~]$ echo "172.72.6.61:5432:repmgr:repmgr:wcb" >> ~/.pgpass
[pg13@wcbrepmgr64361 ~]$ echo "172.72.6.62:5432:repmgr:repmgr:wcb" >> ~/.pgpass
[pg13@wcbrepmgr64361 ~]$ echo "172.72.6.63:5432:repmgr:repmgr:wcb" >> ~/.pgpass
[pg13@wcbrepmgr64361 ~]$ echo "172.72.6.64:5432:repmgr:repmgr:wcb" >> ~/.pgpass
[pg13@wcbrepmgr64361 ~]$ chmod 0600 ~/.pgpass
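
The same file is needed on the other three nodes; with SSH trust already in place it can simply be copied from the primary (hostnames as used throughout this setup):

for h in wcbrepmgr64362 wcbrepmgr64363 wcbrepmgr64364; do
  scp -p ~/.pgpass pg13@"$h":~/.pgpass
  ssh pg13@"$h" 'chmod 0600 ~/.pgpass'
done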

Clone standby 1

[pg13@wcbrepmgr64362 ~]$ repmgr -h 172.72.6.61 -U repmgr -d repmgr -f /pg13/pg13/repmgr.conf standby clone --force --dry-run
NOTICE: destination directory "/pg13/pgdata" provided
ERROR: specified data directory "/pg13/pgdata" appears to contain a running PostgreSQL instance
HINT: ensure the target data directory does not contain a running PostgreSQL instance
[pg13@wcbrepmgr64362 ~]$ 
[pg13@wcbrepmgr64362 ~]$ pg_ctl stop
waiting for server to shut down.... done
server stopped
[pg13@wcbrepmgr64362 ~]$ repmgr -h 172.72.6.61 -U repmgr -d repmgr -f /pg13/pg13/repmgr.conf standby clone --force --dry-run
NOTICE: destination directory "/pg13/pgdata" provided
INFO: connecting to source node
DETAIL: connection string is: host=172.72.6.61 user=repmgr dbname=repmgr
DETAIL: current installation size is 29 MB
INFO: "repmgr" extension is installed in database "repmgr"
WARNING: target data directory appears to be a PostgreSQL data directory and will be overwritten
DETAIL: target data directory is "/pg13/pgdata"
INFO: replication slot usage not requested;  no replication slot will be set up for this standby
INFO: parameter "max_wal_senders" set to 10
NOTICE: checking for available walsenders on the source node (2 required)
INFO: sufficient walsenders available on the source node
DETAIL: 2 required, 10 available
NOTICE: checking replication connections can be made to the source server (2 required)
INFO: required number of replication connections could be made to the source server
DETAIL: 2 replication connections required
NOTICE: standby will attach to upstream node 1
HINT: consider using the -c/--fast-checkpoint option
INFO: all prerequisites for "standby clone" are met
[pg13@wcbrepmgr64362 ~]$ repmgr -h 172.72.6.61 -U repmgr -d repmgr -f /pg13/pg13/repmgr.conf standby clone --force 
NOTICE: destination directory "/pg13/pgdata" provided
INFO: connecting to source node
DETAIL: connection string is: host=172.72.6.61 user=repmgr dbname=repmgr
DETAIL: current installation size is 29 MB

INFO: replication slot usage not requested;  no replication slot will be set up for this standby
NOTICE: checking for available walsenders on the source node (2 required)
NOTICE: checking replication connections can be made to the source server (2 required)
WARNING: directory "/pg13/pgdata" exists but is not empty
NOTICE: -F/--force provided - deleting existing data directory "/pg13/pgdata"
NOTICE: starting backup (using pg_basebackup)...
HINT: this may take some time; consider using the -c/--fast-checkpoint option
INFO: executing:
  /pg13/pg13/bin/pg_basebackup -l "repmgr base backup"  -D /pg13/pgdata -h 172.72.6.61 -p 5432 -U repmgr -X stream 
Password: 
WARNING:  skipping special file "./.s.PGSQL.5432"
WARNING:  skipping special file "./.s.PGSQL.5432"
NOTICE: standby clone (using pg_basebackup) complete
NOTICE: you can now start your PostgreSQL server
HINT: for example: pg_ctl -D /pg13/pgdata start
HINT: after starting the server, you need to register this standby with "repmgr standby register"
[pg13@wcbrepmgr64362 ~]$ pg_ctl -D /pg13/pgdata start
waiting for server to start....
2022-01-01 12:34:08.761 CST [3585] LOG:  redirecting log output to logging collector process
2022-01-01 12:34:08.761 CST [3585] HINT:  Future log output will appear in directory "pg_log".
 done
server started
[pg13@wcbrepmgr64362 ~]$ 
[pg13@wcbrepmgr64362 ~]$ psql
psql (13.2)
Type "help" for help.
postgres=# \x
Expanded display is on.
postgres=# select * from pg_stat_wal_receiver;
-[ RECORD 1 ]---------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
pid                   | 3594
status                | streaming
receive_start_lsn     | 0/5000000
receive_start_tli     | 1
written_lsn           | 0/50001F0
flushed_lsn           | 0/50001F0
received_tli          | 1
last_msg_send_time    | 2022-01-01 12:34:08.894686+08
last_msg_receipt_time | 2022-01-01 12:34:08.894884+08
latest_end_lsn        | 0/50001F0
latest_end_time       | 2022-01-01 12:34:08.894686+08
slot_name             | 
sender_host           | 172.72.6.61
sender_port           | 5432
conninfo              | user=repmgr password=******** channel_binding=disable connect_timeout=2 dbname=replication host=172.72.6.61 port=5432 application_name=wcbrepmgr64362 fallback_application_name=walreceiver sslmode=disable sslcompression=0 ssl_min_protocol_version=TLSv1.2 gssencmode=disable krbsrvname=postgres target_session_attrs=any
[pg13@wcbrepmgr64362 repmgr-5.2.1]$  repmgr -f /pg13/pg13/repmgr.conf standby register --force
INFO: connecting to local node "wcbrepmgr64362" (ID: 2)
INFO: connecting to primary database

WARNING: --upstream-node-id not supplied, assuming upstream node is primary (node ID 1)
INFO: standby registration complete
NOTICE: standby node "wcbrepmgr64362" (ID: 2) successfully registered
[pg13@wcbrepmgr64362 ~]$ 
[pg13@wcbrepmgr64362 ~]$ repmgr -f /pg13/pg13/repmgr.conf cluster show

 ID | Name           | Role    | Status    | Upstream       | Location | Priority | Timeline | Connection string                                                        
----+----------------+---------+-----------+----------------+----------+----------+----------+---------------------------------------------------------------------------
 1  | wcbrepmgr64361 | primary | * running |                | default  | 100      | 1        | host=172.72.6.61 user=repmgr password=wcb dbname=repmgr connect_timeout=2
 2  | wcbrepmgr64362 | standby |   running | wcbrepmgr64361 | default  | 100      | 1        | host=172.72.6.62 user=repmgr password=wcb dbname=repmgr connect_timeout=2

Clone standby 2

[pg13@wcbrepmgr64363 ~]$ pg_ctl stop
pg_ctl: PID file "/pg13/pgdata/postmaster.pid" does not exist
Is server running?
[pg13@wcbrepmgr64363 ~]$ repmgr -h 172.72.6.61 -U repmgr -d repmgr -f /pg13/pg13/repmgr.conf standby clone --force --dry-run
NOTICE: destination directory "/pg13/pgdata" provided
INFO: connecting to source node
DETAIL: connection string is: host=172.72.6.61 user=repmgr dbname=repmgr

DETAIL: current installation size is 29 MB
INFO: "repmgr" extension is installed in database "repmgr"
WARNING: target data directory appears to be a PostgreSQL data directory and will be overwritten
DETAIL: target data directory is "/pg13/pgdata"
INFO: replication slot usage not requested;  no replication slot will be set up for this standby
INFO: parameter "max_wal_senders" set to 10
NOTICE: checking for available walsenders on the source node (2 required)
INFO: sufficient walsenders available on the source node
DETAIL: 2 required, 9 available
NOTICE: checking replication connections can be made to the source server (2 required)
INFO: required number of replication connections could be made to the source server
DETAIL: 2 replication connections required
NOTICE: standby will attach to upstream node 1
HINT: consider using the -c/--fast-checkpoint option
INFO: all prerequisites for "standby clone" are met
[pg13@wcbrepmgr64363 ~]$ 
[pg13@wcbrepmgr64363 ~]$ repmgr -h 172.72.6.61 -U repmgr -d repmgr -f /pg13/pg13/repmgr.conf standby clone --force 
NOTICE: destination directory "/pg13/pgdata" provided
INFO: connecting to source node
DETAIL: connection string is: host=172.72.6.61 user=repmgr dbname=repmgr
DETAIL: current installation size is 29 MB

INFO: replication slot usage not requested;  no replication slot will be set up for this standby
NOTICE: checking for available walsenders on the source node (2 required)
NOTICE: checking replication connections can be made to the source server (2 required)
WARNING: directory "/pg13/pgdata" exists but is not empty
NOTICE: -F/--force provided - deleting existing data directory "/pg13/pgdata"
NOTICE: starting backup (using pg_basebackup)...
HINT: this may take some time; consider using the -c/--fast-checkpoint option
INFO: executing:
  /pg13/pg13/bin/pg_basebackup -l "repmgr base backup"  -D /pg13/pgdata -h 172.72.6.61 -p 5432 -U repmgr -X stream 
Password: 
WARNING:  skipping special file "./.s.PGSQL.5432"
WARNING:  skipping special file "./.s.PGSQL.5432"
NOTICE: standby clone (using pg_basebackup) complete
NOTICE: you can now start your PostgreSQL server
HINT: for example: pg_ctl -D /pg13/pgdata start
HINT: after starting the server, you need to register this standby with "repmgr standby register"
[pg13@wcbrepmgr64363 ~]$ pg_ctl -D /pg13/pgdata start
waiting for server to start....
2022-01-01 12:36:23.576 CST [3666] LOG:  redirecting log output to logging collector process
2022-01-01 12:36:23.576 CST [3666] HINT:  Future log output will appear in directory "pg_log".
 done
server started
[pg13@wcbrepmgr64363 ~]$ 
[pg13@wcbrepmgr64363 ~]$ psql
psql (13.2)
Type "help" for help.

postgres=# \x
Expanded display is on.
postgres=# select * from pg_stat_wal_receiver;
-[ RECORD 1 ]---------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
pid                   | 3675
status                | streaming
receive_start_lsn     | 0/7000000
receive_start_tli     | 1
written_lsn           | 0/7000280
flushed_lsn           | 0/7000280
received_tli          | 1
last_msg_send_time    | 2022-01-01 12:36:23.693785+08
last_msg_receipt_time | 2022-01-01 12:36:23.693898+08
latest_end_lsn        | 0/7000280
latest_end_time       | 2022-01-01 12:36:23.693785+08
slot_name             | 
sender_host           | 172.72.6.61
sender_port           | 5432
conninfo              | user=repmgr password=******** channel_binding=disable connect_timeout=2 dbname=replication host=172.72.6.61 port=5432 application_name=wcbrepmgr64363 fallback_application_name=walreceiver sslmode=disable sslcompression=0 ssl_min_protocol_version=TLSv1.2 gssencmode=disable krbsrvname=postgres target_session_attrs=any

postgres=# 
postgres=# \q
[pg13@wcbrepmgr64363 ~]$ repmgr -f /pg13/pg13/repmgr.conf standby register --force
INFO: connecting to local node "wcbrepmgr64363" (ID: 3)
INFO: connecting to primary database

WARNING: --upstream-node-id not supplied, assuming upstream node is primary (node ID 1)
INFO: standby registration complete
NOTICE: standby node "wcbrepmgr64363" (ID: 3) successfully registered
[pg13@wcbrepmgr64363 ~]$ 
[pg13@wcbrepmgr64363 ~]$ repmgr -f /pg13/pg13/repmgr.conf cluster show

 ID | Name           | Role    | Status    | Upstream       | Location | Priority | Timeline | Connection string                                                        
----+----------------+---------+-----------+----------------+----------+----------+----------+---------------------------------------------------------------------------
 1  | wcbrepmgr64361 | primary | * running |                | default  | 100      | 1        | host=172.72.6.61 user=repmgr password=wcb dbname=repmgr connect_timeout=2
 2  | wcbrepmgr64362 | standby |   running | wcbrepmgr64361 | default  | 100      | 1        | host=172.72.6.62 user=repmgr password=wcb dbname=repmgr connect_timeout=2
 3  | wcbrepmgr64363 | standby |   running | wcbrepmgr64361 | default  | 100      | 1        | host=172.72.6.63 user=repmgr password=wcb dbname=repmgr connect_timeout=2
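
At this point both standbys should be streaming from the primary, which can be cross-checked against pg_stat_replication on the primary (a quick sketch, runnable from any node thanks to .pgpass):

psql -h 172.72.6.61 -U repmgr -d repmgr -x -c \
  "select application_name, client_addr, state, sync_state from pg_stat_replication;"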

Configure the witness node

[pg13@wcbrepmgr64364 ~]$ rm -rf /pg13/pgdata

[pg13@wcbrepmgr64364 ~]$ 
[pg13@wcbrepmgr64364 ~]$ /pg13/pg13/bin/initdb -D /pg13/pgdata -E UTF8 --locale=en_US.utf8 -U postgres

The files belonging to this database system will be owned by user "pg13".
This user must also own the server process.

The database cluster will be initialized with locale "en_US.utf8".
The default text search configuration will be set to "english".

Data page checksums are disabled.

creating directory /pg13/pgdata ... ok
creating subdirectories ... ok
selecting dynamic shared memory implementation ... posix
selecting default max_connections ... 100
selecting default shared_buffers ... 128MB
selecting default time zone ... PRC
creating configuration files ... ok
running bootstrap script ... ok
performing post-bootstrap initialization ... ok
syncing data to disk ... ok

initdb: warning: enabling "trust" authentication for local connections
You can change this by editing pg_hba.conf or using the option -A, or
--auth-local and --auth-host, the next time you run initdb.

Success. You can now start the database server using:

    /pg13/pg13/bin/pg_ctl -D /pg13/pgdata -l logfile start

[pg13@wcbrepmgr64364 ~]$ cat >> /pg13/pgdata/postgresql.conf <<"EOF"

listen_addresses = '*'
port=5432
unix_socket_directories='/pg13/pgdata'
logging_collector = on
log_directory = 'pg_log'
log_filename = 'postgresql-%a.log'
log_truncate_on_rotation = on

# archiving parameters
wal_level='replica'
archive_mode='on'
archive_command='test ! -f /pg13/archive/%f && cp %p /pg13/archive/%f'
restore_command='cp /pg13/archive/%f %p'

# streaming replication
hot_standby=on
max_wal_senders=10
wal_sender_timeout=60s
wal_keep_size=16MB

# switchover parameters: enable PostgreSQL replication slots; since PG 12 the "use_replication_slots=true" parameter is no longer needed.
max_replication_slots=10
wal_log_hints=on

# automatic failover
shared_preload_libraries ='repmgr'

EOF

[pg13@wcbrepmgr64364 ~]$ cat >> /pg13/pgdata/pg_hba.conf <<"EOF"

# TYPE  DATABASE    USER    ADDRESS       METHOD
local     all       all                    trust
host      all       all   127.0.0.1/32     trust
host      all       all    0.0.0.0/0        md5
host   replication  all    0.0.0.0/0        md5

local repmgr repmgr md5
host repmgr repmgr 127.0.0.1/32 md5
host repmgr repmgr 172.72.6.0/24 md5

local replication repmgr md5
host replication repmgr 127.0.0.1/32 md5
host replication repmgr 172.72.6.0/24 md5

EOF
[pg13@wcbrepmgr64364 ~]$ /pg13/pg13/bin/pg_ctl -D /pg13/pgdata -l logfile start
waiting for server to start....
 done
server started
[pg13@wcbrepmgr64364 ~]$ createuser -s repmgr
[pg13@wcbrepmgr64364 ~]$ createdb repmgr -O repmgr
[pg13@wcbrepmgr64364 ~]$ psql -h 127.0.0.1 -c "alter user repmgr with password 'wcb';"
ALTER ROLE
[pg13@wcbrepmgr64364 ~]$ psql -h 127.0.0.1 -c "alter user repmgr set search_path to repmgr, \"\$user\",public;"
ALTER ROLE
[pg13@wcbrepmgr64364 ~]$ repmgr -f /pg13/pg13/repmgr.conf witness register -h 172.72.6.61 -U repmgr -d repmgr  --force
INFO: connecting to witness node "wcbrepmgr64364" (ID: 4)

INFO: connecting to primary node
NOTICE: attempting to install extension "repmgr"
NOTICE: "repmgr" extension successfully installed
INFO: witness registration complete
NOTICE: witness node "wcbrepmgr64364" (ID: 4) successfully registered
[pg13@wcbrepmgr64364 ~]$ 
[pg13@wcbrepmgr64364 ~]$ repmgr -f /pg13/pg13/repmgr.conf cluster show

 ID | Name           | Role    | Status    | Upstream       | Location | Priority | Timeline | Connection string                                                        
----+----------------+---------+-----------+----------------+----------+----------+----------+---------------------------------------------------------------------------
 1  | wcbrepmgr64361 | primary | * running |                | default  | 100      | 1        | host=172.72.6.61 user=repmgr password=wcb dbname=repmgr connect_timeout=2
 2  | wcbrepmgr64362 | standby |   running | wcbrepmgr64361 | default  | 100      | 1        | host=172.72.6.62 user=repmgr password=wcb dbname=repmgr connect_timeout=2
 3  | wcbrepmgr64363 | standby |   running | wcbrepmgr64361 | default  | 100      | 1        | host=172.72.6.63 user=repmgr password=wcb dbname=repmgr connect_timeout=2
 4  | wcbrepmgr64364 | witness | * running | wcbrepmgr64361 | default  | 0        | n/a      | host=172.72.6.64 user=repmgr password=wcb dbname=repmgr connect_timeout=2
[pg13@wcbrepmgr64364 ~]$ 

Check the repmgr cluster

[pg13@wcbrepmgr64361 ~]$ repmgr -f /pg13/pg13/repmgr.conf  cluster matrix 
INFO: connecting to database
 Name           | ID | 1 | 2 | 3 | 4
----------------+----+---+---+---+---
 wcbrepmgr64361 | 1  | * | * | * | * 
 wcbrepmgr64362 | 2  | * | * | * | * 
 wcbrepmgr64363 | 3  | * | * | * | * 
 wcbrepmgr64364 | 4  | * | * | * | * 
[pg13@wcbrepmgr64361 ~]$ repmgr -f /pg13/pg13/repmgr.conf  cluster crosscheck
INFO: connecting to database
 Name           | ID | 1 | 2 | 3 | 4
----------------+----+---+---+---+---
 wcbrepmgr64361 | 1  | * | * | * | * 
 wcbrepmgr64362 | 2  | * | * | * | * 
 wcbrepmgr64363 | 3  | * | * | * | * 
 wcbrepmgr64364 | 4  | * | * | * | * 
[pg13@wcbrepmgr64361 ~]$ repmgr -f /pg13/pg13/repmgr.conf node status
Node "wcbrepmgr64361":
        PostgreSQL version: 13.2
        Total data size: 29 MB
        Conninfo: host=172.72.6.61 user=repmgr password=wcb dbname=repmgr connect_timeout=2
        Role: primary
        WAL archiving: enabled
        Archive command: test ! -f /pg13/archive/%f && cp %p /pg13/archive/%f
        WALs pending archiving: 0 pending files
        Replication connections: 2 (of maximal 10)
        Replication slots: 0 physical (of maximal 10; 0 missing)
        Replication lag: n/a

[pg13@wcbrepmgr64361 ~]$ repmgr -f /pg13/pg13/repmgr.conf node check
Node "wcbrepmgr64361":
        Server role: OK (node is primary)
        Replication lag: OK (N/A - node is primary)
        WAL archiving: OK (0 pending archive ready files)
        Upstream connection: OK (N/A - node is primary)
        Downstream servers: OK (2 of 2 downstream nodes attached)
        Replication slots: OK (node has no physical replication slots)
        Missing physical replication slots: OK (node has no missing physical replication slots)
        Configured data directory: OK (configured "data_directory" is "/pg13/pgdata")
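
Everything above, and the switchover in the next section, is manual operation. For automatic failover, repmgrd would additionally have to run on every node, which requires shared_preload_libraries='repmgr' (already set above) plus the failover/promote_command/follow_command parameters sketched in section 3. Roughly, and not exercised in this walkthrough:

# on each node, as pg13
repmgrd -f /pg13/pg13/repmgr.conf

# then confirm the daemons are visible to the cluster
repmgr -f /pg13/pg13/repmgr.conf service status    # "daemon status" in releases before 5.2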

4. Primary/standby switchover

[pg13@wcbrepmgr64363 ~]$ repmgr -f /pg13/pg13/repmgr.conf cluster show
 ID | Name           | Role    | Status    | Upstream       | Location | Priority | Timeline | Connection string                                                        
----+----------------+---------+-----------+----------------+----------+----------+----------+---------------------------------------------------------------------------
 1  | wcbrepmgr64361 | primary | * running |                | default  | 100      | 1        | host=172.72.6.61 user=repmgr password=wcb dbname=repmgr connect_timeout=2
 3  | wcbrepmgr64363 | standby |   running | wcbrepmgr64361 | default  | 100      | 1        | host=172.72.6.63 user=repmgr password=wcb dbname=repmgr connect_timeout=2
 4  | wcbrepmgr64364 | witness | * running | wcbrepmgr64361 | default  | 0        | n/a      | host=172.72.6.64 user=repmgr password=wcb dbname=repmgr connect_timeout=2
[pg13@wcbrepmgr64363 ~]$ repmgr -f /pg13/pg13/repmgr.conf standby switchover --siblings-follow --dry-run --force-rewind
NOTICE: checking switchover on node "wcbrepmgr64363" (ID: 3) in --dry-run mode
INFO: prerequisites for using pg_rewind are met
INFO: SSH connection to host "172.72.6.61" succeeded
INFO: able to execute "repmgr" on remote host "172.72.6.61"
INFO: all sibling nodes are reachable via SSH
INFO: 2 walsenders required, 10 available
INFO: demotion candidate is able to make replication connection to promotion candidate
INFO: 0 pending archive files
INFO: replication lag on this standby is 0 seconds
NOTICE: local node "wcbrepmgr64363" (ID: 3) would be promoted to primary; current primary "wcbrepmgr64361" (ID: 1) would be demoted to standby
INFO: following shutdown command would be run on node "wcbrepmgr64361":
  "/pg13/pg13/bin/pg_ctl  -D '/pg13/pgdata' -W -m fast stop"
INFO: parameter "shutdown_check_timeout" is set to 60 seconds
INFO: prerequisites for executing STANDBY SWITCHOVER are met
[pg13@wcbrepmgr64363 ~]$ repmgr -f /pg13/pg13/repmgr.conf standby switchover --siblings-follow --force-rewind
NOTICE: executing switchover on node "wcbrepmgr64363" (ID: 3)
NOTICE: local node "wcbrepmgr64363" (ID: 3) will be promoted to primary; current primary "wcbrepmgr64361" (ID: 1) will be demoted to standby
NOTICE: stopping current primary node "wcbrepmgr64361" (ID: 1)
NOTICE: issuing CHECKPOINT on node "wcbrepmgr64361" (ID: 1) 
DETAIL: executing server command "/pg13/pg13/bin/pg_ctl  -D '/pg13/pgdata' -W -m fast stop"
INFO: checking for primary shutdown; 1 of 60 attempts ("shutdown_check_timeout")
INFO: checking for primary shutdown; 2 of 60 attempts ("shutdown_check_timeout")
NOTICE: current primary has been cleanly shut down at location 0/F000028
NOTICE: promoting standby to primary
DETAIL: promoting server "wcbrepmgr64363" (ID: 3) using pg_promote()
NOTICE: waiting up to 60 seconds (parameter "promote_check_timeout") for promotion to complete
NOTICE: STANDBY PROMOTE successful
DETAIL: server "wcbrepmgr64363" (ID: 3) was successfully promoted to primary
NOTICE: issuing CHECKPOINT on node "wcbrepmgr64363" (ID: 3) 
ERROR: unable to execute CHECKPOINT
ERROR: connection to database failed
DETAIL: 
fe_sendauth: no password supplied

ERROR: unable to establish a replication connection to the rejoin target node
WARNING: node "wcbrepmgr64361" not found in "pg_stat_replication"
INFO: waiting for node "wcbrepmgr64361" (ID: 1) to connect to new primary; 1 of max 60 attempts (parameter "node_rejoin_timeout")
DETAIL: checking for record in node "wcbrepmgr64363"'s "pg_stat_replication" table where "application_name" is "wcbrepmgr64361"
WARNING: node "wcbrepmgr64361" not found in "pg_stat_replication"
WARNING: node "wcbrepmgr64361" not found in "pg_stat_replication"
WARNING: node "wcbrepmgr64361" not found in "pg_stat_replication"
WARNING: node "wcbrepmgr64361" not found in "pg_stat_replication"
WARNING: node "wcbrepmgr64361" not found in "pg_stat_replication"
[pg13@wcbrepmgr64363 ~]$ repmgr -f /pg13/pg13/repmgr.conf cluster show
 ID | Name           | Role    | Status    | Upstream       | Location | Priority | Timeline | Connection string                                                        
----+----------------+---------+-----------+----------------+----------+----------+----------+---------------------------------------------------------------------------
 1  | wcbrepmgr64361 | primary | - failed  | ?              | default  | 100      |          | host=172.72.6.61 user=repmgr password=wcb dbname=repmgr connect_timeout=2
 3  | wcbrepmgr64363 | primary | * running |                | default  | 100      | 2        | host=172.72.6.63 user=repmgr password=wcb dbname=repmgr connect_timeout=2
 4  | wcbrepmgr64364 | witness | * running | wcbrepmgr64363 | default  | 0        | n/a      | host=172.72.6.64 user=repmgr password=wcb dbname=repmgr connect_timeout=2

WARNING: following issues were detected
  - unable to connect to node "wcbrepmgr64361" (ID: 1)

HINT: execute with --verbose option to see connection error messages

Here the promotion succeeded, but demoting the old primary to a standby failed, so the old primary has to be re-cloned as a standby.

[pg13@wcbrepmgr64361 ~]$ pg_ctl stop
pg_ctl: PID file "/pg13/pgdata/postmaster.pid" does not exist
Is server running?
[pg13@wcbrepmgr64361 ~]$ repmgr -h 172.72.6.63 -U repmgr -d repmgr -f /pg13/pg13/repmgr.conf standby clone --force 
NOTICE: destination directory "/pg13/pgdata" provided
INFO: connecting to source node
DETAIL: connection string is: host=172.72.6.63 user=repmgr dbname=repmgr

DETAIL: current installation size is 29 MB
INFO: replication slot usage not requested;  no replication slot will be set up for this standby
NOTICE: checking for available walsenders on the source node (2 required)
NOTICE: checking replication connections can be made to the source server (2 required)
WARNING: directory "/pg13/pgdata" exists but is not empty
NOTICE: -F/--force provided - deleting existing data directory "/pg13/pgdata"
NOTICE: starting backup (using pg_basebackup)...
HINT: this may take some time; consider using the -c/--fast-checkpoint option
INFO: executing:
  /pg13/pg13/bin/pg_basebackup -l "repmgr base backup"  -D /pg13/pgdata -h 172.72.6.63 -p 5432 -U repmgr -X stream 
Password: 
WARNING:  skipping special file "./.s.PGSQL.5432"
WARNING:  skipping special file "./.s.PGSQL.5432"
NOTICE: standby clone (using pg_basebackup) complete
NOTICE: you can now start your PostgreSQL server
HINT: for example: pg_ctl -D /pg13/pgdata start
HINT: after starting the server, you need to re-register this standby with "repmgr standby register --force" to update the existing node record
[pg13@wcbrepmgr64361 ~]$ pg_ctl -D /pg13/pgdata start
waiting for server to start....
2022-01-01 13:28:51.087 CST [8249] LOG:  redirecting log output to logging collector process
2022-01-01 13:28:51.087 CST [8249] HINT:  Future log output will appear in directory "pg_log".
 done
server started
[pg13@wcbrepmgr64361 ~]$ 
[pg13@wcbrepmgr64361 ~]$ repmgr -f /pg13/pg13/repmgr.conf standby register --force
INFO: connecting to local node "wcbrepmgr64361" (ID: 1)

INFO: connecting to primary database
INFO: standby registration complete
NOTICE: standby node "wcbrepmgr64361" (ID: 1) successfully registered
[pg13@wcbrepmgr64361 ~]$ 
[pg13@wcbrepmgr64361 ~]$ repmgr -f /pg13/pg13/repmgr.conf cluster show

 ID | Name           | Role    | Status    | Upstream       | Location | Priority | Timeline | Connection string                                                        
----+----------------+---------+-----------+----------------+----------+----------+----------+---------------------------------------------------------------------------
 1  | wcbrepmgr64361 | standby |   running | wcbrepmgr64363 | default  | 100      | 2        | host=172.72.6.61 user=repmgr password=wcb dbname=repmgr connect_timeout=2
 3  | wcbrepmgr64363 | primary | * running |                | default  | 100      | 2        | host=172.72.6.63 user=repmgr password=wcb dbname=repmgr connect_timeout=2
 4  | wcbrepmgr64364 | witness | * running | wcbrepmgr64363 | default  | 0        | n/a      | host=172.72.6.64 user=repmgr password=wcb dbname=repmgr connect_timeout=2
[pg13@wcbrepmgr64361 ~]$ 

Manual promotion when automatic failover fails (run on the standby to be promoted):

repmgr -f /pg13/pg13/repmgr.conf --siblings-follow standby promote
repmgr -f /pg13/pg13/repmgr.conf cluster show

5. Other common issues

Unregister a standby node from repmgr

[pg13@wcbrepmgr64361 ~]$  repmgr standby unregister --node-id=2 -f /pg13/pg13/repmgr.conf
INFO: connecting to local standby
INFO: connecting to primary database
NOTICE: unregistering node 2
INFO: standby unregistration complete

A switchover reports WARNING: node "wcbrepmgr64361" not found in "pg_stat_replication":

[pg13@wcbrepmgr64362 ~]$ repmgr -f /pg13/pg13/repmgr.conf standby switchover --siblings-follow --force-rewind
NOTICE: executing switchover on node "wcbrepmgr64362" (ID: 2)

NOTICE: local node "wcbrepmgr64362" (ID: 2) will be promoted to primary; current primary "wcbrepmgr64361" (ID: 1) will be demoted to standby
NOTICE: stopping current primary node "wcbrepmgr64361" (ID: 1)
NOTICE: issuing CHECKPOINT on node "wcbrepmgr64361" (ID: 1) 
DETAIL: executing server command "/pg13/pg13/bin/pg_ctl  -D '/pg13/pgdata' -W -m fast stop"
INFO: checking for primary shutdown; 1 of 60 attempts ("shutdown_check_timeout")
INFO: checking for primary shutdown; 2 of 60 attempts ("shutdown_check_timeout")
NOTICE: current primary has been cleanly shut down at location 0/8000028
NOTICE: promoting standby to primary
DETAIL: promoting server "wcbrepmgr64362" (ID: 2) using pg_promote()
NOTICE: waiting up to 60 seconds (parameter "promote_check_timeout") for promotion to complete
NOTICE: STANDBY PROMOTE successful
DETAIL: server "wcbrepmgr64362" (ID: 2) was successfully promoted to primary
NOTICE: issuing CHECKPOINT on node "wcbrepmgr64362" (ID: 2) 
ERROR: unable to execute CHECKPOINT
ERROR: connection to database failed
DETAIL: 
fe_sendauth: no password supplied

ERROR: unable to establish a replication connection to the rejoin target node
WARNING: node "wcbrepmgr64361" not found in "pg_stat_replication"
INFO: waiting for node "wcbrepmgr64361" (ID: 1) to connect to new primary; 1 of max 60 attempts (parameter "node_rejoin_timeout")
DETAIL: checking for record in node "wcbrepmgr64362"'s "pg_stat_replication" table where "application_name" is "wcbrepmgr64361"
WARNING: node "wcbrepmgr64361" not found in "pg_stat_replication"
WARNING: node "wcbrepmgr64361" not found in "pg_stat_replication"
WARNING: node "wcbrepmgr64361" not found in "pg_stat_replication"
WARNING: node "wcbrepmgr64361" not found in "pg_stat_replication"
WARNING: node "wcbrepmgr64361" not found in "pg_stat_replication"

This happens when the old primary fails to be demoted to a standby during the switchover; re-clone it and re-register it.

[pg13@wcbrepmgr64361 ~]$ pg_ctl stop
pg_ctl: PID file "/pg13/pgdata/postmaster.pid" does not exist
Is server running?
[pg13@wcbrepmgr64361 ~]$ repmgr -h 172.72.6.63 -U repmgr -d repmgr -f /pg13/pg13/repmgr.conf standby clone --force 
NOTICE: destination directory "/pg13/pgdata" provided
INFO: connecting to source node
DETAIL: connection string is: host=172.72.6.63 user=repmgr dbname=repmgr

DETAIL: current installation size is 29 MB
INFO: replication slot usage not requested;  no replication slot will be set up for this standby
NOTICE: checking for available walsenders on the source node (2 required)
NOTICE: checking replication connections can be made to the source server (2 required)
WARNING: directory "/pg13/pgdata" exists but is not empty
NOTICE: -F/--force provided - deleting existing data directory "/pg13/pgdata"
NOTICE: starting backup (using pg_basebackup)...
HINT: this may take some time; consider using the -c/--fast-checkpoint option
INFO: executing:
  /pg13/pg13/bin/pg_basebackup -l "repmgr base backup"  -D /pg13/pgdata -h 172.72.6.63 -p 5432 -U repmgr -X stream 
Password: 
WARNING:  skipping special file "./.s.PGSQL.5432"
WARNING:  skipping special file "./.s.PGSQL.5432"
NOTICE: standby clone (using pg_basebackup) complete
NOTICE: you can now start your PostgreSQL server
HINT: for example: pg_ctl -D /pg13/pgdata start
HINT: after starting the server, you need to re-register this standby with "repmgr standby register --force" to update the existing node record
[pg13@wcbrepmgr64361 ~]$ pg_ctl -D /pg13/pgdata start
waiting for server to start....
2022-01-01 13:28:51.087 CST [8249] LOG:  redirecting log output to logging collector process
2022-01-01 13:28:51.087 CST [8249] HINT:  Future log output will appear in directory "pg_log".
 done
server started
[pg13@wcbrepmgr64361 ~]$ 
[pg13@wcbrepmgr64361 ~]$ repmgr -f /pg13/pg13/repmgr.conf standby register --force
INFO: connecting to local node "wcbrepmgr64361" (ID: 1)

INFO: connecting to primary database
INFO: standby registration complete
NOTICE: standby node "wcbrepmgr64361" (ID: 1) successfully registered
[pg13@wcbrepmgr64361 ~]$ 
[pg13@wcbrepmgr64361 ~]$ repmgr -f /pg13/pg13/repmgr.conf cluster show

 ID | Name           | Role    | Status    | Upstream       | Location | Priority | Timeline | Connection string                                                        
----+----------------+---------+-----------+----------------+----------+----------+----------+---------------------------------------------------------------------------
 1  | wcbrepmgr64361 | standby |   running | wcbrepmgr64363 | default  | 100      | 2        | host=172.72.6.61 user=repmgr password=wcb dbname=repmgr connect_timeout=2
 3  | wcbrepmgr64363 | primary | * running |                | default  | 100      | 2        | host=172.72.6.63 user=repmgr password=wcb dbname=repmgr connect_timeout=2
 4  | wcbrepmgr64364 | witness | * running | wcbrepmgr64363 | default  | 0        | n/a      | host=172.72.6.64 user=repmgr password=wcb dbname=repmgr connect_timeout=2
[pg13@wcbrepmgr64361 ~]$ 

Repairing the old primary with node rejoin fails: this issue has not been resolved here (the current workaround is to re-clone and re-register, as above).

[pg13@wcbrepmgr64361 pg_log]$ repmgr -f /pg13/pg13/repmgr.conf node rejoin -d 'host=172.72.6.63 user=repmgr dbname=repmgr connect_timeout=2' --force-rewind  --verbose
NOTICE: using provided configuration file "/pg13/pg13/repmgr.conf"
ERROR: connection to database failed
DETAIL: 
fe_sendauth: no password supplied

ERROR: unable to establish a replication connection to the rejoin target node
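
One possible cause, not verified here: the conninfo passed with -d carries no password, and the ~/.pgpass entries created earlier only cover the repmgr database, so the replication connection (which uses dbname=replication) has nothing to fall back on. Adding password=wcb to the -d string, or widening ~/.pgpass on every node, might help. A hypothetical sketch of the extra .pgpass entries (the database field must be "replication", or "*", to match replication connections):

172.72.6.61:5432:replication:repmgr:wcb
172.72.6.62:5432:replication:repmgr:wcb
172.72.6.63:5432:replication:repmgr:wcb
172.72.6.64:5432:replication:repmgr:wcb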