PostgreSQL 数据库高可用软件对比之Repmgr & CLup

PostgreSQL 迷徒

已于 2024-08-29 15:56:17 修改

阅读量870

点赞数 26

分类专栏： PostgreSQL 运维工具文章标签：数据库 postgresql

于 2024-08-29 14:47:40 首次发布

本文链接：https://blog.csdn.net/qq_40860316/article/details/141678147

版权

PostgreSQL 运维工具专栏收录该内容

4 篇文章 0 订阅

订阅专栏

测试环境

项目	说明
操作系统	RockyLinux8.8
PostgreSQL	12.16
Repmgr	yum源安装repmgr_12
CLup	开源社区版
PostgreSQL系统用户	pg12
PGDATA目录	/data/pgdata12

主机列表

主机名	IP
PG01	10.16.18.160
PG02	10.16.18.161
PG03	10.16.18.162
ubuntu-KVM	10.16.18.16

先放上两者官方文档地址

Repmgr: https://www.repmgr.org/docs/current/index.html

CLup: CLup5.x产品手册：CLup简介

2. Repmgr 安装部署

repmgr实际上是安装到PostgreSQL软件中的，更像是PostgreSQL的一个插件，但是在bin下多了些可执行文件。下文中我们将使用yum源安装，其会自动将PostgreSQL数据库软件也安装上。如果需要将其放到自己编译的软件目录中，需要迁移相关的文件，方法在下文中会有说明。本文中没有部署witness节点和配置repmgrd服务，所以当前还没有高可用的功能。主要是步骤太过繁琐了，后面找时间再补充上来。

2.1 安装repmgr软件

先安装PostgreSQL的yum源

dnf install -y https://download.postgresql.org/pub/repos/yum/reporpms/EL-8-x86_64/pgdg-redhat-repo-latest.noarch.rpm

dnf -qy module disable postgresql

安装repmgr，此时会新安装PostgreSQL，即使目录已存在（即使配置了PG_CONFIG环境变量也不行）

dnf install repmgr_12

迁移到自己的PostgreSQL软件中

cp /usr/pgsql-12/bin/repmgr* /usr/csupg-12.16/bin/.
cp /usr/pgsql-12/lib/repmgr* /usr/csupg-12.16/lib/.
cp /usr/pgsql-12/share/extension/repmgr* /usr/csupg-12.16/share/extension/xxx

拷贝extension时注意看下自己的软件目录下是否还有子目录postgresql

剩下的两台机器可以用scp发过去。

2.2 初始化主库

在10.16.18.160上初始化一个PostgreSQL数据库，初始化主要步骤如下，这里不再详写了：

创建用户和用户组pg12
添加环境变量
初始化数据库
修改数据库参数
启动数据库

本文中实际上是用CLup创建出来的，哈哈偷个懒，Web页面上点点更方便，如果是新手建议自己手工搭建。

2.3 配置主库

下面的操作在10.16.18.160上执行。

修改repmgr配置文件的属主

chown -R pg12:pg12 /etc/repmgr/12/repmgr.conf

登录数据库用户pg12

su - pg12

登录数据库，在主库中创建一个单独的database对象repmgr和超级用户repmgr:

create user repmgr with superuser password 'repmgr12';
create database repmgr owner repmgr;

添加.pgpass文件

10.16.18.160:5412:repmgr:repmgr:repmgr12
10.16.18.160:5412:replication:repmgr:repmgr12
10.16.18.161:5412:repmgr:repmgr:repmgr12
10.16.18.161:5412:replication:repmgr:repmgr12
10.16.18.162:5412:repmgr:repmgr:repmgr12
10.16.18.162:5412:replication:repmgr:repmgr12

修改文件权限

chmod 600 .pgpass

修改repmgr配置文件/etc/repmgr/12/repmgr.conf

node_id=1
node_name='10.16.18.160'
conninfo='host=10.16.18.160 user=repmgr dbname=repmgr connect_timeout=10'
data_directory='/data/pgdata12'
pg_bindir='/usr/csupg-12.16/bin/'

注册主库：

[pg12@PG01 ~]$ repmgr primary register
INFO: connecting to primary database...
NOTICE: attempting to install extension "repmgr"
NOTICE: "repmgr" extension successfully installed

查看集群状态：

[pg12@PG01 ~]$ repmgr cluster show
 ID | Name | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string                                             
----+------+---------+-----------+----------+----------+----------+----------+--------------------
 1  | PG01 | primary | * running |          | default  | 100      | 1        | host=10.16.18.160 user=repmgr dbname=repmgr connect_timeout=10

主库需要开启wal_log_hints

wal_log_hints = on

重启下数据库

pg_ctl restart

2.4 配置备库

以下操作需要在两台备库(10.16.18.161, 10.16.18.162)上都要执行。

创建用户和组

groupadd -g 712 pg12
useradd -g 712 -u 712 -m pg12

配置环境变量（这里是参照CLup创建数据库时添加的环境变量）

export PATH=/usr/csupg-12.16/bin:$PATH
export LD_LIBRARY_PATH=/usr/csupg-12.16/lib:$LD_LIBRARY_PATH
export PGDATA=/data/pgdata12
export PGHOST='/tmp'
export PGPORT=5412
export LANG=en_US.UTF8

创建目录

mkdir -p /etc/repmgr/12

将10.16.18.160上的配置文件拷贝过来

scp 10.16.18.160:/etc/repmgr/12/repmgr.conf /etc/repmgr/12/.

修改repmgr配置文件的属主

chown -R pg12:pg12 /etc/repmgr/12/repmgr.conf

登录数据库的系统用户

su - pg12

添加.pgpass文件

10.16.18.160:5412:repmgr:repmgr:repmgr12
10.16.18.160:5412:replication:repmgr:repmgr12
10.16.18.161:5412:repmgr:repmgr:repmgr12
10.16.18.161:5412:replication:repmgr:repmgr12
10.16.18.162:5412:repmgr:repmgr:repmgr12
10.16.18.162:5412:replication:repmgr:repmgr12

修改文件权限

chmod 600 .pgpass

修改10.16.18.161机器上的repmgr配置文件

node_id=2
node_name='10.16.18.161'
conninfo='host=10.16.18.161 user=repmgr dbname=repmgr connect_timeout=10'
data_directory='/data/pgdata12'
pg_bindir='/usr/csupg-12.16/bin/'

修改10.16.18.162机器上的repmgr配置文件

node_id=3
node_name='10.16.18.162'
conninfo='host=10.16.18.162 user=repmgr dbname=repmgr connect_timeout=10'
data_directory='/data/pgdata12'
pg_bindir='/usr/csupg-12.16/bin/'

2.5 搭建备库（clone）

以下操作需要在两台备库(10.16.18.161, 10.16.18.162)上都要执行。

测试clone命令，不真实执行，主要是检查下环境是否正常

[pg12@PG02 data]$ repmgr -h 10.16.18.160 -U repmgr -d repmgr standby clone --dry-run
NOTICE: destination directory "/data/pgdata12" provided
INFO: connecting to source node
DETAIL: connection string is: host=10.16.18.160 user=repmgr dbname=repmgr
DETAIL: current installation size is 32 MB
INFO: "repmgr" extension is installed in database "repmgr"
INFO: replication slot usage not requested;  no replication slot will be set up for this standby
INFO: parameter "max_wal_senders" set to 64
NOTICE: checking for available walsenders on the source node (2 required)
INFO: sufficient walsenders available on the source node
DETAIL: 2 required, 64 available
NOTICE: checking replication connections can be made to the source server (2 required)
INFO: required number of replication connections could be made to the source server
DETAIL: 2 replication connections required
NOTICE: standby will attach to upstream node 1
HINT: consider using the -c/--fast-checkpoint option
INFO: would execute:
  /usr/pgsql-12/bin/pg_basebackup -l "repmgr base backup"  -D /data/pgdata12 -h 10.16.18.160 -p 5412 -U repmgr -X stream 
INFO: all prerequisites for "standby clone" are met

出现上面的结果说明可以执行，否则的话按照提示先修复相关问题后再执行

repmgr -h 10.16.18.160 -U repmgr -d repmgr standby clone

[pg12@PG03 ~]$  repmgr -h 10.16.18.160 -U repmgr -d repmgr standby clone
NOTICE: destination directory "/data/pgdata12" provided
INFO: connecting to source node
DETAIL: connection string is: host=10.16.18.160 user=repmgr dbname=repmgr
DETAIL: current installation size is 32 MB
INFO: replication slot usage not requested;  no replication slot will be set up for this standby
NOTICE: checking for available walsenders on the source node (2 required)
NOTICE: checking replication connections can be made to the source server (2 required)
INFO: creating directory "/data/pgdata12"...
NOTICE: starting backup (using pg_basebackup)...
HINT: this may take some time; consider using the -c/--fast-checkpoint option
INFO: executing:
  /usr/csupg-12.16/bin/pg_basebackup -l "repmgr base backup"  -D /data/pgdata12 -h 10.16.18.160 -p 5412 -U repmgr -X stream 
NOTICE: standby clone (using pg_basebackup) complete
NOTICE: you can now start your PostgreSQL server
HINT: for example: pg_ctl -D /data/pgdata12 start
HINT: after starting the server, you need to register this standby with "repmgr standby register"

搭建好后启动备库：

pg_ctl start

注册备库

repmgr standby register --upstream-node-id 1

[pg12@PG03 ~]$ repmgr standby register --upstream-node-id 1
INFO: connecting to local node "PG03" (ID: 3)
INFO: connecting to primary database
INFO: standby registration complete
NOTICE: standby node "PG03" (ID: 3) successfully registered

查看集群状态

[pg12@PG03 ~]$ repmgr cluster show
 ID | Name | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string                                             
----+------+---------+-----------+----------+----------+----------+----------+----------------------------------------------------------------
 1  | PG01 | primary | * running |          | default  | 100      | 1        | host=10.16.18.160 user=repmgr dbname=repmgr connect_timeout=10
 2  | PG02 | standby |   running | PG01     | default  | 100      | 1        | host=10.16.18.161 user=repmgr dbname=repmgr connect_timeout=10
 3  | PG03 | standby |   running | PG01     | default  | 100      | 1        | host=10.16.18.162 user=repmgr dbname=repmgr connect_timeout=10

3. CLup 安装部署

3.1 安装程序

先安装Server端(clup)，这里是安装到ubuntu-KVM一台虚拟机上的，使用一键安装命令，SSH连接上Ubuntu执行

wget -qO /tmp/clup.sh --no-check-certificate https://get.csudata.com/csuinst/clup.sh && bash /tmp/clup.sh openclup install

如果是有多个网卡，需要选择一个IP作为服务的IP。然后安装Agent端时需要输入选择的这个IP。

然后再安装Agent端，在三台数据库主机(10.16.18.160, 10.16.18.161, 10.16.18.162)上执行

wget -qO /tmp/clup.sh --no-check-certificate https://get.csudata.com/csuinst/clup.sh && bash /tmp/clup.sh openclup-agent install

中间需要输入CLup Server所在主机的IP。

安装完成后登录Web页面查看(http://10.16.18.16:8090 用户名：admin，密码：openclup):

3.2 创建集群

创建步骤可以参考我的另一篇文章：新发现：一款非常好用的PostgreSQL数据库管理工具-CSDN博客

主要步骤如下：

首次需要添加一个VIP池
创建集群

添加VIP池

创建集群的步骤参考文章：新发现：一款非常好用的PostgreSQL数据库管理工具-CSDN博客

这里不在赘述，就是Web界面上填写一些主要信息，点击几下就可以将集群搭建起来了。

前面的文章中，是在10.16.18.160和10.16.18.161上创建了一主一备的集群，然后这里再搭建一个备库

4. 一些使用场景的对比

4.1 Repmgr中的数据库切换

假设要把10.16.18.161机器切换成主库。需要先配置用户免密，如何配置用户免密网上文章有很多，这里不再赘述。

repmgr的操作，到10.16.18.161机器上执行下面的命令：

[pg12@PG02 ~]$ repmgr standby switchover
NOTICE: executing switchover on node "10.16.18.161" (ID: 2)
WARNING: 1 sibling nodes found, but option "--siblings-follow" not specified
DETAIL: these nodes will remain attached to the current primary:
  10.16.18.162 (node ID: 3)
NOTICE: attempting to pause repmgrd on 3 nodes
NOTICE: local node "10.16.18.161" (ID: 2) will be promoted to primary; current primary "10.16.18.160" (ID: 1) will be demoted to standby
NOTICE: stopping current primary node "10.16.18.160" (ID: 1)
NOTICE: issuing CHECKPOINT on node "10.16.18.160" (ID: 1) 
DETAIL: executing server command "/usr/csupg-12.16/bin/pg_ctl  -D '/data/pgdata12' -W -m fast stop"
INFO: checking for primary shutdown; 1 of 60 attempts ("shutdown_check_timeout")
INFO: checking for primary shutdown; 2 of 60 attempts ("shutdown_check_timeout")
NOTICE: current primary has been cleanly shut down at location 0/1B000028
NOTICE: promoting standby to primary
DETAIL: promoting server "10.16.18.161" (ID: 2) using pg_promote()
NOTICE: waiting up to 60 seconds (parameter "promote_check_timeout") for promotion to complete
NOTICE: STANDBY PROMOTE successful
DETAIL: server "10.16.18.161" (ID: 2) was successfully promoted to primary
WARNING: node "10.16.18.160" attached in state "startup"
INFO: waiting for node "10.16.18.160" (ID: 1) to connect to new primary; 1 of max 60 attempts (parameter "node_rejoin_timeout")
DETAIL: node "10.16.18.161" (ID: 1) is currently attached to its upstream node in state "startup"
NOTICE: node "10.16.18.161" (ID: 2) promoted to primary, node "10.16.18.160" (ID: 1) demoted to standby
NOTICE: switchover was successful
DETAIL: node "10.16.18.161" is now primary and node "10.16.18.160" is attached as standby
NOTICE: STANDBY SWITCHOVER has completed successfully

查看下当前集群的状态

[pg12@PG02 ~]$ repmgr cluster show
 ID | Name         | Role    | Status    | Upstream     | Location | Priority | Timeline | Connection string                                             
----+--------------+---------+-----------+--------------+----------+----------+----------+----------------------------------------------------------------
 1  | 10.16.18.160 | standby |   running | 10.16.18.161 | default  | 100      | 1        | host=10.16.18.160 user=repmgr dbname=repmgr connect_timeout=10
 2  | 10.16.18.161 | primary | * running |              | default  | 100      | 2        | host=10.16.18.161 user=repmgr dbname=repmgr connect_timeout=10
 3  | 10.16.18.162 | standby |   running | 10.16.18.160 | default  | 100      | 2        | host=10.16.18.162 user=repmgr dbname=repmgr connect_timeout=10

4.2 CLup中的数据库切换

先进入到集群的详情页

然后找到想要切换到的备库，点击切换为主库

点击提交后，这里会有一个是否保留级联关系的确认框，当前我们的备库都在主库上，所以保留和不保留级联关系，结果应该都是一样的。点击确定

切换过程会有日志打印，看到Success说明应该是成功了

返回到集群详情查看

4.2 高可用故障切换

repmgr需要配置witness节点和开启repmgrd服务才能完成故障切换，这里没有配置witness节点，暂时就先不演示了，后面再补。

CLup的故障切换，需要先将集群上线

然后我们将当前主库10.16.18.161主机停掉看下CLup是否会将执行故障切换。先是探测到Agent异常

然后等一会发现会有一个后台任务

点击显示详情里面就是切换的过程了

再看下集群详情，发现主库已经切换到10.16.18.160上去了(这里的时间忽略掉，中午休息去了哈哈)

5. 总结

主要功能对比：

功能	Repmgr	CLup
搭建备库	✅	✅
切换主库	✅	✅
集群状态监控	✅	✅
高可用故障切换	✅	✅
其他运维管理(启停库、查看和修改数据库参数、查看数据库日志等)		✅

安装部署对比：

Repmgr的安装部署相对复杂，安装配置过程相比于CLup要多很多。然后使用clone搭建备库和后续的高可用都需要先配置免密，高可用还需要配置witness节点和repmgrd服务。
CLup的安装相对简单些，使用其一键安装的命令，默认也不需要修改什么配置，安装好POstgreSQL软件后就可以搭建集群了。然后得益于其有Web界面，操作可视化显得更加方便。

还有一点就是repmgr是命令行终端显示，CLup是Web页面可视化，相对而言CLup可能要更好用些，并且一次部署，可以管理很多套集群。不过大家可以都试试，学习嘛，就得多折腾，然后对比下哪个好用用哪个。对了本文中的CLup是开源社区版本，其还有一个商业版本，如果是企业使用的话，最好是用企业版本，毕竟有专人做技术支持更加放心些。