测试环境
项目 | 说明 |
---|---|
操作系统 | RockyLinux8.8 |
PostgreSQL | 12.16 |
Repmgr | yum源安装repmgr_12 |
CLup | 开源社区版 |
PostgreSQL系统用户 | pg12 |
PGDATA目录 | /data/pgdata12 |
主机列表
主机名 | IP |
---|---|
PG01 | 10.16.18.160 |
PG02 | 10.16.18.161 |
PG03 | 10.16.18.162 |
ubuntu-KVM | 10.16.18.16 |
先放上两者官方文档地址
Repmgr: https://www.repmgr.org/docs/current/index.html
CLup: CLup5.x产品手册:CLup简介
2. Repmgr 安装部署
repmgr实际上是安装到PostgreSQL软件中的,更像是PostgreSQL的一个插件,但是在bin
下多了些可执行文件。下文中我们将使用yum源安装,其会自动将PostgreSQL数据库软件也安装上。如果需要将其放到自己编译的软件目录中,需要迁移相关的文件,方法在下文中会有说明。本文中没有部署witness
节点和配置repmgrd
服务,所以当前还没有高可用的功能。主要是步骤太过繁琐了,后面找时间再补充上来。
2.1 安装repmgr软件
先安装PostgreSQL的yum源
dnf install -y https://download.postgresql.org/pub/repos/yum/reporpms/EL-8-x86_64/pgdg-redhat-repo-latest.noarch.rpm dnf -qy module disable postgresql
安装repmgr,此时会新安装PostgreSQL,即使目录已存在(即使配置了PG_CONFIG环境变量也不行)
dnf install repmgr_12
迁移到自己的PostgreSQL软件中
cp /usr/pgsql-12/bin/repmgr* /usr/csupg-12.16/bin/. cp /usr/pgsql-12/lib/repmgr* /usr/csupg-12.16/lib/. cp /usr/pgsql-12/share/extension/repmgr* /usr/csupg-12.16/share/extension/xxx
拷贝extension时注意看下自己的软件目录下是否还有子目录
postgresql
剩下的两台机器可以用scp发过去。
2.2 初始化主库
在10.16.18.160
上初始化一个PostgreSQL数据库,初始化主要步骤如下,这里不再详写了:
-
创建用户和用户组
pg12
-
添加环境变量
-
初始化数据库
-
修改数据库参数
-
启动数据库
本文中实际上是用CLup创建出来的,哈哈偷个懒,Web页面上点点更方便,如果是新手建议自己手工搭建。
2.3 配置主库
下面的操作在10.16.18.160
上执行。
修改repmgr配置文件的属主
chown -R pg12:pg12 /etc/repmgr/12/repmgr.conf
登录数据库用户pg12
su - pg12
登录数据库,在主库中创建一个单独的database对象repmgr和超级用户repmgr:
create user repmgr with superuser password 'repmgr12'; create database repmgr owner repmgr;
添加.pgpass文件
10.16.18.160:5412:repmgr:repmgr:repmgr12 10.16.18.160:5412:replication:repmgr:repmgr12 10.16.18.161:5412:repmgr:repmgr:repmgr12 10.16.18.161:5412:replication:repmgr:repmgr12 10.16.18.162:5412:repmgr:repmgr:repmgr12 10.16.18.162:5412:replication:repmgr:repmgr12
修改文件权限
chmod 600 .pgpass
修改repmgr配置文件/etc/repmgr/12/repmgr.conf
node_id=1 node_name='10.16.18.160' conninfo='host=10.16.18.160 user=repmgr dbname=repmgr connect_timeout=10' data_directory='/data/pgdata12' pg_bindir='/usr/csupg-12.16/bin/'
注册主库:
[pg12@PG01 ~]$ repmgr primary register INFO: connecting to primary database... NOTICE: attempting to install extension "repmgr" NOTICE: "repmgr" extension successfully installed
查看集群状态:
[pg12@PG01 ~]$ repmgr cluster show ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string ----+------+---------+-----------+----------+----------+----------+----------+-------------------- 1 | PG01 | primary | * running | | default | 100 | 1 | host=10.16.18.160 user=repmgr dbname=repmgr connect_timeout=10
主库需要开启wal_log_hints
wal_log_hints = on
重启下数据库
pg_ctl restart
2.4 配置备库
以下操作需要在两台备库(10.16.18.161, 10.16.18.162)上都要执行。
创建用户和组
groupadd -g 712 pg12 useradd -g 712 -u 712 -m pg12
配置环境变量(这里是参照CLup创建数据库时添加的环境变量)
export PATH=/usr/csupg-12.16/bin:$PATH export LD_LIBRARY_PATH=/usr/csupg-12.16/lib:$LD_LIBRARY_PATH export PGDATA=/data/pgdata12 export PGHOST='/tmp' export PGPORT=5412 export LANG=en_US.UTF8
创建目录
mkdir -p /etc/repmgr/12
将10.16.18.160
上的配置文件拷贝过来
scp 10.16.18.160:/etc/repmgr/12/repmgr.conf /etc/repmgr/12/.
修改repmgr配置文件的属主
chown -R pg12:pg12 /etc/repmgr/12/repmgr.conf
登录数据库的系统用户
su - pg12
添加.pgpass文件
10.16.18.160:5412:repmgr:repmgr:repmgr12 10.16.18.160:5412:replication:repmgr:repmgr12 10.16.18.161:5412:repmgr:repmgr:repmgr12 10.16.18.161:5412:replication:repmgr:repmgr12 10.16.18.162:5412:repmgr:repmgr:repmgr12 10.16.18.162:5412:replication:repmgr:repmgr12
修改文件权限
chmod 600 .pgpass
修改10.16.18.161
机器上的repmgr配置文件
node_id=2 node_name='10.16.18.161' conninfo='host=10.16.18.161 user=repmgr dbname=repmgr connect_timeout=10' data_directory='/data/pgdata12' pg_bindir='/usr/csupg-12.16/bin/'
修改10.16.18.162
机器上的repmgr配置文件
node_id=3 node_name='10.16.18.162' conninfo='host=10.16.18.162 user=repmgr dbname=repmgr connect_timeout=10' data_directory='/data/pgdata12' pg_bindir='/usr/csupg-12.16/bin/'
2.5 搭建备库(clone)
以下操作需要在两台备库(10.16.18.161, 10.16.18.162)上都要执行。
测试clone命令,不真实执行,主要是检查下环境是否正常
[pg12@PG02 data]$ repmgr -h 10.16.18.160 -U repmgr -d repmgr standby clone --dry-run NOTICE: destination directory "/data/pgdata12" provided INFO: connecting to source node DETAIL: connection string is: host=10.16.18.160 user=repmgr dbname=repmgr DETAIL: current installation size is 32 MB INFO: "repmgr" extension is installed in database "repmgr" INFO: replication slot usage not requested; no replication slot will be set up for this standby INFO: parameter "max_wal_senders" set to 64 NOTICE: checking for available walsenders on the source node (2 required) INFO: sufficient walsenders available on the source node DETAIL: 2 required, 64 available NOTICE: checking replication connections can be made to the source server (2 required) INFO: required number of replication connections could be made to the source server DETAIL: 2 replication connections required NOTICE: standby will attach to upstream node 1 HINT: consider using the -c/--fast-checkpoint option INFO: would execute: /usr/pgsql-12/bin/pg_basebackup -l "repmgr base backup" -D /data/pgdata12 -h 10.16.18.160 -p 5412 -U repmgr -X stream INFO: all prerequisites for "standby clone" are met
出现上面的结果说明可以执行,否则的话按照提示先修复相关问题后再执行
repmgr -h 10.16.18.160 -U repmgr -d repmgr standby clone [pg12@PG03 ~]$ repmgr -h 10.16.18.160 -U repmgr -d repmgr standby clone NOTICE: destination directory "/data/pgdata12" provided INFO: connecting to source node DETAIL: connection string is: host=10.16.18.160 user=repmgr dbname=repmgr DETAIL: current installation size is 32 MB INFO: replication slot usage not requested; no replication slot will be set up for this standby NOTICE: checking for available walsenders on the source node (2 required) NOTICE: checking replication connections can be made to the source server (2 required) INFO: creating directory "/data/pgdata12"... NOTICE: starting backup (using pg_basebackup)... HINT: this may take some time; consider using the -c/--fast-checkpoint option INFO: executing: /usr/csupg-12.16/bin/pg_basebackup -l "repmgr base backup" -D /data/pgdata12 -h 10.16.18.160 -p 5412 -U repmgr -X stream NOTICE: standby clone (using pg_basebackup) complete NOTICE: you can now start your PostgreSQL server HINT: for example: pg_ctl -D /data/pgdata12 start HINT: after starting the server, you need to register this standby with "repmgr standby register"
搭建好后启动备库:
pg_ctl start
注册备库
repmgr standby register --upstream-node-id 1 [pg12@PG03 ~]$ repmgr standby register --upstream-node-id 1 INFO: connecting to local node "PG03" (ID: 3) INFO: connecting to primary database INFO: standby registration complete NOTICE: standby node "PG03" (ID: 3) successfully registered
查看集群状态
[pg12@PG03 ~]$ repmgr cluster show ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string ----+------+---------+-----------+----------+----------+----------+----------+---------------------------------------------------------------- 1 | PG01 | primary | * running | | default | 100 | 1 | host=10.16.18.160 user=repmgr dbname=repmgr connect_timeout=10 2 | PG02 | standby | running | PG01 | default | 100 | 1 | host=10.16.18.161 user=repmgr dbname=repmgr connect_timeout=10 3 | PG03 | standby | running | PG01 | default | 100 | 1 | host=10.16.18.162 user=repmgr dbname=repmgr connect_timeout=10
3. CLup 安装部署
3.1 安装程序
先安装Server端(clup),这里是安装到ubuntu-KVM
一台虚拟机上的,使用一键安装命令,SSH连接上Ubuntu执行
wget -qO /tmp/clup.sh --no-check-certificate https://get.csudata.com/csuinst/clup.sh && bash /tmp/clup.sh openclup install
如果是有多个网卡,需要选择一个IP作为服务的IP。然后安装Agent端时需要输入选择的这个IP。
然后再安装Agent端,在三台数据库主机(10.16.18.160, 10.16.18.161, 10.16.18.162)上执行
wget -qO /tmp/clup.sh --no-check-certificate https://get.csudata.com/csuinst/clup.sh && bash /tmp/clup.sh openclup-agent install
中间需要输入CLup Server所在主机的IP。
安装完成后登录Web页面查看(http://10.16.18.16:8090 用户名:admin,密码:openclup):
3.2 创建集群
创建步骤可以参考我的另一篇文章:新发现:一款非常好用的PostgreSQL数据库管理工具-CSDN博客
主要步骤如下:
-
首次需要添加一个VIP池
-
创建集群
添加VIP池
创建集群的步骤参考文章:新发现:一款非常好用的PostgreSQL数据库管理工具-CSDN博客
这里不在赘述,就是Web界面上填写一些主要信息,点击几下就可以将集群搭建起来了。
前面的文章中,是在10.16.18.160
和10.16.18.161
上创建了一主一备的集群,然后这里再搭建一个备库
4. 一些使用场景的对比
4.1 Repmgr中的数据库切换
假设要把10.16.18.161
机器切换成主库。需要先配置用户免密,如何配置用户免密网上文章有很多,这里不再赘述。
repmgr的操作,到10.16.18.161
机器上执行下面的命令:
[pg12@PG02 ~]$ repmgr standby switchover NOTICE: executing switchover on node "10.16.18.161" (ID: 2) WARNING: 1 sibling nodes found, but option "--siblings-follow" not specified DETAIL: these nodes will remain attached to the current primary: 10.16.18.162 (node ID: 3) NOTICE: attempting to pause repmgrd on 3 nodes NOTICE: local node "10.16.18.161" (ID: 2) will be promoted to primary; current primary "10.16.18.160" (ID: 1) will be demoted to standby NOTICE: stopping current primary node "10.16.18.160" (ID: 1) NOTICE: issuing CHECKPOINT on node "10.16.18.160" (ID: 1) DETAIL: executing server command "/usr/csupg-12.16/bin/pg_ctl -D '/data/pgdata12' -W -m fast stop" INFO: checking for primary shutdown; 1 of 60 attempts ("shutdown_check_timeout") INFO: checking for primary shutdown; 2 of 60 attempts ("shutdown_check_timeout") NOTICE: current primary has been cleanly shut down at location 0/1B000028 NOTICE: promoting standby to primary DETAIL: promoting server "10.16.18.161" (ID: 2) using pg_promote() NOTICE: waiting up to 60 seconds (parameter "promote_check_timeout") for promotion to complete NOTICE: STANDBY PROMOTE successful DETAIL: server "10.16.18.161" (ID: 2) was successfully promoted to primary WARNING: node "10.16.18.160" attached in state "startup" INFO: waiting for node "10.16.18.160" (ID: 1) to connect to new primary; 1 of max 60 attempts (parameter "node_rejoin_timeout") DETAIL: node "10.16.18.161" (ID: 1) is currently attached to its upstream node in state "startup" NOTICE: node "10.16.18.161" (ID: 2) promoted to primary, node "10.16.18.160" (ID: 1) demoted to standby NOTICE: switchover was successful DETAIL: node "10.16.18.161" is now primary and node "10.16.18.160" is attached as standby NOTICE: STANDBY SWITCHOVER has completed successfully
查看下当前集群的状态
[pg12@PG02 ~]$ repmgr cluster show ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string ----+--------------+---------+-----------+--------------+----------+----------+----------+---------------------------------------------------------------- 1 | 10.16.18.160 | standby | running | 10.16.18.161 | default | 100 | 1 | host=10.16.18.160 user=repmgr dbname=repmgr connect_timeout=10 2 | 10.16.18.161 | primary | * running | | default | 100 | 2 | host=10.16.18.161 user=repmgr dbname=repmgr connect_timeout=10 3 | 10.16.18.162 | standby | running | 10.16.18.160 | default | 100 | 2 | host=10.16.18.162 user=repmgr dbname=repmgr connect_timeout=10
4.2 CLup中的数据库切换
先进入到集群的详情页
然后找到想要切换到的备库,点击切换为主库
点击提交后,这里会有一个是否保留级联关系的确认框,当前我们的备库都在主库上,所以保留和不保留级联关系,结果应该都是一样的。点击确定
切换过程会有日志打印,看到Success
说明应该是成功了
返回到集群详情查看
4.2 高可用故障切换
repmgr需要配置witness
节点和开启repmgrd
服务才能完成故障切换,这里没有配置witness
节点,暂时就先不演示了,后面再补。
CLup的故障切换,需要先将集群上线
然后我们将当前主库10.16.18.161
主机停掉看下CLup是否会将执行故障切换。先是探测到Agent异常
然后等一会发现会有一个后台任务
点击显示详情里面就是切换的过程了
再看下集群详情,发现主库已经切换到10.16.18.160
上去了(这里的时间忽略掉,中午休息去了哈哈)
5. 总结
主要功能对比:
功能 | Repmgr | CLup |
---|---|---|
搭建备库 | ✅ | ✅ |
切换主库 | ✅ | ✅ |
集群状态监控 | ✅ | ✅ |
高可用故障切换 | ✅ | ✅ |
其他运维管理(启停库、查看和修改数据库参数、查看数据库日志等) | ✅ |
安装部署对比:
-
Repmgr的安装部署相对复杂,安装配置过程相比于CLup要多很多。然后使用clone搭建备库和后续的高可用都需要先配置免密,高可用还需要配置
witness
节点和repmgrd
服务。 -
CLup的安装相对简单些,使用其一键安装的命令,默认也不需要修改什么配置,安装好POstgreSQL软件后就可以搭建集群了。然后得益于其有Web界面,操作可视化显得更加方便。
还有一点就是repmgr是命令行终端显示,CLup是Web页面可视化,相对而言CLup可能要更好用些,并且一次部署,可以管理很多套集群。不过大家可以都试试,学习嘛,就得多折腾,然后对比下哪个好用用哪个。对了本文中的CLup是开源社区版本,其还有一个商业版本,如果是企业使用的话,最好是用企业版本,毕竟有专人做技术支持更加放心些。