pg High Availability with repmgr (Part 1)

三思呐三思

Column: PG; tags: repmgr, PostgreSQL high availability

Article link: https://blog.csdn.net/weixin_37692493/article/details/117032458

Contents

I. Getting to know repmgr
II. Installing the PostgreSQL + repmgr high-availability cluster
III. Deploying the PostgreSQL + repmgr high-availability cluster

I. Getting to know repmgr

repmgr is an open-source cluster management tool for PostgreSQL and is very lightweight to use. Its main features are:

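Simple configuration: each deployment step can be done with a single command;
Supports automatic failover and manual switchover;
Manages cluster nodes in a distributed way, so the cluster is easy to scale and nodes can be added or removed online.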

1.1 Basic repmgr concepts

1. repmgr terminology

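replication cluster: a PostgreSQL database cluster built with streaming replication and managed by repmgr
node: a single PostgreSQL database server within the replication cluster
upstream node: the node that a standby server's streaming replication connection points to
failover: the process in which repmgr automatically promotes a standby server to primary when the primary becomes unavailable
switchover: the process of manually promoting a standby server to primary with the repmgr tool
fencing: after a failover, making sure the old primary cannot accidentally rejoin the cluster and cause a "split brain"
witness server: when a failover occurs in a setup with several standbys, the witness server helps the standbys decide on a new primary. It provides evidence of whether the primary itself has failed, as opposed to the standbys merely being cut off from it by a network split.
primary node: the node in streaming replication that serves application reads and writes
standby node: a node that replicates all of its data from the primary node and only allows read-only queries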

2. repmgr components and their functions

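repmgr: a command-line tool for administrative tasks, such as setting up server roles, performing switchovers and viewing the status of the replication cluster
repmgrd: a daemon that monitors and records replication cluster information. It detects replication failures, selects the best server and promotes it to primary, and can run user-defined scripts, for example to send e-mail alerts.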

1.2 Basic PostgreSQL + repmgr architecture

[Figure: basic PostgreSQL + repmgr architecture (original image unavailable)]

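repmgr manages cluster nodes in a distributed way. Each node has its own repmgr.conf configuration file, which records parameters such as the node ID, node name, connection information and the database PGDATA directory. Once these parameters are configured, the repmgr command can deploy each cluster node in "one step".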

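After deployment, each node runs its own repmgrd daemon to monitor its database. Each node also keeps its own copy of the metadata tables that record information about all cluster nodes. The repmgrd daemon on the primary mainly monitors the local database service. The daemon on each standby monitors both the primary and its local database service. During an automatic failover, a standby first tries to reconnect to the primary N times (N is set by reconnect_attempts). If all attempts fail, repmgrd elects a candidate from among the standbys (see the Tips below for the election rules) and promotes it to the new primary. The remaining standbys then follow the new primary, which gives the cluster its new topology.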
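The retry-then-elect behaviour described above can be sketched as a small loop. This is a minimal Python simulation, not repmgrd's actual code. `primary_is_lost` and the `try_connect` probe are hypothetical names. The defaults reuse the reconnect_attempts=3 and reconnect_interval=5 values from the repmgr.conf used later in this article.

```python
import time
from typing import Callable


def primary_is_lost(
    try_connect: Callable[[], bool],
    reconnect_attempts: int = 3,
    reconnect_interval: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Simulate a standby retrying its connection to the primary.

    try_connect is a hypothetical probe returning True when the primary
    answers. Returns True only if every attempt fails, i.e. the point at
    which repmgrd would move on to electing a new primary.
    """
    for attempt in range(1, reconnect_attempts + 1):
        if try_connect():
            return False  # primary reachable again: no failover
        if attempt < reconnect_attempts:
            sleep(reconnect_interval)  # wait before retrying
    return True  # all attempts failed: start the election


if __name__ == "__main__":
    # A primary that never answers; skip real sleeping for the demo.
    print(primary_is_lost(lambda: False, sleep=lambda s: None))  # True
```

Injecting `sleep` keeps the sketch testable. With the defaults, the simulated standby waits about (3 - 1) × 5 = 10 seconds between its first and last attempt.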

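Tips:

repmgr elects the candidate standby using the following order: LSN, priority, node ID.
    The standby with the largest (most advanced) LSN is chosen first;
    if LSNs are equal, the priority values are compared (priority is set in each node's configuration file, and the higher value wins);
    if priorities are also equal, the node IDs are compared, and the smaller ID wins.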
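The election order in the Tips can be expressed as a single sort key. This is a minimal Python sketch that assumes each standby's last received LSN is already known; repmgrd gathers this information itself. The `Standby` class and `pick_candidate` function are illustrative names only. The sketch also honours priority=0, which in repmgr means a node is never promoted.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Standby:
    node_id: int
    lsn: str       # last received LSN, e.g. "0/3000060"
    priority: int  # "priority" from repmgr.conf; 0 = never promote


def lsn_to_int(lsn: str) -> int:
    """Convert a PostgreSQL LSN ("hi/lo" in hex) to a comparable integer."""
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def pick_candidate(standbys: List[Standby]) -> Optional[Standby]:
    """Highest LSN wins, then highest priority, then lowest node_id."""
    eligible = [s for s in standbys if s.priority > 0]
    if not eligible:
        return None
    return min(eligible, key=lambda s: (-lsn_to_int(s.lsn), -s.priority, s.node_id))


if __name__ == "__main__":
    nodes = [
        Standby(4, "0/2FFFFFF", 200),  # highest priority but lagging behind
        Standby(3, "0/3000060", 100),
        Standby(2, "0/3000060", 100),  # same LSN/priority as node 3, lower ID
    ]
    print(pick_candidate(nodes).node_id)  # 2
```

LSNs are converted to integers because comparing the "hi/lo" strings directly would not order them correctly.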

1.3 Common repmgr commands

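Command	Function
repmgr primary register	Register the local server as the primary node
repmgr primary unregister	Unregister an inactive primary node
repmgr standby clone	Clone data from the primary to a standby node
repmgr standby register	Register the local server as a standby node
repmgr standby unregister	Unregister a standby node
repmgr standby promote	Promote a standby node to primary
repmgr standby follow	In a one-primary, multi-standby setup, point the remaining standbys at the new primary
repmgr standby switchover	Promote the specified standby to primary and demote the current primary to standby
repmgr witness register	Register the specified node as a witness node
repmgr witness unregister	Unregister a witness node
repmgr node status	Show basic information and replication status of a node
repmgr node check	Run health checks on a node of the HA cluster
repmgr node rejoin	Rejoin a failed node to the cluster
repmgr cluster show	Show basic information and status of the registered nodes
repmgr cluster matrix	Show the connectivity matrix of all nodes in the cluster
repmgr cluster crosscheck	Cross-check connectivity between every pair of nodes
repmgr cluster event	Show cluster event records
repmgr cluster cleanup	Purge monitoring history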

II. Installing the PostgreSQL + repmgr high-availability cluster

2.1 Deployment plan

1. Server roles

Hostname	Host IP	Role
172-16-104-7	172.16.104.7	Primary server
172-16-104-56	172.16.104.56	Standby server
172-16-104-57	172.16.104.57	Witness server

2. Service deployment

Hostname	Host IP	PostgreSQL	repmgr
172-16-104-7	172.16.104.7	PostgreSQL-11.11	repmgr 5.2.1
172-16-104-56	172.16.104.56	PostgreSQL-11.11	repmgr 5.2.1
172-16-104-57	172.16.104.57	PostgreSQL-11.11	repmgr 5.2.1

2.2 Basic OS environment setup

1. /etc/hosts configuration

# cat /etc/hosts
172.16.104.55 172-16-104-55
172.16.104.56 172-16-104-56
172.16.104.57 172-16-104-57
172.16.104.7 172-16-104-7

2. Passwordless SSH between the HA cluster servers

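Set up passwordless SSH between all servers in the replication cluster and install rsync. I did this for both the root and postgres users. Perform the following steps on every server in the HA cluster: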

-- passwordless login for the root user
# ssh-keygen
# ssh-copy-id ${other_server}
# ssh ${other_server}

-- passwordless login for the postgres user
# su - postgres
$ ssh-keygen
$ ssh-copy-id ${other_server}
$ ssh ${other_server}

2.3 Installing the PostgreSQL database

When building a brand-new PostgreSQL + repmgr HA cluster, you only need to initialize PostgreSQL on the primary server. The standby databases can be created and deployed with the repmgr tool.

1. Deploying a single PostgreSQL instance

See: https://blog.csdn.net/weixin_37692493/article/details/108500373

2. Required configuration changes

1) postgresql.conf

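$ vim postgresql.conf
max_wal_senders = 10           # max concurrent WAL senders (standby clone alone needs 2)
max_replication_slots = 10     # needed when use_replication_slots=true in repmgr.conf
wal_level = 'hot_standby'      # accepted as an alias for 'replica' since PostgreSQL 9.6
hot_standby = on               # allow read-only queries on standbys
archive_mode = on              # changing this later requires a restart
archive_command = '/bin/true'  # placeholder that archives nothing; changeable with a reload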
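Before moving on, you can check the primary's settings programmatically. This is a minimal sketch that assumes the file only uses the `name = value` form. postgresql.conf also allows `name value` lines and `include` directives, which this naive parser ignores. The thresholds in `CHECKS` are my own reading of the settings above, not values mandated by repmgr; max_replication_slots only matters if replication slots are used. On a running server, `SHOW wal_level;` and similar queries in psql return the effective values instead.

```python
from typing import Callable, Dict, List

# Minimum values assumed for a repmgr-managed primary (see lead-in).
CHECKS: Dict[str, Callable[[str], bool]] = {
    "wal_level": lambda v: v in ("replica", "hot_standby", "logical"),
    "max_wal_senders": lambda v: v.isdigit() and int(v) >= 2,
    "max_replication_slots": lambda v: v.isdigit() and int(v) >= 1,
    "hot_standby": lambda v: v.lower() in ("on", "true", "yes", "1"),
    "archive_mode": lambda v: v.lower() in ("on", "always"),
}


def parse_conf(text: str) -> Dict[str, str]:
    """Naive postgresql.conf parser: only 'name = value' lines are read."""
    settings: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments (ignores '#' inside quotes)
        if "=" not in line:
            continue
        name, value = line.split("=", 1)
        settings[name.strip()] = value.strip().strip("'")
    return settings


def failed_checks(text: str) -> List[str]:
    """Names of settings that are missing or below the assumed minimum."""
    settings = parse_conf(text)
    return [name for name, ok in CHECKS.items()
            if name not in settings or not ok(settings[name])]
```

For example, running `failed_checks` on the six settings above returns an empty list. Changing wal_level to 'minimal' returns `["wal_level"]`.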

2) pg_hba.conf

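repmgr is the database user the repmgr tool uses in this setup. It is not created automatically; it is created later in this article with createuser.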

$ vim pg_hba.conf
local   replication   repmgr                              trust
host    replication   repmgr      127.0.0.1/32            trust
host    replication   repmgr      0.0.0.0/0               trust

local   repmgr        repmgr                              trust
host    repmgr        repmgr      127.0.0.1/32            trust
host    repmgr        repmgr      0.0.0.0/0               trust
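Note that `trust` combined with `0.0.0.0/0` lets any host that can reach port 5432 connect as repmgr without a password. A tighter variant lists only the cluster nodes' addresses. The sketch below generates such lines. `repmgr_hba_lines` is a hypothetical helper, and the IPs come from the deployment plan in section 2.1. You can pass `md5` or `scram-sha-256` as the method, in which case the repmgr user needs a password (for example via ~/.pgpass).

```python
from typing import List


def repmgr_hba_lines(node_ips: List[str], method: str = "trust") -> List[str]:
    """Build pg_hba.conf entries for the repmgr user, one /32 per cluster node."""
    lines = [
        f"local   replication   repmgr                              {method}",
        f"local   repmgr        repmgr                              {method}",
    ]
    for ip in node_ips:
        for database in ("replication", "repmgr"):
            lines.append(f"host    {database:<13} repmgr      {ip + '/32':<23} {method}")
    return lines


if __name__ == "__main__":
    plan = ["172.16.104.7", "172.16.104.56", "172.16.104.57"]
    print("\n".join(repmgr_hba_lines(plan, method="scram-sha-256")))
```

The column padding mirrors the layout above, so the output can be pasted into pg_hba.conf directly. Reload PostgreSQL afterwards for the new entries to take effect.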

2.4 Installing and deploying repmgr

1. Create the database user that repmgr uses

# su - postgres
$ createuser -s repmgr
$ createdb repmgr -O repmgr

2. Install the software package

Install a repmgr version that is compatible with your PostgreSQL version. The version compatibility matrix is shown below:

[Figure: repmgr / PostgreSQL version compatibility matrix (original image unavailable)]

# yum install flex
# wget -c https://repmgr.org/download/repmgr-5.2.1.tar.gz
# tar xf repmgr-5.2.1.tar.gz -C /usr/local
# cd /usr/local/repmgr-5.2.1
# ./configure && make install
# mkdir -pv /pg_data/pgsql11/repmgr/
# chown -R postgres:postgres /pg_data/pgsql11/repmgr/

3. repmgr configuration file

$ cat repmgr.conf 
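# Basic node information (node_id, node_name and conninfo differ on every node)
node_id=1                                       # node ID; unique identifier of each node in the HA cluster
node_name='172-16-104-7'                        # node name; unique name of each node in the HA cluster
conninfo='host=172-16-104-7 user=repmgr dbname=repmgr connect_timeout=2'    # connection info for the local database
data_directory='/pg_data/pgsql11/data'          # PostgreSQL data directory
replication_user='repmgr'                       # streaming replication user (defaults to the user in conninfo)
repmgr_bindir='/usr/local/pgsql11/bin'          # directory containing the repmgr binaries
pg_bindir='/usr/local/pgsql11/bin'              # directory containing the PostgreSQL binaries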
#shutdown_check_timeout=10      

# PostgreSQL and repmgrd service management commands (the postgres user needs passwordless sudo for these)
service_start_command='sudo systemctl start postgres11'
service_stop_command='sudo systemctl stop postgres11'
service_restart_command='sudo systemctl restart postgres11'
service_reload_command='sudo systemctl reload postgres11'
repmgrd_service_start_command='sudo systemctl start repmgr11'
repmgrd_service_stop_command='sudo systemctl stop repmgr11'

# Logging
log_level=INFO
log_file='/pg_data/pgsql11/repmgr/repmgrd.log'
log_status_interval=10

# Failover settings
failover='automatic'
promote_command='/usr/local/pgsql11/bin/repmgr standby promote -f /pg_data/pgsql11/repmgr.conf --log-to-file'
follow_command='/usr/local/pgsql11/bin/repmgr standby follow -f /pg_data/pgsql11/repmgr.conf --log-to-file --upstream-node-id=%n'


# HA and monitoring parameters
location='location1'
priority=100
monitoring_history=yes
reconnect_interval=5
reconnect_attempts=3
monitor_interval_secs=5
use_replication_slots=true
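Only the first block of this file has to differ between nodes: node_id, node_name and conninfo must be unique per node, while the remaining settings can usually be shared. The minimal Python sketch below renders the per-node part from the hostnames in section 2.1. The node_id assignment (1, 2, 3) and the `render_node_confs` helper are assumptions for illustration. The shared settings would be appended unchanged after each rendered block.

```python
from typing import Dict

# Per-node part of repmgr.conf; everything else is assumed to be shared.
NODE_TEMPLATE = (
    "node_id={node_id}\n"
    "node_name='{name}'\n"
    "conninfo='host={name} user=repmgr dbname=repmgr connect_timeout=2'\n"
    "data_directory='/pg_data/pgsql11/data'\n"
)


def render_node_confs(nodes: Dict[int, str]) -> Dict[str, str]:
    """Map {node_id: hostname} to {hostname: per-node repmgr.conf header}."""
    names = list(nodes.values())
    if len(set(names)) != len(names):
        raise ValueError("node names must be unique")
    return {name: NODE_TEMPLATE.format(node_id=node_id, name=name)
            for node_id, name in nodes.items()}


if __name__ == "__main__":
    confs = render_node_confs({1: "172-16-104-7", 2: "172-16-104-56", 3: "172-16-104-57"})
    print(confs["172-16-104-56"])
```

Keying the input by node_id guarantees unique IDs by construction. The explicit check covers the remaining risk of reusing a node name.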

4. systemd unit for repmgrd (its name must match repmgrd_service_start_command above, i.e. repmgr11.service)

[Unit]
Description=A replication manager, and failover management tool for PostgreSQL
After=syslog.target
After=network.target
After=postgres11.service

[Service]
Type=forking

User=postgres
Group=postgres

# PID file
PIDFile=/pg_data/pgsql11/repmgr/repmgrd-11.pid

# Location of repmgr conf file:
Environment=REPMGRDCONF=/pg_data/pgsql11/repmgr.conf
Environment=PIDFILE=/pg_data/pgsql11/repmgr/repmgrd-11.pid

# Where to send early-startup messages from the server
# This is normally controlled by the global default set by systemd
# StandardOutput=syslog
ExecStart=/usr/local/pgsql11/bin/repmgrd -f ${REPMGRDCONF} -p ${PIDFILE} -d --verbose
ExecStop=/usr/bin/kill -TERM $MAINPID
ExecReload=/usr/bin/kill -HUP $MAINPID

# Give a reasonable amount of time for the server to start up/shut down
TimeoutSec=300

[Install]
WantedBy=multi-user.target

III. Deploying the PostgreSQL + repmgr high-availability cluster

Prerequisites:

repmgr has been installed on every server in the HA cluster;
PostgreSQL has been installed, initialized and started on the primary server.

3.1 Registering the primary node with repmgr

1. Register the local server as the primary node

[postgres@172-16-104-7 pgsql11]$ repmgr -f /pg_data/pgsql11/repmgr.conf primary register
INFO: connecting to primary database...
NOTICE: attempting to install extension "repmgr"
NOTICE: "repmgr" extension successfully installed
NOTICE: primary node record (ID: 1) registered

2. View cluster information

The output shows that "172-16-104-7" has been added to the cluster as the primary node.

[postgres@172-16-104-7 pgsql11]$ repmgr -f /pg_data/pgsql11/repmgr.conf cluster show
 ID | Name         | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string                                            
----+--------------+---------+-----------+----------+----------+----------+----------+---------------------------------------------------------------
 1  | 172-16-104-7 | primary | * running |          | default  | 100      | 1        | host=172-16-104-7 user=repmgr dbname=repmgr connect_timeout=2
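When scripting health checks, you can parse the `cluster show` table by splitting on `|`. This is a minimal sketch that assumes the layout shown above: a header row, a `----+----` separator, then one row per node, with no field containing `|`. `parse_cluster_show` is a hypothetical helper.

```python
from typing import Dict, List


def parse_cluster_show(output: str) -> List[Dict[str, str]]:
    """Turn `repmgr cluster show` text output into one dict per node."""
    lines = [line for line in output.splitlines() if line.strip()]
    header = [cell.strip() for cell in lines[0].split("|")]
    rows = []
    for line in lines[2:]:  # skip the header and the ----+---- separator
        cells = [cell.strip() for cell in line.split("|")]
        rows.append(dict(zip(header, cells)))
    return rows


SAMPLE = """\
 ID | Name         | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string
----+--------------+---------+-----------+----------+----------+----------+----------+---------------------------------------------------------------
 1  | 172-16-104-7 | primary | * running |          | default  | 100      | 1        | host=172-16-104-7 user=repmgr dbname=repmgr connect_timeout=2
"""

if __name__ == "__main__":
    for node in parse_cluster_show(SAMPLE):
        print(node["Name"], node["Role"], node["Status"])
```

Cells are stripped, so an empty Upstream column comes back as an empty string rather than whitespace.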

3. View basic cluster information in the database

repmgr=# \x 1
Expanded display is on.
repmgr=# SELECT * FROM repmgr.nodes;
-[ RECORD 1 ]----+--------------------------------------------------------------
node_id          | 1
upstream_node_id | 
active           | t
node_name        | 172-16-104-7
type             | primary
location         | default
priority         | 100
conninfo         | host=172-16-104-7 user=repmgr dbname=repmgr connect_timeout=2
repluser         | repmgr
slot_name        | 
config_file      | /pg_data/pgsql11/repmgr.conf

3.2 Cloning the standby node

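The standby's PostgreSQL database does not need to be initialized before registration; repmgr can deploy it in "one step". Before cloning the standby, we can test connectivity to the primary database with the following command: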

$ psql 'host=172.16.104.7 user=repmgr dbname=repmgr connect_timeout=2'
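If psql hangs or fails, a quick TCP-level probe helps separate network or firewall problems from authentication (pg_hba.conf) problems. This is a minimal Python sketch that assumes the default port 5432 and reuses the 2-second connect_timeout from the conninfo. `port_open` is a hypothetical helper. It only proves that the port accepts TCP connections, not that a login will succeed.

```python
import socket


def port_open(host: str, port: int = 5432, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, DNS failure or timeout
        return False
```

For example, on the standby `port_open("172.16.104.7")` should return True before you run the clone. If it does but psql still fails, look at pg_hba.conf and the repmgr user rather than the network.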

1. Command syntax

-- --dry-run only tests the command without executing it; use it to catch basic errors
$ repmgr -h ${primary_ip} -U repmgr -d repmgr -f /pg_data/pgsql11/repmgr.conf standby clone --dry-run

-- actually perform the clone
$ repmgr -h ${primary_ip} -U repmgr -d repmgr -f /pg_data/pgsql11/repmgr.conf standby clone

2. Running the deployment

1) Dry run

[root@172-16-104-56 pgsql11]# su - postgres
Last login: Tue May 11 16:42:41 CST 2021 on pts/1
[postgres@172-16-104-55 pgsql11]$ repmgr -h 172-16-104-7 -U repmgr -d repmgr -f /pg_data/pgsql11/repmgr.conf standby clone --dry-run
NOTICE: destination directory "/pg_data/pgsql11/data" provided
INFO: connecting to source node
DETAIL: connection string is: host=172-16-104-7 user=repmgr dbname=repmgr
DETAIL: current installation size is 64 MB
INFO: "repmgr" extension is installed in database "repmgr"
INFO: replication slot usage not requested;  no replication slot will be set up for this standby
INFO: parameter "max_wal_senders" set to 10
NOTICE: checking for available walsenders on the source node (2 required)
INFO: sufficient walsenders available on the source node
DETAIL: 2 required, 10 available
NOTICE: checking replication connections can be made to the source server (2 required)
INFO: required number of replication connections could be made to the source server
DETAIL: 2 replication connections required
WARNING: data checksums are not enabled and "wal_log_hints" is "off"
DETAIL: pg_rewind requires "wal_log_hints" to be enabled
NOTICE: standby will attach to upstream node 1
HINT: consider using the -c/--fast-checkpoint option
INFO: all prerequisites for "standby clone" are met
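In automation, it is useful to run the real clone only if the dry run passes. This minimal sketch scans the dry-run text for the "all prerequisites ... are met" line shown above and collects WARNING lines. Keying on message text is an assumption that may break across repmgr versions, so check the command's exit status as well. `dry_run_verdict` is a hypothetical helper.

```python
from typing import List, Tuple

READY_MARKER = 'all prerequisites for "standby clone" are met'


def dry_run_verdict(output: str) -> Tuple[bool, List[str]]:
    """Return (ok_to_clone, warnings) for `repmgr standby clone --dry-run` text."""
    ready, errors, warnings = False, [], []
    for line in (raw.strip() for raw in output.splitlines()):
        if line.startswith("ERROR:"):
            errors.append(line)
        elif line.startswith("WARNING:"):
            warnings.append(line)
        elif READY_MARKER in line:
            ready = True
    return ready and not errors, warnings
```

Warnings are returned rather than treated as fatal. The wal_log_hints warning above does not block the clone; it only matters if you later want to use pg_rewind.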

2) Clone the standby's data

[postgres@172-16-104-56 pgsql11]$ repmgr -h 172-16-104-7 -U repmgr -d repmgr -f /pg_data/pgsql11/repmgr.conf standby clone
NOTICE: destination directory "/pg_data/pgsql11/data" provided
INFO: connecting to source node
DETAIL: connection string is: host=172-16-104-7 user=repmgr dbname=repmgr
DETAIL: current installation size is 64 MB
INFO: replication slot usage not requested;  no replication slot will be set up for this standby
NOTICE: checking for available walsenders on the source node (2 required)
NOTICE: checking replication connections can be made to the source server (2 required)
WARNING: data checksums are not enabled and "wal_log_hints" is "off"
DETAIL: pg_rewind requires "wal_log_hints" to be enabled
INFO: checking and correcting permissions on existing directory "/pg_data/pgsql11/data"
NOTICE: starting backup (using pg_basebackup)...
HINT: this may take some time; consider using the -c/--fast-checkpoint option
INFO: executing:
  pg_basebackup -l "repmgr base backup"  -D /pg_data/pgsql11/data -h 172-16-104-7 -p 5432 -U repmgr -X stream 
NOTICE: standby clone (using pg_basebackup) complete
NOTICE: you can now start your PostgreSQL server
HINT: for example: pg_ctl -D /pg_data/pgsql11/data start
HINT: after starting the server, you need to register this standby with "repmgr standby register"

3) Start the standby database

[postgres@172-16-104-56 data]$  pg_ctl -D /pg_data/pgsql11/data start
waiting for server to start....2021-05-11 15:54:29.230 CST [10031] LOG:  listening on IPv4 address "0.0.0.0", port 5432
2021-05-11 15:54:29.230 CST [10031] LOG:  listening on IPv6 address "::", port 5432
2021-05-11 15:54:29.234 CST [10031] LOG:  listening on Unix socket "/tmp/.s.PGSQL.5432"
2021-05-11 15:54:29.267 CST [10031] LOG:  redirecting log output to logging collector process
2021-05-11 15:54:29.267 CST [10031] HINT:  Future log output will appear in directory "/pg_data/pgsql11/logs".
 done
server started

3.3 Registering the local server as a standby

1. Register the server hosting the standby database as a standby node of the cluster.

[postgres@172-16-104-56 data]$ repmgr -f /pg_data/pgsql11/repmgr.conf standby register
INFO: connecting to local node "172-16-104-56" (ID: 3)
INFO: connecting to primary database
WARNING: --upstream-node-id not supplied, assuming upstream node is primary (node ID 1)
INFO: standby registration complete
NOTICE: standby node "172-16-104-56" (ID: 3) successfully registered

2. View the current cluster information

[postgres@172-16-104-56 data]$ repmgr -f /pg_data/pgsql11/repmgr.conf cluster show
 ID | Name          | Role    | Status    | Upstream     | Location   | Priority | Timeline | Connection string                                             
----+---------------+---------+-----------+--------------+------------+----------+----------+----------------------------------------------------------------
 1  | 172-16-104-7  | primary | * running |              | location1  | 100      | 1        | host=172-16-104-7 user=repmgr dbname=repmgr connect_timeout=2 
 2  | 172-16-104-56 | standby |   running | 172-16-104-7 | location1  | 100      | 1        | host=172-16-104-56 user=repmgr dbname=repmgr connect_timeout=2

At this point our pg + repmgr HA cluster is about half built, and the nodes can already perform a manual switchover.
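A quick post-registration sanity check verifies three things: there is exactly one primary, every node is running, and every standby's Upstream points at that primary. This minimal sketch operates on rows keyed by the `cluster show` column names, such as those produced by splitting that table on `|`. `topology_problems` is a hypothetical helper. Treating any status other than `running` or `* running` as a problem simplifies repmgr's status markers.

```python
from typing import Dict, List


def topology_problems(rows: List[Dict[str, str]]) -> List[str]:
    """Return human-readable problems found in `cluster show` rows."""
    problems: List[str] = []
    primaries = [r["Name"] for r in rows if r["Role"] == "primary"]
    if len(primaries) != 1:
        problems.append(f"expected 1 primary, found {len(primaries)}")
    for r in rows:
        if r["Status"] not in ("running", "* running"):
            problems.append(f"{r['Name']}: status is {r['Status']!r}")
        if r["Role"] == "standby" and primaries and r["Upstream"] != primaries[0]:
            problems.append(f"{r['Name']}: upstream is {r['Upstream']!r}, not {primaries[0]!r}")
    return problems


if __name__ == "__main__":
    cluster = [
        {"Name": "172-16-104-7", "Role": "primary", "Status": "* running", "Upstream": ""},
        {"Name": "172-16-104-56", "Role": "standby", "Status": "running", "Upstream": "172-16-104-7"},
    ]
    print(topology_problems(cluster) or "cluster looks healthy")
```

With the two rows from the output above, the check finds no problems.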
