centos虚拟机基于postgresql15的repmgr主从自动故障切换配置

最新推荐文章于 2024-09-14 08:59:27 发布

你的良心真的真的不会痛吗

最新推荐文章于 2024-09-14 08:59:27 发布

阅读量339

点赞数 5

分类专栏：服务器配置文章标签： centos linux postgresql

本文链接：https://blog.csdn.net/Stringcui/article/details/141053631

版权

服务器配置专栏收录该内容

1 篇文章 0 订阅

订阅专栏

项目场景：

配置centos虚拟机基于postgresql15的repmgr主从自动故障切换，出现的问题及解决方案。

环境说明

hostname	ip	状态
node1	192.168.1.141	主
node2	192.168.1.142	从

1.虚拟机都操作：

安装postgres15


# 获取所需依赖包
wget https://download-ib01.fedoraproject.org/pub/epel/7/x86_64/Packages/l/libzstd-1.5.2-1.el7.x86_64.rpm
wget https://download-ib01.fedoraproject.org/pub/epel/7/x86_64/Packages/l/llvm5.0-devel-5.0.1-7.el7.x86_64.rpm
wget https://download-ib01.fedoraproject.org/pub/epel/7/x86_64/Packages/l/llvm5.0-5.0.1-7.el7.x86_64.rpm
wget https://download-ib01.fedoraproject.org/pub/epel/7/x86_64/Packages/l/llvm5.0-libs-5.0.1-7.el7.x86_64.rpm
 
 
yum install -y ./libzstd-1.5.2-1.el7.x86_64.rpm 
yum install -y centos-release-scl-rh llvm5*
yum install -y epel-release


# 安装版本库的RPM
sudo yum install -y https://download.postgresql.org/pub/repos/yum/reporpms/EL-7-x86_64/pgdg-redhat-repo-latest.noarch.rpm

# 安装 PostgreSQL
sudo yum install -y postgresql15-server postgresql15-devel

# 初始化DB
sudo /usr/pgsql-15/bin/postgresql-15-setup initdb

# 开机启动|启动|重启|状态|停止 命令
sudo systemctl enable postgresql-15
sudo systemctl start postgresql-15
sudo systemctl restart postgresql-15
sudo systemctl status postgresql-15
sudo systemctl stop postgresql-15

安装完成后在 /var/lib/pgsql/15 下生成文件夹
在这里插入图片描述

安装repmgr

根据pg版本安装对应版本

curl https://dl.2ndquadrant.com/default/release/get/15/rpm | bash
yum install -y repmgr15

修改postgres用户密码

最好两个服务器密码一样，别忘了。

echo "自己输密码"|passwd --stdin postgres

创建流复制用户

// 创建流复制用户replicator
create user replica replication login connection limit 5 password 'replica';

// 创建repmgr数据库
create database repmgr;

// 创建repmgr用户
create user repmgr with password 'repmgr' superuser login;
alter database repmgr owner to repmgr;

修改pg_hba.conf配置文件

# TYPE  DATABASE        USER            ADDRESS                 METHOD
# "local" is for Unix domain socket connections only
local   all             all                                     peer
# IPv4 local connections:
host    all             all             127.0.0.1/32            scram-sha-256
host    all             all             192.168.1.0/24          md5
# IPv6 local connections:
host    all             all             ::1/128                 scram-sha-256
# Allow replication connections from localhost, by a user with the
# replication privilege.
local   replication     all                                     peer
host    replication     all             127.0.0.1/32            scram-sha-256
host    replication     all             ::1/128                 scram-sha-256
host    all             all             0.0.0.0/0               scram-sha-256
host    replication     replica         192.168.1.142/32        trust
host    replication     replica         192.168.1.141/32        trust
# all
host    replication     repmgr          192.168.1.141/32        trust
host    replication     repmgr          192.168.1.142/32        trust

修改postgresql.conf配置文件

# basic
listen_addresses = '*'                # 监听所有ip
port = 5432                           # 端口
max_connections = 1000                # 最大连接数
superuser_reserved_connections = 10   # 给超级用户预留的连接数
shared_buffers = 1GB                  # 共享内存，一般设置为内存的1/4
work_mem = 16MB                       # 设置在写入临时磁盘文件之前查询操作(例如排序或哈希表)可使用的最大内存容量
maintenance_work_mem = 256MB          # 在维护性操作（例如VACUUM、CREATE INDEX和ALTER TABLE ADD FOREIGN KEY）中使用的 最大的内存量
timezone = 'Asia/Shanghai'            # 系统时区
hot_standby = on                      # 打开热备

shared_preload_libraries = 'repmgr'
# optimizer
default_statistics_target = 500       # 默认100，ANALYZE在pg_statistic中存储的信息量，增大该值，会增加ANALYZE的时间，但会让解释计划更精准

# wal
max_wal_size = 1GB                    # 建议与shared_buffers保持一致
min_wal_size = 80MB                   # 建议max_wal_size/12.5
wal_log_hints = on                    # 控制WAL日志记录的方式，建议打开
wal_level = replica                   # wal日志写入级别，要使用流复制，必须使用replica或更高级别
wal_sender_timeout = 60s              # 设置WAL发送者在发送WAL数据时等待主服务器响应的超时时间

# archive
archive_mode = on                     # 
archive_command = 'copy "%p" /var/lib/pgsql/15/data/pg_archive/"%f"'   # 归档存储位置，自行修改

# log 近7天轮询
log_destination = 'csvlog'            # 日志格式
logging_collector = on                # 日志收集器
log_directory = 'pg_log'              # 日志目录 $PGDATA/pg_log
log_filename = 'postgresql.%a'        # 7天日志轮询
log_file_mode = 0600                  # 日志文件的权限
log_rotation_size = 0                 # 日志的最大尺寸，设置为零时将禁用基于大小创建新的日志文件
log_truncate_on_rotation = on         # 这个参数将导致PostgreSQL截断（覆盖而不是追加）任何已有的同名日志文件
log_min_duration_statement = 0        # 如果语句运行至少指定的时间量，将导致记录每一个这种完成的语句的持续时间
log_duration = on                     # 每一个完成的语句的持续时间被记录
log_lock_waits = on                   # 控制当一个会话为获得一个锁等到超过deadlock_timeout时，是否要产生一个日志消息
log_statement = 'mod'                 # 控制哪些 SQL 语句被记录。有效值是 none (off)、ddl、mod和 all（所有语句）。ddl记录所有数据定义语句，例如CREATE、ALTER和 DROP语句。mod记录所有ddl语句，外加数据修改语句例如INSERT, UPDATE、DELETE、TRUNCATE, 和COPY FROM
log_timezone = 'Asia/Shanghai'        # 设置在服务器日志中写入的时间戳的时区

# sql
statement_timeout = 300000            # 语句执行超时时间 5分钟
idle_in_transaction_session_timeout = 300000   # 事务空闲超时时间 5分钟
idle_session_timeout = 1800000        # 会话空闲超时时间 30分钟
lock_timeout = 60000                  # 等锁超时时间 1分钟

启动pg15

systemctl start postgresql-15

2.配置服务器ssh远程免密登录

// 每个虚拟机都生成秘钥
ssh-keygen -t rsa
// 都发到主服务器上
ssh-copy-id -i id_rsa.pub -p22  postgres@node1

// 发送给从秘钥
scp authorized_keys node2:$PWD

2.1设置postgres用户权限

如果根据网上ssh远程连接配置了秘钥，并且给与了该用户文件夹700权限，并且authorized_keys 600权限之后还是出现了需要密码验证。

2.2请查看SElinux是否关闭

临时关闭：

运行命令 setenforce 0，设置为permissive模式，不禁止违反安全策略的行为，但会记录在日志中。运行命令 setenforce 1，设置为enforcing模式，禁止违反安全策略的行为。

永久关闭：编辑 /etc/selinux/config 文件，将 SELINUX=enforcing 改为 SELINUX=disabled，保存并重启机器。运行命令 getenforce，验证SELinux状态为 Disabled。

2.3 配置完成后务必测试

使用下面的代码测试，务必保证postgres能免密登录其他虚拟机的postgres，如提示需要密码，请查看上方2.1和2.2是否操作

    // node1切换postgres用户
	su postgres
	//ssh 其他虚拟机的postgres用户
	ssh node2

3.repmgr配置：

3.1主虚拟机操作

vi /etc/repmgr/15/repmgr.conf

修改repmgr.conf配置文件


ssh_options='-q -o ConnectTimeout=10'
node_id=1
node_name='192.168.1.141'
conninfo='user=repmgr connect_timeout=2 dbname=repmgr host=192.168.1.141 fallback_application_name=repmgr options=-csearch_path='
data_directory='/var/lib/pgsql/15/data'
pg_bindir='/usr/pgsql-15/bin/'
failover='automatic'
promote_command='/usr/pgsql-15/bin/repmgr standby promote -f /etc/repmgr/15/repmgr.conf --log-to-file'
follow_command='/usr/pgsql-15/bin/repmgr standby follow -f /etc/repmgr/15/repmgr.conf --log-to-file --upstream-node-id=%n'
log_file='/var/lib/pgsql/15/repmgrd.log'


primary_visibility_consensus = true
standby_disconnect_on_failover = true
repmgrd_service_start_command ='sudo /usr/bin/systemctl start repmgr15.service'
repmgrd_service_stop_command ='sudo /usr/bin/systemctl stop repmgr15.service'
passfile='/home/postgres/.pgpass'

# 高可用参数设置
location='location1'                   # 多数据中心时标记服务器的位置，在故障转移期间用于检查当前主节点的可见性
priority=100                                                         # 节点优先级，选主时可能使用到（lsn > priority > node_id），0代表该节点不会被提升为主节点
monitoring_history=yes                                               # 是否将监控数据写入“monitoring_history”表
reconnect_interval=5                                                 # 故障转移之前，尝试重新连接的间隔（以秒为单位）
reconnect_attempts=3                                                 # 故障转移之前，尝试重新连接的次数
monitor_interval_secs=2
use_replication_slots=true
connection_check_type=ping          # ping: repmgr 使用PQPing() 方法测试连接

//注册主节点
su - postgres -c "/usr/pgsql-15/bin/repmgr -f /etc/repmgr/15/repmgr.conf primary register"
//查看状态
su - postgres -c "/usr/pgsql-15/bin/repmgr -f /etc/repmgr/15/repmgr.conf cluster show"
//启动standby节点
systemctl start postgresql-15
//启动repmgr15
systemctl start repmgr15

3.2从虚拟机操作

vi /etc/repmgr/15/repmgr.conf

ssh_options='-q -o ConnectTimeout=10'
node_id=2
node_name='192.168.1.142'
conninfo='user=repmgr connect_timeout=2 dbname=repmgr host=192.168.1.142 fallback_application_name=repmgr options=-csearch_path='
data_directory='/var/lib/pgsql/15/data'
pg_bindir='/usr/pgsql-15/bin/'
failover='automatic'
promote_command='/usr/pgsql-15/bin/repmgr standby promote -f /etc/repmgr/15/repmgr.conf --log-to-file'
follow_command='/usr/pgsql-15/bin/repmgr standby follow -f /etc/repmgr/15/repmgr.conf --log-to-file --upstream-node-id=%n'
log_file='/var/lib/pgsql/15/repmgrd.log'


primary_visibility_consensus = true
standby_disconnect_on_failover = true
repmgrd_service_start_command ='sudo /usr/bin/systemctl start repmgr15.service'
repmgrd_service_stop_command ='sudo /usr/bin/systemctl stop repmgr15.service'
passfile='/home/postgres/.pgpass'

# 高可用参数设置
location='location1'                   # 多数据中心时标记服务器的位置，在故障转移期间用于检查当前主节点的可见性
priority=100                                                         # 节点优先级，选主时可能使用到（lsn > priority > node_id），0代表该节点不会被提升为主节点
monitoring_history=yes                                               # 是否将监控数据写入“monitoring_history”表
reconnect_interval=5                                                 # 故障转移之前，尝试重新连接的间隔（以秒为单位）
reconnect_attempts=3                                                 # 故障转移之前，尝试重新连接的次数
monitor_interval_secs=2
use_replication_slots=true
connection_check_type=ping          # ping: repmgr 使用PQPing() 方法测试连接

//注册从节点
su - postgres -c "/usr/pgsql-15/bin/repmgr -h 192.168.1.142 -U repmgr -d repmgr -f /etc/repmgr/15/repmgr.conf standby clone --dry-run" 
//从主节点验证是否能复制数据
su - postgres -c "/usr/pgsql-15/bin/repmgr -h 192.168.1.141 -U repmgr -d repmgr -f /etc/repmgr/15/repmgr.conf standby clone --dry-run" 
//开始复制主data数据为从节点，复制完会生成从节点的standby.signal文件
su - postgres -c "/usr/pgsql-15/bin/repmgr -h 192.168.1.141 -U repmgr -d repmgr -f /etc/repmgr/15/repmgr.conf standby clone" 

//启动standby节点
systemctl start postgresql-15
//启动repmgr15
systemctl start repmgr15

//查看状态
su - postgres -c "/usr/pgsql-15/bin/repmgr -f /etc/repmgr/15/repmgr.conf cluster show"
// 成功状态
 ID | Name          | Role    | Status    | Upstream      | Location  | Priority | Timeline | Connection string                                                                                                     
----+---------------+---------+-----------+---------------+-----------+----------+----------+------------------------------------------------------------------------------------------------------------------------
 1  | 192.168.1.141 | primary | * running |               | location1 | 100      | 12       | user=repmgr connect_timeout=2 dbname=repmgr host=192.168.1.141 fallback_application_name=repmgr options=-csearch_path=
 2  | 192.168.1.142 | standby |   running | 192.168.1.141 | location1 | 100      | 12       | user=repmgr connect_timeout=2 dbname=repmgr host=192.168.1.142 fallback_application_name=repmgr options=-csearch_path=

4.主从正常运行主从切换：

正常运行时主从切换，进入到从服务器

从服务器

su - postgres -c "/usr/pgsql-15/bin/repmgr -f /etc/repmgr/15/repmgr.conf standby switchover -U repmgr --verbose" 
//切换后
[root@node2 data]# su - postgres -c "/usr/pgsql-15/bin/repmgr -f /etc/repmgr/15/repmgr.conf cluster show"
 ID | Name          | Role    | Status    | Upstream      | Location  | Priority | Timeline | Connection string                                                                                                     
----+---------------+---------+-----------+---------------+-----------+----------+----------+------------------------------------------------------------------------------------------------------------------------
 1  | 192.168.1.141 | standby |   running | 192.168.1.142 | location1 | 100      | 12       | user=repmgr connect_timeout=2 dbname=repmgr host=192.168.1.141 fallback_application_name=repmgr options=-csearch_path=
 2  | 192.168.1.142 | primary | * running |               | location1 | 100      | 13       | user=repmgr connect_timeout=2 dbname=repmgr host=192.168.1.142 fallback_application_name=repmgr options=-csearch_path=

5.测试主故障自动将从设置为主：

进入主服务器
关停postgresql-15

systemctl stop postgresql-15

主服务器显示

[root@node1 data]# su - postgres -c "/usr/pgsql-15/bin/repmgr -f /etc/repmgr/15/repmgr.conf cluster show"
 ID | Name          | Role    | Status               | Upstream | Location  | Priority | Timeline | Connection string                                                                                                     
----+---------------+---------+----------------------+----------+-----------+----------+----------+------------------------------------------------------------------------------------------------------------------------
 1  | 192.168.1.141 | primary | * running            |          | location1 | 100      | 14       | user=repmgr connect_timeout=2 dbname=repmgr host=192.168.1.141 fallback_application_name=repmgr options=-csearch_path=
 2  | 192.168.1.142 | standby | ! running as primary |          | location1 | 100      | 15       | user=repmgr connect_timeout=2 dbname=repmgr host=192.168.1.142 fallback_application_name=repmgr options=-csearch_path=

从服务器显示

[root@node2 data]# su - postgres -c "/usr/pgsql-15/bin/repmgr -f /etc/repmgr/15/repmgr.conf cluster show"
 ID | Name          | Role    | Status    | Upstream | Location  | Priority | Timeline | Connection string                                                                                                     
----+---------------+---------+-----------+----------+-----------+----------+----------+------------------------------------------------------------------------------------------------------------------------
 1  | 192.168.1.141 | primary | ! running |          | location1 | 100      | 14       | user=repmgr connect_timeout=2 dbname=repmgr host=192.168.1.141 fallback_application_name=repmgr options=-csearch_path=
 2  | 192.168.1.142 | primary | * running |          | location1 | 100      | 15       | user=repmgr connect_timeout=2 dbname=repmgr host=192.168.1.142 fallback_application_name=repmgr options=-csearch_path=

WARNING: following issues were detected
  - node "192.168.1.141" (ID: 1) is running but the repmgr node record is inactive

从服务器142 ip已变更为了primary（主）了
进入主服务器
将141注册为从服务加入到repmgr

// 测试是否成功
su - postgres -c "repmgr node rejoin -d 'host=192.168.1.142 dbname=repmgr user=repmgr' --force-rewind --config-files=postgresql.conf,postgresql.auto.conf --verbose --dry-run"

su - postgres -c "repmgr node rejoin -d 'host=192.168.1.142 dbname=repmgr user=repmgr' --force-rewind --config-files=postgresql.conf,postgresql.auto.conf --verbose"

//最后的提示信息为成功
INFO: node "192.168.1.141" (ID: 1) is pingable
INFO: node "192.168.1.141" (ID: 1) has attached to its upstream node
NOTICE: NODE REJOIN successful
DETAIL: node 1 is now attached to node 2

//查看状态，成功
[root@node1 data]# su - postgres -c "/usr/pgsql-15/bin/repmgr -f /etc/repmgr/15/repmgr.conf cluster show"
 ID | Name          | Role    | Status    | Upstream      | Location  | Priority | Timeline | Connection string                                                                                                     
----+---------------+---------+-----------+---------------+-----------+----------+----------+------------------------------------------------------------------------------------------------------------------------
 1  | 192.168.1.141 | standby |   running | 192.168.1.142 | location1 | 100      | 14       | user=repmgr connect_timeout=2 dbname=repmgr host=192.168.1.141 fallback_application_name=repmgr options=-csearch_path=
 2  | 192.168.1.142 | primary | * running |               | location1 | 100      | 15       | user=repmgr connect_timeout=2 dbname=repmgr host=192.168.1.142 fallback_application_name=repmgr options=-csearch_path=