Patroni 是一个流行的开源工具,它用于自动化PostgreSQL的设置和维护,实现高可用性。它与 etcd(一种分布式键值存储系统,通常用于配置共享和服务发现)协同工作,创建一个可靠的环境,确保PostgreSQL数据库在主节点故障时能够持续提供服务。
架构描述
在一个典型的 Patroni + etcd 的高可用架构中,会有以下组件:
-
Patroni:
-
每个PostgreSQL实例都配有一个Patroni代理。Patroni负责进行健康检查、故障转移和配置更改。
-
-
etcd集群:
-
作为分布式系统的核心,存储所有Patroni实例的状态信息,包括当前的主数据库节点和配置参数。
-
架构原理
-
启动:
-
当PostgreSQL实例启动时,Patroni将从etcd中获取当前的集群状态,并根据这个状态对数据库进行初始化。
-
-
领导选举:
-
如果主数据库不可用,Patroni将触发领导选举过程,etcd中存储的领导选举数据将帮助确定哪个从节点应该被提升为新的主节点。
-
-
健康检查和故障转移:
-
Patroni定期检查每个PostgreSQL实例的健康状况。如果主节点出现问题,Patroni会自动进行故障转移,将从节点提升为新的主节点,并在etcd中更新状态。
-
-
配置更改:
-
管理员可以通过etcd对集群进行配置更改,Patroni会自动将这些更改应用到所有PostgreSQL实例。
-
-
服务发现:
-
应用程序使用etcd中的信息来发现当前的主数据库实例,并连接到它进行读写操作。
-
优势
-
自动故障转移:在主节点出现故障时自动进行故障转移,减少了系统的停机时间。 -
简化管理:Patroni简化了数据库集群的管理,包括自动故障恢复、备份和恢复。 -
分布式系统:etcd的分布式特性确保了集群状态的一致性和可靠性。
Patroni + etcd 架构提供了一个强大的解决方案,用于维护PostgreSQL数据库的高可用性,它适合于需要确保持续运行的关键业务应用。
环境准备
下载安装包及其依赖包到指定目录下
mkdir /root/python3-rpm
sudo yum install python3 python3-devel --downloadonly --downloaddir=/root/python3-rpm
[root@node3 python3-rpm]# cd /root/python3-rpm/
[root@node3 python3-rpm]# ls -lrt
total 9956
-rw-r--r-- 1 root root 101080 Jul 4 2014 dwz-0.11-3.el7.x86_64.rpm
-rw-r--r-- 1 root root 4724 Jul 4 2014 perl-srpm-macros-1-8.el7.noarch.rpm
-rw-r--r-- 1 root root 20044 Aug 23 2019 python3-rpm-generators-6-2.el7.noarch.rpm
-rw-r--r-- 1 root root 644052 Aug 23 2019 python3-setuptools-39.2.0-10.el7.noarch.rpm
-rw-r--r-- 1 root root 83048 Aug 23 2019 redhat-rpm-config-9.1.0-88.el7.centos.noarch.rpm
-rw-r--r-- 1 root root 1702324 Oct 15 2020 python3-pip-9.0.3-8.el7.noarch.rpm
-rw-r--r-- 1 root root 8252 Oct 15 2020 python3-rpm-macros-3-34.el7.noarch.rpm
-rw-r--r-- 1 root root 9368 Oct 15 2020 python-rpm-macros-3-34.el7.noarch.rpm
-rw-r--r-- 1 root root 8960 Oct 15 2020 python-srpm-macros-3-34.el7.noarch.rpm
-rw-r--r-- 1 root root 72036 Jun 26 19:57 python3-3.6.8-19.el7_9.x86_64.rpm
-rw-r--r-- 1 root root 222212 Jun 26 19:57 python3-devel-3.6.8-19.el7_9.x86_64.rpm
-rw-r--r-- 1 root root 7287572 Jun 26 19:57 python3-libs-3.6.8-19.el7_9.x86_64.rpm
下载python3
yum install python3 python3-devel -y
[root@node3 python3-rpm]# sudo yum history list python3
Loaded plugins: fastestmirror, langpacks
ID | Command line | Date and time | Action(s) | Altered
-------------------------------------------------------------------------------
10 | install python3 python3- | 2023-12-01 20:06 | Install | 12
history list
打包 patroni[etcd]
mkdir /root/patroni_etcd_2.1.1
pip3 install patroni[etcd] -i https://mirrors.aliyun.com/pypi/simple/ --download /root/patroni_etcd_2.1.1
[root@node3 patroni_etcd_2.1.1]# cd /root/patroni_etcd_2.1.1
[root@node3 patroni_etcd_2.1.1]# ls -lrt
total 1952
-rw-r--r-- 1 root root 316464 Dec 1 20:09 patroni-3.2.0-py3-none-any.whl
-rw-r--r-- 1 root root 143835 Dec 1 20:09 urllib3-1.26.18-py2.py3-none-any.whl
-rw-r--r-- 1 root root 125201 Dec 1 20:09 PyYAML-6.0.1.tar.gz
-rw-r--r-- 1 root root 24774 Dec 1 20:09 prettytable-2.5.0-py3-none-any.whl
-rw-r--r-- 1 root root 97486 Dec 1 20:09 click-8.0.4-py3-none-any.whl
-rw-r--r-- 1 root root 496866 Dec 1 20:09 psutil-5.9.6.tar.gz
-rw-r--r-- 1 root root 42808 Dec 1 20:09 ydiff-1.2.tar.gz
-rw-r--r-- 1 root root 247702 Dec 1 20:09 python_dateutil-2.8.2-py2.py3-none-any.whl
-rw-r--r-- 1 root root 37270 Dec 1 20:09 python-etcd-0.4.5.tar.gz
-rw-r--r-- 1 root root 102325 Dec 1 20:09 wcwidth-0.2.9-py2.py3-none-any.whl
-rw-r--r-- 1 root root 17978 Dec 1 20:09 importlib_metadata-4.8.3-py3-none-any.whl
-rw-r--r-- 1 root root 11053 Dec 1 20:09 six-1.16.0-py2.py3-none-any.whl
-rw-r--r-- 1 root root 269084 Dec 1 20:09 dnspython-2.2.1-py3-none-any.whl
-rw-r--r-- 1 root root 5313 Dec 1 20:09 zipp-3.6.0-py3-none-any.whl
-rw-r--r-- 1 root root 26844 Dec 1 20:09 typing_extensions-4.1.1-py3-none-any.whl
[root@node3 ~]# tar -zcvf patroni_etcd_2.1.1.tar.gz patroni_etcd_2.1.1
PostgreSQL + replication 部署
1、安装依赖包
yum -y install readline-devel zlib-devel gettext-devel openssl openssl-devel pam pam-devel libxml2 libxml2-devel libxslt libxslt-devel perl perl-devel tcl-devel uuid-devel gcc gcc-c++ make perl-ExtUtils*
2、所有节点关闭防火墙
systemctl stop firewalld.service
systemctl disable firewalld.service
systemctl stop NetworkManager.service
systemctl disable NetworkManager.service
sed -i "s/SELINUX=enforcing/SELINUX=disabled/g" /etc/selinux/config
setenforce 0
3、设置/etc/hosts
10.211.55.13 node1
10.211.55.14 node2
10.211.55.16 node3
4、内核参数调整
vi /etc/sysctl.conf
fs.file-max = 76724200
kernel.sem = 10000 10240000 10000 102
kernel.shmmni = 4096
kernel.shmall = 253702
kernel.shmmax = 1222191360
net.ipv4.ip_local_port_range = 9000 65500
net.core.rmem_default = 2621
net.core.wmem_default = 2621
net.core.rmem_max = 419430
net.core.wmem_max = 1048576
fs.aio-max-nr = 40960000
vm.dirty_ratio=20
vm.dirty_background_ratio=3
vm.dirty_writeback_centisecs=100
vm.dirty_expire_centisecs=500
vm.swappiness=0
vm.min_free_kbytes=524288
vm.swappiness=0
vm.overcommit_memory=2
vm.overcommit_ratio=75
net.ipv4.ip_local_port_range = 10000 65535
sysctl -p
5、用户limits设置
vi /etc/security/limits.conf
xyc soft nofile 1048576
xyc hard nofile 1048576
xyc soft nproc 131072
xyc hard nproc 131072
xyc soft memlock unlimited
xyc hard memlock unlimited
xyc soft core unlimited
xyc hard core unlimited
xyc soft stack unlimited
xyc hard stack unlimited
6、创建用户和组
groupadd dba -g 2000
useradd xyc --gid 2000 --uid 2000 --create-home
echo "Enmo@2023"|passwd xyc --stdin
7、创建安装目录\数据目录\WAL目录\归档目录
mkdir -p /opt/pg12
mkdir -p /opt/data
mkdir -p /opt/wal
mkdir -p /opt/archive
chown -R xyc:dba /soft /opt
chmod 0700 /opt/data /opt/wal /opt/archive
7、编译安装,所有节点安装PostgreSQL
PostgreSQL的官方网站下载:++https://www.postgresql.org/++
下载文件并解压
# su - xyc
$ cd /soft
tar -xvf postgresql-12.15.tar.bz2
编译
cd postgresql-12.15/
./configure --prefix=/opt/pg12 --with-openssl --with-pgport=5432
安装
使用gmake 或者gmake world
$ gmake world
当看到最后一行显示为:
PostgreSQL, contrib, and documentation successfully made. Ready to install.
说明已经编译成功
使用gmake install或者gmake install-world 进行安装
$ gmake install-world //包含扩展包和文档
当看到最后一行显示为:
PostgreSQL, contrib, and documentation installation complete.
说明已经安装成功
查看版本
$ /opt/pg12/bin/postgres --version
postgres (PostgreSQL) 12.1
8、设置软链接
# cd /opt
# ln -s /opt/pg12 /opt/pgsql
创建一个软链接指向当前版本,当进行版本变更后,不需要调整调用脚本,只需要修改这个软链接即可,后面都会使用这个软链接。
主库初始化数据目录
/opt/pgsql/bin/initdb -D /opt/data -X /opt/wal -EUTF8 -Uxyc -W
9、修改数据库参数
$ vi /opt/data/postgresql.conf
listen_addresses='*'
port=5432
cluster_name='pg200'
max_connections=200
unix_socket_directories='/opt/data'
unix_socket_group = 'dba'
unix_socket_permissions = 0700
shared_buffers=256MB
work_mem=50MB
wal_level=logical
archive_mode=always
archive_command='cp %p /opt/archive/%f'
min_wal_size=1GB
max_wal_size=2GB
wal_keep_segments=128
max_wal_senders=10
hot_standby=on
logging_collector=on
log_directory='pg_log'
log_destination=csvlog
log_filename='pg_log_%u.log'
log_file_mode=0600
log_truncate_on_rotation=on
log_rotation_age=1d
log_rotation_size=100MB
log_min_messages=warning
log_min_duration_statement=30
log_checkpoints=on
log_connections=on
log_duration=on
log_lock_waits=on
log_statement='DDL'
10、配置环境变量
$ vi ~/.bashrc
export PGPORT=5432
export PGUSER=xyc
export PGHOME=/opt/pgsql
export PGDATA=/opt/data
export PGHOST=127.0.0.01
export PATH=$PGHOME/bin:$PATH
export LD_LIBRARY_PATH=$PGHOME/lib:/usr/local/lib:/usr/local/lib64:/usr/lib64
11、手工创建日志目录
$ mkdir $PGDATA/log
$ chmod 700 $PGDATA/log
$ touch $PGDATA/log/startup.log
12.手工启停数据库
查看数据库运行状态
$ pg_ctl -D /opt/data status
启动数据库
$ pg_ctl -D /opt/data start &
停止数据库
$ pg_ctl -D /opt/data stop
查看实例进程
$ ps f -u xyc
PID TTY STAT TIME COMMAND
1323 pts/0 S 0:00 -ba
1456 pts/0 R+ 0:00 \_ ps f -u postgr
1058 ? Ss 0:00 /opt/pg12/bin/postgres -D /opt/data6000
1068 ? Ss 0:00 \_ postgres: logger
1070 ? Ss 0:00 \_ postgres: checkpointer
1071 ? Ss 0:00 \_ postgres: background writer
1072 ? Ss 0:00 \_ postgres: walwriter
1073 ? Ss 0:00 \_ postgres: autovacuum launcher
1074 ? Ss 0:00 \_ postgres: archiver last was 000000010000000000000008
1075 ? Ss 0:00 \_ postgres: stats collector
1076 ? Ss 0:00 \_ postgres: logical replication launche
12、主节点配置PostgreSQL的访问策略文
vi $PGDATA/pg_hba.conf
#修改为如下:
host all all 0.0.0.0/00 md5
13、配置systemctl
# vi /usr/lib/systemd/system/postgresql-12.service
[Unit]
Description=PostgreSQL 12 database server
After=syslog.target network.target
[Service]
Type=forking
TimeoutSec=120
User=xyc
Environment=PGDATA=/opt/data
ExecStart=/opt/pgsql/bin/pg_ctl start -w -D /opt/data -l /opt/data/log/startup.log
ExecStop=/opt/pgsql/bin/pg_ctl stop -m fast -w -D /opt/data
ExecReload=/opt/pgsql/bin/pg_ctl reload -D /opt/data
[Install]
WantedBy=multi-user.target
通过systemctl启停服务
systemctl daemon-reload
systemctl start postgresql-12
systemctl stop postgresql-12
systemctl reload postgresql-12
systemctl restart postgresql-12
14、主节点安装 pg_stat_statements
cd /soft/postgresql-12.15/contrib/pg_stat_statements/
make && make install
vi $PGDATA/postgresql.conf
# pg_stat_statements
shared_preload_libraries = 'pg_stat_statements'
pg_stat_statements.max = 1000
pg_stat_statements.track = top
pg_stat_statements.track_utility = true
pg_stat_statements.save = true
pg_ctl restart -D /opt/data/
# 进入 PostgreSQL 数据库安装扩展插件
[2023-11-13 14:29:38] xyc@node1 ~ $ psql -U xyc postgres
psql (12.15)
Type "help" for help.
[2023-11-13 14:29:47] xyc@postgres=# create extension pg_stat_statements;
CREATE EXTENSION
# 确认安装成功
[2023-11-13 14:29:52] xyc@postgres=# select * from pg_stat_statements;
userid | dbid | queryid | query | calls | total_time | min_time | max_time | mean_time | stddev_time | rows | share
d_blks_hit | shared_blks_read | shared_blks_dirtied | shared_blks_written | local_blks_hit | local_blks_read | local_blks_dirtied | local_blks_written | temp_blks
_read | temp_blks_written | blk_read_time | blk_write_time
--------+-------+----------------------+-------------------------------------+-------
10 | 13593 | -1739183385080879393 | create extension pg_stat_statements | 1 | 41.783084 | 41.783084 | 41.783084 | 41.783084 | 0 | 0 |
857 | 139 | 56 | 2 | 0 | 0 | 0 | 0 |
0 | 0 | 0 | 0
(1 row)
15、部署 replication 同步两个备库
# 主库上创建用于流复制的用户
[2023-11-13 14:32:50] xyc@postgres=# create user replica WITH REPLICATION ENCRYPTED PASSWORD 'replica';
CREATE ROLE
# 配置主库允许接受流复制的连接
[xyc@node1 ~]$ vi $PGDATA/pg_hba.conf
host replication replica 10.211.55.0/24 md5
[xyc@node1 ~]# systemctl reload postgres-13.service
# 建议所有节点配置密码文件
[xyc@node1 ~]$cat >> ~/.pgpass << EOF
# hostname:port:database:username:password
10.211.55.13:5432:replication:replica:replica
10.211.55.14:5432:replication:replica:replica
10.211.55.16:5432:replication:replica:replica
EOF
[xyc@node1 ~]$ chmod 0600 .pgpass
# 所有备库节点执行pg_basebackup命令初始化数据库
su - xyc
pg_basebackup -h 10.211.55.13 -p 5432 -U replica -D $PGDATA -Fp -P -X stream -R -v -l replica_20211016
# 所有备库节点创建PostgreSQL服务postgres-13.service,同主库一样,启动所有备库
pg_ctl start -D /opt/data
# 在主库上查询主备同步状态
select pid,state,client_addr,sync_priority,sync_state from pg_stat_replication;
# 主库创建测试表,插入数据
create table test_1 (id int4,create_time timestamp(0) without time zone);
insert into test_1 values (1,now());
# 备库查询测试表,确认是否同步成功
select * from test_1;
Etcd 部署和管理
Etcd下载地址: https://mirrors.huaweicloud.com/etcd/v3.5.1/ 文件名: etcd-v3.5.1-linux-amd64.tar.gz Etcd官方文档: https://etcd.io/docs/v3.5/
1、Etcd安装
# 解压目录,创建软链接
mkdir /app
tar -zxvf etcd-v3.5.1-linux-amd64.tar.gz -C /app
mv /app/etcd-v3.5.1-linux-amd64 /app/etcd-v3.5.1
ln -s /app/etcd-v3.5.1 /app/etcd
# 配置环境变量
cat >> /etc/profile << EOF
export PATH=/app/etcd:$PATH
EOF
source /etc/profile
2、所有节点创建 etcd 启动脚本
主节点创建 etcd 启动脚本
vi /app/etcd/start_etcd.sh
/app/etcd/etcd --data-dir=data.etcd \
--name etcd_pgtest01 \
--listen-peer-urls http://10.211.55.13:2380 \
--listen-client-urls http://10.211.55.13:2379,http://127.0.0.1:2379 \
--initial-advertise-peer-urls http://10.211.55.13:2380 \
--advertise-client-urls http://10.211.55.13:2379 \
--initial-cluster-token etcd-cluster-pgtest \
--initial-cluster etcd_pgtest01=http://10.211.55.13:2380,etcd_pgtest02=http://10.211.55.14:2380,etcd_pgtest03=http://10.211.55.16:2380 \
--initial-cluster-state new \
--enable-v2
备节点创建 etcd 启动脚本
备节点1创建 etcd 启动脚本
vi /app/etcd/start_etcd.sh
/app/etcd/etcd --data-dir=data.etcd \
--name etcd_pgtest02 \
--listen-peer-urls http://10.211.55.14:2380 \
--listen-client-urls http://10.211.55.14:2379,http://127.0.0.1:2379 \
--initial-advertise-peer-urls http://10.211.55.14:2380 \
--advertise-client-urls http://10.211.55.14:2379 \
--initial-cluster-token etcd-cluster-pgtest \
--initial-cluster etcd_pgtest01=http://10.211.55.13:2380,etcd_pgtest02=http://10.211.55.14:2380,etcd_pgtest03=http://10.211.55.16:2380 \
--initial-cluster-state new \
--enable-v2
# 备节点2创建 etcd 启动脚本
vi /app/etcd/start_etcd.sh
/app/etcd/etcd --data-dir=data.etcd \
--name etcd_pgtest03 \
--listen-peer-urls http://10.211.55.16:2380 \
--listen-client-urls http://10.211.55.16:2379,http://127.0.0.1:2379 \
--initial-advertise-peer-urls http://10.211.55.16:2380 \
--advertise-client-urls http://10.211.55.16:2379 \
--initial-cluster-token etcd-cluster-pgtest \
--initial-cluster etcd_pgtest01=http://10.211.55.13:2380,etcd_pgtest02=http://10.211.55.14:2380,etcd_pgtest03=http://10.211.55.16:2380 \
--initial-cluster-state new \
--enable-v2
有节点配置etcd启动脚本可执行权限 chmod +x /app/etcd/start_etcd.sh
3、所有节点配置etcd服务并启动
# 配置服务
# vi /usr/lib/systemd/system/etcd.service
[Unit]
Description=etcd
After=network.target remote-fs.target nss-lookup.target
[Service]
Type=forking
ExecStart=/bin/bash -c "/app/etcd/start_etcd.sh >> /app/etcd/start_etcd.log 2>&1 &"
ExecStop=/usr/bin/killall start_etcd
[Install]
WantedBy=multi-user.target
# 启动服务
systemctl daemon-reload
systemctl start etcd.service
systemctl enable etcd.service
systemctl status etcd
systemctl stop etcd.service
4、检查etcd集群状态
# 检查集群节点的状态
[root@node1 ~]# etcdctl endpoint status --cluster -w table
+--------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| ENDPOINT | ID | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+--------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| http://10.211.55.14:2379 | 7da7b3e2fde23974 | 3.5.1 | 20 kB | false | false | 2 | 8 | 8 | |
| http://10.211.55.13:2379 | b25aafbf34512a07 | 3.5.1 | 20 kB | true | false | 2 | 8 | 8 | |
| http://10.211.55.16:2379 | bf7cf4fbdf4fd62b | 3.5.1 | 20 kB | false | false | 2 | 8 | 8 | |
+--------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
# 列出集群中的所有成员
[root@node1 ~]# etcdctl member list -w table
+------------------+---------+---------------+--------------------------+--------------------------+------------+
| ID | STATUS | NAME | PEER ADDRS | CLIENT ADDRS | IS LEARNER |
+------------------+---------+---------------+--------------------------+--------------------------+------------+
| 7da7b3e2fde23974 | started | etcd_pgtest02 | http://10.211.55.14:2380 | http://10.211.55.14:2379 | false |
| b25aafbf34512a07 | started | etcd_pgtest01 | http://10.211.55.13:2380 | http://10.211.55.13:2379 | false |
| bf7cf4fbdf4fd62b | started | etcd_pgtest03 | http://10.211.55.16:2380 | http://10.211.55.16:2379 | false |
+------------------+---------+---------------+--------------------------+--------------------------+------------+
5、检查集群节点健康状况
[root@node1 ~]# etcdctl endpoint health --cluster -w table
+--------------------------+--------+------------+-------+
| ENDPOINT | HEALTH | TOOK | ERROR |
+--------------------------+--------+------------+-------+
| http://10.211.55.13:2379 | true | 2.125253ms | |
| http://10.211.55.16:2379 | true | 2.130828ms | |
| http://10.211.55.14:2379 | true | 2.252558ms | |
+--------------------------+--------+------------+-------+
Patroni 部署和管理
Patroni 基于 Python 开发的模板,需要运行在 Python 环境下。
1、安装python3
yum install python3 python3-devel -y
[root@node1 ~]# rm -f /usr/bin/python
[root@node1 ~]# ln -s /usr/bin/python3 /usr/bin/python
# python3安装后会报错修改yum的配置
[root@node1 ~]# sed -i "s:\<python\>:python2:g" /usr/bin/yum
[root@node1 ~]# sed -i "s:\<python\>:python2:g" /usr/libexec/urlgrabber-ext-down
2、所有节点使用pip3安装patroni
pip3 install psutil-5.8.0.tar.gz
pip3 install ydiff-1.2.tar.gz
pip3 install *.whl
pip3 install python-etcd-0.4.5.tar.gz
pip3 install psycopg2-binary-2.9.1.tar.gz
pip3 install psycopg2-2.9.1.tar.gz
3、所有节点配置patroni的参数文件
创建参数文件和日志文件的存放路径
mkdir /app/patroni
主节点创建文件 patroni_config.yml
vi /app/patroni/patroni_config.yml
scope: pg_cluster
namespace: /service
name: pgtest1
log:
level: INFO
traceback_level: ERROR
dir: /app/patroni
file_num: 10
file_size: 104857600
restapi:
listen: 10.211.55.13:8008
connect_address: 10.211.55.13:8008
etcd:
host: 10.211.55.13:2379
bootstrap:
dcs:
ttl: 30
loop_wait: 10
retry_timeout: 10
maximum_lag_on_failover: 1048576
max_timelines_history: 0
master_start_timeout: 300
master_stop_timeout: 0
synchronous_mode: false
postgresql:
use_pg_rewind: true
use_slots: true
parameters:
max_connections: 3000
superuser_reserved_connections: 100
max_locks_per_transaction: 64
max_worker_processes: 2
max_prepared_transactions: 0
wal_level: logical
wal_log_hints: on
track_commit_timestamp: off
max_wal_senders: 10
max_replication_slots: 10
wal_keep_size: 4096MB # wal_keep_size = wal_keep_segments * wal_segment_size = 128 * 32MB
hot_standby: "on"
listen_addresses: "*"
port: 5432
cluster_name: "pg_cluster"
archive_mode: on
archive_command: "cp %p /opt/archive/%f"
postgresql:
listen: 0.0.0.0:5432
connect_address: 10.211.55.13:5432
data_dir: /opt/data
pgpass: /home/xyc/.pgpass
pg_ctl_timeout: 60
use_pg_rewind: true
remove_data_directory_on_rewind_failure: false
remove_data_directory_on_diverged_timelines: true
authentication:
replication:
username: replica
password: replica
superuser:
username: xyc
password: Enmo@2022
tags:
nofailover: false
noloadbalance: false
clonefrom: false
nosync: false
备节点1创建文件 patroni_config.yml
vi /app/patroni/patroni_config.yml
scope: pg_cluster
namespace: /service
name: pgtest2
log:
level: INFO
traceback_level: ERROR
dir: /app/patroni
file_num: 10
file_size: 104857600
restapi:
listen: 10.211.55.14:8008
connect_address: 10.211.55.14:8008
etcd:
host: 10.211.55.14:2379
bootstrap:
dcs:
ttl: 30
loop_wait: 10
retry_timeout: 10
maximum_lag_on_failover: 1048576
max_timelines_history: 0
master_start_timeout: 300
master_stop_timeout: 0
synchronous_mode: false
postgresql:
use_pg_rewind: true
use_slots: true
parameters:
max_connections: 3000
superuser_reserved_connections: 100
max_locks_per_transaction: 64
max_worker_processes: 2
max_prepared_transactions: 0
wal_level: logical
wal_log_hints: on
track_commit_timestamp: off
max_wal_senders: 10
max_replication_slots: 10
wal_keep_size: 4096MB # wal_keep_size = wal_keep_segments * wal_segment_size = 128 * 32MB
hot_standby: "on"
listen_addresses: "*"
port: 5432
cluster_name: "pg_cluster"
archive_mode: on
archive_command: "cp %p /opt/archive/%f"
postgresql:
listen: 0.0.0.0:5432
connect_address: 10.211.55.14:5432
data_dir: /opt/data
pgpass: /home/xyc/.pgpass
pg_ctl_timeout: 60
use_pg_rewind: true
remove_data_directory_on_rewind_failure: false
remove_data_directory_on_diverged_timelines: true
authentication:
replication:
username: replica
password: replica
superuser:
username: xyc
password: Enmo@2022
tags:
nofailover: false
noloadbalance: false
clonefrom: false
nosync: false
备节点2创建文件 patroni_config.yml
vi /app/patroni/patroni_config.yml
scope: pg_cluster
namespace: /service
name: pgtest3
log:
level: INFO
traceback_level: ERROR
dir: /app/patroni
file_num: 10
file_size: 104857600
restapi:
listen: 10.211.55.16:8008
connect_address: 10.211.55.16:8008
etcd:
host: 10.211.55.16:2379
bootstrap:
dcs:
ttl: 30
loop_wait: 10
retry_timeout: 10
maximum_lag_on_failover: 1048576
max_timelines_history: 0
master_start_timeout: 300
master_stop_timeout: 0
synchronous_mode: false
postgresql:
use_pg_rewind: true
use_slots: true
parameters:
max_connections: 3000
superuser_reserved_connections: 100
max_locks_per_transaction: 64
max_worker_processes: 2
max_prepared_transactions: 0
wal_level: logical
wal_log_hints: on
track_commit_timestamp: off
max_wal_senders: 10
max_replication_slots: 10
wal_keep_size: 4096MB # wal_keep_size = wal_keep_segments * wal_segment_size = 128 * 32MB
hot_standby: "on"
listen_addresses: "*"
port: 5432
cluster_name: "pg_cluster"
archive_mode: on
archive_command: "cp %p /opt/archive/%f"
postgresql:
listen: 0.0.0.0:5432
connect_address: 10.211.55.16:5432
data_dir: /opt/data
pgpass: /home/xyc/.pgpass
pg_ctl_timeout: 60
use_pg_rewind: true
remove_data_directory_on_rewind_failure: false
remove_data_directory_on_diverged_timelines: true
authentication:
replication:
username: replica
password: replica
superuser:
username: xyc
password: Enmo@2022
tags:
nofailover: false
noloadbalance: false
clonefrom: false
nosync: false
所有节点创建 patroni 服务并启动
vi /usr/lib/systemd/system/patroni.service
[Unit]
Description=patroni
After=network.target remote-fs.target nss-lookup.target etcd.service
Requires=etcd.service
[Service]
Type=forking
User=xyc
Group=dba
Environment="PGHOME=/opt/pgsql"
Environment="PGDATA=/opt/data"
Environment="PGPORT=5432"
Environment="LD_LIBRARY_PATH=/opt/pgsql/lib"
Environment="PATH=/opt/pgsql/bin:/usr/local/bin"
ExecStart=/bin/bash -c "patroni /app/patroni/patroni_config.yml >> /app/patroni/patroni.log 2>&1 &"
ExecReload=/bin/kill -s HUP $MAINPID
ExecStop=/usr/bin/killall patroni
KillMode=process
TimeoutSec=30
Restart=no
[Install]
WantedBy=multi-user.target
# 启动服务
# systemctl daemon-reload
# systemctl restart patroni.service
# systemctl enable patroni.service
# systemctl status patroni.service
4、所有节点设置patronictl别名,方便维护
cat >> /etc/profile << EOF
alias patronictl='patronictl -c /app/patroni/patroni_config.yml'
EOF
source /etc/profile
patronictl list
5、patronictl list
[root@node1 ~]# patronictl list
+---------+--------------+---------+---------+----+-----------+
| Member | Host | Role | State | TL | Lag in MB |
+ Cluster: pg_cluster (7301011775213054418) -+----+-----------+
| pgtest1 | 10.211.55.13 | Leader | running | 6 | |
| pgtest2 | 10.211.55.14 | Replica | running | 6 | 0 |
| pgtest3 | 10.211.55.16 | Replica | running | 6 | 0 |
+---------+--------------+---------+---------+----+-----------+
6、Switchover方法一
[root@node1 ~]# curl -s http://10.211.55.13:8008/failover -XPOST -d '{"candidate":"pgtest2"}'
[root@node1 ~]# patronictl list
+---------+--------------+---------+---------+----+-----------+
| Member | Host | Role | State | TL | Lag in MB |
+ Cluster: pg_cluster (7301011775213054418) -+----+-----------+
| pgtest1 | 10.211.55.13 | Replica | running | 7 | 0 |
| pgtest2 | 10.211.55.14 | Leader | running | 7 | |
| pgtest3 | 10.211.55.16 | Replica | running | 7 | 0 |
+---------+--------------+---------+---------+----+-----------+
7、Switchover方法二
[root@node2 ~]# patronictl switchover
Master [pgtest2]:
Candidate ['pgtest1', 'pgtest3'] []: pgtest1
When should the switchover take place (e.g. 2023-11-14T10:39 ) [now]:
Current cluster topology
+---------+--------------+---------+---------+----+-----------+
| Member | Host | Role | State | TL | Lag in MB |
+ Cluster: pg_cluster (7301011775213054418) -+----+-----------+
| pgtest1 | 10.211.55.13 | Replica | running | 7 | 0 |
| pgtest2 | 10.211.55.14 | Leader | running | 7 | |
| pgtest3 | 10.211.55.16 | Replica | running | 7 | 0 |
+---------+--------------+---------+---------+----+-----------+
Are you sure you want to switchover cluster pg_cluster, demoting current master pgtest2? [y/N]: y
2023-11-14 09:39:12.97581 Successfully switched over to "pgtest1"
+---------+--------------+---------+---------+----+-----------+
| Member | Host | Role | State | TL | Lag in MB |
+ Cluster: pg_cluster (7301011775213054418) -+----+-----------+
| pgtest1 | 10.211.55.13 | Leader | running | 7 | |
| pgtest2 | 10.211.55.14 | Replica | stopped | | unknown |
| pgtest3 | 10.211.55.16 | Replica | running | 7 | 0 |
+---------+--------------+---------+---------+----+-----------+
[root@node2 ~]# patronictl list
+---------+--------------+---------+---------+----+-----------+
| Member | Host | Role | State | TL | Lag in MB |
+ Cluster: pg_cluster (7301011775213054418) -+----+-----------+
| pgtest1 | 10.211.55.13 | Leader | running | 8 | |
| pgtest2 | 10.211.55.14 | Replica | running | 8 | 0 |
| pgtest3 | 10.211.55.16 | Replica | running | 8 | 0 |
+---------+--------------+---------+---------+----+-----------+
总结
通过以上step by step的操作,我相信各位少侠都能玩PostgreSQL高可用之Patroni + etcd架构。
强烈建议收藏!
作者介绍:云和恩墨资深Oracle dba,有12年左右的金融、保险、政府、地税、运营商等业务关键型系统的运维经验,曾担任公司异常恢复东区接口人,负责紧急异常恢复工作,技术二线专家。目前负责PG、openGauss/MogDB运维、国产化MogDB数据库的推广工作
本文由 mdnice 多平台发布