一、软件及版本:
CentOS 7.5
PostgreSQL 11
patroni 2.0.1
etcd 3.3.11
二、主机规划
ip | hostname | 占用端口 | vip |
10.109.1.21 | wlan-pg-001 | 5432(pg)、8008(patroni)、etcd(2379传输,2380心跳) | 10.109.1.20 |
10.109.1.22 | wlan-pg-002 | 5432(pg)、8008(patroni)、etcd(2379传输,2380心跳) | |
10.109.1.23 | wlan-pg-003 | 5432(pg)、8008(patroni)、etcd(2379传输,2380心跳) |
pg+patroni+etcd类似mysql+orchestrator+sqllite,都是实现pg\mysql的高可用和自动failover,并通过嵌入在patroni\orchestrator的脚本实现vip的切换
三、pg三节点部署
可参考https://www.yuque.com/sipc/database/dchdcg文档--主从复制搭建部分
SELECT pg_start_backup('for wal backup');
tar cvf data.tar data/
scp data.tar wlan-pg-002:/postgres
scp data.tar wlan-pg-003:/postgres
SELECT pg_stop_backup();
四、三节点安装配置etcd
etcd三节点可以与pg三节点合设,实现etcd的高可用
1、配置/etc/hosts
10.109.1.21 wlan-pg-001 wlan_pg_001
10.109.1.22 wlan-pg-002 wlan_pg_002
10.109.1.23 wlan-pg-003 wlan_pg_003
2、etcd rpm包安装
rpm -ivh etcd-3.3.11-2.el7.centos.x86_64.rpm
mv /etc/etcd/etcd.conf /etc/etcd/etcd.conf.bak
cat /etc/etcd/etcd.conf
ETCD_DATA_DIR="/var/lib/etcd/pg01.etcd"
ETCD_LISTEN_PEER_URLS="http://10.109.1.21:2380"
ETCD_LISTEN_CLIENT_URLS="http://10.109.1.21:2379,http://127.0.0.1:2379"
ETCD_NAME="pg01"
ETCD_INITIAL_ADVERTISE_PEER_URLS="http://10.109.1.21:2380"
ETCD_ADVERTISE_CLIENT_URLS="http://10.109.1.21:2379,http://127.0.0.1:2379"
ETCD_INITIAL_CLUSTER="pg01=http://10.109.1.21:2380,pg02=http://10.109.1.22:2380,pg03=http://10.109.1.23:2380"
ETCD_INITIAL_CLUSTER_TOKEN="etcd-cluster"
ETCD_INITIAL_CLUSTER_STATE="new"
标红部分根据主机ip修改
3、修改etcd的service内容
[Unit]
Description=Etcd Server
After=network.target
After=network-online.target
Wants=network-online.target
[Service]
Type=notify
WorkingDirectory=/var/lib/etcd/
EnvironmentFile=-/etc/etcd/etcd.conf
User=etcd
# set GOMAXPROCS to number of processors
ExecStart=/bin/bash -c "GOMAXPROCS=$(nproc) /usr/bin/etcd \
--name=\"${ETCD_NAME}\" \
--data-dir=\"${ETCD_DATA_DIR}\" \
--listen-peer-urls=\"${ETCD_LISTEN_PEER_URLS}\" \
--listen-client-urls=\"${ETCD_LISTEN_CLIENT_URLS}\" \
--initial-advertise-peer-urls=\"${ETCD_INITIAL_ADVERTISE_PEER_URLS}\" \
--advertise-client-urls=\"${ETCD_ADVERTISE_CLIENT_URLS}\" \
--initial-cluster=\"${ETCD_INITIAL_CLUSTER}\" \
--initial-cluster-token=\"${ETCD_INITIAL_CLUSTER_TOKEN}\" \
--initial-cluster-state=\"${ETCD_INITIAL_CLUSTER_STATE}\""
Restart=on-failure
LimitNOFILE=65536
[Install]
WantedBy=multi-user.target
4、启动etcd服务
三个节点同时启动etcd服务
systemctl start etcd
systemctl enable etcd
检查etcd的健康度和成员列表
etcdctl cluster-health
etcdctl member list
五、patroni安装配置
1、在线安装patroni
安装依赖包
yum -y install gcc python-devel
安装pip
cd /tmp
curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py
python get-pip.py
利用pip安装patroni包
pip install --upgrade setuptools
pip install psycopg2-binary
pip install patroni[etcd]
修改python权限
[root@wlan-pg-003 python2.7]# cd /usr/lib/python2.7
[root@wlan-pg-003 python2.7]# chmod -Rf 755 site-packages
否则报错:
Jan 04 18:59:32 wlan-pg-003 patroni[1839]: File "/usr/bin/patroni", line 5, in <module>
Jan 04 18:59:32 wlan-pg-003 patroni[1839]: from patroni import main
Jan 04 18:59:32 wlan-pg-003 patroni[1839]: ImportError: No module named patroni
2、配置patroni
patroni配置
mkdir -p /usr/patroni/conf
cd /usr
chown -R postgres.postgres patroni(此步必须做,否则报错配置文件不存在)
Jan 3 22:46:26 wlan-pg-001 patroni: Config is empty.
Jan 3 22:46:26 wlan-pg-001 patroni: usage: patroni [-h] [--version] [--validate-config] [configfile]
Jan 3 22:46:26 wlan-pg-001 patroni: positional arguments:
Jan 3 22:46:26 wlan-pg-001 patroni: configfile Patroni may also read the configuration from the
Jan 3 22:46:26 wlan-pg-001 patroni: PATRONI_CONFIGURATION environment variable
Jan 3 22:46:26 wlan-pg-001 patroni: optional arguments:
Jan 3 22:46:26 wlan-pg-001 patroni: -h, --help show this help message and exit
Jan 3 22:46:26 wlan-pg-001 patroni: --version show program's version number and exit
Jan 3 22:46:26 wlan-pg-001 patroni: --validate-config Run config validator and exit
vi /usr/patroni/conf/patroni_postgresql.yml
scope: ire-pgsql
namespace: /pgsql/
name: wlan_pg_001
restapi:
listen: 10.109.1.21:8008
connect_address: 10.109.1.21:8008
etcd:
host: 10.109.1.21:2379
bootstrap:
# this section will be written into Etcd:/<namespace>/<scope>/config after initializing new cluster
# and all other cluster members will use it as a `global configuration`
dcs:
ttl: 30
loop_wait: 10
retry_timeout: 10
maximum_lag_on_failover: 1048576
master_start_timeout: 300
synchronous_mode: false
postgresql:
use_pg_rewind: true
use_slots: true
parameters:
listen_addresses: "0.0.0.0"
port: 5432
wal_level: logical
hot_standby: "on"
wal_keep_segments: 1000
max_wal_senders: 10
max_replication_slots: 10
wal_log_hints: "on"
# archive_mode: "on"
# archive_timeout: 1800s
# archive_command: gzip < %p > /data/backup/pgwalarchive/%f.gz
# recovery_conf:
# restore_command: gunzip < /data/backup/pgwalarchive/%f.gz > %p
postgresql:
listen: 0.0.0.0:5432
connect_address: 10.109.1.21:5432
data_dir: /postgres/data
bin_dir: /usr/pgsql-11/bin
# config_dir: /etc/postgresql/10/main
authentication:
replication:
username: repluser
password: repluser
superuser:
username: postgres
password: postgres
#watchdog:
# mode: automatic # Allowed values: off, automatic, required
# device: /dev/watchdog
# safety_margin: 5
tags:
nofailover: false
noloadbalance: false
clonefrom: false
nosync: false
标红部分替代本机ip
创建patroni的service
cat /etc/systemd/system/patroni.service
[Unit]
Description=patroni - a high-availability PostgreSQL
Documentation=https://patroni.readthedocs.io/en/latest/index.html
After=syslog.target network.target etcd.target
Wants=network-online.target
[Service]
Type=simple
User=postgres
Group=postgres
PermissionsStartOnly=true
ExecStart=/usr/bin/patroni /usr/patroni/conf/patroni_postgresql.yml
ExecReload=/bin/kill -HUP $MAINPID
LimitNOFILE=65536
KillMode=process
KillSignal=SIGINT
Restart=on-abnormal
RestartSec=30s
TimeoutSec=0
StandardOutput=syslog
StandardError=syslog
[Install]
WantedBy=multi-user.target
配置callback绑定vip
添加以下配置到patroni配置文件 /usr/patroni/conf/patroni_postgresql.yml
callbacks:
on_start: /usr/patroni/conf/patroni_callback.sh
on_stop: /usr/patroni/conf/patroni_callback.sh
on_role_change: /usr/patroni/conf/patroni_callback.sh
创建patroni_callback.sh脚本:
#!/bin/bash
readonly cb_name=$1
readonly role=$2
readonly scope=$3
VIP=10.109.1.20
VIPBRD=10.109.1.255
VIPNETMASK=255.255.255.0
VIPNETMASKBIT=24
VIPDEV=eth0
VIPLABEL=1
function usage() {
echo "Usage: $0 <on_start|on_stop|on_role_change> <role> <scope>";
exit 1;
}
function addvip(){
echo "`date +%Y-%m-%d\ %H:%M:%S,%3N` INFO: /sbin/ip addr add ${VIP}/${VIPNETMASKBIT} brd ${VIPBRD} dev ${VIPDEV} label ${VIPDEV}:${VIPLABEL}"
sudo /sbin/ip addr add ${VIP}/${VIPNETMASKBIT} brd ${VIPBRD} dev ${VIPDEV} label ${VIPDEV}:${VIPLABEL}
sudo /usr/sbin/arping -q -A -c 1 -I ${VIPDEV} ${VIP}
#sudo /sbin/iptables -F
}
function delvip(){
echo "`date +%Y-%m-%d\ %H:%M:%S,%3N` INFO: sudo /sbin/ip addr del ${VIP}/${VIPNETMASKBIT} dev ${VIPDEV} label ${VIPDEV}:${VIPLABEL}"
sudo /sbin/ip addr del ${VIP}/${VIPNETMASKBIT} dev ${VIPDEV} label ${VIPDEV}:${VIPLABEL}
sudo /usr/sbin/arping -q -A -c 1 -I ${VIPDEV} ${VIP}
#sudo /sbin/iptables -F
}
echo "`date +%Y-%m-%d\ %H:%M:%S,%3N` WARNING: patroni callback $cb_name $role $scope"
case $cb_name in
on_stop)
delvip
;;
on_start)
;;
on_role_change)
if [[ $role == 'master' ]]; then
addvip
elif [[ $role == 'slave' ]]||[[ $role == 'replica' ]]||[[ $role == 'logical' ]]; then
delvip
fi
;;
*)
usage
;;
esac
加粗的部分根据网卡和vip情况配置
给postgres配置sudo权限
因为postgres用户需要起ip等,需要sudo权限才行
visudo
postgres ALL=(ALL) NOPASSWD: ALL
创建复制槽
patroni的切换需要复制槽来保证故障转移,创建命令如下:
select pg_create_physical_replication_slot('wlan_pg_001');
select pg_create_physical_replication_slot('wlan_pg_002');
select pg_create_physical_replication_slot('wlan_pg_003');
查看复制槽状态:
select * from pg_replication_slots;
select * from pg_stat_replication;
3、三节点依次启动patroni
systemctl start patroni
systemctl enable patroni
systemctl status patroni
4、查看当前集群状态并测试
patronictl -c /usr/patroni/conf/patroni_postgresql.yml list
select client_addr,sync_state from pg_stat_replication;
switchover:
patronictl -c /usr/patroni/conf/patroni_postgresql.yml switchover
可以看出leader节点已经被替换成wlan_pg_002
pg的主节点也切成了wlan_pg_002
此时做一下failover测试
当前的leader节点wlan_pg_002停patroni服务,systemctl stop patroni,会发现postgres进程也被停掉,同时leader和postgre主节点易主,vip也切到新leader上
继续停wlan-pg-001的patroni服务,发现leader变为wlan-pg-003,vip漂移,且postgres数据库仍可访问
wlan-pg-001和wlan-pg-002重新起patroni服务,恢复集群,主节点不切换
systemctl start patroni