既然要做生产环境,高可用、读写分离,咱都得怼上去,不然不够装逼。
一、zabbix—使用pgpool4.5搭建pgsql16高可用集群-CSDN博客
二、zabbix—zabbix7 PRE-RELEASE单机搭建-CSDN博客
献上官方架构图
参考资料Pgpool-II + Watchdog Setup Example
环境准备
pgsql版本16
pgpool2版本4.5
操作系统均为rockylinux9.3
注意:10.10.104.41是我的zabbix server,我zabbix部署的部分配置,提前配置了。
hostname | ip地址 | 配置 | 用途 | |
zabbix-pgsql-0 | 10.10.104.42 | 8C16G500G | pgsql主服务器 | |
zabbix-pgsql-1 | 10.10.104.43 | 8C16G500G | pgsql从服务器 | |
zabbix-pgsql-2 | 10.10.104.44 | 8C16G500G | pgsql从服务器 |
一、基础环境准备(所有主机)
按照表里修改主机名
hostnamectl set-hostname XXXX
所有主机hosts里面加入解析
echo "10.10.104.42 zabbix-pgsql-0" | sudo tee -a /etc/hosts
echo "10.10.104.43 zabbix-pgsql-1" | sudo tee -a /etc/hosts
echo "10.10.104.44 zabbix-pgsql-2" | sudo tee -a /etc/hosts
关闭防火墙以及selinux
systemctl stop firewalld
systemctl disable firewalld
sudo sed -i 's/^SELINUX=.*/SELINUX=disabled/' /etc/selinux/config && sudo reboot
启用 CodeReady Builder 仓库
dnf config-manager --set-enabled crb
修改ssh配置(all servers)
sudo sed -i '/^#PubkeyAuthentication/s/^#//' /etc/ssh/sshd_config && sudo systemctl restart sshd
二、数据库以及pgpool2安装及基础配置(所有主机)
sudo dnf install -y https://download.postgresql.org/pub/repos/yum/reporpms/EL-9-x86_64/pgdg-redhat-repo-latest.noarch.rpm
sudo dnf -qy module disable postgresql
dnf install -y postgresql16-server bash-completion
剔除postgresql源里面的pgpool2
sed -i '/^\[pgdg-common\]$/a exclude=pgpool*' /etc/yum.repos.d/pgdg-redhat-all.repo
sed -i '/^\[pgdg16\]$/a exclude=pgpool*' /etc/yum.repos.d/pgdg-redhat-all.repo
sed -i '/^\[pgdg15\]$/a exclude=pgpool*' /etc/yum.repos.d/pgdg-redhat-all.repo
sed -i '/^\[pgdg14\]$/a exclude=pgpool*' /etc/yum.repos.d/pgdg-redhat-all.repo
sed -i '/^\[pgdg13\]$/a exclude=pgpool*' /etc/yum.repos.d/pgdg-redhat-all.repo
sed -i '/^\[pgdg12\]$/a exclude=pgpool*' /etc/yum.repos.d/pgdg-redhat-all.repo
sed -i '/^\[pgdg11\]$/a exclude=pgpool*' /etc/yum.repos.d/pgdg-redhat-all.repo
使用pgpool2官方源安装pgpool2
dnf install -y https://www.pgpool.net/yum/rpms/4.5/redhat/rhel-9-x86_64/pgpool-II-release-4.5-1.noarch.rpm
dnf install -y pgpool-II-pg16-*
修改postgres账户密码
passwd postgres
切换到postgres账户
su - postgres
创建归档目录
mkdir /var/lib/pgsql/archivedir
退出postgres用户,切换回root
exit
设置服务器间免密登陆(all servers)
切换到postgres账户(all servers)
su - postgres
zabbix-pgsql-0服务器执行 (这个一定要用postgres的系统账户执行)
mkdir ~/.ssh
cd ~/.ssh
ssh-keygen -t rsa -f id_rsa_pgpool
ssh-copy-id -i /var/lib/pgsql/.ssh/id_rsa_pgpool.pub postgres@10.10.104.43
ssh-copy-id -i /var/lib/pgsql/.ssh/id_rsa_pgpool.pub postgres@10.10.104.44
zabbix-pgsql-1服务器执行 (这个一定要用postgres的系统账户执行)
cd ~/.ssh
ssh-keygen -t rsa -f id_rsa_pgpool
ssh-copy-id -i /var/lib/pgsql/.ssh/id_rsa_pgpool.pub postgres@10.10.104.42
ssh-copy-id -i /var/lib/pgsql/.ssh/id_rsa_pgpool.pub postgres@10.10.104.44
zabbix-pgsql-2服务器执行 (这个一定要用postgres的系统账户执行)
cd ~/.ssh
ssh-keygen -t rsa -f id_rsa_pgpool
ssh-copy-id -i /var/lib/pgsql/.ssh/id_rsa_pgpool.pub postgres@10.10.104.43
ssh-copy-id -i /var/lib/pgsql/.ssh/id_rsa_pgpool.pub postgres@10.10.104.42
互相间测试免密登陆命令
ssh -i /var/lib/pgsql/.ssh/id_rsa_pgpool postgres@10.10.104.42
ssh -i /var/lib/pgsql/.ssh/id_rsa_pgpool postgres@10.10.104.43
ssh -i /var/lib/pgsql/.ssh/id_rsa_pgpool postgres@10.10.104.44
三、主数据库配置
使用postgres用户初始化主库
su - postgres
/usr/pgsql-16/bin/initdb -D $PGDATA
exit
配置主库配置文件
echo "listen_addresses = '*'
archive_mode = on
archive_command = 'cp \"%p\" \"/var/lib/pgsql/archivedir/%f\"'
max_wal_senders = 5
max_replication_slots = 5
wal_level = replica
hot_standby = on
wal_log_hints = on
shared_buffers = '4GB'
work_mem = '64MB'
maintenance_work_mem = '512MB'
effective_cache_size = '12GB'
wal_buffers = '16MB'
max_wal_size = '1GB'
min_wal_size = '256MB'
max_connections = '500'
superuser_reserved_connections = '3'
max_parallel_workers_per_gather = '2'
max_parallel_workers = '4'
logging_collector = on
log_directory = 'pg_log'
log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log'
log_min_duration_statement = '500ms'
autovacuum = on" | tee /var/lib/pgsql/16/data/postgresql.conf
配置pg_hba.conf
echo "host replication repl 10.10.104.42/32 trust" >> /var/lib/pgsql/16/data/pg_hba.conf
echo "host replication repl 10.10.104.43/32 trust" >> /var/lib/pgsql/16/data/pg_hba.conf
echo "host replication repl 10.10.104.44/32 trust" >> /var/lib/pgsql/16/data/pg_hba.conf
echo "host all pgpool 10.10.104.42/32 trust" >> /var/lib/pgsql/16/data/pg_hba.conf
echo "host all pgpool 10.10.104.43/32 trust" >> /var/lib/pgsql/16/data/pg_hba.conf
echo "host all pgpool 10.10.104.44/32 trust" >> /var/lib/pgsql/16/data/pg_hba.conf
echo "host all postgres 10.10.104.41/32 trust" >> /var/lib/pgsql/16/data/pg_hba.conf
echo "host all postgres 10.10.104.42/32 trust" >> /var/lib/pgsql/16/data/pg_hba.conf
echo "host all postgres 10.10.104.43/32 trust" >> /var/lib/pgsql/16/data/pg_hba.conf
echo "host all postgres 10.10.104.44/32 trust" >> /var/lib/pgsql/16/data/pg_hba.conf
echo "host all all 10.10.104.45/32 trust" >> /var/lib/pgsql/16/data/pg_hba.conf
echo "host zabbix zabbix 0.0.0.0/0 md5" >> /var/lib/pgsql/16/data/pg_hba.conf
启动数据库
systemctl start postgresql-16.service
systemctl enable postgresql-16.service
配置数据库用户
psql -U postgres -p 5432 #进入数据库
postgres=# alter system set password_encryption='md5'; #设置全局密码加密方式
postgres=# CREATE ROLE pgpool WITH LOGIN; #创建pgpool用户
postgres=# CREATE ROLE repl WITH REPLICATION LOGIN; #创建内建流复制用户
postgres=# \password pgpool #设置pgpool用户密码
postgres=# \password repl #设置repl用户密码
postgres=# \password postgres #设置postgres用户密码
postgres=# GRANT pg_monitor TO pgpool;
postgres=# CREATE USER zabbix WITH PASSWORD 'V#M9b&7ycvCx4KLa'; #创建zabbix账户
postgres=# CREATE DATABASE zabbix OWNER zabbix; #创建zabbix库
四、配置.pgpass,所有机器
为了允许repl用户无需指定密码即可进行流式复制和在线恢复,并 使用postgres执行pg_rewind,我们在postgres用户的主目录中创建.pgpass文件,并将 每个PostgreSQL服务器上的 权限更改为600。该文件允许repl用户和postgres 用户无需提供密码即可进行流式复制和故障转移。
记得替换密码
echo -e "10.10.104.42:5432:replication:repl:<repl user password>\n10.10.104.43:5432:replication:repl:<repl user password>\n10.10.104.42:5432:postgres:postgres:<postgres user password>\n10.10.104.43:5432:postgres:postgres:<postgres user password>" | sudo tee /var/lib/pgsql/.pgpass > /dev/null
chown -R postgres:postgres /var/lib/pgsql/.pgpass
chmod 600 /var/lib/pgsql/.pgpass
五、配置pgpool2 (all servers)
编辑 pgpool.conf(all servers)
cat << EOF > /etc/pgpool-II/pgpool.conf
unix_socket_directories = '/run/pgpool'
pid_file_name = '/run/pgpool/pgpool.pid'
backend_status_file = '/run/pgpool/pgpool_status'
backend_clustering_mode = 'streaming_replication'
listen_addresses = '*'
pcp_listen_addresses = '*'
port = 9999
sr_check_user = 'pgpool'
sr_check_password = ''
health_check_period = 5
health_check_timeout = 30
health_check_user = 'pgpool'
health_check_password = ''
health_check_max_retries = 3
enable_pool_hba = on
pool_passwd = 'pool_passwd'
recovery_user = 'postgres'
recovery_password = ''
recovery_1st_stage_command = 'recovery_1st_stage'
# - Backend Connection Settings -
backend_hostname0 = '10.10.104.42'
backend_application_name0 = '10.10.104.42'
backend_port0 = 5432
backend_weight0 = 1
backend_data_directory0 = '/var/lib/pgsql/16/data'
backend_flag0 = 'ALLOW_TO_FAILOVER'
backend_hostname1 = '10.10.104.43'
backend_application_name1 = '10.10.104.43'
backend_port1 = 5432
backend_weight1 = 1
backend_data_directory1 = '/var/lib/pgsql/16/data'
backend_flag1 = 'ALLOW_TO_FAILOVER'
backend_hostname2 = '10.10.104.44'
backend_application_name2 = '10.10.104.44'
backend_port2 = 5432
backend_weight2 = 1
backend_data_directory2 = '/var/lib/pgsql/16/data'
backend_flag2 = 'ALLOW_TO_FAILOVER'
failover_command = '/etc/pgpool-II/failover.sh %d %h %p %D %m %H %M %P %r %R %N %S'
follow_primary_command = '/etc/pgpool-II/follow_primary.sh %d %h %p %D %m %H %M %P %r %R'
# 看门狗基本设置
use_watchdog = on
# VIP设置
delegate_IP = '10.10.104.45' # 虚拟IP地址
if_cmd_path = '/usr/sbin'
if_up_cmd = 'ip addr add 10.10.104.45 dev ens192'
if_down_cmd = 'ip addr del 10.10.104.45 dev ens192'
arping_path = '/usr/sbin'
arping_cmd = 'arping -U 10.10.104.45 -w 1'
#看门狗节点设置
hostname0 = '10.10.104.42'
wd_port0 = 9000
pgpool_port0 = 9999
hostname1 = '10.10.104.43'
wd_port1 = 9000
pgpool_port1 = 9999
hostname2 = '10.10.104.44'
wd_port2 = 9000
pgpool_port2 = 9999
# 心跳设置
wd_lifecheck_method = 'heartbeat'
wd_interval = 10
wd_heartbeat_keepalive = 2
wd_heartbeat_deadtime = 30
heartbeat_hostname0 = '10.10.104.42'
heartbeat_port0 = 9694
heartbeat_device0 = ''
heartbeat_hostname1 = '10.10.104.43'
heartbeat_port1 = 9694
heartbeat_device1 = ''
heartbeat_hostname2 = '10.10.104.44'
heartbeat_port2 = 9694
heartbeat_device2 = ''
wd_escalation_command = '/etc/pgpool-II/escalation.sh'
#log设置
log_destination = 'stderr'
logging_collector = on
log_directory = '/var/log/pgpool_log'
log_filename = 'pgpool-%a.log'
log_truncate_on_rotation = on
log_rotation_age = 1d
log_rotation_size = 10MB
# 优化的连接池大小配置
num_init_children = 200 # 修改:初始化子进程数为200,以支持更高的并发连接数
max_pool = 10 # 修改:每个子进程的连接池大小为10
child_life_time = 300 # 修改:子进程生命周期为300秒
connection_life_time = 0 # 修改:连接生命周期为0(无限制)
client_idle_limit = 0 # 修改:客户端空闲时间限制为0(无限制)
child_max_connections = 0 # 修改:每个子进程最大连接数为0(无限制)
EOF
编辑 pcp.conf 文件(all servers)
:
执行命令生成pcp用户的md5密码(all servers)我的pcp用户名也是用的pgpool
pg_md5 -p
提示你输入你的密码,输入之后回车,如下图(all servers)
接着按照格式 用户名:md5的密文密码 写入到下面的文件(all servers)
sudo vim /etc/pgpool-II/pcp.conf
配置.pcppass
su - postgres
echo 'localhost:9898:pgpool:pgpool_password' > ~/.pcppass
chmod 600 ~/.pcppass
配置pgpool_node_id(zabbix-pgsql-0)
cat << EOF > /etc/pgpool-II/pgpool_node_id
0
EOF
配置pgpool_node_id(zabbix-pgsql-1)
cat << EOF > /etc/pgpool-II/pgpool_node_id
1
EOF
配置pgpool_node_id(zabbix-pgsql-2)
cat << EOF > /etc/pgpool-II/pgpool_node_id
2
EOF
配置pool_hba.conf文件(all servers)
cat << EOF > /etc/pgpool-II/pool_hba.conf
host all all 10.10.104.41/32 trust
host all all 10.10.104.42/32 trust
host all all 10.10.104.43/32 trust
host all all 10.10.104.44/32 trust
host all all 10.10.104.45/32 trust
host zabbix zabbix 10.0.0.0/8 md5
EOF
配置pool_passwd文件(all servers)
执行命令生成repl、postgresql、pgpool这三个数据库用户的md5密码(all servers)
pg_md5 -p -m -u postgres pool_passwd
pg_md5 -p -m -u repl pool_passwd
pg_md5 -p -m -u pgpool pool_passwd
pg_md5 -p -m -u zabbix pool_passwd
配置故障转移(all servers)
cp -p /etc/pgpool-II/sample_scripts/failover.sh.sample /etc/pgpool-II/failover.sh
cp -p /etc/pgpool-II/sample_scripts/follow_primary.sh.sample /etc/pgpool-II/follow_primary.sh
chown postgres:postgres /etc/pgpool-II/{failover.sh,follow_primary.sh}
配置故障恢复(all servers)
cp -p /etc/pgpool-II/sample_scripts/recovery_1st_stage.sample /var/lib/pgsql/16/data/recovery_1st_stage
cp -p /etc/pgpool-II/sample_scripts/pgpool_remote_start.sample /var/lib/pgsql/16/data/pgpool_remote_start
chown postgres:postgres /var/lib/pgsql/16/data/{recovery_1st_stage,pgpool_remote_start}
主数据库中创建 pgpool_recovery
扩展
su - postgres
psql template1 -c "CREATE EXTENSION pgpool_recovery"
psql template1 -c "CREATE EXTENSION pgpool_recovery"
配置看门狗脚本(all servers)
cp -p /etc/pgpool-II/sample_scripts/escalation.sh.sample /etc/pgpool-II/escalation.sh
chown postgres:postgres /etc/pgpool-II/escalation.sh
vi /etc/pgpool-II/escalation.sh
内容里需要修改的就是下面这3行,根据实际的写(all servers)
PGPOOLS=(server1 server2 server3)
VIP=192.168.100.50
DEVICE=enp0s8
可以参考我的。 (all servers)
提权
sudo chmod u+s /usr/sbin/ip
sudo chmod u+s /usr/sbin/arping
配置pgpool_status文件
touch /run/pgpool/pgpool_status
chown -R postgres:postgres /run/pgpool/pgpool_status
六、2台从数据库配置(zabbix-pgsql-1、zabbix-pgsql-2执行)
确保从库的PostgreSQL服务已停止
sudo systemctl stop postgresql-16
清空从库的现有数据目录
sudo rm -rf /var/lib/pgsql/16/data/*
使用pg_basebackup从主库创建基础备份
sudo -u postgres pg_basebackup -h 10.10.104.42 -D /var/lib/pgsql/16/data -U repl -P -R
七、启动pgpool(3台机器)
systemctl start pgpool
systemctl enable pgpool
systemctl enable postgresql-16.service
八、配置备库节点
任意机器执行
psql -h 10.10.104.45 -p 9999 -U pgpool postgres -c "show pool_nodes"
输入数据库pgpool2账户的密码
得到如下图信息
设置备库为流复制模式,因为我们node_id是0、1、2,目前主节点是0,只需要给1、2节点使用如下命令即可
pcp_recovery_node -h 10.10.104.45 -p 9898 -U pgpool -n 1 -W
pcp_recovery_node -h 10.10.104.45 -p 9898 -U pgpool -n 2 -W
配置完成查看pool nodes如下
查看看门狗状态,这里用的pgpool是pcp的账户,前面有pcp.conf配置
pcp_watchdog_info -h 10.10.104.45 -p 9898 -U pgpool -W
如下图
九、关于故障转移后恢复
主节点故障后,主库会转移到从库。主库恢复后,需要通过下面的命令反注册。
pcp_recovery_node -h 10.10.104.45 -p 9898 -U pgpool -n your-node-id -W