本文介绍集群学习分享:CentOS7使用pcs+nginx+nfs快速搭建web 在web架构基础上再加入pcs+postgresql数据库高可用:
配置之前最好先做快照
虚拟机节点基本信息:
主机名 | ip |
pcsnas | 192.168.146.149 |
pcsnode01 | 192.168.146.150 |
pcsnode02 | 192.168.146.151 |
vip-master | 192.168.146.251 |
vip-slave | 192.168.146.252 |
一.查询pcs集群基础信息:
[root@pcsnode01 ~]# pcs status
Cluster name: cluster1
Stack: corosync
Current DC: pcsnode03 (version 1.1.23-1.el7_9.1-9acf116022) - partition with quorum
Last updated: Sun Mar 28 23:15:06 2021
Last change: Sun Mar 28 23:00:44 2021 by root via crm_attribute on pcsnode03
3 nodes configured
3 resource instances configured
Online: [ pcsnode01 pcsnode02 pcsnode03 ]
Full list of resources:
Resource Group: web_cluster
vip (ocf::heartbeat:IPaddr2): Started pcsnode03
web_file (ocf::heartbeat:Filesystem): Started pcsnode03
web (ocf::heartbeat:nginx): Started pcsnode03
二.安装配置postgresql
1.创建配置目录
所有节点执行:
#创建目录
mkdir -p /data/postgresql/{data,xlog_archive}
#授权
chown -R postgres:postgres /data/postgresql/
chmod 0700 /data/postgresql/data
2.初始化db
在pcsnode01执行:
[root@pcsnode01 ~]# su - postgres
-bash-4.2$ initdb -D /data/postgresql/data/
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.
The database cluster will be initialized with locale "en_US.UTF-8".
The default database encoding has accordingly been set to "UTF8".
The default text search configuration will be set to "english".
fixing permissions on existing directory /data/postgresql/data ... ok
creating subdirectories ... ok
selecting default max_connections ... 100
selecting default shared_buffers ... 32MB
creating configuration files ... ok
creating template1 database in /data/postgresql/data/base/1 ... ok
initializing pg_authid ... ok
initializing dependencies ... ok
creating system views ... ok
loading system objects' descriptions ... ok
creating collations ... ok
creating conversions ... ok
creating dictionaries ... ok
setting privileges on built-in objects ... ok
creating information schema ... ok
loading PL/pgSQL server-side language ... ok
vacuuming database template1 ... ok
copying template1 to template0 ... ok
copying template1 to postgres ... ok
WARNING: enabling "trust" authentication for local connections
You can change this by editing pg_hba.conf or using the option -A, or
--auth-local and --auth-host, the next time you run initdb.
Success. You can now start the database server using:
postgres -D /data/postgresql/data
or
pg_ctl -D /data/postgresql/data -l logfile start
3.修改配置文件:
-bash-4.2$ vi /data/postgresql/data/postgresql.conf
listen_addresses = '*'
wal_level = hot_standby
synchronous_commit = on
archive_mode = on
archive_command = 'cp %p /data/postgresql/xlog_archive/%f'
max_wal_senders=5
wal_keep_segments = 32
hot_standby = on
restart_after_crash = off
replication_timeout = 5000
wal_receiver_status_interval = 2
max_standby_streaming_delay = -1
max_standby_archive_delay = -1
synchronous_commit = on
restart_after_crash = off
hot_standby_feedback = on
include '/var/lib/pgsql/tmp/rep_mode.conf' # added by pgsql RA
-bash-4.2$ vi /data/postgresql/data/pg_hba.conf
# "local" is for Unix domain socket connections only
local all all trust
# IPv4 local connections:
host all all 192.168.0.0/16 md5
# IPv6 local connections:
host replication all 192.168.0.0/16 md5
# Allow replication connections from localhost, by a user with the
# replication privilege.
#local replication postgres trust
#host replication postgres 127.0.0.1/32 trust
#host replication postgres ::1/128 trust
修改recovery.conf配置
vi /data/postgresql/data/recovery.conf
standby_mode = 'on'
primary_conninfo = 'host=192.168.146.251 port=5432 user=replicator application_name=pcsnode01 password=8d5e9531-3817-460d-a851-659d2e51ca99 keepalives_idle=60 keepalives_interval=5 keepalives_count=5'
restore_command = 'cp /data/postgresql/xlog_archive/%f %p'
recovery_target_timeline = 'latest'
4.启动数据库:
#启动
pg_ctl -D /data/postgresql/data/ start
#进入postgresql
psql -U postgres
psql (9.2.24)
输入 "help" 来获取帮助信息.
postgres=# postgres=# create role replicator with login replication password '8d5e9531-3817-460d-a851-659d2e51ca99';
创建密码:
postgres=# alter user postgres with password ‘postgres’;
5. 配置slave:
在pcsnode02、pcsnode03上执行:
#node02-03节点执行:
su - postgres
node02修改recovery.conf:
vi /data/postgresql/data/recovery.conf
standby_mode = 'on'
primary_conninfo = 'host=192.168.146.251 port=5432 user=replicator application_name=pcsnode02 password=8d5e9531-3817-460d-a851-659d2e51ca99 keepalives_idle=60 keepalives_interval=5 keepalives_count=5'
restore_command = 'cp /data/postgresql/xlog_archive/%f %p'
recovery_target_timeline = 'latest'
node03修改recovery.conf
vi /data/postgresql/data/recovery.conf
standby_mode = 'on'
primary_conninfo = 'host=192.168.146.251 port=5432 user=replicator application_name=pcsnode03 password=8d5e9531-3817-460d-a851-659d2e51ca99 keepalives_idle=60 keepalives_interval=5 keepalives_count=5'
restore_command = 'cp /data/postgresql/xlog_archive/%f %p'
recovery_target_timeline = 'latest'
#同步数据:
pg_basebackup -h 192.168.146.149 -U postgres -D /data/postgresql/data/ -X stream -P
6.启动slave:
pg_ctl -D /data/postgresql/data/ start
7.查看集群:
在pcsnode01查询:
-bash-4.2$ psql -U postgres
psql (9.2.24)
输入 "help" 来获取帮助信息.
postgres=# select * from pg_stat_replication ;
pid | usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start | state | sent_location | write_location | flush_location | replay_location | sync_priority | sync_st
ate
-------+----------+------------+------------------+-----------------+-----------------+-------------+-------------------------------+-----------+---------------+----------------+----------------+-----------------+---------------+--------
----
13383 | 16384 | replicator | pcsnode02 | 192.168.146.150 | | 35884 | 2021-03-28 23:00:34.793995+08 | streaming | 0/120000E0 | 0/120000E0 | 0/120000E0 | 0/120000E0 | 1 | sync
13401 | 16384 | replicator | pcsnode03 | 192.168.146.151 | | 42186 | 2021-03-28 23:00:34.994035+08 | streaming | 0/120000E0 | 0/120000E0 | 0/120000E0 | 0/120000E0 | 0 | async
(2 行记录)
postgres-# \q
停止PostgreSQL服务因为集群会自动拉起
在node1、node2和node3上执行:
$ pg_ctl -D /data/postgresql/data/ -mi stop
waiting for server to shut down.... done
server stopped
8.配置pcs的资源(注意根据实际情况配置):
任意节点:
vi cluster_setup.sh
# 将cib配置保存到文件
pcs cluster cib pgsql_cfg
# 在pacemaker级别忽略quorum
pcs -f pgsql_cfg property set no-quorum-policy="ignore"
# 禁用STONITH
pcs -f pgsql_cfg property set stonith-enabled="false"
# 设置资源粘性,防止节点在故障恢复后发生迁移
pcs -f pgsql_cfg resource defaults resource-stickiness="INFINITY"
# 设置多少次失败后迁移
pcs -f pgsql_cfg resource defaults migration-threshold="3"
# 设置master节点虚ip
pcs -f pgsql_cfg resource create vip-master IPaddr2 ip="192.168.146.251" cidr_netmask="24" op start timeout="60s" interval="0s" on-fail="restart" op monitor timeout="60s" interval="10s" on-fail="restart" op stop timeout="60s" interval="0s" on-fail="block"
# 设置slave节点虚ip
pcs -f pgsql_cfg resource create vip-slave IPaddr2 ip="192.168.146.252" cidr_netmask="24" op start timeout="60s" interval="0s" on-fail="restart" op monitor timeout="60s" interval="10s" on-fail="restart" op stop timeout="60s" interval="0s" on-fail="block"
# 设置pgsql集群资源
# pgctl、psql、pgdata和config等配置根据自己的环境修改
pcs -f pgsql_cfg resource create pgsql pgsql pgctl="/usr/bin/pg_ctl" psql="/usr/bin/psql" pgdata="/data/postgresql/data/" config="/data/postgresql/data/postgresql.conf" rep_mode="sync" node_list="pcsnode01 pcsnode02 pcsnode03" master_ip="192.168.146.251" repuser="replicator" primary_conninfo_opt="password=8d5e9531-3817-460d-a851-659d2e51ca99 keepalives_idle=60 keepalives_interval=5 keepalives_count=5" restore_command="cp /data/postgresql/xlog_archive/%f %p" restart_on_promote='true' op start timeout="60s" interval="0s" on-fail="restart" op monitor timeout="60s" interval="4s" on-fail="restart" op monitor timeout="60s" interval="3s" on-fail="restart" role="Master" op promote timeout="60s" interval="0s" on-fail="restart" op demote timeout="60s" interval="0s" on-fail="stop" op stop timeout="60s" interval="0s" on-fail="block"
# 设置master/slave模式
pcs -f pgsql_cfg resource master clusterpsql1 pgsql master-max=1 master-node-max=1 clone-max=3 clone-node-max=1 notify=true
# 配置master ip组
pcs -f pgsql_cfg resource group add master-group vip-master
# 配置slave ip组
pcs -f pgsql_cfg resource group add slave-group vip-slave
# 配置master ip组绑定master节点
pcs -f pgsql_cfg constraint colocation add master-group with master clusterpsql1 INFINITY
# 配置启动master节点
pcs -f pgsql_cfg constraint order promote clusterpsql1 then start master-group symmetrical=false score=INFINITY
# 配置停止master节点
pcs -f pgsql_cfg constraint order demote clusterpsql1 then stop master-group symmetrical=false score=0
# 配置slave ip组绑定slave节点
pcs -f pgsql_cfg constraint colocation add slave-group with slave cluster1 INFINITY
# 配置启动slave节点
pcs -f pgsql_cfg constraint order promote clusterpsql1 then start slave-group symmetrical=false score=INFINITY
# 配置停止slave节点
pcs -f pgsql_cfg constraint order demote clusterpsql1 then stop slave-group symmetrical=false score=0
# 把配置文件push到cib
pcs cluster cib-push pgsql_cfg
注意的点(本人在这里卡两天):
# pgctl、psql、pgdata和config等配置根据自己的环境修改
pcs -f pgsql_cfg resource create pgsql pgsql pgctl="/usr/bin/pg_ctl" psql="/usr/bin/psql" pgdata="/data/postgresql/data/" config="/data/postgresql/data/postgresql.conf" rep_mode="sync" node_list="pcsnode01 pcsnode02 pcsnode03" master_ip="192.168.146.251" repuser="replicator" primary_conninfo_opt="password=8d5e9531-3817-460d-a851-659d2e51ca99 keepalives_idle=60 keepalives_interval=5 keepalives_count=5" restore_command="cp /data/postgresql/xlog_archive/%f %p" restart_on_promote='true' op start timeout="60s" interval="0s" on-fail="restart" op monitor timeout="60s" interval="4s" on-fail="restart" op monitor timeout="60s" interval="3s" on-fail="restart" role="Master" op promote timeout="60s" interval="0s" on-fail="restart" op demote timeout="60s" interval="0s" on-fail="stop" op stop timeout="60s" interval="0s" on-fail="block"
9.执行脚本:
chmod +x cluster_setup.sh
sh cluster_setup.sh
10.查看集群:
[root@pcsnode01 ~]# pcs status
Cluster name: cluster1
Stack: corosync
Current DC: pcsnode03 (version 1.1.23-1.el7_9.1-9acf116022) - partition with quorum
Last updated: Mon Mar 29 00:24:27 2021
Last change: Sun Mar 28 23:00:44 2021 by root via crm_attribute on pcsnode03
3 nodes configured
8 resource instances configured
Online: [ pcsnode01 pcsnode02 pcsnode03 ]
Full list of resources:
Resource Group: web_cluster
vip (ocf::heartbeat:IPaddr2): Started pcsnode03
web_file (ocf::heartbeat:Filesystem): Started pcsnode03
web (ocf::heartbeat:nginx): Started pcsnode03
Master/Slave Set: clusterpsql1 [pgsql]
Masters: [ pcsnode03 ]
Slaves: [ pcsnode01 pcsnode02 ]
Resource Group: master-group
vip-master (ocf::heartbeat:IPaddr2): Started pcsnode03
Resource Group: slave-group
vip-slave (ocf::heartbeat:IPaddr2): Started pcsnode01
Daemon Status:
corosync: active/enabled
pacemaker: active/enabled
pcsd: active/enabled
11.常用查询集群信息:
集群资产列表:
[root@pcsnode01 ~]# pcs property list
Cluster Properties:
cluster-infrastructure: corosync
cluster-name: cluster1
dc-version: 1.1.23-1.el7_9.1-9acf116022
have-watchdog: false
maintenance-mode: false
no-quorum-policy: ignore
stonith-enabled: false
Node Attributes:
pcsnode01: pgsql-data-status=STREAMING|SYNC
pcsnode02: pgsql-data-status=STREAMING|ASYNC
pcsnode03: pgsql-data-status=LATEST
集群资源展示:
[root@pcsnode01 ~]# pcs resource show
Resource Group: web_cluster
vip (ocf::heartbeat:IPaddr2): Started pcsnode03
web_file (ocf::heartbeat:Filesystem): Started pcsnode03
web (ocf::heartbeat:nginx): Started pcsnode03
Master/Slave Set: clusterpsql1 [pgsql]
Masters: [ pcsnode03 ]
Slaves: [ pcsnode01 pcsnode02 ]
Resource Group: master-group
vip-master (ocf::heartbeat:IPaddr2): Started pcsnode03
Resource Group: slave-group
vip-slave (ocf::heartbeat:IPaddr2): Started pcsnode01
三、验证:
1.查看集群信息是否正常
如果不正常看配置文件是否有误
回退命令:
集群删除slave-group
pcs resource delete slave-group
集群删除master-group
pcs resource delete master-group
集群删除clusterpsql1
pcs resource delete cluster1
2.关闭主节点看是否转移
注意:集群状态报错/var/lib/pgsql/tmp/PGSQL.lock导致某个节点停止,删除这个文件即可。
rm -f /var/lib/pgsql/tmp/PGSQL.lock
补充postgresql recovery.conf文件内容说明
standby_mode = 'on' #说明自己是备库
primary_conninfo = 'user=postgres host=10.9.5.22 port=5433 sslmode=prefer sslcompression=1 krbsrvname=postgres '#同步连接信息
recovery_target_timeline='latest'#同步到什么时候
trigger_file='/var/lib/pgsql/9.5/data/.tfile'#当改文件存在时,备机切换为主机,否则备机只能读不能写。在做HA时,主机掉线后,需要将备机自动切换为主机,只需要创建改文件,备机间隔5s之内检测到后,自动切换为主机。 业务繁忙时检测较快,业务少则最迟5s检测到,并进行激活。除了使用trigger_file来激活备库,还可以使用其他工具:pg_ctl promotepg_standby