1.环境配置
ip | 虚拟ip | 版本 | |
master | 192.168.20.71 | 192.168.20.99 | postgresql v14.2 keepalived-1.3.5-19.el7.x86_64 |
slave | 192.168.20.75 | postgresql v14.2 keepalived-1.3.5-19.el7.x86_64 |
2.安装步骤
安装postgresql
yum install -y postgresql14-server
以下命令在使用解压缩方式安装时执行,使用yum安装时不需要执行。
命令补充,如果使用解压缩方式安装,则需要创建postgres用户 并且安装postgresql过程中需要手动安装如下依赖工具 |
启动方法命令:
su - postgres
pg_ctl start
pg_ctl restart
pg_ctl stop
安装完以后,安装路径如下:
/var/lib/pgsql/14
进入postgres用户
su - postgres
通过vi ~/.bash_profile查看安装路径和PDATA的配置
[ -f /etc/profile ] && source /etc/profile
PATH=/usr/pgsql-14/bin:$PATH
export PATH
PGDATA=/var/lib/pgsql/14/data
export PGDATA
# If you want to customize your settings,
# Use the file below. This is not overridden
# by the RPMS.
[ -f /var/lib/pgsql/.pgsql_profile ] && source /var/lib/pgsql/.pgsql_profile
可以看到postgresql安装路径在/usr/pgsql-14/bin下,PGDATA在路径/var/lib/pgsql/14/data下。
创建数据库Keepalived,并且创建表探测表sr_delay,后续Keepalived探测,刷新sr_delay表的last_alive字段为当前探测时间。这张表用来判断主备延迟情况,数据库故障切换时会用到这张表。
数据库配置:
使用psql命令进入postgres数据库后台
-bash-4.2$ psql
psql (14.2)
Type "help" for help.
创建测试表test
postgres=# create table test(id int);
创建keepalived用户和数据库,用于监测主备机器数据库启动情况
postgres=# create user keepalived password 'sw12345' CONNECTION LIMIT 4 ;
CREATE ROLE
postgres=# create database keepalived owner keepalived;
CREATE DATABASE
postgres=# \c keepalived keepalived
You are now connected to database "keepalived" as user "keepalived".
keepalived=> create table sr_delay(id int4, last_alive timestamp(0) without time zone);
CREATE TABLE
表sr_delay只允许写入一条记录,并且不允许删除此表数据,通过触发器实现。创建触发器函数,如下所示:
CREATE FUNCTION cannt_delete ()
RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
RAISE EXCEPTION 'Table sr_delay can not delete !';
END;
$$;
创建触发器:
CREATE TRIGGER trigger_sr_delay_del
BEFORE DELETE ON sr_delay
FOR EACH ROW EXECUTE PROCEDURE cannt_delete() ;
CREATE TRIGGER trigger_sr_delay_tru BEFORE TRUNCATE ON sr_delay FOR STATEMENT EXECUTE PROCEDURE cannt_delete() ;
插入数据:
INSERT INTO sr_delay VALUES(1,now()) ;
退出pgsql后台,退出postgres用户
\q
exit
创建归档文件夹,并赋权
mkdir /var/lib/pgsql/archivedir
chown postgres /var/lib/pgsql/archivedir
进入postgres用户,打开postgres配置文件,如下参数需要修改:
su - postgres
vi /var/lib/pgsql/14/data/postgresql.conf
修改参数如下:
listen_address = '*'
wal_level = replica
hot_standby = on
wal_log_hints = on
max_wal_senders= 10
wal_keep_segments=1024
archive_mode=on
archive_command='test ! -f /var/lib/pgsql/archivedir/%f && cp %p /var/lib/pgsql/archivedir/%f'
参数简要说明:
listen_address: 按需设置,本次测试配置为所有主机均可以访问,生产环境可以按需配置网段或IP等
wal_level: 设置流复制模式至少设置为replica
archive_mode: 本次启用归档
archive_command:WAL日志归档命令,生产环境可以将归档拷贝到对应目录或其他机器上,本次测试配置为归档到本机的另一个目录下
max_wal_senders:最大WAL发送进程数,此数量需大于等于从库个数且比max_connections小。
wal_keep_segments:pg_wal目录下保留WAL日志的个数,每个WAL文件默认16M,为保障从库能在应用归档落后时依旧能追上主库,此值建议设置较大一点。
hot_standby:此参数控制在恢复归档期间是否支持只读操作,设置为ON后从库为只读模式。
后续Keepalived会每隔指定时间探测PostgreSQL数据库存活, 并且以Keepalived用户登录Keepalived数据库刷新这张表, 配置主备库pg_hba.conf,增加如下内容:
vi /var/lib/pgsql/14/data/pg_hba.conf
host keepalived keepalived 192.168.20.71/32 md5
host keepalived keepalived 192.168.20.75/32 md5
为了使服务器之间可以通信,还需要添加如下配置
host replication all 192.168.20.99/32 trust
host replication all 192.168.20.75/24 md5
host replication all 192.168.20.71/24 md5
host replication all 192.168.20.1/24 md5
host all all 0.0.0.0/0 md5
随后对配置文件进行重新加载使配置生效。
pg_ctl reload
创建复制账号
为了数据安全及便于权限控制,创建一个复制专用的数据库账号
postgres=# create user repl REPLICATION LOGIN ENCRYPTED PASSWORD 'sw12345';
CREATE ROLE
修改配置文件pg_hba.conf
添加复制账号的权限,因可能会主从切换,因此 主从机器的IP均添加。也可以设置网段,以便于后期添加从库。
# replication privilege.
host replication repl 192.168.20.71/24 md5
host replication repl 192.168.20.75/24 md5
备份数据,此步骤只需要在从机上操作,用于复制主机数据
从机上在线备份主库数据,并将数据放在指定路径,此路径建议与主库路径一致。可以使用
pg_basebackup
命令在线热备份,具体命令如下:
pg_basebackup -h 192.168.20.71 -U repl -p 5432 -F p -X s -v -P -R -D /var/lib/pgsql/14/data/ -l postgres32
pg_basebackup命令中的参数说明:
-h 指定连接的数据库的主机名或IP地址,这里就是主库的ip
-U 指定连接的用户名,此处是我们刚才创建的专门负责流复制的repl用户
-F 指定生成备份的数据格式,支持p(plain原样输出)或者t(tar格式输出)
-X 表示备份开始后,启动另一个流复制连接从主库接收WAL日志,有 f(fetch)和s (stream)两种方式,建议使用s方式
-P 表示显示数据文件、表空间传输的近似百分比 允许在备份过程中实时的打印备份的进度
-v 表示启用verbose模式,命令执行过程中会打印各阶段日志,建议启用
-R 表示会在备份结束后自动生成recovery.conf文件,这样也就避免了手动创建
-D 指定把备份写到哪个目录,这里尤其要注意一点就是做基础备份之前从库的数据目录(/data/postgresql/data)目录需要手动清空
-l 表示指定个备份的标识,运行命令后可以看到进度提示
以上备份命令输出过程如下
[postgres@PG33 data]$ pg_basebackup -h 192.168.20.71 -U repl -p 5432 -F p -X s -v -P -R -D /var/lib/pgsql/14/data/ -l postgres32
Password:
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/2000028 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: created temporary replication slot "pg_basebackup_17737"
56041/56041 kB (100%), 1/1 tablespace
pg_basebackup: write-ahead log end point: 0/20000F8
pg_basebackup: waiting for background process to finish streaming ...
pg_basebackup: base backup completed
从以上日志信息看出pg_basebackup命令首先对数据库做一次checkpoint,之后基于时间点做一个全库基准备份,全备过程中会拷贝$PGDATA数据文件和表空间文件到备库节点对应目录。
进入数据库,查看副本同步状态
select pid ,usesysid,usename,client_addr,state,sync_state from pg_stat_replication;
查看是否是副本,如果结果返回f说明是主库,返回t说明是从库。
select pg_is_in_recovery();
安装keepalived
关闭防火墙和selinux
systemctl stop firewalld
vi /etc/selinux/config
SELINUX=disabled
getenforce
在71和75机器上分别进行如下操作,两台机器都要进行:
yum install gcc openssl openssldevel libnl libnldevel ipvsadm -y
yum install -y keepalived
编辑keepalived.conf配置文件
vi /etc/keepalived/keepalived.conf
内容如下:
说明:Keepalived主节点的priority参数配成100,备节点的priority参数改成90 ,主节点的state改为MASTER,备节点改为BACKUP,以及smtp_server 分别改为192.168.20.71和192.168.20.75,其余参数配置一样。
!Configuration File for keepalived
global_defs {
notification_email {
343547566@qq.com
}
notification_email_from Alexandre.Cassen@firewall.loc
smtp_server 192.168.20.71
smtp_connect_timeout 30
router_id LVS_DEVEL
vrrp_skip_check_adv_addr
vrrp_strict
vrrp_garp_interval 0
vrrp_gna_interval 0
}
vrrp_script check_pg_alived {
script "/etc/keepalived/scripts/pg_monitor.sh"
interval 10
fall 3
}
vrrp_instance VI_1 {
state MASTER
interface eth0
virtual_router_id 51
priority 100
advert_int 1
authentication {
auth_type PASS
auth_pass 1111
}
track_script {
check_pg_alived
}
virtual_ipaddress {
192.168.20.99
}
notify_master "/etc/keepalived/scripts/failover.sh"
notify_fault "/etc/keepalived/scripts/fault.sh"
}
创建scripts路径
mkdir /etc/keepalived/scripts
创建pg_monitor.sh,failover.sh,fault.sh,并赋权限
touch /etc/keepalived/scripts/pg_monitor.sh
touch /etc/keepalived/scripts/failover.sh
touch /etc/keepalived/scripts/fault.sh
chmod a+x /etc/keepalived/scripts/pg_monitor.sh
chmod a+x /etc/keepalived/scripts/failover.sh
chmod a+x /etc/keepalived/scripts/fault.sh
分别编写pg_monitor.sh,failover.sh和fault.sh脚本
vi /etc/keepalived/scripts/pg_monitor.sh
#!/bin/bash
pgport=5432
pguser=keepalived
pgdb=keepalived
pgpwd='sw12345'
LANG=en_US.utf8
PGHOME=/usr/pgsql-14/
LD_LIBRARY_PATH=$PGHOME/lib:/lib64:/usr/lib64:/usr/local/lib64:/lib:/usr/lib:/usr/local/lib
PATH=$PGHOME/bin:$PATH:.
MONITOR_LOG="/etc/keepalived/scripts/pg_monitor.log"
SQL1="UPDATE sr_delay SET last_alive= now();"
SQL2='SELECT 1;'
keeplognums=30000
#pg_port_status=`lsof i :$pgport | grep LISTEN | wc l`
#pg_port_status=`ps ef | grep LISTEN | wc l`
#if [ $pg_port_status lt 1 ];then
# echo e `date +"%F %T"` "Error: The postgreSQL is not running,please chec k the postgreSQL server status!" >> $LOGFILE
# exit 1
#fi
#此脚本不检查备库存活状态,如果是备库则退出
standby_flg=`psql -p $pgport -U postgres -At -c "SELECT pg_is_in_recovery();"`
if [ ${standby_flg} == 't' ];
then
echo e "`date +%F\ %T`: This is a standby database, exit!\n" > $MONITOR_LOG
exit 0
fi
export PGPASSWORD=$pgpwd
#主库更新sr_delay 表
echo $SQL1 | psql -At -p $pgport -U $pguser -d $pgdb >> $MONITOR_LOG
#判断主库是否可用
echo $SQL2 | psql -At -p $pgport -U $pguser -d $pgdb
if [ $? -eq 0 ] ;
then
echo -e "`date +%F\ %T`:Primary db is health." >> $MONITOR_LOG
exit 0
else
echo -e "`date +%F\ %T`:Attention: Primary db is not health!" >> $MONITOR_LOG
exit 1
fi
#日志保留 keeplognums 行
if [ ! -f ${MONITOR_LOG} ];
then touch ${MONITOR_LOG};
fi
lognums=`cat ${MONITOR_LOG} |wc -l`
catnum=$((${lognums} -${keeplognums}))
if [[ $lognums -gt ${keeplognums} ]] ; then sed -i "1,${catnum}d" ${MONITOR_LOG};
fi
编写failover.sh
vi /etc/keepalived/scripts/failover.sh
内容如下:
#!/bin/bash
export PGPORT=5432
export PGUSER=keepalived
export PG_OS_USER=postgres
export PGDBNAME=keepalived
export.UTF-8
export PGPATH=/usr/pgsql-14/bin/
export PATH=$PATH:$PGPATH
export PGMIP=127.0.0.1
LOGFILE='/etc/keepalived/scripts/failover.log'
# 主备数据库同步时延,单位为秒
sr_allowed_delay_time=100
SQL1='select pg_is_in_recovery from pg_is_in_recovery();'
SQL2="select count(1) as delay_time from sr_delay where now()<(last_alive + interval '${sr_allowed_delay_time} seconds');"
#SQL2="select count(1) as delay_time from sr_delay where now()<(last_alive + interval '100 seconds');"
sleep $sr_allowed_delay_time
db_role=`echo $SQL1 | psql -At -p $PGPORT -U $PGUSER -d $PGDBNAME -w`
db_sr_delaytime=`echo $SQL2 | psql -p $PGPORT -d $PGDBNAME -U $PGUSER -At -w`
SWITCH_COMMAND='pg_ctl promote -D /var/lib/pgsql/14/data/'
# 如果为备库,且延迟大于指定时间则切换为主库
if [ ${db_role} == 'f' ];
then
echo -e `date +"%F %T"` "Attention:The current postgreSQL DB is master database,cannot switched!" >> $LOGFILE
exit 0
fi
if [ $db_sr_delaytime -gt 0 ];
then
echo -e `date +"%F %T"` "Attention:The current master database is health,the standby DB cannot switched!" >> $LOGFILE
exit 0
fi
if [ !$db_sr_delaytime ];
then
echo -e `date +"%F %T"` "Attention:The current database is statndby,ready to switch master database!" >> $LOGFILE
su - $PG_OS_USER -c "$SWITCH_COMMAND"
elif [ $? -eq 0 ];
then
echo -e `date +"%F %T"` "success:The current standby database successed to switched the primary PG database !" >> $LOGFILE
exit 0
else
echo -e `date +"%F %T"` "Error: the standby database failed to switch the primary PG database !,pelease checked it!" >> $LOGFILE
exit 1
fi
编写fault.sh
vi /etc/keepalived/scripts/fault.sh
内容如下:
#!/bin/bash
GFILE=/etc/keepalived/scripts/pg_db_fault.log
PGPORT=5432
PGMIP=192.168.20.71
echo -e `date +"%F %T"` "Error:Because of the priamry DB happend some unknown problem,So turn off the PostgreSQL Database!" >> $LOGFILE
PGPID="`head -n1 /var/lib/pgsql/14/data/postmaster.pid`"
systemctl stop keepalived
kill -9 $PGPID
if [ $? -eq 0 ] ;
then
echo -e `date +"%F %T"` "Error:Because of the priamry DB happend some unknown problem,So turn off the PostgreSQL Database!" >> $LOGFILE
systemctl stop keepalived
exit 1
fi
Keepalived启动、停止和状态查看
systemctl start keepalived
systemctl stop keepalived
systemctl status keepalived
3.测试
在主机上停掉数据库
[root@localhost ~]# su - postgres
[postgres@localhost ~]$ pg_ctl -D /var/lib/pgsql/14/data/ stop
到之前的从机上看看能否插入数据,如果显示insert 0 1 说明插入成功,说明主机切换成功。
[postgres@localhost ~]$ psql
psql (14.2)
Type "help" for help.
postgres=# insert into test(name) values (2);
INSERT 0 1
然后去老主机使用pg_rewind将现在主机的内容进行同步,具体命令如下:
pg_rewind -R --target-pgdata '/var/lib/pgsql/14/data/' --source-server 'host=192.168.20.75 port=5432 user=repl password=sw12345 dbname=postgres'
最后重新启动从库,并检查主备wal sender和receiver是否正常:
[postgres@localhost ~]$ pg_ctl -D /var/lib/pgsql/14/data/ start
[postgres@localhost ~]$ ps axu |grep walreceiver
由于postgresql12版本后的调整,需要在从库新增standby.signal文件
touch /var/lib/pgsql/14/data/standby.signal
再通过psql登录到数据库,查看主从情况:
select pg_is_in_recovery();
至此,整个PostgreSQL+Keepalived集群配置配置完成。