Automatic Greenplum failover (high availability) with keepalived
Greenplum (GP) high availability is already set up on the machines:
192.168.60.221 master node
192.168.60.222 standby master node
192.168.60.223 segment node
The number of segment nodes can vary as needed.
keepalived is already set up on the machines (installation guide):
https://blog.csdn.net/weixin_44385419/article/details/111543115
192.168.60.221 keepalived MASTER node
192.168.60.222 keepalived BACKUP node
192.168.60.100 virtual IP (VIP)
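keepalived runs scripts to carry out the failover operations.
Scripts needed:
1. A script that monitors the Greenplum processes (not written here)
2. A script that stops the keepalived process (not written here)
3. A script that activates the Greenplum standby master
4. A script that adds the Greenplum standby master back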
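Scripts 1 and 2 are left unwritten here. Below is a minimal sketch of both, under stated assumptions: it reads the standard PostgreSQL postmaster.pid file in the gpseg-1 master data directory and stops keepalived when that postmaster is gone, so the peer's keepalived takes over the VIP. The function names are hypothetical. The check should only run on the node that currently holds the VIP, because the idle standby has no running master and would otherwise stop its own keepalived.

```shell
#!/bin/bash
# Sketch of scripts 1 and 2 (not part of the original setup); function names are hypothetical.

# Succeeds if the postmaster whose PID is on the first line of <datadir>/postmaster.pid is alive.
# kill -0 only tests that the process exists; as a non-owner, non-root user it can fail with EPERM.
gp_master_alive() {
  local pidfile="$1/postmaster.pid" pid
  [ -r "$pidfile" ] || return 1
  pid=$(head -n 1 "$pidfile")
  [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null
}

# Stops keepalived on this node when the local master is not running,
# which lets the other node's keepalived take over the VIP. Needs root for systemctl.
stop_keepalived_if_master_down() {
  if ! gp_master_alive "$1"; then
    systemctl stop keepalived
  fi
}
```

Run as root (for example from cron) on the node holding the VIP: `stop_keepalived_if_master_down /piflow/soft/greenplum/data/master/gpseg-1`. Reading postmaster.pid avoids parsing `ps` output, whose postgres command line varies between versions.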
Common commands
gp:
Start: gpstart
Stop: gpstop -f
Stop only the master: pg_ctl stop -D /home/soft/greenplum/gpdata/master/gpseg-1/ -m fast
Activate the standby: gpactivatestandby -d /home/soft/greenplum/gpdata/master/gpseg-1
Add a standby: gpinitstandby -s standbyMac (standbyMac is the standby's hostname)
Remove the standby: gpinitstandby -r (run on the master)
Check status: gpstate -s
Check the standby: gpstate -f
keepalived:
Start: systemctl start keepalived
Stop: systemctl stop keepalived
Check: ps -ef|grep [k]eepalived
ip a (shows whether this node holds the VIP)
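The `ip a` check above is read by eye. To script it, a hypothetical helper can look for the VIP (192.168.60.100 in this setup) in `ip -4 addr` output:

```shell
#!/bin/bash
# Hypothetical helper: succeeds if the given VIP appears in `ip -4 addr` output read from stdin.
# Matching "inet <vip>/" as a fixed string avoids regex surprises from the dots in the address.
holds_vip() {
  grep -qF "inet $1/"
}
```

Usage: `ip -4 addr show | holds_vip 192.168.60.100 && echo "this node holds the VIP"`.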
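Tips:
1. The GP version used here is 6.0.0; other versions may need small changes to the scripts.
2. keepalived should be installed on the GP master and standby nodes.
3. Modify the keepalived configuration file on both machines, and put the scripts on both machines.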
Add the script hooks to the keepalived configuration (inside the vrrp_instance block)
vi /etc/keepalived/keepalived.conf
notify_master /etc/keepalived/bin/changeMaster.sh
notify_backup /etc/keepalived/bin/changeUser.sh
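changeUser.sh (called by notify_backup)
#!/bin/sh
# Run gpMonitor.sh as gpadmin
username=`whoami`
if [ "$username" = "gpadmin" ]; then
  # /etc/keepalived/bin/monitor.sh
  /etc/keepalived/bin/gpMonitor.sh
else
  # Not gpadmin (e.g. started by keepalived as root): switch to gpadmin with a login shell
  # /bin/su - gpadmin -c /etc/keepalived/bin/monitor.sh
  /bin/su - gpadmin -c /etc/keepalived/bin/gpMonitor.sh
fi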
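changeMaster.sh (called by notify_master)
#!/bin/sh
# Run activateGPstandby.sh as gpadmin
username=`whoami`
if [ "$username" = "gpadmin" ]; then
  # /etc/keepalived/bin/monitor.sh
  /etc/keepalived/bin/activateGPstandby.sh
else
  # Not gpadmin (e.g. started by keepalived as root): switch to gpadmin with a login shell
  # /bin/su - gpadmin -c /etc/keepalived/bin/monitor.sh
  /bin/su - gpadmin -c /etc/keepalived/bin/activateGPstandby.sh
fi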
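changeUser.sh and changeMaster.sh differ only in the script they launch. As a sketch, the two could be one parameterized wrapper; the function `run_as` and the script name runAsGpadmin.sh below are hypothetical:

```shell
#!/bin/bash
# Hypothetical merged wrapper: run a script as the given user,
# calling su only when the current user is someone else.
run_as() {
  local user="$1" target="$2"
  if [ "$(whoami)" = "$user" ]; then
    "$target"
  else
    /bin/su - "$user" -c "$target"
  fi
}

# As a standalone runAsGpadmin.sh, the file would end with:
#   run_as gpadmin "$1"
```

keepalived accepts a quoted script path with arguments, so the hooks could then read `notify_master "/etc/keepalived/bin/runAsGpadmin.sh /etc/keepalived/bin/activateGPstandby.sh"`. That leaves a single place to change if the user switch ever needs adjusting.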
activateGPstandby.sh
#!/bin/bash
# Runs as gpadmin when keepalived enters MASTER: promote the local standby unless a master already runs here
source /piflow/soft/greenplum/greenplum/greenplum-db/greenplum_path.sh
gpsegPath=/piflow/soft/greenplum/data/master/gpseg-1
activityLog=/piflow/soft/greenplum/statuslog/activityLog.log
nowTime=`date +%Y-%m-%d`
dateTime=`date +"%Y-%m-%d %H:%M:%S"`
log_file=/piflow/soft/greenplum/statuslog/status-$nowTime
export MASTER_DATA_DIRECTORY=$gpsegPath
export PGPORT=5432
gpstate -s > "$activityLog" 2>&1
# A running master prints the "Master Configuration & Status" section
isMasterOKLog=$(grep -ci "Master Configuration & Status" "$activityLog")
if [ "$isMasterOKLog" -ne 0 ]; then
  # This node is already the working master: nothing to do
  echo "$dateTime----------------The current node is the master and is running----------------" >> "$log_file"
  echo "$dateTime:>>>The current node is the master and is running"
  exit 0
else
  # No running master here: promote this node's standby
  echo "$dateTime Activating MASTER on $(hostname)" >> /piflow/soft/greenplum/status.log
  gpactivatestandby -a -d "$gpsegPath"
fi
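activateGPstandby.sh and gpMonitor.sh write every status message twice: framed with dashes into $log_file, and prefixed with `:>>>` on stdout. A small helper (hypothetical name `log_status`) keeps the two formats in sync; it assumes the calling script has already set `log_file`:

```shell
#!/bin/bash
# Hypothetical helper: write one status message in both formats used by the scripts.
# Assumes log_file is set by the caller, e.g. /piflow/soft/greenplum/statuslog/status-<date>.
log_status() {
  local msg="$1" now
  now=$(date +"%Y-%m-%d %H:%M:%S")
  echo "$now----------------$msg----------------" >> "$log_file"
  echo "$now:>>>$msg"
}
```

For example, the two echo lines in the master branch become `log_status "The current node is the master and is running"`.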
gpMonitor.sh
#!/bin/bash
# gpMonitor.sh: runs as gpadmin when keepalived enters BACKUP
username=`whoami`
nowTimeStamp=`date +%Y-%m-%d-%H%M%S`
nowTime=`date +%Y-%m-%d`
dateTime=`date +"%Y-%m-%d %H:%M:%S"`
changeLog=/piflow/soft/greenplum/statuslog/changeLog.log
# gpstate -f output written on the master host (standby check)
masterCheckStandbyLog=/piflow/soft/greenplum/statuslog/masterCheckStandbyLog.log
# Log for activating the master
changeLog2=/piflow/soft/greenplum/statuslog/masterchangeLog.log
log_file=/piflow/soft/greenplum/statuslog/status-$nowTime
## The other master host of the cluster (set per machine)
masterHost=cdh08
## Hostname of this machine (set per machine)
standbyHost=cdh09
greenplum_path=/piflow/soft/greenplum/greenplum/greenplum-db/greenplum_path.sh
masterGpseg=/piflow/soft/greenplum/data/master/gpseg-1
if [ "$username" = "gpadmin" ]; then
  # First query the cluster status
  gpstate -s > "$changeLog" 2>&1
  # A running master prints the "Master Configuration & Status" section
  isMasterOKLog=$(grep -ci "Master Configuration & Status" "$changeLog")
  if [ "$isMasterOKLog" -ne 0 ]; then
    # This node is the master and it is running
    echo "$dateTime----------------The current node is the master and is running----------------" >> "$log_file"
    echo "$dateTime:>>>The current node is the master and is running"
    exit 0
  else
    # 0 means the master is not running, or this is not the master node: try to start the cluster
    gpstart -a > "$changeLog" 2>&1
    # A normal master start prints "Database successfully started"
    isOKMaster=$(grep -ci "Database successfully started" "$changeLog")
    if [ "$isOKMaster" -ne 0 ]; then
      # The master started normally
      echo "$dateTime----------------The current node is the master: Database successfully started----------------" >> "$log_file"
      echo "$dateTime:>>>The current node is the master: Database successfully started"
    else
      # Startup failed: this node is not the master, so treat it as the standby
      # pg_ctl start -D "$masterGpseg" > "$changeLog" 2>&1
      # isOKStandby=$(grep -ci 'FATAL' "$changeLog")
      # With the check above disabled, isOKStandby=1 always selects the re-initialization branch
      isOKStandby=1
      if [ "$isOKStandby" -ne 1 ]; then
        # The standby is healthy
        echo "$dateTime----------------The current node is a standby machine successfully started----------------" >> "$log_file"
        echo "$dateTime:>>>The current node is a standby machine successfully started"
      else
        # The standby is broken: move the old data directory aside so gpinitstandby can recreate it
        mv "$masterGpseg" "$masterGpseg$nowTimeStamp"
        echo "$dateTime---------------The current node initialization----------------" >> "$log_file"
        echo "$dateTime:>>>The current node initialization"
        # Every ssh call is a new non-interactive shell, so load the Greenplum environment in each remote command
        remoteEnv="source $greenplum_path; source ~/.bash_profile;"
        ssh "$masterHost" "hostname"
        ssh "$masterHost" "$remoteEnv gpstate -f > $masterCheckStandbyLog 2>&1"
        # Is a standby on this host registered with the master?
        checkStandbyIsExist=$(ssh "$masterHost" "grep -i 'Standby address = $standbyHost' $masterCheckStandbyLog | wc -l")
        # Is the standby alive?
        checkStandbyIsOk=$(ssh "$masterHost" "grep -i 'Standby host passive' $masterCheckStandbyLog | wc -l")
        if [ "$checkStandbyIsExist" -ne 0 ] && [ "$checkStandbyIsOk" -eq 0 ]; then
          # Registered but not working: remove it and add it again
          ssh "$masterHost" "$remoteEnv gpinitstandby -a -r"
          # Kill leftover postgres processes on this host
          ps -ef | grep '[p]ostgres' | awk '{print $2}' | xargs -r kill -9
          ssh "$masterHost" "$remoteEnv gpinitstandby -a -s $standbyHost > $changeLog 2>&1"
        elif [ "$checkStandbyIsOk" -ne 0 ]; then
          # Registered and working
          echo "$dateTime---------------standby node is ok----------------" >> "$log_file"
          echo "$dateTime:>>>standby node is ok"
        else
          # Not registered: add it
          echo "$dateTime---------------add standby----------------" >> "$log_file"
          echo "$dateTime:>>>add standby"
          ssh "$masterHost" "$remoteEnv gpinitstandby -a -s $standbyHost > $changeLog 2>&1"
        fi
        addStandbyIsOk=$(ssh "$masterHost" "grep -i 'Successfully created standby master on $standbyHost' $changeLog | wc -l")
        if [ "$addStandbyIsOk" -ne 0 ]; then
          echo "$dateTime---------------Successfully created standby master on $standbyHost----------------" >> "$log_file"
          echo "$dateTime:>>>Successfully created standby master on $standbyHost"
        else
          echo "$dateTime---------------Failed to create standby master on $standbyHost----------------" >> "$log_file"
          echo "$dateTime:>>>Failed to create standby master on $standbyHost"
        fi
      fi
    fi
  fi
else
  echo "warning: The current user is not supported"
fi
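The standby check in gpMonitor.sh comes down to three cases: registered and passive, registered but not passive, or not registered. Moving that decision into a function (hypothetical name `standby_state`) makes it testable against a saved `gpstate -f` output, using the same two strings the script greps for:

```shell
#!/bin/bash
# Hypothetical helper: classify the standby from a file holding `gpstate -f` output.
# $1: output file, $2: expected standby hostname. Prints ok, broken or missing.
standby_state() {
  local file="$1" host="$2"
  if grep -qi "Standby host passive" "$file"; then
    echo ok          # registered and working: nothing to do
  elif grep -qi "Standby address = $host" "$file"; then
    echo broken      # registered but not passive: remove and re-add
  else
    echo missing     # not registered: add it
  fi
}
```

Checking "passive" first gives the same three outcomes as the script's if/elif chain, with the unreachable duplicate branch gone.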
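Change the directory paths, hostnames and other settings in the scripts to match your machines.
Make the scripts executable: chmod 755 /etc/keepalived/bin/*
Now you can test the failover: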
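1. Stop the GP cluster: gpstop -f
2. Start keepalived on the GP master node: systemctl start keepalived
3. Start keepalived on the GP standby node: systemctl start keepalived
4. On the GP master node, ip a shows that the master holds the VIP
5. On the GP master node, gpstate -s and gpstate -f report a healthy cluster
6. Navicat can connect to postgres through both the GP master IP and the VIP
7. Reboot the GP master node (reboot)
8. The GP standby node takes over the VIP and Greenplum
9. Navicat can connect to postgres through both the GP standby IP and the VIP
10. Start keepalived on the GP master node again: systemctl start keepalived
11. On the GP standby node, gpstate -s and gpstate -f report a healthy cluster
12. Reboot the GP standby node
13. The GP master node takes over the VIP and Greenplum
14. Navicat can connect to postgres through both the GP master IP and the VIP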
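Steps 6, 9 and 14 check connectivity with Navicat. For repeated runs, a scripted TCP-level check against the VIP can stand in for it. This is a sketch using bash's /dev/tcp and coreutils `timeout`; it only proves that the port accepts connections, not that a login succeeds. The name `wait_for_port` is hypothetical:

```shell
#!/bin/bash
# Hypothetical helper: wait until host:port accepts TCP connections.
# $1: host, $2: port, $3: number of attempts (default 30), one second apart.
wait_for_port() {
  local host="$1" port="$2" tries="${3:-30}" i
  for ((i = 0; i < tries; i++)); do
    # bash opens /dev/tcp/<host>/<port> as a TCP connection; timeout caps a hanging connect
    if timeout 2 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null; then
      return 0
    fi
    sleep 1
  done
  return 1
}
```

After rebooting a node in step 7 or 12: `wait_for_port 192.168.60.100 5432 60 && echo "VIP is accepting connections"`.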
Schematic diagram