nginx master IP: 192.168.119.120
nginx backup IP: 192.168.119.121
VIP: 192.168.119.130
MASTER
! Configuration File for keepalived
global_defs {
    router_id LVS1
}
vrrp_script chk_nginx {
    script "/etc/keepalived/chk_nginx.sh"
    interval 3
    weight -20
    fall 3
}
vrrp_instance VI_1 {
    state MASTER
    interface ens33
    virtual_router_id 51
    priority 100
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {
        192.168.119.130
    }
    track_script {
        chk_nginx
    }
}
BACKUP
! Configuration File for keepalived
global_defs {
    router_id LVS2
}
vrrp_script chk_nginx {
    script "/etc/keepalived/chk_nginx.sh"
    interval 3
    weight -20
    fall 3
}
vrrp_instance VI_1 {
    state BACKUP
    interface ens33
    virtual_router_id 51
    priority 90
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {
        192.168.119.130
    }
    track_script {
        chk_nginx
    }
}
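The failover arithmetic implied by the two configurations above can be checked with a small shell sketch. The function name effective_priority is hypothetical (it is not part of keepalived); it only models the assumption that a negative weight is added to priority once the tracked script is considered failed.

```shell
#!/bin/bash
# Hypothetical helper: models keepalived's priority adjustment for a
# negative weight. Not part of keepalived itself.
# usage: effective_priority <priority> <weight> <script_failed: 0|1>
effective_priority() {
    local priority=$1 weight=$2 failed=$3
    if [ "$failed" -eq 1 ] && [ "$weight" -lt 0 ]; then
        echo $(( priority + weight ))
    else
        echo "$priority"
    fi
}

master=$(effective_priority 100 -20 1)   # check reported failure on MASTER
backup=$(effective_priority 90 -20 0)    # nginx healthy on BACKUP
if [ "$master" -lt "$backup" ]; then
    echo "BACKUP ($backup) outranks MASTER ($master): VIP moves"
else
    echo "MASTER ($master) keeps the VIP"
fi
```

With the values used here, the MASTER drops from 100 to 80, below the BACKUP's 90. The absolute value of weight has to exceed the priority gap between the two nodes (10 here) for a failed check to trigger a switch.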
chk_nginx.sh
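#!/bin/bash
# Restart nginx if it is not running. If it still does not come up,
# stop keepalived so that the BACKUP node takes over the VIP.
COUNT1=$(ss -anpt | grep -c nginx)
if [ "$COUNT1" -eq 0 ]; then
    # "nginx -s" only accepts stop, quit, reopen, and reload;
    # nginx is started by running the binary without -s.
    /usr/local/sbin/nginx
    sleep 2
    COUNT2=$(ss -anpt | grep -c nginx)
    if [ "$COUNT2" -eq 0 ]; then
        /usr/bin/kill -15 "$(cat /var/run/keepalived.pid)"
        echo "keepalived is stopped"
    else
        exit 0
    fi
fi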
tail -n 50 -f /var/log/messages
Problem encountered: /etc/keepalived/chk_nginx.sh exited due to signal 15
The original chk_nginx.sh script:
#!/bin/bash -x
# Original version: sleeps 1 + 2 = 3 seconds in total, which reaches
# the vrrp_script interval of 3.
COUNT1=$(ss -anpt | grep -c nginx)
if [ "$COUNT1" -eq 0 ]; then
    /usr/local/sbin/nginx -s stop
    sleep 1
    # "nginx -s start" is not a valid signal; run the binary to start nginx.
    /usr/local/sbin/nginx
    sleep 2
    COUNT2=$(ss -anpt | grep -c nginx)
    if [ "$COUNT2" -eq 0 ]; then
        /usr/bin/kill -15 "$(cat /var/run/keepalived.pid)"
        echo "keepalived is stopped"
    else
        exit 0
    fi
fi
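The total sleep time in chk_nginx.sh must be less than the vrrp_script interval in the configuration above (3). The original script slept 1 + 2 = 3 seconds, so I simply deleted the stop step together with its sleep 1.
Also, the sleep values in chk_nginx.sh should not be too large and are best kept below 3 in total; otherwise the "exited due to signal 15" problem appears again, and occasionally a timeout is reported.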
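One way to see whether a check script fits inside the interval is to time it. The sketch below assumes that keepalived terminates a script that runs as long as the interval; fits_interval is a hypothetical helper name. Because chk_nginx.sh can stop keepalived, a stand-in command (sleep 1) is timed here instead of the real script.

```shell
#!/bin/bash
# Hypothetical helper: run a command and report whether its runtime
# stays under the vrrp_script interval (3 s in the configuration above).
# usage: fits_interval <interval_seconds> <command> [args...]
fits_interval() {
    local interval=$1; shift
    local start=$SECONDS            # bash counter, whole seconds
    "$@" >/dev/null 2>&1
    local elapsed=$(( SECONDS - start ))
    echo "elapsed=${elapsed}s interval=${interval}s"
    [ "$elapsed" -lt "$interval" ]  # return status: 0 if it fits
}

# On a test node one could pass /etc/keepalived/chk_nginx.sh instead;
# a stand-in command is used so the sketch runs anywhere.
if fits_interval 3 sleep 1; then
    echo "OK: check finishes within the interval"
else
    echo "WARNING: check may be killed with signal 15"
fi
```

SECONDS only has one-second resolution, so this is a rough check. A script whose sleeps add up to the interval itself, like the original one, fails it.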
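Notes:
vrrp_script
Tells keepalived under what conditions to fail over, so it is especially important. Multiple vrrp_script blocks may be defined.
script : your own check script. It can also be a one-line command such as killall -0 nginx
interval 2 : run the check every 2 s
weight -5 : when the check fails (the script returns non-zero), lower the priority by 5
fall 2 : the check must fail 2 times in a row before it counts as a real failure; the priority is then reduced by weight (priority ranges from 1 to 255)
rise 1 : 1 successful check is enough to count as success; with a negative weight this restores the configured priority but does not raise it above that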
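A reminder: the check script is generally written in one of 2 ways:
1. The script's return value changes the priority; keepalived keeps sending advertisements, and the backup compares priorities before deciding whether to take over.
2. The script detects the fault and shuts down the keepalived process directly; the backup no longer receives advertisements and takes over the IP.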
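The chk_nginx.sh shown earlier follows the second style. A minimal sketch of the first style might look like the following; nginx_listening is a hypothetical helper that takes the socket listing as an argument so it can be tried without a running nginx. In this style the script never touches keepalived and leaves the decision to weight, fall, and rise.

```shell
#!/bin/bash
# Sketch of style 1: report nginx health through the exit status only.
# nginx_listening is a hypothetical helper, not part of keepalived.
nginx_listening() {
    # $1: output of "ss -anpt" (or any text to search)
    printf '%s\n' "$1" | grep -q nginx
}

if nginx_listening "$(ss -anpt 2>/dev/null)"; then
    status=0    # healthy: priority stays as configured
else
    status=1    # unhealthy: after "fall" failures, weight is applied
fi
echo "check would exit with status $status"
# A deployed check script would end with: exit "$status"
```

Since this version has no sleep and no restart attempt, it returns almost immediately and stays well inside the interval.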