keepalived master/backup state and monitoring split-brain with zabbix
Environment:
Server role | IP address | OS version |
---|---|---|
zabbix | 192.168.134.162 | centos 8 |
haproxy1(master) | 192.168.134.148 | centos 8 |
haproxy2(slave) | 192.168.134.155 | centos 8 |
web1 | 192.168.134.151 | centos 8 |
web2 | 192.168.134.154 | centos 8 |
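Note: the high-availability virtual IP (VIP) for this setup is 192.168.134.250
Configure the script keepalived uses to monitor master/backup state
keepalived monitors the state of the load balancer through a script
Write the script on the master host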
[root@haproxy1 ~]# mkdir /scripts && cd /scripts
[root@haproxy1 scripts]# vim check_haproxy.sh
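#!/bin/bash
# Count the running haproxy processes (wc -l counts the matching lines); grep itself and this script are filtered out
haproxy_status=$(ps -ef|grep -Ev "grep|$0"|grep '\bhaproxy\b'|wc -l)
# Fewer than 1 process means haproxy is down: stop keepalived so this host releases the VIP
if [ "$haproxy_status" -lt 1 ];then
systemctl stop keepalived
fi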
[root@haproxy1 scripts]# chmod +x check_haproxy.sh
[root@haproxy1 scripts]# ll
total 4
-rwxr-xr-x 1 root root 150 Oct 15 22:31 check_haproxy.sh
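The keepalived configuration later in this article attaches the check as a `vrrp_script` with `fall 3` and `weight -40`, and those settings only come into play when the script reports failure through its exit status. As a minimal sketch of that alternative (the `is_running` helper and the idea of passing the process name as an argument are assumptions for illustration, not part of the original setup), the check could return non-zero instead of stopping keepalived:

```shell
#!/bin/bash
# Hypothetical helper (not from the original article): returns 0 when at least
# one process whose name is exactly "$1" is running, non-zero otherwise.
is_running() {
    pgrep -x "$1" >/dev/null 2>&1
}

# When called with a process name, e.g. "check_proc.sh haproxy", exit with the
# helper's status so the caller can react to a non-zero result.
if [ -n "${1:-}" ]; then
    is_running "$1"
    exit $?
fi
```

keepalived treats a non-zero exit status from a `vrrp_script` as a failed check, so with this variant it would lower the priority by 40 after three consecutive failures rather than shutting keepalived down.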
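Write the script on the slave host
//This script lets the host react to the state it is in (master|backup). When this host becomes master, the first branch runs: if the number of haproxy processes is less than 1, it starts haproxy so load balancing continues. When this host returns to backup, the second branch runs: if the number of haproxy processes is greater than 0, it stops haproxy so it does not conflict with haproxy on the master host, which would keep traffic from reaching the backend web hosts correctly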
[root@haproxy2 ~]# mkdir /scripts && cd /scripts
[root@haproxy2 scripts]# vim notify.sh
[root@haproxy2 scripts]# cat notify.sh
#!/bin/bash
case "$1" in
master)
haproxy_status=$(ps -ef|grep -Ev "grep|$0"|grep '\bhaproxy\b'|wc -l)
if [ $haproxy_status -lt 1 ];then
systemctl start haproxy
fi
;;
backup)
haproxy_status=$(ps -ef|grep -Ev "grep|$0"|grep '\bhaproxy\b'|wc -l)
if [ $haproxy_status -gt 0 ];then
systemctl stop haproxy
fi
;;
*)
echo "Usage:$0 master|backup VIP"
;;
esac
[root@haproxy2 scripts]# chmod +x notify.sh
[root@haproxy2 scripts]# ls
notify.sh
[root@haproxy2 scripts]# ll
total 4
-rwxr-xr-x 1 root root 444 Oct 15 22:34 notify.sh
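The two branches of notify.sh boil down to a small decision table. The sketch below isolates that decision so it can be checked without touching any service; `desired_action` and its "yes"/"no" flag are hypothetical names introduced here, not part of the original script.

```shell
#!/bin/bash
# Hypothetical helper: given the new VRRP role and whether haproxy is currently
# running ("yes"/"no"), print the action the notify script should take.
desired_action() {
    local role=$1 running=$2
    case "$role:$running" in
        master:no)            echo start ;;  # became master, haproxy is down
        backup:yes)           echo stop ;;   # back to backup, haproxy still up
        master:yes|backup:no) echo none ;;   # already in the wanted state
        *)                    echo usage ;;  # unknown role or flag
    esac
}

desired_action master no    # prints: start
desired_action backup yes   # prints: stop
desired_action backup no    # prints: none
```

In a real notify script the "yes"/"no" flag could come from something like `systemctl is-active --quiet haproxy`, which avoids parsing `ps` output.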
Add the monitoring script to the keepalived configuration on the master
[root@haproxy1 ~]# vim /etc/keepalived/keepalived.conf
[root@haproxy1 ~]# cat /etc/keepalived/keepalived.conf
! Configuration File for keepalived
global_defs {
router_id haproxy1
}
vrrp_script haproxy_check {
script "/scripts/check_haproxy.sh"
interval 1
fall 3
weight -40
}
vrrp_instance VI_1 {
state MASTER
interface ens160
virtual_router_id 80
priority 100
advert_int 1
authentication {
auth_type PASS
auth_pass 12345678
}
virtual_ipaddress {
192.168.134.250
}
track_script {
haproxy_check
}
}
virtual_server 192.168.134.250 80 {
delay_loop 6
lb_algo rr
lb_kind NAT
persistence_timeout 50
protocol TCP
real_server 192.168.134.151 80 {
weight 1
TCP_CHECK {
connect_port 80
connect_timeout 3
nb_get_retry 3
delay_before_retry 3
}
}
real_server 192.168.134.154 80 {
weight 1
TCP_CHECK {
connect_port 80
connect_timeout 3
nb_get_retry 3
delay_before_retry 3
}
}
}
[root@haproxy1 ~]# systemctl restart keepalived.service
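To see why `weight -40` is enough for a takeover, it helps to work through the priorities. The sketch below models only the negative-weight case, and it applies only if the check script signals failure with a non-zero exit status; the script in this article stops keepalived instead, so the weight is never exercised there. `effective_priority` is a hypothetical helper written for this illustration.

```shell
#!/bin/bash
# Hypothetical helper: priority a node advertises given its configured
# priority, the vrrp_script weight, and whether the check passes ("ok"/"fail").
# Models only the negative-weight case used in this article.
effective_priority() {
    local base=$1 weight=$2 check=$3
    if [ "$check" = "fail" ] && [ "$weight" -lt 0 ]; then
        echo $(( base + weight ))
    else
        echo "$base"
    fi
}

master=$(effective_priority 100 -40 fail)   # 100 - 40 = 60
backup=80                                   # priority configured on haproxy2
if [ "$master" -lt "$backup" ]; then
    echo "backup takes over (master=$master, backup=$backup)"
fi
```

With the values from this setup the master would drop to 60, below the backup's 80, so the backup would win the election.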
Configure keepalived on the backup
[root@haproxy2 ~]# vim /etc/keepalived/keepalived.conf
[root@haproxy2 ~]# cat /etc/keepalived/keepalived.conf
! Configuration File for keepalived
global_defs {
router_id haproxy2
}
vrrp_instance VI_1 {
state BACKUP
interface ens160
virtual_router_id 80
priority 80
advert_int 1
authentication {
auth_type PASS
auth_pass 12345678
}
virtual_ipaddress {
192.168.134.250
}
notify_master "/scripts/notify.sh master"
notify_backup "/scripts/notify.sh backup"
}
virtual_server 192.168.134.250 80 {
delay_loop 6
lb_algo rr
lb_kind NAT
persistence_timeout 50
protocol TCP
real_server 192.168.134.151 80 {
weight 1
TCP_CHECK {
connect_port 80
connect_timeout 3
nb_get_retry 3
delay_before_retry 3
}
}
real_server 192.168.134.154 80 {
weight 1
TCP_CHECK {
connect_port 80
connect_timeout 3
nb_get_retry 3
delay_before_retry 3
}
}
}
[root@haproxy2 ~]# systemctl restart keepalived.service
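How long the backup waits before declaring the master dead follows from `advert_int` and the backup's own priority. The sketch below uses the formula from RFC 3768 (VRRPv2): Master_Down_Interval = 3 × Advertisement_Interval + Skew_Time, with Skew_Time = (256 − Priority) / 256. It is an estimate under that assumption, not a measurement of this setup; `master_down_interval` is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: seconds a backup waits without advertisements before it
# takes over, per the RFC 3768 formula.
#   $1 = advert_int in seconds, $2 = priority of the backup node
master_down_interval() {
    awk -v adv="$1" -v prio="$2" \
        'BEGIN { printf "%.4f\n", 3 * adv + (256 - prio) / 256 }'
}

master_down_interval 1 80   # advert_int 1, backup priority 80 -> 3.6875
```

With `advert_int 1` and priority 80, the backup would wait roughly 3.7 seconds after the last advertisement it received.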
Test whether keepalived monitors the haproxy load balancer
Check the service state before testing
Master host
//keepalived and haproxy are running normally; check the VIP
[root@haproxy1 ~]# systemctl is-active haproxy.service
active
[root@haproxy1 ~]# systemctl is-active keepalived.service
active
[root@haproxy1 ~]# ip a show ens160 | grep 192.168.134.250
inet 192.168.134.250/32 scope global ens160
[root@haproxy1 ~]# ip a show ens160 | grep 192.168.134.250 | wc -l
1
[root@haproxy1 ~]# ss -antl
State Recv-Q Send-Q Local Address:Port Peer Address:Port Process
LISTEN 0 128 0.0.0.0:80 0.0.0.0:*
LISTEN 0 128 0.0.0.0:22 0.0.0.0:*
LISTEN 0 128 0.0.0.0:8189 0.0.0.0:*
LISTEN 0 128 [::]:22 [::]:*
Slave host
//haproxy is stopped; keepalived stays running
[root@haproxy2 ~]# systemctl is-active haproxy.service
inactive
[root@haproxy2 ~]# systemctl is-active keepalived.service
active
[root@haproxy2 ~]# ip a show ens160 | grep 192.168.134.250
[root@haproxy2 ~]# ip a show ens160 | grep 192.168.134.250 | wc -l
0
[root@haproxy2 ~]# ss -antl
State Recv-Q Send-Q Local Address:Port Peer Address:Port Process
LISTEN 0 128 0.0.0.0:22 0.0.0.0:*
LISTEN 0 128 [::]:22 [::]:*
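//The VIP is on the master host, so the slave host has no VIP
Simulate the haproxy service on the master host (haproxy1) shutting down under overload
//After haproxy is stopped, the script tracked in the keepalived configuration detects that the haproxy process is gone and runs the command that stops keepalived. This releases the host's resources, and the VIP moves to the slave host (haproxy2), which becomes the new master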
[root@haproxy1 ~]# systemctl stop haproxy.service
[root@haproxy1 ~]# systemctl is-active haproxy.service
inactive
[root@haproxy1 ~]# systemctl is-active keepalived.service
inactive
[root@haproxy1 ~]# ip a show ens160 | grep 192.168.134.250
[root@haproxy1 ~]# ip a show ens160 | grep 192.168.134.250 | wc -l
0
[root@haproxy1 ~]# ss -antl
State Recv-Q Send-Q Local Address:Port Peer Address:Port Process
LISTEN 0 128 0.0.0.0:22 0.0.0.0:*
LISTEN 0 128 [::]:22 [::]:*
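//Now check haproxy and the VIP on the slave host (haproxy2). Once the VIP has moved to this host and it has become the new master, keepalived runs the notify script with the master argument, which starts haproxy so load balancing continues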
[root@haproxy2 ~]# systemctl is-active haproxy.service
active
[root@haproxy2 ~]# ss -antl
State Recv-Q Send-Q Local Address:Port Peer Address:Port Process
LISTEN 0 128 0.0.0.0:80 0.0.0.0:*
LISTEN 0 128 0.0.0.0:22 0.0.0.0:*
LISTEN 0 128 0.0.0.0:8189 0.0.0.0:*
LISTEN 0 128 [::]:22 [::]:*
[root@haproxy2 ~]# ip a show ens160 | grep 192.168.134.250
inet 192.168.134.250/32 scope global ens160
[root@haproxy2 ~]# ip a show ens160 | grep 192.168.134.250 | wc -l
1
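//After the operations staff have repaired the original master host (haproxy1) and haproxy is running again, start keepalived again. haproxy1 takes the VIP back (its priority of 100 is higher than the backup's 80) and becomes master again, while the slave host loses the master role
#Master host (haproxy1)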
[root@haproxy1 ~]# systemctl start haproxy.service keepalived.service
[root@haproxy1 ~]# systemctl is-active haproxy.service
active
[root@haproxy1 ~]# systemctl is-active keepalived.service
active
[root@haproxy1 ~]# ss -antl
State Recv-Q Send-Q Local Address:Port Peer Address:Port
LISTEN 0 128 0.0.0.0:80 0.0.0.0:*
LISTEN 0 128 0.0.0.0:22 0.0.0.0:*
LISTEN 0 128 0.0.0.0:8189 0.0.0.0:*
LISTEN 0 128 [::]:22 [::]:*
[root@haproxy1 ~]# ip a show ens160 | grep 192.168.134.250
inet 192.168.134.250/32 scope global ens160
[root@haproxy1 ~]# ip a show ens160 | grep 192.168.134.250 | wc -l
1
#Slave host (haproxy2)
[root@haproxy2 ~]# systemctl is-active haproxy.service
inactive
[root@haproxy2 ~]# systemctl is-active keepalived.service
active
[root@haproxy2 ~]# ip a show ens160 | grep 192.168.134.250
[root@haproxy2 ~]# ip a show ens160 | grep 192.168.134.250 | wc -l
0
[root@haproxy2 ~]# ss -antl
State Recv-Q Send-Q Local Address:Port Peer Address:Port
LISTEN 0 128 0.0.0.0:22 0.0.0.0:*
LISTEN 0 128 [::]:22 [::]:*
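Monitoring keepalived
keepalived should be monitored on the backup server, using a zabbix custom item.
What is monitored is whether the VIP (192.168.134.250) is present on the backup.
The VIP can appear on the backup in two cases:
Split-brain has occurred
A normal master/backup switchover
The monitor only indicates that split-brain may have occurred; it cannot confirm it, because a normal master/backup switchover also moves the VIP to the backup.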
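Telling the two cases apart needs a second observation: whether the master still holds the VIP at the same time. The sketch below only shows the classification logic; how the master's state reaches the script (for example a second Zabbix item on haproxy1) is left open, and `classify_vip_state` is a hypothetical helper, not part of the original setup.

```shell
#!/bin/bash
# Hypothetical helper: classify the cluster from two observations, each "1" if
# that node currently holds the VIP and "0" if it does not.
classify_vip_state() {
    local backup_has_vip=$1 master_has_vip=$2
    case "$backup_has_vip$master_has_vip" in
        01) echo normal ;;        # only the master holds the VIP
        10) echo failover ;;      # only the backup holds it: a regular switchover
        11) echo split-brain ;;   # both nodes hold the VIP at once
        00) echo no-vip ;;        # neither node holds it
        *)  echo unknown ;;       # unexpected input
    esac
}

classify_vip_state 1 0   # prints: failover
classify_vip_state 1 1   # prints: split-brain
```

With only the backup-side item described in this article, the `failover` and `split-brain` rows both show up as a value of 1.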
[root@haproxy2 ~]# cd /scripts/
[root@haproxy2 scripts]# vim check_keepalived.sh
[root@haproxy2 scripts]# cat check_keepalived.sh
#!/bin/bash
if [ `ip a show ens160 | grep 192.168.134.250 | wc -l` -ne 0 ]
then
echo "1"
else
echo "0"
fi
[root@haproxy2 scripts]# chmod +x check_keepalived.sh
[root@haproxy2 scripts]# ll
total 8
-rwxr-xr-x 1 root root 115 Oct 14 00:06 check_keepalived.sh
-rwxr-xr-x 1 root root 444 Oct 13 22:01 notify.sh
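The script above matches the VIP as a plain substring of the `ip` output. A slightly stricter variant, sketched below, compares the address field exactly. `has_vip` is a hypothetical helper that reads `ip a show` output on standard input, which also makes it easy to try against sample text.

```shell
#!/bin/bash
# Hypothetical helper: read "ip a show <iface>" output on stdin and print 1 if
# an "inet" line carries exactly the address "$1" (prefix length ignored),
# otherwise print 0.
has_vip() {
    awk -v vip="$1" '
        $1 == "inet" { split($2, a, "/"); if (a[1] == vip) found = 1 }
        END { print (found ? 1 : 0) }
    '
}

# Sample line taken from the master host output earlier in this article.
printf ' inet 192.168.134.250/32 scope global ens160\n' | has_vip 192.168.134.250   # prints 1
```

On the real host it would be fed from the interface, for example `ip a show ens160 | has_vip 192.168.134.250`.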
Install the agent on the slave host (haproxy2)
//Download zabbix
[root@haproxy2 ~]# wget https://cdn.zabbix.com/zabbix/sources/stable/6.4/zabbix-6.4.6.tar.gz
--2023-10-15 23:04:47-- https://cdn.zabbix.com/zabbix/sources/stable/6.4/zabbix-6.4.6.tar.gz
Resolving cdn.zabbix.com (cdn.zabbix.com)... 172.67.69.4, 104.26.6.148, 104.26.7.148, ...
Connecting to cdn.zabbix.com (cdn.zabbix.com)|172.67.69.4|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 43744978 (42M) [application/octet-stream]
Saving to: ‘zabbix-6.4.6.tar.gz’
zabbix-6.4.6.tar 100%[========>] 41.72M 166KB/s in 3m 36s
2023-10-15 23:08:25 (198 KB/s) - ‘zabbix-6.4.6.tar.gz’ saved [43744978/43744978]
[root@haproxy2 ~]# ls
anaconda-ks.cfg haproxy-2.7.10.tar.gz
haproxy-2.7.10 zabbix-6.4.6.tar.gz
//Create the user and extract the zabbix archive
[root@haproxy2 ~]# tar xf zabbix-6.4.6.tar.gz -C /usr/local/
[root@haproxy2 ~]# cd /usr/local/ && ls
bin games include lib64 sbin src
etc haproxy lib libexec share zabbix-6.4.6
[root@haproxy2 local]# cd zabbix-6.4.6/
[root@haproxy2 zabbix-6.4.6]# useradd -r -M -s /sbin/nologin zabbix
//Install the packages required for compiling
[root@haproxy2 zabbix-6.4.6]# yum -y install gcc gcc-c++ make
//From the zabbix-6.4.6 directory, configure the build
[root@haproxy2 zabbix-6.4.6]# ./configure --enable-agent
(output omitted) . . .
***********************************************************
* Now run 'make install' *
* *
* Thank you for using Zabbix! *
* <http://www.zabbix.com> *
***********************************************************
//This banner means configure succeeded; you can go straight to make install
[root@haproxy2 zabbix-6.4.6]# make install
//Edit the zabbix agent configuration file
[root@haproxy2 zabbix-6.4.6]# vim /usr/local/etc/zabbix_agentd.conf
[root@haproxy2 zabbix-6.4.6]# grep -A2 '# ServerActive=' /usr/local/etc/zabbix_agentd.conf
# ServerActive=
ServerActive=192.168.134.162 //set to the server's IP
[root@haproxy2 zabbix-6.4.6]# grep -A2 '# Server=' /usr/local/etc/zabbix_agentd.conf
# Server=
Server=192.168.134.162 //set to the server's IP
[root@haproxy2 zabbix-6.4.6]# grep -A2 '# Hostname=' /usr/local/etc/zabbix_agentd.conf
# Hostname=
Hostname=haproxy2 //set the host name; it must be globally unique (and match the name used in the zabbix web UI)
//Enable zabbix_agentd at boot: copy the service file already configured on the zabbix server to the slave host (haproxy2)
[root@zabbix ~]# scp /usr/lib/systemd/system/zabbix_agentd.service root@192.168.134.155:/usr/lib/systemd/system/
root@192.168.134.155's password:
zabbix_agentd.service 100% 227 211.4KB/s 00:00
//Reload the systemd unit files
[root@haproxy2 ~]# systemctl daemon-reload
[root@haproxy2 ~]# systemctl enable --now zabbix_agentd.service
Created symlink /etc/systemd/system/multi-user.target.wants/zabbix_agentd.service → /usr/lib/systemd/system/zabbix_agentd.service.
[root@haproxy2 ~]# ss -antl
State Recv-Q Send-Q Local Address:Port Peer Address:Port
LISTEN 0 128 0.0.0.0:10050 0.0.0.0:*
LISTEN 0 128 0.0.0.0:22 0.0.0.0:*
LISTEN 0 128 [::]:22 [::]:*
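The service started successfully
Write a script file that reports the state being monitored (keep scripts in one place; here we use a dedicated directory, /scripts, rather than a user's home directory, to avoid permission problems later)
//The script reports whether the VIP is present on this host. If the VIP is on the slave host (haproxy2), haproxy on the master host (haproxy1) may have a problem, so the script returns 1 to flag it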
[root@haproxy2 ~]# cd /scripts/
[root@haproxy2 scripts]# vim check_keepalived.sh
[root@haproxy2 scripts]# cat check_keepalived.sh
#!/bin/bash
if [ `ip a show ens160 | grep 192.168.134.250 | wc -l` -ne 0 ]
then
echo "1"
else
echo "0"
fi
[root@haproxy2 scripts]# chmod +x check_keepalived.sh
[root@haproxy2 scripts]# ./check_keepalived.sh //0 means this host does not hold the VIP
0
//Edit the configuration file and create the custom monitoring item
[root@haproxy2 scripts]# vim /usr/local/etc/zabbix_agentd.conf
[root@haproxy2 scripts]# tail -1 /usr/local/etc/zabbix_agentd.conf
UserParameter=check_keepalived,/bin/bash /scripts/check_keepalived.sh
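The `UserParameter` line has the form `UserParameter=<key>,<command>`: everything up to the first comma is the item key, and the rest is the command the agent runs. The sketch below splits the line shown above to make the two parts explicit; `parse_user_parameter` is a hypothetical helper for illustration only, not a Zabbix tool.

```shell
#!/bin/bash
# Hypothetical helper: split a "UserParameter=<key>,<command>" line into its
# key (text before the first comma) and command (everything after it).
parse_user_parameter() {
    local line=$1 rest
    rest=${line#UserParameter=}      # drop the directive name
    echo "key=${rest%%,*}"           # up to the first comma
    echo "command=${rest#*,}"        # after the first comma
}

parse_user_parameter 'UserParameter=check_keepalived,/bin/bash /scripts/check_keepalived.sh'
# prints:
#   key=check_keepalived
#   command=/bin/bash /scripts/check_keepalived.sh
```

The key printed here, `check_keepalived`, is the name that `zabbix_get -k` and the item definition on the server refer to.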
//The configuration file changed, so restart the service to reload it
[root@haproxy2 scripts]# systemctl restart zabbix_agentd.service
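//After creating the custom item, test from the server side that the value from the monitored host can be received
[root@client ~]# zabbix_get -s 192.168.134.155 -k check_keepalived
0 //the value was received successfully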
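Configuration on the host is complete
Add the host
Host added successfully
Create the item
Item created successfully
Create the trigger
Trigger created successfully
Check that monitoring works:
Simulate the haproxy service on the master host (haproxy1) shutting down under overload
At this point the zabbix page shows no problems
When the master goes down:
Master host (haproxy1)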
[root@haproxy1 ~]# systemctl stop haproxy.service
[root@haproxy1 ~]# systemctl is-active haproxy.service
inactive
[root@haproxy1 ~]# systemctl is-active keepalived.service
inactive
[root@haproxy1 ~]# ss -antl
State Recv-Q Send-Q Local Address:Port Peer Address:Port
LISTEN 0 128 0.0.0.0:22 0.0.0.0:*
LISTEN 0 128 [::]:22 [::]:*
[root@haproxy1 ~]# ip a show ens160 | grep 192.168.134.250
[root@haproxy1 ~]# ip a show ens160 | grep 192.168.134.250 | wc -l
0
Slave host (haproxy2)
[root@haproxy2 ~]# systemctl is-active haproxy.service
active
[root@haproxy2 ~]# systemctl is-active keepalived.service
active
[root@haproxy2 ~]# ip a show ens160 | grep 192.168.134.250
inet 192.168.134.250/32 scope global ens160
[root@haproxy2 ~]# ip a show ens160 | grep 192.168.134.250 | wc -l
1
[root@haproxy2 ~]# ss -antl
State Recv-Q Send-Q Local Address:Port Peer Address:Port
LISTEN 0 128 0.0.0.0:8189 0.0.0.0:*
LISTEN 0 128 0.0.0.0:10050 0.0.0.0:*
LISTEN 0 128 0.0.0.0:80 0.0.0.0:*
LISTEN 0 128 0.0.0.0:22 0.0.0.0:*
LISTEN 0 128 [::]:22
At this point the zabbix web page shows an alert