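Zabbix High Availability Solution

This deployment uses the Red Hat high-availability stack (Pacemaker + Corosync + pcs, formerly RHCS) to make a Zabbix system highly available. Zabbix has supported native high availability since version 6.0 and no longer depends on third-party components for it; this article targets Zabbix 5.x and uses Red Hat's HA suite instead. Compared with a keepalived-based setup, this approach is simpler, since the cluster manages the virtual IP and the services together. The same configuration pattern can be adapted to make other services highly available.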

1. Server plan
Hostname         IP address        Software
zabbix-server1   192.168.59.128    pacemaker corosync pcs zabbix5.x php72 httpd
zabbix-server2   192.168.59.129    pacemaker corosync pcs zabbix5.x php72 httpd
mysql-server     192.168.59.130    mariadb
VIP: 192.168.59.162
Installing the database and Zabbix itself is not covered here.
2. System environment initialization
  • Synchronize time
  • Disable the system firewall
  • Disable SELinux
  • Configure hostname resolution
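
A minimal sketch of the hostname-resolution item above, assuming the three hosts and addresses from the server plan. The `add_host_entry` helper and the `HOSTS_FILE` variable are illustrative names, not part of any tool. By default the script writes to a scratch file, so it is safe to try; point `HOSTS_FILE` at /etc/hosts (as root) on real nodes. The other three items are commonly handled with chronyd, `systemctl stop firewalld` plus `systemctl disable firewalld`, and `setenforce 0` plus editing /etc/selinux/config, but the exact choices depend on the environment.

```shell
#!/usr/bin/env bash
# Sketch: idempotently add the cluster hostnames to a hosts file.
# HOSTS_FILE is an assumed variable; it defaults to a temporary scratch file.
HOSTS_FILE="${HOSTS_FILE:-$(mktemp)}"

# add_host_entry IP NAME (hypothetical helper)
# Appends "IP NAME" unless some line already maps NAME.
add_host_entry() {
    local ip="$1" name="$2"
    touch "$HOSTS_FILE"
    if ! grep -qE "(^|[[:space:]])${name}([[:space:]]|\$)" "$HOSTS_FILE"; then
        printf '%s %s\n' "$ip" "$name" >> "$HOSTS_FILE"
    fi
}

# Hosts taken from the server plan in section 1.
add_host_entry 192.168.59.128 zabbix-server1
add_host_entry 192.168.59.129 zabbix-server2
add_host_entry 192.168.59.130 mysql-server
```

Running the script twice leaves the file unchanged, which makes it safe to repeat on both Zabbix hosts.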
3. Install the high-availability packages (run on both Zabbix hosts)
Install:
yum install pacemaker pcs -y
4. Set the cluster user password (run on both Zabbix hosts)
echo 123456 |passwd --stdin hacluster
5. Start pcsd (run on both Zabbix hosts)
systemctl enable pcsd && systemctl start pcsd
6. Authenticate the nodes (run on any one node)
pcs cluster auth zabbix-server1 zabbix-server2
Username: hacluster
Password:
zabbix-server1: Authorized
zabbix-server2: Authorized
7. Create the cluster (run on any one node)
pcs cluster setup --name zabbixserver zabbix-server1 zabbix-server2
8. Start the cluster and enable it at boot (run on any one node)
pcs cluster start --all
pcs cluster enable --all
9. Check the cluster status
pcs status cluster
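
A small sketch for checking the result of the steps above from a script rather than by eye. It assumes the `Online: [ node1 node2 ]` line that the full `pcs status` output prints (as in the listings later in this article); `online_nodes` and `expect_online` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Read `pcs status` text on stdin and print one online node per line.
# Matches only the "Online: [ ... ]" line of the full status output.
online_nodes() {
    awk '$1 == "Online:" && $2 == "[" { for (i = 3; i < NF; i++) print $i }'
}

# expect_online COUNT: succeed only if exactly COUNT nodes are online.
expect_online() {
    local want="$1"
    [ "$(online_nodes | grep -c .)" -eq "$want" ]
}

# Usage on a cluster node (not executed here):
#   pcs status | expect_online 2 && echo "both nodes are online"
```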


10. Configure the services
# No fence device is configured, so disable STONITH
pcs property set stonith-enabled=false
# This is a two-node cluster, so ignore loss of quorum
pcs property set no-quorum-policy=ignore
# Configure the VIP
pcs resource create cluster_vip ocf:heartbeat:IPaddr2 ip=192.168.59.162 cidr_netmask=24 op monitor interval=20s
# Configure php-fpm
pcs resource create php-fpm systemd:rh-php72-php-fpm op monitor interval=10s
# Configure httpd
pcs resource create httpd systemd:httpd op monitor interval=10s
# Configure zabbix-server
pcs resource create zabbix_server systemd:zabbix-server op monitor interval=10s
# Configure zabbix-agent
pcs resource create zabbix_agent systemd:zabbix-agent op monitor interval=10s
# Configure the resource group
pcs resource group add grp_zabbix_httpd php-fpm zabbix_server httpd zabbix_agent
# Configure colocation (ensures the zabbix services start on the same node as the VIP)
pcs constraint colocation add grp_zabbix_httpd with cluster_vip INFINITY
# Configure the resource start order
pcs constraint order cluster_vip then grp_zabbix_httpd
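
The four systemd-backed resources above follow one pattern, so they can be generated from a list. The sketch below is one way to do that, assuming the resource names and unit names used in this article; `run`, `create_systemd_resources`, and `DRY_RUN` are illustrative names. With `DRY_RUN=1` (the default here) it only prints the `pcs` commands, so they can be reviewed before anything touches the cluster. Services handed to Pacemaker are usually left disabled in systemd (`systemctl disable <unit>`) so that only the cluster starts them; the original steps do not state this, so treat it as an assumption to verify.

```shell
#!/usr/bin/env bash
# DRY_RUN=1 prints commands instead of executing them (assumed convention).
DRY_RUN="${DRY_RUN:-1}"

# run CMD ARGS...: print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# create_systemd_resources NAME=UNIT...
# Creates one systemd resource per pair, with the 10s monitor used above.
create_systemd_resources() {
    local pair name unit
    for pair in "$@"; do
        name="${pair%%=*}"   # text before the first "="
        unit="${pair#*=}"    # text after the first "="
        run pcs resource create "$name" "systemd:${unit}" op monitor interval=10s
    done
}

create_systemd_resources \
    php-fpm=rh-php72-php-fpm \
    zabbix_server=zabbix-server \
    httpd=httpd \
    zabbix_agent=zabbix-agent
```

Setting `DRY_RUN=0` on a cluster node executes the same commands instead of printing them.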

# Check the resource status
pcs status
[root@zabbix-server1 web]# pcs status
Cluster name: zabbixserver
Stack: corosync
Current DC: zabbix-server2 (version 1.1.23-1.el7_9.1-9acf116022) - partition with quorum
Last updated: Sat Aug 20 09:43:39 2022
Last change: Fri Aug 19 18:53:24 2022 by root via cibadmin on zabbix-server1

2 nodes configured
5 resource instances configured

Online: [ zabbix-server1 zabbix-server2 ]

Full list of resources:

 cluster_vip	(ocf::heartbeat:IPaddr2):	Started zabbix-server1
 Resource Group: grp_zabbix_httpd
     php-fpm	(systemd:rh-php72-php-fpm):	Started zabbix-server1
     zabbix_server	(systemd:zabbix-server):	Started zabbix-server1
     httpd	(systemd:httpd):	Started zabbix-server1
     zabbix_agent	(systemd:zabbix-agent):	Started zabbix-server1

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled

As shown, all resources are currently started on zabbix-server1. To test, open http://vip/zabbix in a browser, replacing vip with the virtual IP (192.168.59.162).
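
The "everything runs on one node" observation can also be checked mechanically. This is a rough sketch that parses the resource lines of the `pcs status` output shown above (resource name first, `Started <node>` last); `started_resources` and `all_on_one_node` are hypothetical helper names, and the parsing assumes the pcs 0.9 output layout from this article.

```shell
#!/usr/bin/env bash
# Read `pcs status` text on stdin and print "<resource> <node>" for
# every line whose last two fields are "Started <node>".
started_resources() {
    awk 'NF >= 3 && $(NF-1) == "Started" { print $1, $NF }'
}

# Succeed only if every started resource is on the same single node.
all_on_one_node() {
    [ "$(started_resources | awk '{ print $2 }' | sort -u | grep -c .)" -eq 1 ]
}

# Usage on a cluster node (not executed here):
#   pcs status | started_resources
#   pcs status | all_on_one_node && echo "VIP and services are together"
```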

11. Failover test
# Put zabbix-server1 into standby (or simply power it off), then check how the resources move and run
[root@zabbix-server1 ~]# pcs node standby
[root@zabbix-server1 ~]# pcs status nodes
Pacemaker Nodes:
 Online: zabbix-server2
 Standby: zabbix-server1
 Standby with resource(s) running:
 Maintenance:
 Offline:
Pacemaker Remote Nodes:
 Online:
 Standby:
 Standby with resource(s) running:
 Maintenance:
 Offline:
 
[root@zabbix-server1 ~]# pcs status
Cluster name: zabbixserver
Stack: corosync
Current DC: zabbix-server2 (version 1.1.23-1.el7_9.1-9acf116022) - partition with quorum
Last updated: Sat Aug 20 09:54:10 2022
Last change: Sat Aug 20 09:51:42 2022 by root via cibadmin on zabbix-server1

2 nodes configured
5 resource instances configured

Node zabbix-server1: standby
Online: [ zabbix-server2 ]

Full list of resources:

 cluster_vip	(ocf::heartbeat:IPaddr2):	Started zabbix-server2
 Resource Group: grp_zabbix_httpd
     php-fpm	(systemd:rh-php72-php-fpm):	Started zabbix-server2
     zabbix_server	(systemd:zabbix-server):	Started zabbix-server2
     httpd	(systemd:httpd):	Started zabbix-server2
     zabbix_agent	(systemd:zabbix-agent):	Started zabbix-server2

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled

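As shown, the resources moved to zabbix-server2 automatically and service continues normally, while the services on zabbix-server1 were stopped automatically. Zabbix fails over without manual intervention, and because only one zabbix-server is running at any given time, the data in the backend database stays consistent.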
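
To time a failover instead of watching it, a generic polling helper can retry a check until it passes or a deadline expires. This is only a sketch: `wait_until` is a hypothetical helper, and the curl check in the usage comment assumes the web frontend answers on the VIP at /zabbix/ as in section 10. After the test, `pcs node unstandby` brings zabbix-server1 back as an available node.

```shell
#!/usr/bin/env bash
# wait_until TIMEOUT_SECONDS CMD ARGS...
# Re-runs CMD once per second until it succeeds (returns 0) or until
# TIMEOUT_SECONDS have elapsed (returns 1). Uses bash's SECONDS counter.
wait_until() {
    local timeout="$1"
    shift
    local deadline=$(( SECONDS + timeout ))
    until "$@"; do
        if [ "$SECONDS" -ge "$deadline" ]; then
            return 1
        fi
        sleep 1
    done
    return 0
}

# Usage from any machine that can reach the VIP (not executed here):
#   wait_until 60 curl -fsS -o /dev/null http://192.168.59.162/zabbix/ \
#       && echo "frontend reachable again after ${SECONDS}s"
```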

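12. Architecture improvements

This deployment does not include the zabbix-proxy component; interested readers can try adding zabbix-proxy to the cluster as well. The backend database can likewise be made highly available in an active/standby fashion using the same approach, giving a fully distributed, highly available Zabbix system.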

If this article helped and you would like more hands-on material on k8s, follow the WeChat official account "IT运维图谱" (it can be found through WeChat search).