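Introduction
An earlier post covered building an activemq cluster on zookeeper+leveldb, but there was no suitable monitoring solution for that setup. This post therefore monitors zookeeper+activemq with a python script that nagios invokes.
PS: I looked at some nagios plugins online, but the perl and ruby ones need extra components, which makes installation a hassle. With python I did not have to install anything on my system, and nagios can call the script directly. It may not be as polished as those plugins, but it is good enough for me; opinions will differ.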
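How it works
The zookeeper+activemq setup tolerates (3-1)/2 = 1 failed node: one of the three servers can go down without affecting the service the cluster provides. It also means that only one of the three activemq instances exposes port 61616 to clients at any time, and only one exposes port 8161, the web console used to manage the cluster's activemq queues.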
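The (3-1)/2 figure generalizes to other cluster sizes: a majority of nodes has to stay alive, so n nodes tolerate (n-1)/2 failures, rounded down. A minimal Python sketch of that arithmetic follows; the function name `tolerated_failures` is only illustrative and is not part of any activemq or zookeeper API.

```python
def tolerated_failures(nodes):
    """Return how many nodes may fail while a strict majority stays alive."""
    if nodes < 1:
        raise ValueError("a cluster needs at least one node")
    quorum = nodes // 2 + 1   # smallest strict majority
    return nodes - quorum     # equals (nodes - 1) // 2


if __name__ == "__main__":
    for n in (3, 4, 5):
        print(n, "nodes tolerate", tolerated_failures(n), "failure(s)")
```

Note that going from 3 to 4 nodes does not buy any extra tolerance, which is why such clusters are usually sized with an odd number of nodes.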
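We monitor port 61616 on all three servers. If one of them has port 61616 open, the cluster is running normally; if none does, the cluster cannot serve clients. Since only one port 8161 is open in the cluster as well, we read the pending, consumers, enqueue and dequeue figures of the amq queue from that port and hand them to nagios for graphing.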
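The port probe described above can also be done without shelling out to `nc`. The following is a sketch in Python 3 using only the standard `socket` module; the helper names `port_is_open` and `find_listeners` are made up for this example, and the 3-second timeout is an arbitrary choice.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts and timeouts.
        return False


def find_listeners(hosts, port, timeout=3.0):
    """Return the hosts (in input order) that accept connections on port."""
    return [h for h in hosts if port_is_open(h, port, timeout)]


# Example call for the cluster in this article (not executed here):
#   find_listeners(["10.50.10.10", "10.50.10.11", "10.50.10.12"], 61616)
```

An empty result would correspond to the "cluster cannot serve clients" case, and a single entry to the normal case.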
Python script
vim check_amq_status.py
#!/usr/bin/env python
#-*- coding: UTF-8 -*-
# Python 2 script (it uses the commands and StringIO modules)
import sys
import commands
import pycurl
import StringIO

count = 0
if len(sys.argv) != 3:
    print 'Usage:python check_amq.py ip queue'
    sys.exit(1)
# The ip argument is accepted but not used: every host below is probed.
ip_input = sys.argv[1]
queue_input = sys.argv[2]
hosts = ['10','11','12']
ip = None
# 'U' is the nagios perfdata value for "unknown"; kept if the queue is not found
pending = consumers = enqueue = dequeue = 'U'

for host in hosts:
    # Find which server in the cluster has port 61616 open
    order1 = 'nc -z ' + '10.50.10.' + host + ' 61616'
    output1 = commands.getstatusoutput(order1)
    status1 = int(output1[0])
    if status1 == 0:
        ip = host
        count = count + 1
    # Find which server has port 8161 open and fetch the monitoring data from it
    order2 = 'nc -z ' + '10.50.10.' + host + ' 8161'
    output2 = commands.getstatusoutput(order2)
    status2 = int(output2[0])
    if status2 == 0:
        # Fetch the monitoring data from this URL
        url = 'http://10.50.10.' + host + ':8161/admin/queues.jsp'
        curl = pycurl.Curl()
        body = StringIO.StringIO()
        curl.setopt(pycurl.URL, url)
        curl.setopt(pycurl.USERPWD, 'admin:test')
        # Connection timeout (seconds)
        curl.setopt(pycurl.CONNECTTIMEOUT, 5)
        # Overall timeout (seconds)
        curl.setopt(pycurl.TIMEOUT, 5)
        # Write callback
        curl.setopt(pycurl.WRITEFUNCTION, body.write)
        curl.perform()
        response = body.getvalue()
        body.close()
        curl.close()
        lines = response.split('\n')
        # Extract pending, consumers, enqueue and dequeue for the requested queue
        temp = queue_input + '</a></td>'
        for i in range(len(lines) - 4):
            if lines[i].find(temp) == 0:
                # strip() removes the characters <, /, t, d, > from both ends
                pending = lines[i+1].strip('</td>')
                consumers = lines[i+2].strip('</td>')
                enqueue = lines[i+3].strip('</td>')
                dequeue = lines[i+4].strip('</td>')

# One live node means the cluster is healthy; none means it cannot serve clients.
# The data collected from the 8161 web page is reported for queue monitoring.
if count == 1:
    print 'OK:activemq ' + ip + ' is online|pending=' + pending + ';consumers=' + consumers + ';enqueue=' + enqueue + ';dequeue=' + dequeue
    sys.exit(0)
else:
    print 'ERROR:all activemq are offline'
    sys.exit(2)
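One detail in the script above is easy to misread: `str.strip('</td>')` removes any of the characters `<`, `/`, `t`, `d`, `>` from both ends, not the literal substring. That is fine for numeric cells, but `'data</td>'.strip('</td>')` would give `'ata'`. Below is a more explicit sketch in Python 3, assuming the same page layout the script relies on (a line starting with the queue name followed by `</a></td>`, then four `<td>` lines). The function name `parse_queue_stats` and the sample HTML are made up for illustration, and unlike the script this version tolerates leading whitespace.

```python
import re

TAG = re.compile(r"<[^>]+>")   # matches any single HTML tag
FIELDS = ("pending", "consumers", "enqueue", "dequeue")


def parse_queue_stats(html, queue):
    """Return a dict of integer counters for queue, or None if it is absent."""
    lines = html.split("\n")
    marker = queue + "</a></td>"
    for i, line in enumerate(lines):
        # Require the four counter lines to exist after the marker line.
        if line.strip().startswith(marker) and i + len(FIELDS) < len(lines):
            cells = [TAG.sub("", lines[i + k]).strip()
                     for k in range(1, len(FIELDS) + 1)]
            return dict(zip(FIELDS, (int(c) for c in cells)))
    return None


# Hypothetical excerpt shaped like the layout the script expects.
SAMPLE = """<td><a href="browse.jsp?JMSDestination=test">
test</a></td>
<td>0</td>
<td>1650</td>
<td>0</td>
<td>0</td>"""

if __name__ == "__main__":
    print(parse_queue_stats(SAMPLE, "test"))
```

Returning `None` for a missing queue lets the caller decide how to report it instead of failing on an undefined variable.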
Run it as follows:
# Pass the ip address and the queue name as arguments
[root@test ~]# python check_amq_status.py
Usage:python check_amq.py ip queue
# Healthy: monitoring the test queue in amq
[root@test ~]# python check_amq_status.py 10.50.10.11 test
OK:activemq 11 is online|pending=0;consumers=1650;enqueue=0;dequeue=0
# Unhealthy
[root@test ~]# python check_amq_status.py 10.50.10.11 test
ERROR:all activemq are offline
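The text after the `|` in the OK line is what nagios passes on as performance data. The Nagios plugin guidelines describe that data as space-separated `label=value` items, with semicolons reserved for the optional warn/crit/min/max fields inside a single item. If the graphs ever show only one data source, the separator may be worth checking. A small Python sketch of a formatter that follows the guideline layout (the helper names `format_perfdata` and `plugin_line` are hypothetical):

```python
def format_perfdata(metrics):
    """Join (label, value) pairs into a space-separated perfdata string."""
    parts = []
    for label, value in metrics:
        # Labels containing spaces must be wrapped in single quotes.
        if " " in label:
            label = "'" + label + "'"
        parts.append("{0}={1}".format(label, value))
    return " ".join(parts)


def plugin_line(status, message, metrics):
    """Build the single plugin output line 'STATUS:message|perfdata'."""
    return "{0}:{1}|{2}".format(status, message, format_perfdata(metrics))


if __name__ == "__main__":
    print(plugin_line("OK", "activemq 11 is online",
                      [("pending", 0), ("consumers", 1650),
                       ("enqueue", 0), ("dequeue", 0)]))
```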
Configuring nagios
# Define the check command
define command {
    command_name    check_amq_status
    command_line    /usr/bin/python $USER1$/check_amq_status.py $ARG1$ $ARG2$
}
# Monitor the test queue in amq
define service{
    use                     local-service,srv-pnp
    host_name               server_11
    service_description     amq_test_11
    check_command           check_amq_status!10.50.10.11!test
    service_groups          amq_cluster_services
    check_interval          2
    notifications_enabled   1
    notification_interval   0
    contact_groups          admin
}
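To make the link between the two definitions explicit: the `!`-separated parts of `check_command` are the command name followed by the values substituted for `$ARG1$`, `$ARG2$`, and so on in `command_line`. The Python sketch below is a toy model of that substitution, not nagios code; the function name is made up, and the `$USER1$` value used in the example is an assumption (it is whatever your resource file defines).

```python
def expand_check_command(check_command, command_line, user1):
    """Return (command_name, expanded command line) for a check_command value."""
    name, *args = check_command.split("!")
    expanded = command_line.replace("$USER1$", user1)
    # $ARG1$ is the first value after the command name, $ARG2$ the second, ...
    for n, arg in enumerate(args, start=1):
        expanded = expanded.replace("$ARG{0}$".format(n), arg)
    return name, expanded


if __name__ == "__main__":
    name, line = expand_check_command(
        "check_amq_status!10.50.10.11!test",
        "/usr/bin/python $USER1$/check_amq_status.py $ARG1$ $ARG2$",
        "/usr/local/nagios/libexec",  # assumed plugin directory
    )
    print(name)
    print(line)
```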
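Once the configuration passes the syntax check, start nagios and look at the graphs. By default the graph page reports an error: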
XML file "/usr/local/pnp4nagios/var/perfdata/server-11/amq_test_11.xml" not found. Read FAQ online
This is because no graph template for this check has been configured in pnp4nagios yet; we can simply create one ourselves:
cd /usr/local/pnp4nagios/share/templates
vim check_amq_status.php
<?php
$ds_name[0] = "pending";
$opt[0] = "--title \"number of pending messages\"";
$def[0] = "";
$ds_name[1] = "consumers";
$opt[1] = "--title \"number of consumers\"";
$def[1] = "";
$ds_name[2] = "enqueue";
$opt[2] = "--title \"messages enqueued\"";
$def[2] = "";
$ds_name[3] = "dequeue";
$opt[3] = "--title \"messages dequeued\"";
$def[3] = "";
$def[0] .= rrd::def("var1", $RRDFILE[1], $DS[1], "AVERAGE");
$def[0] .= rrd::area("var1", "#0000ff", rrd::cut($NAME[1],15));
$def[0] .= rrd::gprint("var1", array("LAST","MAX","AVERAGE"),
"%6.2lf %s");
$def[1] .= rrd::def("var2", $RRDFILE[2], $DS[2], "AVERAGE");
//$def[1] .= rrd::area("var2", "#ff0000", rrd::cut($NAME[2],15), "STACK");
$def[1] .= rrd::area("var2", "#ff0000", rrd::cut($NAME[2],15));
$def[1] .= rrd::gprint("var2", array("LAST", "MAX", "AVERAGE"),
"%6.2lf %s");
$def[2] .= rrd::def("var3", $RRDFILE[3], $DS[3], "AVERAGE");
$def[2] .= rrd::area("var3", "#00ff00", rrd::cut($NAME[3],15),"STACK");
$def[2] .= rrd::gprint("var3", array("LAST", "MAX", "AVERAGE"),
"%6.2lf %s");
//$def[1] .= rrd::def("var4", $RRDFILE[4], $DS[4], "AVERAGE");
//$def[1] .= rrd::line1("var4", "#ff0000", rrd::cut($NAME[4],15));
//$def[1] .= rrd::gprint("var4", array("LAST", "MAX", "AVERAGE"),
// "%6.2lf %s");
$def[3] .= rrd::def("var4", $RRDFILE[4], $DS[4], "AVERAGE");
$def[3] .= rrd::area("var4", "#00ffff", rrd::cut($NAME[4],15),"STACK");
$def[3] .= rrd::gprint("var4", array("LAST", "MAX", "AVERAGE"),
"%6.2lf %s");
?>
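At this point nagios draws the graphs correctly, and we can follow the state of activemq and the cluster through the charts.
Summary
Some nagios monitoring setups rely entirely on ready-made plugins from the internet. Borrowing is easy, but the results are often mediocre; what fits your own environment best is what works best, so in some cases it is worth committing to writing your own monitoring script.
Note: there is no need to graph every server in the activemq cluster. The data behind ports 61616 and 8161 is the same for all of them, so pick one server in the cluster to monitor and graph; the other servers only need to be monitored, without graphing.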