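I. Use Case
Our production services use RabbitMQ as the message-queue middleware, so monitoring RabbitMQ is an important task for the operations team. This article walks through monitoring RabbitMQ with Zabbix from start to finish.
II. Key Points for Monitoring RabbitMQ
RabbitMQ officially provides two ways to manage and monitor a broker.
1. Managing and monitoring with rabbitmqctl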
Usage:
rabbitmqctl [-n <node>] [-q] <command> [<command options>]
List virtual hosts
# rabbitmqctl list_vhosts
List queues
# rabbitmqctl list_queues
List exchanges
# rabbitmqctl list_exchanges
List users
# rabbitmqctl list_users
List connections
# rabbitmqctl list_connections
List consumers
# rabbitmqctl list_consumers
Show the application environment
# rabbitmqctl environment
Show the number of unacknowledged messages in each queue
# rabbitmqctl list_queues name messages_unacknowledged
Show the memory used by each queue
# rabbitmqctl list_queues name memory
Show the number of ready messages in each queue
# rabbitmqctl list_queues name messages_ready
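For scripted checks, the output of these list_queues commands can be parsed directly. The following is a minimal Python 3 sketch, assuming the output is one queue per line with tab-separated columns and that some versions add informational lines such as "Listing queues ..." or a header row. The function name parse_list_queues and the sample text are illustrative and not part of the original script.

```python
def parse_list_queues(output):
    """Parse `rabbitmqctl list_queues name messages_ready` output into a dict.

    Assumes tab-separated columns. Lines that do not consist of exactly a
    queue name and an integer count are skipped.
    """
    result = {}
    for line in output.splitlines():
        parts = line.strip().split('\t')
        if len(parts) != 2:
            continue  # informational line such as "Listing queues ..."
        name, count = parts
        if count.isdigit():  # also skips a "name<TAB>messages_ready" header
            result[name] = int(count)
    return result


# Hand-written sample text, not captured from a real broker.
sample_output = "Listing queues ...\napp000\t0\napp001\t12\n...done.\n"
print(parse_list_queues(sample_output))
```

In practice the text would come from running the command, for example via subprocess.check_output, on the RabbitMQ host.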
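2. Managing and monitoring with the RabbitMQ Management plugin
Enable the Management plugin
# rabbitmq-plugins enable rabbitmq_management
Once the plugin is enabled, RabbitMQ's status can be viewed in the management web UI on port 15672, and the rabbitmqadmin management tool can be downloaded from an address like this
http://172.28.2.157:15672/cli/rabbitmqadmin
Get the list of vhosts
# curl -i -u guest:guest http://localhost:15672/api/vhosts
Get the list of channels, sorted and restricted to selected columns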
# curl -i -u guest:guest "http://localhost:15672/api/channels?sort=message_stats.publish_details.rate&sort_reverse=true&columns=name,message_stats.publish_details.rate,message_stats.deliver_get_details.rate"
Show the overview
# curl -i -u guest:guest "http://localhost:15672/api/overview"
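management_version Version of the management plugin
cluster_name Name of the whole RabbitMQ cluster, set with rabbitmqctl set_cluster_name
publish Total number of messages published
queue_totals Totals of ready messages, unacknowledged messages, uncommitted messages, and so on
statistics_db_event_queue Number of events not yet processed by the statistics database
consumers Number of consumers
queues Number of queues
exchanges Number of exchanges
connections Number of connections
channels Number of channels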
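The overview document is nested: the object counts sit under object_totals, the message counts under queue_totals, and the counters under message_stats. These are the same keys the monitoring script reads later in this article. As a rough Python 3 sketch of flattening it into simple key/value pairs, with a hand-written sample that is not real broker output and a hypothetical function name summarize_overview:

```python
import json


def summarize_overview(overview):
    """Flatten selected fields of an /api/overview document into one dict."""
    object_totals = overview.get('object_totals', {})
    queue_totals = overview.get('queue_totals', {})
    message_stats = overview.get('message_stats', {})
    return {
        'connections': object_totals.get('connections', 0),
        'channels': object_totals.get('channels', 0),
        'consumers': object_totals.get('consumers', 0),
        'queues': object_totals.get('queues', 0),
        'messages_ready': queue_totals.get('messages_ready', 0),
        'messages_unacknowledged': queue_totals.get('messages_unacknowledged', 0),
        # message_stats is missing until messages have been published
        'publish': message_stats.get('publish', 0),
    }


# Hand-written sample, trimmed to the fields used here.
sample_overview = json.loads('''
{"management_version": "3.5.3",
 "object_totals": {"connections": 4, "channels": 4, "consumers": 4,
                   "queues": 2, "exchanges": 9},
 "queue_totals": {"messages": 3, "messages_ready": 1,
                  "messages_unacknowledged": 2}}
''')
print(summarize_overview(sample_overview))
```

Defaulting missing sections to an empty dict keeps the function usable on a freshly started broker.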
Show node information
# curl -i -u guest:guest "http://localhost:15672/api/nodes"
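disk_free Free disk space, in bytes
disk_free_limit Threshold at which the disk alarm is triggered
fd_used Number of file descriptors in use
fd_total Number of file descriptors available
io_read_avg_time Average time of each disk read operation, in milliseconds
io_read_bytes Total amount of data read from disk, in bytes
io_read_count Total number of read operations
io_seek_avg_time Average time of each seek operation, in milliseconds
io_seek_count Total number of seek operations
io_sync_avg_time Average time of each fsync operation, in milliseconds
io_sync_count Total number of fsync operations
io_write_avg_time Average time of each disk write operation, in milliseconds
io_write_bytes Total amount of data written to disk, in bytes
io_write_count Total number of disk write operations
mem_used Memory used, in bytes
mem_limit Threshold at which the memory alarm is triggered; the default is 40% of total physical memory
mnesia_disk_tx_count Number of Mnesia transactions that required writes to disk
mnesia_ram_tx_count Number of Mnesia transactions that did not require writes to disk
msg_store_write_count Number of messages written to the message store
msg_store_read_count Number of messages read from the message store
proc_used Number of Erlang processes in use
proc_total Maximum number of Erlang processes
queue_index_journal_write_count Number of records written to the queue index journal. Each record represents a message being published to a queue, delivered from a queue, or acknowledged in a queue
queue_index_read_count Number of records read from the queue index
queue_index_write_count Number of records written to the queue index
sockets_used Number of file descriptors used as sockets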
partitions List of network partitions this node is seeing
uptime Time since the Erlang VM started, in milliseconds
run_queue Number of Erlang processes waiting to run
processors Number of cores detected and usable by Erlang
net_ticktime Current kernel net_ticktime setting
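Several of these fields are most useful in pairs, for example fd_used against fd_total or mem_used against mem_limit. The following Python 3 sketch derives usage ratios from one node entry. The function name node_health, the output keys, and the sample values are all illustrative assumptions, and a real trigger threshold would be chosen in Zabbix rather than in code.

```python
def node_health(node):
    """Derive usage ratios from one entry of the /api/nodes list."""
    def ratio(used, total):
        # Guard against a missing or zero limit
        return round(float(used) / total, 4) if total else 0.0

    return {
        'fd_usage': ratio(node.get('fd_used', 0), node.get('fd_total', 0)),
        'mem_usage': ratio(node.get('mem_used', 0), node.get('mem_limit', 0)),
        'proc_usage': ratio(node.get('proc_used', 0), node.get('proc_total', 0)),
        # True when free disk space has dropped below the alarm threshold
        'disk_below_limit': node.get('disk_free', 0) < node.get('disk_free_limit', 0),
    }


# Hand-written sample values, not taken from a real node.
sample_node = {
    'fd_used': 256, 'fd_total': 1024,
    'mem_used': 1000, 'mem_limit': 4000,
    'proc_used': 200, 'proc_total': 1000,
    'disk_free': 5000000000, 'disk_free_limit': 50000000,
}
print(node_health(sample_node))
```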
Show channel information
# curl -i -u guest:guest "http://localhost:15672/api/channels"
Show exchange information
# curl -i -u guest:guest "http://localhost:15672/api/exchanges"
Show queue information
# curl -i -u guest:guest "http://localhost:15672/api/queues"
Show vhost information
# curl -i -u guest:guest "http://localhost:15672/api/vhosts/?name=/"
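The curl calls above all use HTTP basic authentication, and endpoints that take a virtual host as a path segment (such as /api/queues/<vhost>/<name>) need the default vhost "/" percent-encoded as %2F. The Python 3 sketch below shows both points. It only builds the request and does not send it. The helper names build_request and queue_path are assumptions for illustration, and the monitoring script in this article uses the Python 2 urllib2 module instead.

```python
import base64
from urllib.parse import quote
from urllib.request import Request


def build_request(path, host='localhost', port=15672,
                  user='guest', password='guest'):
    """Build (but do not send) an authenticated management API request."""
    url = 'http://{0}:{1}/api/{2}'.format(host, port, path)
    credentials = '{0}:{1}'.format(user, password).encode('utf-8')
    token = base64.b64encode(credentials).decode('ascii')
    return Request(url, headers={'Authorization': 'Basic ' + token})


def queue_path(vhost, queue_name):
    """Return the API path of one queue, percent-encoding both segments."""
    # safe='' makes quote() encode "/" as well
    return 'queues/{0}/{1}'.format(quote(vhost, safe=''),
                                   quote(queue_name, safe=''))


request = build_request(queue_path('/', 'app000'))
print(request.full_url)
```

Passing the request to urlopen from urllib.request and decoding the body with json.loads would complete the call.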
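III. Writing the Monitoring Script and Adding the Zabbix Configuration File
The monitoring script has three main parts: monitoring the overview, monitoring the node information of the current host, and monitoring each queue.
The script is adapted from one found online: many new monitoring items were added, and the filter from the original script was removed.
A side note: code found online should not be used as-is. Analyze it against your own requirements, which also improves your coding skills. If you only ever copy and paste, you will never improve.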
rabbitmq_status.py
#!/usr/bin/env /usr/bin/python
'''Python module to query the RabbitMQ Management Plugin REST API and get
results that can then be used by Zabbix.
https://github.com/jasonmcintosh/rabbitmq-zabbix
'''
'''
This script is tested on RabbitMQ 3.5.3
'''
import json
import optparse
import socket
import urllib2
import subprocess
import tempfile
import os
import logging
logging.basicConfig(filename='/opt/logs/zabbix/rabbitmq_zabbix.log', level=logging.WARNING, format='%(asctime)s %(levelname)s: %(message)s')
class RabbitMQAPI(object):
    '''Class for RabbitMQ Management API'''
    def __init__(self, user_name='guest', password='guest', host_name='',
                 protocol='http', port=15672, conf='/opt/app/zabbix/conf/zabbix_agentd.conf', senderhostname=None):
        self.user_name = user_name
        self.password = password
        self.host_name = host_name or socket.gethostname()
        self.protocol = protocol
        self.port = port
        self.conf = conf or '/opt/app/zabbix/conf/zabbix_agentd.conf'
        self.senderhostname = senderhostname if senderhostname else host_name
    def call_api(self, path):
        '''
        All URIs serve only resources of type application/json and require HTTP basic authentication. The default username and password are guest/guest. The default virtual host '/' must be encoded as %2f.
        '''
        url = '{0}://{1}:{2}/api/{3}'.format(self.protocol, self.host_name, self.port, path)
        password_mgr = urllib2.HTTPPasswordMgrWithDefaultRealm()
        password_mgr.add_password(None, url, self.user_name, self.password)
        handler = urllib2.HTTPBasicAuthHandler(password_mgr)
        logging.debug('Issue a rabbit API call to get data on ' + path)
        # json.loads() converts JSON text to Python data
        # json.dumps() converts Python data to JSON text
        return json.loads(urllib2.build_opener(handler).open(url).read())
    def list_queues(self):
        ''' curl -i -u guest:guest http://localhost:15672/api/queues
        return a list
        '''
        queues = []
        for queue in self.call_api('queues'):
            logging.debug("Discovered queue " + queue['name'])
            element = {'{#VHOSTNAME}': queue['vhost'],
                       '{#QUEUENAME}': queue['name']
                       }
            queues.append(element)
            logging.debug('Discovered queue '+queue['vhost']+'/'+queue['name'])
        return queues
    def list_nodes(self):
        '''Lists all rabbitMQ nodes in the cluster'''
        nodes = []
        for node in self.call_api('nodes'):
            # We need to return the node name, because Zabbix
            # does not support @ as an item parameter
            name = node['name'].split('@')[1]
            element = {'{#NODENAME}': name,
                       '{#NODETYPE}': node['type']}
            nodes.append(element)
            logging.debug('Discovered nodes '+name+'/'+node['type'])
        return nodes
    def check_queue(self):
        '''Return the value for a specific item in a queue's details.'''
        return_code = 0
        # NamedTemporaryFile creates a named file on disk; delete=False keeps
        # it after close() so that zabbix_sender can still read it
        rdatafile = tempfile.NamedTemporaryFile(delete=False)
        for queue in self.call_api('queues'):
            self._get_queue_data(queue, rdatafile)
        rdatafile.close()
        return_code = self._send_queue_data(rdatafile)
        # os.unlink removes the temporary file once the data has been sent
        os.unlink(rdatafile.name)
        return return_code
    def _get_queue_data(self, queue, tmpfile):
        '''Prepare the queue data for sending'''
        '''
        ### one single queue's information like this #####
        ### curl -i -u guest:guest http://localhost:15672/api/queues dumps a list ###
        {"memory":32064,"message_stats":{"ack":3870,"ack_details":{"rate":0.0},"deliver":3871,"deliver_details":{"rate":0.0},"deliver_get":3871,"deliver_get_details":{"rate":0.0},"disk_writes":3870,"disk_writes_details":{"rate":0.0},"publish":3870,"publish_details":{"rate":0.0},"redeliver":1,"redeliver_details":{"rate":0.0}},"messages":0,"messages_details":{"rate":0.0},"messages_ready":0,"messages_ready_details":{"rate":0.0},"messages_unacknowledged":0,"messages_unacknowledged_details":{"rate":0.0},"idle_since":"2016-03-01 22:04:22","consumer_utilisation":"","policy":"","exclusive_consumer_tag":"","consumers":4,"recoverable_slaves":"","state":"running","messages_ram":0,"messages_ready_ram":0,"messages_unacknowledged_ram":0,"messages_persistent":0,"message_bytes":0,"message_bytes_ready":0,"message_bytes_unacknowledged":0,"message_bytes_ram":0,"message_bytes_persistent":0,"disk_reads":0,"disk_writes":3870,"backing_queue_status":{"q1":0,"q2":0,"delta":["delta",0,0,0],"q3":0,"q4":0,"len":0,"target_ram_count":"infinity","next_seq_id":3870,"avg_ingress_rate":0.060962064328682466,"avg_egress_rate":0.060962064328682466,"avg_ack_ingress_rate":0.060962064328682466,"avg_ack_egress_rate":0.060962064328682466},"name":"app000","vhost":"/","durable":true,"auto_delete":false,"arguments":{},"node":"rabbit@test2"}
        '''
        for item in [ 'memory','messages','messages_ready','messages_unacknowledged','consumers' ]:
            #key = rabbitmq.queues[/,queue_memory,queue.helloWorld]
            key = '"rabbitmq.queues[{0},queue_{1},{2}]"'.format(queue['vhost'], item, queue['name'])
            ### if item is in queue,value=queue[item],else value=0
            value = queue.get(item, 0)
            logging.debug("SENDER_DATA: - %s %s" % (key,value))
            tmpfile.write("- %s %s\n" % (key, value))
        ## This is a non standard bit of information added after the standard items
        for item in ['deliver_get', 'publish']:
            key = '"rabbitmq.queues[{0},queue_message_stats_{1},{2}]"'.format(queue['vhost'], item, queue['name'])
            value = queue.get('message_stats', {}).get(item, 0)
            logging.debug("SENDER_DATA: - %s %s" % (key,value))
            tmpfile.write("- %s %s\n" % (key, value))
    def _send_queue_data(self, tmpfile):
        '''Send the queue data to Zabbix.'''
        '''Get key value from temp file. '''
        args = '/opt/app/zabbix/sbin/zabbix_sender -c {0} -i {1}'
        if self.senderhostname:
            args = args + " -s " + self.senderhostname
        return_code = 0
        process = subprocess.Popen(args.format(self.conf, tmpfile.name),
                                   shell=True, stdout=subprocess.PIPE,
                                   stderr=subprocess.PIPE)
        out, err = process.communicate()
        logging.debug("Finished sending data")
        return_code = process.wait()
        logging.info("Found return code of " + str(return_code))
        if return_code != 0:
            logging.warning(out)
            logging.warning(err)
        else:
            logging.debug(err)
            logging.debug(out)
        return return_code
    def check_aliveness(self):
        '''Check the aliveness status of a given vhost. '''
        '''virtual host '/' should be encoded as '%2f' '''
        return self.call_api('aliveness-test/%2f')['status']
    def check_overview(self, item):
        '''First, check the overview specific items'''
        ''' curl -i -u guest:guest http://localhost:15672/api/overview '''
        ## rabbitmq[overview,connections]
        if item in [ 'channels','connections','consumers','exchanges','queues' ]:
            return self.call_api('overview').get('object_totals').get(item,0)
        ## rabbitmq[overview,messages]
        elif item in [ 'messages','messages_ready','messages_unacknowledged' ]:
            return self.call_api('overview').get('queue_totals').get(item,0)
        elif item == 'message_stats_deliver_get':
            return self.call_api('overview').get('message_stats', {}).get('deliver_get',0)
        elif item == 'message_stats_publish':
            return self.call_api('overview').get('message_stats', {}).get('publish',0)
        elif item == 'message_stats_ack':
            return self.call_api('overview').get('message_stats', {}).get('ack',0)
        elif item == 'message_stats_redeliver':
            return self.call_api('overview').get('message_stats', {}).get('redeliver',0)
        elif item == 'rabbitmq_version':
            return self.call_api('overview').get('rabbitmq_version', 'None')
    def check_server(self,item,node_name):
        '''Return the value for a specific item in a node's details. '''
        '''curl -i -u guest:guest http://localhost:15672/api/nodes'''
        '''return a list'''
        # hostname hk-prod-mq1.example.com
        # self.call_api('nodes')[0]['name'] rabbit@hk-prod-mq1
        node_name = node_name.split('.')[0]
        for nodeData in self.call_api('nodes'):
            if node_name in nodeData['name']:
                return nodeData.get(item,0)
        return 'Not Found'
def main():
    '''Command-line parameters and decoding for Zabbix use/consumption.'''
    choices = ['list_queues', 'list_nodes', 'queues', 'check_aliveness',
               'overview','server']
    parser = optparse.OptionParser()
    parser.add_option('--username', help='RabbitMQ API username',
                      default='guest')
    parser.add_option('--password', help='RabbitMQ API password',
                      default='guest')
    parser.add_option('--hostname', help='RabbitMQ API host',
                      default=socket.gethostname())
    parser.add_option('--protocol', help='RabbitMQ API protocol (http or https)',
                      default='http')
    parser.add_option('--port', help='RabbitMQ API port', type='int',
                      default=15672)
    parser.add_option('--check', type='choice', choices=choices,
                      help='Type of check')
    parser.add_option('--metric', help='Which metric to evaluate', default='')
    parser.add_option('--node', help='Which node to check (valid for --check=server)')
    parser.add_option('--conf', default='/opt/app/zabbix/conf/zabbix_agentd.conf')
    parser.add_option('--senderhostname', default='', help='Allows including a sender parameter on calls to zabbix_sender')
    (options, args) = parser.parse_args()
    if not options.check:
        parser.error('At least one check should be specified')
    logging.debug("Started trying to process data")
    api = RabbitMQAPI(user_name=options.username, password=options.password,
                      host_name=options.hostname, protocol=options.protocol, port=options.port,
                      conf=options.conf, senderhostname=options.senderhostname)
    if options.check == 'list_queues':
        print json.dumps({'data': api.list_queues()},indent=4,separators=(',',':'))
    elif options.check == 'list_nodes':
        print json.dumps({'data': api.list_nodes()},indent=4,separators=(',',':'))
    elif options.check == 'queues':
        print api.check_queue()
    elif options.check == 'check_aliveness':
        print api.check_aliveness()
    elif options.check == 'overview':
        #rabbitmq[overview,connections]
        #--check=overview --metric=connections
        if not options.metric:
            parser.error('Missing required parameter: "metric"')
        else:
            # --node has no effect on overview checks
            print api.check_overview(options.metric)
    elif options.check == 'server':
        #rabbitmq[server,sockets_used]
        #--check=server --metric=sockets_used
        if not options.metric:
            parser.error('Missing required parameter: "metric"')
        else:
            if options.node:
                print api.check_server(options.metric,options.node)
            else:
                print api.check_server(options.metric,api.host_name)
if __name__ == '__main__':
    main()
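How the script works:
It uses the urllib2 module to call the RabbitMQ API
It processes the data returned by the API
Data for overview and nodes is collected by zabbix_agent, while queue data is pushed to Zabbix with zabbix_sender. The push does not happen by itself: a zabbix_agent key (rabbitmq.queues) must be polled to trigger it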
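The push step relies on the zabbix_sender input-file format, where each line reads "<hostname> <key> <value>" and a hostname of "-" means the host name is taken from the agent configuration file or the -s option. As a small Python 3 sketch of how one queue's metrics become such lines, using the same key layout as the script; the function name sender_lines and the sample queue are illustrative only:

```python
def sender_lines(queue, items=('memory', 'messages', 'messages_ready',
                               'messages_unacknowledged', 'consumers')):
    """Format one queue's metrics as zabbix_sender input-file lines."""
    lines = []
    for item in items:
        # The key is quoted because it contains brackets and commas
        key = '"rabbitmq.queues[{0},queue_{1},{2}]"'.format(
            queue['vhost'], item, queue['name'])
        # Metrics missing from the API response default to 0
        lines.append('- {0} {1}'.format(key, queue.get(item, 0)))
    return lines


# Hand-written sample; a real entry comes from /api/queues.
sample_queue = {'vhost': '/', 'name': 'app000', 'memory': 32064, 'consumers': 4}
for line in sender_lines(sample_queue):
    print(line)
```

Writing all queues into one file and calling zabbix_sender once, as the script does, avoids starting a process per metric.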
rabbitmq_status.conf
UserParameter=rabbitmq.discovery_queue,/usr/bin/python /opt/app/zabbix/sbin/rabbitmq_status.py --check=list_queues
UserParameter=rabbitmq.queues,/usr/bin/python /opt/app/zabbix/sbin/rabbitmq_status.py --check=queues
UserParameter=rabbitmq[*],/usr/bin/python /opt/app/zabbix/sbin/rabbitmq_status.py --check=$1 --metric=$2
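The rabbitmq.discovery_queue key feeds Zabbix low-level discovery, which expects a JSON document whose "data" array holds one object of {#MACRO} values per discovered entity. The Python 3 sketch below mirrors what --check=list_queues prints. The function name discovery_payload and the sample queue are assumptions for illustration.

```python
import json


def discovery_payload(queues):
    """Build the low-level discovery JSON for a list of queue dicts."""
    data = [{'{#VHOSTNAME}': q['vhost'], '{#QUEUENAME}': q['name']}
            for q in queues]
    return json.dumps({'data': data}, indent=4, separators=(',', ':'))


# Hand-written sample; the script takes this list from /api/queues.
payload = discovery_payload([{'vhost': '/', 'name': 'app000'}])
print(payload)
```

Item prototypes in the template can then reference the {#VHOSTNAME} and {#QUEUENAME} macros in their keys, so that each discovered queue receives the values pushed by zabbix_sender.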
IV. Adding the Zabbix Monitoring Template
See the attachment for the template.
References:
http://blog.thomasvandoren.com/monitoring-rabbitmq-queues-with-zabbix.html
http://www.rabbitmq.com/how.html#management
https://github.com/alfss/zabbix-rabbitmq
https://cdn.rawgit.com/rabbitmq/rabbitmq-management/rabbitmq_v3_6_0/priv/www/api/index.html
https://github.com/jasonmcintosh/rabbitmq-zabbix
http://chase-seibert.github.io/blog/2011/07/01/checking-rabbitmq-queue-sizeage-with-nagios.html
Reposted from: https://blog.51cto.com/john88wang/1745824