Monitoring an IBM V3700 storage device with Zabbix and a web scraper
Background
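For small and medium-sized businesses, Zabbix is an excellent monitoring tool. Recently I wanted to use it to monitor a storage device. Ordinary hardware (switches, servers, storage) can usually be monitored through IPMI or SNMP. SNMP in particular is exposed by many products, which makes collecting data easy, and templates for many common devices have already been written by experts online: download the template, import it, and it works. But there are always some "oddball" products. The IBM V3700 our company uses is one of them: at the SNMP level it only supports SNMP traps (the device generates an alert and pushes it to the server itself), so monitoring data cannot be pulled with snmp get/walk.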
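Online I found a solution someone wrote for monitoring the IBM V3700 with Zabbix 4.0, at "https://share.zabbix.com/storage-devices/ibm/ibm-storwize-v3700". Reading the Python script was a real inspiration. **It works by logging in to the storage device over SSH, running commands, formatting the output, and reporting the monitoring data to Zabbix through a Zabbix trapper item. What I most want to monitor right now is the status of the disks in the storage device, so that a failed disk triggers an alert.** However, I am not familiar with the storage commands. I skimmed the official documentation and did not find anything related to disk status.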
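Fortunately, most hardware devices of this kind expose a management interface over HTTP, where an administrator can log in and see the disk status. That suggested scraping the disk status data, formatting it, and sending it to the Zabbix server through a Zabbix trapper item. This post only demonstrates the general logic; the scraper is implemented in shell.
How it works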
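- Define a monitoring template.
- The most important item is ibm_3700_collection. Its key is ibm_3700_get_store_status.sh["{$STORAGE_IP}","{$USERNAME}","{$PASSWORD}"]
- Because the number of disks in the storage device can change, define a low-level discovery rule.
- Create the host and link the template to it.
- The script is named ibm_3700_get_store_status.sh. Mind its permissions: the user running the Zabbix process must be allowed to execute it. Place the script in the zabbix/share/zabbix/externalscripts directory on the Zabbix server.
- When the script runs, it scrapes the monitoring data, saves it in three temporary files under /tmp on the server host, and reports the data to the Zabbix server.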
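Because Zabbix hands the three macro values to the external script as positional parameters, it can help to fail fast when one of them is missing. The following is only a sketch: `check_args` is a hypothetical helper that is not part of the script shown later, and the usage text is my own wording.

```shell
#!/bin/sh
# Hypothetical guard, not part of the original script: verify that exactly
# three non-empty arguments (storage IP, username, password) were passed.
check_args() {
    if [ "$#" -ne 3 ]; then
        echo "usage: ibm_3700_get_store_status.sh STORAGE_IP USERNAME PASSWORD" >&2
        return 1
    fi
    for arg in "$@"; do
        if [ -z "$arg" ]; then
            echo "empty argument" >&2
            return 1
        fi
    done
    return 0
}

# Possible use at the top of the script:
#   check_args "$@" || exit 1
```

Messages go to stderr so that they do not end up mixed into the value the script prints on stdout.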
Script contents
- Monitoring script: ibm_3700_get_store_status.sh
[root@zabbixser externalscripts]# cat ibm_3700_get_store_status.sh
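#!/bin/bash
# Zabbix external check: scrape drive status from the IBM V3700 web GUI
# and push it to Zabbix with zabbix_sender.
export LANG=en_US.UTF-8
# Positional parameters supplied by the item key.
STORAGE_IP=$1
USERNAME=$2
PASSWORD=$3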
# Request the response headers of the GUI and keep the JSESSIONID cookie ("name=value").
get_jsessionid () {
    local url="https://${STORAGE_IP}"
    jsessionid=$(curl -k -I "${url}" 2> /dev/null | awk '/JSESSIONID/{print $2}' | cut -d ';' -f 1)
    if [[ ! ${jsessionid} ]];then
        echo "${url} failed."
        exit 1
    fi
}
# Log in with the session cookie and keep the auth cookie from the response headers.
get_auth () {
    local url="https://${STORAGE_IP}/login"
    local data="login=${USERNAME}&password=${PASSWORD}"
    get_jsessionid
    _auth=$(curl -b "${jsessionid}" -d "${data}" -k "${url}" -i 2> /dev/null | awk '/auth/{print $2}' | cut -d ';' -f 1)
    if [[ ! ${_auth} ]];then
        echo "login failed: wrong username or password."
        exit 1
    fi
}
# Call the GUI's RPC endpoint for the internal drive list and save the JSON response.
get_device () {
    local url="https://${STORAGE_IP}/RPCAdapter"
    # UTC timestamp for the request payload (the payload appends ".088Z").
    timestamp=$(date -u +%FT%H:%M:%S)
    local data='{"clazz":"com.ibm.evo.rpc.RPCRequest","methodClazz":"com.ibm.svc.gui.logic.PhysicalRPC","methodName":"getInternalDriveInfo","methodArgs":[],"guiUsage":[{"timestamp":"'${timestamp}'.088Z","event":"Fisheye Navigation Clicked","details":{"fisheyeNavLevel":2,"fisheyeNavLabel":"内部存储器"},"eventType":"fisheyeNavClick"}]}'
    curl -b "${jsessionid}" -b "${_auth}" -d "${data}" -k "${url}" -o "/tmp/ibm_3700_${STORAGE_IP}.tmp" 2> /dev/null
    [[ $? == 0 ]] || exit 1
}
# Build the low-level discovery JSON from the scraped file and send it to the "diskuid" key.
get_value_uid () {
    TMPFILE="/tmp/ibm_3700_${STORAGE_IP}.tmp"
    VALUEFILE="/tmp/ibm_3700_discovery${STORAGE_IP}.tmp"
    STORAGE_JSON='{"data": ['
    JSON_LENGTH=$(jq '.result.drives | length' ${TMPFILE})
    # Index of the last drive; seq below is inclusive.
    let JSON_LENGTH=${JSON_LENGTH}-1
    for ID in $(seq 0 ${JSON_LENGTH});do
        # jq prints the uid with its quotes, so it is pasted into the JSON as-is.
        NAME=$(jq ".result.drives | .[${ID}].uid" ${TMPFILE})
        ID_NAME='{"{#ID}": "'${ID}
        ID_NAME=${ID_NAME}'","{#NAME}": '${NAME}
        ID_NAME=${ID_NAME}'}'
        STORAGE_JSON="${STORAGE_JSON}${ID_NAME}"
        if [[ ${ID} != ${JSON_LENGTH} ]];then
            STORAGE_JSON="${STORAGE_JSON},"
        fi
    done
    STORAGE_JSON="${STORAGE_JSON}]}"
    # zabbix_sender input line: <host> <key> <value>
    echo -n "${STORAGE_IP} " > ${VALUEFILE}
    echo -n "diskuid " >> ${VALUEFILE}
    echo "${STORAGE_JSON}" >> ${VALUEFILE}
    /data/zabbix/bin/zabbix_sender -z 127.0.0.1 -i ${VALUEFILE} > /dev/null
    # Printed on stdout, which is what the external check item receives.
    echo 1
    sleep 2
    # Write one "<host> <key> <value>" line per drive for status, port1Status and port2Status.
    VALUEFILE="/tmp/ibm_3700_value${STORAGE_IP}.tmp"
    > ${VALUEFILE}
    for ID in $(seq 0 ${JSON_LENGTH});do
        STORAGE_STATUS=$(jq ".result.drives | .[${ID}].status" ${TMPFILE})
        echo -n "${STORAGE_IP} " >> ${VALUEFILE}
        echo -n "ibm_3700_get_status_[${ID}] " >> ${VALUEFILE}
        echo ${STORAGE_STATUS} >> ${VALUEFILE}
    done
    for ID in $(seq 0 ${JSON_LENGTH});do
        PORT1_STATUS=$(jq ".result.drives | .[${ID}].port1Status" ${TMPFILE})
        echo -n "${STORAGE_IP} " >> ${VALUEFILE}
        echo -n "ibm_3700_get_port1Status_[${ID}] " >> ${VALUEFILE}
        # Report the port status itself, not the drive status read in the first loop.
        echo ${PORT1_STATUS} >> ${VALUEFILE}
    done
    for ID in $(seq 0 ${JSON_LENGTH});do
        PORT2_STATUS=$(jq ".result.drives | .[${ID}].port2Status" ${TMPFILE})
        echo -n "${STORAGE_IP} " >> ${VALUEFILE}
        echo -n "ibm_3700_get_port2Status_[${ID}] " >> ${VALUEFILE}
        echo ${PORT2_STATUS} >> ${VALUEFILE}
    done
    /data/zabbix/bin/zabbix_sender -z 127.0.0.1 -i ${VALUEFILE} > /dev/null
}
get_auth
get_device
get_value_uid
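The awk and cut pipeline used for the cookies matches any header line that contains the search word. A slightly stricter variant could look only at Set-Cookie headers. This is a sketch under assumptions: `extract_cookie` is a hypothetical helper, and the sample header in the comment is made up, not captured from a real V3700.

```shell
#!/bin/sh
# Hypothetical helper: read HTTP response headers on stdin and print the
# "name=value" part of the first Set-Cookie header whose cookie name is $1.
extract_cookie() {
    tr -d '\r' | awk -v n="$1" '
        tolower($1) == "set-cookie:" && index($2, n "=") == 1 {
            sub(/;.*/, "", $2)   # drop attributes such as Path or Secure
            print $2
            exit
        }'
}

# Possible use (the header below is an invented example):
#   Set-Cookie: JSESSIONID=abc123; Path=/; Secure
#   jsessionid=$(curl -k -I "https://${STORAGE_IP}" 2>/dev/null | extract_cookie JSESSIONID)
```

The tr call removes the carriage returns that end HTTP header lines, so a cookie without attributes does not come back with a trailing control character.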
- Scraped temporary file: ibm_3700_192.168.40.4.tmp (excerpt showing a single drive object)
{
"clazz": "com.ibm.svc.devicelayer.api.output.TBirdDriveWithClassBean",
"driveClass": "io_grp0-XXXXXXXXXXXXXXXXXXXXXXX",
"id": 6,
"errorSequenceNumber": null,
"uid": "XXXXXXXXXXXX",
"capacity": 899647799296,
"blockSize": 512,
"vendorId": "IBM-A040",
"productId": "XXXXXXXXX ",
"fruPartNumber": "XXXXXX",
"fruIdentity": "XXXXXXXXXXXXXXXXX",
"rpm": 10000,
"firmwareLevel": "B56M",
"fpgaLevel": "",
"mdiskId": 0,
"mdiskName": "mdisk0",
"memberId": 6,
"enclosureId": 1,
"slotId": 7,
"nodeId": null,
"nodeName": "",
"quorumId": null,
"wasSpare": false,
"encrypted": false,
"interfaceSpeed": "6Gb",
"status": "online",
"use": "member",
"techType": "sas_hdd",
"port1Status": "online",
"port2Status": "online",
"healthState": null,
"autoManage": "inactive"
}
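Given drive objects of this shape, the discovery JSON and the value lines could also be produced with one jq call each instead of one call per drive. This is a sketch rather than the script's actual method: `build_lld` and `build_values` are hypothetical names, and it assumes the scraped file holds a `.result.drives` array, as the script's own jq filters do.

```shell
#!/bin/sh
# Hypothetical helpers; both assume the file contains an object with a
# .result.drives array.

# Print the low-level discovery JSON for all drives in file $1.
build_lld() {
    jq -c '.result.drives as $d
           | {data: [range(0; $d | length) as $i
                     | {"{#ID}": ($i | tostring), "{#NAME}": ($d[$i].uid)}]}' "$1"
}

# Print zabbix_sender input lines ("<host> <key> <value>") for host $1,
# reading the drives from file $2. tojson keeps the quotes around strings.
build_values() {
    jq -r --arg h "$1" '.result.drives as $d
        | ("status", "port1Status", "port2Status") as $f
        | range(0; $d | length) as $i
        | "\($h) ibm_3700_get_\($f)_[\($i)] \($d[$i][$f] | tojson)"' "$2"
}

# Possible use:
#   echo "${STORAGE_IP} diskuid $(build_lld "${TMPFILE}")" > "${VALUEFILE}"
```

Letting jq assemble the JSON avoids the manual string concatenation and the comma bookkeeping in the loop.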
- Temporary file for low-level discovery: ibm_3700_discovery192.168.40.4.tmp
[root@zabbixser tmp]# cat ibm_3700_discovery192.168.40.4.tmp
192.168.40.4 diskuid {"data": [{"{#ID}": "0","{#NAME}": "5000c5006b9d4f07"},{"{#ID}": "1","{#NAME}": "5000c5006bd4be6f"},{"{#ID}": "2","{#NAME}": "5000c5006b97b907"},{"{#ID}": "3","{#NAME}": "5000c5006b98e7af"},{"{#ID}": "4","{#NAME}": "5000c5006b9b8c7f"},{"{#ID}": "5","{#NAME}": "5000c5006bd21917"},{"{#ID}": "6","{#NAME}": "5000c5006b991e6b"}]}
- Temporary file for monitoring data: ibm_3700_value192.168.40.4.tmp
[root@zabbixser tmp]# cat ibm_3700_value192.168.40.4.tmp
192.168.40.4 ibm_3700_get_status_[0] "online"
192.168.40.4 ibm_3700_get_status_[1] "online"
192.168.40.4 ibm_3700_get_status_[2] "online"
192.168.40.4 ibm_3700_get_status_[3] "online"
192.168.40.4 ibm_3700_get_status_[4] "online"
192.168.40.4 ibm_3700_get_status_[5] "online"
192.168.40.4 ibm_3700_get_status_[6] "online"
192.168.40.4 ibm_3700_get_port1Status_[0] "online"
192.168.40.4 ibm_3700_get_port1Status_[1] "online"
192.168.40.4 ibm_3700_get_port1Status_[2] "online"
192.168.40.4 ibm_3700_get_port1Status_[3] "online"
192.168.40.4 ibm_3700_get_port1Status_[4] "online"
192.168.40.4 ibm_3700_get_port1Status_[5] "online"
192.168.40.4 ibm_3700_get_port1Status_[6] "online"
192.168.40.4 ibm_3700_get_port2Status_[0] "online"
192.168.40.4 ibm_3700_get_port2Status_[1] "online"
192.168.40.4 ibm_3700_get_port2Status_[2] "online"
192.168.40.4 ibm_3700_get_port2Status_[3] "online"
192.168.40.4 ibm_3700_get_port2Status_[4] "online"
192.168.40.4 ibm_3700_get_port2Status_[5] "online"
192.168.40.4 ibm_3700_get_port2Status_[6] "online"
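When testing by hand, a quick look at a value file like this one can tell whether any drive is not online before the data reaches Zabbix. A minimal sketch, assuming the three-column layout shown above; `count_not_online` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: count drive status lines in value file $1 whose
# value is anything other than "online" (quotes included, as jq writes them).
count_not_online() {
    awk '$2 ~ /^ibm_3700_get_status_\[/ && $3 != "\"online\"" { n++ }
         END { print n + 0 }' "$1"
}

# Possible use:
#   count_not_online /tmp/ibm_3700_value192.168.40.4.tmp
```

Only the ibm_3700_get_status_ keys are counted; the port status lines are ignored here.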
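Notes:
- Because the script's run time is unpredictable, consider increasing the timeout of the Zabbix server.
- Mind the script's permissions.
- The template leaves plenty of room for improvement, for example converting "online" to a numeric 0/1 or adding a trigger.
- Many hardware devices provide this kind of management interface; this post only offers the general approach.
- Because the password is at risk of exposure, it is best to create an ordinary user dedicated to monitoring.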
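As one way to do the numeric conversion mentioned in the notes, the status string could be mapped before it is written to the value file. This is a sketch with an assumed convention (1 for online, 0 for anything else); `status_to_num` is a hypothetical helper that the script above does not contain.

```shell
#!/bin/sh
# Hypothetical helper: map a status string to a number.
# Assumed convention: 1 = online, 0 = any other state.
status_to_num() {
    # Strip the double quotes that jq leaves around string values.
    s=$(printf '%s' "$1" | tr -d '"')
    case "$s" in
        online) echo 1 ;;
        *)      echo 0 ;;
    esac
}

# Possible use inside the status loop:
#   echo "$(status_to_num "${STORAGE_STATUS}")" >> "${VALUEFILE}"
```

With numeric values, the item prototypes in the template would need a numeric type of information, and a trigger could then compare the last value against 1.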