A simple script: match the word ERROR in a log, print a specific output string, and alert through a monitoring agent

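This post shares a way to monitor a specific log for ERROR entries in near real time and to print a custom alert message from a shell script. Because the log file is large and I/O cost matters, I chose a shell script over a Python implementation.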

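The requirement: scan a specific log in near real time (the monitoring system runs the scan once per minute), match ERROR entries in it, and print a custom string. The monitoring agent matches that string and decides whether to raise an alert. This post covers only the script; I will write about the monitoring mechanism another time.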

The Python script I originally prepared:

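The environment is Red Hat 5.9 with Python 2.4.4, so `with open` cannot be used (the `with` statement only arrived in Python 2.5). The script falls back to try/finally instead.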

#encoding:utf-8
import re


# with open('/Users/yahaha/Desktop/ums-gateway.log') as f:
#     t1 = time.time()
#     logs = f.read()
#     # print(logs)

f = None
try:
    f = open('/Users/yahaha/Desktop/ums-gateway.log')
    logs = f.read()

finally:
    if f is not None:
        f.close()

pattern = re.compile(r'ERROR')
# search() stops at the first match, which is all the check needs
if pattern.search(logs):
    print('error')


The shell script:

#!/bin/bash

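find_error(){
    # Scan only the last 20000 lines; grep -q exits 0 as soon as ERROR is found
    if tail -n 20000 /Users/yahaha/Desktop/ums-gateway.log | grep -q "ERROR"; then
        # Alert string matched by the monitoring agent ("SMS channel abnormal")
        echo "短信通道异常"
    else
        echo "ok"
    fi
}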

find_error
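
The script above hard-codes the log path and the line count. As a minimal sketch, the same check could be parameterized as shown below. The function name `check_log`, its arguments, and the extra "log unreadable" output are all assumptions added for illustration and are not part of the original script. The third output exists only so that a missing log is not reported as "ok".

```shell
#!/bin/bash

# Hypothetical helper (not in the original script):
#   check_log FILE [LINES] [PATTERN]
# Prints the alert string if PATTERN occurs in the last LINES lines of FILE.
check_log() {
    local file=$1
    local lines=${2:-20000}     # same window as the original script
    local pattern=${3:-ERROR}

    # Assumption: report an unreadable log separately instead of "ok".
    if [ ! -r "$file" ]; then
        echo "log unreadable"
        return 0
    fi

    # grep -q prints nothing and exits 0 on the first match.
    if tail -n "$lines" "$file" | grep -q -- "$pattern"; then
        echo "短信通道异常"
    else
        echo "ok"
    fi
}
```

Calling `check_log /Users/yahaha/Desktop/ums-gateway.log` would then behave like `find_error`, while a different file or pattern can be passed without editing the function.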

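The log file is fairly large: it is cleaned up every 10 minutes and stays at around 2 GB. The Python script above reads the whole file into memory with f.read(), whereas the shell script only looks at the last 20000 lines through tail. Given the I/O cost and the time needed to open and read the file, I went with the shell script.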
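
A line count does not put a fixed limit on how many bytes are read, because lines can be long. If a hard cap on I/O is wanted, one option is to limit the scan by bytes with `tail -c`. The sketch below is only an illustration: the function name `scan_tail_bytes` and the 10 MiB default are assumptions, not values from the original setup. The first line inside the byte window may be cut off, which is harmless when matching a short token such as ERROR.

```shell
#!/bin/bash

# Hypothetical helper (not in the original script):
#   scan_tail_bytes FILE [BYTES]
# Scans only the last BYTES bytes of FILE for ERROR.
scan_tail_bytes() {
    local file=$1
    local bytes=${2:-10485760}   # 10 MiB, an assumed I/O budget

    # tail -c N outputs the last N bytes (or the whole file if it is smaller).
    if tail -c "$bytes" "$file" 2>/dev/null | grep -q "ERROR"; then
        echo "短信通道异常"
    else
        echo "ok"
    fi
}
```

With this variant, an ERROR line that lies outside the byte window is ignored, so the window should be sized to cover at least the amount of log written between two scans.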

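Monitoring configuration: a monitoring agent is installed on the host. The agent distributes the script automatically, and a threshold set on the value the script returns determines whether an alert is raised.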