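Log checking is one of the monitoring methods we use most often. It requires a Nagios plugin. The check_log plugin bundled with Nagios is fairly limited, so we use check_logfiles from ConSol Labs instead: it handles rotated logs and supports macro definitions and regular expressions, which makes our monitoring more flexible.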
I. Installation
1. Install check_logfiles
- tar -zxvf check_logfiles-3.6.3.tar.gz
- cd /usr/local/src/check_logfiles-3.6.3
- ./configure --prefix=/usr/local/nagios/ --with-nagios-user=nagios --with-nagios-group=nagios --with-seekfiles-dir=/usr/local/nagios/var/tmp --with-protocols-dir=/usr/local/nagios/var/tmp --with-perl=/usr/bin/perl --with-gzip=/bin/gzip
- make
At this point the build may fail with:
- CDPATH="${ZSH_VERSION+.}:" && cd . && /bin/sh /usr/local/src/check_logfiles-3.6.3/missing autoconf
- aclocal.m4:21: warning: this file was generated for autoconf 2.69.
- You have another version of autoconf. It may work, but is not guaranteed to.
- If you have problems, you may need to regenerate the build system entirely.
- To do so, use the procedure documented by the package, typically 'autoreconf'.
- configure.ac:4: error: Autoconf version 2.65 or higher is required
- aclocal.m4:278: AM_INIT_AUTOMAKE is expanded from...
- configure.ac:4: the top level
- autom4te: /usr/bin/m4 failed with exit status: 63
- WARNING: 'autoconf' is probably too old.
- You should only need it if you modified 'configure.ac',
- or m4 files included by it.
- The 'autoconf' program is part of the GNU Autoconf package:
- <http://www.gnu.org/software/autoconf/>
- It also requires GNU m4 and Perl in order to run:
- <http://www.gnu.org/software/m4/>
- <http://www.perl.org/>
- make: *** [configure] Error 63
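This is caused by the autoconf version on the server. As the message "aclocal.m4:21: warning: this file was generated for autoconf 2.69." indicates, the build files were generated with autoconf 2.69, and configure.ac requires at least 2.65, whereas our version is: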
- [root@nagios monitors]# /usr/bin/autoconf -V
- autoconf (GNU Autoconf) 2.63
- Copyright (C) 2008 Free Software Foundation, Inc.
- License GPLv2+: GNU GPL version 2 or later
- <http://gnu.org/licenses/old-licenses/gpl-2.0.html>
- This is free software: you are free to change and redistribute it.
- There is NO WARRANTY, to the extent permitted by law.
-
- Written by David J. MacKenzie and Akim Demaille.
So we need to upgrade autoconf to 2.69.
2. Install autoconf
- [root@test src]# wget http://ftp.gnu.org/gnu/autoconf/autoconf-2.69.tar.gz
- [root@test src]# tar -zxvf autoconf-2.69.tar.gz
- [root@test src]# cd autoconf-2.69
- [root@test src]# ./configure --prefix=/usr
- [root@test src]# make && make install
Note: autoconf must be installed under /usr; otherwise the check_logfiles build will not pick up the new version.
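To confirm that the upgrade took effect before rebuilding, the installed version can be compared against the minimum from the error message. The following is a minimal sketch, assuming GNU `sort` with version sort (`-V`) is available; the helper names `version_ge` and `autoconf_version` are our own and not part of autoconf.

```shell
#!/bin/sh
# Succeeds (exit 0) when version $1 is greater than or equal to version $2.
# Assumes GNU coreutils `sort -V` (version sort) is available.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Take the last field of the first line of `autoconf -V`,
# e.g. "autoconf (GNU Autoconf) 2.69" -> "2.69".
autoconf_version() {
    autoconf -V 2>/dev/null | head -n 1 | awk '{print $NF}'
}

required=2.65   # minimum demanded by configure.ac in the error above
if command -v autoconf >/dev/null 2>&1; then
    installed=$(autoconf_version)
    if version_ge "$installed" "$required"; then
        echo "autoconf $installed is new enough"
    else
        echo "autoconf $installed is older than $required"
    fi
fi
```

If the old version is still reported, the new autoconf was probably installed under a different prefix than /usr.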
3. Compile and install check_logfiles
- make && make install
When the installation finishes, the check_logfiles plugin is placed in /usr/local/nagios/libexec, and we need to set its ownership:
- chown nagios:nagios /usr/local/nagios/libexec/check_logfiles
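We also need to check whether the directory /usr/local/nagios/var/tmp exists and create it if it does not, because that is where we told configure to put the seekfiles and protocols directories.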
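The directory check described above can be sketched as a small helper. `ensure_dir` is a hypothetical name of ours; handing the directory to nagios:nagios is an assumption based on the user and group passed to configure earlier.

```shell
#!/bin/sh
# Create the directory if it is missing; optionally hand it to an owner.
# Usage: ensure_dir DIR [OWNER:GROUP]
ensure_dir() {
    dir=$1
    owner=${2:-}
    [ -d "$dir" ] || mkdir -p "$dir" || return 1
    if [ -n "$owner" ]; then
        chown "$owner" "$dir" || return 1
    fi
}

# On the monitored host this would be run as root, for example:
#   ensure_dir /usr/local/nagios/var/tmp nagios:nagios
```

Calling it a second time is harmless, since an existing directory is left as it is.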
That completes the installation.
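II. Configuration
First, let's look at the help text that ships with check_logfiles (the comments after the settings are our own notes):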
- [root@nagios src]# /usr/local/nagios/libexec/check_logfiles -h
- This Nagios Plugin comes with absolutely NO WARRANTY. You may use
- it on your own risk!
- Copyright by ConSol Software GmbH, Gerhard Lausser.
-
- This plugin looks for patterns in logfiles, even in those who were rotated
- since the last run of this plugin.
-
- You can find the complete documentation at
- http://labs.consol.de/nagios/check_logfiles/
-
- Usage: check_logfiles [-t timeout] -f <configfile>
-
- The configfile looks like this:
-
- $seekfilesdir = '/opt/nagios/var/tmp'; # note: directory for state information; it records how much of each log has already been checked, like a history
- # where the state information will be saved.
-
- $protocolsdir = '/opt/nagios/var/tmp'; # note: directory for protocol files, which record the matches found during log checks
- # where protocols with found patterns will be stored.
-
- $scriptpath = '/opt/nagios/var/tmp'; # note: where callable scripts or programs live
- # where scripts will be searched for.
-
- $MACROS = { CL_DISK01 => "/dev/dsk/c0d1", CL_DISK02 => "/dev/dsk/c0d2" }; # note: macro definitions, i.e. variables we can reference
-
- @searches = ( # note: the search definitions; they can be kept in a config file like this or given directly on the command line, and a config file is more convenient
- {
- tag => 'temperature', # note: a custom label; it becomes part of the state and protocol file names and has no other meaning
- logfile => '/var/adm/syslog/syslog.log', # note: the log file to monitor
- rotation => 'bmwhpux', # note: if the log is rotated, defines how the rotated files are matched
- criticalpatterns => ['OVERTEMP_EMERG', 'Power supply failed'], # note: critical errors; one or more regular expressions
- warningpatterns => ['OVERTEMP_CRIT', 'Corrected ECC Error'], # note: warnings; one or more regular expressions
- options => 'script,protocol,nocount', # note: option list: run a script, write a protocol, do not count, and so on
- script => 'sendnsca_cmd' # note: name of the script
- },
- {
- tag => 'scsi',
- logfile => '/var/adm/messages',
- rotation => 'solaris',
- criticalpatterns => 'Sense Key: Not Ready',
- criticalexceptions => 'Sense Key: Not Ready /dev/testdisk',
- options => 'noprotocol'
- },
- {
- tag => 'logins',
- logfile => '/var/adm/messages',
- rotation => 'solaris',
- criticalpatterns => ['illegal key', 'read error.*$CL_DISK01$'],
- criticalthreshold => 4
- warningpatterns => ['read error.*$CL_DISK02$'],
- }
- );
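The example above puts every setting into a configuration file; the same settings can also be passed on the command line. The two invocation styles are: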
- [root@nagios src]# /usr/local/nagios/libexec/check_logfiles
- Usage: check_logfiles [-t timeout] -f <configfile> [--searches=tag1,tag2,...]
- check_logfiles [-t timeout] --logfile=<logfile> --tag=<tag> --rotation=<rotation>
- --criticalpattern=<regexp> --warningpattern=<regexp>
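As a rough illustration of the command-line style, the sketch below runs the plugin with the options listed in the usage text and translates the exit code using the usual Nagios plugin convention (0 OK, 1 WARNING, 2 CRITICAL, anything else UNKNOWN). The tag, log file and patterns are illustrative values only, `state_name` is a helper of our own, and we assume here that the rotation option can be left out for a log that is not rotated.

```shell
#!/bin/sh
# Map a Nagios plugin exit code to its conventional state name.
state_name() {
    case "$1" in
        0) echo OK ;;
        1) echo WARNING ;;
        2) echo CRITICAL ;;
        *) echo UNKNOWN ;;
    esac
}

# Install path used earlier in this article.
PLUGIN=/usr/local/nagios/libexec/check_logfiles

# Only attempt the call when the plugin is actually installed.
if [ -x "$PLUGIN" ]; then
    "$PLUGIN" --tag=web_monitor --logfile=/var/log/web_monitor.log \
        --criticalpattern='nginx is down' --warningpattern='502'
    rc=$?
    echo "state: $(state_name "$rc")"
fi
```

The command-line form is handy for a quick test; for several searches or longer pattern lists the configuration file stays easier to maintain.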
III. Usage
1. On the monitored host, create a configuration file, for example:
- [root@usvr-218 var]# vim /usr/local/nagios/var/log.cfg
- @searches = (
- {
- tag => 'web_monitor',
- logfile => '/var/log/web_monitor.log',
- criticalpatterns => ['nginx has restart','nginx is down'],
- warningpatterns => ['500','302','502']
- #options => 'noprotocol'
- }
- );
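Because the patterns are regular expressions matched anywhere in a line, it can help to preview how sample lines would be classified before deploying the configuration. The sketch below imitates the two pattern lists with `grep -E`; it is not the plugin's own matching code, the function name `classify_line` and the sample lines are ours, and it assumes that a critical match takes precedence over a warning match.

```shell
#!/bin/sh
# Classify one log line using the two pattern lists from log.cfg.
# Assumption: critical patterns are checked before warning patterns.
classify_line() {
    line=$1
    if printf '%s\n' "$line" | grep -Eq 'nginx has restart|nginx is down'; then
        echo CRITICAL
    elif printf '%s\n' "$line" | grep -Eq '500|302|502'; then
        echo WARNING
    else
        echo OK
    fi
}

classify_line 'nginx is down'               # prints CRITICAL
classify_line 'GET /index.html 502'         # prints WARNING
classify_line 'GET /index.html 200 15002'   # prints WARNING: "15002" contains "500"
```

The last line shows that a bare number such as '500' also matches inside byte counts or timestamps, so anchoring the pattern to the surrounding text of the real log format may be worth considering.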
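We defined a tag, web_monitor, and the log file to check, /var/log/web_monitor.log. A log line that matches one of the criticalpatterns raises a critical error, and a line that matches one of the warningpatterns raises a warning. State and protocol information is written to /usr/local/nagios/var/tmp, for example to
log._var_log_web_monitor.log.web_monitor, where web_monitor is the tag from our configuration: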
- [root@usvr-218 tmp]# cat log._var_log_web_monitor.log.web_monitor
- $state = {
- 'runcount' => 17,
- 'serviceoutput' => '',
- 'logoffset' => 642985,
- 'runtime' => 1431504819,
- 'devino' => '64768:1178440',
- 'privatestate' => {
- 'runcount' => 17,
- 'lastruntime' => 1431504220,
- 'logfile' => '/var/log/web_monitor.log'
- },
- 'logtime' => 1431504602,
- 'servicestateid' => 0,
- 'tag' => 'web_monitor'
- };
-
-
- 1;
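For a quick look at how far the plugin has read, a numeric field such as logoffset can be pulled out of the state file with sed. This is only a sketch: the file is really Perl data, so this is not a full parser, `seek_value` is a helper name of our own, and reading logoffset as the byte position reached in the log is our assumption.

```shell
#!/bin/sh
# Print the value of a numeric key (e.g. logoffset) from a seek file.
# Only the first occurrence is reported; keys such as runcount appear twice.
seek_value() {
    key=$1
    file=$2
    sed -n "s/^[^']*'$key' => \([0-9][0-9]*\),\{0,1\}\$/\1/p" "$file" | head -n 1
}

# Example on the monitored host (paths taken from the listing above):
#   f=/usr/local/nagios/var/tmp/log._var_log_web_monitor.log.web_monitor
#   offset=$(seek_value logoffset "$f")
#   size=$(wc -c < /var/log/web_monitor.log)
#   echo "bytes not yet checked: $((size - offset))"
```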
Once check_logfiles is configured on the monitored host, we still need to add a command to nrpe.cfg and reload the service:
- command[check_logfile]=/usr/local/nagios/libexec/check_logfiles -f /usr/local/nagios/var/log.cfg
-
- service xinetd reload
2. With the monitored host configured, let's turn to the monitoring server
- define service{
- use nrpe-service ; Name of service template to use
- host_name test
- service_description web_monitor
- check_command check_nrpe!check_logfile
- check_interval 10
- notifications_enabled 1
- service_groups logfile_check
- contact_groups test
- }
After restarting Nagios, the new service check shows up.
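If the service does not appear or stays in an unknown state, the NRPE command can be tried by hand from the monitoring server. The sketch below wraps check_nrpe with its `-H` (host) and `-c` (command) options; the install path of check_nrpe and the host address are assumptions, and `nrpe_check` is a helper name of our own.

```shell
#!/bin/sh
# Path to check_nrpe on the monitoring server (assumed; adjust as needed).
CHECK_NRPE=${CHECK_NRPE:-/usr/local/nagios/libexec/check_nrpe}

# Run an NRPE command on a remote host, print the plugin output, and
# return the plugin's exit code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).
nrpe_check() {
    host=$1
    cmd=$2
    "$CHECK_NRPE" -H "$host" -c "$cmd"
}

# Example (hypothetical address of the monitored host):
#   nrpe_check 192.0.2.10 check_logfile; echo "exit code: $?"
```

The command name passed with `-c` has to match the name in square brackets in nrpe.cfg, which is check_logfile in this article.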