Log monitoring with Nagios and check_logfiles

samuel-preamble

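Log checking is one of the monitoring methods we use most often, and it requires a Nagios plugin. The log-checking plugin that ships with Nagios (check_log) is fairly limited, so we use check_logfiles from ConSol Labs instead. It can handle rotated logs and supports macros and regular expressions, which makes the monitoring more flexible.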

I. Installation

1. Install check_logfiles

 tar -zxvf check_logfiles-3.6.3.tar.gz   
 cd /usr/local/src/check_logfiles-3.6.3
 ./configure --prefix=/usr/local/nagios/ --with-nagios-user=nagios --with-nagios-group=nagios --with-seekfiles-dir=/usr/local/nagios/var/tmp --with-protocols-dir=/usr/local/nagios/var/tmp --with-perl=/usr/bin/perl --with-gzip=/bin/gzip  
 make  

At this point make may fail with an error like this:

 CDPATH="${ZSH_VERSION+.}:" && cd . && /bin/sh /usr/local/src/check_logfiles-3.6.3/missing autoconf  
 aclocal.m4:21: warning: this file was generated for autoconf 2.69.  
 You have another version of autoconf.  It may work, but is not guaranteed to.  
 If you have problems, you may need to regenerate the build system entirely.  
 To do so, use the procedure documented by the package, typically 'autoreconf'.  
 configure.ac:4: error: Autoconf version 2.65 or higher is required  
 aclocal.m4:278: AM_INIT_AUTOMAKE is expanded from...  
 configure.ac:4: the top level  
 autom4te: /usr/bin/m4 failed with exit status: 63  
 WARNING: 'autoconf' is probably too old.  
          You should only need it if you modified 'configure.ac',  
          or m4 files included by it.  
          The 'autoconf' program is part of the GNU Autoconf package:  
          <http://www.gnu.org/software/autoconf/>  
          It also requires GNU m4 and Perl in order to run:  
          <http://www.gnu.org/software/m4/>  
          <http://www.perl.org/>  
 make: *** [configure] 错误 63  

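This is caused by the autoconf version on the server. As the message "aclocal.m4:21: warning: this file was generated for autoconf 2.69." says, the build files were generated with autoconf 2.69, and configure.ac requires version 2.65 or higher, whereas our installed version is: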

 [root@nagios monitors]# /usr/bin/autoconf -V  
 autoconf (GNU Autoconf) 2.63  
 Copyright (C) 2008 Free Software Foundation, Inc.  
 License GPLv2+: GNU GPL version 2 or later  
 <http://gnu.org/licenses/old-licenses/gpl-2.0.html>  
 This is free software: you are free to change and redistribute it.  
 There is NO WARRANTY, to the extent permitted by law.  
   
 Written by David J. MacKenzie and Akim Demaille.  

So we need to upgrade autoconf to 2.69.
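Before rebuilding, it can help to confirm whether the installed autoconf meets the minimum that configure.ac asks for (2.65 in the error message). The following is a minimal shell sketch, assuming a `sort` that supports version sorting with `-V` (GNU coreutils does). The helper names `parse_autoconf_version` and `version_ge` are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helpers, not part of autoconf or check_logfiles.

# Read `autoconf -V` output on stdin and print the version, i.e. the
# last field of the first line ("autoconf (GNU Autoconf) 2.63" -> "2.63").
parse_autoconf_version() {
    awk 'NR == 1 { print $NF }'
}

# Succeed if version $1 is greater than or equal to version $2.
# Relies on the version sort (-V) option of sort.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

required="2.65"   # minimum named in the configure.ac error message
if command -v autoconf >/dev/null 2>&1; then
    installed="$(autoconf -V | parse_autoconf_version)"
    if version_ge "$installed" "$required"; then
        echo "autoconf $installed is new enough"
    else
        echo "autoconf $installed is older than $required, upgrade needed"
    fi
fi
```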

2. Install autoconf

 [root@test src]# wget http://ftp.gnu.org/gnu/autoconf/autoconf-2.69.tar.gz
 [root@test src]# tar -zxvf autoconf-2.69.tar.gz
 [root@test src]# cd autoconf-2.69
 [root@test autoconf-2.69]# ./configure --prefix=/usr
 [root@test autoconf-2.69]# make && make install

Note: install it with --prefix=/usr so that it replaces the old /usr/bin/autoconf. Otherwise the check_logfiles build will not use the new autoconf.

3. Build and install check_logfiles

make && make install

After installation, the check_logfiles plugin is located in /usr/local/nagios/libexec. We need to set its ownership:

chown nagios:nagios /usr/local/nagios/libexec/check_logfiles

Also check that the directory /usr/local/nagios/var/tmp exists, and create it if it does not, because we configured it earlier as the seekfiles and protocols directory.
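The directory check can be sketched as a small shell function. The name `ensure_state_dir` is made up for illustration; the function only creates the directory and does not change ownership.

```shell
#!/bin/sh
# Hypothetical helper: make sure the seekfiles/protocols directory exists.
# Prints whether the directory was already there or had to be created.
ensure_state_dir() {
    dir="$1"
    if [ -d "$dir" ]; then
        echo "exists: $dir"
    else
        mkdir -p "$dir" && echo "created: $dir"
    fi
}
```

On the host from this article it would be called as `ensure_state_dir /usr/local/nagios/var/tmp`. Since the plugin runs as the nagios user and writes its state files there, a `chown nagios:nagios` on the directory is probably needed as well.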

That completes the installation.

II. Configuration

First, let's look at the help text that comes with check_logfiles:

 [root@nagios src]# /usr/local/nagios/libexec/check_logfiles -h  
 This Nagios Plugin comes with absolutely NO WARRANTY. You may use  
 it on your own risk!  
 Copyright by ConSol Software GmbH, Gerhard Lausser.  
   
 This plugin looks for patterns in logfiles, even in those who were rotated  
 since the last run of this plugin.  
   
 You can find the complete documentation at   
 http://labs.consol.de/nagios/check_logfiles/  
   
 Usage: check_logfiles [-t timeout] -f <configfile>  
   
 The configfile looks like this:  
   
 $seekfilesdir = '/opt/nagios/var/tmp';   # directory for state information; it records how far each log has already been checked, like a history
 # where the state information will be saved.

 $protocolsdir = '/opt/nagios/var/tmp';   # directory for protocol files; it records the lines that matched during a check
 # where protocols with found patterns will be stored.

 $scriptpath = '/opt/nagios/var/tmp';     # where the scripts or programs that can be called are located
 # where scripts will be searched for.

 $MACROS = { CL_DISK01 => "/dev/dsk/c0d1", CL_DISK02 => "/dev/dsk/c0d2" };   # macro definitions, i.e. variables we can reference

 @searches = (   # the searches themselves; they can be given in a config file or directly on the command line, and a config file is more convenient
   {
     tag => 'temperature',   # a user-defined label; it becomes part of the state and protocol file names (and is what --searches selects by) and has no other meaning
     logfile => '/var/adm/syslog/syslog.log',   # the log file to monitor
     rotation => 'bmwhpux',   # if the log is rotated, defines how the rotated files are matched
     criticalpatterns => ['OVERTEMP_EMERG', 'Power supply failed'],   # critical errors; one or more regular expressions
     warningpatterns => ['OVERTEMP_CRIT', 'Corrected ECC Error'],   # warnings; one or more regular expressions
     options => 'script,protocol,nocount',   # option list: run a script, write a protocol, do not count, and so on
     script => 'sendnsca_cmd'   # name of the script
   },  
   {  
     tag => 'scsi',  
     logfile => '/var/adm/messages',  
     rotation => 'solaris',  
     criticalpatterns => 'Sense Key: Not Ready',  
     criticalexceptions => 'Sense Key: Not Ready /dev/testdisk',  
     options => 'noprotocol'  
   },  
   {  
     tag => 'logins',  
     logfile => '/var/adm/messages',  
     rotation => 'solaris',  
     criticalpatterns => ['illegal key', 'read error.*$CL_DISK01$'],  
     criticalthreshold => 4,
     warningpatterns => ['read error.*$CL_DISK02$'],  
   }  
 );  

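The example above puts all settings into a configuration file. They can also be passed directly on the command line. The two ways of calling the plugin are: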

 [root@nagios src]# /usr/local/nagios/libexec/check_logfiles  
 Usage: check_logfiles [-t timeout] -f <configfile> [--searches=tag1,tag2,...]  
        check_logfiles [-t timeout] --logfile=<logfile> --tag=<tag> --rotation=<rotation>  
                       --criticalpattern=<regexp> --warningpattern=<regexp>  

III. Usage

1. On the monitored host, create a configuration file, for example:

 [root@usvr-218 var]# vim /usr/local/nagios/var/log.cfg  
 @searches = (  
     {  
         tag => 'web_monitor',  
         logfile => '/var/log/web_monitor.log',  
         criticalpatterns => ['nginx has restart','nginx is down'],  
         warningpatterns => ['500','302','502']  
         #options => 'noprotocol'  
     }  
 );  
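The patterns in a search are regular expressions and are not anchored, so a short pattern such as '500' also matches inside longer numbers. Before deploying a configuration, it can be worth trying the patterns on a few sample lines. The sketch below approximates the matching with `grep -E`. check_logfiles itself uses Perl regular expressions, and this simplified helper reports only the most severe match per line. The function `classify_line` and the sample lines are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: classify one log line with the patterns from the
# web_monitor search. This is a grep-based approximation, not the plugin.
classify_line() {
    line="$1"
    if printf '%s\n' "$line" | grep -Eq 'nginx has restart|nginx is down'; then
        echo CRITICAL
    elif printf '%s\n' "$line" | grep -Eq '500|302|502'; then
        echo WARNING
    else
        echo OK
    fi
}

classify_line 'nginx is down'          # matches a critical pattern
classify_line 'GET /index.html 502'    # matches a warning pattern
classify_line 'sent 15000 bytes'       # also a warning: "500" occurs inside "15000"
classify_line 'GET /index.html 200'    # no pattern matches
```

If matches like the third one are unwanted, a tighter pattern (for example one that requires spaces around the status code) would avoid them. The right pattern depends on the actual log format.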

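We defined a search with the tag web_monitor that checks the log file /var/log/web_monitor.log. A line that matches one of the criticalpatterns raises a CRITICAL state, and a line that matches one of the warningpatterns raises a WARNING. State and protocol information is written to /usr/local/nagios/var/tmp, for example to the file

log._var_log_web_monitor.log.web_monitor, where web_monitor is the tag from our configuration.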

 [root@usvr-218 tmp]# cat log._var_log_web_monitor.log.web_monitor   
 $state = {  
            'runcount' => 17,  
            'serviceoutput' => '',  
            'logoffset' => 642985,  
            'runtime' => 1431504819,  
            'devino' => '64768:1178440',  
            'privatestate' => {  
                                'runcount' => 17,  
                                'lastruntime' => 1431504220,  
                                'logfile' => '/var/log/web_monitor.log'  
                              },  
            'logtime' => 1431504602,  
            'servicestateid' => 0,  
            'tag' => 'web_monitor'  
          };  
   
   
 1;  
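Judging only from the example file name log._var_log_web_monitor.log.web_monitor, the state file name appears to be built from the base name of the configuration file (log.cfg), the log file path with slashes turned into underscores, and the tag. This is an inference from that single example, not documented behaviour. A sketch with a made-up function name `seekfile_name`:

```shell
#!/bin/sh
# Hypothetical helper: guess the state file name from the config file,
# the log file and the tag. The naming rule is an assumption inferred
# from one example, not taken from the check_logfiles documentation.
seekfile_name() {
    cfg_base="$(basename "$1" .cfg)"              # log.cfg -> log
    log_part="$(printf '%s' "$2" | tr '/' '_')"   # /var/log/x -> _var_log_x
    printf '%s.%s.%s\n' "$cfg_base" "$log_part" "$3"
}

seekfile_name /usr/local/nagios/var/log.cfg /var/log/web_monitor.log web_monitor
# prints: log._var_log_web_monitor.log.web_monitor
```

Knowing the expected name makes it easier to find the state file for a given search, for instance to inspect it while debugging.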

Once check_logfiles is configured on the monitored host, we also need to add a command to nrpe.cfg:

 command[check_logfile]=/usr/local/nagios/libexec/check_logfiles -f /usr/local/nagios/var/log.cfg

Then reload xinetd, which runs NRPE in this setup, so that the new command takes effect:

 service xinetd reload
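Before relying on NRPE, the command can be run by hand on the monitored host and its exit code inspected. Nagios plugins report their state through the exit code: 0 is OK, 1 is WARNING, 2 is CRITICAL and 3 is UNKNOWN. The following is a minimal sketch. The function name `nagios_state` is made up, and the paths are the ones used in this article.

```shell
#!/bin/sh
# Hypothetical helper: translate a Nagios plugin exit code into a state name.
nagios_state() {
    case "$1" in
        0) echo OK ;;
        1) echo WARNING ;;
        2) echo CRITICAL ;;
        *) echo UNKNOWN ;;   # 3 and anything unexpected
    esac
}

PLUGIN=/usr/local/nagios/libexec/check_logfiles
CFG=/usr/local/nagios/var/log.cfg

# Only run the check if the plugin and the config file are present.
if [ -x "$PLUGIN" ] && [ -r "$CFG" ]; then
    "$PLUGIN" -f "$CFG"
    rc=$?
    echo "state: $(nagios_state "$rc")"
fi
```

It is probably best to run this as the nagios user, since that is the account NRPE normally uses, so that permission problems on the log file or the state directory show up right away.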

2. With the monitored host done, let's look at the monitoring server. Define a service like this:

 define service{  
     use                     nrpe-service         ; Name of service template to use  
     host_name               test  
     service_description     web_monitor  
     check_command           check_nrpe!check_logfile  
     check_interval          10    
     notifications_enabled   1     
     service_groups          logfile_check  
     contact_groups          test  
     }