Real-time monitoring of Java logs with nagios + logstash (Part 1)

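Introduction

  The nagios plugin check_logfiles can monitor logs, but neither its timeliness nor its monitoring results are satisfactory. This post therefore shows how to combine nagios NSCA passive checks with logstash for real-time log monitoring. This approach suits small log volumes; with larger volumes, logstash needs to be paired with a buffer such as redis or kafka.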

Requirements

nagios monitors the Java logs in real time and sends an alert notification whenever an ERROR entry appears in the logs.

IP            hostname        Components             Notes
192.168.1.1   nagios server   nsca + nagios          nagios server
192.168.1.2   nagios client   send_nsca + logstash   Java logs

For installing the components on these servers, see the following posts:
nagios NSCA passive monitoring
ELK stack log collection system

Implementation

I. nagios server configuration
Since the nagios server was already configured earlier, we reuse the following host and service definitions:

define host{
    use         linux-server
    host_name   nagios-client
    alias       passive-2
    address     192.168.1.2
}
define service{
        use                             passive_service
        host_name                       nagios-client
        service_description             java service
        check_command                   check_dummy!0
        notifications_enabled           1   
}
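
For context, a passive service like the one above is typically updated when the nsca daemon hands Nagios a PROCESS_SERVICE_CHECK_RESULT external command. The Python sketch below only formats such a line so that its fields are visible; the function name and the sample output text are illustrative assumptions and not part of the original setup.

```python
import time


def passive_result_line(host_name, service_description, return_code, output, now=None):
    """Format a Nagios PROCESS_SERVICE_CHECK_RESULT external command line."""
    # Return codes follow the plugin convention: 0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN.
    if return_code not in (0, 1, 2, 3):
        raise ValueError("return_code must be 0, 1, 2 or 3")
    timestamp = int(time.time() if now is None else now)
    return "[{0}] PROCESS_SERVICE_CHECK_RESULT;{1};{2};{3};{4}".format(
        timestamp, host_name, service_description, return_code, output)


# host_name and service_description are taken from the definitions above;
# the output text is a made-up example.
print(passive_result_line("nagios-client", "java service", 2,
                          "ERROR seen in java log", now=1489973070))
```

The host_name and service_description in the result have to match the definitions on the nagios server, otherwise the result cannot be attributed to a service.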

II. nagios client configuration

1. Configure logstash

input {
        log4j {
                type => "log4j-java"
                port => 4560
        }
}
output {
# For debugging, logstash can print events to the console or write them to a file
        stdout {
#               codec => "json"
                codec => "rubydebug"
         }
#       file {
#               path => "/logs/out.log"
#       }
}
# Start logstash
/usr/local/logstash/bin/logstash agent -f /usr/local/logstash/etc/logstash.conf -l /usr/local/logstash/logs/stdout.log

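2. Configure log4j output for the Java application
Java has many logging frameworks, and logstash supports log4j, so the Java application's logging needs to be switched to log4j.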

vim log4j.properties
# Add the logstash appender configuration
log4j.rootLogger=INFO, stdout, logstash

log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%-4r [%t] %-5p %c %x - %m%n

#logstash
log4j.appender.logstash=org.apache.log4j.net.SocketAppender
log4j.appender.logstash.Port=4560
log4j.appender.logstash.RemoteHost=192.168.1.2
log4j.appender.logstash.ReconnectionDelay=60000
log4j.appender.logstash.LocationInfo=true
log4j.appender.logstash.Threshold = INFO
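# A custom output pattern can also be defined. Note that the log4j 1.x SocketAppender
# ships serialized LoggingEvent objects and does not apply a layout.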
log4j.appender.logstash.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss}[%p] [%t] [%c] [%F:%L]-%m%n

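After the Java application is restarted, log4j keeps trying to connect to the configured logstash ip:port; once the connection is established, it starts sending log data.
Note: INFO-level logging is voluminous. If log4j cannot ship events to logstash quickly enough, it can raise I/O pressure on the server hosting the Java application or slow the application down, eventually causing it to fail. It is advisable to ship only ERROR-level logs, set as follows: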

log4j.appender.logstash.Threshold = ERROR

With this setting, stdout still prints logs at INFO level and above, while only logs at ERROR level and above are sent to logstash.
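
The per-appender threshold can be illustrated with an analogy in Python's standard logging module, where each handler also carries its own level. This is only a sketch of the idea, not log4j itself; the ListHandler class and the logger name are hypothetical helpers introduced for the demonstration.

```python
import logging


class ListHandler(logging.Handler):
    """Collects messages in a list so the routing can be inspected."""

    def __init__(self, level):
        super().__init__(level)
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


def build_logger():
    logger = logging.getLogger("threshold_demo")
    logger.setLevel(logging.INFO)          # plays the role of rootLogger=INFO
    logger.propagate = False
    logger.handlers.clear()
    console = ListHandler(logging.INFO)    # like the stdout appender
    shipper = ListHandler(logging.ERROR)   # like Threshold = ERROR on the logstash appender
    logger.addHandler(console)
    logger.addHandler(shipper)
    return logger, console, shipper


logger, console, shipper = build_logger()
logger.info("connection pool ready")
logger.error("query failed")
print(console.messages)   # both messages
print(shipper.messages)   # only the ERROR message
```

The logger-level setting decides what is produced at all, and each handler then filters independently, which is the same split the two log4j appenders make.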

3. Test
After triggering a request to the Java application, the logstash console prints the event to the screen:

{
        "message" => "jdbc:mysql://192.168.1.1::3306;characterEncoding=utf8",
       "@version" => "1",
     "@timestamp" => "2017-03-20T01:24:31.477Z",
      "timestamp" => 1489973070924,
           "path" => "com.atomikos.jdbc.AtomikosXAConnectionFactory",
       "priority" => "WARN",
    "logger_name" => "com.atomikos.jdbc.AtomikosXAConnectionFactory",
         "thread" => "Atomikos:3",
          "class" => "com.atomikos.logging.Slf4jLogger",
           "file" => "Slf4jLogger.java:12",
         "method" => "logWarning",
           "host" => "192.168.1.2:28337",
           "type" => "log4j-java"
}

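From the event printed by the logstash output, we can drive nagios alerts off the "priority" field: when "priority" is INFO, the service is OK; when "priority" is ERROR, an alert notification is sent. To locate problems quickly, the alert should also carry "@timestamp" and "thread", that is, message_format => "%{@timestamp} %{thread}". The logstash configuration can therefore be written as follows: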

input {
        log4j {
                type => "log4j-java"
                port => 4560
        }
}
output {
#        stdout {
#               codec => "json"
#                codec => "rubydebug"
#        }
#       file {
#               path => "/logs/out.log"
#       }
        if [priority] == "ERROR" {
                nagios_nsca {
                        host => "192.168.1.1"
                        port => "5667"
                        message_format => "%{@timestamp} %{thread}"
                        send_nsca_bin => "/usr/local/nagios/bin/send_nsca"
                        send_nsca_config => "/usr/local/nagios/etc/send_nsca.cfg"
                        # must match host_name in the nagios host definition
                        nagios_host => "nagios-client"
                        nagios_service => "java service"
                        nagios_status => 2
                }
        } else if [type] == "log4j-java" {
                nagios_nsca {
                        host => "192.168.1.1"
                        port => "5667"
                        message_format => "OK"
                        send_nsca_bin => "/usr/local/nagios/bin/send_nsca"
                        send_nsca_config => "/usr/local/nagios/etc/send_nsca.cfg"
                        # must match host_name in the nagios host definition
                        nagios_host => "nagios-client"
                        nagios_service => "java service"
                        nagios_status => 0
                }
        }
}

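Here, when "priority" is ERROR, nagios_status => 2 applies, so the service receives status 2 (CRITICAL) instead of the 0 configured through check_dummy; send_nsca delivers this value to the nsca daemon on the nagios server, which triggers the alert.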
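
The routing described above can be sketched in Python as a plain function that maps an event to a status and builds the text handed to send_nsca. The payload layout (fields joined by "~" and passed with -d ~) is inferred from the plugin log line shown in the troubleshooting section; the function names are hypothetical, the sample event is a trimmed copy of the test event with its priority changed to ERROR, and the host name is the host_name from the nagios host definition.

```python
def nagios_status(event):
    """Map a logstash event's priority to a Nagios return code (0 OK, 2 CRITICAL)."""
    return 2 if event.get("priority") == "ERROR" else 0


def format_message(event):
    """Rough equivalent of message_format => "%{@timestamp} %{thread}"."""
    return "{0} {1}".format(event["@timestamp"], event["thread"])


def nsca_payload(event, nagios_host, nagios_service, delim="~"):
    """Build the delimited line: host, service, status, message."""
    status = nagios_status(event)
    message = format_message(event) if status == 2 else "OK"
    return delim.join([nagios_host, nagios_service, str(status), message])


def send_nsca_command(server, port, send_nsca_bin, send_nsca_config, delim="~"):
    """Argument list for send_nsca; the payload would be written to its stdin."""
    return [send_nsca_bin, "-H", server, "-p", str(port), "-d", delim, "-c", send_nsca_config]


event = {
    "@timestamp": "2017-03-20T01:24:31.477Z",
    "thread": "Atomikos:3",
    "priority": "ERROR",  # changed from WARN for the demonstration
}
print(nsca_payload(event, "nagios-client", "java service"))
print(" ".join(send_nsca_command("192.168.1.1", 5667,
                                 "/usr/local/nagios/bin/send_nsca",
                                 "/usr/local/nagios/etc/send_nsca.cfg")))
```

Keeping the status decision separate from the payload formatting makes it easy to check the mapping without a running nsca daemon.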

4. Troubleshooting
Although the steps above look smooth, errors did come up during configuration. For example, the logstash log /usr/local/logstash/logs/stdout.log showed the following error:

{:timestamp=>"2017-03-17T09:21:00.097000+0800", :message=>"192.168.1.1~CheckDummy~2~ERROR", :error=>#<NameError: undefined local variable or method `message' for #<LogStash::Outputs::NagiosNsca:0x4e8d7e65>>, :nagios_nsca_command=>"/usr/local/nagios/bin/send_nsca -H 192.168.1.1 -p 5667 -d ~ -c /usr/local/nagios/etc/send_nsca.cfg", :missed_event=>#<LogStash::Event:0x1ad6acb6 @metadata_accessors=#<LogStash::Util::Accessors:0x37843e93 @store={}, @lut={}>, @cancelled=false......}

The error reported is "NameError: undefined local variable or method `message' for #<LogStash::Outputs::NagiosNsca:0x4e8d7e65>". Following https://github.com/logstash-plugins/logstash-output-nagios/issues/3, I made the following changes to the logstash plugin:

vim vendor/bundle/jruby/1.9/gems/logstash-output-nagios_nsca-2.0.2/lib/logstash/outputs/nagios_nsca.rb
Line 114:   send_to_nagios(cmd)
change to:  send_to_nagios(cmd, message)

Line 131:   def send_to_nagios(cmd)
change to:  def send_to_nagios(cmd, message)

After this change, logstash was able to raise alerts through nagios normally.
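
The nature of this fix can be shown with a small Python analogy: a value that is local to one method is not visible inside another, so it has to be passed as an argument. This is not the plugin's Ruby code; the class names and the returned strings are hypothetical and only mirror the shape of the patch.

```python
class BrokenOutput:
    def receive(self, event):
        message = "{0}~{1}".format(event["host"], event["text"])  # local to receive()
        return self.send_to_nagios("send_nsca")

    def send_to_nagios(self, cmd):
        # 'message' is not defined in this scope, so this line raises NameError.
        return "{0} <- {1}".format(cmd, message)


class PatchedOutput:
    def receive(self, event):
        message = "{0}~{1}".format(event["host"], event["text"])
        return self.send_to_nagios("send_nsca", message)  # pass it explicitly

    def send_to_nagios(self, cmd, message):
        return "{0} <- {1}".format(cmd, message)


sample = {"host": "nagios-client", "text": "ERROR"}
try:
    BrokenOutput().receive(sample)
except NameError as exc:
    print("broken:", exc)
print("patched:", PatchedOutput().receive(sample))
```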
