Parsing IIS logs with Logstash

For an introduction, see the previous post: Logstash Quick Start

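In fact, Filebeat's iis module already handles these logs very well. However, Filebeat relies on Elasticsearch ingest pipelines to parse the fields, which means running setup in advance and putting the parsing load on Elasticsearch. So I looked into using Logstash to parse what Filebeat ships.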

filter

Excluding log comment lines
  if [message] =~ "^#" {
    drop {}
  }

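[message] refers to the message field, =~ is the regular-expression match operator, and comment lines in an IIS log start with #. drop discards every event that reaches this filter.

For comparison, the examples below do not remove comment lines.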

grok vs dissect

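We use grok to match the data in message.
A simple IIS log parse serves as the example for learning filters.

Both grok and dissect extract unstructured event data into fields; dissect does it with delimiters, grok with regular expressions.

plugin   difference
grok     uses regular expressions; works well when the layout varies from line to line
dissect  does not use regular expressions; faster; works well when the layout repeats reliably

From this comparison, grok, with its regular expressions, fits the requirement here.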
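
To make the comparison concrete, here is a rough sketch of what a dissect version could look like for one fixed column layout (date, time, s-ip, cs-method, cs-uri-stem, cs-uri-query, s-port, cs-username, c-ip, User-Agent, Referer, sc-status, sc-substatus, sc-win32-status, time-taken). The field names are hypothetical short names chosen for illustration, not the ones used in the rest of this post, and the sketch has not been run against real data.

```
filter {
  dissect {
    # One %{...} per column, separated by the single spaces IIS writes.
    # Field names below are illustrative assumptions.
    mapping => {
      "message" => "%{log_date} %{log_time} %{s_ip} %{cs_method} %{cs_uri_stem} %{cs_uri_query} %{s_port} %{cs_username} %{c_ip} %{cs_user_agent} %{cs_referer} %{sc_status} %{sc_substatus} %{sc_win32_status} %{time_taken}"
    }
    tag_on_failure => ["fail_in_message"]
  }
}
```

Because dissect splits purely on delimiters, a log with a different set of columns needs its own mapping, and a trailing carriage return would end up inside the last field. grok's list of alternative patterns copes with both more easily, which is why grok is used below.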

filter {
  grok {
    match => [
         "message","%{TIMESTAMP_ISO8601:iis.access.time} (?:-|%{IPORHOST:destination.address}) (?:-|%{WORD:http.request.method}) (?:-|%{NOTSPACE:url.path}) (?:-|%{NOTSPACE:url.query}) (?:-|%{NUMBER:destination.port:long}) (?:-|%{NOTSPACE:user.name}) (?:-|%{IPORHOST:source.address}) (?:-|%{NOTSPACE:user_agent.original}) (?:-|%{NOTSPACE:http.request.referrer}) (?:-|%{NUMBER:http.response.status_code:long}) (?:-|%{NUMBER:iis.access.sub_status:long}) (?:-|%{NUMBER:iis.access.win32_status:long}) (?:-|%{NUMBER:temp.duration:long})",
         "message","%{TIMESTAMP_ISO8601:iis.access.time} (?:-|%{NOTSPACE:iis.access.site_name}) (?:-|%{WORD:http.request.method}) (?:-|%{NOTSPACE:url.path}) (?:-|%{NOTSPACE:url.query}) (?:-|%{NUMBER:destination.port:long}) (?:-|%{NOTSPACE:user.name}) (?:-|%{IPORHOST:source.address}) (?:-|%{NOTSPACE:user_agent.original}) (?:-|%{NOTSPACE:iis.access.cookie}) (?:-|%{NOTSPACE:http.request.referrer}) (?:-|%{NOTSPACE:destination.domain}) (?:-|%{NUMBER:http.response.status_code:long}) (?:-|%{NUMBER:iis.access.sub_status:long}) (?:-|%{NUMBER:iis.access.win32_status:long}) (?:-|%{NUMBER:http.response.body.bytes:long}) (?:-|%{NUMBER:http.request.body.bytes:long}) (?:-|%{NUMBER:temp.duration:long})",
         "message","%{TIMESTAMP_ISO8601:iis.access.time} (?:-|%{NOTSPACE:iis.access.site_name}) (?:-|%{NOTSPACE:iis.access.server_name}) (?:-|%{IPORHOST:destination.address}) (?:-|%{WORD:http.request.method}) (?:-|%{NOTSPACE:url.path}) (?:-|%{NOTSPACE:url.query}) (?:-|%{NUMBER:destination.port:long}) (?:-|%{NOTSPACE:user.name}) (?:-|%{IPORHOST:source.address}) (?:-|HTTP/%{NUMBER:http.version}) (?:-|%{NOTSPACE:user_agent.original}) (?:-|%{NOTSPACE:iis.access.cookie}) (?:-|%{NOTSPACE:http.request.referrer}) (?:-|%{NOTSPACE:destination.domain}) (?:-|%{NUMBER:http.response.status_code:long}) (?:-|%{NUMBER:iis.access.sub_status:long}) (?:-|%{NUMBER:iis.access.win32_status:long}) (?:-|%{NUMBER:http.response.body.bytes:long}) (?:-|%{NUMBER:http.request.body.bytes:long}) (?:-|%{NUMBER:temp.duration:long})",
         "message","%{TIMESTAMP_ISO8601:iis.access.time} \\[%{IPORHOST:destination.address}\\]\\(http://%{IPORHOST:destination.address}\\) (?:-|%{WORD:http.request.method}) (?:-|%{NOTSPACE:url.path}) (?:-|%{NOTSPACE:url.query}) (?:-|%{NUMBER:destination.port:long}) (?:-|%{NOTSPACE:user.name}) \\[%{IPORHOST:source.address}\\]\\(http://%{IPORHOST:source.address}\\) (?:-|%{NOTSPACE:user_agent.original}) (?:-|%{NUMBER:http.response.status_code:long}) (?:-|%{NUMBER:iis.access.sub_status:long}) (?:-|%{NUMBER:iis.access.win32_status:long}) (?:-|%{NUMBER:temp.duration:long})",
         "message","%{TIMESTAMP_ISO8601:iis.access.time} (?:-|%{IPORHOST:destination.address}) (?:-|%{WORD:http.request.method}) (?:-|%{NOTSPACE:url.path}) (?:-|%{NOTSPACE:url.query}) (?:-|%{NUMBER:destination.port:long}) (?:-|%{NOTSPACE:user.name}) (?:-|%{IPORHOST:source.address}) (?:-|%{NOTSPACE:user_agent.original}) (?:-|%{NUMBER:http.response.status_code:long}) (?:-|%{NUMBER:iis.access.sub_status:long}) (?:-|%{NUMBER:iis.access.win32_status:long}) (?:-|%{NUMBER:temp.duration:long})"
    ]
    tag_on_failure => ["fail_in_message"]
  }
}

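The fields of a W3C-format IIS log are optional, so a given field may simply not be present in your logs. To stay flexible, several patterns are listed here; grok tries them in order and stops at the first one that matches.

If parsing fails, fail_in_message is written to the tags array.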

{
               "source.address" => "12.123.23.226",
              "iis.access.time" => "2020-06-01 00:00:00",
      "iis.access.win32_status" => "0",
                "temp.duration" => "343",
          "destination.address" => "10.122.123.22",
                      "message" => "2020-06-01 00:00:00 10.122.123.22 GET /XXinfoAPI/HeheferEx/haha02-5641 format=json 80 - 12.123.23.226 Mozilla/5.0+(Windows;+U;+Windows+NT+5.1;+zh-CN;+rv:1.9.2)+Gecko/20100115+Firefox/3.6 - 200 0 0 343\r",
    "http.response.status_code" => "200",
             "destination.port" => "80",
                   "@timestamp" => 2020-07-09T02:00:57.354Z,
                    "url.query" => "format=json",
        "iis.access.sub_status" => "0",
          "http.request.method" => "GET",
                     "url.path" => "/XXinfoAPI/HeheferEx/haha02-5641",
          "user_agent.original" => "Mozilla/5.0+(Windows;+U;+Windows+NT+5.1;+zh-CN;+rv:1.9.2)+Gecko/20100115+Firefox/3.6",
                     "@version" => "1",
                         "host" => "Janey-deMacBook-Pro.local",
                         "path" => "/Users/janeydeng/projects/ELK/mock_logs/u_ex200601.log"
}
{
          "tags" => [
        [0] "fail_in_message"
    ],
       "message" => "#Fields: date time s-ip cs-method cs-uri-stem cs-uri-query s-port cs-username c-ip cs(User-Agent) cs(Referer) sc-status sc-substatus sc-win32-status time-taken\r",
      "@version" => "1",
    "@timestamp" => 2020-07-09T02:00:57.353Z,
          "host" => "Janey-deMacBook-Pro.local",
          "path" => "/Users/janeydeng/projects/ELK/mock_logs/u_ex200601.log"
}

When the line is a comment, parsing fails and fail_in_message is written to tags.
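
Note that the numeric fields in the first event above are strings ("200", "80"), even though the patterns carry a :long suffix. The Logstash grok documentation lists only int and float as supported conversions, so :long (which comes from the Elasticsearch ingest pipeline syntax) appears not to be applied. Replacing :long with :int in the patterns is one option; another is a separate mutate step, sketched here on the assumption that these five fields should be integers:

```
filter {
  mutate {
    # Field names are the ones produced by the grok patterns above.
    # Fields that are absent from an event are simply skipped.
    convert => {
      "destination.port"          => "integer"
      "http.response.status_code" => "integer"
      "iis.access.sub_status"     => "integer"
      "iis.access.win32_status"   => "integer"
      "temp.duration"             => "integer"
    }
  }
}
```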

Besides grok's own options, the following options are available to every Logstash filter plugin.
Reference: plugins-filters-grok-common-options

Setting         Input type  Required
add_field       hash        No
add_tag         array       No
enable_metric   boolean     No
id              string      No
periodic_flush  boolean     No
remove_field    array       No
remove_tag      array       No

Once parsed, the message field is no longer useful, so we delete it; if parsing failed, we keep message.

  if "fail_in_message" not in [tags] {
    mutate {
      remove_field => ["message"]
    } 
  }
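
Since remove_field is one of the common options listed above, and common options are applied only when the filter succeeds, the same result can likely be reached without the conditional. A minimal sketch follows; the pattern is deliberately shortened and hypothetical (GREEDYDATA:rest stands in for the full column list), so substitute the real patterns from the match list:

```
filter {
  grok {
    # Hypothetical shortened pattern, for illustration only.
    match => { "message" => "%{TIMESTAMP_ISO8601:iis.access.time} %{GREEDYDATA:rest}" }
    tag_on_failure => ["fail_in_message"]
    # Applied only when grok matched, so failed events keep their message.
    remove_field => ["message"]
  }
}
```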

The event now looks like this:

{
                         "host" => "Janey-deMacBook-Pro.local",
                   "@timestamp" => 2020-07-09T02:54:11.232Z,
                    "url.query" => "format=json",
             "destination.port" => "80",
                     "@version" => "1",
        "iis.access.sub_status" => "0",
              "iis.access.time" => "2020-06-01 00:00:00",
                     "url.path" => "/XXinfoAPI/FindEventEx/haha02-6325",
                "temp.duration" => "265",
          "user_agent.original" => "Mozilla/5.0+(Windows;+U;+Windows+NT+5.1;+zh-CN;+rv:1.9.2)+Gecko/20100115+Firefox/3.6",
               "source.address" => "12.123.23.226",
      "iis.access.win32_status" => "0",
                         "path" => "/Users/janeydeng/projects/ELK/mock_logs/u_ex200601.log",
    "http.response.status_code" => "200",
          "destination.address" => "10.122.123.22",
          "http.request.method" => "GET"
}
date

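When the data is imported into Elasticsearch, @timestamp is what we use to select a time range for charts. At this point, however, @timestamp is the time Logstash processed the event, not the time IIS wrote the log entry. If a whole year of logs is parsed within one day, every event's @timestamp falls on that day, and selecting a particular period for analysis no longer works.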

  mutate {
    rename => ["@timestamp","event.created"]
  }
  date {
    match => [ "iis.access.time", "YYYY-MM-dd HH:mm:ss" ]
    target => "@timestamp"
    tag_on_failure => ["fail_in_timestamp"]
    timezone => "Etc/GMT+8"
  }

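To keep the Logstash processing time, the value of @timestamp is moved to a new field, event.created.

We then assign the log's own time, iis.access.time, to @timestamp. IIS writes log times in UTC, and this configuration adds 8 hours: the zone name Etc/GMT+8 follows the POSIX sign convention and actually means UTC-8, which is why 00:00:00 in the log becomes 08:00:00Z in the output below. Keep in mind that @timestamp is always stored in UTC, so if the log time should remain the same instant, timezone => "UTC" would be the setting to use, leaving the conversion to local time to the display layer.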

{
               "source.address" => "12.123.23.226",
      "iis.access.win32_status" => "0",
                   "@timestamp" => 2020-06-01T08:00:00.000Z,
                "event.created" => 2020-07-09T03:45:31.441Z,
              "iis.access.time" => "2020-06-01 00:00:00",
          "http.request.method" => "GET",
                         "host" => "Janey-deMacBook-Pro.local",
                     "url.path" => "/XXinfoAPI/FindEventEx/haha02-4565",
                    "url.query" => "format=json",
                         "path" => "/Users/janeydeng/projects/ELK/mock_logs/u_ex200601.log",
          "user_agent.original" => "Mozilla/5.0+(Windows;+U;+Windows+NT+5.1;+zh-CN;+rv:1.9.2)+Gecko/20100115+Firefox/3.6",
          "destination.address" => "10.122.123.22",
    "http.response.status_code" => "200",
                     "@version" => "1",
             "destination.port" => "80",
                "temp.duration" => "281",
        "iis.access.sub_status" => "0"
}
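
IIS writes W3C log times in UTC, and @timestamp is stored in UTC as well. If the intent is to keep the original instant rather than shift it by 8 hours as in the output above, the date step could be declared as in this sketch (same field names as above; not verified against the original data):

```
filter {
  date {
    match => [ "iis.access.time", "YYYY-MM-dd HH:mm:ss" ]
    target => "@timestamp"
    tag_on_failure => ["fail_in_timestamp"]
    # Interpret the log time as UTC, which is what IIS writes.
    timezone => "UTC"
  }
}
```

Whether to store the shifted or the unshifted time depends on how the charts are read later; the unshifted form lets the viewer's own time zone setting do the conversion.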
user_agent

Client information is also important when debugging an application.

  useragent {
    source => "user_agent.original"
    prefix => "user_agent."
  }

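A single IIS user agent string is parsed into several fields, and some of them (the version fields, for example) have very generic names that are easily confused with host information. Adding the prefix "user_agent." makes them more readable.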

              "user_agent.name" => "Firefox",
           "user_agent.os_name" => "Windows",
          "user_agent.original" => "Mozilla/5.0+(Windows;+U;+Windows+NT+5.1;+zh-CN;+rv:1.9.2)+Gecko/20100115+Firefox/3.6",
             "user_agent.minor" => "6",
                "user_agent.os" => "Windows",
             "user_agent.major" => "3",
geoip

Two databases are available for free: GeoLite2-City and GeoLite2-ASN

# Resolved with GeoLite2-City (the default)
                 "source.ip" => {
             "longitude" => 123.3452,
           "region_name" => "Guangdong",
              "timezone" => "Asia/Shanghai",
          "country_name" => "China",
              "latitude" => 23.5289,
              "location" => {
            "lon" => 123.3452,
            "lat" => 23.5289
        },
                    "ip" => "218.233.233.226",
         "country_code2" => "CN",
         "country_code3" => "CN",
             "city_name" => "Jieyang",
        "continent_code" => "AS",
           "region_code" => "GD"
    },
# Resolved with GeoLite2-ASN, by setting default_database_type => "ASN"
                       "source.ip" => {
           "asn" => 9890,
            "ip" => "218.233.233.226",
        "as_org" => "Guangdong Mobile Communication Co.Ltd."
    },

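For each visiting IP we want to dig out as much information as possible for analysis. The same IP address yields different information depending on the database used.
So, to get more, we can resolve the address twice and target two different field names.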

  geoip {
    source => "source.address"
    target => "source.geo"
  }
  geoip {
    source => "source.address"
    target => "source.as"
    default_database_type => "ASN"
  }
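A practical requirement

I need to know which terminal IDs produce the most IIS errors. The ID is already contained in the request path, so it has to be extracted for analysis.
In /XXinfoAPI/FindEventEx/haha02-6325, haha02-6325 is the terminal ID.
A terminal ID consists of the letters haha, 2 digits, a "-", and 4 digits.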

  grok {
    match => ["url.path","(?<te>(?=haha)haha[0-9]{1,2}-[0-9]{4}$)"]
    tag_on_failure => []
  }

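Not every request path carries a terminal ID, so when a path has none there is no need to write a failure tag into tags; that is why tag_on_failure is overridden here.
(?<te>...) names the captured field te, (?=haha) is a lookahead that checks whether the text at that position starts with haha, and the rest is an ordinary regular expression.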

The pattern can be tested first in Grok Debug (access may require a VPN).

"te" => "haha02-6247"

Regular expression reference:
https://github.com/kkos/oniguruma/blob/master/doc/RE

logstash: grok-patterns

Finally, a single IIS log line
2020-06-01 00:00:00 10.122.123.22 GET /XXinfoAPI/HeheferEx/haha02-6247 format=json 80 - 12.123.23.226 Mozilla/5.0+(Windows;+U;+Windows+NT+5.1;+zh-CN;+rv:1.9.2)+Gecko/20100115+Firefox/3.6 - 200 0 0 265
is parsed into the following:

{
    "iis.access.win32_status": "0",
    "http.request.method": "GET",
    "user_agent.device": "Other",
    "source.address": "12.123.23.226",
    "user_agent.build": "",
    "path": "/Users/janeydeng/projects/ELK/mock_logs/u_ex200601.log",
    "iis.access.sub_status": "0",
    "@timestamp": "2020-06-01T08:00:00.000Z",
    "user_agent.os": "Windows",
    "source.geo": {
        "country_name": "China",
        "region_name": "Guangdong",
        "latitude": 23.5189,
        "region_code": "GD",
        "longitude": 118.3942,
        "ip": "12.123.23.226",
        "location": {
            "lon": 118.3942,
            "lat": 23.5189
        },
        "continent_code": "AS",
        "timezone": "Asia/Shanghai",
        "country_code3": "CN",
        "country_code2": "CN",
        "city_name": "Jieyang"
    },
    "@version": "1",
    "url.query": "format=json",
    "te": "haha02-6247",
    "source.as": {
        "ip": "12.123.23.226",
        "asn": 9808,
        "as_org": "Guangdong Mobile Communication Co.Ltd."
    },
    "user_agent.os_name": "Windows",
    "destination.address": "10.122.123.22",
    "url.path": "/XXinfoAPI/HeheferEx/haha02-6247",
    "destination.port": "80",
    "user_agent.minor": "6",
    "user_agent.original": "Mozilla/5.0+(Windows;+U;+Windows+NT+5.1;+zh-CN;+rv:1.9.2)+Gecko/20100115+Firefox/3.6",
    "user_agent.name": "Firefox",
    "iis.access.time": "2020-06-01 00:00:00",
    "http.response.status_code": "200",
    "host": "Janey-deMacBook-Pro.local",
    "temp.duration": "265",
    "event.created": "2020-07-09T09:27:11.050Z",
    "user_agent.major": "3"
}

Appendix: logstash.conf

# Sample Logstash configuration for creating a simple
# Beats -> Logstash -> Elasticsearch pipeline.

input {
  file {
    path => "/Users/janeydeng/projects/ELK/mock_logs/*.log"
    start_position => "beginning"
 }
} 

filter {
  # Drop comment lines
  if [message] =~ "^#" {
    drop {}
  }
  # Parse message
  grok {
    match => [
         "message","%{TIMESTAMP_ISO8601:iis.access.time} (?:-|%{IPORHOST:destination.address}) (?:-|%{WORD:http.request.method}) (?:-|%{NOTSPACE:url.path}) (?:-|%{NOTSPACE:url.query}) (?:-|%{NUMBER:destination.port:long}) (?:-|%{NOTSPACE:user.name}) (?:-|%{IPORHOST:source.address}) (?:-|%{NOTSPACE:user_agent.original}) (?:-|%{NOTSPACE:http.request.referrer}) (?:-|%{NUMBER:http.response.status_code:long}) (?:-|%{NUMBER:iis.access.sub_status:long}) (?:-|%{NUMBER:iis.access.win32_status:long}) (?:-|%{NUMBER:temp.duration:long})",
         "message","%{TIMESTAMP_ISO8601:iis.access.time} (?:-|%{NOTSPACE:iis.access.site_name}) (?:-|%{WORD:http.request.method}) (?:-|%{NOTSPACE:url.path}) (?:-|%{NOTSPACE:url.query}) (?:-|%{NUMBER:destination.port:long}) (?:-|%{NOTSPACE:user.name}) (?:-|%{IPORHOST:source.address}) (?:-|%{NOTSPACE:user_agent.original}) (?:-|%{NOTSPACE:iis.access.cookie}) (?:-|%{NOTSPACE:http.request.referrer}) (?:-|%{NOTSPACE:destination.domain}) (?:-|%{NUMBER:http.response.status_code:long}) (?:-|%{NUMBER:iis.access.sub_status:long}) (?:-|%{NUMBER:iis.access.win32_status:long}) (?:-|%{NUMBER:http.response.body.bytes:long}) (?:-|%{NUMBER:http.request.body.bytes:long}) (?:-|%{NUMBER:temp.duration:long})",
         "message","%{TIMESTAMP_ISO8601:iis.access.time} (?:-|%{NOTSPACE:iis.access.site_name}) (?:-|%{NOTSPACE:iis.access.server_name}) (?:-|%{IPORHOST:destination.address}) (?:-|%{WORD:http.request.method}) (?:-|%{NOTSPACE:url.path}) (?:-|%{NOTSPACE:url.query}) (?:-|%{NUMBER:destination.port:long}) (?:-|%{NOTSPACE:user.name}) (?:-|%{IPORHOST:source.address}) (?:-|HTTP/%{NUMBER:http.version}) (?:-|%{NOTSPACE:user_agent.original}) (?:-|%{NOTSPACE:iis.access.cookie}) (?:-|%{NOTSPACE:http.request.referrer}) (?:-|%{NOTSPACE:destination.domain}) (?:-|%{NUMBER:http.response.status_code:long}) (?:-|%{NUMBER:iis.access.sub_status:long}) (?:-|%{NUMBER:iis.access.win32_status:long}) (?:-|%{NUMBER:http.response.body.bytes:long}) (?:-|%{NUMBER:http.request.body.bytes:long}) (?:-|%{NUMBER:temp.duration:long})",
         "message","%{TIMESTAMP_ISO8601:iis.access.time} \\[%{IPORHOST:destination.address}\\]\\(http://%{IPORHOST:destination.address}\\) (?:-|%{WORD:http.request.method}) (?:-|%{NOTSPACE:url.path}) (?:-|%{NOTSPACE:url.query}) (?:-|%{NUMBER:destination.port:long}) (?:-|%{NOTSPACE:user.name}) \\[%{IPORHOST:source.address}\\]\\(http://%{IPORHOST:source.address}\\) (?:-|%{NOTSPACE:user_agent.original}) (?:-|%{NUMBER:http.response.status_code:long}) (?:-|%{NUMBER:iis.access.sub_status:long}) (?:-|%{NUMBER:iis.access.win32_status:long}) (?:-|%{NUMBER:temp.duration:long})",
         "message","%{TIMESTAMP_ISO8601:iis.access.time} (?:-|%{IPORHOST:destination.address}) (?:-|%{WORD:http.request.method}) (?:-|%{NOTSPACE:url.path}) (?:-|%{NOTSPACE:url.query}) (?:-|%{NUMBER:destination.port:long}) (?:-|%{NOTSPACE:user.name}) (?:-|%{IPORHOST:source.address}) (?:-|%{NOTSPACE:user_agent.original}) (?:-|%{NUMBER:http.response.status_code:long}) (?:-|%{NUMBER:iis.access.sub_status:long}) (?:-|%{NUMBER:iis.access.win32_status:long}) (?:-|%{NUMBER:temp.duration:long})"
    ]
    tag_on_failure => ["fail_in_message"]
  }
  # If message was parsed successfully, delete it; otherwise keep it
  if "fail_in_message" not in [tags] {
    mutate {
      remove_field => ["message"]
    } 
  }
  # Handle @timestamp
  mutate {
    rename => ["@timestamp","event.created"]
  }
  date {
    match => [ "iis.access.time", "YYYY-MM-dd HH:mm:ss" ]
    target => "@timestamp"
    tag_on_failure => ["fail_in_timestamp"]
    timezone => "Etc/GMT+8"
  }
  # Parse the client user agent
  # urldecode {
  #   field => "user_agent.original"
  # }
  useragent {
    source => "user_agent.original"
    prefix => "user_agent."
  }
  # Resolve the IP address
  geoip {
    source => "source.address"
    target => "source.geo"
  }
  geoip {
    source => "source.address"
    target => "source.as"
    default_database_type => "ASN"
  }
  # Extract the terminal ID
  grok {
    match => ["url.path","(?<te>(?=haha)haha[0-9]{1,2}-[0-9]{4}$)"]
    tag_on_failure => []
  }
}

output {
  # stdout { codec => rubydebug }
  stdout { codec => json }
}

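Note

The urldecode and user_agent parsing here is problematic: the extracted information is incomplete. IIS encodes spaces as +, so the user agent is not parsed fully (os_major and os_minor cannot be resolved correctly). Elasticsearch's urldecode and user_agent pipeline processors can be used to handle this instead.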
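
If the parsing has to stay in Logstash, one thing that may help is turning the + characters back into spaces before the useragent filter runs. The following is an untested sketch under that assumption; it rewrites user_agent.original in place, and any literal + in a user agent would be replaced as well:

```
filter {
  mutate {
    # Replace every "+" with a space; "[+]" avoids backslash escaping.
    gsub => [ "user_agent.original", "[+]", " " ]
  }
  useragent {
    source => "user_agent.original"
    prefix => "user_agent."
  }
}
```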
