An Elastic case study: Logstash grok regex patterns

Environment: elasticsearch 5.1.1
logstash 5.2.1
We have existing business logs that need analysis. What is known so far: entries are delimited by '||', but the field meanings are not fully consistent; the code value of one status field determines the meaning of several of the fields that follow. The plan is therefore to first filter one entry type out of the logs for analysis, in this case the login records.

# Log format example 1:
[2017-02-20 22:30:08,455] [INFO] [c.r.c.front.UserInfoController] [? : ?] ||p0p-web||pp登录||1086||pc_computer-OS:Windows_7_-browser:CHROME_53.0.2785.104-client:192.168.100.11_||13812345678||唐伯虎||Y||-||-||-||-||-||-||-||0||0||0||0||0||-||-||-||-||-||-||-||

# Based on a survey of the logs, the 5th '||'-separated part carries the fields we care about: terminal type, IP address, and so on.

# Log format example 2:
[2017-02-20 21:56:26,307] [INFO] [c.r.c.front.UserInfoController] [? : ?] ||p0p-web||pp登录||1086||MStation||13812345678||秋香||Y||-||-||-||-||-||-||-||0||0||0||0||0||-||-||-||-||-||-||-||

# Here the 5th part is just a single string, without the hoped-for IP address.
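To make the difference concrete, here is a quick sketch (plain Python, with the two sample lines above abbreviated) that splits an entry on '||' and inspects the 5th part:

```python
# Split abbreviated copies of format 1 and format 2 on "||"
# and compare the variable 5th part.
line1 = ("[2017-02-20 22:30:08,455] [INFO] [...] ||p0p-web||pp登录||1086"
         "||pc_computer-OS:Windows_7_-browser:CHROME_53.0.2785.104"
         "-client:192.168.100.11_||13812345678||唐伯虎||Y||")
line2 = ("[2017-02-20 21:56:26,307] [INFO] [...] ||p0p-web||pp登录||1086"
         "||MStation||13812345678||秋香||Y||")

part5_fmt1 = line1.split("||")[4]   # OS / browser / client IP packed together
part5_fmt2 = line2.split("||")[4]   # just a terminal-type string

print(part5_fmt1)  # pc_computer-OS:Windows_7_-browser:CHROME_53.0.2785.104-client:192.168.100.11_
print(part5_fmt2)  # MStation
```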

Thoughts and comments:

  • Since the '||'-delimited columns are known, and only one of the fields may or may not contain IP information, the first idea was to use Logstash's dissect plugin to split on '||' and obtain a first batch of fields, then run a grok regex against the possibly IP-bearing field to extract the details. Testing did not confirm this idea, so it was set aside.
  • The next idea: run an initial regex match against the whole message field to decide whether it contains an IP address. If it does, apply the format-1 grok pattern; if not, add the missing fields (os, browser, clientip) with default values. In testing, however, the initial match on message could not detect the IP (grok pattern names such as %{IP} are not expanded inside Logstash conditionals, which expect a plain regex), so this approach was rejected too:
filter {
  # %{IP} is not expanded here: conditionals take a plain regex, so this never matches
  if ( [message] =~ "%{IP}" ) {
    dissect {
      mapping => {
        "message" => "[%{logintime}] [%{}] [%{}] [%{}] ||%{hostname}||%{action}||%{code}||%{os_type}:%{os}-%{}:%{browser}-%{}:%{clientip}_%{}||%{phone}||%{username}||%{}"
      }
    }
  } else {
    dissect {
      mapping => {
        "message" => "[%{logintime}] [%{}] [%{}] [%{}] ||%{hostname}||%{action}||%{code}||%{os_type}||%{phone}||%{username}||%{}"
      }
      # bare `null` is not valid in the config language;
      # "-" mirrors the placeholder the logs themselves use for missing values
      add_field => {
        "os" => "-"
        "browser" => "-"
        "clientip" => "-"
      }
    }
  }
  geoip {
    source => "clientip"
  }
  date {
    match => [ "logintime", "yyyy-MM-dd HH:mm:ss,SSS"]
  }
}
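The check this approach needed is easy to express as a plain regex. As an illustration (an editor's sketch in Python, not part of the original config), this is the literal IPv4 test that the conditional above would have to perform itself, since grok names like %{IP} are not expanded there:

```python
import re

# A literal IPv4 regex: the kind of pattern a Logstash conditional needs,
# because grok macros like %{IP} are not available in conditionals.
IPV4 = re.compile(r"\d{1,3}(?:\.\d{1,3}){3}")

with_ip    = "pc_computer-OS:Windows_7_-browser:CHROME_53.0.2785.104-client:192.168.100.11_"
without_ip = "MStation"

print(bool(IPV4.search(with_ip)))     # True  -> format 1
print(bool(IPV4.search(without_ip)))  # False -> format 2
```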
  • Finally, some published examples check whether the "_grokparsefailure" tag is present in tags. That led to the final approach: match with the IP-bearing pattern first; if it fails, Logstash automatically adds the "_grokparsefailure" tag, and a second grok pass with the no-IP pattern then extracts the field values:
input {
  file {
    path => "/data/aliyun/applogs/pm_prod/*_catalina.out"
    start_position => end
    codec => plain {
      charset => "UTF-8"
    }
    type => "pm_type"
  }
}

filter {

  if ( [message] !~ "pp登录") {
    drop{}
  }

  grok {
    match => { "message" => "\[(?<logintime>%{YEAR}-%{MONTHNUM}-%{MONTHDAY} %{HOUR}:%{MINUTE}:%{SECOND},\d+)\]\s+.*\|\|(?<hostname>\S+)\|\|(?<action>\S+)\|\|(?<code>\d+)\|\|(?<os_type>\S+?):(?<os>\S+?)-browser:(?<browser>\S+)-client:%{IP:clientip}_.*?\|\|(?<phone>\d+)\|\|(?<username>\S+?)\|\|.*" }
  }

  if "_grokparsefailure" not in [tags] {
    geoip {
      source => "clientip"
    }
  }
  else
  {
    grok {
      match => { "message" => "\[(?<logintime>%{YEAR}-%{MONTHNUM}-%{MONTHDAY} %{HOUR}:%{MINUTE}:%{SECOND},\d+)\]\s+.*?\|\|(?<hostname>\S+?)\|\|(?<action>\S+?)\|\|(?<code>\d+)\|\|(?<os_type>\w+?)\|\|(?<phone>\d+)\|\|(?<username>\S+?)\|\|.*" }
      # bare `null` is not valid in the config language;
      # "-" mirrors the placeholder the logs themselves use for missing values
      add_field => {
        "os" => "-"
        "browser" => "-"
        "clientip" => "-"
      }
    }
  }
  date {
    match => [ "logintime", "yyyy-MM-dd HH:mm:ss,SSS"]
  }
}

output {
#  stdout {
#    codec => rubydebug
#  }
  elasticsearch {
    hosts => "elk.dev:9200"
    index => "pm_info_%{+YYYYMMdd}"
  }
}
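The two grok expressions can be exercised outside Logstash. Below is a rough sketch in plain Python, with the grok macros (%{YEAR}, %{IP}, etc.) expanded by hand into ordinary regexes; it is an editor's approximation of the primary-then-fallback flow above, not a byte-for-byte equivalent of grok's pattern library:

```python
import re

# Primary pattern: format 1, with OS / browser / client IP packed in the 5th field.
# Grok macros expanded by hand: %{YEAR}-%{MONTHNUM}-%{MONTHDAY} -> \d{4}-\d{2}-\d{2}, etc.
PRIMARY = re.compile(
    r"\[(?P<logintime>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d+)\]\s+.*"
    r"\|\|(?P<hostname>\S+)\|\|(?P<action>\S+)\|\|(?P<code>\d+)"
    r"\|\|(?P<os_type>\S+?):(?P<os>\S+?)-browser:(?P<browser>\S+)"
    r"-client:(?P<clientip>\d{1,3}(?:\.\d{1,3}){3})_.*?"
    r"\|\|(?P<phone>\d+)\|\|(?P<username>\S+?)\|\|.*")

# Fallback pattern: format 2, where the 5th field is a bare terminal type.
FALLBACK = re.compile(
    r"\[(?P<logintime>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d+)\]\s+.*?"
    r"\|\|(?P<hostname>\S+?)\|\|(?P<action>\S+?)\|\|(?P<code>\d+)"
    r"\|\|(?P<os_type>\w+?)\|\|(?P<phone>\d+)\|\|(?P<username>\S+?)\|\|.*")

def parse(message):
    """Try the IP-bearing pattern first; on failure fall back,
    mirroring the _grokparsefailure logic in the config."""
    m = PRIMARY.match(message) or FALLBACK.match(message)
    return m.groupdict() if m else None

line1 = ("[2017-02-20 22:30:08,455] [INFO] [c.r.c.front.UserInfoController] [? : ?] "
         "||p0p-web||pp登录||1086||pc_computer-OS:Windows_7_-browser:CHROME_53.0.2785.104"
         "-client:192.168.100.11_||13812345678||唐伯虎||Y||-||")
line2 = ("[2017-02-20 21:56:26,307] [INFO] [c.r.c.front.UserInfoController] [? : ?] "
         "||p0p-web||pp登录||1086||MStation||13812345678||秋香||Y||-||")

print(parse(line1)["clientip"])  # 192.168.100.11
print(parse(line2)["os_type"])   # MStation
```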
  • Template configuration: once the core fields are settled, create a template for the target index before the pipeline runs, predefining the types of the main fields:
PUT /_template/pm_template
{
  "template": "pm_info*",
  "order": 0,
  "settings": {
    "index.number_of_shards": "1",
    "index.number_of_replicas": "0"
  },
  "mappings": {
    "pm_type": {
      "properties": {
        "logintime": {
          "type": "string"
        },
        "hostname": {
          "type": "string"
        },
        "action": {
          "type": "string"
        },
        "code": {
          "type": "integer"
        },
        "os_type": {
          "type": "string"
        },
        "os": {
          "type": "string"
        },
        "browser": {
          "type": "string"
        },
        "clientip": {
          "type": "string"      
        },
        "phone": {
          "type": "string"
        },
        "username": {
          "type": "string"
        },
        "geoip": {
          "properties": {
            "location": {
              "type": "geo_point"
            },
            "ip": {
              "type": "ip"
            }
          }
        }
      }
    }
  }
}