ELK Notes 4: grok regex parsing

1 grok splitting approach

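A grok pattern can be built along the following lines.
1) Pick reliable delimiters, then extract the fields one by one to the left or right of each delimiter. Regex metacharacters must be escaped; otherwise parsing easily goes wrong when such a character serves as a delimiter.
2) Alternatively, simply extract the fields one by one from left to right.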
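
The two approaches above can be tried outside Logstash with a small emulation. The following is a minimal sketch, not grok itself: it assumes simplified regex stand-ins for a handful of stock patterns, and the helper names `grok_to_regex` and `grok_match` as well as the sample log line are made up for illustration.

```python
import re

# Simplified stand-ins for a few stock grok patterns. Assumption: the real
# definitions shipped with Logstash are stricter (notably IP and NUMBER).
GROK_PATTERNS = {
    "DATA": r".*?",        # lazy: stops as soon as the next literal matches
    "GREEDYDATA": r".*",   # greedy: takes as much as it can
    "NUMBER": r"[+-]?\d+(?:\.\d+)?",
    "IP": r"\d{1,3}(?:\.\d{1,3}){3}",
    "LOGLEVEL": r"[A-Za-z]+",
}


def grok_to_regex(pattern):
    """Expand each %{NAME:field} into a named regex group (hypothetical helper)."""
    def expand(match):
        name, field = match.group(1), match.group(2)
        return "(?P<{}>{})".format(field, GROK_PATTERNS[name])
    return re.sub(r"%\{(\w+):(\w+)\}", expand, pattern)


def grok_match(pattern, line):
    """Return the captured fields as a dict, or None when the line does not match."""
    found = re.search(grok_to_regex(pattern), line)
    return found.groupdict() if found else None


# Made-up sample line: "[" and "]" act as delimiters, and "[" is escaped.
sample = "2020-01-01 12:00:00 [INFO] service started"
fields = grok_match(r"%{DATA:ts} \[%{LOGLEVEL:level}] %{GREEDYDATA:msg}", sample)
print(fields)
```

Because DATA is lazy, each field ends at the first position where the following literal text matches, which is why the choice of delimiter matters so much.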

2 grok splitting examples

  1. Example 1
    Content:

    2016/04/27 12:22:50 OSPF: AdjChg: Nbr 220.220.220.220 on g-or2-a0bjt:10.61.61.61: Init -> Deleted (InactivityTimer)
    

    Pattern:

    %{DATA:timestamp} OSPF: %{DATA:type}: Nbr %{DATA:neighborip} on %{DATA:interface}:%{DATA:ip}: %{DATA:srcstat} -> %{GREEDYDATA:data} 
    

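    Note: the pattern needs a space before OSPF, otherwise that space ends up inside timestamp; a space is also needed before on, otherwise the neighborip field is not parsed cleanly.
    Result: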

    {
      "data": "Deleted (InactivityTimer)",
      "neighborip": "220.220.220.220",
      "srcstat": "Init",
      "ip": "10.61.61.61",
      "type": "AdjChg",
      "interface": "g-or2-a0bjt",
      "timestamp": "2016/04/27 12:22:50"
    }
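
The effect of the space before OSPF can be reproduced with Python's `re` module. This is only a sketch: it assumes that `%{DATA:x}` behaves like the lazy named group `(?P<x>.*?)`, and it covers just the first two fields of the pattern.

```python
import re

line = ("2016/04/27 12:22:50 OSPF: AdjChg: Nbr 220.220.220.220 on "
        "g-or2-a0bjt:10.61.61.61: Init -> Deleted (InactivityTimer)")

# With the literal space, the lazy group stops right before it.
with_space = re.search(r"(?P<timestamp>.*?) OSPF: (?P<type>.*?):", line)
# Without it, the space has to be absorbed by the timestamp group.
without_space = re.search(r"(?P<timestamp>.*?)OSPF: (?P<type>.*?):", line)

print(repr(with_space.group("timestamp")))     # '2016/04/27 12:22:50'
print(repr(without_space.group("timestamp")))  # '2016/04/27 12:22:50 '
```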
    
  2. Example 2
    Content:

    [Jul 11 10:22:59][123.123.123.123]<14>[2016-07-11 10:22:59,591][client.log][INFO]bak found in cache, skip it, test_data_2035_20160711_0500
    

    Pattern 1:

    \[%{DATA:head}]\[%{DATA:clientip}]<%{NUMBER:pid}>\[%{GREEDYDATA:ts}]\[%{DATA:logtype}]\[%{LOGLEVEL:level}]%{GREEDYDATA:data}
    

    Note: [ must be escaped.
    Result:

    {
      "head": "Jul 11 10:22:59",
      "logtype": "client.log",
      "data": "bak found in cache, skip it, test_data_2035_20160711_0500",
      "level": "INFO",
      "clientip": "123.123.123.123",
      "pid": "14",
      "ts": "2016-07-11 10:22:59,591"
    }
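
Why the opening bracket has to be escaped can be seen in any regex engine. The sketch below uses Python's `re` as a stand-in for the engine behind grok (an assumption; details such as error handling may differ) and only the first three fields of the line.

```python
import re

line = "[Jul 11 10:22:59][123.123.123.123]<14>"

# Escaped: "\[" matches a literal opening bracket.
escaped = re.search(r"\[(?P<head>.*?)]\[(?P<clientip>.*?)]<(?P<pid>\d+)>", line)
print(escaped.groupdict())

# Unescaped: "[...]" is read as a character class, so the group syntax inside
# it is just a set of single characters and no field is captured at all.
unescaped = re.search(r"[(?P<head>.*?)]", line)
print(unescaped.groupdict(), repr(unescaped.group(0)))
```

The unescaped variant still "matches", but only a single character from the set, which is the kind of silent misparse the note warns about.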
    

    Pattern 2: drop the redundant second timestamp

    \[%{DATA:head}]\[%{DATA:clientip}]<%{NUMBER:pid}>\[2016-07-11 10:22:59,591]\[%{DATA:logtype}]\[%{LOGLEVEL:level}]%{GREEDYDATA:data}
    or
    \[%{DATA:head}]\[%{DATA:clientip}]<%{NUMBER:pid}>\[.*]\[%{DATA:logtype}]\[%{LOGLEVEL:level}]%{GREEDYDATA:data}
    

    Result:

    {
      "head": "Jul 11 10:22:59",
      "logtype": "client.log",
      "data": "bak found in cache, skip it, test_data_2035_20160711_0500",
      "level": "INFO",
      "clientip": "123.123.123.123",
      "pid": "14"
    }
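
The second variant of Pattern 2 works because an unnamed sub-pattern still consumes text but reports nothing. A rough Python equivalent, assuming `\d+` for NUMBER and `[A-Za-z]+` for LOGLEVEL as simplified stand-ins:

```python
import re

line = ("[Jul 11 10:22:59][123.123.123.123]<14>[2016-07-11 10:22:59,591]"
        "[client.log][INFO]bak found in cache, skip it, test_data_2035_20160711_0500")

# "\[.*]" swallows the second timestamp; it has no name, so it is not reported.
pattern = (r"\[(?P<head>.*?)]\[(?P<clientip>.*?)]<(?P<pid>\d+)>"
           r"\[.*]\[(?P<logtype>.*?)]\[(?P<level>[A-Za-z]+)](?P<data>.*)")
fields = re.search(pattern, line).groupdict()
print(sorted(fields))
```

The greedy `.*` first runs to the end of the line and then backtracks until the remaining bracketed fields fit, so it ends up covering exactly the timestamp.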
    
  3. Example 3: parsing a syslog log
    Content:

    Apr 19 12:56:07 xg dbus-daemon[1537]: [session uid=1000 pid=1537] Successfully activated service 'org.freedesktop.Tracker1'
    

    Pattern:

    %{GREEDYDATA:timestamp} %{DATA:user} %{DATA:app}\[%{NUMBER:pid}]: %{GREEDYDATA:content}
    

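    Note: here [ or ] can be used to pin down how the fields relate to each other; work backwards from there field by field, and simply match the leading timestamp with GREEDYDATA.
    Result: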

    {
      "app": "dbus-daemon",
      "pid": "1537",
      "user": "xg",
      "content": "[session uid=1000 pid=1537] Successfully activated service 'org.freedesktop.Tracker1'",
      "timestamp": "Apr 19 12:56:07"
    }
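
A leading GREEDYDATA does not swallow the whole line because the engine backtracks until the rest of the pattern fits. The sketch below replays that in Python, assuming `.*` for GREEDYDATA, `.*?` for DATA and `\d+` for NUMBER.

```python
import re

line = ("Apr 19 12:56:07 xg dbus-daemon[1537]: [session uid=1000 pid=1537] "
        "Successfully activated service 'org.freedesktop.Tracker1'")

# The greedy timestamp gives characters back until two space-separated
# fields remain in front of "[1537]: ".
pattern = (r"(?P<timestamp>.*) (?P<user>.*?) (?P<app>.*?)"
           r"\[(?P<pid>\d+)]: (?P<content>.*)")
fields = re.search(pattern, line).groupdict()
print(fields["timestamp"], "|", fields["user"], "|", fields["app"])
```

The only place where a bracketed number is followed by a colon and a space is `[1537]: `, so that delimiter fixes app and user, and the timestamp gets whatever is left in front.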
    
  4. Example 4: parsing an nginx log
    Content:

    120.123.123.123 - - [19/Apr/2020:10:40:59 +0800] "GET /hello HTTP/1.1" 404 200 "-" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.92 Safari/537.36"
    

    Pattern:

    %{IP:server_name} %{DATA:holder1} %{DATA:remote_user} \[%{DATA:localtime}] "%{DATA:request}" %{NUMBER:req_status} %{NUMBER:upstream_status} "%{DATA:holder2}" %{GREEDYDATA:agent}
    

    Result:

    {
      "localtime": "19/Apr/2020:10:40:59 +0800",
      "server_name": "120.123.123.123",
      "request": "GET /hello HTTP/1.1",
      "agent": "\"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.92 Safari/537.36\"",
      "req_status": "404",
      "remote_user": "-",
      "upstream_status": "200",
      "holder2": "-",
      "holder1": "-"
    }
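
In the result above, agent keeps its surrounding double quotes because the quotes are part of what GREEDYDATA matches. If that is not wanted, one option is to put the quotes into the pattern as literals. The Python sketch below compares both forms; the regexes are simplified stand-ins for the grok patterns, not their exact definitions.

```python
import re

line = ('120.123.123.123 - - [19/Apr/2020:10:40:59 +0800] "GET /hello HTTP/1.1" '
        '404 200 "-" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 '
        '(KHTML, like Gecko) Chrome/81.0.4044.92 Safari/537.36"')

# Everything up to (and including) the space in front of the user agent.
prefix = (r'(?P<server_name>\d{1,3}(?:\.\d{1,3}){3}) (?P<holder1>.*?) '
          r'(?P<remote_user>.*?) \[(?P<localtime>.*?)] "(?P<request>.*?)" '
          r'(?P<req_status>\d+) (?P<upstream_status>\d+) "(?P<holder2>.*?)" ')

as_written = re.search(prefix + r'(?P<agent>.*)', line)   # quotes captured
quoted = re.search(prefix + r'"(?P<agent>.*)"', line)     # quotes are literals

print(as_written.group("agent")[:12])
print(quoted.group("agent")[:12])
```

In grok terms the second form would correspond to writing `"%{GREEDYDATA:agent}"` at the end of the pattern.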
    
  5. Example 5
    Content:

    [2020-04-29 21:37:54][audio-mgr][production][    INFO][apiserver][search.py:29]{'keyword': '', 'pageNo': '1'}
    

    Pattern:

    \[%{DATA:ts}]\[%{DATA:ns}]\[%{DATA:env}]\[%{DATA:logstash_level}]\[%{DATA:service}]\[%{DATA:filename}:%{NUMBER:lineno}]%{GREEDYDATA:msg}
    

    Result:

    {
      "msg": "{'keyword': '', 'pageNo': '1'}",
      "filename": "search.py",
      "lineno": "29",
      "ns": "audio-mgr",
      "service": "apiserver",
      "env": "production",
      "ts": "2020-04-29 21:37:54",
      "logstash_level": "    INFO"
    }
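
The logstash_level value in the result carries leading spaces, since DATA captures everything between the brackets. A possible refinement is to consume the padding with `\s*` in front of the field. The sketch below shows the difference in Python; the sample line is reconstructed from the result fields, which is an assumption about what the original log looked like.

```python
import re

# Sample line rebuilt from the result above (assumption: the level is padded
# with leading spaces inside its brackets).
line = ("[2020-04-29 21:37:54][audio-mgr][production][    INFO][apiserver]"
        "[search.py:29]{'keyword': '', 'pageNo': '1'}")

head = r"\[(?P<ts>.*?)]\[(?P<ns>.*?)]\[(?P<env>.*?)]"
tail = r"\[(?P<service>.*?)]\[(?P<filename>.*?):(?P<lineno>\d+)](?P<msg>.*)"

as_written = re.search(head + r"\[(?P<logstash_level>.*?)]" + tail, line)
trimmed = re.search(head + r"\[\s*(?P<logstash_level>.*?)]" + tail, line)

print(repr(as_written.group("logstash_level")))  # '    INFO'
print(repr(trimmed.group("logstash_level")))     # 'INFO'
```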
    
  6. Example 6
    Content:

    2021-01-12T17:38:53.800474Z stdout F 2021-01-12 17:38:53,800 INFO: [Log.py:50] [MainProcess:20 MainThread] - init logger
    

    Pattern:

    %{DATA:timestamp} %{DATA:stdtype} F %{DATA:dt2} %{DATA:time2} %{DATA:loglevel}\: \[%{DATA:file}] \[%{DATA:function}] - %{GREEDYDATA:msg}
    

    Result:

    {
      "msg": "init logger",
      "time2": "17:38:53,800",
      "dt2": "2021-01-12",
      "file": "Log.py:50",
      "loglevel": "INFO",
      "function": "MainProcess:20 MainThread",
      "stdtype": "stdout",
      "timestamp": "2021-01-12T17:38:53.800474Z"
    }
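
The pattern above needs two fields (dt2 and time2) for the application timestamp because DATA stops at the first space. If one field is preferred, a timestamp-shaped sub-pattern can span the space. The sketch below uses a simplified regex as a stand-in for grok's TIMESTAMP_ISO8601; the field name `app_time` is hypothetical.

```python
import re

line = ("2021-01-12T17:38:53.800474Z stdout F 2021-01-12 17:38:53,800 INFO: "
        "[Log.py:50] [MainProcess:20 MainThread] - init logger")

# Simplified stand-in for TIMESTAMP_ISO8601 (assumption: the stock pattern is
# more permissive, for example it also accepts a time zone suffix).
ISO_TS = r"\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}(?:[.,]\d+)?"

pattern = (r"(?P<timestamp>.*?) (?P<stdtype>.*?) F (?P<app_time>" + ISO_TS + r") "
           r"(?P<loglevel>.*?): \[(?P<file>.*?)] \[(?P<function>.*?)] - (?P<msg>.*)")
fields = re.search(pattern, line).groupdict()
print(fields["app_time"])
```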
    
  7. Example 7: parsing an ingress log
    This example parses an ingress log; the field names follow the log-parsing fields used by Alibaba Cloud SLS.
    Content:

    192.168.2.12 - - [18/May/2022:12:44:01 +0000] "GET /public/fonts/roboto/vPcynSL0qHq_6dX7lKVByfesZW2xOQ-xsNqO47m55DA.woff2 HTTP/1.1" 304 0 "http://grafana.xg.com:30080/public/build/grafana.dark.b208037f6b1954dc031d.css" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.84 Safari/537.36" 569 0.000 [lens-metrics-grafana-svc-80] [] 10.224.25.187:3000 0 0.000 304 7f2d304f864b63c6cd969cdde507b899
    

    Pattern:

    %{IP:remote_addr} %{DATA:http_referer} %{DATA:remote_user} \[%{DATA:time}] "%{DATA:method} %{DATA:url} %{DATA:version}" %{NUMBER:status} %{NUMBER:body_bytes_sent} "http://%{DATA:host}/%{DATA:path}" %{GREEDYDATA:agent} %{NUMBER:request_length} %{NUMBER:request_time} \[%{DATA:proxy_upstream_name}] \[] %{DATA:upstream_addr} %{NUMBER:upstream_response_length} %{NUMBER:upstream_response_time} %{NUMBER:upstream_status} %{GREEDYDATA:req_id}
    

    Result:

    {
      "agent": "\"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.84 Safari/537.36\"",
      "method": "GET",
      "remote_addr": "192.168.2.12",
      "upstream_addr": "10.224.25.187:3000",
      "upstream_response_length": "0",
      "version": "HTTP/1.1",
      "url": "/public/fonts/roboto/vPcynSL0qHq_6dX7lKVByfesZW2xOQ-xsNqO47m55DA.woff2",
      "remote_user": "-",
      "req_id": "7f2d304f864b63c6cd969cdde507b899",
      "path": "public/build/grafana.dark.b208037f6b1954dc031d.css",
      "upstream_status": "304",
      "request_time": "0.000",
      "body_bytes_sent": "0",
      "request_length": "569",
      "http_referer": "-",
      "host": "grafana.xg.com:30080",
      "proxy_upstream_name": "lens-metrics-grafana-svc-80",
      "upstream_response_time": "0.000",
      "time": "18/May/2022:12:44:01 +0000",
      "status": "304"
    }
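
With this many fields it helps to keep every field name unique, since the result is a flat map with one value per key. The sketch below illustrates the point in Python, which rejects a repeated group name outright (grok tooling may behave differently). It works on an excerpt of the log line, and `body_bytes_sent` is an assumed name borrowed from nginx's variable for the byte count that follows the status.

```python
import re

# Excerpt of the ingress line: status, byte count, referer, agent, lengths.
excerpt = ('HTTP/1.1" 304 0 "http://grafana.xg.com:30080/public/build/'
           'grafana.dark.b208037f6b1954dc031d.css" "Mozilla/5.0 (X11; Linux x86_64) '
           'AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.84 Safari/537.36" '
           '569 0.000 [lens-metrics-grafana-svc-80]')

# One name used for two columns: Python refuses to compile the pattern.
try:
    re.compile(r"(?P<request_length>\d+) (?P<request_length>\d+)")
    duplicate_error = None
except re.error as exc:
    duplicate_error = str(exc)
print(duplicate_error)

# Distinct names keep both numbers.
unique = re.search(
    r'" (?P<status>\d+) (?P<body_bytes_sent>\d+) ".*" '
    r'(?P<request_length>\d+) (?P<request_time>[\d.]+) \[',
    excerpt,
)
print(unique.groupdict())
```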
    

3 Notes

References:
grok-patterns
filter-grok-index
