ELK Notes 4 -- Parsing Logs with grok Patterns
1 grok splitting approach
Grok patterns can be built along the following lines.
1) Find a reliable split marker and, working outward from it, pull out fields one by one to its left and right. Regex metacharacters must be escaped; otherwise parsing is prone to errors when such a character serves as the split marker.
2) Alternatively, simply extract fields one by one from left to right.
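The approach above is easy to experiment with in plain Python, since a grok macro like %{DATA:field} is just shorthand for a named capture group. The expander below is a minimal sketch (it models only a few core patterns with simplified regexes; the real grok library ships hundreds of patterns):

```python
import re

# Simplified stand-ins for a few core grok patterns (assumption: real grok
# definitions are more permissive, e.g. NUMBER also matches signed floats).
PATTERNS = {
    "DATA": r".*?",                # lazy: stops at the next literal separator
    "GREEDYDATA": r".*",           # greedy: eats the rest of the line
    "NUMBER": r"\d+",              # simplified to plain digits
    "IP": r"\d+\.\d+\.\d+\.\d+",   # simplified dotted-quad
}

def grok_to_regex(grok):
    """Turn %{NAME:field} macros into Python named capture groups."""
    def repl(m):
        name, field = m.group(1), m.group(2)
        return "(?P<%s>%s)" % (field, PATTERNS[name])
    return re.sub(r"%\{(\w+):(\w+)\}", repl, grok)

# Hypothetical one-line example just to exercise the expander.
regex = grok_to_regex(r"%{DATA:ts} level=%{GREEDYDATA:msg}")
m = re.match(regex, "2024-01-01 12:00:00 level=started ok")
print(m.groupdict())
```

With this mapping in mind, each grok pattern in the cases below can be checked against its sample line using nothing but the `re` module.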
2 grok splitting examples
-
Case 1
Content:
2016/04/27 12:22:50 OSPF: AdjChg: Nbr 220.220.220.220 on g-or2-a0bjt:10.61.61.61: Init -> Deleted (InactivityTimer)
Pattern:
%{DATA:timestamp} OSPF: %{DATA:type}: Nbr %{DATA:neighborip} on %{DATA:interface}:%{DATA:ip}: %{DATA:srcstat} -> %{GREEDYDATA:data}
Note: the space before OSPF is required, otherwise the space ends up inside timestamp; the space before on is also required, otherwise the match fails.
Result: { "data": "Deleted (InactivityTimer)", "neighborip": "220.220.220.220", "srcstat": "Init", "ip": "10.61.61.61", "type": "AdjChg", "interface": "g-or2-a0bjt", "timestamp": "2016/04/27 12:22:50" }
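Reading DATA as lazy `.*?` and GREEDYDATA as greedy `.*`, the pattern above translates directly into a Python regex (a rough equivalent for checking the split, not grok itself):

```python
import re

# Rough Python analogue of the grok pattern: DATA -> .*? (lazy),
# GREEDYDATA -> .* (greedy). Group names mirror the grok field names.
pattern = re.compile(
    r"(?P<timestamp>.*?) OSPF: (?P<type>.*?): Nbr (?P<neighborip>.*?) "
    r"on (?P<interface>.*?):(?P<ip>.*?): (?P<srcstat>.*?) -> (?P<data>.*)"
)

log = ("2016/04/27 12:22:50 OSPF: AdjChg: Nbr 220.220.220.220 "
       "on g-or2-a0bjt:10.61.61.61: Init -> Deleted (InactivityTimer)")

print(pattern.match(log).groupdict())
```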
-
Case 2
Content: [Jul 11 10:22:59][123.123.123.123]<14>[2016-07-11 10:22:59,591][client.log][INFO]bak found in cache, skip it, test_data_2035_20160711_0500
Pattern 1:
\[%{DATA:head}]\[%{DATA:clientip}]<%{NUMBER:pid}>\[%{GREEDYDATA:ts}]\[%{DATA:logtype}]\[%{LOGLEVEL:level}]%{GREEDYDATA:data}
Note: [ must be escaped.
Result: { "head": "Jul 11 10:22:59", "logtype": "client.log", "data": "bak found in cache, skip it, test_data_2035_20160711_0500", "level": "INFO", "clientip": "123.123.123.123", "pid": "14", "ts": "2016-07-11 10:22:59,591" }
Pattern 2: drop the redundant timestamp field
\[%{DATA:head}]\[%{DATA:clientip}]<%{NUMBER:pid}>\[2016-07-11 10:22:59,591]\[%{DATA:logtype}]\[%{LOGLEVEL:level}]%{GREEDYDATA:data} or \[%{DATA:head}]\[%{DATA:clientip}]<%{NUMBER:pid}>\[.*]\[%{DATA:logtype}]\[%{LOGLEVEL:level}]%{GREEDYDATA:data}
Result:
{ "head": "Jul 11 10:22:59", "logtype": "client.log", "data": "bak found in cache, skip it, test_data_2035_20160711_0500", "level": "INFO", "clientip": "123.123.123.123", "pid": "14" }
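Pattern 1 of this case can be checked the same way with a Python regex (a rough equivalent: NUMBER is simplified to \d+ and LOGLEVEL to [A-Z]+, both more permissive in real grok):

```python
import re

# Python analogue of pattern 1 above. The greedy ts group backtracks until
# the following ][logtype][LEVEL] tail can match.
pattern = re.compile(
    r"\[(?P<head>.*?)\]\[(?P<clientip>.*?)\]<(?P<pid>\d+)>\[(?P<ts>.*)\]"
    r"\[(?P<logtype>.*?)\]\[(?P<level>[A-Z]+)\](?P<data>.*)"
)

log = ("[Jul 11 10:22:59][123.123.123.123]<14>[2016-07-11 10:22:59,591]"
       "[client.log][INFO]bak found in cache, skip it, test_data_2035_20160711_0500")

print(pattern.match(log).groupdict())
```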
-
Case 3: parsing syslog logs
Content: Apr 19 12:56:07 xg dbus-daemon[1537]: [session uid=1000 pid=1537] Successfully activated service 'org.freedesktop.Tracker1'
Pattern:
%{GREEDYDATA:timestamp} %{DATA:user} %{DATA:app}\[%{NUMBER:pid}]: %{GREEDYDATA:content}
Note: use [ or ] to pin down how the fields relate, then work backwards from them field by field; the leading timestamp can simply be matched with GREEDYDATA.
Result: { "app": "dbus-daemon", "pid": "1537", "user": "xg", "content": "[session uid=1000 pid=1537] Successfully activated service 'org.freedesktop.Tracker1'", "timestamp": "Apr 19 12:56:07" }
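The same split can be verified in Python (a rough equivalent, with NUMBER simplified to \d+); the greedy leading group grabs as much as it can and then backtracks just enough for host, app and [pid]: to match:

```python
import re

# Python analogue of the syslog pattern: GREEDYDATA -> .*, DATA -> .*?.
# The only "[digits]:" in the line pins down where app and pid sit.
pattern = re.compile(
    r"(?P<timestamp>.*) (?P<user>.*?) (?P<app>.*?)\[(?P<pid>\d+)\]: (?P<content>.*)"
)

log = ("Apr 19 12:56:07 xg dbus-daemon[1537]: [session uid=1000 pid=1537] "
       "Successfully activated service 'org.freedesktop.Tracker1'")

print(pattern.match(log).groupdict())
```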
-
Case 4: parsing nginx logs
Content: 120.123.123.123 - - [19/Apr/2020:10:40:59 +0800] "GET /hello HTTP/1.1" 404 200 "-" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.92 Safari/537.36"
Pattern:
%{IP:server_name} %{DATA:holder1} %{DATA:remote_user} \[%{DATA:localtime}] "%{DATA:request}" %{NUMBER:req_status} %{NUMBER:upstream_status} "%{DATA:holder2}" %{GREEDYDATA:agent}
Result:
{ "localtime": "19/Apr/2020:10:40:59 +0800", "server_name": "120.123.123.123", "request": "GET /hello HTTP/1.1", "agent": "\"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.92 Safari/537.36\"", "req_status": "404", "remote_user": "-", "upstream_status": "200", "holder2": "-", "holder1": "-" }
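A Python check of the nginx pattern (a rough equivalent; IP is simplified to a dotted quad). Note that GREEDYDATA keeps the surrounding quotes on the agent field, just as in the grok result:

```python
import re

# Python analogue of the nginx access-log pattern above.
pattern = re.compile(
    r'(?P<server_name>\d+\.\d+\.\d+\.\d+) (?P<holder1>.*?) (?P<remote_user>.*?) '
    r'\[(?P<localtime>.*?)\] "(?P<request>.*?)" (?P<req_status>\d+) '
    r'(?P<upstream_status>\d+) "(?P<holder2>.*?)" (?P<agent>.*)'
)

log = ('120.123.123.123 - - [19/Apr/2020:10:40:59 +0800] "GET /hello HTTP/1.1" '
       '404 200 "-" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 '
       '(KHTML, like Gecko) Chrome/81.0.4044.92 Safari/537.36"')

print(pattern.match(log).groupdict())
```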
-
Case 5
Content: [2020-04-29 21:37:54][audio-mgr][production][ INFO][apiserver][search.py:29]{'keyword': '', 'pageNo': '1'}
Pattern:
\[%{DATA:ts}]\[%{DATA:ns}]\[%{DATA:env}]\[%{DATA:logstash_level}]\[%{DATA:service}]\[%{DATA:filename}:%{NUMBER:lineno}]%{GREEDYDATA:msg}
Result:
{ "msg": "{'keyword': '', 'pageNo': '1'}", "filename": "search.py", "lineno": "29", "ns": "audio-mgr", "service": "apiserver", "env": "production", "ts": "2020-04-29 21:37:54", "logstash_level": " INFO" }
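This bracketed-field pattern also maps one-to-one onto a Python regex (a rough equivalent; the log line below is reconstructed from the result fields). Note that logstash_level keeps its leading space (" INFO") because DATA matches it:

```python
import re

# Python analogue of the bracketed-field pattern: each \[...\] pair holds
# one lazy DATA group; filename:lineno splits on the colon before digits.
pattern = re.compile(
    r"\[(?P<ts>.*?)\]\[(?P<ns>.*?)\]\[(?P<env>.*?)\]\[(?P<logstash_level>.*?)\]"
    r"\[(?P<service>.*?)\]\[(?P<filename>.*?):(?P<lineno>\d+)\](?P<msg>.*)"
)

log = ("[2020-04-29 21:37:54][audio-mgr][production][ INFO][apiserver]"
       "[search.py:29]{'keyword': '', 'pageNo': '1'}")

print(pattern.match(log).groupdict())
```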
-
Case 6
Content: 2021-01-12T17:38:53.800474Z stdout F 2021-01-12 17:38:53,800 INFO: [Log.py:50] [MainProcess:20 MainThread] - init logger
Pattern:
%{DATA:timestamp} %{DATA:stdtype} F %{DATA:dt2} %{DATA:time2} %{DATA:loglevel}\: \[%{DATA:file}] \[%{DATA:function}] - %{GREEDYDATA:msg}
Result:
{ "msg": "init logger", "time2": "17:38:53,800", "dt2": "2021-01-12", "file": "Log.py:50", "loglevel": "INFO", "function": "MainProcess:20 MainThread", "stdtype": "stdout", "timestamp": "2021-01-12T17:38:53.800474Z" }
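The container-runtime prefix makes this pattern look busy, but the Python equivalent (a rough sketch, with DATA as lazy `.*?`) shows how the literal " F " and " - " separators carry the split:

```python
import re

# Python analogue of the containerd-style log pattern; the literal " F "
# separates the runtime prefix from the application log line.
pattern = re.compile(
    r"(?P<timestamp>.*?) (?P<stdtype>.*?) F (?P<dt2>.*?) (?P<time2>.*?) "
    r"(?P<loglevel>.*?): \[(?P<file>.*?)\] \[(?P<function>.*?)\] - (?P<msg>.*)"
)

log = ("2021-01-12T17:38:53.800474Z stdout F 2021-01-12 17:38:53,800 "
       "INFO: [Log.py:50] [MainProcess:20 MainThread] - init logger")

print(pattern.match(log).groupdict())
```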
-
Case 7: parsing ingress logs
This case parses nginx ingress logs; the field names follow the log-parsing fields used in SLS.
Content: 192.168.2.12 - - [18/May/2022:12:44:01 +0000] "GET /public/fonts/roboto/vPcynSL0qHq_6dX7lKVByfesZW2xOQ-xsNqO47m55DA.woff2 HTTP/1.1" 304 0 "http://grafana.xg.com:30080/public/build/grafana.dark.b208037f6b1954dc031d.css" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.84 Safari/537.36" 569 0.000 [lens-metrics-grafana-svc-80] [] 10.224.25.187:3000 0 0.000 304 7f2d304f864b63c6cd969cdde507b899
Pattern:
%{IP:remote_addr} %{DATA:http_referer} %{DATA:remote_user} \[%{DATA:time}] "%{DATA:method} %{DATA:url} %{DATA:version}" %{NUMBER:status} %{NUMBER:body_bytes_sent} "http://%{DATA:host}/%{DATA:path}" %{GREEDYDATA:agent} %{NUMBER:request_length} %{NUMBER:request_time} \[%{DATA:proxy_upstream_name}] \[] %{DATA:upstream_addr} %{NUMBER:upstream_response_length} %{NUMBER:upstream_response_time} %{NUMBER:upstream_status} %{GREEDYDATA:req_id}
Note: capture names must be unique within a pattern; reusing one name for two fields (e.g. upstream_addr for both the client and the upstream address) makes grok emit array values.
Result:
{ "remote_addr": "192.168.2.12", "http_referer": "-", "remote_user": "-", "time": "18/May/2022:12:44:01 +0000", "method": "GET", "url": "/public/fonts/roboto/vPcynSL0qHq_6dX7lKVByfesZW2xOQ-xsNqO47m55DA.woff2", "version": "HTTP/1.1", "status": "304", "body_bytes_sent": "0", "host": "grafana.xg.com:30080", "path": "public/build/grafana.dark.b208037f6b1954dc031d.css", "agent": "\"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.84 Safari/537.36\"", "request_length": "569", "request_time": "0.000", "proxy_upstream_name": "lens-metrics-grafana-svc-80", "upstream_addr": "10.224.25.187:3000", "upstream_response_length": "0", "upstream_response_time": "0.000", "upstream_status": "304", "req_id": "7f2d304f864b63c6cd969cdde507b899" }
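The full ingress pattern likewise maps to one Python regex (a rough sketch, not grok itself). Field names here follow the standard nginx ingress access-log variables (remote_addr, body_bytes_sent), and NUMBER is widened to [\d.]+ so it also accepts the fractional timing fields:

```python
import re

# Python analogue of the ingress pattern; NUMBER -> [\d.]+ so that the
# fractional request/response times match, IP -> simplified dotted quad.
pattern = re.compile(
    r'(?P<remote_addr>\d+\.\d+\.\d+\.\d+) (?P<http_referer>.*?) (?P<remote_user>.*?) '
    r'\[(?P<time>.*?)\] "(?P<method>.*?) (?P<url>.*?) (?P<version>.*?)" '
    r'(?P<status>\d+) (?P<body_bytes_sent>\d+) "http://(?P<host>.*?)/(?P<path>.*?)" '
    r'(?P<agent>.*) (?P<request_length>[\d.]+) (?P<request_time>[\d.]+) '
    r'\[(?P<proxy_upstream_name>.*?)\] \[\] (?P<upstream_addr>.*?) '
    r'(?P<upstream_response_length>[\d.]+) (?P<upstream_response_time>[\d.]+) '
    r'(?P<upstream_status>\d+) (?P<req_id>.*)'
)

log = ('192.168.2.12 - - [18/May/2022:12:44:01 +0000] "GET /public/fonts/roboto/'
       'vPcynSL0qHq_6dX7lKVByfesZW2xOQ-xsNqO47m55DA.woff2 HTTP/1.1" 304 0 '
       '"http://grafana.xg.com:30080/public/build/grafana.dark.b208037f6b1954dc031d.css" '
       '"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) '
       'Chrome/99.0.4844.84 Safari/537.36" 569 0.000 [lens-metrics-grafana-svc-80] [] '
       '10.224.25.187:3000 0 0.000 304 7f2d304f864b63c6cd969cdde507b899')

print(pattern.match(log).groupdict())
```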