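grok is a plugin for the filter stage of Logstash. It parses and splits log messages by matching them against combinations of predefined regular expressions.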
Alias | Regex | Description |
---|---|---|
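Common patterns | ||
USERNAME | [a-zA-Z0-9._-]+ | Username: a string of digits, upper- and lowercase letters, and the special characters ._- |
USER | %{USERNAME} | |
INT | (?:[+-]?(?:[0-9]+)) | Integer: zero and positive or negative integers |
BASE10NUM | (?<![0-9.+-])(?>[+-]?(?:(?:[0-9]+(?:\.[0-9]+)?)|(?:\.[0-9]+))) | Decimal number, integer or fractional |
NUMBER | (?:%{BASE10NUM}) | |
BASE16NUM | (?<![0-9A-Fa-f])(?:[+-]?(?:0x)?(?:[0-9A-Fa-f]+)) | Hexadecimal integer |
BASE16FLOAT | \b(?<![0-9A-Fa-f.])(?:[+-]?(?:0x)?(?:(?:[0-9A-Fa-f]+(?:\.[0-9A-Fa-f]*)?)|(?:\.[0-9A-Fa-f]+)))\b | Hexadecimal integer or fractional number |
POSINT | \b(?:[1-9][0-9]*)\b | Positive integer |
NONNEGINT | \b(?:[0-9]+)\b | Non-negative integer |
WORD | \b\w+\b | A word: letters, digits, and underscores |
NOTSPACE | \S+ | A run of non-whitespace characters |
SPACE | \s* | Zero or more whitespace characters |
DATA | .*? | Any characters except newline, non-greedy (as few as possible) |
GREEDYDATA | .* | Any characters except newline, greedy (zero or more, as many as possible) |
QUOTEDSTRING | (?>(?<!\\)(?>"(?>\\.|[^\\"]+)+"|""|(?>'(?>\\.|[^\\']+)+')|''|(?>`(?>\\.|[^\\`]+)+`)|``)) | Quoted string (double quotes, single quotes, or backticks); QS is the short alias |
QS | %{QUOTEDSTRING} | |
UUID | [A-Fa-f0-9]{8}-(?:[A-Fa-f0-9]{4}-){3}[A-Fa-f0-9]{12} | Standard UUID |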
Networking | ||
MAC | (?:%{CISCOMAC}|%{WINDOWSMAC}|%{COMMONMAC}) | MAC address in Cisco, Windows, or common format |
CISCOMAC | (?:(?:[A-Fa-f0-9]{4}\.){2}[A-Fa-f0-9]{4}) | Cisco-style MAC address: three dot-separated groups of four hex digits |
WINDOWSMAC | (?:(?:[A-Fa-f0-9]{2}-){5}[A-Fa-f0-9]{2}) | Windows-style MAC address: six hyphen-separated pairs of hex digits |
COMMONMAC | (?:(?:[A-Fa-f0-9]{2}:){5}[A-Fa-f0-9]{2}) | Common MAC address: six colon-separated pairs of hex digits |
IPV6 | ((([0-9A-Fa-f]{1,4}:){7}([0-9A-Fa-f]{1,4}|:))|(([0-9A-Fa-f]{1,4}:){6}(:[0-9A-Fa-f]{1,4}|((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3})|:))|(([0-9A-Fa-f]{1,4}:){5}(((:[0-9A-Fa-f]{1,4}){1,2})|:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3})|:))|(([0-9A-Fa-f]{1,4}:){4}(((:[0-9A-Fa-f]{1,4}){1,3})|((:[0-9A-Fa-f]{1,4})?:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(([0-9A-Fa-f]{1,4}:){3}(((:[0-9A-Fa-f]{1,4}){1,4})|((:[0-9A-Fa-f]{1,4}){0,2}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(([0-9A-Fa-f]{1,4}:){2}(((:[0-9A-Fa-f]{1,4}){1,5})|((:[0-9A-Fa-f]{1,4}){0,3}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(([0-9A-Fa-f]{1,4}:){1}(((:[0-9A-Fa-f]{1,4}){1,6})|((:[0-9A-Fa-f]{1,4}){0,4}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(:(((:[0-9A-Fa-f]{1,4}){1,7})|((:[0-9A-Fa-f]{1,4}){0,5}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:)))(%.+)? | IPv6 address, e.g. FE80:0000:0000:0000:AAAA:0000:00C2:000 |
IPV4 | (?<![0-9])(?:(?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})[.](?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})[.](?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})[.](?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2}))(?![0-9]) | IPv4 address, e.g. 127.0.0.1 |
IP | (?:%{IPV6}|%{IPV4}) | IP address, either IPv6 or IPv4 |
HOSTNAME | \b(?:[0-9A-Za-z][0-9A-Za-z-]{0,62})(?:\.(?:[0-9A-Za-z][0-9A-Za-z-]{0,62}))*(\.?|\b) | Host name |
HOST | %{HOSTNAME} | |
IPORHOST | (?:%{HOSTNAME}|%{IP}) | Host name or IP address |
HOSTPORT | %{IPORHOST}:%{POSINT} | Host name or IP address, followed by a port |
Paths | ||
PATH | (?:%{UNIXPATH}|%{WINPATH}) | File path in UNIX or Windows format, e.g. /usr/xxx or C:\windows\xxx |
UNIXPATH | (?>/(?>[\w_%!$@:.,-]+|\\.)*)+ | UNIX path |
TTY | (?:/dev/(pts|tty([pq])?)(\w+)?/?(?:[0-9]+)) | TTY device path |
WINPATH | (?>[A-Za-z]+:|\\)(?:\\[^\\?*]*)+ | Windows path |
URIPROTO | [A-Za-z]+(\+[A-Za-z+]+)? | URI scheme such as http or ftp; matches the leading part of a URI |
URIHOST | %{IPORHOST}(?::%{POSINT:port})? | URI host: an IPORHOST with an optional POSINT port. For example, in http://hostname.domain.tld/_astats?application=&inf.name=eth0 it matches hostname.domain.tld |
URIPATH | (?:/[A-Za-z0-9$.+!*'(){},~:;=@#%_\-]*)+ | Path portion of a URI |
URIPARAM | \?(?:[A-Za-z0-9]+(?:=(?:[^&]*))?(?:&(?:[A-Za-z0-9]+(?:=(?:[^&]*))?)?)*)? | GET parameters in a URI, e.g. ?a=1&b=2&c=3 (strict key=value form) |
URIPARAM | \?[A-Za-z0-9$.+!*'|(){},~@#%&/=:;_?\-\[\]]* | A more permissive alternative definition of URIPARAM |
URIPATHPARAM | %{URIPATH}(?:%{URIPARAM})? | URI path plus GET parameters |
URI | %{URIPROTO}://(?:%{USER}(?::[^@]*)?@)?(?:%{URIHOST})?(?:%{URIPATHPARAM})? | Complete URI |
Year, month, day, date, and time | ||
MONTH | \b(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?)\b | English month name, abbreviated or in full, e.g. January, Feb (numeric months are handled by MONTHNUM) |
MONTHNUM | (?:0?[1-9]|1[0-2]) | Month number, e.g. 3, 03, 12 |
MONTHNUM2 | (?:0[1-9]|1[0-2]) | Two-digit month number, e.g. 03, 12 |
MONTHDAY | (?:(?:0[1-9])|(?:[12][0-9])|(?:3[01])|[1-9]) | Day of the month, e.g. 3, 03, 31 |
DAY | (?:Mon(?:day)?|Tue(?:sday)?|Wed(?:nesday)?|Thu(?:rsday)?|Fri(?:day)?|Sat(?:urday)?|Sun(?:day)?) | Day-of-week name, e.g. Monday, Tue, Thu |
YEAR | (?>\d\d){1,2} | Year, two or four digits |
HOUR | (?:2[0123]|[01]?[0-9]) | Hour, 0 to 23 |
MINUTE | (?:[0-5][0-9]) | Minute, 00 to 59 |
SECOND | (?:(?:[0-5]?[0-9]|60)(?:[:.,][0-9]+)?) | Second, 0 to 60, with an optional fractional part |
TIME | (?!<[0-9])%{HOUR}:%{MINUTE}(?::%{SECOND})(?![0-9]) | Time of day, e.g. 00:12:34 |
Datestamps: YYYY/MM/DD-HH:MM:SS.UUUU (or something like it) | ||
DATE_US | %{MONTHNUM}[/-]%{MONTHDAY}[/-]%{YEAR} | US date format, e.g. 10-15-1982, 10/15/1982 |
DATE_EU | %{MONTHDAY}[./-]%{MONTHNUM}[./-]%{YEAR} | European date format, e.g. 15-10-1982, 15/10/1982, 15.10.1982 |
ISO8601_TIMEZONE | (?:Z|[+-]%{HOUR}(?::?%{MINUTE})) | ISO 8601 time zone: Z or a signed hour-and-minute offset, e.g. +10:23, -1023 |
ISO8601_SECOND | (?:%{SECOND}|60) | Seconds in ISO 8601 format |
TIMESTAMP_ISO8601 | %{YEAR}-%{MONTHNUM}-%{MONTHDAY}[T ]%{HOUR}:?%{MINUTE}(?::?%{SECOND})?%{ISO8601_TIMEZONE}? | ISO 8601 timestamp, e.g. 2020-02-29T12:11:11+08:00 |
DATE | %{DATE_US}|%{DATE_EU} | Date in either US (%{DATE_US}) or European (%{DATE_EU}) format |
DATESTAMP | %{DATE}[- ]%{TIME} | Full date plus time, e.g. 07-03-2022 11:22:33 |
TZ | (?:[PMCE][SD]T|UTC) | Time zone abbreviation: PST, PDT, MST, MDT, CST, CDT, EST, EDT, or UTC |
DATESTAMP_RFC822 | %{DAY} %{MONTH} %{MONTHDAY} %{YEAR} %{TIME} %{TZ} | Timestamp in RFC 822 style |
DATESTAMP_RFC2822 | %{DAY}, %{MONTHDAY} %{MONTH} %{YEAR} %{TIME} %{ISO8601_TIMEZONE} | Timestamp in RFC 2822 style |
DATESTAMP_OTHER | %{DAY} %{MONTH} %{MONTHDAY} %{TIME} %{TZ} %{YEAR} | Timestamp in another common layout (time zone before the year) |
DATESTAMP_EVENTLOG | %{YEAR}%{MONTHNUM2}%{MONTHDAY}%{HOUR}%{MINUTE}%{SECOND} | Timestamp in event-log format (digits only, no separators) |
Syslog dates: Month Day HH:MM:SS | ||
SYSLOGTIMESTAMP | %{MONTH} +%{MONTHDAY} %{TIME} | Syslog timestamp |
PROG | (?:[\w._/%-]+) | Program name |
SYSLOGPROG | %{PROG:program}(?:\[%{POSINT:pid}\])? | Program name with an optional PID in square brackets |
SYSLOGHOST | %{IPORHOST} | Syslog host (same as IPORHOST) |
SYSLOGFACILITY | <%{NONNEGINT:facility}.%{NONNEGINT:priority}> | Syslog facility and priority |
HTTPDATE | %{MONTHDAY}/%{MONTH}/%{YEAR}:%{TIME} %{INT} | Default HTTP log date format, e.g. 03/Jul/2016:00:36:53 +0800 |
Log formats | ||
SYSLOGBASE | %{SYSLOGTIMESTAMP:timestamp} (?:%{SYSLOGFACILITY} )?%{SYSLOGHOST:logsource} %{SYSLOGPROG}: | Default syslog line prefix |
COMMONAPACHELOG | %{IPORHOST:clientip} %{USER:ident} %{USER:auth} \[%{HTTPDATE:timestamp}\] "(?:%{WORD:verb} %{NOTSPACE:request}(?: HTTP/%{NUMBER:httpversion})?|%{DATA:rawrequest})" %{NUMBER:response} (?:%{NUMBER:bytes}|-) | Apache common log format |
COMBINEDAPACHELOG | %{COMMONAPACHELOG} %{QS:referrer} %{QS:agent} | Apache combined log format |
Log levels | ||
LOGLEVEL | ([Aa]lert|ALERT|[Tt]race|TRACE|[Dd]ebug|DEBUG|[Nn]otice|NOTICE|[Ii]nfo|INFO|[Ww]arn?(?:ing)?|WARN?(?:ING)?|[Ee]rr?(?:or)?|ERR?(?:OR)?|[Cc]rit?(?:ical)?|CRIT?(?:ICAL)?|[Ff]atal|FATAL|[Ss]evere|SEVERE|EMERG(?:ENCY)?|[Ee]merg(?:ency)?) | Log level, e.g. warn, debug, Alert, alert, ALERT, Error |
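Custom patterns | ||
DATE_CN | %{YEAR}[./-]%{MONTHNUM}[./-]%{MONTHDAY} | Date in the year-month-day order commonly used in China, e.g. 2022.03.07, 2022/03/07, 2022-03-07 |
ZIPCODE_CN | [1-9]\d{5} | Chinese postal code: six digits, the first one nonzero |
GAME_ACCOUNT | [a-zA-Z][a-zA-Z0-9_]{4,15} | Game account: a letter followed by 4 to 15 letters, digits, or underscores (5 to 16 characters in total) |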
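
The custom patterns at the end of the table can be checked outside Logstash. The following is a minimal Python sketch, not how grok itself runs them: the sub-patterns are pasted in by hand, the helper name `matches` is made up for illustration, and whole-string matching via `fullmatch` is an assumption chosen for validation (grok patterns are not anchored by default).

```python
import re

# Sub-patterns copied from the table. Python's re module only gained atomic
# groups in 3.11, so YEAR's (?>\d\d){1,2} is written with a plain group here.
YEAR = r"(?:\d\d){1,2}"
MONTHNUM = r"(?:0?[1-9]|1[0-2])"
MONTHDAY = r"(?:(?:0[1-9])|(?:[12][0-9])|(?:3[01])|[1-9])"

DATE_CN = re.compile(YEAR + r"[./-]" + MONTHNUM + r"[./-]" + MONTHDAY)
ZIPCODE_CN = re.compile(r"[1-9]\d{5}")
GAME_ACCOUNT = re.compile(r"[a-zA-Z][a-zA-Z0-9_]{4,15}")


def matches(pattern, text):
    """Return True when the whole string matches the pattern (hypothetical helper)."""
    return pattern.fullmatch(text) is not None


for sample in ["2022.03.07", "2022/03/07", "2022-03-07"]:
    print(sample, matches(DATE_CN, sample))

# The {4,15} quantifier applies after the leading letter, so the shortest
# accepted account has 5 characters and the longest has 16.
print(matches(GAME_ACCOUNT, "abcd"))   # 4 characters: rejected
print(matches(GAME_ACCOUNT, "abcde"))  # 5 characters: accepted
```
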
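Notes: \b matches a word boundary (the start or end of a word); \w matches a letter, digit, or underscore; \s matches a single whitespace character and \s+ one or more of them; \S matches a single non-whitespace character, so \S+ means one or more and \S* zero or more; \d matches a digit.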
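
These shorthand classes, and the difference between DATA (.*?) and GREEDYDATA (.*), can be seen in a short Python sketch. Python's `re` stands in for grok's regex engine here, which is an assumption that holds for these simple constructs; the sample strings are made up.

```python
import re

line = "user_1  logged   in"

# \b\w+\b (the WORD pattern) picks out runs of letters, digits, and underscores.
words = re.findall(r"\b\w+\b", line)

# \s+ treats any run of whitespace as a single separator.
fields = re.split(r"\s+", line)

# DATA is non-greedy: it stops at the first ':' it can.
lazy = re.match(r"(?P<key>.*?):(?P<rest>.*)", "k1:v1:v2")

# GREEDYDATA is greedy: it runs to the last ':' it can.
greedy = re.match(r"(?P<key>.*):(?P<rest>.*)", "k1:v1:v2")

# Without extra flags, '.' does not cross a newline.
first_line = re.match(r".*", "line one\nline two").group()

print(words)
print(fields)
print(lazy.groupdict())
print(greedy.groupdict())
print(first_line)
```
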
Usage: (1) %{PATTERN_ALIAS:field_name}; (2) (?<field_name>regex)
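
To make the two syntaxes concrete, here is a minimal Python sketch of how an alias reference can be expanded into a named capture group. It is an illustration only, not Logstash's implementation: the `PATTERNS` dictionary holds a few entries copied from the table, and `expand`, `GROK_REF`, and the sample log line are hypothetical names and data.

```python
import re

# One IPv4 octet, as used four times in the IPV4 row of the table.
_OCTET = r"(?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})"

# A handful of aliases from the table (hypothetical mini pattern library).
PATTERNS = {
    "USERNAME": r"[a-zA-Z0-9._-]+",
    "USER": r"%{USERNAME}",
    "INT": r"(?:[+-]?(?:[0-9]+))",
    "WORD": r"\b\w+\b",
    "NOTSPACE": r"\S+",
    "IPV4": r"(?<![0-9])" + "[.]".join([_OCTET] * 4) + r"(?![0-9])",
}

GROK_REF = re.compile(r"%\{(\w+)(?::(\w+))?\}")


def expand(expr):
    """Turn a grok-style expression into a Python regex string."""
    def repl(m):
        alias, field = m.group(1), m.group(2)
        body = expand(PATTERNS[alias])  # aliases may reference other aliases
        return f"(?P<{field}>{body})" if field else f"(?:{body})"

    expr = GROK_REF.sub(repl, expr)
    # Syntax 2: rewrite (?<field>...) as Python's (?P<field>...).
    # Lookbehinds such as (?<! and (?<= are left untouched.
    return re.sub(r"\(\?<([A-Za-z_]\w*)>", r"(?P<\1>", expr)


rx = re.compile(expand(r"%{IPV4:client} %{USER:user} %{WORD:method} (?<path>\S+) %{INT:status}"))
event = rx.match("127.0.0.1 alice GET /index.html 200").groupdict()
print(event)
```

Syntax 1 reuses a named alias and syntax 2 embeds a one-off regex, but both end up as the same kind of named group, which is why they can be mixed freely in a single expression.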
Worked examples:
1. Splitting a log line
2. Testing against sample data
3. Alias and regex (example)
Alias | Regex |
BASE10NUM | (?<![0-9.+-])(?>[+-]?(?:(?:[0-9]+(?:\.[0-9]+)?)|(?:\.[0-9]+))) |
4. Custom fields
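
Steps 2 to 4 can be tried on sample data with a short Python sketch. This is an illustration under assumptions, not a Logstash run: the atomic group (?>...) in BASE10NUM is replaced by a plain non-capturing group so the pattern works on older Python versions, and the sample strings and the field name `duration` are made up.

```python
import re

# BASE10NUM from the table, with (?>...) written as (?:...).
BASE10NUM = r"(?<![0-9.+-])(?:[+-]?(?:(?:[0-9]+(?:\.[0-9]+)?)|(?:\.[0-9]+)))"

# Step 2: test the pattern against sample data.
sample = "took 0.75s, delta -3, ratio .5"
numbers = re.findall(BASE10NUM, sample)
print(numbers)

# Step 4: capture the number into a custom field. Python's (?P<duration>...)
# plays the role of grok's (?<duration>...) or %{BASE10NUM:duration}.
line_re = re.compile(r"request took (?P<duration>" + BASE10NUM + r") seconds")
event = line_re.search("GET /index.html request took 0.75 seconds").groupdict()
print(event)
```

The leading lookbehind (?<![0-9.+-]) keeps the pattern from starting in the middle of a longer number, which is why each value is returned whole rather than in pieces.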
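* The material above was compiled from online sources and is intended for personal study and reference only. Compiling it took some effort, so your support is appreciated, and corrections are welcome.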