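Parse arbitrary text.
Grok is a good way to parse unstructured log data into something structured and queryable. It is well suited to syslog logs, Apache and other web server logs, MySQL logs, and in general any log format written for humans to read rather than for machines to consume.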
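Grok or Dissect? Or both?
The dissect filter plugin is another way to extract unstructured event data into fields, using delimiters.
Dissect differs from Grok in that it does not use regular expressions, which makes it faster. Dissect works well when the data repeats reliably.
Grok is the better choice when the structure of the text varies from line to line.
When one part of the line repeats reliably but the line as a whole does not, you can combine Dissect and Grok in a hybrid setup.
The Dissect filter deconstructs the repeating part of the line. The Grok filter then processes the remaining field values with more predictable regular-expression behavior.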
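The hybrid idea can be illustrated outside Logstash. The following is a minimal Python sketch, not the dissect or grok plugins themselves: the log lines, field names, and regular expressions are all hypothetical. A plain delimiter split stands in for Dissect on the stable prefix, and regular expressions stand in for Grok on the variable remainder.

```python
import re

# Hypothetical log lines: the first three space-delimited fields repeat
# reliably, while the trailing message changes shape from line to line.
LINES = [
    "2024-01-05 web01 INFO user=alice action=login",
    "2024-01-05 web01 ERROR timeout after 30s contacting db02",
]

# Regexes for the variable remainder (the "grok" half of the hybrid).
REST_PATTERNS = [
    re.compile(r"user=(?P<user>\S+) action=(?P<action>\S+)"),
    re.compile(r"timeout after (?P<seconds>\d+)s contacting (?P<target>\S+)"),
]


def parse(line):
    # "Dissect" half: split on the delimiter only, no regex involved.
    date, host, level, rest = line.split(" ", 3)
    event = {"date": date, "host": host, "level": level}
    # "Grok" half: try each regex against the remainder only.
    for pattern in REST_PATTERNS:
        m = pattern.fullmatch(rest)
        if m:
            event.update(m.groupdict())
            break
    return event


events = [parse(line) for line in LINES]
```

Because the regular expressions only ever see the remainder of the line, they stay short and are easier to reason about than one expression covering the whole line.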
Grok Basics
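Grok works by combining text patterns into something that matches your logs.
The syntax of a grok pattern is %{SYNTAX:SEMANTIC}
SYNTAX is the name of the pattern that matches the text. For example, 3.44 is matched by the NUMBER pattern and 55.3.244.1 by the IP pattern. The syntax is how you match.
SEMANTIC is the identifier you give to the matched piece of text. For example, 3.44 could be the duration of an event, so you could simply call it duration. Likewise, the string 55.3.244.1 might identify the client making the request, so you could call it client.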
%{NUMBER:duration} %{IP:client}
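You can also add a data type conversion to a grok pattern. By default, every semantic is saved as a string. To convert a semantic to another data type, for example from a string to an integer, append the target type as a suffix. For example, %{NUMBER:num:int} converts the num semantic from a string to an integer. The only conversions currently supported are int and float.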
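To make the %{SYNTAX:SEMANTIC} form and the optional type suffix concrete, here is a small Python sketch of how such an expression could be turned into a regular expression with named groups. It is an illustration under assumptions, not grok's actual implementation: the NUMBER and IP definitions are simplified stand-ins, and the helper names compile_grok and grok are hypothetical.

```python
import re

# Simplified stand-ins for two library patterns; the real NUMBER and IP
# definitions are longer (see the reference table).
PATTERNS = {
    "NUMBER": r"[+-]?(?:\d+(?:\.\d+)?|\.\d+)",
    "IP": r"(?:\d{1,3}\.){3}\d{1,3}",
}

CONVERTERS = {"int": int, "float": float}

# Matches %{SYNTAX}, %{SYNTAX:SEMANTIC} and %{SYNTAX:SEMANTIC:type}.
TOKEN = re.compile(r"%\{(\w+)(?::(\w+))?(?::(\w+))?\}")


def compile_grok(expression):
    """Return (compiled regex, {field: converter}) for a grok expression."""
    types = {}

    def replace(m):
        syntax, semantic, type_name = m.groups()
        body = PATTERNS[syntax]
        if semantic is None:
            return f"(?:{body})"
        if type_name is not None:
            types[semantic] = CONVERTERS[type_name]
        return f"(?P<{semantic}>{body})"

    return re.compile(TOKEN.sub(replace, expression)), types


def grok(expression, text):
    regex, types = compile_grok(expression)
    m = regex.search(text)
    if m is None:
        return None
    # Captures stay strings unless a conversion was requested.
    return {k: types.get(k, str)(v) for k, v in m.groupdict().items()}


plain = grok("%{NUMBER:duration} %{IP:client}", "3.44 55.3.244.1")
typed = grok("%{NUMBER:duration:float} %{IP:client}", "3.44 55.3.244.1")
```

In this sketch, plain keeps duration as the string "3.44", while typed holds it as a float; client is a string in both cases.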
Regular Expressions
Grok sits on top of regular expressions, so any regular expression is also valid in grok. The regular expression library is Oniguruma.
Custom Patterns
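Sometimes Logstash does not have the pattern you need. There are two options:
1. Name a capture inline, for example (?<queue_id>[0-9A-F]{10,11})
2. Create a custom patterns file.
Create a directory named patterns containing a file (the file name does not matter; pick a meaningful one).
In that file, write each pattern you need as the pattern name, a space, then the regexp for that pattern.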
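Both options can be sketched in Python to show that they capture the same thing. This is an illustration only: the file contents, the POSTFIX_QUEUEID name, the sample line, and the helpers load_patterns and expand are assumptions made for the example, and Python spells a named group as (?P<name>...) where grok uses (?<name>...).

```python
import re

# Contents of a hypothetical file ./patterns/postfix: one pattern per line,
# written as the pattern name, a space, then the regular expression.
PATTERNS_FILE = """\
# comment lines and blank lines are skipped in this sketch
POSTFIX_QUEUEID [0-9A-F]{10,11}
"""


def load_patterns(text):
    patterns = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, regex = line.split(" ", 1)
        patterns[name] = regex.strip()
    return patterns


def expand(expression, patterns):
    # Replace %{NAME:field} with a Python named group (?P<field>...).
    return re.sub(
        r"%\{(\w+):(\w+)\}",
        lambda m: f"(?P<{m.group(2)}>{patterns[m.group(1)]})",
        expression,
    )


custom = load_patterns(PATTERNS_FILE)
# Illustrative postfix-style line, not taken from a real system.
line = "postfix/cleanup[21403]: BEF25A72965: message-id=<20130101142543.5828399CCAF@mailserver14.example.com>"

# Option 1: inline named capture, no patterns file needed.
inline = re.search(r"\]: (?P<queue_id>[0-9A-F]{10,11}):", line)

# Option 2: the same capture through the custom pattern name.
via_file = re.search(expand(r"\]: %{POSTFIX_QUEUEID:queue_id}:", custom), line)
```

The inline form is convenient for a one-off capture; a patterns file pays off once the same expression is reused across several filters.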
Grok Filter Configuration Options
The following options can be configured:
break_on_match
keep_empty_captures
match
named_captures_only
overwrite
pattern_definitions
patterns_dir
patterns_files_glob
tag_on_failure
tag_on_timeout
timeout_millis
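As a rough mental model for two of these options, the Python sketch below emulates break_on_match (stop after the first pattern that matches) and tag_on_failure (tags added when nothing matches). It is a simplified model rather than the filter's real code: apply_grok and the sample patterns are hypothetical, and the default tag shown, _grokparsefailure, is what the plugin documentation lists as the default.

```python
import re


# Simplified model of two options; field and tag handling in the real
# filter is richer than this.
def apply_grok(event, patterns, break_on_match=True,
               tag_on_failure=("_grokparsefailure",)):
    matched = False
    for pattern in patterns:
        m = re.search(pattern, event["message"])
        if m is None:
            continue
        matched = True
        event.update(m.groupdict())
        if break_on_match:
            break  # stop after the first successful pattern
    if not matched:
        event.setdefault("tags", []).extend(tag_on_failure)
    return event


PATTERNS = [
    r"status=(?P<status>\d+)",
    r"took (?P<duration>\d+)ms",
]

first_only = apply_grok({"message": "status=200 took 15ms"}, PATTERNS)
all_tried = apply_grok({"message": "status=200 took 15ms"}, PATTERNS,
                       break_on_match=False)
failed = apply_grok({"message": "no fields here"}, PATTERNS)
```

With break_on_match left on, only status is extracted from the sample message; turning it off lets the second pattern add duration as well.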
To check whether a pattern matches correctly, you can use the site below, which is dedicated to grok debugging.
http://grokdebug.herokuapp.com/
============================================================================================
Grok pattern reference table
USERNAME | [a-zA-Z0-9._-]+ |
USER | %{USERNAME} |
INT | (?:[+-]?(?:[0-9]+)) |
BASE10NUM | (?<![0-9.+-])(?>[+-]?(?:(?:[0-9]+(?:\.[0-9]+)?)|(?:\.[0-9]+))) |
NUMBER | (?:%{BASE10NUM}) |
BASE16NUM | (?<![0-9A-Fa-f])(?:[+-]?(?:0x)?(?:[0-9A-Fa-f]+)) |
BASE16FLOAT | \b(?<![0-9A-Fa-f.])(?:[+-]?(?:0x)?(?:(?:[0-9A-Fa-f]+(?:\.[0-9A-Fa-f]*)?)|(?:\.[0-9A-Fa-f]+)))\b |
POSINT | \b(?:[1-9][0-9]*)\b |
NONNEGINT | \b(?:[0-9]+)\b |
WORD | \b\w+\b |
NOTSPACE | \S+ |
SPACE | \s* |
DATA | .*? |
GREEDYDATA | .* |
QUOTEDSTRING | (?>(?<!\\)(?>"(?>\\.|[^\\"]+)+"|""|(?>'(?>\\.|[^\\']+)+')|''|(?>`(?>\\.|[^\\`]+)+`)|``)) |
UUID | [A-Fa-f0-9]{8}-(?:[A-Fa-f0-9]{4}-){3}[A-Fa-f0-9]{12} |
# Networking | |
MAC | (?:%{CISCOMAC}|%{WINDOWSMAC}|%{COMMONMAC}) |
CISCOMAC | (?:(?:[A-Fa-f0-9]{4}\.){2}[A-Fa-f0-9]{4}) |
WINDOWSMAC | (?:(?:[A-Fa-f0-9]{2}-){5}[A-Fa-f0-9]{2}) |
COMMONMAC | (?:(?:[A-Fa-f0-9]{2}:){5}[A-Fa-f0-9]{2}) |
IPV6 | ((([0-9A-Fa-f]{1,4}:){7}([0-9A-Fa-f]{1,4}|:))|(([0-9A-Fa-f]{1,4}:){6}(:[0-9A-Fa-f]{1,4}|((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3})|:))|(([0-9A-Fa-f]{1,4}:){5}(((:[0-9A-Fa-f]{1,4}){1,2})|:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3})|:))|(([0-9A-Fa-f]{1,4}:){4}(((:[0-9A-Fa-f]{1,4}){1,3})|((:[0-9A-Fa-f]{1,4})?:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(([0-9A-Fa-f]{1,4}:){3}(((:[0-9A-Fa-f]{1,4}){1,4})|((:[0-9A-Fa-f]{1,4}){0,2}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(([0-9A-Fa-f]{1,4}:){2}(((:[0-9A-Fa-f]{1,4}){1,5})|((:[0-9A-Fa-f]{1,4}){0,3}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(([0-9A-Fa-f]{1,4}:){1}(((:[0-9A-Fa-f]{1,4}){1,6})|((:[0-9A-Fa-f]{1,4}){0,4}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(:(((:[0-9A-Fa-f]{1,4}){1,7})|((:[0-9A-Fa-f]{1,4}){0,5}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:)))(%.+)? |
IPV4 | (?<![0-9])(?:(?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})[.](?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})[.](?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})[.](?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2}))(?![0-9]) |
IP | (?:%{IPV6}|%{IPV4}) |
HOSTNAME | \b(?:[0-9A-Za-z][0-9A-Za-z-]{0,62})(?:\.(?:[0-9A-Za-z][0-9A-Za-z-]{0,62}))*(\.?|\b) |
HOST | %{HOSTNAME} |
IPORHOST | (?:%{HOSTNAME}|%{IP}) |
HOSTPORT | %{IPORHOST}:%{POSINT} |
# paths | |
PATH | (?:%{UNIXPATH}|%{WINPATH}) |
UNIXPATH | (?>/(?>[\w_%!$@:.,-]+|\\.)*)+ |
TTY | (?:/dev/(pts|tty([pq])?)(\w+)?/?(?:[0-9]+)) |
WINPATH | (?>[A-Za-z]+:|\\)(?:\\[^\\?*]*)+ |
URIPROTO | [A-Za-z]+(\+[A-Za-z+]+)? |
URIHOST | %{IPORHOST}(?::%{POSINT:port})? |
# uripath comes loosely from RFC1738, but mostly from what Firefox | |
# doesn't turn into %XX | |
URIPATH | (?:/[A-Za-z0-9$.+!*'(){},~:;=@#%_\-]*)+ |
#URIPARAM | \?(?:[A-Za-z0-9]+(?:=(?:[^&]*))?(?:&(?:[A-Za-z0-9]+(?:=(?:[^&]*))?)?)*)? |
URIPARAM | \?[A-Za-z0-9$.+!*'|(){},~@#%&/=:;_?\-\[\]]* |
URIPATHPARAM | %{URIPATH}(?:%{URIPARAM})? |
URI | %{URIPROTO}://(?:%{USER}(?::[^@]*)?@)?(?:%{URIHOST})?(?:%{URIPATHPARAM})? |
# Months: January, Feb, 3, 03, 12, December | |
MONTH | \b(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?)\b |
MONTHNUM | (?:0?[1-9]|1[0-2]) |
MONTHNUM2 | (?:0[1-9]|1[0-2]) |
MONTHDAY | (?:(?:0[1-9])|(?:[12][0-9])|(?:3[01])|[1-9]) |
# Days: Monday, Tue, Thu, etc... | |
DAY | (?:Mon(?:day)?|Tue(?:sday)?|Wed(?:nesday)?|Thu(?:rsday)?|Fri(?:day)?|Sat(?:urday)?|Sun(?:day)?) |
# Years? | |
YEAR | (?>\d\d){1,2} |
HOUR | (?:2[0123]|[01]?[0-9]) |
MINUTE | (?:[0-5][0-9]) |
# '60' is a leap second in most time standards and thus is valid. | |
SECOND | (?:(?:[0-5]?[0-9]|60)(?:[:.,][0-9]+)?) |
TIME | (?!<[0-9])%{HOUR}:%{MINUTE}(?::%{SECOND})(?![0-9]) |
# datestamp is YYYY/MM/DD-HH:MM:SS.UUUU (or something like it) | |
DATE_US | %{MONTHNUM}[/-]%{MONTHDAY}[/-]%{YEAR} |
DATE_EU | %{MONTHDAY}[./-]%{MONTHNUM}[./-]%{YEAR} |
ISO8601_TIMEZONE | (?:Z|[+-]%{HOUR}(?::?%{MINUTE})) |
ISO8601_SECOND | (?:%{SECOND}|60) |
TIMESTAMP_ISO8601 | %{YEAR}-%{MONTHNUM}-%{MONTHDAY}[T ]%{HOUR}:?%{MINUTE}(?::?%{SECOND})?%{ISO8601_TIMEZONE}? |
DATE | %{DATE_US}|%{DATE_EU} |
DATESTAMP | %{DATE}[- ]%{TIME} |
TZ | (?:[PMCE][SD]T|UTC) |
DATESTAMP_RFC822 | %{DAY} %{MONTH} %{MONTHDAY} %{YEAR} %{TIME} %{TZ} |
DATESTAMP_RFC2822 | %{DAY}, %{MONTHDAY} %{MONTH} %{YEAR} %{TIME} %{ISO8601_TIMEZONE} |
DATESTAMP_OTHER | %{DAY} %{MONTH} %{MONTHDAY} %{TIME} %{TZ} %{YEAR} |
DATESTAMP_EVENTLOG | %{YEAR}%{MONTHNUM2}%{MONTHDAY}%{HOUR}%{MINUTE}%{SECOND} |
# Syslog Dates: Month Day HH:MM:SS | |
SYSLOGTIMESTAMP | %{MONTH} +%{MONTHDAY} %{TIME} |
PROG | (?:[\w._/%-]+) |
SYSLOGPROG | %{PROG:program}(?:\[%{POSINT:pid}\])? |
SYSLOGHOST | %{IPORHOST} |
SYSLOGFACILITY | <%{NONNEGINT:facility}.%{NONNEGINT:priority}> |
HTTPDATE | %{MONTHDAY}/%{MONTH}/%{YEAR}:%{TIME} %{INT} |
# Shortcuts | |
QS | %{QUOTEDSTRING} |
# Log formats | |
SYSLOGBASE | %{SYSLOGTIMESTAMP:timestamp} (?:%{SYSLOGFACILITY} )?%{SYSLOGHOST:logsource} %{SYSLOGPROG}: |
COMMONAPACHELOG | %{IPORHOST:clientip} %{USER:ident} %{USER:auth} \[%{HTTPDATE:timestamp}\] "(?:%{WORD:verb} %{NOTSPACE:request}(?: HTTP/%{NUMBER:httpversion})?|%{DATA:rawrequest})" %{NUMBER:response} (?:%{NUMBER:bytes}|-) |
COMBINEDAPACHELOG | %{COMMONAPACHELOG} %{QS:referrer} %{QS:agent} |
# Log Levels | |
LOGLEVEL | ([Aa]lert|ALERT|[Tt]race|TRACE|[Dd]ebug|DEBUG|[Nn]otice|NOTICE|[Ii]nfo|INFO|[Ww]arn?(?:ing)?|WARN?(?:ING)?|[Ee]rr?(?:or)?|ERR?(?:OR)?|[Cc]rit?(?:ical)?|CRIT?(?:ICAL)?|[Ff]atal|FATAL|[Ss]evere|SEVERE|EMERG(?:ENCY)?|[Ee]merg(?:ency)?) |
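Many rows in the table refer to other rows through %{NAME}, so a pattern has to be expanded recursively before it becomes a plain regular expression. The Python sketch below shows one way this could work on a few rows copied from the table. The helpers parse_rows and expand are hypothetical, and the sketch deliberately sticks to rows without atomic groups such as (?>...), which older versions of Python's re module do not accept.

```python
import re

# A few rows copied from the table above, in its "NAME | regex |" layout.
ROWS = r"""
USERNAME | [a-zA-Z0-9._-]+ |
USER | %{USERNAME} |
HOUR | (?:2[0123]|[01]?[0-9]) |
MINUTE | (?:[0-5][0-9]) |
ISO8601_TIMEZONE | (?:Z|[+-]%{HOUR}(?::?%{MINUTE})) |
"""


def parse_rows(text):
    table = {}
    for row in text.strip().splitlines():
        name, rest = row.split(" | ", 1)
        # Drop the trailing column separator, keep the regex untouched.
        table[name.strip()] = rest.rsplit("|", 1)[0].strip()
    return table


TOKEN = re.compile(r"%\{(\w+)(?::(\w+))?\}")


def expand(expression, table):
    """Recursively replace %{NAME} and %{NAME:field} references."""
    def replace(m):
        name, field = m.groups()
        body = expand(table[name], table)
        return f"(?P<{field}>{body})" if field else f"(?:{body})"
    return TOKEN.sub(replace, expression)


TABLE = parse_rows(ROWS)
tz_regex = expand("%{ISO8601_TIMEZONE:tz}", TABLE)
user_regex = expand("%{USER:user}", TABLE)

tz_match = re.fullmatch(tz_regex, "+08:00")
user_match = re.fullmatch(user_regex, "alice.b")
```

Here ISO8601_TIMEZONE pulls in HOUR and MINUTE, and USER resolves to USERNAME, which mirrors how the composite rows in the table (for example SYSLOGBASE or COMBINEDAPACHELOG) are built from smaller ones.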