Logstash Filter Study Notes
Contents:
1. Filter Introduction
2. Example Walkthrough
3. Official Patterns
1. Filter Introduction:
The role of filter plugins: as Logstash moves data from one source to another, filter plugins can process and transform the events in flight.
Official reference documentation:
https://www.elastic.co/guide/en/logstash/current/filter-plugins.html
- Date Filter:
Official description: The date filter is used for parsing dates from fields, and then using that date or timestamp as the logstash timestamp for the event.
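What the date filter does at its core (parsing a timestamp string such as the syslog-style "Apr 13 03:21:25" used later in this article, which carries no year) can be sketched in Python. The function name and the explicit year parameter are illustrative assumptions, not Logstash internals:

```python
from datetime import datetime

def parse_syslog_time(text, year=2017):
    # Syslog timestamps like "Apr 13 03:21:25" omit the year,
    # so one must be supplied explicitly (the date filter handles
    # this by assuming the current year).
    return datetime.strptime(f"{year} {text}", "%Y %b %d %H:%M:%S")

ts = parse_syslog_time("Apr 13 03:21:25")
print(ts.isoformat())  # 2017-04-13T03:21:25
```

In Logstash itself this corresponds to a date filter with a match pattern like "MMM dd HH:mm:ss", which then overwrites the event's @timestamp.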
- Grok Filter:
Official description: Parse arbitrary text and structure it.
Grok is currently the best way in logstash to parse crappy unstructured log data into something structured and queryable.
This tool is perfect for syslog logs, apache and other webserver logs, mysql logs, and in general, any log format that is generally written for humans and not computer consumption.
Logstash ships with about 120 patterns by default. You can find them here: https://github.com/logstash-plugins/logstash-patterns-core/tree/master/patterns. You can add your own trivially. (See the patterns_dir setting)
If you need help building patterns to match your logs, you will find the http://grokdebug.herokuapp.com and http://grokconstructor.appspot.com/ applications quite useful!
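Under the hood, a grok pattern is a regular expression with named capture groups. A minimal Python sketch of what the grok expression used later in this article does to the sample log line (field names match the article; the regex is a simplified hand-expansion for illustration, not the actual grok engine):

```python
import re

# Hand-expanded, simplified equivalent of the grok match used below:
# "<%{INT:systemType}>%{MYTIME:logTime}\s*%{MYCOMMAND:command}\[%{INT:pid}\]:%{MYOPERATE:operate}"
LOG_RE = re.compile(
    r"<(?P<systemType>[+-]?\d+)>"
    r"(?P<logTime>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s*"
    r"(?P<command>\w+)\[(?P<pid>\d+)\]:"
    r"(?P<operate>.*)"
)

line = "<86>Apr 13 03:21:25 groupadd[2381]: group added to /etc/group: name=oprofile, GID=16"
fields = LOG_RE.match(line).groupdict()
print(fields["command"], fields["pid"])  # groupadd 2381
```

Grok's value over raw regexes is exactly this naming and reuse: %{INT:pid} both matches and labels the capture in one step.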
- Syslog_pri Filter:
Official description: Filter plugin for logstash to parse the PRI field from the front of a Syslog (RFC3164) message. If no priority is set, it will default to 13 (per RFC).
This filter is based on the original syslog.rb code shipped with logstash.
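Per RFC 3164 the PRI value is facility * 8 + severity, so decoding it is two integer operations. A quick Python sketch, using the <86> from the sample logs below (86 = facility 10, authpriv, severity 6, informational):

```python
def decode_pri(pri):
    # RFC 3164: PRI = facility * 8 + severity
    facility = pri // 8
    severity = pri % 8
    return facility, severity

print(decode_pri(86))  # (10, 6): authpriv.info
print(decode_pri(13))  # (1, 5): the RFC default, user.notice
```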
- Geoip Filter:
Official description: The GeoIP filter adds information about the geographical location of IP addresses, based on data from the Maxmind GeoLite2 databases.
- Mutate Filter:
Official description: The mutate filter allows you to perform general mutations on fields. You can rename, remove, replace, and modify fields in your events.
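Treating an event as a dictionary of fields, the rename/remove operations mutate performs can be sketched in Python (the helper and field names here are illustrative, not the plugin's implementation):

```python
def mutate(event, rename=None, remove=None):
    # rename: {old_name: new_name}; remove: [field_name, ...]
    for old, new in (rename or {}).items():
        if old in event:
            event[new] = event.pop(old)
    for field in (remove or []):
        event.pop(field, None)  # silently skip missing fields
    return event

event = {"cmd": "groupadd", "pid": "2381", "tmp": "x"}
mutate(event, rename={"cmd": "command"}, remove=["tmp"])
print(event)  # {'pid': '2381', 'command': 'groupadd'}
```

The filter1.conf script below uses mutate's tag counterpart, remove_tag, in the same spirit.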
2. Example Walkthrough:
Goal: load the sample log file into Elasticsearch and parse out the time and command fields.
Sample lines from the log file:
<86>Apr 13 03:21:25 groupadd[2381]: group added to /etc/group: name=oprofile, GID=16
<86>Apr 13 03:21:25 groupadd[2381]: group added to /etc/gshadow: name=oprofile
Custom field names used in parsing:
systemType: the leading PRI value of the syslog message (facility * 8 + severity per RFC 3164; despite the name, <86> indicates priority, not a machine architecture)
pid: process ID
command: name of the command
logTime: time at which the command was executed
operate: the specific operation the command performed
Note: for the full configuration files and intermediate screenshots, see the earlier posts under the ELK category of this blog; they are not repeated here.
(1) Create the pattern file:
Create a file named filter1pattern under /opt/package/logstash-5.2.2/config/patterns with the following content:
MYTIME \b(?:[Jj]an(?:uary|uar)?|[Ff]eb(?:ruary|ruar)?|[Mm](?:a|ä)?r(?:ch|z)?|[Aa]pr(?:il)?|[Mm]a(?:y|i)?|[Jj]un(?:e|i)?|[Jj]ul(?:y)?|[Aa]ug(?:ust)?|[Ss]ep(?:tember)?|[Oo](?:c|k)?t(?:ober)?|[Nn]ov(?:ember)?|[Dd]e(?:c|z)(?:ember)?)\b\s*[0-9][0-9]+\s*(2[0123]|[01]?[0-9]):([0-5][0-9]):([0-5][0-9])
MYCOMMAND \w+
MYOPERATE .*
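The custom MYTIME pattern is the official MONTH pattern followed by day and time components, and its syntax is compatible with Python's re module, so it can be sanity-checked outside Logstash. A quick sketch (the regex is trimmed to the English month abbreviations actually needed here):

```python
import re

# MYTIME from the pattern file above, trimmed to English month
# abbreviations for readability.
MYTIME = re.compile(
    r"\b(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\b"
    r"\s*[0-9]+\s*(?:2[0-3]|[01]?[0-9]):[0-5][0-9]:[0-5][0-9]"
)

m = MYTIME.search("<86>Apr 13 03:21:25 groupadd[2381]: ...")
print(m.group(0))  # Apr 13 03:21:25
```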
(2) Start the filebeat-input / Kafka-output Logstash pipeline with the script filter1.conf:
bash ../../bin/logstash -f filter1.conf
input {
beats{
port => 5044
}
}
filter{
if "beats_input_codec_plain_applied" in [tags]{
mutate{
remove_tag => ["beats_input_codec_plain_applied"]
}
}
grok{
patterns_dir => "/opt/package/logstash-5.2.2/config/patterns"
match => {
"message" => "<%{INT:systemType}>%{MYTIME:logTime}\s*%{MYCOMMAND:command}\[%{INT:pid}\]:%{MYOPERATE:operate}"
}
}
}
output {
stdout{
codec => rubydebug
}
kafka{
topic_id => "remoa"
bootstrap_servers => "hdp1.example.com:9092"
security_protocol => "SASL_PLAINTEXT"
sasl_kerberos_service_name => "kafka"
jaas_path => "/tmp/kafka_jaas.conf.demouser"
kerberos_config => "/etc/krb5.conf"
compression_type => "none"
acks => "1"
}
}
(3) Start the Kafka-input / Elasticsearch-output Logstash pipeline with the script filter2.conf:
bash ../../bin/logstash -f filter2.conf
input{
kafka{
bootstrap_servers => "hdp1.example.com:9092"
security_protocol => "SASL_PLAINTEXT"
sasl_kerberos_service_name => "kafka"
jaas_path => "/tmp/kafka_jaas.conf.demouser"
kerberos_config => "/etc/krb5.conf"
topics => ["remoa"]
}
}
filter{
if "beats_input_codec_plain_applied" in [tags]{
mutate{
remove_tag => ["beats_input_codec_plain_applied"]
}
}
grok{
patterns_dir => "/opt/package/logstash-5.2.2/config/patterns"
match => {
"message" => "<%{INT:systemType}>%{MYTIME:logTime}\s*%{MYCOMMAND:command}\[%{INT:pid}\]:%{MYOPERATE:operate}"
}
}
}
output{
stdout{
codec => rubydebug
}
elasticsearch{
hosts => ["kdc1.example.com:9200","kdc2.example.com:9200"]
user => "logstash"
password => "logstash"
action => "index"
index => "logstash-remoa1-%{+YYYY.MM.dd}"
truststore => "/opt/package/logstash-5.2.2/config/keys/truststore.jks"
truststore_password => "whoami"
ssl => true
ssl_certificate_verification => true
codec => "json"
}
}
(4) Configure filebeat.yml with the path of the test log file, then start the Filebeat log shipper:
service filebeat start
(5) Output of filter1.conf after the log file passes through the filter:
Figure 2.1: Screenshot 1
(6) Output of filter2.conf after the log file passes through the filter:
Figure 2.2: Screenshot 2
(7) Find the corresponding index in Kibana:
GET _cat/indices
Figure 2.3: Screenshot 3
(8) Inspect its contents to see the parsed time and command fields:
GET logstash-remoa1-2017.09.11/_search
Figure 2.4: Screenshot 4
3. Official Patterns:
The default patterns shipped with Logstash (from logstash-patterns-core):
USERNAME [a-zA-Z0-9._-]+
USER %{USERNAME}
EMAILLOCALPART [a-zA-Z][a-zA-Z0-9_.+-=:]+
EMAILADDRESS %{EMAILLOCALPART}@%{HOSTNAME}
INT (?:[+-]?(?:[0-9]+))
BASE10NUM (?<![0-9.+-])(?>[+-]?(?:(?:[0-9]+(?:\.[0-9]+)?)|(?:\.[0-9]+)))
NUMBER (?:%{BASE10NUM})
BASE16NUM (?<![0-9A-Fa-f])(?:[+-]?(?:0x)?(?:[0-9A-Fa-f]+))
BASE16FLOAT \b(?<![0-9A-Fa-f.])(?:[+-]?(?:0x)?(?:(?:[0-9A-Fa-f]+(?:\.[0-9A-Fa-f]*)?)|(?:\.[0-9A-Fa-f]+)))\b
POSINT \b(?:[1-9][0-9]*)\b
NONNEGINT \b(?:[0-9]+)\b
WORD \b\w+\b
NOTSPACE \S+
SPACE \s*
DATA .*?
GREEDYDATA .*
QUOTEDSTRING (?>(?<!\\)(?>"(?>\\.|[^\\"]+)+"|""|(?>'(?>\\.|[^\\']+)+')|''|(?>`(?>\\.|[^\\`]+)+`)|``))
UUID [A-Fa-f0-9]{8}-(?:[A-Fa-f0-9]{4}-){3}[A-Fa-f0-9]{12}
# URN, allowing use of RFC 2141 section 2.3 reserved characters
URN urn:[0-9A-Za-z][0-9A-Za-z-]{0,31}:(?:%[0-9a-fA-F]{2}|[0-9A-Za-z()+,.:=@;$_!*'/?#-])+
# Networking
MAC (?:%{CISCOMAC}|%{WINDOWSMAC}|%{COMMONMAC})
CISCOMAC (?:(?:[A-Fa-f0-9]{4}\.){2}[A-Fa-f0-9]{4})
WINDOWSMAC (?:(?:[A-Fa-f0-9]{2}-){5}[A-Fa-f0-9]{2})
COMMONMAC (?:(?:[A-Fa-f0-9]{2}:){5}[A-Fa-f0-9]{2})
IPV6 ((([0-9A-Fa-f]{1,4}:){7}([0-9A-Fa-f]{1,4}|:))|(([0-9A-Fa-f]{1,4}:){6}(:[0-9A-Fa-f]{1,4}|((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3})|:))|(([0-9A-Fa-f]{1,4}:){5}(((:[0-9A-Fa-f]{1,4}){1,2})|:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3})|:))|(([0-9A-Fa-f]{1,4}:){4}(((:[0-9A-Fa-f]{1,4}){1,3})|((:[0-9A-Fa-f]{1,4})?:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(([0-9A-Fa-f]{1,4}:){3}(((:[0-9A-Fa-f]{1,4}){1,4})|((:[0-9A-Fa-f]{1,4}){0,2}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(([0-9A-Fa-f]{1,4}:){2}(((:[0-9A-Fa-f]{1,4}){1,5})|((:[0-9A-Fa-f]{1,4}){0,3}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(([0-9A-Fa-f]{1,4}:){1}(((:[0-9A-Fa-f]{1,4}){1,6})|((:[0-9A-Fa-f]{1,4}){0,4}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(:(((:[0-9A-Fa-f]{1,4}){1,7})|((:[0-9A-Fa-f]{1,4}){0,5}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:)))(%.+)?
IPV4 (?<![0-9])(?:(?:[0-1]?[0-9]{1,2}|2[0-4][0-9]|25[0-5])[.](?:[0-1]?[0-9]{1,2}|2[0-4][0-9]|25[0-5])[.](?:[0-1]?[0-9]{1,2}|2[0-4][0-9]|25[0-5])[.](?:[0-1]?[0-9]{1,2}|2[0-4][0-9]|25[0-5]))(?![0-9])
IP (?:%{IPV6}|%{IPV4})
HOSTNAME \b(?:[0-9A-Za-z][0-9A-Za-z-]{0,62})(?:\.(?:[0-9A-Za-z][0-9A-Za-z-]{0,62}))*(\.?|\b)
IPORHOST (?:%{IP}|%{HOSTNAME})
HOSTPORT %{IPORHOST}:%{POSINT}
# paths
PATH (?:%{UNIXPATH}|%{WINPATH})
UNIXPATH (/([\w_%!$@:.,+~-]+|\\.)*)+
TTY (?:/dev/(pts|tty([pq])?)(\w+)?/?(?:[0-9]+))
WINPATH (?>[A-Za-z]+:|\\)(?:\\[^\\?*]*)+
URIPROTO [A-Za-z]([A-Za-z0-9+\-.]+)+
URIHOST %{IPORHOST}(?::%{POSINT:port})?
# uripath comes loosely from RFC1738, but mostly from what Firefox
# doesn't turn into %XX
URIPATH (?:/[A-Za-z0-9$.+!*'(){},~:;=@#%&_\-]*)+
#URIPARAM \?(?:[A-Za-z0-9]+(?:=(?:[^&]*))?(?:&(?:[A-Za-z0-9]+(?:=(?:[^&]*))?)?)*)?
URIPARAM \?[A-Za-z0-9$.+!*'|(){},~@#%&/=:;_?\-\[\]<>]*
URIPATHPARAM %{URIPATH}(?:%{URIPARAM})?
URI %{URIPROTO}://(?:%{USER}(?::[^@]*)?@)?(?:%{URIHOST})?(?:%{URIPATHPARAM})?
# Months: January, Feb, 3, 03, 12, December
MONTH \b(?:[Jj]an(?:uary|uar)?|[Ff]eb(?:ruary|ruar)?|[Mm](?:a|ä)?r(?:ch|z)?|[Aa]pr(?:il)?|[Mm]a(?:y|i)?|[Jj]un(?:e|i)?|[Jj]ul(?:y)?|[Aa]ug(?:ust)?|[Ss]ep(?:tember)?|[Oo](?:c|k)?t(?:ober)?|[Nn]ov(?:ember)?|[Dd]e(?:c|z)(?:ember)?)\b
MONTHNUM (?:0?[1-9]|1[0-2])
MONTHNUM2 (?:0[1-9]|1[0-2])
MONTHDAY (?:(?:0[1-9])|(?:[12][0-9])|(?:3[01])|[1-9])
# Days: Monday, Tue, Thu, etc...
DAY (?:Mon(?:day)?|Tue(?:sday)?|Wed(?:nesday)?|Thu(?:rsday)?|Fri(?:day)?|Sat(?:urday)?|Sun(?:day)?)
# Years?
YEAR (?>\d\d){1,2}
HOUR (?:2[0123]|[01]?[0-9])
MINUTE (?:[0-5][0-9])
# '60' is a leap second in most time standards and thus is valid.
SECOND (?:(?:[0-5]?[0-9]|60)(?:[:.,][0-9]+)?)
TIME (?!<[0-9])%{HOUR}:%{MINUTE}(?::%{SECOND})(?![0-9])
# datestamp is YYYY/MM/DD-HH:MM:SS.UUUU (or something like it)
DATE_US %{MONTHNUM}[/-]%{MONTHDAY}[/-]%{YEAR}
DATE_EU %{MONTHDAY}[./-]%{MONTHNUM}[./-]%{YEAR}
ISO8601_TIMEZONE (?:Z|[+-]%{HOUR}(?::?%{MINUTE}))
ISO8601_SECOND (?:%{SECOND}|60)
TIMESTAMP_ISO8601 %{YEAR}-%{MONTHNUM}-%{MONTHDAY}[T ]%{HOUR}:?%{MINUTE}(?::?%{SECOND})?%{ISO8601_TIMEZONE}?
DATE %{DATE_US}|%{DATE_EU}
DATESTAMP %{DATE}[- ]%{TIME}
TZ (?:[APMCE][SD]T|UTC)
DATESTAMP_RFC822 %{DAY} %{MONTH} %{MONTHDAY} %{YEAR} %{TIME} %{TZ}
DATESTAMP_RFC2822 %{DAY}, %{MONTHDAY} %{MONTH} %{YEAR} %{TIME} %{ISO8601_TIMEZONE}
DATESTAMP_OTHER %{DAY} %{MONTH} %{MONTHDAY} %{TIME} %{TZ} %{YEAR}
DATESTAMP_EVENTLOG %{YEAR}%{MONTHNUM2}%{MONTHDAY}%{HOUR}%{MINUTE}%{SECOND}
# Syslog Dates: Month Day HH:MM:SS
SYSLOGTIMESTAMP %{MONTH} +%{MONTHDAY} %{TIME}
PROG [\x21-\x5a\x5c\x5e-\x7e]+
SYSLOGPROG %{PROG:program}(?:\[%{POSINT:pid}\])?
SYSLOGHOST %{IPORHOST}
SYSLOGFACILITY <%{NONNEGINT:facility}.%{NONNEGINT:priority}>
HTTPDATE %{MONTHDAY}/%{MONTH}/%{YEAR}:%{TIME} %{INT}
# Shortcuts
QS %{QUOTEDSTRING}
# Log formats
SYSLOGBASE %{SYSLOGTIMESTAMP:timestamp} (?:%{SYSLOGFACILITY} )?%{SYSLOGHOST:logsource} %{SYSLOGPROG}:
# Log Levels
LOGLEVEL ([Aa]lert|ALERT|[Tt]race|TRACE|[Dd]ebug|DEBUG|[Nn]otice|NOTICE|[Ii]nfo|INFO|[Ww]arn?(?:ing)?|WARN?(?:ING)?|[Ee]rr?(?:or)?|ERR?(?:OR)?|[Cc]rit?(?:ical)?|CRIT?(?:ICAL)?|[Ff]atal|FATAL|[Ss]evere|SEVERE|EMERG(?:ENCY)?|[Ee]merg(?:ency)?)