Logstash Filter Study Notes

Contents:

1. Introduction to Filters

2. Worked Example

3. Patterns Shipped with Logstash

 

1. Introduction to Filters

Filter plugins let Logstash process and transform events while it moves data from a source to a destination.

Official reference documentation:

https://www.elastic.co/guide/en/logstash/current/filter-plugins.html

• Date Filter:

Official description: "The date filter is used for parsing dates from fields, and then using that date or timestamp as the logstash timestamp for the event."
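
A minimal sketch of how it is typically used (the field name "logTime" and the date formats are illustrative assumptions matching the sample log parsed in section 2, not taken from the official docs):

filter {
        date {
                # "logTime" is assumed to hold a syslog-style timestamp such as "Apr 13 03:21:25";
                # two formats cover double- and single-digit days.
                match => ["logTime", "MMM dd HH:mm:ss", "MMM  d HH:mm:ss"]
                # Write the parsed value into @timestamp (the plugin's default target).
                target => "@timestamp"
        }
}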

• Grok Filter:

Official description: "Parse arbitrary text and structure it."

Grok is currently the best way in logstash to parse crappy unstructured log data into something structured and queryable.

This tool is perfect for syslog logs, apache and other webserver logs, mysql logs, and in general, any log format that is generally written for humans and not computer consumption.

Logstash ships with about 120 patterns by default. You can find them here: https://github.com/logstash-plugins/logstash-patterns-core/tree/master/patterns. You can add your own trivially. (See the patterns_dir setting)

If you need help building patterns to match your logs, you will find the http://grokdebug.herokuapp.com and http://grokconstructor.appspot.com/ applications quite useful!
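
As an illustration of the shipped patterns, a minimal sketch that structures an Apache-style access log line might look like this (field names come from the built-in pattern; a full example with custom patterns follows in section 2):

filter {
        grok {
                # COMBINEDAPACHELOG is one of the patterns shipped with Logstash;
                # it populates fields such as clientip, verb, request, and response.
                match => { "message" => "%{COMBINEDAPACHELOG}" }
        }
}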


• Syslog_pri Filter:

Official description: "Filter plugin for logstash to parse the PRI field from the front of a Syslog (RFC3164) message. If no priority is set, it will default to 13 (per RFC). This filter is based on the original syslog.rb code shipped with logstash."
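
A minimal sketch of wiring it in (the option shown is the plugin's documented default, spelled out only for clarity). For the sample log in section 2 the PRI value is 86, which decodes to facility 10 (authpriv) and severity 6 (informational), since 86 = 10 * 8 + 6:

filter {
        syslog_pri {
                # Field that holds the raw PRI number; "syslog_pri" is the default name.
                syslog_pri_field_name => "syslog_pri"
        }
}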

• Geoip Filter:

Official description: "The GeoIP filter adds information about the geographical location of IP addresses, based on data from the Maxmind GeoLite2 databases."
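
A minimal sketch, assuming the event carries a client IP in a field named "clientip" (a hypothetical field name):

filter {
        geoip {
                # Field containing the IP address to look up.
                source => "clientip"
                # Lookup results are nested under this field ("geoip" is the default).
                target => "geoip"
        }
}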

• Mutate Filter:

Official description: "The mutate filter allows you to perform general mutations on fields. You can rename, remove, replace, and modify fields in your events."
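
A small sketch of the kinds of mutations it supports (all field names here are hypothetical):

filter {
        mutate {
                # Rename a field, overwrite another, and drop a third.
                rename       => { "cmd" => "command" }
                replace      => { "systemType" => "x86" }
                remove_field => ["message"]
        }
}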

 

2. Worked Example

Goal: load the sample log file into Elasticsearch and parse out the timestamp and command fields.

Sample content of the test log file:

<86>Apr 13 03:21:25 groupadd[2381]: group added to /etc/group: name=oprofile, GID=16
<86>Apr 13 03:21:25 groupadd[2381]: group added to /etc/gshadow: name=oprofile

Custom field names used in the parsed output:
systemType: system type (e.g. whether the host is x86 or x64)
pid: process ID
command: name of the command that was run
logTime: time at which the command was executed
operate: the specific operation the command performed

Note: the full configuration files and intermediate screenshots can be found in the earlier posts under the ELK category of this blog and are not repeated here.

(1) Create the pattern file:

Create a file named filter1pattern in the directory /opt/package/logstash-5.2.2/config/patterns with the following content:

MYTIME \b(?:[Jj]an(?:uary|uar)?|[Ff]eb(?:ruary|ruar)?|[Mm](?:a|ä)?r(?:ch|z)?|[Aa]pr(?:il)?|[Mm]a(?:y|i)?|[Jj]un(?:e|i)?|[Jj]ul(?:y)?|[Aa]ug(?:ust)?|[Ss]ep(?:tember)?|[Oo](?:c|k)?t(?:ober)?|[Nn]ov(?:ember)?|[Dd]e(?:c|z)(?:ember)?)\b\s*[0-9][0-9]+\s*(2[0123]|[01]?[0-9]):([0-5][0-9]):([0-5][0-9])
MYCOMMAND \w+
MYOPERATE .*
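
Applied to the first sample line, the grok expression used in filter1.conf below should produce roughly the following fields (an illustrative rubydebug-style view, not an actual capture; the event also carries metadata such as @timestamp and the original message):

{
       "systemType" => "86",
          "logTime" => "Apr 13 03:21:25",
          "command" => "groupadd",
              "pid" => "2381",
          "operate" => " group added to /etc/group: name=oprofile, GID=16"
}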

(2) Start the Logstash instance that reads from Filebeat and writes to Kafka, using the script filter1.conf:

bash ../../bin/logstash -f filter1.conf

input {
        beats{
                port => 5044
        }
}
 
filter{
        # Remove the tag that the beats input adds to every event.
        if "beats_input_codec_plain_applied" in [tags]{
                mutate{
                        remove_tag => ["beats_input_codec_plain_applied"]
                }
        }
        grok{
                # Pattern file holding the custom MYTIME, MYCOMMAND and MYOPERATE patterns.
                patterns_dir => "/opt/package/logstash-5.2.2/config/patterns/filter1pattern"
                match => {
                        "message" => "<%{INT:systemType}>%{MYTIME:logTime}\s*%{MYCOMMAND:command}\[%{INT:pid}\]:%{MYOPERATE:operate}"
                }
        }
}
 
output {
        stdout{
                codec => rubydebug
        }
        kafka{
                topic_id => "remoa"
                bootstrap_servers => "hdp1.example.com:9092"
                security_protocol => "SASL_PLAINTEXT"
                sasl_kerberos_service_name => "kafka"
                jaas_path => "/tmp/kafka_jaas.conf.demouser"
                kerberos_config => "/etc/krb5.conf"
                compression_type => "none"
                acks => "1"
        }
}

(3) Start the Logstash instance that reads from Kafka and writes to Elasticsearch, using the script filter2.conf:

bash ../../bin/logstash -f filter2.conf

input{
        kafka{
                bootstrap_servers => "hdp1.example.com:9092"
                security_protocol => "SASL_PLAINTEXT"
                sasl_kerberos_service_name => "kafka"
                jaas_path => "/tmp/kafka_jaas.conf.demouser"
                kerberos_config => "/etc/krb5.conf"
                topics => ["remoa"]
        }
}
 
filter{
        if "beats_input_codec_plain_applied" in [tags]{
                mutate{
                        remove_tag => ["beats_input_codec_plain_applied"]
                }
        }
        grok{
                patterns_dir => "/opt/package/logstash-5.2.2/config/patterns/filter1pattern"
                match => {
                        "message" => "<%{INT:systemType}>%{MYTIME:logTime}\s*%{MYCOMMAND:command}\[%{INT:pid}\]:%{MYOPERATE:operate}"
                }
        }
}
 
output{
        stdout{
                codec => rubydebug
        }
        elasticsearch{
                hosts => ["kdc1.example.com:9200","kdc2.example.com:9200"]
                user => "logstash"
                password => "logstash"
                action => "index"
                index => "logstash-remoa1-%{+YYYY.MM.dd}"
                truststore => "/opt/package/logstash-5.2.2/config/keys/truststore.jks"
                truststore_password => "whoami"
                ssl => true
                ssl_certificate_verification => true
                codec => "json"
        }
}

(4) Configure filebeat.yml with the path of the test log file, then start the Filebeat log shipper:

service filebeat start
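
A minimal sketch of the relevant part of filebeat.yml for Filebeat 5.x (the log path and the Logstash host are placeholders; the port must match the beats input in filter1.conf):

filebeat.prospectors:
- input_type: log
  paths:
    - /path/to/test.log
output.logstash:
  hosts: ["localhost:5044"]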

(5) Check the result produced by filter1.conf after the log lines pass through the filter:

 

Figure 2.1 Screenshot 1

(6) Check the result produced by filter2.conf after the log lines pass through the filter:

 

Figure 2.2 Screenshot 2

(7) Find the corresponding index in Kibana:

GET _cat/indices

 

Figure 2.3 Screenshot 3

(8) Inspect the indexed documents and verify the parsed time and command fields:

GET logstash-remoa1-2017.09.11/_search

 

Figure 2.4 Screenshot 4


3. Patterns Shipped with Logstash

USERNAME [a-zA-Z0-9._-]+
USER %{USERNAME}
EMAILLOCALPART [a-zA-Z][a-zA-Z0-9_.+-=:]+
EMAILADDRESS %{EMAILLOCALPART}@%{HOSTNAME}
INT (?:[+-]?(?:[0-9]+))
BASE10NUM (?<![0-9.+-])(?>[+-]?(?:(?:[0-9]+(?:\.[0-9]+)?)|(?:\.[0-9]+)))
NUMBER (?:%{BASE10NUM})
BASE16NUM (?<![0-9A-Fa-f])(?:[+-]?(?:0x)?(?:[0-9A-Fa-f]+))
BASE16FLOAT \b(?<![0-9A-Fa-f.])(?:[+-]?(?:0x)?(?:(?:[0-9A-Fa-f]+(?:\.[0-9A-Fa-f]*)?)|(?:\.[0-9A-Fa-f]+)))\b

POSINT \b(?:[1-9][0-9]*)\b
NONNEGINT \b(?:[0-9]+)\b
WORD \b\w+\b
NOTSPACE \S+
SPACE \s*
DATA .*?
GREEDYDATA .*
QUOTEDSTRING (?>(?<!\\)(?>"(?>\\.|[^\\"]+)+"|""|(?>'(?>\\.|[^\\']+)+')|''|(?>`(?>\\.|[^\\`]+)+`)|``))
UUID [A-Fa-f0-9]{8}-(?:[A-Fa-f0-9]{4}-){3}[A-Fa-f0-9]{12}
# URN, allowing use of RFC 2141 section 2.3 reserved characters
URN urn:[0-9A-Za-z][0-9A-Za-z-]{0,31}:(?:%[0-9a-fA-F]{2}|[0-9A-Za-z()+,.:=@;$_!*'/?#-])+

# Networking
MAC (?:%{CISCOMAC}|%{WINDOWSMAC}|%{COMMONMAC})
CISCOMAC (?:(?:[A-Fa-f0-9]{4}\.){2}[A-Fa-f0-9]{4})
WINDOWSMAC (?:(?:[A-Fa-f0-9]{2}-){5}[A-Fa-f0-9]{2})
COMMONMAC (?:(?:[A-Fa-f0-9]{2}:){5}[A-Fa-f0-9]{2})
IPV6 ((([0-9A-Fa-f]{1,4}:){7}([0-9A-Fa-f]{1,4}|:))|(([0-9A-Fa-f]{1,4}:){6}(:[0-9A-Fa-f]{1,4}|((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3})|:))|(([0-9A-Fa-f]{1,4}:){5}(((:[0-9A-Fa-f]{1,4}){1,2})|:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3})|:))|(([0-9A-Fa-f]{1,4}:){4}(((:[0-9A-Fa-f]{1,4}){1,3})|((:[0-9A-Fa-f]{1,4})?:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(([0-9A-Fa-f]{1,4}:){3}(((:[0-9A-Fa-f]{1,4}){1,4})|((:[0-9A-Fa-f]{1,4}){0,2}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(([0-9A-Fa-f]{1,4}:){2}(((:[0-9A-Fa-f]{1,4}){1,5})|((:[0-9A-Fa-f]{1,4}){0,3}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(([0-9A-Fa-f]{1,4}:){1}(((:[0-9A-Fa-f]{1,4}){1,6})|((:[0-9A-Fa-f]{1,4}){0,4}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(:(((:[0-9A-Fa-f]{1,4}){1,7})|((:[0-9A-Fa-f]{1,4}){0,5}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:)))(%.+)?
IPV4 (?<![0-9])(?:(?:[0-1]?[0-9]{1,2}|2[0-4][0-9]|25[0-5])[.](?:[0-1]?[0-9]{1,2}|2[0-4][0-9]|25[0-5])[.](?:[0-1]?[0-9]{1,2}|2[0-4][0-9]|25[0-5])[.](?:[0-1]?[0-9]{1,2}|2[0-4][0-9]|25[0-5]))(?![0-9])
IP (?:%{IPV6}|%{IPV4})
HOSTNAME \b(?:[0-9A-Za-z][0-9A-Za-z-]{0,62})(?:\.(?:[0-9A-Za-z][0-9A-Za-z-]{0,62}))*(\.?|\b)
IPORHOST (?:%{IP}|%{HOSTNAME})
HOSTPORT %{IPORHOST}:%{POSINT}

# paths
PATH (?:%{UNIXPATH}|%{WINPATH})
UNIXPATH (/([\w_%!$@:.,+~-]+|\\.)*)+
TTY (?:/dev/(pts|tty([pq])?)(\w+)?/?(?:[0-9]+))
WINPATH (?>[A-Za-z]+:|\\)(?:\\[^\\?*]*)+
URIPROTO [A-Za-z]([A-Za-z0-9+\-.]+)+
URIHOST %{IPORHOST}(?::%{POSINT:port})?
# uripath comes loosely from RFC1738, but mostly from what Firefox
# doesn't turn into %XX
URIPATH (?:/[A-Za-z0-9$.+!*'(){},~:;=@#%&_\-]*)+
#URIPARAM \?(?:[A-Za-z0-9]+(?:=(?:[^&]*))?(?:&(?:[A-Za-z0-9]+(?:=(?:[^&]*))?)?)*)?
URIPARAM \?[A-Za-z0-9$.+!*'|(){},~@#%&/=:;_?\-\[\]<>]*
URIPATHPARAM %{URIPATH}(?:%{URIPARAM})?
URI %{URIPROTO}://(?:%{USER}(?::[^@]*)?@)?(?:%{URIHOST})?(?:%{URIPATHPARAM})?

# Months: January, Feb, 3, 03, 12, December
MONTH \b(?:[Jj]an(?:uary|uar)?|[Ff]eb(?:ruary|ruar)?|[Mm](?:a|ä)?r(?:ch|z)?|[Aa]pr(?:il)?|[Mm]a(?:y|i)?|[Jj]un(?:e|i)?|[Jj]ul(?:y)?|[Aa]ug(?:ust)?|[Ss]ep(?:tember)?|[Oo](?:c|k)?t(?:ober)?|[Nn]ov(?:ember)?|[Dd]e(?:c|z)(?:ember)?)\b
MONTHNUM (?:0?[1-9]|1[0-2])
MONTHNUM2 (?:0[1-9]|1[0-2])
MONTHDAY (?:(?:0[1-9])|(?:[12][0-9])|(?:3[01])|[1-9])

# Days: Monday, Tue, Thu, etc...
DAY (?:Mon(?:day)?|Tue(?:sday)?|Wed(?:nesday)?|Thu(?:rsday)?|Fri(?:day)?|Sat(?:urday)?|Sun(?:day)?)

# Years?
YEAR (?>\d\d){1,2}
HOUR (?:2[0123]|[01]?[0-9])
MINUTE (?:[0-5][0-9])
# '60' is a leap second in most time standards and thus is valid.
SECOND (?:(?:[0-5]?[0-9]|60)(?:[:.,][0-9]+)?)
TIME (?!<[0-9])%{HOUR}:%{MINUTE}(?::%{SECOND})(?![0-9])
# datestamp is YYYY/MM/DD-HH:MM:SS.UUUU (or something like it)
DATE_US %{MONTHNUM}[/-]%{MONTHDAY}[/-]%{YEAR}
DATE_EU %{MONTHDAY}[./-]%{MONTHNUM}[./-]%{YEAR}
ISO8601_TIMEZONE (?:Z|[+-]%{HOUR}(?::?%{MINUTE}))
ISO8601_SECOND (?:%{SECOND}|60)
TIMESTAMP_ISO8601 %{YEAR}-%{MONTHNUM}-%{MONTHDAY}[T ]%{HOUR}:?%{MINUTE}(?::?%{SECOND})?%{ISO8601_TIMEZONE}?
DATE %{DATE_US}|%{DATE_EU}
DATESTAMP %{DATE}[- ]%{TIME}
TZ (?:[APMCE][SD]T|UTC)
DATESTAMP_RFC822 %{DAY} %{MONTH} %{MONTHDAY} %{YEAR} %{TIME} %{TZ}
DATESTAMP_RFC2822 %{DAY}, %{MONTHDAY} %{MONTH} %{YEAR} %{TIME} %{ISO8601_TIMEZONE}
DATESTAMP_OTHER %{DAY} %{MONTH} %{MONTHDAY} %{TIME} %{TZ} %{YEAR}
DATESTAMP_EVENTLOG %{YEAR}%{MONTHNUM2}%{MONTHDAY}%{HOUR}%{MINUTE}%{SECOND}

# Syslog Dates: Month Day HH:MM:SS
SYSLOGTIMESTAMP %{MONTH} +%{MONTHDAY} %{TIME}
PROG [\x21-\x5a\x5c\x5e-\x7e]+
SYSLOGPROG %{PROG:program}(?:\[%{POSINT:pid}\])?
SYSLOGHOST %{IPORHOST}
SYSLOGFACILITY <%{NONNEGINT:facility}.%{NONNEGINT:priority}>
HTTPDATE %{MONTHDAY}/%{MONTH}/%{YEAR}:%{TIME} %{INT}

# Shortcuts
QS %{QUOTEDSTRING}

# Log formats
SYSLOGBASE %{SYSLOGTIMESTAMP:timestamp} (?:%{SYSLOGFACILITY} )?%{SYSLOGHOST:logsource} %{SYSLOGPROG}:

# Log Levels
LOGLEVEL ([Aa]lert|ALERT|[Tt]race|TRACE|[Dd]ebug|DEBUG|[Nn]otice|NOTICE|[Ii]nfo|INFO|[Ww]arn?(?:ing)?|WARN?(?:ING)?|[Ee]rr?(?:or)?|ERR?(?:OR)?|[Cc]rit?(?:ical)?|CRIT?(?:ICAL)?|[Ff]atal|FATAL|[Ss]evere|SEVERE|EMERG(?:ENCY)?|[Ee]merg(?:ency)?)
