flume-ng interceptors 可以理解为一个过滤器,通过配置可以收集到符合自己需要类型的日志
官网提供了以下几种interceptors:
Timestamp Interceptor
在event的header中添加一个key叫:timestamp,value为当前的时间戳
Example for agent named a1:
a1.sources = r1
a1.channels = c1
a1.sources.r1.channels = c1
a1.sources.r1.type = seq
a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = timestamp
Host Interceptor
在event的header中添加一个key叫:host,value为当前机器的hostname或者ip
Example for agent named a1:
a1.sources = r1
a1.channels = c1
a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = host
a1.sources.r1.interceptors.i1.hostHeader = hostname
Static Interceptor
可以在event的header中添加自定义的key和value。
Example for agent named a1:
a1.sources = r1
a1.channels = c1
a1.sources.r1.channels = c1
a1.sources.r1.type = seq
a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = static
a1.sources.r1.interceptors.i1.key = datacenter
a1.sources.r1.interceptors.i1.value = NEW_YORK
UUID Interceptor
Morphline Interceptor
Sample flume.conf file:
a1.sources.avroSrc.interceptors = morphlineinterceptor
a1.sources.avroSrc.interceptors.morphlineinterceptor.type = org.apache.flume.sink.solr.morphline.MorphlineInterceptor$Builder
a1.sources.avroSrc.interceptors.morphlineinterceptor.morphlineFile = /etc/flume-ng/conf/morphline.conf
a1.sources.avroSrc.interceptors.morphlineinterceptor.morphlineId = morphline1
Search and Replace Interceptor
Example configuration:
a1.sources.avroSrc.interceptors = search-replace
a1.sources.avroSrc.interceptors.search-replace.type = search_replace
# Remove leading alphanumeric characters in an event body.
a1.sources.avroSrc.interceptors.search-replace.searchPattern = ^[A-Za-z0-9_]+
a1.sources.avroSrc.interceptors.search-replace.replaceString =
Another example:
a1.sources.avroSrc.interceptors = search-replace
a1.sources.avroSrc.interceptors.search-replace.type = search_replace
# Use grouping operators to reorder and munge words on a line.
a1.sources.avroSrc.interceptors.search-replace.searchPattern = The quick brown ([a-z]+) jumped over the lazy ([a-z]+)
a1.sources.avroSrc.interceptors.search-replace.replaceString = The hungry $2 ate the careless $1
Regex Filtering Interceptor
通过正则来清洗或包含匹配的events。
Regex Extractor Intercepto
通过正则表达式来在header中添加指定的key,value则为正则匹配的部分
Example 1:
If the Flume event body contained 1:2:3.4foobar5 and the following configuration was used
a1.sources.r1.interceptors.i1.regex = (\\d):(\\d):(\\d)
a1.sources.r1.interceptors.i1.serializers = s1 s2 s3
a1.sources.r1.interceptors.i1.serializers.s1.name = one
a1.sources.r1.interceptors.i1.serializers.s2.name = two
a1.sources.r1.interceptors.i1.serializers.s3.name = three
The extracted event will contain the same body but the following headers will have been added one=>1, two=>2, three=>3
Example 2:
If the Flume event body contained 2012-10-18 18:47:57,614 some log line and the following configuration was used
a1.sources.r1.interceptors.i1.regex = ^(?:\\n)?(\\d\\d\\d\\d-\\d\\d-\\d\\d\\s\\d\\d:\\d\\d)
a1.sources.r1.interceptors.i1.serializers = s1
a1.sources.r1.interceptors.i1.serializers.s1.type = org.apache.flume.interceptor.RegexExtractorInterceptorMillisSerializer
a1.sources.r1.interceptors.i1.serializers.s1.name = timestamp
a1.sources.r1.interceptors.i1.serializers.s1.pattern = yyyy-MM-dd HH:mm
上面是官网提供的demo:
这里我简单介绍一下经常使用的 regex_filter
在配置中新增interceptors:
下面是给了两个正则 ,作为两个例子进行实现 其中 i1 是 匹配正则 i2 是匹配类似 d:d:d 格式的日志
a1.sources.source1.interceptors=i2
a1.sources.source1.interceptors.i1.type=regex_filter
a1.sources.source1.interceptors.i1.regex=\\{.*\\}
a1.sources.source1.interceptors.i2.type=regex_filter
a1.sources.source1.interceptors.i2.regex = (\\d):(\\d):(\\d)
a1.sources.source1.interceptors.i2.serializers = s1 s2 s3
a1.sources.source1.interceptors.i2.serializers.s1.name = one
a1.sources.source1.interceptors.i2.serializers.s2.name = two
a1.sources.source1.interceptors.i2.serializers.s3.name = three
其他的配置请看上一篇日志 http://blog.csdn.net/linlinv3/article/details/50053333 ;
在上一篇的日志中加入 interceptors;
若是不加interceptors 采集到的日志是 WriteLog所有的:
若是加上i1后 日志只有 1:2:3
若是使用i2 则日志只有 那串json串