Flume Interceptors

Source: http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.2.0/ds_Flume/FlumeUserGuide.html


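Flume interceptors can modify the content of events as they pass through an agent. Flume ships with several ready-made interceptors, and you can also write your own. Interceptors can be combined into a chain, in which each interceptor processes the events in turn.

The built-in interceptors include: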

1)Timestamp Interceptor

This interceptor inserts into the event headers the time, in milliseconds, at which it processes the event. The header key is timestamp and the value is that processing time. If preserveExisting is set to true, an existing timestamp header is kept rather than overwritten.

Property Name     Default  Description
type              -        The component type name; has to be timestamp or the FQCN
preserveExisting  false    If the timestamp already exists, should it be preserved: true or false

Example for agent named a1:

a1.sources = r1
a1.channels = c1
a1.sources.r1.channels =  c1
a1.sources.r1.type = seq
a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = timestamp

2)Host Interceptor

This interceptor inserts the hostname or IP address of the host that this agent is running on. It inserts a header with key host or a configured key whose value is the hostname or IP address of the host, based on configuration.

Property Name     Default  Description
type              -        The component type name; has to be host
preserveExisting  false    If the host header already exists, should it be preserved: true or false
useIP             true     Use the IP address if true, else use the hostname
hostHeader        host     The header key to be used

Example for agent named a1:

a1.sources = r1
a1.channels = c1
a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = host
a1.sources.r1.interceptors.i1.hostHeader = hostname

3)Static Interceptor

The static interceptor lets the user append a static header with a static value to all events.

The current implementation does not allow specifying multiple headers at one time. Instead, the user can chain multiple static interceptors, each defining one static header.

Property Name     Default  Description
type              -        The component type name; has to be static
preserveExisting  true     If the configured header already exists, should it be preserved: true or false
key               key      Name of the header that should be created
value             value    Static value that should be created

Example for agent named a1:

a1.sources = r1
a1.channels = c1
a1.sources.r1.channels =  c1
a1.sources.r1.type = seq
a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = static
a1.sources.r1.interceptors.i1.key = datacenter
a1.sources.r1.interceptors.i1.value = NEW_YORK
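
Because each static interceptor defines only one header, adding several headers means chaining several interceptors. The fragment below is a minimal sketch of such a chain. The second header (environment = production) is a hypothetical value chosen for illustration and does not come from the original guide. Interceptors run in the order in which they are listed.

```properties
# Two static interceptors chained on the same source; each adds one header.
a1.sources.r1.interceptors = i1 i2
a1.sources.r1.interceptors.i1.type = static
a1.sources.r1.interceptors.i1.key = datacenter
a1.sources.r1.interceptors.i1.value = NEW_YORK
# Hypothetical second header, for illustration only.
a1.sources.r1.interceptors.i2.type = static
a1.sources.r1.interceptors.i2.key = environment
a1.sources.r1.interceptors.i2.value = production
```

With this chain, every event leaving r1 would carry both a datacenter and an environment header, unless a header with the same key already exists and preserveExisting keeps its default of true.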

4)UUID Interceptor

This interceptor sets a universally unique identifier on all events that are intercepted. An example UUID is b5755073-77a9-43c1-8fad-b7a586fc1b97, which represents a 128-bit value.

Consider using the UUID Interceptor to automatically assign a UUID to an event if no application-level unique key for the event is available. It can be important to assign UUIDs to events as soon as they enter the Flume network, that is, in the first Flume source of the flow. This enables subsequent deduplication of events in the face of replication and redelivery in a Flume network that is designed for high availability and high performance. If an application-level key is available, it is preferable to an auto-generated UUID because it enables subsequent updates and deletes of events in data stores using that well-known application-level key.

Property Name     Default  Description
type              -        The component type name; has to be org.apache.flume.sink.solr.morphline.UUIDInterceptor$Builder
headerName        id       The name of the Flume header to modify
preserveExisting  true     If the UUID header already exists, should it be preserved: true or false
prefix            ""       The prefix string constant to prepend to each generated UUID
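
The guide gives no example for this interceptor, so the fragment below is a sketch built only from the properties listed above. The header name eventId and the prefix flume- are hypothetical values chosen for illustration. It also assumes that the jar containing the UUIDInterceptor class (it lives in the morphline Solr sink package, as the type name suggests) is on the agent's classpath.

```properties
# Sketch: assign a UUID to each event at the first source of the flow.
a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = org.apache.flume.sink.solr.morphline.UUIDInterceptor$Builder
# Hypothetical header name; the default is id.
a1.sources.r1.interceptors.i1.headerName = eventId
# Keep an identifier that an upstream agent has already assigned.
a1.sources.r1.interceptors.i1.preserveExisting = true
# Hypothetical prefix prepended to every generated UUID.
a1.sources.r1.interceptors.i1.prefix = flume-
```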

5)Regex Filtering Interceptor

This interceptor filters events selectively by interpreting the event body as text and matching the text against a configured regular expression. The supplied regular expression can be used to include events or exclude events.

Property Name  Default  Description
type           -        The component type name; has to be regex_filter
regex          ".*"     Regular expression for matching against events
excludeEvents  false    If true, regex determines events to exclude, otherwise regex determines events to include
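
The guide gives no example for this interceptor either. As a sketch using only the properties above, the following fragment would drop events whose body starts with DEBUG and let everything else through. The pattern is a hypothetical one chosen for illustration and assumes single-line text bodies.

```properties
# Sketch: exclude events whose body begins with DEBUG.
a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = regex_filter
# Hypothetical pattern, for illustration only.
a1.sources.r1.interceptors.i1.regex = ^DEBUG.*
# true = matching events are dropped; false (the default) = only matching events are kept.
a1.sources.r1.interceptors.i1.excludeEvents = true
```

Leaving excludeEvents at false with the same pattern would invert the behavior and keep only the DEBUG events.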

6)Regex Extractor Interceptor

This interceptor extracts regex match groups using a specified regular expression and appends the match groups as headers on the event. It also supports pluggable serializers for formatting the match groups before adding them as event headers.

Property Name           Default  Description
type                    -        The component type name; has to be regex_extractor
regex                   -        Regular expression for matching against events
serializers             -        Space-separated list of serializers for mapping matches to header names and serializing their values (see the examples below). Flume provides built-in support for the following serializers: org.apache.flume.interceptor.RegexExtractorInterceptorPassThroughSerializer and org.apache.flume.interceptor.RegexExtractorInterceptorMillisSerializer
serializers.<s1>.type   default  Must be default (org.apache.flume.interceptor.RegexExtractorInterceptorPassThroughSerializer), org.apache.flume.interceptor.RegexExtractorInterceptorMillisSerializer, or the FQCN of a custom class that implements org.apache.flume.interceptor.RegexExtractorInterceptorSerializer
serializers.<s1>.name   -
serializers.*           -        Serializer-specific properties

The serializers are used to map the matches to a header name and a formatted header value. By default, you only need to specify the header name, and the default org.apache.flume.interceptor.RegexExtractorInterceptorPassThroughSerializer will be used. This serializer simply maps the matches to the specified header name and passes the value through as it was extracted by the regex. You can plug custom serializer implementations into the extractor using the fully qualified class name (FQCN) to format the matches in any way you like.

Example 1:

If the Flume event body contained 1:2:3.4foobar5 and the following configuration was used:

a1.sources.r1.interceptors.i1.regex = (\\d):(\\d):(\\d)
a1.sources.r1.interceptors.i1.serializers = s1 s2 s3
a1.sources.r1.interceptors.i1.serializers.s1.name = one
a1.sources.r1.interceptors.i1.serializers.s2.name = two
a1.sources.r1.interceptors.i1.serializers.s3.name = three

The extracted event will contain the same body, but the following headers will have been added: one=>1, two=>2, three=>3

Example 2:

If the Flume event body contained 2012-10-18 18:47:57,614 some log line and the following configuration was used:

a1.sources.r1.interceptors.i1.regex = ^(?:\\n)?(\\d\\d\\d\\d-\\d\\d-\\d\\d\\s\\d\\d:\\d\\d)
a1.sources.r1.interceptors.i1.serializers = s1
a1.sources.r1.interceptors.i1.serializers.s1.type = org.apache.flume.interceptor.RegexExtractorInterceptorMillisSerializer
a1.sources.r1.interceptors.i1.serializers.s1.name = timestamp
a1.sources.r1.interceptors.i1.serializers.s1.pattern = yyyy-MM-dd HH:mm

The extracted event will contain the same body, but the following header will have been added: timestamp=>1350611220000
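
The two examples above show only the regex and serializer lines. For context, a fuller sketch of Example 1 might look like the following. The netcat source, its bind address and port, and the memory channel are assumptions added here to make the fragment self-contained; they do not come from the examples above.

```properties
# Sketch: Example 1 with the surrounding declarations filled in.
a1.sources = r1
a1.channels = c1
# Assumed source and channel, for illustration only.
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444
a1.sources.r1.channels = c1
a1.channels.c1.type = memory
# Interceptor declaration that the examples above leave implicit.
a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = regex_extractor
a1.sources.r1.interceptors.i1.regex = (\\d):(\\d):(\\d)
a1.sources.r1.interceptors.i1.serializers = s1 s2 s3
a1.sources.r1.interceptors.i1.serializers.s1.name = one
a1.sources.r1.interceptors.i1.serializers.s2.name = two
a1.sources.r1.interceptors.i1.serializers.s3.name = three
```

A sink would still be needed for a working agent; it is omitted here because the interceptor configuration does not depend on it.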
