1.1 Custom Interceptor
A custom Interceptor can add key-value pairs to the headers of events the agent receives; a downstream channel selector then uses those values to decide which channel each event is routed to.
For example, the code below defines an interceptor that adds the header pair type:hello when an event body contains the string "Hello", and type:nonhello otherwise. When the selector sees an event whose type header is hello, it routes the event to channel c1; otherwise to channel c2. A custom interceptor implements Event intercept(Event event), which processes each event from the source and returns the modified event.
package com.fyk.flume.Interceptor;
import org.apache.flume.Context;
import org.apache.flume.Event;
import org.apache.flume.interceptor.Interceptor;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
public class CustomInterceptor implements Interceptor {
    private List<Event> addHeaderEvents;

    @Override
    public void initialize() {
        addHeaderEvents = new ArrayList<Event>();
    }

    // Tag each event: type=hello if the body contains "Hello", type=nonhello otherwise
    @Override
    public Event intercept(Event event) {
        Map<String, String> headers = event.getHeaders();
        String body = new String(event.getBody());
        if (body.contains("Hello")) {
            headers.put("type", "hello");
        } else {
            headers.put("type", "nonhello");
        }
        return event;
    }

    @Override
    public List<Event> intercept(List<Event> events) {
        addHeaderEvents.clear();
        for (Event event : events) {
            addHeaderEvents.add(intercept(event));
        }
        return addHeaderEvents;
    }

    @Override
    public void close() {
    }

    // Flume instantiates the interceptor through this Builder,
    // referenced in the config as CustomInterceptor$Builder
    public static class Builder implements Interceptor.Builder {
        @Override
        public Interceptor build() {
            return new CustomInterceptor();
        }

        @Override
        public void configure(Context context) {
        }
    }
}
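The routing decision the interceptor encodes is a pure function of the event body, so it can be sanity-checked without a Flume runtime. A minimal standalone sketch (hypothetical class name, no Flume dependency) that highlights the case-sensitivity seen in the test transcript below, where "hello ghi" is routed as nonhello:

```java
import java.util.HashMap;
import java.util.Map;

public class TypeHeaderSketch {
    // Mirrors CustomInterceptor.intercept(): "hello" only when the body
    // contains the exact, capitalized string "Hello"
    static Map<String, String> tag(String body) {
        Map<String, String> headers = new HashMap<String, String>();
        headers.put("type", body.contains("Hello") ? "hello" : "nonhello");
        return headers;
    }

    public static void main(String[] args) {
        System.out.println(tag("Hello world").get("type")); // hello
        System.out.println(tag("hello ghi").get("type"));   // nonhello -- contains() is case-sensitive
    }
}
```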
The test results are as follows:
### Start the agents
[root@master flume-1.6.0]# ./bin/flume-ng agent --conf conf --conf-file ./conf/flume-custom/flume-custominteceptor-master.conf --name a1 -Dflume.root.logger=INFO,console
[root@slave1 flume-1.6.0]# ./bin/flume-ng agent --conf conf --conf-file ./conf/flume-custom/flume-custominterceptor-slave1.conf --name a1 -Dflume.root.logger=INFO,console
[root@slave2 flume-1.6.0]# ./bin/flume-ng agent --conf conf --conf-file ./conf/flume-custom/flume-custominteceptor-slave2.conf --name a1 -Dflume.root.logger=INFO,console
### Test output
[root@slave1 ~]# telnet master 50000
Trying 192.168.56.100...
Connected to master.
Escape character is '^]'.
abcdef
OK
hello ghi
OK
Hello world
OK
Hello python
OK
Hello AI
OK
hello Bigdata
OK
#slave1
2020-12-22 07:21:37,117 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:94)] Event: { headers:{type=hello} body: 48 65 6C 6C 6F 20 77 6F 72 6C 64 0D Hello world. }
2020-12-22 07:22:07,125 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:94)] Event: { headers:{type=hello} body: 48 65 6C 6C 6F 20 70 79 74 68 6F 6E 0D Hello python. }
2020-12-22 07:22:07,125 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:94)] Event: { headers:{type=hello} body: 48 65 6C 6C 6F 20 41 49 0D Hello AI. }
#slave2
2020-12-22 07:20:01,966 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:94)] Event: { headers:{type=nonhello} body: 61 62 63 64 65 66 0D abcdef. }
2020-12-22 07:20:26,254 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:94)] Event: { headers:{type=nonhello} body: 68 65 6C 6C 6F 20 67 68 69 0D hello ghi. }
2020-12-22 07:22:11,298 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:94)] Event: { headers:{type=nonhello} body: 68 65 6C 6C 6F 20 42 69 67 64 61 74 61 0D hello Bigdata. }
### Configuration files used
#master
[root@master flume-1.6.0]# cat ./conf/flume-custom/flume-custominteceptor-master.conf
a1.sources = r1
a1.sinks = k1 k2
a1.channels = c1 c2
a1.sources.r1.type = netcat
a1.sources.r1.bind = master
a1.sources.r1.port = 50000
#Interceptor
a1.sources.r1.interceptors = i1
# Custom Interceptor class
a1.sources.r1.interceptors.i1.type = com.fyk.flume.Interceptor.CustomInterceptor$Builder
# Channel selector: route events based on the header set by the custom interceptor
a1.sources.r1.selector.type = multiplexing
a1.sources.r1.selector.header = type
a1.sources.r1.selector.mapping.hello = c1
a1.sources.r1.selector.mapping.nonhello = c2
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
a1.channels.c2.type = memory
a1.channels.c2.capacity = 1000
a1.channels.c2.transactionCapacity = 100
a1.sinks.k1.type = avro
a1.sinks.k1.hostname = slave1
a1.sinks.k1.port = 50000
a1.sinks.k2.type = avro
a1.sinks.k2.hostname = slave2
a1.sinks.k2.port = 50000
a1.sources.r1.channels = c1 c2
a1.sinks.k1.channel = c1
a1.sinks.k2.channel = c2
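One caveat about the multiplexing selector above: an event whose type header matches neither mapping would be dropped. The multiplexing selector also accepts an optional default channel; a hedged addition (not part of the original config) would be:

```
a1.sources.r1.selector.default = c2
```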
#slave1
[root@slave1 flume-1.6.0]# cat ./conf/flume-custom/flume-custominterceptor-slave1.conf
a1.sources = r1
a1.sinks = k1
a1.channels = c1
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
a1.sources.r1.type = avro
a1.sources.r1.bind = slave1
a1.sources.r1.port = 50000
a1.sinks.k1.type = logger
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
#slave2
[root@slave2 flume-1.6.0]# cat ./conf/flume-custom/flume-custominteceptor-slave2.conf
a1.sources = r1
a1.sinks = k1
a1.channels = c1
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
a1.sources.r1.type = avro
a1.sources.r1.bind = slave2
a1.sources.r1.port = 50000
a1.sinks.k1.type = logger
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
1.2 Custom Source
The following code defines a custom source that wraps data into events and hands each one to getChannelProcessor().processEvent(event); internally, processEvent commits the event to the channel within a channel transaction.
package com.fyk.flume.Source;
import org.apache.flume.Context;
import org.apache.flume.EventDeliveryException;
import org.apache.flume.PollableSource;
import org.apache.flume.conf.Configurable;
import org.apache.flume.event.SimpleEvent;
import org.apache.flume.source.AbstractSource;

public class CustomSource extends AbstractSource implements Configurable, PollableSource {
    private String prefix;
    private String suffix;

    @Override
    public void configure(Context context) {
        prefix = context.getString("prefix");
        suffix = context.getString("suffix", "Flume");
    }

    /*
     * 1. Receive (or generate) data
     * 2. Wrap it in an event
     * 3. Hand the event to the channel processor
     */
    @Override
    public Status process() throws EventDeliveryException {
        Status status = null;
        try {
            for (int i = 0; i < 5; i++) {
                SimpleEvent event = new SimpleEvent();
                event.setBody((prefix + "--" + i + suffix).getBytes());
                getChannelProcessor().processEvent(event);
                status = Status.READY;
            }
        } catch (Exception e) {
            e.printStackTrace();
            status = Status.BACKOFF;
        }
        // Pause between batches so the source does not spin
        try {
            Thread.sleep(2000);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
        return status;
    }

    public long getBackOffSleepIncrement() {
        return 0;
    }

    public long getMaxBackOffSleepInterval() {
        return 0;
    }
}
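The body layout produced by process() can be checked in isolation; a minimal sketch (hypothetical class name) reproducing the strings seen in the test output below:

```java
public class BodyFormatSketch {
    // Mirrors CustomSource.process(): body = prefix + "--" + i + suffix
    static String body(String prefix, int i, String suffix) {
        return prefix + "--" + i + suffix;
    }

    public static void main(String[] args) {
        for (int i = 0; i < 5; i++) {
            System.out.println(body("Bigdata", i, "spark")); // Bigdata--0spark ... Bigdata--4spark
        }
    }
}
```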
## Start the agent
[root@master flume-1.6.0]# ./bin/flume-ng agent --conf conf --conf-file ./conf/flume-custom/flume-customsource-master.conf --name a1 -Dflume.root.logger=INFO,console
## Configuration file
[root@master flume-1.6.0]# cat ./conf/flume-custom/flume-customsource-master.conf
a1.sources = r1
a1.sinks = k1
a1.channels = c1
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
a1.sources.r1.type = com.fyk.flume.Source.CustomSource
a1.sources.r1.prefix = Bigdata
a1.sources.r1.suffix = spark
a1.sinks.k1.type = logger
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
## Test output
2020-12-22 23:54:01,951 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:94)] Event: { headers:{} body: 42 69 67 64 61 74 61 2D 2D 30 73 70 61 72 6B Bigdata--0spark }
2020-12-22 23:54:01,951 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:94)] Event: { headers:{} body: 42 69 67 64 61 74 61 2D 2D 31 73 70 61 72 6B Bigdata--1spark }
2020-12-22 23:54:01,952 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:94)] Event: { headers:{} body: 42 69 67 64 61 74 61 2D 2D 32 73 70 61 72 6B Bigdata--2spark }
2020-12-22 23:54:01,952 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:94)] Event: { headers:{} body: 42 69 67 64 61 74 61 2D 2D 33 73 70 61 72 6B Bigdata--3spark }
2020-12-22 23:54:01,952 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:94)] Event: { headers:{} body: 42 69 67 64 61 74 61 2D 2D 34 73 70 61 72 6B Bigdata--4spark }
2020-12-22 23:54:03,950 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:94)] Event: { headers:{} body: 42 69 67 64 61 74 61 2D 2D 30 73 70 61 72 6B Bigdata--0spark }
2020-12-22 23:54:03,951 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:94)] Event: { headers:{} body: 42 69 67 64 61 74 61 2D 2D 31 73 70 61 72 6B Bigdata--1spark }
2020-12-22 23:54:03,951 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:94)] Event: { headers:{} body: 42 69 67 64 61 74 61 2D 2D 32 73 70 61 72 6B Bigdata--2spark }
2020-12-22 23:54:03,952 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:94)] Event: { headers:{} body: 42 69 67 64 61 74 61 2D 2D 33 73 70 61 72 6B Bigdata--3spark }
2020-12-22 23:54:03,952 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:94)] Event: { headers:{} body: 42 69 67 64 61 74 61 2D 2D 34 73 70 61 72 6B Bigdata--4spark }
1.3 Custom Sink
The following defines a custom sink that pulls events from the channel inside an explicit transaction.
package com.fyk.flume.Sink;
import org.apache.flume.*;
import org.apache.flume.conf.Configurable;
import org.apache.flume.sink.AbstractSink;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class CustomSink extends AbstractSink implements Configurable {
    private Logger logger = LoggerFactory.getLogger(CustomSink.class);
    private String prefix;
    private String suffix;

    @Override
    public Status process() throws EventDeliveryException {
        Status status = null;
        Channel channel = getChannel();
        Transaction txn = channel.getTransaction();
        txn.begin();
        try {
            Event event = channel.take();
            if (event != null) {
                String body = new String(event.getBody());
                logger.info(prefix + body + suffix);
            }
            txn.commit();
            status = Status.READY;
        } catch (Throwable e) {
            txn.rollback();
            status = Status.BACKOFF;
            // Rethrow fatal Errors so the framework sees them;
            // ordinary Exceptions only trigger backoff
            if (e instanceof Error) {
                throw (Error) e;
            }
        } finally {
            txn.close();
        }
        return status;
    }

    @Override
    public void configure(Context context) {
        prefix = context.getString("prefix");
        suffix = context.getString("suffix", "BigData");
    }
}
The test results are as follows:
# Start the agent
[root@master flume-1.6.0]# ./bin/flume-ng agent --conf conf --conf-file ./conf/flume-custom/flume-customsink-master.conf --name a1 -Dflume.root.logger=INFO,console
## Configuration file
a1.sources = r1
a1.sinks = k1
a1.channels = c1
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
a1.sources.r1.type = netcat
a1.sources.r1.bind = master
a1.sources.r1.port = 50000
a1.sinks.k1.type = com.fyk.flume.Sink.CustomSink
a1.sinks.k1.prefix = sleep--
a1.sinks.k1.suffix = --boy
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
## Test output
#slave1
[root@slave1 ~]# telnet master 50000
Trying 192.168.56.100...
Connected to master.
Escape character is '^]'.
abc def ghi
OK
Hello Bigdata
OK
#master
--boy12-23 00:03:51,530 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - com.fyk.flume.Sink.CustomSink.process(CustomSink.java:26)] sleep--abc def ghi
--boy12-23 00:05:12,854 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - com.fyk.flume.Sink.CustomSink.process(CustomSink.java:26)] sleep--Hello Bigdata
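The mangled timestamps above ("--boy12-23 ...") are a terminal artifact, not a sink bug: telnet lines end in \r\n, the netcat source strips only the \n (note the trailing 0D byte in the earlier hex dumps), so the logged string is prefix + body + suffix with a carriage return embedded before "--boy". On the console, that \r moves the cursor back to column 0 and "--boy" overwrites the first five characters of the timestamp. A minimal sketch (hypothetical class name) of the string actually handed to the logger:

```java
public class CarriageReturnSketch {
    public static void main(String[] args) {
        String body = "abc def ghi\r";            // netcat keeps telnet's trailing \r (the 0D in the hex dumps)
        String message = "sleep--" + body + "--boy";
        // Escape the \r so the full string is visible instead of being
        // partially overwritten on the terminal
        System.out.println(message.replace("\r", "\\r")); // sleep--abc def ghi\r--boy
    }
}
```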