How to Use Interceptors in Flume (Interceptor)
A Flume interceptor sits between the Source and the Channel: as the Source reads events and passes them on toward the Sink, an interceptor can add useful information to the event headers or filter the event body, performing preliminary data cleaning. This is very useful in real business scenarios.
Java Implementation
A simple example in Java: events whose body starts with hello are routed to one destination, and events starting with hi are routed to another.
package nj.zb.kb11;

import org.apache.flume.Context;
import org.apache.flume.Event;
import org.apache.flume.interceptor.Interceptor;

import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/**
 * @Description Inspect each event received by the source.
 * An event consists of a header and a body.
 * If the body starts with "hello", put a "hello" tag into the event header;
 * bodies starting with "hi" are tagged "hi", everything else is tagged "other".
 */
public class InterceptorDemo implements Interceptor {
    private ArrayList<Event> addHeaderEvents = null;

    @Override
    public void initialize() {
        addHeaderEvents = new ArrayList<>();
    }

    @Override
    public Event intercept(Event event) {
        Map<String, String> headers = event.getHeaders();
        byte[] body = event.getBody();
        String bodyStr = new String(body, StandardCharsets.UTF_8);
        if (bodyStr.startsWith("hello")) {
            headers.put("type", "hello");
        } else if (bodyStr.startsWith("hi")) {
            headers.put("type", "hi");
        } else {
            headers.put("type", "other");
        }
        return event;
    }

    @Override
    public List<Event> intercept(List<Event> list) {
        addHeaderEvents.clear();
        for (Event event : list) {
            addHeaderEvents.add(intercept(event));
        }
        return addHeaderEvents;
    }

    @Override
    public void close() {
        addHeaderEvents.clear();
        addHeaderEvents = null;
    }

    public static class Builder implements Interceptor.Builder {
        @Override
        public Interceptor build() {
            return new InterceptorDemo();
        }

        @Override
        public void configure(Context context) {
        }
    }
}
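Before packaging, the tagging logic can be sanity-checked with a tiny local driver. This is only a sketch; the class name InterceptorDemoTest is hypothetical and not part of the original project:

package nj.zb.kb11;

import org.apache.flume.Event;
import org.apache.flume.event.EventBuilder;

import java.nio.charset.StandardCharsets;

public class InterceptorDemoTest {
    public static void main(String[] args) {
        InterceptorDemo interceptor = new InterceptorDemo();
        interceptor.initialize();
        // Build an event whose body starts with "hello" and run it through the interceptor
        Event event = EventBuilder.withBody("hello java", StandardCharsets.UTF_8);
        interceptor.intercept(event);
        // Expected output: {type=hello}
        System.out.println(event.getHeaders());
        interceptor.close();
    }
}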
Package the Jar
In the IDE's Maven panel, double-click clean and then package in turn; the jar file will be generated.
Then put the jar into the lib directory under the Flume installation path for later use.
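The same packaging can also be done from the command line with Maven. A minimal sketch, assuming the project declares the org.apache.flume:flume-ng-core dependency, and with flume-interceptor-demo.jar and /opt/flume as placeholders for your actual artifact name and Flume installation path:

mvn clean package
cp target/flume-interceptor-demo.jar /opt/flume/lib/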
Create the conf File
Here, events tagged hello are delivered to HDFS,
events tagged hi are delivered to Kafka,
and events tagged other go to the logger sink.
interceptordemo.sources=interceptorDemoSource
interceptordemo.channels=interceptorDemoChannelhello interceptorDemoChannelhi interceptorDemoChannelother
interceptordemo.sinks=interceptorDemoSinkhello interceptorDemoSinkhi interceptorDemoSinkother
interceptordemo.sources.interceptorDemoSource.type=netcat
interceptordemo.sources.interceptorDemoSource.bind=localhost
interceptordemo.sources.interceptorDemoSource.port=44444
interceptordemo.sources.interceptorDemoSource.interceptors=interceptor1
interceptordemo.sources.interceptorDemoSource.interceptors.interceptor1.type=nj.zb.kb11.InterceptorDemo$Builder
interceptordemo.sources.interceptorDemoSource.selector.type=multiplexing
interceptordemo.sources.interceptorDemoSource.selector.mapping.hello=interceptorDemoChannelhello
interceptordemo.sources.interceptorDemoSource.selector.mapping.hi=interceptorDemoChannelhi
interceptordemo.sources.interceptorDemoSource.selector.mapping.other=interceptorDemoChannelother
interceptordemo.sources.interceptorDemoSource.selector.header=type
interceptordemo.channels.interceptorDemoChannelhello.type=memory
interceptordemo.channels.interceptorDemoChannelhello.capacity=1000
interceptordemo.channels.interceptorDemoChannelhello.transactionCapacity=100
interceptordemo.channels.interceptorDemoChannelhi.type=memory
interceptordemo.channels.interceptorDemoChannelhi.capacity=1000
interceptordemo.channels.interceptorDemoChannelhi.transactionCapacity=100
interceptordemo.channels.interceptorDemoChannelother.type=memory
interceptordemo.channels.interceptorDemoChannelother.capacity=1000
interceptordemo.channels.interceptorDemoChannelother.transactionCapacity=100
interceptordemo.sinks.interceptorDemoSinkhello.type=hdfs
interceptordemo.sinks.interceptorDemoSinkhello.hdfs.fileType=DataStream
interceptordemo.sinks.interceptorDemoSinkhello.hdfs.filePrefix=hello
interceptordemo.sinks.interceptorDemoSinkhello.hdfs.fileSuffix=.csv
interceptordemo.sinks.interceptorDemoSinkhello.hdfs.path=hdfs://192.168.146.222:9000/kb11/hello/%Y-%m-%d
interceptordemo.sinks.interceptorDemoSinkhello.hdfs.useLocalTimeStamp=true
interceptordemo.sinks.interceptorDemoSinkhello.hdfs.batchSize=640
interceptordemo.sinks.interceptorDemoSinkhello.hdfs.rollCount=0
interceptordemo.sinks.interceptorDemoSinkhello.hdfs.rollSize=6400000
interceptordemo.sinks.interceptorDemoSinkhello.hdfs.rollInterval=3
interceptordemo.sinks.interceptorDemoSinkhi.type=org.apache.flume.sink.kafka.KafkaSink
interceptordemo.sinks.interceptorDemoSinkhi.batchSize=640
interceptordemo.sinks.interceptorDemoSinkhi.brokerList=192.168.146.222:9092
interceptordemo.sinks.interceptorDemoSinkhi.topic=hi
interceptordemo.sinks.interceptorDemoSinkother.type=logger
interceptordemo.sources.interceptorDemoSource.channels=interceptorDemoChannelhello interceptorDemoChannelhi interceptorDemoChannelother
interceptordemo.sinks.interceptorDemoSinkhello.channel=interceptorDemoChannelhello
interceptordemo.sinks.interceptorDemoSinkhi.channel=interceptorDemoChannelhi
interceptordemo.sinks.interceptorDemoSinkother.channel=interceptorDemoChannelother
Then create the corresponding hello directory on HDFS, and create a Kafka topic named hi.
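A sketch of the corresponding commands, assuming the NameNode address from the conf above and a topic with a single partition and replication factor 1; on Kafka versions older than 2.2, use --zookeeper with the ZooKeeper address instead of --bootstrap-server:

hdfs dfs -mkdir -p /kb11/hello
kafka-topics.sh --create --topic hi --partitions 1 --replication-factor 1 --bootstrap-server 192.168.146.222:9092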
Run the agent with the conf file:
./bin/flume-ng agent --name interceptordemo --conf ./conf/ --conf-file ./conf/kb11job/netcat-flume-interceptor.conf -Dflume.root.logger=INFO,console
Then, in another terminal, start a Kafka console consumer:
kafka-console-consumer.sh --topic hi --bootstrap-server 192.168.146.222:9092 --from-beginning
Then, in a third terminal, connect to the netcat source:
telnet localhost 44444
Verification
Type hello java in the telnet session: a corresponding file is generated on HDFS.
Then type hi lilei: the message shows up under the Kafka topic hi in the console consumer.
Finally, type any other content: it is printed by the logger sink on the Flume console.
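To double-check the HDFS side, the generated files can be listed and read; a sketch based on the path, filePrefix, and fileSuffix settings in the conf above (the date subdirectory is created automatically by the HDFS sink):

hdfs dfs -ls /kb11/hello
hdfs dfs -cat /kb11/hello/*/hello*.csv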