6. Flume Integration with Kafka
2. Data Separation
0) Requirement:
Route the data collected by Flume into different Kafka topics according to its content:
log records containing "flume" go to the Kafka topic flume,
log records containing "hello" go to the Kafka topic hello,
and all other records go to the Kafka topic other.
1) Write a custom Flume Interceptor
package com.xiaoxq.interceptor;

import java.util.ArrayList;
import java.util.List;
import java.util.Map;

import org.apache.flume.Context;
import org.apache.flume.Event;
import org.apache.flume.interceptor.Interceptor;

/**
 * Requirement: route the data collected by Flume into different Kafka topics
 * according to its content. Events whose body contains "flume" go to the
 * flume topic, events whose body contains "hello" go to the hello topic,
 * and all other events go to the other topic.
 */
public class FlumeKafkaInterceptor implements Interceptor {

    // Reusable event list, recycled across intercept(List) calls
    private List<Event> addEventList;

    @Override
    public void initialize() {
        addEventList = new ArrayList<>();
    }

    // Process a single event: tag it with a "topic" header based on its body
    @Override
    public Event intercept(Event event) {
        // Event headers (key/value metadata)
        Map<String, String> headers = event.getHeaders();
        // Event body as a string
        String body = new String(event.getBody());
        // Choose the target topic by body content
        if (body.contains("flume")) {
            headers.put("topic", "flume");
        } else if (body.contains("hello")) {
            headers.put("topic", "hello");
        } else {
            headers.put("topic", "other");
        }
        return event;
    }

    @Override
    public List<Event> intercept(List<Event> events) {
        // Clear the list so results from the previous batch are not re-emitted
        addEventList.clear();
        for (Event event : events) {
            addEventList.add(intercept(event));
        }
        return addEventList;
    }

    @Override
    public void close() {
    }

    public static class Builder implements Interceptor.Builder {
        @Override
        public Interceptor build() {
            return new FlumeKafkaInterceptor();
        }

        @Override
        public void configure(Context context) {
        }
    }
}
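Before deploying, the routing logic can be sanity-checked with an in-memory event. A minimal sketch, assuming flume-ng-core is on the classpath; the test class itself is hypothetical and not part of the tutorial:

import java.nio.charset.StandardCharsets;

import org.apache.flume.Event;
import org.apache.flume.event.EventBuilder;

import com.xiaoxq.interceptor.FlumeKafkaInterceptor;

// Hypothetical smoke test for the interceptor's routing logic
public class InterceptorSmokeTest {
    public static void main(String[] args) {
        FlumeKafkaInterceptor interceptor = new FlumeKafkaInterceptor();
        interceptor.initialize();

        // A body containing "flume" should be tagged topic=flume
        Event e1 = EventBuilder.withBody("a flume log line".getBytes(StandardCharsets.UTF_8));
        interceptor.intercept(e1);
        System.out.println(e1.getHeaders().get("topic")); // flume

        // A body containing "hello" should be tagged topic=hello
        Event e2 = EventBuilder.withBody("hello world".getBytes(StandardCharsets.UTF_8));
        interceptor.intercept(e2);
        System.out.println(e2.getHeaders().get("topic")); // hello

        // Anything else falls through to topic=other
        Event e3 = EventBuilder.withBody("something else".getBytes(StandardCharsets.UTF_8));
        interceptor.intercept(e3);
        System.out.println(e3.getHeaders().get("topic")); // other

        interceptor.close();
    }
}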
2) Package the interceptor and copy the jar into the lib directory of the Flume installation
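Assuming the interceptor lives in a Maven project (the project directory and jar name below are illustrative and depend on your artifactId/version), packaging and deploying might look like:

[xiaoxq@hadoop105 flume-interceptor]$ mvn clean package
[xiaoxq@hadoop105 flume-interceptor]$ cp target/flume-interceptor-1.0-SNAPSHOT.jar /opt/module/flume-1.9.0/lib/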
3) Configure Flume
[xiaoxq@hadoop105 jobs]$ pwd
/opt/module/flume-1.9.0/jobs
[xiaoxq@hadoop105 jobs]$ vim flume-kafka2.conf
- Add the following content:
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Describe/configure the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444
# Describe the sink
a1.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
a1.sinks.k1.kafka.topic = third
a1.sinks.k1.kafka.bootstrap.servers = hadoop105:9092,hadoop106:9092,hadoop107:9092
a1.sinks.k1.kafka.flumeBatchSize = 20
a1.sinks.k1.kafka.producer.acks = 1
a1.sinks.k1.kafka.producer.linger.ms = 1
# Interceptor
a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = com.xiaoxq.interceptor.FlumeKafkaInterceptor$Builder
# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
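Note that a1.sinks.k1.kafka.topic = third is only a fallback: Flume's KafkaSink checks each event for a header named topic and, when present, publishes the event to that topic instead of the configured one (this override is on by default via the sink's allowTopicOverride property). Since the interceptor above sets the topic header on every event, the third topic should never actually receive data from this agent.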
4) Create the three topics
[xiaoxq@hadoop105 kafka_2.11-2.4.1]$ bin/kafka-topics.sh --create --topic flume --bootstrap-server hadoop105:9092
[xiaoxq@hadoop105 kafka_2.11-2.4.1]$ bin/kafka-topics.sh --create --topic hello --bootstrap-server hadoop105:9092
[xiaoxq@hadoop105 kafka_2.11-2.4.1]$ bin/kafka-topics.sh --create --topic other --bootstrap-server hadoop105:9092
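Because neither --partitions nor --replication-factor is given, the broker defaults apply (Kafka 2.4 and later allow these flags to be omitted when using --bootstrap-server). You can confirm the topics were created with:

[xiaoxq@hadoop105 kafka_2.11-2.4.1]$ bin/kafka-topics.sh --list --bootstrap-server hadoop105:9092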
5) Start a Kafka console consumer for each topic
[xiaoxq@hadoop105 kafka_2.11-2.4.1]$ bin/kafka-console-consumer.sh --bootstrap-server hadoop105:9092 --topic flume --consumer.config config/consumer.properties
[xiaoxq@hadoop105 kafka_2.11-2.4.1]$ bin/kafka-console-consumer.sh --bootstrap-server hadoop105:9092 --topic hello --consumer.config config/consumer.properties
[xiaoxq@hadoop105 kafka_2.11-2.4.1]$ bin/kafka-console-consumer.sh --bootstrap-server hadoop105:9092 --topic other --consumer.config config/consumer.properties
6) Start Flume from the Flume root directory
[xiaoxq@hadoop105 flume-1.9.0]$ bin/flume-ng agent -c conf/ -n a1 -f jobs/flume-kafka2.conf
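While testing, it can help to print the agent's log to the console instead of the log file:

[xiaoxq@hadoop105 flume-1.9.0]$ bin/flume-ng agent -c conf/ -n a1 -f jobs/flume-kafka2.conf -Dflume.root.logger=INFO,console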
7) Write data to port 44444 and watch what each Kafka consumer receives
[xiaoxq@hadoop105 jobs]$ nc localhost 44444
flume
OK
hello
OK
world
OK
flume
OK
fjhsia
OK
dfaw
OK
hello
OK
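If everything is wired correctly, the two flume lines appear on the flume-topic consumer, the two hello lines on the hello-topic consumer, and world, fjhsia, and dfaw on the other-topic consumer.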