Integrating Kafka with Flume
1. Kafka as a Flume Source
Kafka Source
The Kafka Source is a Kafka consumer that reads messages from a Kafka topic.
Test 1
The source is a Kafka source; the sink writes to HDFS.
Preparation
(1) Create a topic flume01 to serve as Flume's data source:
kafka-topics.sh --create --zookeeper pseudo01:2181 --replication-factor 1 --partitions 1 --topic flume01
(2) Create a directory kafka_test01 on HDFS to hold the data Flume writes out:
hdfs dfs -mkdir /kafka_test01
Test
(1) Write the Flume agent configuration file kafka_hdfs.conf:
agent1.sources=s1
agent1.channels=c1
agent1.sinks=k1
# Properties of Kafka source s1
agent1.sources.s1.type = org.apache.flume.source.kafka.KafkaSource
agent1.sources.s1.batchSize = 5000
agent1.sources.s1.batchDurationMillis = 2000
agent1.sources.s1.kafka.bootstrap.servers = pseudo01:9092
agent1.sources.s1.kafka.topics = flume01
agent1.sources.s1.kafka.consumer.group.id = kafka-flume01
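# Properties of memory channel c1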
agent1.channels.c1.type = memory
agent1.channels.c1.capacity = 10000
agent1.channels.c1.transactionCapacity = 10000
agent1.channels.c1.byteCapacityBufferPercentage = 20
agent1.channels.c1.byteCapacity = 800000
# Properties of HDFS sink k1
agent1.sinks.k1.type = hdfs
agent1.sinks.k1.hdfs.path = /kafka_test01/kafka01/%y-%m-%d/%H%M
agent1.sinks.k1.hdfs.filePrefix = kafka-
agent1.sinks.k1.hdfs.fileSuffix = .log
agent1.sinks.k1.hdfs.inUseSuffix = .tmp
agent1.sinks.k1.hdfs.fileType = DataStream
agent1.sinks.k1.hdfs.writeFormat = Text
agent1.sinks.k1.hdfs.useLocalTimeStamp = true
agent1.sinks.k1.hdfs.rollCount = 30
agent1.sinks.k1.hdfs.round = true
agent1.sinks.k1.hdfs.roundValue = 10
agent1.sinks.k1.hdfs.roundUnit = minute
agent1.sources.s1.channels = c1
agent1.sinks.k1.channel = c1
(2) Start the agent:
flume-ng agent -n agent1 -c . -f kafka_hdfs.conf
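While testing, it helps to see the agent's log on the console; flume-ng supports a standard option for that:
flume-ng agent -n agent1 -c . -f kafka_hdfs.conf -Dflume.root.logger=INFO,console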
(3) Start a producer and send messages to the topic flume01.
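For example, with the console producer, pointed at the same broker configured for the source above:
kafka-console-producer.sh --broker-list pseudo01:9092 --topic flume01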
(4) Inspect the data written to HDFS.
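The files land under the time-bucketed path configured for the sink; list them and cat a rolled file (exact file names depend on when the sink rolled, and in-progress files still carry the .tmp suffix):
hdfs dfs -ls -R /kafka_test01
hdfs dfs -cat /kafka_test01/kafka01/*/*/kafka-*.log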
Test passed!
2. Kafka as a Flume Sink
Kafka Sink
This is a Flume sink implementation that publishes data to a Kafka topic. One goal of integrating Flume with Kafka is to let pull-based processing systems work with data coming from a variety of Flume sources.
Test 2
Monitor the contents of the file kafka.data and send whatever is appended to it as messages to the Kafka topic flume02.
Preparation
(1) Create the file kafka.data to be monitored.
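The exec source configured below tails /root/flume_test/kafka.data, so the commands here assume that working directory (created first if it does not exist):
mkdir -p /root/flume_test
cd /root/flume_test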
touch kafka.data
(2) Create a topic flume02 to store the messages Flume produces:
kafka-topics.sh --create --zookeeper pseudo01:2181 --replication-factor 1 --partitions 2 --topic flume02
(3) Start a consumer to consume messages from topic flume02:
kafka-console-consumer.sh --bootstrap-server pseudo01:9092 --topic flume02 --from-beginning
Test
(1) Create the agent configuration file flume_kafka.conf:
agent1.sources=f1
agent1.channels=c1
agent1.sinks=k1
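# Properties of exec source f1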
agent1.sources.f1.type = exec
agent1.sources.f1.command = tail -F /root/flume_test/kafka.data
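# Properties of memory channel c1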
agent1.channels.c1.type = memory
agent1.channels.c1.capacity = 10000
agent1.channels.c1.transactionCapacity = 10000
agent1.channels.c1.byteCapacityBufferPercentage = 20
agent1.channels.c1.byteCapacity = 800000
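# Properties of Kafka sink k1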
agent1.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
agent1.sinks.k1.kafka.topic = flume02
agent1.sinks.k1.kafka.bootstrap.servers = pseudo01:9092
agent1.sinks.k1.kafka.flumeBatchSize = 20
agent1.sinks.k1.kafka.producer.acks = 1
agent1.sinks.k1.kafka.producer.linger.ms = 1
agent1.sources.f1.channels = c1
agent1.sinks.k1.channel = c1
(2) Start the agent:
flume-ng agent -n agent1 -c . -f flume_kafka.conf
(3) Append data to kafka.data:
echo "111111111111111111111111" >> kafka.data
echo "222222222222222222222222" >> kafka.data
echo "333333333333333333333333" >> kafka.data
echo "444444444444444444444444" >> kafka.data
echo "555555555555555555555555" >> kafka.data
echo "666666666666666666666666" >> kafka.data
echo "777777777777777777777777" >> kafka.data
echo "888888888888888888888888" >> kafka.data
echo "999999999999999999999999" >> kafka.data
echo "000000000000000000000000" >> kafka.data
echo "aaaaaaaaaaaaaaaaaaaaaaaa" >> kafka.data
echo "bbbbbbbbbbbbbbbbbbbbbbbb" >> kafka.data
echo "cccccccccccccccccccccccc" >> kafka.data
echo "Hello,World" >> kafka.data
(4) Check the messages printed by the consumer.
Test passed!
Mini Case
(1) Create two topics, kafka1 and kafka2. kafka1 is the initial data source and kafka2 is the topic the data is ultimately written to; in between, three chained agents relay the events: kafka1 → agent1 (Kafka source → Avro sink, port 6666) → agent2 (Avro source → Avro sink, port 7777) → agent3 (Avro source → Kafka sink) → kafka2.
kafka-topics.sh --create --zookeeper pseudo01:2181 --replication-factor 1 --partitions 1 --topic kafka1
kafka-topics.sh --create --zookeeper pseudo01:2181 --replication-factor 1 --partitions 2 --topic kafka2
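Before wiring up the agents, you can confirm both topics exist:
kafka-topics.sh --list --zookeeper pseudo01:2181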
(2) Write the three agent configuration files: k-agent1.conf, k-agent2.conf, and k-agent3.conf.
k-agent1.conf
agent1.sources=s1
agent1.channels=c1
agent1.sinks=k1
agent1.sources.s1.type = org.apache.flume.source.kafka.KafkaSource
agent1.sources.s1.batchSize = 5000
agent1.sources.s1.batchDurationMillis = 2000
agent1.sources.s1.kafka.bootstrap.servers = pseudo01:9092
agent1.sources.s1.kafka.topics = kafka1
agent1.sources.s1.kafka.consumer.group.id = kafka-flume01
agent1.channels.c1.type = memory
agent1.channels.c1.capacity = 10000
agent1.channels.c1.transactionCapacity = 10000
agent1.channels.c1.byteCapacityBufferPercentage = 20
agent1.channels.c1.byteCapacity = 800000
agent1.sinks.k1.type = avro
agent1.sinks.k1.hostname = pseudo01
agent1.sinks.k1.port = 6666
agent1.sources.s1.channels = c1
agent1.sinks.k1.channel = c1
k-agent2.conf
agent2.sources=s2
agent2.channels=c2
agent2.sinks=k2
agent2.sources.s2.type = avro
agent2.sources.s2.bind = pseudo01
agent2.sources.s2.port = 6666
agent2.channels.c2.type = memory
agent2.channels.c2.capacity = 10000
agent2.channels.c2.transactionCapacity = 10000
agent2.channels.c2.byteCapacityBufferPercentage = 20
agent2.channels.c2.byteCapacity = 800000
agent2.sinks.k2.type = avro
agent2.sinks.k2.hostname = pseudo01
agent2.sinks.k2.port = 7777
agent2.sources.s2.channels = c2
agent2.sinks.k2.channel = c2
k-agent3.conf
agent3.sources=s3
agent3.channels=c3
agent3.sinks=k3
agent3.sources.s3.type = avro
agent3.sources.s3.bind = pseudo01
agent3.sources.s3.port = 7777
agent3.channels.c3.type = memory
agent3.channels.c3.capacity = 10000
agent3.channels.c3.transactionCapacity = 10000
agent3.channels.c3.byteCapacityBufferPercentage = 20
agent3.channels.c3.byteCapacity = 800000
agent3.sinks.k3.type = org.apache.flume.sink.kafka.KafkaSink
agent3.sinks.k3.kafka.topic = kafka2
# Note: events read from topic kafka1 by agent1's Kafka source carry a topic=kafka1
# header. Without the two properties below, the Kafka sink pushes each event to the
# topic named in that header instead of kafka2. Pointing topicHeader at a header
# name that does not exist ("kafka02") disables the override; setting
# allowTopicOverride = false would achieve the same.
agent3.sinks.k3.allowTopicOverride = true
agent3.sinks.k3.topicHeader = kafka02
agent3.sinks.k3.kafka.bootstrap.servers = pseudo01:9092
agent3.sinks.k3.kafka.flumeBatchSize = 20
agent3.sinks.k3.kafka.producer.acks = 1
agent3.sinks.k3.kafka.producer.linger.ms = 1
#agent3.sinks.k3.type = logger
agent3.sources.s3.channels = c3
agent3.sinks.k3.channel = c3
(3) Start the three agents.
Start them in the order agent3 → agent2 → agent1, so that each Avro sink connects to a downstream Avro source that is already listening:
flume-ng agent -n agent3 -c . -f k-agent3.conf
flume-ng agent -n agent2 -c . -f k-agent2.conf
flume-ng agent -n agent1 -c . -f k-agent1.conf
(4) Start a producer and push messages to topic kafka1:
kafka-console-producer.sh --broker-list pseudo01:9092 --topic kafka1
(5) Start a consumer to pull messages from topic kafka2:
kafka-console-consumer.sh --bootstrap-server pseudo01:9092 --topic kafka2
(6) Check the results.
Producer: the messages pushed to topic kafka1
Consumer: the same data pulled from topic kafka2