Integrating Kafka with Flume

1. Kafka as a Flume Source

Kafka Source

The Kafka Source is a Kafka consumer that reads messages from a Kafka topic.

[Image: img/source01.png]

Test 1

Source: Kafka Source; sink: HDFS.

Test preparation

(1) Create a topic flume01 to serve as Flume's data source

kafka-topics.sh --create --zookeeper pseudo01:2181 --replication-factor 1 --partitions 1 --topic flume01

(2) Create a directory kafka_test01 on HDFS to hold the data Flume writes out

hdfs dfs -mkdir /kafka_test01
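
A quick sanity check that both exist (assuming the same single-node setup as above) can look like this:

kafka-topics.sh --describe --zookeeper pseudo01:2181 --topic flume01
hdfs dfs -ls /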
Test

(1) Write the Flume agent configuration file kafka_hdfs.conf

agent1.sources=s1
agent1.channels=c1
agent1.sinks=k1

#Properties of the Kafka source s1
agent1.sources.s1.type = org.apache.flume.source.kafka.KafkaSource
agent1.sources.s1.batchSize = 5000
agent1.sources.s1.batchDurationMillis = 2000
agent1.sources.s1.kafka.bootstrap.servers = pseudo01:9092
agent1.sources.s1.kafka.topics = flume01
agent1.sources.s1.kafka.consumer.group.id = kafka-flume01

agent1.channels.c1.type = memory
agent1.channels.c1.capacity = 10000
agent1.channels.c1.transactionCapacity = 10000
agent1.channels.c1.byteCapacityBufferPercentage = 20
agent1.channels.c1.byteCapacity = 800000

#Properties of the HDFS sink k1
agent1.sinks.k1.type = hdfs
agent1.sinks.k1.hdfs.path = /kafka_test01/kafka01/%y-%m-%d/%H%M
agent1.sinks.k1.hdfs.filePrefix = kafka-
agent1.sinks.k1.hdfs.fileSuffix = .log
agent1.sinks.k1.hdfs.inUseSuffix = .tmp
agent1.sinks.k1.hdfs.fileType = DataStream
agent1.sinks.k1.hdfs.writeFormat = Text
agent1.sinks.k1.hdfs.useLocalTimeStamp = true
agent1.sinks.k1.hdfs.rollCount = 30
agent1.sinks.k1.hdfs.round = true
agent1.sinks.k1.hdfs.roundValue = 10
agent1.sinks.k1.hdfs.roundUnit = minute

agent1.sources.s1.channels = c1
agent1.sinks.k1.channel = c1
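
For orientation: the escape sequences in hdfs.path are filled in from the event timestamp (local time here, since useLocalTimeStamp is true), and round/roundValue/roundUnit truncate %H%M down to 10-minute buckets. An event arriving at, say, 15:47 on 2019-07-20 (an illustrative time, not taken from the original run) would therefore be written to a path like:

/kafka_test01/kafka01/19-07-20/1540/kafka-<timestamp>.log.tmp

The .tmp in-use suffix disappears once the file rolls (every 30 events, per rollCount).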

(2) Start the agent

flume-ng agent -n agent1 -c . -f kafka_hdfs.conf 
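
For debugging it can help to add console logging (a standard flume-ng option):

flume-ng agent -n agent1 -c . -f kafka_hdfs.conf -Dflume.root.logger=INFO,console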

(3) Start a producer and send messages to topic flume01

[Image: img/kafka03.png]
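
The original screenshot is unavailable; a console producer for the same broker can be started like this, after which every line typed is sent to flume01:

kafka-console-producer.sh --broker-list pseudo01:9092 --topic flume01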

(4) Check the data written to HDFS


[Image: img/hdfs02.png]
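
That screenshot is unavailable as well; the written files can be inspected directly (the exact date/time directories depend on when the test ran, hence the globs):

hdfs dfs -ls -R /kafka_test01
hdfs dfs -cat "/kafka_test01/kafka01/*/*/kafka-*.log"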

Test successful!

2. Kafka as a Flume Sink

Kafka Sink

This is a Flume sink implementation that publishes data to a Kafka topic. One of its goals is to integrate Flume with Kafka so that pull-based processing systems can consume data coming from the various Flume sources.

[Image: img/sink01.png]

Test 2

Monitor the contents of a file, kafka.data, and send its contents as messages to the Kafka topic flume02.

Test preparation

(1) Create the file kafka.data to be monitored

touch kafka.data

(2) Create a topic flume02 to hold the messages Flume produces

kafka-topics.sh --create --zookeeper pseudo01:2181 --replication-factor 1 --partitions 2 --topic flume02

(3) Start a consumer to consume the messages in topic flume02.

kafka-console-consumer.sh --bootstrap-server pseudo01:9092 --topic flume02 --from-beginning

Test

(1) Create the agent configuration file flume_kafka.conf

agent1.sources=f1
agent1.channels=c1
agent1.sinks=k1

agent1.sources.f1.type = exec
agent1.sources.f1.command = tail -F /root/flume_test/kafka.data

agent1.channels.c1.type = memory
agent1.channels.c1.capacity = 10000
agent1.channels.c1.transactionCapacity = 10000
agent1.channels.c1.byteCapacityBufferPercentage = 20
agent1.channels.c1.byteCapacity = 800000

agent1.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
agent1.sinks.k1.kafka.topic = flume02
agent1.sinks.k1.kafka.bootstrap.servers = pseudo01:9092
agent1.sinks.k1.kafka.flumeBatchSize = 20
agent1.sinks.k1.kafka.producer.acks = 1
agent1.sinks.k1.kafka.producer.linger.ms = 1

agent1.sources.f1.channels = c1
agent1.sinks.k1.channel = c1

(2) Start the agent

flume-ng agent -n agent1 -c . -f flume_kafka.conf

(3) Write data into kafka.data

echo "111111111111111111111111" >> kafka.data
echo "222222222222222222222222" >> kafka.data
echo "333333333333333333333333" >> kafka.data
echo "444444444444444444444444" >> kafka.data
echo "555555555555555555555555" >> kafka.data
echo "666666666666666666666666" >> kafka.data
echo "777777777777777777777777" >> kafka.data
echo "888888888888888888888888" >> kafka.data
echo "999999999999999999999999" >> kafka.data
echo "000000000000000000000000" >> kafka.data
echo "aaaaaaaaaaaaaaaaaaaaaaaa" >> kafka.data
echo "bbbbbbbbbbbbbbbbbbbbbbbb" >> kafka.data
echo "cccccccccccccccccccccccc" >> kafka.data
echo "Hello,World" >> kafka.data

(4) Check what the consumer received

[Image: img/consume.png]

Test successful!

Small case study

[Image: img/flume02.png]

(1) Create two topics, kafka1 and kafka2: kafka1 serves as the initial data source, and kafka2 is the topic the final data is written to.

kafka-topics.sh --create --zookeeper pseudo01:2181 --replication-factor 1 --partitions 1 --topic kafka1
kafka-topics.sh --create --zookeeper pseudo01:2181 --replication-factor 1 --partitions 2 --topic kafka2
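
A quick check that both topics were created (same ZooKeeper as above):

kafka-topics.sh --list --zookeeper pseudo01:2181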

(2) Write the configuration files for the three agents: k-agent1.conf, k-agent2.conf, and k-agent3.conf

  • k-agent1.conf

    agent1.sources=s1
    agent1.channels=c1
    agent1.sinks=k1
    
    agent1.sources.s1.type = org.apache.flume.source.kafka.KafkaSource
    agent1.sources.s1.batchSize = 5000
    agent1.sources.s1.batchDurationMillis = 2000
    agent1.sources.s1.kafka.bootstrap.servers = pseudo01:9092
    agent1.sources.s1.kafka.topics = kafka1
    agent1.sources.s1.kafka.consumer.group.id = kafka-flume01
    
    agent1.channels.c1.type = memory
    agent1.channels.c1.capacity = 10000
    agent1.channels.c1.transactionCapacity = 10000
    agent1.channels.c1.byteCapacityBufferPercentage = 20
    agent1.channels.c1.byteCapacity = 800000
    
    agent1.sinks.k1.type = avro
    agent1.sinks.k1.hostname = pseudo01
    agent1.sinks.k1.port = 6666
    
    agent1.sources.s1.channels = c1
    agent1.sinks.k1.channel = c1
    
  • k-agent2.conf

    agent2.sources=s2
    agent2.channels=c2
    agent2.sinks=k2
    
    agent2.sources.s2.type = avro
    agent2.sources.s2.bind = pseudo01
    agent2.sources.s2.port = 6666
    
    agent2.channels.c2.type = memory
    agent2.channels.c2.capacity = 10000
    agent2.channels.c2.transactionCapacity = 10000
    agent2.channels.c2.byteCapacityBufferPercentage = 20
    agent2.channels.c2.byteCapacity = 800000
    
    agent2.sinks.k2.type = avro
    agent2.sinks.k2.hostname = pseudo01
    agent2.sinks.k2.port = 7777
    
    agent2.sources.s2.channels = c2
    agent2.sinks.k2.channel = c2
    
  • k-agent3.conf

      agent3.sources=s3
      agent3.channels=c3
      agent3.sinks=k3
      
      agent3.sources.s3.type = avro
      agent3.sources.s3.bind = pseudo01
      agent3.sources.s3.port = 7777
      
      agent3.channels.c3.type = memory
      agent3.channels.c3.capacity = 10000
      agent3.channels.c3.transactionCapacity = 10000
      agent3.channels.c3.byteCapacityBufferPercentage = 20
      agent3.channels.c3.byteCapacity = 800000
      
      agent3.sinks.k3.type = org.apache.flume.sink.kafka.KafkaSink
      agent3.sinks.k3.kafka.topic = kafka2
      #Note: events read by the Kafka source from topic kafka1 arrive with a
      #header topic=kafka1. Without the two properties below, the sink would
      #honor that header and push the events back to kafka1 instead of kafka2.
      #Pointing topicHeader at a key that is never set (kafka02) makes the sink
      #fall back to kafka.topic (see the note after this configuration).
      agent3.sinks.k3.allowTopicOverride = true
      agent3.sinks.k3.topicHeader = kafka02
      agent3.sinks.k3.kafka.bootstrap.servers = pseudo01:9092
      agent3.sinks.k3.kafka.flumeBatchSize = 20
      agent3.sinks.k3.kafka.producer.acks = 1
      agent3.sinks.k3.kafka.producer.linger.ms = 1
      
      #agent3.sinks.k3.type = logger
      
      agent3.sources.s3.channels = c3
      agent3.sinks.k3.channel = c3
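
A note on the override settings above: in the Flume KafkaSink, allowTopicOverride defaults to true and topicHeader defaults to topic, so by default each event is routed to the topic named in its topic header. Pointing topicHeader at a header key that is never set (kafka02) works, but an arguably clearer alternative is to disable the override entirely:

agent3.sinks.k3.allowTopicOverride = false

With that single line, topicHeader is ignored and every event goes to kafka.topic (kafka2).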
    

(3) Start the three agents

Start them in the order agent3 → agent2 → agent1, downstream first, so that each avro sink finds its avro source already listening.

flume-ng agent -n agent3 -c . -f k-agent3.conf
flume-ng agent -n agent2 -c . -f k-agent2.conf
flume-ng agent -n agent1 -c . -f k-agent1.conf

(4) Start a producer and push messages to topic kafka1

 kafka-console-producer.sh --broker-list pseudo01:9092 --topic kafka1

(5) Start a consumer to pull messages from topic kafka2

kafka-console-consumer.sh --bootstrap-server pseudo01:9092 --topic kafka2

(6) Check the results

Messages the producer pushed to topic kafka1:
[Image: img/kafka1.png]
Data the consumer pulled from topic kafka2:
[Image: img/kafka2.png]
