Starting Kafka
Start Zookeeper and Kafka (steps omitted).
Create and start a new topic. Its name must match the topic configured in the Flume sink below; both are flume1:
[hadoop@Slave1 bin]$ sh kafka-topics.sh --create --topic flume1 --replication-factor 1 --partitions 1 --zookeeper Slave1:2181
Created topic "flume1".
[hadoop@Slave1 bin]$ sh kafka-console-consumer.sh --zookeeper Slave1:2181 --topic flume1 --from-beginning
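To confirm the topic exists before wiring up Flume, the same script can describe it (run from Kafka's bin directory, against the Zookeeper address used above; this applies to older, Zookeeper-based Kafka releases like the one in this walkthrough):

```shell
# List all topics known to this cluster
sh kafka-topics.sh --list --zookeeper Slave1:2181
# Show partition count, replication factor, and leader for flume1
sh kafka-topics.sh --describe --topic flume1 --zookeeper Slave1:2181
```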
Flume Configuration
Install Flume, Zookeeper, and Kafka.
In Flume's conf directory, create a file named flume-kafkaconf.properties
with the following contents:
a1.sources = r1
a1.sinks = kafkaSink
a1.channels = memoryChannel
a1.sources.r1.type = spooldir
a1.sources.r1.spoolDir = /home/hadoop/flumeSpool
a1.sources.r1.fileHeader = true
a1.sources.r1.deletePolicy = never
##########the type of channel is memory#########
a1.channels.memoryChannel.type = memory
a1.channels.memoryChannel.capacity = 10000
a1.channels.memoryChannel.transactionCapacity = 1000
a1.channels.memoryChannel.byteCapacityBufferPercentage = 20
a1.channels.memoryChannel.byteCapacity = 80000
##########the type of channel is memory#########
a1.sinks.kafkaSink.type = org.apache.flume.sink.kafka.KafkaSink
a1.sinks.kafkaSink.topic=flume1
a1.sinks.kafkaSink.brokerList=Slave1:9092,Slave2:9092,Slave3:9092
a1.sinks.kafkaSink.requiredAcks=1
a1.sinks.kafkaSink.batchSize = 20
##########the type of sink is kafka#########
a1.sources.r1.channels = memoryChannel
a1.sinks.kafkaSink.channel = memoryChannel
Notes on the configuration:
The sink is Kafka and the channel is memory;
The source type is spooldir, which reads files directly from the local directory /home/hadoop/flumeSpool; with deletePolicy = never, files are not deleted after they have been read (the other option is to delete them immediately);
The channel type is memory;
The sink type is Kafka; events are sent to the broker list of three machines, Slave1, Slave2, and Slave3, and consumers read them from the topic flume1.
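The deletion behavior mentioned above is controlled by the source's deletePolicy property, which accepts never or immediate. To have Flume delete files right after ingesting them instead of keeping them, the corresponding line in the properties file would read:

```properties
a1.sources.r1.deletePolicy = immediate
```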
Adding the Kafka JAR files to Flume
Details omitted.
Starting Flume
[hadoop@Master flume]$ bin/flume-ng agent -c ./conf/ -f conf/flume-kafkaconf.properties -Dflume.root.logger=INFO,console -n a1
Note: the a1 passed with -n must match the agent name a1 used in the configuration file.
Once Flume has started, the Kafka side receives the data.
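A quick end-to-end check, assuming the agent and the console consumer above are both running (the file name test.log is arbitrary): drop a file into the spool directory and watch the consumer print its contents. Because deletePolicy is never, Flume marks a finished file by renaming it with a .COMPLETED suffix rather than deleting it.

```shell
# Write a test line into the directory watched by the spooldir source
echo "hello flume" > /home/hadoop/flumeSpool/test.log
# After ingestion, the console consumer started earlier prints the line,
# and the file appears as test.log.COMPLETED in the spool directory:
ls /home/hadoop/flumeSpool
```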