I. Overall steps:
1. Install Kafka and configure Flume, then create the Kafka topic (topics are managed through ZooKeeper, so ZooKeeper must be installed first).
2. Place files into Flume's spooling source directory and start Flume; Flume reads them into the specified Kafka topic.
3. Start a Kafka consumer to verify the data arrives.
II. Integration walkthrough:
1. This guide assumes Kafka and Flume are already installed; we focus only on the integration itself.
2. Create the Kafka topic:
[root@hadoop11 ~]# kafka-topics.sh --create --topic mytopic --replication-factor 1 --partitions 10 --zookeeper localhost:2181
List topics to verify it was created:
[root@hadoop11 ~]# kafka-topics.sh --list --zookeeper localhost:2181
3. Configure Flume to read files into Kafka. In Flume's conf directory, create flume-dirToKafka.properties with the following contents:
[root@hadoop11 conf]# cat flume-dirToKafka.properties
# agent1 components
agent1.sources = source1
agent1.sinks = sink1
agent1.channels = channel1

# set source1
agent1.sources.source1.type = spooldir
# note the directory permissions: chmod -R 777 on (flumePath) and (dir)
agent1.sources.source1.spoolDir = /yangxiaohai/flumePath/dir/logdfs
agent1.sources.source1.channels = channel1
agent1.sources.source1.fileHeader = false
agent1.sources.source1.interceptors = i1
agent1.sources.source1.interceptors.i1.type = timestamp

# set sink1
# the sink decides where the data is stored; here it is Kafka
# (for HDFS, configure the corresponding HDFS sink instead)
agent1.sinks.sink1.type = org.apache.flume.sink.kafka.KafkaSink
# the Kafka topic created above
agent1.sinks.sink1.topic = mytopic
agent1.sinks.sink1.brokerList = hadoop11:9092,hadoop12:9092,hadoop13:9092
agent1.sinks.sink1.requiredAcks = 1
agent1.sinks.sink1.batchSize = 100
agent1.sinks.sink1.channel = channel1

# set channel1
agent1.channels.channel1.type = file
agent1.channels.channel1.checkpointDir = /yangxiaohai/flumePath/dir/logdfstmp/point
agent1.channels.channel1.dataDirs = /yangxiaohai/flumePath/dir/logdfstmp
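Before starting the agent, the spool, checkpoint, and data directories from the properties file must exist and be writable. A minimal sketch, using a /tmp/... prefix here as a stand-in for the /yangxiaohai paths in the config:

```shell
# hypothetical base path for illustration; the config above uses /yangxiaohai/flumePath/dir
BASE=/tmp/yangxiaohai/flumePath/dir

# spoolDir, checkpointDir, and dataDirs from flume-dirToKafka.properties
mkdir -p "$BASE/logdfs" "$BASE/logdfstmp/point"

# the config notes a permissions requirement; relax them recursively
chmod -R 777 "$BASE"
```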
4. Start Flume.
Note: agent1 is the agent name set in the configuration file; the name passed on the command line must match it.
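A typical startup command, assuming Flume's bin directory is on the PATH and the properties file sits in conf/ (adjust paths to your installation); the console consumer then covers step 3 of the overall plan:

```shell
# start the agent; --name (or -n) must match the agent name in the properties file
flume-ng agent --conf conf --conf-file conf/flume-dirToKafka.properties \
    --name agent1 -Dflume.root.logger=INFO,console

# in another terminal, consume from the topic to verify events flow through
kafka-console-consumer.sh --zookeeper localhost:2181 --topic mytopic --from-beginning
```

Once the agent is running, any file dropped into the spoolDir is ingested, renamed with a .COMPLETED suffix, and its lines appear in the consumer's output.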