flume + kafka integration. This walkthrough was done after Flume and Kafka had each been installed and verified on their own.
Kafka as the consumer (Flume publishes events to Kafka through a Kafka sink)
Create a file named spooldir.conf with the following contents:
# Define names for the source, channel, and sink
agent1.sources = source1
agent1.channels = channel1
agent1.sinks = sink1
# Define the properties of the source, which receives event data
agent1.sources.source1.type = netcat
agent1.sources.source1.bind = localhost
agent1.sources.source1.port = 12345
agent1.sources.source1.channels = channel1
# Define the properties of the channel
agent1.channels.channel1.type = memory
agent1.channels.channel1.capacity = 10000
agent1.channels.channel1.transactionCapacity = 1000
# Flume ships with a Kafka sink, so only the sink section needs to change
# Define our Kafka sink, which publishes to the app_events topic
agent1.sinks.sink1.type = org.apache.flume.sink.kafka.KafkaSink
agent1.sinks.sink1.topic = app_events
agent1.sinks.sink1.brokerList = localhost:9092
agent1.sinks.sink1.batchSize = 20
agent1.sinks.sink1.channel = channel1
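Before starting the agent it is safer to create the target topic explicitly, since automatic topic creation depends on the broker's auto.create.topics.enable setting. A minimal sketch using the ZooKeeper-based CLI flags that match this era of Kafka (the tool may be named kafka-topics.sh in a plain Apache install):

kafka-topics --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic app_events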
Start Flume:
flume-ng agent --conf /etc/flume-ng/conf --conf-file ./spooldir.conf --name agent1 -Dflume.root.logger=INFO,console
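To check the pipeline end to end, one option is to watch the topic with a console consumer in one terminal while feeding lines into the netcat source from another. A sketch, assuming the Kafka CLI tools are on the PATH and the single local broker configured above:

# Terminal 1: consume everything published to app_events (old ZooKeeper-based consumer)
kafka-console-consumer --zookeeper localhost:2181 --topic app_events --from-beginning

# Terminal 2: send test events into the netcat source
nc localhost 12345
hello flume
hello kafka

Each line typed into nc should appear in the consumer shortly after the sink drains it from the channel.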
Kafka as the producer (Flume reads from a Kafka topic through a Kafka source and writes the events to HDFS)
Again create a file named spooldir.conf, this time with the following contents:
agent1.sources = source1
agent1.channels = channel1
agent1.sinks = sink1
# Define a Kafka source that reads from the calls_placed topic
# The "type" property line wraps around due to its long value
agent1.sources.source1.type = org.apache.flume.source.kafka.KafkaSource
agent1.sources.source1.zookeeperConnect = localhost:2181
agent1.sources.source1.topic = calls_placed
agent1.sources.source1.channels = channel1
# Define the properties of the channel
agent1.channels.channel1.type = memory
agent1.channels.channel1.capacity = 10000
agent1.channels.channel1.transactionCapacity = 1000
# Define the sink that writes call data to HDFS
agent1.sinks.sink1.type = hdfs
agent1.sinks.sink1.hdfs.path = /user/training/calls_placed
agent1.sinks.sink1.hdfs.fileType = DataStream
agent1.sinks.sink1.hdfs.fileSuffix = .csv
agent1.sinks.sink1.channel = channel1
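Before starting this agent, create the calls_placed topic and publish a few test records with the console producer. A sketch using the older --zookeeper and --broker-list flags that match the zookeeperConnect property above; the CSV fields below are invented purely for illustration:

kafka-topics --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic calls_placed

kafka-console-producer --broker-list localhost:9092 --topic calls_placed
5551234567,5559876543,2016-01-01 10:00:00,35
5552223333,5554445555,2016-01-01 10:02:10,120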
Start Flume:
flume-ng agent --conf /etc/flume-ng/conf --conf-file ./spooldir.conf --name agent1 -Dflume.root.logger=INFO,console
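Once the agent is running and a few records have been produced, verify that files are landing in HDFS. Note that the HDFS sink keeps in-progress files under a .tmp suffix until they are rolled, so freshly written data may not show the .csv extension yet:

hdfs dfs -ls /user/training/calls_placed
hdfs dfs -cat /user/training/calls_placed/*.csv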