Create the Kafka topic
kafka-topics.sh --create --zookeeper 192.168.**.**:2181 --topic fk_raw --partitions 1 --replication-factor 1
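- Optionally verify that the topic was created (a quick check, assuming the same ZooKeeper address as above):
kafka-topics.sh --describe --zookeeper 192.168.**.**:2181 --topic fk_raw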
Create the Flume agent
- Create the config file under the flume/conf/job directory:
vi /opt/flume/conf/job/fk.conf
Enter the following content:
# Agent "fk": one spooldir source, one file channel, one Kafka sink
fk.sources=fkSource
fk.channels=fkChannel
fk.sinks=fkSink
# Source: watch the spool directory for fk_YYYY-MM-DD.csv files; the interceptor drops header rows starting with "fk"
fk.sources.fkSource.type=spooldir
fk.sources.fkSource.spoolDir=/opt/flume/conf/job/dataSourceFile/userfriend
fk.sources.fkSource.deserializer=LINE
fk.sources.fkSource.deserializer.maxLineLength=320000
fk.sources.fkSource.includePattern=fk_[0-9]{4}-[0-9]{2}-[0-9]{2}\.csv
fk.sources.fkSource.interceptors=head_filter
fk.sources.fkSource.interceptors.head_filter.type=regex_filter
fk.sources.fkSource.interceptors.head_filter.regex=^fk
fk.sources.fkSource.interceptors.head_filter.excludeEvents=true
# Channel: file-backed channel for durability
fk.channels.fkChannel.type=file
fk.channels.fkChannel.checkpointDir=/opt/flume/conf/job/checkPointFile/userfriend
fk.channels.fkChannel.dataDirs=/opt/flume/conf/job/dataChannelFile/userfriend
# Sink: publish events to the fk_raw Kafka topic
fk.sinks.fkSink.type=org.apache.flume.sink.kafka.KafkaSink
fk.sinks.fkSink.batchSize=640
fk.sinks.fkSink.brokerList=192.168.**.**:9092
fk.sinks.fkSink.topic=fk_raw
# Bind the source and sink to the channel
fk.sources.fkSource.channels=fkChannel
fk.sinks.fkSink.channel=fkChannel
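- Before starting the agent, create the directories referenced in the config if they do not already exist (paths taken from the config above; adjust to your layout):
mkdir -p /opt/flume/conf/job/dataSourceFile/userfriend
mkdir -p /opt/flume/conf/job/checkPointFile/userfriend
mkdir -p /opt/flume/conf/job/dataChannelFile/userfriend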
- From the flume directory, run:
./bin/flume-ng agent --name fk --conf ./conf/ --conf-file ./conf/job/fk.conf -Dflume.root.logger=INFO,console
- Place the data files to be ingested into the directory configured as spoolDir; Flume will pick them up automatically (see the example after the sample data below).
- Data file naming example: fk_2020-12-08.csv
- Sample data:
fk
1
2
3
4
5
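- For a quick test, you can generate the sample file shown above and move it into the spool directory (paths assumed from the config; moving a finished file in avoids Flume reading a partially written one):
printf 'fk\n1\n2\n3\n4\n5\n' > /tmp/fk_2020-12-08.csv
mv /tmp/fk_2020-12-08.csv /opt/flume/conf/job/dataSourceFile/userfriend/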
Consume from Kafka
- First check how many messages are in the topic partition:
kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list 192.168.**.**:9092 --topic fk_raw --time -1 --offsets 1
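- The output format is topic:partition:offset; with the five sample rows above (the header row is dropped by the interceptor), something like fk_raw:0:5 would be expected.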
- Consume the data in the topic:
kafka-console-consumer.sh --bootstrap-server 192.168.**.**:9092 --topic fk_raw --from-beginning
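- To read only a fixed number of messages and then exit (optional; --max-messages is a standard console-consumer flag):
kafka-console-consumer.sh --bootstrap-server 192.168.**.**:9092 --topic fk_raw --from-beginning --max-messages 5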