Flume: Consuming Kafka Data and Writing to HDFS

The main goal is to consume different Kafka topics and land each one in its own directory.

This is for reference only: in my case I was running Flume on Huawei Cloud and writing the Kafka data into OBS (Huawei Cloud's object storage, accessed through s3a), so the configuration below was adapted slightly for that.
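For the OBS case, the only real change was pointing the HDFS sink's output path at an s3a URI instead of an HDFS path; the rest of the sink configuration stays the same. A minimal sketch, assuming a bucket named my-obs-bucket (made up for illustration) and that the s3a endpoint and credentials are already configured in the cluster's core-site.xml:

server.sinks.sink1.hdfs.path = s3a://my-obs-bucket/flume/TopicA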

Flume version: 1.6.0

# The names Source1/Source2, channel01/channel02 and sink1/sink2 are arbitrary; they just have to match the references used further down.
server.sources = Source1 Source2 
server.channels = channel01 channel02 
server.sinks = sink1 sink2 
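Note that the property prefix server is the agent name, so it has to match the name passed on the command line when the agent is started. A minimal start-up sketch (the conf directory and file name are assumptions, not taken from the original setup):

flume-ng agent \
  --name server \
  --conf /opt/flume/conf \
  --conf-file /opt/flume/conf/kafka-to-hdfs.properties \
  -Dflume.root.logger=INFO,console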


##########========= Source1 configuration ===========###############
server.sources.Source1.type = org.apache.flume.source.kafka.KafkaSource
server.sources.Source1.channels = channel01
server.sources.Source1.inputCharset = UTF-8
server.sources.Source1.monTime = 0
server.sources.Source1.nodatatime = 0
server.sources.Source1.batchSize = 1000
server.sources.Source1.batchDurationMillis = 1000
server.sources.Source1.keepTopicInHeader = false
server.sources.Source1.keepPartitionInHeader = false
server.sources.Source1.kafka.bootstrap.servers = 192.168.1.1:9092,192.168.1.2:9092
server.sources.Source1.kafka.consumer.group.id = GroupA
server.sources.Source1.kafka.topics = TopicA
server.sources.Source1.kafka.security.protocol = PLAINTEXT
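Since the whole point is one topic per directory, an alternative to running one source/channel/sink chain per topic is to let a single source read several topics and reference the topic name from the event headers in the sink path. A hedged sketch, assuming keepTopicInHeader = true puts the topic name into a header named topic, and noting that the channel must preserve event headers for this to work (for example a memory or file channel, or a Kafka channel with parseAsFlumeEvent = true):

server.sources.Source1.kafka.topics = TopicA,TopicB
server.sources.Source1.keepTopicInHeader = true
# one sink, one sub-directory per topic
server.sinks.sink1.hdfs.path = /Test/%{topic}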

##########========= channel01 configuration ===========###############
server.channels.channel01.type = org.apache.flume.channel.kafka.KafkaChannel
server.channels.channel01.kafka.bootstrap.servers = 192.168.1.1:9092,192.168.1.2:9092
server.channels.channel01.kafka.topic = TopicA
server.channels.channel01.kafka.consumer.group.id = GroupA
server.channels.channel01.parseAsFlumeEvent = false
server.channels.channel01.migrateZookeeperOffsets = true
server.channels.channel01.kafka.consumer.auto.offset.reset = earliest
server.channels.channel01.kafka.producer.security.protocol = PLAINTEXT
server.channels.channel01.kafka.consumer.security.protocol = PLAINTEXT
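Because channel01 is itself a Kafka channel subscribed to TopicA with the same consumer group as the source, another layout worth knowing is the one described in the Apache Flume documentation: drop the Kafka source entirely and let the Kafka channel consume the topic and feed the HDFS sink directly. A minimal sketch, reusing the names and property style from above:

server.channels = channel01
server.sinks = sink1
# no source: the Kafka channel itself consumes TopicA
server.channels.channel01.type = org.apache.flume.channel.kafka.KafkaChannel
server.channels.channel01.kafka.bootstrap.servers = 192.168.1.1:9092,192.168.1.2:9092
server.channels.channel01.kafka.topic = TopicA
server.channels.channel01.kafka.consumer.group.id = GroupA
server.channels.channel01.parseAsFlumeEvent = false
server.sinks.sink1.channel = channel01
server.sinks.sink1.type = hdfs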


##########========= sink1 configuration ===========###############
server.sinks.sink1.channel = channel01
server.sinks.sink1.type = hdfs
server.sinks.sink1.monTime = 0
server.sinks.sink1.hdfs.path = /Test/B
#### Suffix for files that are still being written (temporary files)
server.sinks.sink1.hdfs.inUseSuffix = .tmp
#### Round the timestamp down to 10-minute buckets (only takes effect when hdfs.path contains time escape sequences)
server.sinks.sink1.hdfs.round = true
server.sinks.sink1.hdfs.roundValue = 10 
server.sinks.sink1.hdfs.roundUnit = minute
#### Roll to a new file by size only: 1048570 bytes is roughly 1 MB (use 104857600 for 100 MB)
server.sinks.sink1.hdfs.rollInterval = 0
server.sinks.sink1.hdfs.rollSize=1048570
server.sinks.sink1.hdfs.rollCount=0

server.sinks.sink1.hdfs.batchSize = 10000
server.sinks.sink1.hdfs.callTimeout = 10000
server.sinks.sink1.hdfs.fileCloseByEndEvent = false
# server.sinks.sink1.hdfs.batchCallTimeout =   (left unset in the original; omit the property or give it a value in milliseconds)
server.sinks.sink1.hdfs.serializer.appendNewline = true
# writeFormat can be Text or Writable (this setup produces CSV-style text files)
server.sinks.sink1.hdfs.writeFormat = Text
# filePrefix: prefix for the generated file names
server.sinks.sink1.hdfs.filePrefix = RoneA
# fileType DataStream writes the raw data as-is, with no SequenceFile wrapping or compression
server.sinks.sink1.hdfs.fileType = DataStream
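As noted in the comments above, the round/roundValue/roundUnit settings only matter when hdfs.path (or filePrefix) contains time escape sequences; with a fixed path like /Test/B they have no effect. A hedged sketch of a date-partitioned variant (useLocalTimeStamp is needed when the events carry no timestamp header):

server.sinks.sink1.hdfs.path = /Test/B/%Y%m%d/%H%M
server.sinks.sink1.hdfs.useLocalTimeStamp = true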


##########========= Source2 configuration ===========###############
server.sources.Source2.type = org.apache.flume.source.kafka.KafkaSource
server.sources.Source2.channels = channel02
server.sources.Source2.inputCharset = UTF-8
server.sources.Source2.monTime = 0
server.sources.Source2.nodatatime = 0
server.sources.Source2.batchSize = 1000
server.sources.Source2.batchDurationMillis = 1000
server.sources.Source2.keepTopicInHeader = false
server.sources.Source2.keepPartitionInHeader = false
server.sources.Source2.kafka.bootstrap.servers = 192.168.1.1:9092,192.168.1.2:9092
server.sources.Source2.kafka.consumer.group.id = group01
server.sources.Source2.kafka.topics = TopicB
server.sources.Source2.kafka.security.protocol = PLAINTEXT


##########========= channel02 configuration ===========###############
server.channels.channel02.type = org.apache.flume.channel.kafka.KafkaChannel
server.channels.channel02.kafka.bootstrap.servers = 192.168.1.1:9092,192.168.1.2:9092
server.channels.channel02.kafka.topic = TopicB
server.channels.channel02.kafka.consumer.group.id = group01
server.channels.channel02.parseAsFlumeEvent = false
server.channels.channel02.migrateZookeeperOffsets = true
server.channels.channel02.kafka.consumer.auto.offset.reset = earliest
server.channels.channel02.kafka.producer.security.protocol = PLAINTEXT
server.channels.channel02.kafka.consumer.security.protocol = PLAINTEXT


##########========= sink2 configuration ===========###############
server.sinks.sink2.channel = channel02
server.sinks.sink2.type = hdfs
server.sinks.sink2.monTime = 0
server.sinks.sink2.hdfs.path = /test/A
server.sinks.sink2.hdfs.inUseSuffix = .tmp
#### Round the timestamp down to 10-minute buckets (only takes effect when hdfs.path contains time escape sequences)
server.sinks.sink2.hdfs.round = true
server.sinks.sink2.hdfs.roundValue = 10
server.sinks.sink2.hdfs.roundUnit = minute
#### Roll to a new file by size only: 1048570 bytes is roughly 1 MB (use 104857600 for 100 MB)
server.sinks.sink2.hdfs.rollInterval = 0
server.sinks.sink2.hdfs.rollSize=1048570
server.sinks.sink2.hdfs.rollCount=0
server.sinks.sink2.hdfs.batchSize = 10000
server.sinks.sink2.hdfs.callTimeout = 10000
server.sinks.sink2.hdfs.fileCloseByEndEvent = false
# server.sinks.sink2.hdfs.batchCallTimeout =   (left unset in the original; omit the property or give it a value in milliseconds)
server.sinks.sink2.hdfs.serializer.appendNewline = true
server.sinks.sink2.hdfs.writeFormat = Text
# Prefix for the generated file names
server.sinks.sink2.hdfs.filePrefix = RoneB
server.sinks.sink2.hdfs.fileType = DataStream
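Once the agent is running, the easiest check is to list the output directory; while a file is still open it carries the filePrefix plus the .tmp in-use suffix configured above, and loses the suffix when it is rolled:

hdfs dfs -ls /test/A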

It should also be possible to configure a single source feeding n channels + n sinks; I did not have time to try it, but in theory it works (see the sketch below).
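For that single-source, multi-channel idea, Apache Flume's multiplexing channel selector routes events to different channels based on an event header. A minimal sketch, assuming the topic name is available in a header named topic (e.g. via keepTopicInHeader = true):

server.sources.Source1.kafka.topics = TopicA,TopicB
server.sources.Source1.keepTopicInHeader = true
server.sources.Source1.channels = channel01 channel02
server.sources.Source1.selector.type = multiplexing
server.sources.Source1.selector.header = topic
server.sources.Source1.selector.mapping.TopicA = channel01
server.sources.Source1.selector.mapping.TopicB = channel02
server.sources.Source1.selector.default = channel01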
I will write up the Flume source parameters, sink parameters and so on in a separate configuration guide covering the various sources.
The detailed, commonly used configurations are collected there:
Flume对接各种常用组件 (Flume integration with common components)
