Flume configuration for ingesting Kafka data into HDFS without producing large numbers of small files

# Name the components on this agent

a1.sources=r1
a1.channels=c1
a1.sinks=k1

# source

a1.sources.r1.type = org.apache.flume.source.kafka.KafkaSource
a1.sources.r1.channels = c1
a1.sources.r1.batchSize = 5000
a1.sources.r1.batchDurationMillis = 2000
a1.sources.r1.kafka.bootstrap.servers = cdh01:9092,cdh02:9092,cdh03:9092
a1.sources.r1.kafka.topics = DayFreezingDataTest

# channel1

a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000
a1.channels.c1.transactionCapacity = 10000
a1.channels.c1.byteCapacityBufferPercentage = 20

# From the official documentation (byteCapacity):
#Maximum total bytes of memory allowed as a sum of all events in this channel. The implementation only counts the Event body, which is the reason for providing the byteCapacityBufferPercentage configuration parameter as well. Defaults to a computed value equal to 80% of the maximum memory available to the JVM (i.e. 80% of the -Xmx value passed on the command line). Note that if you have multiple memory channels on a single JVM, and they happen to hold the same physical events (i.e. if you are using a replicating channel selector from a single source) then those event sizes may be double-counted for channel byteCapacity purposes. Setting this value to 0 will cause this value to fall back to a hard internal limit of about 200 GB.
# Setting this value too low triggers the following error:
#Cannot commit transaction. Byte capacity allocated to store event body 640000.0reached. Please increase heap space/byte capacity allocated to the channel as the sinks may not be keeping up with the sources

a1.channels.c1.byteCapacity = 800000
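
The 640000.0 in the error message above lines up with the two channel settings shown here. A minimal sketch of the arithmetic, assuming the channel reserves byteCapacityBufferPercentage of byteCapacity for event headers and uses the remainder for event bodies (the function name is hypothetical, chosen for illustration only):

```python
def effective_byte_capacity(byte_capacity: int, buffer_percentage: int) -> int:
    """Bytes left for event bodies after reserving the header buffer.

    Assumption: usable = byteCapacity * (100 - bufferPercentage) / 100.
    Integer arithmetic is used to avoid floating-point rounding.
    """
    return byte_capacity * (100 - buffer_percentage) // 100


# Values taken from the configuration above.
usable = effective_byte_capacity(800_000, 20)
print(usable)  # 640000
```

Under that assumption, byteCapacity = 800000 leaves only about 640 KB for event bodies, so a larger value is probably needed if the same error appears under load.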

# sink

a1.sinks.k1.type = hdfs
a1.sinks.k1.channel = c1
a1.sinks.k1.hdfs.path = /event/flume/kafkaToHDFS/test/DayFreezingDataTest/%y-%m-%d/%H
a1.sinks.k1.hdfs.filePrefix = event-

# Create one directory per hour

a1.sinks.k1.hdfs.round = true
a1.sinks.k1.hdfs.roundValue = 1
a1.sinks.k1.hdfs.roundUnit = hour
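
To see which directory an event lands in, the sketch below mimics how the path escapes %y-%m-%d/%H resolve once the timestamp is rounded down to the hour. It is only an illustration in Python, not Flume code. The helper name and the sample timestamp are assumptions, and it relies on Python's strftime giving the same output as Flume's escapes for these four fields.

```python
from datetime import datetime

BASE_PATH = "/event/flume/kafkaToHDFS/test/DayFreezingDataTest"


def hourly_bucket(ts: datetime, base: str = BASE_PATH) -> str:
    """Return the HDFS directory for an event timestamp (hypothetical helper)."""
    # round = true, roundValue = 1, roundUnit = hour: round down to the hour.
    rounded = ts.replace(minute=0, second=0, microsecond=0)
    return f"{base}/{rounded.strftime('%y-%m-%d/%H')}"


print(hourly_bucket(datetime(2024, 3, 5, 14, 37, 12)))
# /event/flume/kafkaToHDFS/test/DayFreezingDataTest/24-03-05/14
```

Because the path already stops at %H, every event within the same hour resolves to the same directory, with or without the rounding step.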

a1.sinks.k1.hdfs.useLocalTimeStamp=true
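## If the Kafka data volume is large, increase this parameter. A value of 80,000 is enough even with tens of GB in Kafka.
## Note: a sink batch cannot exceed the channel's transactionCapacity, so raise a1.channels.c1.capacity and a1.channels.c1.transactionCapacity along with it.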
a1.sinks.k1.hdfs.batchSize=1000
a1.sinks.k1.hdfs.fileType=DataStream
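
Before raising hdfs.batchSize, it helps to compare every batch size with the channel's transactionCapacity, since the memory channel hands at most that many events to a source or sink per transaction. The check below is a rough sketch in Python. The function name and the dictionary keys are made up for illustration, and the numbers are copied from this configuration.

```python
from typing import Dict, List


def oversized_batches(transaction_capacity: int, batch_sizes: Dict[str, int]) -> List[str]:
    """Return the settings whose batch size exceeds the channel's transactionCapacity."""
    return [name for name, size in batch_sizes.items() if size > transaction_capacity]


TRANSACTION_CAPACITY = 10000  # a1.channels.c1.transactionCapacity

current = {"r1.batchSize": 5000, "k1.hdfs.batchSize": 1000}
print(oversized_batches(TRANSACTION_CAPACITY, current))  # []

# Raising only the sink batch size to 80,000 no longer fits in one transaction.
raised = {**current, "k1.hdfs.batchSize": 80000}
print(oversized_batches(TRANSACTION_CAPACITY, raised))  # ['k1.hdfs.batchSize']
```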

# Roll to a new file every 10 minutes or at about 128 MB, whichever comes first

a1.sinks.k1.hdfs.rollInterval=600
a1.sinks.k1.hdfs.rollSize=134217700
a1.sinks.k1.hdfs.rollCount=0
a1.sinks.k1.hdfs.minBlockReplicas=1
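
The three roll settings combine as "whichever threshold is hit first", with 0 disabling a trigger. The following is a simplified model of that rule in Python, not the sink's actual implementation. The function and constant names are hypothetical, and the values are the ones configured above.

```python
ROLL_INTERVAL_S = 600        # hdfs.rollInterval: roll after 10 minutes
ROLL_SIZE_BYTES = 134217700  # hdfs.rollSize: roll just under 128 MiB
ROLL_COUNT = 0               # hdfs.rollCount: 0 disables count-based rolling


def should_roll(age_s: float, size_bytes: int, events: int) -> bool:
    """Return True when any enabled threshold has been reached."""
    by_time = ROLL_INTERVAL_S > 0 and age_s >= ROLL_INTERVAL_S
    by_size = ROLL_SIZE_BYTES > 0 and size_bytes >= ROLL_SIZE_BYTES
    by_count = ROLL_COUNT > 0 and events >= ROLL_COUNT
    return by_time or by_size or by_count


print(should_roll(120, 50 * 1024 * 1024, 1_000_000))  # False: the event count is ignored
print(should_roll(600, 1024, 10))                     # True: 10 minutes elapsed
print(should_roll(30, 128 * 1024 * 1024, 10))         # True: size threshold reached

print(3600 // ROLL_INTERVAL_S)                # 6 time-based rolls per hour at most
print(128 * 1024 * 1024 - ROLL_SIZE_BYTES)    # 28 bytes below 128 MiB
```

With a steady stream below the size threshold, this works out to roughly six files per hourly directory. Disabling rollCount is what prevents the sink from cutting a new file every few events.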
