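When using the HDFS sink, Flume can produce a serious small-files problem. The sink's default roll settings are very aggressive (rollInterval = 30 seconds, rollSize = 1024 bytes, rollCount = 10 events), so each file is closed after only a handful of events.
Tuning the three parameters rollInterval, rollSize, and rollCount mitigates the problem: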
a1.sinks.hdfssink.type = hdfs
a1.sinks.hdfssink.hdfs.path = hdfs://c1:8020/flume/alertlog/%y%m%d%H%M/origin
a1.sinks.hdfssink.hdfs.filePrefix = alert-
a1.sinks.hdfssink.hdfs.useLocalTimeStamp = true
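# Roll the current file after 60 seconds (0 disables time-based rolling)
a1.sinks.hdfssink.hdfs.rollInterval = 60
# Roll the current file once it reaches 10485760 bytes (10 MiB)
a1.sinks.hdfssink.hdfs.rollSize = 10485760
# 0 disables rolling by event count
a1.sinks.hdfssink.hdfs.rollCount = 0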
a1.sinks.hdfssink.hdfs.codeC = snappy
a1.sinks.hdfssink.hdfs.fileType = CompressedStream
a1.sinks.hdfssink.hdfs.writeFormat = Text
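
To see how the three roll parameters interact, here is a minimal Python sketch of the decision rule, assuming the documented behaviour that a file is rolled as soon as any enabled threshold is reached and that a value of 0 disables that threshold. The function should_roll and its argument names are hypothetical and exist only for illustration; this is a simplified model, not Flume's actual implementation.

```python
def should_roll(elapsed_seconds, bytes_written, events_written,
                roll_interval=60, roll_size=10485760, roll_count=0):
    """Simplified model of the HDFS sink roll decision (illustrative only).

    A threshold set to 0 is treated as disabled. The file is rolled
    as soon as any enabled threshold has been reached.
    """
    by_time = roll_interval > 0 and elapsed_seconds >= roll_interval
    by_size = roll_size > 0 and bytes_written >= roll_size
    by_count = roll_count > 0 and events_written >= roll_count
    return by_time or by_size or by_count


# With the settings from the config above (60 s, 10 MiB, count disabled):
print(should_roll(61, 2048, 500))        # time threshold reached
print(should_roll(10, 10485760, 500))    # size threshold reached
print(should_roll(10, 2048, 1_000_000))  # count is disabled, nothing reached
```

With rollCount set to 0, the file size is bounded only by the 60-second interval and the 10 MiB limit, so on a low-traffic source the time-based roll is the one that usually fires.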