我有一个包含很多gzip文件的文件夹 . 每个gzip文件都包含xml文件 . 我曾使用flume将文件流式传输到HDFS . 以下是我的配置文件:
agent1.sources = src
agent1.channels = ch
agent1.sinks = sink
agent1.sources.src.type = spooldir
agent1.sources.src.spoolDir = /home/tester/datafiles
agent1.sources.src.channels = ch
agent1.sources.src.deserializer = org.apache.flume.sink.solr.morphline.BlobDeserializer$Builder
agent1.channels.ch.type = memory
agent1.channels.ch.capacity = 1000
agent1.channels.ch.transactionCapacity = 1000
agent1.sinks.sink.type = hdfs
agent1.sinks.sink.channel = ch
agent1.sinks.sink.hdfs.path = /user/tester/datafiles
agent1.sinks.sink.hdfs.fileType = CompressedStream
agent1.sinks.sink.hdfs.codeC = gzip
agent1.sinks.sink.hdfs.fileSuffix = .gz
agent1.sinks.sink.hdfs.rollInter