When Flume sinks to HDFS, the first column of the output is a timestamp. How can it be removed? The output looks like this:
1492665578789 111
1492665580789 222
1492666625916 qqqq
1492664454650
1492664455642 q
[Problem description] Flume is used to collect file changes in a local directory. The source type is spooldir.
The configuration file is as follows:
LogAgent.sources = mysource
LogAgent.channels = mychannel
LogAgent.sinks = mysink
LogAgent.sources.mysource.type= spooldir
LogAgent.sources.mysource.fileHeader = true
LogAgent.sources.mysource.deserializer.outputCharset=UTF-8
LogAgent.sources.mysource.channels=mychannel
LogAgent.sources.mysource.spoolDir=/tmp/logs
LogAgent.sources.mysource.basenameHeader=true
LogAgent.sources.mysource.basenameHeaderKey=fileName
LogAgent.sinks.mysink.channel= mychannel
LogAgent.sinks.mysink.type=hdfs
LogAgent.sinks.mysink.hdfs.path=hdfs://master:9000/data/logs/%Y/%m/%d/%H/
LogAgent.sinks.mysink.hdfs.filePrefix=%{fileName}
LogAgent.sinks.mysink.hdfs.batchSize=1000
LogAgent.sinks.mysink.hdfs.rollSize=0
LogAgent.sinks.mysink.hdfs.rollCount=10000
LogAgent.sinks.mysink.hdfs.useLocalTimeStamp=true
LogAgent.channels.mychannel.type=memory
LogAgent.channels.mychannel.capacity=1000000
LogAgent.channels.mychannel.transactionCapacity=300000
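[Solution]
Case 1: By default the HDFS sink writes SequenceFile output, which stores each event as a key/value record with the event timestamp as the key. That key is most likely the first column shown above. Add the following lines to the configuration so that the HDFS sink writes the event body as plain text: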
LogAgent.sinks.mysink.hdfs.fileType=DataStream
LogAgent.sinks.mysink.hdfs.writeFormat=Text
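For context, here is a sketch of how the whole sink section might look once the two lines are added. It only reuses the sink settings from the configuration above. The comments are my reading of each setting, not text from the original post.

```properties
# HDFS sink from the configuration above, with the two format lines added
LogAgent.sinks.mysink.channel = mychannel
LogAgent.sinks.mysink.type = hdfs
LogAgent.sinks.mysink.hdfs.path = hdfs://master:9000/data/logs/%Y/%m/%d/%H/
LogAgent.sinks.mysink.hdfs.filePrefix = %{fileName}
LogAgent.sinks.mysink.hdfs.batchSize = 1000
LogAgent.sinks.mysink.hdfs.rollSize = 0
LogAgent.sinks.mysink.hdfs.rollCount = 10000
LogAgent.sinks.mysink.hdfs.useLocalTimeStamp = true
# Write a plain, uncompressed stream instead of the default SequenceFile
LogAgent.sinks.mysink.hdfs.fileType = DataStream
# Serialize each event body as text
LogAgent.sinks.mysink.hdfs.writeFormat = Text
```

The agent has to be restarted for the change to take effect. Files that were already written as SequenceFile keep their old format, so only newly created files should be free of the timestamp column.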
Explanation from the official documentation:
Name | Default | Description |
---|---|---|
hdfs.fileType | SequenceFile | File format: currently SequenceFile, DataStream or CompressedStream. (1) DataStream will not compress the output file; please don’t set codeC. (2) CompressedStream requires setting hdfs.codeC to an available codeC. |
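
If compressed output is wanted, the CompressedStream option described in the table could be used instead. The fragment below is only a sketch and is not part of the original answer. The choice of gzip as the codec is an assumption; any codec available on the cluster should work.

```properties
# Alternative to DataStream: compressed output
LogAgent.sinks.mysink.hdfs.fileType = CompressedStream
# CompressedStream requires hdfs.codeC; gzip is an assumed choice here
LogAgent.sinks.mysink.hdfs.codeC = gzip
LogAgent.sinks.mysink.hdfs.writeFormat = Text
```

With DataStream, hdfs.codeC should be left unset, as the table notes.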