Flume NG flume-hdfs-sink source code analysis

C1: HDFSEventSink

0. HDFSEventSink.configure(): the sink also implements the Configurable interface so that it can process its own configuration settings.

0.1 Read the configuration parameters from the context

0.2 Set the codec:

      codeC = getCodec(codecName);
      // TODO : set proper compression type
      compType = CompressionType.BLOCK;

0.2.1 getCodec()

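(1) Call CompressionCodecFactory.getCodecClasses(conf) to get the list of supported codec classes, codecs.

(2) Use codecMatches(cls, codecName) to test each class for a match, which finds the codec class that corresponds to the codec name codecName.

(3) Instantiate it: codec = cls.newInstance().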

(4) 
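
The lookup steps above can be sketched roughly as follows. This is a minimal, self-contained illustration rather than the Flume source: the Codec interface and the two codec classes are hypothetical stand-ins for Hadoop's codec classes, the CODECS list stands in for the result of CompressionCodecFactory.getCodecClasses(conf), and the matching rule inside codecMatches (full class name, simple name, or simple name without the "Codec" suffix) is an assumption about how the comparison might work.

```java
import java.util.ArrayList;
import java.util.List;

public class CodecLookupSketch {
    // Hypothetical stand-ins for Hadoop codec classes (illustration only).
    public interface Codec {}
    public static class GzipCodec implements Codec {}
    public static class BZip2Codec implements Codec {}

    // Stands in for the list returned by CompressionCodecFactory.getCodecClasses(conf).
    static final List<Class<? extends Codec>> CODECS = new ArrayList<>();
    static {
        CODECS.add(GzipCodec.class);
        CODECS.add(BZip2Codec.class);
    }

    // Assumed matching rule: full class name, simple name (case-insensitive),
    // or simple name with the "Codec" suffix removed (case-insensitive).
    public static boolean codecMatches(Class<? extends Codec> cls, String codecName) {
        String simpleName = cls.getSimpleName();
        if (cls.getName().equals(codecName) || simpleName.equalsIgnoreCase(codecName)) {
            return true;
        }
        if (simpleName.endsWith("Codec")) {
            String prefix = simpleName.substring(0, simpleName.length() - "Codec".length());
            return prefix.equalsIgnoreCase(codecName);
        }
        return false;
    }

    // Steps (1) to (3): scan the known codec classes, pick the match, instantiate it.
    public static Codec getCodec(String codecName) {
        for (Class<? extends Codec> cls : CODECS) {
            if (codecMatches(cls, codecName)) {
                try {
                    return cls.getDeclaredConstructor().newInstance();
                } catch (ReflectiveOperationException e) {
                    throw new IllegalStateException("Cannot instantiate " + cls.getName(), e);
                }
            }
        }
        throw new IllegalArgumentException("Unsupported compression codec: " + codecName);
    }
}
```

With this sketch, getCodec("gzip") and getCodec("GzipCodec") both resolve to the same stand-in class, while an unknown name is rejected with an exception instead of returning null.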


0.3 Set writeFormat

If writeFormat is null, the format is chosen according to the file type:

if fileType is DataStreamType or CompStreamType, set 


1. HDFSEventSink.start() initializes the sink and brings it to a state where it can forward events to its next destination.


2. HDFSEventSink.process(), from the Sink interface, does the core processing: it takes events from the channel and forwards them.


3. HDFSEventSink.stop() does the necessary cleanup.
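
The lifecycle described in steps 0 to 3 can be illustrated with a toy sink. This is only a sketch under stated assumptions: ToySink, its in-memory queue "channel", the "batchSize" key, and the delivered() accessor are hypothetical and do not come from Flume; the READY/BACKOFF status values are used here simply to signal whether any event was forwarded.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Queue;

public class LifecycleSketch {
    public enum Status { READY, BACKOFF }

    // Hypothetical in-memory sink: the "channel" is a queue, the "destination" is a list.
    public static class ToySink {
        private final Queue<String> channel;
        private final List<String> destination = new ArrayList<>();
        private int batchSize = 1;
        private boolean started = false;

        public ToySink(Queue<String> channel) {
            this.channel = channel;
        }

        // Step 0: read the sink's own settings from the context.
        public void configure(Map<String, String> context) {
            batchSize = Integer.parseInt(context.getOrDefault("batchSize", "1"));
        }

        // Step 1: bring the sink to a state where it can forward events.
        public void start() {
            started = true;
        }

        // Step 2: take up to batchSize events from the channel and forward them.
        public Status process() {
            if (!started) {
                throw new IllegalStateException("sink not started");
            }
            int taken = 0;
            while (taken < batchSize && !channel.isEmpty()) {
                destination.add(channel.poll());
                taken++;
            }
            return taken == 0 ? Status.BACKOFF : Status.READY;
        }

        // Step 3: cleanup.
        public void stop() {
            started = false;
        }

        public List<String> delivered() {
            return destination;
        }
    }
}
```

A caller would invoke configure() once, then start(), then process() repeatedly until the sink is shut down with stop().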




HDFSEventSink calls into the following classes:

(2) HDFSFormatterFactory  

(2.1) HDFSWritableFormatter

(2.2) HDFSTextFormatter

(3) HDFSWriterFactory

(3.1) HDFSSequenceFile

(3.2) HDFSDataStream

(3.3) HDFSCompressedDataStream

(4) BucketWriter

(5) HDFSWriter




FLUME-1104: HDFS rolls the first file incorrectly

The sink's process() keeps track of the buckets opened during the transaction. At the end of the transaction, all buckets that have pending data need to be flushed. This is required to ensure that data removed from the channel is safely in HDFS at commit time.
Currently the files are tracked only when they are created, and they are closed during cleanup instead of being flushed.

The fix is to track buckets every time they are written to in the current transaction. Also, buckets with pending data should be flushed instead of closed.
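
The idea behind the fix can be sketched as follows. This is not the actual patch: Bucket is a hypothetical stand-in for BucketWriter, events are modeled as simple {path, body} string pairs, and processTransaction is an illustrative name. The point is that a bucket is recorded on every write (not only when it is created) and that the end of the transaction flushes it while leaving it open.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class BucketTrackingSketch {
    // Hypothetical stand-in for BucketWriter.
    public static class Bucket {
        public final String path;
        public final List<String> pending = new ArrayList<>();
        public final List<String> flushed = new ArrayList<>();
        public boolean open = true;

        public Bucket(String path) {
            this.path = path;
        }

        public void append(String event) {
            pending.add(event);
        }

        // Flushing persists pending data but keeps the bucket open for later writes.
        public void flush() {
            flushed.addAll(pending);
            pending.clear();
        }
    }

    // Handles one transaction: each event is a {path, body} pair.
    // Returns the set of buckets written to in this transaction.
    public static Set<Bucket> processTransaction(Map<String, Bucket> openBuckets,
                                                 List<String[]> events) {
        Set<Bucket> written = new LinkedHashSet<>();
        for (String[] event : events) {
            Bucket bucket = openBuckets.computeIfAbsent(event[0], Bucket::new);
            bucket.append(event[1]);
            written.add(bucket); // track on every write, not only on creation
        }
        for (Bucket bucket : written) {
            bucket.flush(); // flush instead of close
        }
        return written;
    }
}
```

In this sketch a bucket created in an earlier transaction is still flushed when a later transaction writes to it, which is the case the original tracking missed.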
