Please credit the source when reposting: http://blog.csdn.net/l1028386804/article/details/79464949
Spout
ITransactionalSpout<T>: same as BaseTransactionalSpout<T>; the plain transactional spout
IPartitionedTransactionalSpout<T>: same as BasePartitionedTransactionalSpout<T>; the partitioned transactional spout
IOpaquePartitionedTransactionalSpout<T>: same as BaseOpaquePartitionedTransactionalSpout<T>; the opaque partitioned transactional spout
Bolt
IBatchBolt<T>: same as BaseBatchBolt<T>; the plain batch bolt
BaseTransactionalBolt: the transactional bolt
ICommitter interface: marks an IBatchBolt or BaseTransactionalBolt as a committer
CoordinatedBolt
ITransactionalSpout<T>: plain transactional spout
-- ITransactionalSpout.Coordinator<X>
    -- initializeTransaction(BigInteger txid, X prevMetadata): creates a new metadata object; once isReady() returns true, that metadata (the transaction tuple) is emitted to the "batch emit" stream
    -- isReady(): a new transaction starts when this returns true; the method may sleep if needed
-- ITransactionalSpout.Emitter<X>
    -- emitBatch(TransactionAttempt tx, X coordinatorMeta, BatchOutputCollector collector): emits the tuples of the batch one by one
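The split between Coordinator and Emitter can be illustrated with a small stand-alone sketch. It does not use Storm: the nested types below (RangeMeta, Coordinator, Emitter) are hypothetical stand-ins that only mimic the shape of the interfaces listed above, and a plain List replaces BatchOutputCollector. The point is that the coordinator decides what a transaction covers, and the emitter emits exactly what the metadata describes, so a replay with the same metadata yields the same batch.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Stand-in sketch: the nested types only mimic the shape of Storm's
// ITransactionalSpout.Coordinator and Emitter; they are not the Storm classes.
class TransactionalSpoutSketch {

    // Hypothetical metadata: the slice of the source covered by one batch.
    static final class RangeMeta {
        final int start;  // index of the first record in the batch
        final int count;  // number of records in the batch
        RangeMeta(int start, int count) { this.start = start; this.count = count; }
    }

    // Mimics Coordinator<X>: decides when a transaction starts and what it covers.
    static final class Coordinator {
        private final int sourceSize;
        private final int batchSize;
        private int nextStart = 0;

        Coordinator(int sourceSize, int batchSize) {
            this.sourceSize = sourceSize;
            this.batchSize = batchSize;
        }

        // Like isReady(): only start a transaction while unread records remain.
        boolean isReady() { return nextStart < sourceSize; }

        // Like initializeTransaction(txid, prevMetadata): build the next range
        // from the previous transaction's metadata.
        RangeMeta initializeTransaction(BigInteger txid, RangeMeta prev) {
            int start = (prev == null) ? 0 : prev.start + prev.count;
            int count = Math.min(batchSize, sourceSize - start);
            nextStart = start + count;
            return new RangeMeta(start, count);
        }
    }

    // Mimics Emitter<X>: emits exactly the tuples the metadata describes.
    static final class Emitter {
        private final List<String> source;
        Emitter(List<String> source) { this.source = source; }

        // Like emitBatch(tx, coordinatorMeta, collector); the List stands in
        // for BatchOutputCollector.
        void emitBatch(BigInteger txid, RangeMeta meta, List<String> collector) {
            for (int i = meta.start; i < meta.start + meta.count; i++) {
                collector.add(txid + ":" + source.get(i));
            }
        }
    }

    public static void main(String[] args) {
        List<String> source = Arrays.asList("a", "b", "c", "d", "e");
        Coordinator coordinator = new Coordinator(source.size(), 2);
        Emitter emitter = new Emitter(source);

        RangeMeta prev = null;
        BigInteger txid = BigInteger.ONE;
        while (coordinator.isReady()) {
            RangeMeta meta = coordinator.initializeTransaction(txid, prev);
            List<String> batch = new ArrayList<>();
            emitter.emitBatch(txid, meta, batch);
            System.out.println(batch);
            prev = meta;
            txid = txid.add(BigInteger.ONE);
        }
    }
}
```

In a real topology the metadata would be persisted by Storm between attempts; here it is simply kept in a local variable.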
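IPartitionedTransactionalSpout<T>: partitioned transactional spout
This is the most commonly used transactional spout, because mainstream message queues such as Kafka and RocketMQ support partitioning. Partitions raise the throughput of the MQ: each partition acts as an independent point from which data is sent.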
-- IPartitionedTransactionalSpout.Coordinator
    -- isReady(): same as above
    -- numPartitions(): returns the number of partitions. If new partitions are added to the data source and a transaction is replayed, tuples from the new partitions are not emitted, because the transaction already knows how many partitions it covers.
-- IPartitionedTransactionalSpout.Emitter<X>
    -- emitPartitionBatchNew(TransactionAttempt tx, BatchOutputCollector collector, int partition, X lastPartitionMeta): emits a new batch and returns its metadata
    -- emitPartitionBatch(TransactionAttempt tx, BatchOutputCollector collector, int partition, X partitionMeta): if a bolt fails to process this batch, emitPartitionBatch re-emits it
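As a rough illustration of how the two emitter methods divide the work, here is a stand-alone sketch that does not depend on Storm. PartitionMeta, Emitter, and the in-memory partition lists are hypothetical; a long stands in for TransactionAttempt and a List for BatchOutputCollector. It assumes the metadata records both the start offset and the size of a batch, so that a replay can reproduce the batch exactly even after new data has arrived.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Stand-in sketch: mimics the shape of IPartitionedTransactionalSpout.Emitter
// without depending on Storm. All type names here are hypothetical.
class PartitionedEmitterSketch {

    // Hypothetical per-partition metadata. A faithful replay needs both the
    // start offset and the size of the batch.
    static final class PartitionMeta {
        final int offset;
        final int size;
        PartitionMeta(int offset, int size) { this.offset = offset; this.size = size; }
    }

    static final class Emitter {
        private final List<List<String>> partitions;  // one list per partition
        private final int maxBatch;

        Emitter(List<List<String>> partitions, int maxBatch) {
            this.partitions = partitions;
            this.maxBatch = maxBatch;
        }

        // Like emitPartitionBatchNew: emit a fresh batch that starts where the
        // previous one ended, and return metadata describing it.
        PartitionMeta emitPartitionBatchNew(long txid, List<String> collector,
                                            int partition, PartitionMeta last) {
            int offset = (last == null) ? 0 : last.offset + last.size;
            int size = Math.min(maxBatch, partitions.get(partition).size() - offset);
            PartitionMeta meta = new PartitionMeta(offset, size);
            emitPartitionBatch(txid, collector, partition, meta);
            return meta;
        }

        // Like emitPartitionBatch: re-emit exactly the batch the stored
        // metadata describes, ignoring anything that arrived later.
        void emitPartitionBatch(long txid, List<String> collector,
                                int partition, PartitionMeta meta) {
            List<String> data = partitions.get(partition);
            collector.addAll(data.subList(meta.offset, meta.offset + meta.size));
        }
    }

    public static void main(String[] args) {
        List<String> p0 = new ArrayList<>(Arrays.asList("a", "b"));
        List<List<String>> partitions = new ArrayList<>();
        partitions.add(p0);
        Emitter emitter = new Emitter(partitions, 3);

        List<String> first = new ArrayList<>();
        PartitionMeta meta = emitter.emitPartitionBatchNew(1L, first, 0, null);

        p0.add("c");  // new data arrives before the batch is replayed

        List<String> replay = new ArrayList<>();
        emitter.emitPartitionBatch(1L, replay, 0, meta);
        System.out.println(first + " replayed as " + replay);
    }
}
```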
IOpaquePartitionedTransactionalSpout<T>: opaque partitioned transactional spout
-- IOpaquePartitionedTransactionalSpout.Coordinator
    -- isReady(): same as above
-- IOpaquePartitionedTransactionalSpout.Emitter<X>
    -- emitPartitionBatch(TransactionAttempt tx, BatchOutputCollector collector, int partition, X lastPartitionMeta)
    -- numPartitions()
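It does not distinguish between emitting new messages and re-emitting old ones; emitPartitionBatch handles both. The X returned by emitPartitionBatch is meant for the next batch (it comes back as the fourth parameter of emitPartitionBatch), but X is written to ZooKeeper only after a batch succeeds. If a batch fails and is re-emitted, emitPartitionBatch still reads the old X. The custom X therefore does not need to record both the start of the current batch and the start of the next batch; the start of the next batch alone is enough. For example: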
public class BatchMeta {
    public long nextOffset; // offset at which the next batch starts
}
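The commit-on-success behaviour described above can be sketched without Storm. In the following stand-alone example, MetaStore is a hypothetical stand-in for the metadata Storm keeps in ZooKeeper, Emitter only mimics the shape of IOpaquePartitionedTransactionalSpout.Emitter, a long replaces TransactionAttempt, and a List replaces BatchOutputCollector. A failed attempt never commits its returned metadata, so the retry starts from the same old offset and may pick up records that arrived in the meantime.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Stand-in sketch of the opaque emitter contract; not the Storm classes.
class OpaqueEmitterSketch {

    static final class BatchMeta {
        final long nextOffset;  // offset at which the next batch starts
        BatchMeta(long nextOffset) { this.nextOffset = nextOffset; }
    }

    // Hypothetical stand-in for the per-partition metadata kept in ZooKeeper.
    static final class MetaStore {
        private final Map<Integer, BatchMeta> committed = new HashMap<>();
        BatchMeta last(int partition) { return committed.get(partition); }
        void commit(int partition, BatchMeta meta) { committed.put(partition, meta); }
    }

    static final class Emitter {
        private final List<List<String>> partitions;  // one list per partition
        private final int maxBatch;

        Emitter(List<List<String>> partitions, int maxBatch) {
            this.partitions = partitions;
            this.maxBatch = maxBatch;
        }

        // One method for both new and re-emitted batches: start at the last
        // committed nextOffset and return the metadata for the batch after it.
        BatchMeta emitPartitionBatch(long txid, List<String> collector,
                                     int partition, BatchMeta last) {
            List<String> data = partitions.get(partition);
            int from = (last == null) ? 0 : (int) last.nextOffset;
            int to = Math.min(data.size(), from + maxBatch);
            collector.addAll(data.subList(from, to));
            return new BatchMeta(to);
        }
    }

    public static void main(String[] args) {
        List<String> p0 = new ArrayList<>(Arrays.asList("a", "b"));
        List<List<String>> partitions = new ArrayList<>();
        partitions.add(p0);
        Emitter emitter = new Emitter(partitions, 3);
        MetaStore store = new MetaStore();

        // First attempt fails downstream, so its metadata is never committed.
        List<String> attempt1 = new ArrayList<>();
        emitter.emitPartitionBatch(1L, attempt1, 0, store.last(0));

        p0.add("c");  // more data arrives before the retry

        // The retry reads the same old metadata and may emit a different batch.
        List<String> attempt2 = new ArrayList<>();
        BatchMeta meta = emitter.emitPartitionBatch(1L, attempt2, 0, store.last(0));
        store.commit(0, meta);  // success: only now does the metadata advance

        System.out.println(attempt1 + " then " + attempt2
                + ", next batch starts at " + store.last(0).nextOffset);
    }
}
```

Because the two attempts can contain different tuples, downstream state has to tolerate a changed batch for the same transaction id, which is what distinguishes the opaque variant from the partitioned one.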