FlinkKafkaProducer extends TwoPhaseCommitSinkFunction implements CheckpointedFunction, CheckpointListener
TwoPhaseCommitSinkFunction 实现 CheckpointedFunction中的initializeState和snapshotState
CheckpointListener 中的notifyCheckpointComplete
参考:https://cloud.tencent.com/developer/article/1583233
###############################################################################
总结 FlinkKafkaProducer && TPC
1、开启事务:initializeState initializeState TPC.beginTransaction() 开启事务,并初始化KafkaProducer,kafkaProducer 会初始化 accumulateRecord和sender线程
FlinkKafkaProducer.initializeState -> TPC.initializeState
获取状态信息
如果是restore,获取state中未提交的事务,构建kafkaProducer,重新提交事务 getPendingCommitTransactions -> recoverAndCommitInternal -> recoverAndCommit{producer = initTransactionalProducer,producer.commitTransaction()}
初始化用户context,生成 transactionalIds initializeUserContext -> generateNewTransactionalIds -> generateIdsToUse
Flink用一个队列作为transactional id的Pool,新的Transaction开始时从队头拿出一个transactional id,Transaction结束时将transactional id放回队尾。
因为每开始一个Transaction,都会构造一个新的Kafka Producer,因此availableTransactionalIds初始的大小就是配置的Kafka Producer Pool Size(默认是5)
public Set<String> generateIdsToUse(long nextFreeTransactionalId) {
Set<String> transactionalIds = new HashSet<>();
for (int i = 0; i < poolSize; i++) {
long transactionalId = nextFreeTransactionalId + subtaskIndex * poolSize + i;
transactionalIds.add(generateTransactionalId(transactionalId));
}
return transactionalIds;
}
开启新的事务,初始化新的事务关联的producer beginTransactionInternal -> createTransactionalProducer
private FlinkKafkaInternalProducer<byte[], byte[]> createTransactionalProducer() throws FlinkKafkaException {
String transactionalId = availableTransactionalIds.poll();
if (transactionalId == null) {
throw new FlinkKafkaException(
FlinkKafkaErrorCode.PRODUCERS_POOL_EMPTY,
"Too many ongoing snapshots. Increase kafka producers pool size or decrease number of concurrent checkpoints.");
}
FlinkKafkaInternalProducer<byte[], byte[]> producer = initTransactionalProducer(transactionalId, true);
producer.initTransactions();
return producer;
}
initTransactionalProducer -> initProducer -> createProducer -> FlinkKafkaInternalProducer -> kafkaProducer = new KafkaProducer<>(properties) kafkaProducer 会初始化 accumulateRecord和sender线程
2、invoke 方法,调用kafkaProducer.send 将处理的数据写入 kafka
3、预提交kafka事务,同时开启下一个事务 :preCommit, beginTransactionInternal()
snapshotState -> preCommit 每次checkpoint触发时,调用snapshotState,调用FlinkKafkaProducer.preCommit 再调用kafkaProduer.flush方法,将将RecordAccumulator 中未写入完kafka broker中的剩余数据使用sender写入完 。
注: 2,3两步 已经将消息发送到kafka,因为beginTransaction 已经启动sender线程,accumulateRecord中有数据就会发送到kafka, 如果kafkaconsumer的isolation.level为 read_uncommitted(默认),就能读到写入的数据导致脏读,
将其设置为read_committed 才能读到提交的数据,但会有延时,延时时间为checkpoint间隔时间
4、提交kafka事务 : notifyCheckpointComplete -> commit checkpoint 完成时调用 notifyCheckpointComplete -> comit 调用 kafkaProducer.commitTransaction 提交kafka事务
################################################################################
TwoPhaseCommitSinkFunction
1、initializeState checkpoint 初始化
@Override
public void initializeState(FunctionInitializationContext context) throws Exception {
// when we are restoring state with pendingCommitTransactions, we don't really know whether the
// transactions were already committed, or whether there was a failure between
// completing the checkpoint on the master, and notifying the writer here.
// (the common case is actually that is was already committed, the window
// between the commit on the master and the notification here is very small)
// it is possible to not have any transactions at all if there was a failure before
// the first completed checkpoint, or in case of a scale-out event, where some of the
// new task do not have and transactions assigned to check)
// we can have more than one transaction to check in case of a scale-in event, or
// for the reasons discussed in the 'notifyCheckpointComplete()' method.
state = context.getOperatorStateStore().getListState(stateDescriptor);
boolean recoveredUserContext = false;
// 遇到故障重启时
if (context.isRestored()) {
LOG.info("{} - restoring state", name());
for (State<TXN, CONTEXT> operatorState : state.get()) {
userContext = operatorState.getContext();
List<TransactionHolder<TXN>> recoveredTransactions = operatorState.getPendingCommitTransactions();
List<TXN> handledTransactions = new ArrayList<>(recoveredTransactions.size() + 1);
for (TransactionHolder<TXN> recoveredTransaction : recoveredTransactions) {
// If this fails to succeed eventually, there is actually data loss
recoverAndCommitInternal(recoveredTransaction);
handledTransactions.add(recoveredTransaction.handle);
LOG.info("{} committed recovered transaction {}", name(), recoveredTransaction);
}
{
TXN transaction = operatorState.getPendingTransaction().handle;
recoverAndAbort(transaction);
handledTransactions.add(transaction);
LOG.info("{} aborted recovered transaction {}", name(), operatorState.getPendingTransaction());
}
if (userContext.isPresent()) {
finishRecoveringContext(handledTransactions);
recoveredUserContext = true;
}
}
}
// if in restore we didn't get any userContext or we are initializing from scratch
if (!recoveredUserContext) {
LOG.info("{} - no state to restore", name());
userContext = initializeUserContext();
}
this.pendingCommitTransactions.clear();
currentTransactionHolder = beginTransactionInternal();
LOG.debug("{} - started new transaction '{}'", name(), currentTransactionHolder);
}
beginTransactionInternal->
private TransactionHolder<TXN> beginTransactionInternal() throws Exception {
return new TransactionHolder<>(beginTransaction(), clock.millis());
}
// 开启事务
FlinkKafkaProducer.beginTransaction() 见下文
2、snapshotState 每次checkpoint时执行 预提交kafka事务
public void snapshotState(FunctionSnapshotContext context) throws Exception {
// this is like the pre-commit of a 2-phase-commit transaction
// we are ready to commit and remember the transaction
checkState(currentTransactionHolder != null, "bug: no transaction object when performing state snapshot");
long checkpointId = context.getCheckpointId();
LOG.debug("{} - checkpoint {} triggered, flushing transaction '{}'", name(), context.getCheckpointId(), currentTransactionHolder);
// 执行预提交
preCommit(currentTransactionHolder.handle);
pendingCommitTransactions.put(checkpointId, currentTransactionHolder);
LOG.debug("{} - stored pending transactions {}", name(), pendingCommitTransactions);
currentTransactionHolder = beginTransactionInternal();
LOG.debug("{} - started new transaction '{}'", name(), currentTransactionHolder);
state.clear();
state.add(new State<>(
this.currentTransactionHolder,
new ArrayList<>(pendingCommitTransactions.values()),
userContext));
}
preCommit -> FlinkKafkaProducer.preCommit
3、notifyCheckpointComplete checkpoint 完成时提交kafka事务
public final void notifyCheckpointComplete(long checkpointId) throws Exception {
// the following scenarios are possible here
//
// (1) there is exactly one transaction from the latest checkpoint that
// was triggered and completed. That should be the common case.
// Simply commit that transaction in that case.
//
// (2) there are multiple pending transactions because one previous
// checkpoint was skipped. That is a rare case, but can happen
// for example when:
//
// - the master cannot persist the metadata of the last
// checkpoint (temporary outage in the storage system) but
// could persist a successive checkpoint (the one notified here)
//
// - other tasks could not persist their status during
// the previous checkpoint, but did not trigger a failure because they
// could hold onto their state and could successfully persist it in
// a successive checkpoint (the one notified here)
//
// In both cases, the prior checkpoint never reach a committed state, but
// this checkpoint is always expected to subsume the prior one and cover all
// changes since the last successful one. As a consequence, we need to commit
// all pending transactions.
//
// (3) Multiple transactions are pending, but the checkpoint complete notification
// relates not to the latest. That is possible, because notification messages
// can be delayed (in an extreme case till arrive after a succeeding checkpoint
// was triggered) and because there can be concurrent overlapping checkpoints
// (a new one is started before the previous fully finished).
//
// ==> There should never be a case where we have no pending transaction here
//
Iterator<Map.Entry<Long, TransactionHolder<TXN>>> pendingTransactionIterator = pendingCommitTransactions.entrySet().iterator();
Throwable firstError = null;
while (pendingTransactionIterator.hasNext()) {
Map.Entry<Long, TransactionHolder<TXN>> entry = pendingTransactionIterator.next();
Long pendingTransactionCheckpointId = entry.getKey();
TransactionHolder<TXN> pendingTransaction = entry.getValue();
if (pendingTransactionCheckpointId > checkpointId) {
continue;
}
LOG.info("{} - checkpoint {} complete, committing transaction {} from checkpoint {}",
name(), checkpointId, pendingTransaction, pendingTransactionCheckpointId);
logWarningIfTimeoutAlmostReached(pendingTransaction);
try {
// 提交kafka事务
commit(pendingTransaction.handle);
} catch (Throwable t) {
if (firstError == null) {
firstError = t;
}
}
LOG.debug("{} - committed checkpoint transaction {}", name(), pendingTransaction);
pendingTransactionIterator.remove();
}
if (firstError != null) {
throw new FlinkRuntimeException("Committing one of transactions failed, logging first encountered failure",
firstError);
}
}
commit ->FlinkKafkaProducer.commit
FlinkKafkaProducer
1、beginTransaction
protected FlinkKafkaProducer.KafkaTransactionState beginTransaction() throws FlinkKafkaException {
switch (semantic) {
case EXACTLY_ONCE:
// 创建KafkaProducer
FlinkKafkaInternalProducer<byte[], byte[]> producer = createTransactionalProducer();
producer.beginTransaction();
return new FlinkKafkaProducer.KafkaTransactionState(producer.getTransactionalId(), producer);
case AT_LEAST_ONCE:
case NONE:
// Do not create new producer on each beginTransaction() if it is not necessary
final FlinkKafkaProducer.KafkaTransactionState currentTransaction = currentTransaction();
if (currentTransaction != null && currentTransaction.producer != null) {
return new FlinkKafkaProducer.KafkaTransactionState(currentTransaction.producer);
}
return new FlinkKafkaProducer.KafkaTransactionState(initNonTransactionalProducer(true));
default:
throw new UnsupportedOperationException("Not implemented semantic");
}
}
--> createTransactionalProducer -> initTransactionalProducer -> initProducer -> createProducer -> new FlinkKafkaInternalProducer
-> new KafkaProducer -> KafkaProducer
--> KafkaProducer // 以下为kafka源码
// 创建RecordAccumulator
this.accumulator = new RecordAccumulator
// 创建sender对象并启动
this.sender = newSender(logContext, kafkaClient, this.metadata);
String ioThreadName = NETWORK_THREAD_PREFIX + " | " + clientId;
this.ioThread = new KafkaThread(ioThreadName, this.sender, true);
this.ioThread.start();
2、invoke
public void invoke(FlinkKafkaProducer.KafkaTransactionState transaction, IN next, Context context) throws FlinkKafkaException {
// 此步就已经将消息发送到kafka,因为beginTransaction 以启动sender线程, 如果kafkaconsumer的isolation.level为
// read_uncommitted(默认),就能读到写入的数据导致脏读, 将其设置为read_committed 才能读到提交的数据,但会有延时,
// 延时时间为checkpoint间隔时间
transaction.producer.send(record, callback);
}
3、preCommit
protected void preCommit(FlinkKafkaProducer.KafkaTransactionState transaction) throws FlinkKafkaException {
switch (semantic) {
case EXACTLY_ONCE:
case AT_LEAST_ONCE:
// 刷新数据,将RecordAccumulator 中未写入完kafka broker中的数据使用sender写入完
flush(transaction);
break;
case NONE:
break;
default:
throw new UnsupportedOperationException("Not implemented semantic");
}
checkErroneous();
}
--> flush -> transaction.producer.flush() -> kafkaProducer.flush()->
/**
* Invoking this method makes all buffered records immediately available to send (even if <code>linger.ms</code> is
* greater than 0) and blocks on the completion of the requests associated with these records.
*/
public void flush() {
log.trace("Flushing accumulated records in producer.");
this.accumulator.beginFlush();
this.sender.wakeup();
try {
this.accumulator.awaitFlushCompletion();
} catch (InterruptedException e) {
throw new InterruptException("Flush interrupted.", e);
}
}
4、commit
protected void commit(FlinkKafkaProducer.KafkaTransactionState transaction) {
if (transaction.isTransactional()) {
try {
transaction.producer.commitTransaction();
} finally {
recycleTransactionalProducer(transaction.producer);
}
}
}
commitTransaction-> kafkaProducer.commitTransaction()