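A picture up front makes for a happier reader, so without further ado, here is the architecture diagram.
We introduced this diagram in the previous article, "The Kafka Producer's Reservoir Mechanism". That article covered the message-collection part of the diagram, which we call the "reservoir" mechanism. This article covers the other part: how messages are sent.
1.1 How the Sender runs
All message sending starts from the Sender thread, which is a daemon thread, so the first thing to look at is the Sender's run method. The outermost run method is a main loop that repeatedly calls the overloaded run(long now), which holds the actual logic. Here is that method: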
void run(long now) {
    // Transaction-manager handling for the producer. This chapter does not analyze it in detail; a later chapter covers it, so a rough idea is enough for now.
    if (transactionManager != null) {
        try {
            if (transactionManager.shouldResetProducerStateAfterResolvingSequences())
                // Check if the previous run expired batches which requires a reset of the producer state.
                transactionManager.resetProducerId();
            if (!transactionManager.isTransactional()) {
                // this is an idempotent producer, so make sure we have a producer id
                maybeWaitForProducerId();
            } else if (transactionManager.hasUnresolvedSequences() && !transactionManager.hasFatalError()) {
                transactionManager.transitionToFatalError(new KafkaException("The client hasn't received acknowledgment for " +
                        "some previously sent messages and can no longer retry them. It isn't safe to continue."));
            } else if (transactionManager.hasInFlightTransactionalRequest() || maybeSendTransactionalRequest(now)) {
                // as long as there are outstanding transactional requests, we simply wait for them to return
                client.poll(retryBackoffMs, now);
                return;
            }
            // do not continue sending if the transaction manager is in a failed state or if there
            // is no producer id (for the idempotent case).
            if (transactionManager.hasFatalError() || !transactionManager.hasProducerId()) {
                RuntimeException lastError = transactionManager.lastError();
                if (lastError != null)
                    maybeAbortBatches(lastError);
                client.poll(retryBackoffMs, now);
                return;
            } else if (transactionManager.hasAbortableError()) {
                accumulator.abortUndrainedBatches(transactionManager.lastError());
            }
        } catch (AuthenticationException e) {
            // This is already logged as error, but propagated here to perform any clean ups.
            log.trace("Authentication exception while processing transactional request: {}", e);
            transactionManager.authenticationFailed(e);
        }
    }
    // Send the actual data requests, then process the broker's responses
    long pollTimeout = sendProducerData(now);
    client.poll(pollTimeout, now);
}
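To make the relationship between the outer loop and the per-iteration method concrete, here is a minimal sketch of the pattern. It is not Kafka's code: the class SenderLoopSketch, the method names runOnce, sendData and poll, and the fixed 1 ms timeout are all hypothetical stand-ins. It only illustrates the shape described above: a daemon thread whose outer run() keeps calling a per-iteration method that first "sends" and then "polls" with the timeout the send step returned.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Simplified model of a sender-style loop; all names here are illustrative, not Kafka APIs.
public class SenderLoopSketch implements Runnable {
    private volatile boolean running = true;
    private final AtomicInteger iterations = new AtomicInteger();

    // Outer loop: keeps calling the per-iteration method until close() is requested.
    @Override
    public void run() {
        while (running) {
            runOnce(System.currentTimeMillis());
        }
    }

    // One iteration: "send" whatever is ready, then wait for up to the returned timeout.
    public void runOnce(long now) {
        long pollTimeout = sendData(now);
        poll(pollTimeout);
        iterations.incrementAndGet();
    }

    // Stand-in for the send step; returns how long the following poll may block (assumed 1 ms).
    long sendData(long now) {
        return 1L;
    }

    // Stand-in for network I/O: here it simply sleeps for the timeout.
    void poll(long timeoutMs) {
        try {
            Thread.sleep(timeoutMs);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            running = false;
        }
    }

    public void close() {
        running = false;
    }

    public int iterations() {
        return iterations.get();
    }

    public static void main(String[] args) throws InterruptedException {
        SenderLoopSketch sender = new SenderLoopSketch();
        Thread thread = new Thread(sender, "sender-sketch");
        thread.setDaemon(true); // a daemon thread does not keep the JVM alive on its own
        thread.start();
        Thread.sleep(50);
        sender.close();
        thread.join(1000);
        System.out.println("iterations=" + sender.iterations());
    }
}
```

The point of returning a timeout from the send step is that the loop never spins: each pass blocks in the poll step for at most as long as the send step says is safe before there may be more work to do.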
Next we look at this from two angles: sending messages, and handling the responses that come back.
1.2 Sending messages
First, the logic of sendProducerData:
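private long sendProducerData(long now) {
    // Fetch the current cluster metadata
    Cluster cluster = metadata.fetch();
    // Get the list of partitions that have data ready to send
    RecordAccumulator.ReadyCheckResult result = this.accumulator.ready(cluster, now);
    // If any of these partitions has no known leader, force a metadata update
    if (!result.unknownLeaderTopics.isEmpty()) {
        // A leader can be unknown during a leader election, or because the topic has expired. In either case the topic
        // is added again and a metadata update is requested from the broker, since there are still messages to send to it.
        for (String topic : result.unknownLeaderTopics)
            this.metadata.add(topic);
        this.metadata.requestUpdate();
    }
    // Walk through all ready nodes and check, based on connection state, whether each is usable; remove those that are not
    Iterator<Node> iter = result.readyNodes.iterator();
    long notReadyTimeout = Long.MAX_VALUE;
    while (iter.hasNext()) {
        Node node = iter.next();
        // Check the node's connection state; if connecting is allowed, a connection is initiated
        if (!this.client.ready(node, now)) {
            // Remove nodes that are not ready
            iter.remove();
            notReadyTimeout = Math.min(notReadyTimeout, this.client.connectionD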