一 序
通过上一篇的分析《RecordAccumulator(3)》我们知道,主线程通过KafkaProducer.send()方法将消息放入RecordAccumulator中缓存,并没有实际的网络I/O操作。网络操作是由Sender统一进行的。
sender发消息的大概流程。
- 用RecordAccumulator.ready()方法,根据RecordAccumulator的缓存情况,筛选出可以向哪些Node节点发送消息。
- 根据生产者与各个节点的连接情况(由NetworkClient管理),过滤Node节点。
- 创建请求,每个Node节点只生成一个请求。
-
调用NetworkClient将请求发送出去。
二 sender的创建与成员变量
先回顾下KafkaProducer的构造函数,可以看出,Sender就是KafkaProducer中创建的一个Thread.
KafkaProducer(ProducerConfig config,
Serializer<K> keySerializer,
Serializer<V> valueSerializer,
Metadata metadata,
KafkaClient kafkaClient,
ProducerInterceptors interceptors,
Time time) {
try {
...
//核心
this.sender = newSender(logContext, kafkaClient, this.metadata);
String ioThreadName = NETWORK_THREAD_PREFIX + " | " + clientId;
//启动sender对应的线程
this.ioThread = new KafkaThread(ioThreadName, this.sender, true);
this.ioThread.start();
config.logUnused();
AppInfoParser.registerAppInfo(JMX_PREFIX, clientId, metrics);
log.debug("Kafka producer started");
} catch (Throwable t) {
// call close methods if internal objects are already constructed this is to prevent resource leak. see KAFKA-2121
close(0, TimeUnit.MILLISECONDS, true);
// now propagate the exception
throw new KafkaException("Failed to construct kafka producer", t);
}
}
成员变量
public class Sender implements Runnable {
private final Logger log;
/* the state of each nodes connection */
// 每个节点连接的状态KafkaClient实例client
private final KafkaClient client;
/* the record accumulator that batches records */
// 批量记录的记录累加器RecordAccumulator实例accumulator
private final RecordAccumulator accumulator;
/* the metadata for the client */
//客户端的元数据Metadata实例
private final Metadata metadata;
/* the flag indicating whether the producer should guarantee the message order on the broker or not. */
//是否顺序
private final boolean guaranteeMessageOrder;
/* the maximum request size to attempt to send to the server */
//试图发送到server端的最大请求大小maxRequestSize
private final int maxRequestSize;
/* the number of acknowledgements to request from the server */
// 从server端获得的请求发送的已确认数量acks
private final short acks;
/* the number of times to retry a failed request before giving up */
// 一个失败请求在被放弃之前的重试次数retries
private final int retries;
/* the clock instance used for getting the time */
// 获取时间的时钟Time实例
private final Time time;
/* true while the sender thread is still running */
// Sender线程运行的标志位,注意修饰符
private volatile boolean running;
/* true when the caller wants to ignore all unsent/inflight messages and force close. */
// 强行停止的标识位
private volatile boolean forceClose;
/* metrics 监控指标 */
private final SenderMetrics sensors;
/* the max time to wait for the server to respond to the request*/
//等到server端响应请求的超时时间
private final int requestTimeoutMs;
/* The max time to wait before retrying a request which has failed */
private final long retryBackoffMs;
/* current request API versions supported by the known brokers */
private final ApiVersions apiVersions;
/* all the state related to transactions, in particular the producer id, producer epoch, and sequence numbers */
private final TransactionManager transactionManager;
// A per-partition queue of batches ordered by creation time for tracking the in-flight batches
//tp 与已经发送尚未收到响应的batch映射关系
private final Map<TopicPartition, List<ProducerBatch>> inFlightBatches;
三 唤醒wakeup
RecordAccumulator.RecordAppendResult的batch满了,唤醒Sender线程。
KafkaProducer.java
private Future<RecordMetadata> doSend(ProducerRecord<K, V> record, Callback callback) {
....
//队列容器中执行添加数据操作( 重要,跟性能有关),供Sender线程去读取数据,然后发给broker
RecordAccumulator.RecordAppendResult result = accumulator.append(tp, timestamp, serializedKey,
serializedValue, headers, interceptCallback, remainingWaitMs);
//如果满了或者是新创建的,必须满上唤醒sender线程
if (result.batchIsFull || result.newBatchCreated) {
log.trace("Waking up the sender since topic {} partition {} is either full or getting a new batch", record.topic(), partition);
this.sender.wakeup();
}
....
}
整个唤醒过程也很有趣,唤醒sender线程后Sender再唤醒NetworkClient(不是线程,相当于通知客户端开始服务了),client也唤醒Selector,最终唤醒NIO的Selector。
sender.java
/**
* Wake up the selector associated with this send thread
*/
public void wakeup() {
this.client.wakeup();
}
NetworkClient.java
/**
* Interrupt the client if it is blocked waiting on I/O.
*/
@Override
public void wakeup() {
this.selector.wakeup();
}
Selector.java
/**
* Interrupt the nioSelector if it is blocked waiting to do I/O.
*/
@Override
public void wakeup() {
this.nioSelector.wakeup();
}
为什么需要有wakeup动作:因为可能有线程在select等待事件被阻塞了,通过wakeup唤醒那个线程开始工作.
四 核心方法
Sender实现了Runnable接口,并运行在单独的ioThread中。Sender的run()方法调用了重载的run(long),这才是Sender线程的核心方法,这是发送消息的流程图。
主要流程在sendProducerData(),最后是调用了NetworkClient.poll().
Sender不仅承载了RecordAccumulator记录的收集器,也要通知客户端服务:把Accumulator收集的批记录通过客户端发送出去。Sender作为一个线程,是在后台不断运行的,如果线程被停止,可能RecordAccumulator中还有数据没有发送出去,所以要优雅地停止.
/**
* The main run loop for the sender thread
*/
public void run() {
log.debug("Starting Kafka producer I/O thread.");
// main loop, runs until close is called
//一直运行,直到关闭:注意running修饰
while (running) {
try {
run(time.milliseconds());
} catch (Exception e) {
log.error("Uncaught error in kafka producer I/O thread: ", e);
}
}
log.debug("Beginning shutdown of Kafka producer I/O thread, sending remaining records.");
// okay we stopped accepting requests but there may still be
// requests in the accumulator or waiting for acknowledgment,
// wait until these are completed.如果不是强行停掉,则等待缓存处理完
while (!forceClose && (this.accumulator.hasUndrained() || this.client.inFlightRequestCount() > 0)) {
try {
run(time.milliseconds());
} catch (Exception e) {
log.error("Uncaught error in kafka producer I/O thread: ", e);
}
}
if (forceClose) {
// We need to fail all the incomplete batches and wake up the threads waiting on
// the futures.强行停止,忽略未完成的
log.debug("Aborting incomplete batches due to forced shutdown");
this.accumulator.abortIncompleteBatches();
}
try {
this.client.close();
} catch (Exception e) {
log.error("Failed to close network client", e);
}
log.debug("Shutdown of Kafka producer I/O thread has completed.");
}
发送消息的工作统一由Sender来控制。之前的wakeup只是一个通知,实际的工作还是由线程的run方法来控制的。同样调用client.send也只是把请求先放到队列中.
/**
* Run a single iteration of sending
*
* @param now The current POSIX time in milliseconds
*/
void run(long now) {
if (transactionManager != null) {//事务控制
try {
if (transactionManager.shouldResetProducerStateAfterResolvingSequences())
// Check if the previous run expired batches which requires a reset of the producer state.
transactionManager.resetProducerId();
if (!transactionManager.isTransactional()) {
// this is an idempotent producer, so make sure we have a producer id
maybeWaitForProducerId();
} else if (transactionManager.hasUnresolvedSequences() && !transactionManager.hasFatalError()) {
transactionManager.transitionToFatalError(
new KafkaException("The client hasn't received acknowledgment for " +
"some previously sent messages and can no longer retry them. It isn't safe to continue."));
} else if (transactionManager.hasInFlightTransactionalRequest() || maybeSendTransactionalRequest(now)) {
// as long as there are outstanding transactional requests, we simply wait for them to return
client.poll(retryBackoffMs, now);
return;
}
// do not continue sending if the transaction manager is in a failed state or if there
// is no producer id (for the idempotent case).
if (transactionManager.hasFatalError() || !transactionManager.hasProducerId()) {
RuntimeException lastError = transactionManager.lastError();
if (lastError != null)
maybeAbortBatches(lastError);
client.poll(retryBackoffMs, now);
return;
} else if (transactionManager.hasAbortableError()) {
accumulator.abortUndrainedBatches(transactionManager.lastError());
}
} catch (AuthenticationException e) {
// This is already logged as error, but propagated here to perform any clean ups.
log.trace("Authentication exception while processing transactional request: {}", e);
transactionManager.authenticationFailed(e);
}
}
long pollTimeout = sendProducerData(now);
client.poll(pollTimeout, now);
}
其主要处理逻辑为:
1、首先进入一个while主循环,当标志位running为true时一直循环,直到close被调用: 调用带参数的run(long now)方法,处理消息的发送;
2、当close被调用时,running被设置为false,while主循环退出:
2.1、如果不是强制关闭,且消息累加器accumulator尚有消息未发送,或者客户端client尚有正在处理(in-flight)的请求,进入另外一个while循环,调用带参数的run(long now)方法,处理尚未发送完的消息的发送;
2.2、如果是强制关闭,调用消息累加器accumulator的abortIncompleteBatches(),放弃未处理完的请求;
2.3、关闭客户端。
而run(long now) 里面,如果不考虑事务的话,只分为sendProducerData和poll, 所以说kafka 2.0对比1.0对于事务的处理还是很复杂的。代码复杂程度提高了,单独整理吧。
sendProducerData这里主要是调用了ready,drain.因为batches已经是按照节点划分好的了,所以创建的客户端请求也是按照节点划分好了。不过虽然produceRequest方法中的batches是某个节点所有的batches,但是客户端请求面向的还是Partition级别!所以要对batches重新按照Partition的粒度整理。不过注意的是一个节点只有一个ClientRequest,它本身并不关心包含了多少个Partition,你只要需要发送的对象包装成RequestSend即可。
//发送数据,核心方法
private long sendProducerData(long now) {
Cluster cluster = metadata.fetch();
// get the list of partitions with data ready to send
//获取准备号发送的数据,包括:队列长度大于1、第一个batch满了、没有缓存buffer空间了、正在关闭、在调用flush都会刷新待发送数据。
RecordAccumulator.ReadyCheckResult result = this.accumulator.ready(cluster, now);
// if there are any partitions whose leaders are not known yet, force metadata update
// 存在topic未知leader的副本,则需要更新元数据
if (!result.unknownLeaderTopics.isEmpty()) {
// The set of topics with unknown leader contains topics with leader election pending as well as
// topics which may have expired. Add the topic again to metadata to ensure it is included
// and request metadata update, since there are messages to send to the topic.
for (String topic : result.unknownLeaderTopics)
this.metadata.add(topic);
log.debug("Requesting metadata update due to unknown leader topics from the batched records: {}",
result.unknownLeaderTopics);
this.metadata.requestUpdate();
}
// remove any nodes we aren't ready to send to
// 移除没有ready 的 node
Iterator<Node> iter = result.readyNodes.iterator();
long notReadyTimeout = Long.MAX_VALUE;
while (iter.hasNext()) {
Node node = iter.next();
if (!this.client.ready(node, now)) {
iter.remove();
notReadyTimeout = Math.min(notReadyTimeout, this.client.pollDelayMs(node, now));
}
}
// create produce requests
// 获取待发送消息的集合。drain主要是转换,从tp.batch-->node.batch,用于发送给服务器
Map<Integer, List<ProducerBatch>> batches = this.accumulator.drain(cluster, result.readyNodes, this.maxRequestSize, now);
addToInflightBatches(batches);
//保证一个 tp 只有一个 RecordBatch 在发送,保证有序性
if (guaranteeMessageOrder) {
// Mute all the partitions drained
// 如果是消息保序的,则将drain得到的batches对应的tp放入mute队列中
for (List<ProducerBatch> batchList : batches.values()) {
for (ProducerBatch batch : batchList)
this.accumulator.mutePartition(batch.topicPartition);
}
}
accumulator.resetNextBatchExpiryTime();
//获取超时的数据
List<ProducerBatch> expiredInflightBatches = getExpiredInflightBatches(now);
//已超时,调用expiredBatches处理
List<ProducerBatch> expiredBatches = this.accumulator.expiredBatches(now);
expiredBatches.addAll(expiredInflightBatches);
// Reset the producer id if an expired batch has previously been sent to the broker. Also update the metrics
// for expired batches. see the documentation of @TransactionState.resetProducerId to understand why
// we need to reset the producer id here. 重置超时的batch
if (!expiredBatches.isEmpty())
log.trace("Expired {} batches in accumulator", expiredBatches.size());
for (ProducerBatch expiredBatch : expiredBatches) {
String errorMessage = "Expiring " + expiredBatch.recordCount + " record(s) for " + expiredBatch.topicPartition
+ ":" + (now - expiredBatch.createdMs) + " ms has passed since batch creation";
failBatch(expiredBatch, -1, NO_TIMESTAMP, new TimeoutException(errorMessage), false);
if (transactionManager != null && expiredBatch.inRetry()) {
// This ensures that no new batches are drained until the current in flight batches are fully resolved.
transactionManager.markSequenceUnresolved(expiredBatch.topicPartition);
}
}
sensors.updateProduceRequestMetrics(batches);
// If we have any nodes that are ready to send + have sendable data, poll with 0 timeout so this can immediately
// loop and try sending more data. Otherwise, the timeout will be the smaller value between next batch expiry
// time, and the delay time for checking data availability. Note that the nodes may have data that isn't yet
// sendable due to lingering, backing off, etc. This specifically does not include nodes with sendable data
// that aren't ready to send since they would cause busy looping.
long pollTimeout = Math.min(result.nextReadyCheckDelayMs, notReadyTimeout);
pollTimeout = Math.min(pollTimeout, this.accumulator.nextExpiryTimeMs() - now);
pollTimeout = Math.max(pollTimeout, 0);
if (!result.readyNodes.isEmpty()) {
log.trace("Nodes with data ready to send: {}", result.readyNodes);
// if some partitions are already ready to be sent, the select time would be 0;
// otherwise if some partition already has some data accumulated but not ready yet,
// the select time will be the time difference between now and its linger expiry time;
// otherwise the select time will be the time difference between now and the metadata expiry time;
pollTimeout = 0;
}
sendProduceRequests(batches, now);
return pollTimeout;
}
1. 从Metadata获取Kafka集群数据
2. 调用RecordAccumulator.ready()方法,根据RecordAccumulator的缓存情况,选出可以向哪些Node节点发送消息,返回ReadyCheckResult对象。
3. 如果ReadyCheckResult中标识有unknownLeaderTopics,则调用Metadata的requestUpdate方法,标记需要更新Kafka的集群信息。
4. 针对ReadyCheckResult中ready node集合,循环调用NetworkClient.ready()方法,目的是检查网络方面是否符合发送消息的条件,不符合条件的Node将从readyNodes中移除。NetworkClient类后面单独整理。
5. 通过以上步骤处理后的ready node集合,调用RecordAccumulator.drain()方法,获取待发送的消息集合。
6. 调用RecorAccumulator的drain()方法,将队列记录收集器中的记录转变为tp.batch-->node.batch。便于发送给服务器。
6. getExpiredInflightBatches()方法处理已发送未收到响应的消息。代码逻辑是,遍历RecordAccumulator,调用RecordAccumulator.getDeliveryTimeoutMs()方法获取发送时间和当前时间,判断已经超时的消息。接着调用expiredBatches(),遍历ProducerBatch,查询出已超时的消息,如果已超时,将所有超时的消息添加到expiredBatches中,再调用failBatch()方法,调用ProdcuerBatch的done()方法释放空间。
7. 调用Sender.sendProduceRequest()方法将待发送消息封装成ClientRequest,
8.调用NetwoekClient.send()将ClientRequestx写入KafkaChannel的send字段。
9.调用NetworkClient.poll()方法,将KafkaChannel.send字段中保存的ClientRequest发送出去,并且还会处理服务端发回的响应、处理超时的请求、调用用户自定义的CallBack。
后面会逐步展开介绍。
4.1 ClientRequest 及创建请求
ClientRequest是客户端的请求,这个请求会被发送到服务器上,所以封装了requestBuilder,通过requetBuilder给不同类型的请求设置不同的请求内容.
public final class ClientRequest {
private final String destination;
//ClientRequest中通过requetBuilder给不同类型的请求设置不同的请求内容
private final AbstractRequest.Builder<?> requestBuilder;
private final int correlationId;
private final String clientId;
private final long createdTimeMs;
private final boolean expectResponse;
private final int requestTimeoutMs;
private final RequestCompletionHandler callback;
...
}
ClientResponse 是客户端的响应,包含Callback是为了响应回调。
/**
* A response from the server. Contains both the body of the response as well as the correlated request
* metadata that was originally sent.
* ClientResponse是客户端的响应。onComplete触发回调函数的执行。
*/
public class ClientResponse {
private final RequestHeader requestHeader;
private final RequestCompletionHandler callback;
private final String destination;
private final long receivedTimeMs;
private final long latencyMs;
private final boolean disconnected;
private final UnsupportedVersionException versionMismatch;
private final AuthenticationException authenticationException;
private final AbstractResponse responseBody;
...
}
4.2 sendProduceRequests
Sender.sendProduceRequests()方法的功能是将待发送的消息封装成ClientRequest,不管一个Node对应有多少个ProducerBatch,也不管这些ProducerBatch发给几个分区,为每个Node仅仅生成一个ClientRequest对象。
Sender.java
/**
* Transfer the record batches into a list of produce requests on a per-node basis
* 分别发送每个node对应的batches
*/
private void sendProduceRequests(Map<Integer, List<ProducerBatch>> collated, long now) {
for (Map.Entry<Integer, List<ProducerBatch>> entry : collated.entrySet())
//调用sendProduceRequest()方法,将发往同一个Node的ProducerBatch封装成一个ClientRequest对象。
sendProduceRequest(now, entry.getKey(), acks, requestTimeoutMs, entry.getValue());
}
/**
* Create a produce request from the given record batches
*/
private void sendProduceRequest(long now, int destination, short acks, int timeout, List<ProducerBatch> batches) {
if (batches.isEmpty())
return;
//produceRecordsByPartition和recordsByPartition的value不一样,
Map<TopicPartition, MemoryRecords> produceRecordsByPartition = new HashMap<>(batches.size());
final Map<TopicPartition, ProducerBatch> recordsByPartition = new HashMap<>(batches.size());
// find the minimum magic version used when creating the record sets
byte minUsedMagic = apiVersions.maxUsableProduceMagic();
for (ProducerBatch batch : batches) {
if (batch.magic() < minUsedMagic)
minUsedMagic = batch.magic();
}
//1.将ProducerBatch列表按照partition分类,整理成上述两个集合。
for (ProducerBatch batch : batches) {
TopicPartition tp = batch.topicPartition; // 每个ProducerBatch都有唯一的TopicPartition
MemoryRecords records = batch.records(); //ProducerBatch的records是MemoryRecords,底层是ByteBuffer
// down convert if necessary to the minimum magic used. In general, there can be a delay between the time
// that the producer starts building the batch and the time that we send the request, and we may have
// chosen the message format based on out-dated metadata. In the worst case, we optimistically chose to use
// the new message format, but found that the broker didn't support it, so we need to down-convert on the
// client before sending. This is intended to handle edge cases around cluster upgrades where brokers may
// not all support the same message format version. For example, if a partition migrates from a broker
// which is supporting the new magic version to one which doesn't, then we will need to convert.
if (!records.hasMatchingMagic(minUsedMagic))
records = batch.records().downConvert(minUsedMagic, 0, time).records();
produceRecordsByPartition.put(tp, records);
recordsByPartition.put(tp, batch);
}
String transactionalId = null;
if (transactionManager != null && transactionManager.isTransactional()) {
transactionalId = transactionManager.transactionalId();
}
//构造produce请求,设置回调handler
ProduceRequest.Builder requestBuilder = ProduceRequest.Builder.forMagic(minUsedMagic, acks, timeout,
produceRecordsByPartition, transactionalId);
// 回调函数会作为客户端请求的一个成员变量, 当客户端请求完成后, 会触发回调函数的执行
RequestCompletionHandler callback = new RequestCompletionHandler() {
public void onComplete(ClientResponse response) {
//设置结果回调方法,在handleProduceResponse对服务端返回结果进行处理,
handleProduceResponse(response, recordsByPartition, time.milliseconds());
}
};
String nodeId = Integer.toString(destination);
//创建ClientRequest对象,
ClientRequest clientRequest = client.newClientRequest(nodeId, requestBuilder, now, acks != 0,
requestTimeoutMs, callback);
//放入inFlightRequests,调用selector发送
client.send(clientRequest, now);
log.trace("Sent produce request to {}: {}", nodeId, requestBuilder);
}
主要逻辑:
- 将一个Node对应的ProducerBatch集合,重新整理为Map<TopicPartition, MemoryRecords> produceRecordsByPartition和rMap<TopicPartition, ProducerBatch> recordsByPartition 两个集合。
- 创建ProduceRequest.Builder,其中有效负载就是produceRecordsByPartition中的数据。
- 创建RequestCompletionHandler作为回调对象。
- 将RequestSend对象和RequestCompletionHandler对象封装进ClientRequest对象中,并将其调用NetworkClient.send()发送
4.3 callback
回调函数传给了ClientRequest客户端请求,当客户端NetworkClient真正发生读写后(poll),会产生ClientResponse对象,触发回调函数completeResponses()的执行。因为回调对象RequestCompletionHandler的回调方法onComplete的参数是ClientResponse。NetworkClient.poll是真正发生读写的地方,所以它也会负责生成客户端的响应信息。
NetworkClient.java
@Override
public List<ClientResponse> poll(long timeout, long now) {
if (!abortedSends.isEmpty()) {
// If there are aborted sends because of unsupported version exceptions or disconnects,
// handle them immediately without waiting for Selector#poll.
List<ClientResponse> responses = new ArrayList<>();
handleAbortedSends(responses);
completeResponses(responses);
return responses;
}
// 判断是否需要更新 meta
long metadataTimeout = metadataUpdater.maybeUpdate(now);
try {//调用selector的poll方法
this.selector.poll(Utils.min(timeout, metadataTimeout, defaultRequestTimeoutMs));
} catch (IOException e) {
log.error("Unexpected error during I/O", e);
}
// process completed actions
long updatedNow = this.time.milliseconds();
//真正的读写操作, 会生成responses
List<ClientResponse> responses = new ArrayList<>();
handleCompletedSends(responses, updatedNow);
//处理任何已经完成的接收响应
handleCompletedReceives(responses, updatedNow);
handleDisconnections(responses, updatedNow);
handleConnections();
handleInitiateApiVersionRequests(updatedNow);
handleTimedOutRequests(responses, updatedNow);
//invoke callback
completeResponses(responses);
return responses;
}
private void completeResponses(List<ClientResponse> responses) {
for (ClientResponse response : responses) {
try {//调用response的完成
response.onComplete();
} catch (Exception e) {
log.error("Uncaught error in request completion:", e);
}
}
}
ClientResponse.java
public void onComplete() {
if (callback != null) //回调
callback.onComplete(this);
}
response的callback就调用了Sender.handleProduceResponse(),也是一开始创建请求sendProduceRequest的时候设置回调函数。每个ClientResponse代表的是一个节点的响应,要从中解析出ProduceResponse中所有Partition的PartitionResponse。
Sender.java
/**
* Handle a produce response
*/
private void handleProduceResponse(ClientResponse response, Map<TopicPartition, ProducerBatch> batches, long now) {
RequestHeader requestHeader = response.requestHeader();
long receivedTimeMs = response.receivedTimeMs();
int correlationId = requestHeader.correlationId();
if (response.wasDisconnected()) {//链接断开
log.trace("Cancelled request with header {} due to node {} being disconnected",
requestHeader, response.destination());
for (ProducerBatch batch : batches.values())
completeBatch(batch, new ProduceResponse.PartitionResponse(Errors.NETWORK_EXCEPTION), correlationId, now, 0L);
} else if (response.versionMismatch() != null) {//版本异常
log.warn("Cancelled request {} due to a version mismatch with node {}",
response, response.destination(), response.versionMismatch());
for (ProducerBatch batch : batches.values())
completeBatch(batch, new ProduceResponse.PartitionResponse(Errors.UNSUPPORTED_VERSION), correlationId, now, 0L);
} else {
log.trace("Received produce response from node {} with correlation id {}", response.destination(), correlationId);
// if we have a response, parse it
if (response.hasResponse()) {
ProduceResponse produceResponse = (ProduceResponse) response.responseBody();
for (Map.Entry<TopicPartition, ProduceResponse.PartitionResponse> entry : produceResponse.responses().entrySet()) {
TopicPartition tp = entry.getKey(); // 每一个TopicPartition都对应一个PartitionResponse
ProduceResponse.PartitionResponse partResp = entry.getValue();
ProducerBatch batch = batches.get(tp); // 因为batches中对一个Partition只会有一个ProducerBatch
// 完成这个ProducerBatch,最终调用ProducerBatch.done
completeBatch(batch, partResp, correlationId, now, receivedTimeMs + produceResponse.throttleTimeMs());
}
this.sensors.recordLatency(response.destination(), response.requestLatencyMs());
} else {
// this is the acks = 0 case, just complete all requests
for (ProducerBatch batch : batches.values()) {
completeBatch(batch, new ProduceResponse.PartitionResponse(Errors.NONE), correlationId, now, 0L);
}
}
}
}
Sender.completeBatch()-->ProducerBatch.done()-->ProducerBatch.completeFutureAndFireCallbacks()
ProducerBatch.java
/**
* 客户端组织生成的每一批Batch记录都属于一个Partition,所以每个Batch都要complete(调用RecordBatch.done)。
* 每次Batch中,如果客户端不需要响应,则baseOffset=-1,否则从response中解析出baseOffset用来表示消息的offset。
* 由于RecordBatch记录的是每一条消息,每条消息都有Callback的话,一个Batch里就有和消息数量相等的thunks(callback)
*/
private void completeFutureAndFireCallbacks(long baseOffset, long logAppendTime, RuntimeException exception) {
// 注意 baseOffset为该batch 在 Kafka topic partition 中的位置
// 通过set方法设置batch的响应数据
// Set the future before invoking the callbacks as we rely on its state for the `onCompletion` call
produceFuture.set(baseOffset, logAppendTime, exception);
// execute callbacks (循环执行每个消息的callback回调函数)
for (Thunk thunk : thunks) {
try {
if (exception == null) {//正常处理完成
//获取服务端返回消息的响应数据
RecordMetadata metadata = thunk.future.value();
if (thunk.callback != null)
//调用消息自定义的callback:注意这里第二个参数表示异常。如果为null,则表示请求成功
thunk.callback.onCompletion(metadata, null);
} else {//异常情况
if (thunk.callback != null)//返回异常
thunk.callback.onCompletion(null, exception);
}
} catch (Exception e) {
log.error("Error executing user-provided callback on message for topic-partition '{}'", topicPartition, e);
}
}
//标识整个batch都已经处理完成。
produceFuture.done();
}
看到这里,callback最终回调到producer的。callback,就是一开始通过KafkaProducer发送消息设置的callback。其中RecordMetadata对象作为onCompletion的回调参数。
KafkaProducer.java
/**
* A callback called when producer request is complete. It in turn calls user-supplied callback (if given) and
* notifies producer interceptors about the request completion.
*/
private static class InterceptorCallback<K, V> implements Callback {
private final Callback userCallback;
private final ProducerInterceptors<K, V> interceptors;
private final TopicPartition tp;
private InterceptorCallback(Callback userCallback, ProducerInterceptors<K, V> interceptors, TopicPartition tp) {
this.userCallback = userCallback;
this.interceptors = interceptors;
this.tp = tp;
}
//发送完消息回调
public void onCompletion(RecordMetadata metadata, Exception exception) {
metadata = metadata != null ? metadata : new RecordMetadata(tp, -1, -1, RecordBatch.NO_TIMESTAMP, Long.valueOf(-1L), -1, -1);
this.interceptors.onAcknowledgement(metadata, exception);
if (this.userCallback != null)
this.userCallback.onCompletion(metadata, exception);
}
}
后面会继续整理NIO部分。总是感觉kafka封装的层次太深了,类很多,属性很多。一开始看特容易晕。
参考:
《Apache kafka 源码剖析》2.4
http://zqhxuyuan.github.io/2016/01/06/2016-01-06-Kafka_Producer/