Kafka: Sender & NetworkClient

Article link: https://blog.csdn.net/palkia1998/article/details/141905722

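This article covers the Sender and NetworkClient in the Kafka producer. When a producer sends messages, the RecordAccumulator buffers them and groups them into batches, and the Sender thread sends whatever the RecordAccumulator has buffered. The Sender thread first prepares the messages for sending, then calls NetworkClient to perform the network operations. NetworkClient sends the requests and handles the responses that come back. Internally it wraps a Selector, which performs the actual network I/O.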

1. The Sender thread

The Sender thread's run() method calls runOnce() in a loop to send messages.

@Override
public void run() {
    log.debug("Starting Kafka producer I/O thread.");

    if (transactionManager != null)
        transactionManager.setPoisonStateOnInvalidTransition(true);

    // main loop, runs until close is called
    while (running) {
        try {
            // **Call runOnce() in a loop**
            runOnce();
        } catch (Exception e) {
            log.error("Uncaught error in kafka producer I/O thread: ", e);
        }
    }

    log.debug("Beginning shutdown of Kafka producer I/O thread, sending remaining records.");

    // No new send requests are accepted; keep going until the buffered messages and in-flight requests are done
    while (!forceClose && ((this.accumulator.hasUndrained() || this.client.inFlightRequestCount() > 0) || hasPendingTransactionalRequests())) {
        try {
            runOnce();
        } catch (Exception e) {
            log.error("Uncaught error in kafka producer I/O thread: ", e);
        }
    }

    // Abort the transaction if any commit or abort didn't go through the transaction manager's queue
    while (!forceClose && transactionManager != null && transactionManager.hasOngoingTransaction()) {
        if (!transactionManager.isCompleting()) {
            log.info("Aborting incomplete transaction due to shutdown");
            try {
                // It is possible for the transaction manager to throw errors when aborting. Catch these
                // so as not to interfere with the rest of the shutdown logic.
                transactionManager.beginAbort();
            } catch (Exception e) {
                log.error("Error in kafka producer I/O thread while aborting transaction when during closing: ", e);
                // Force close in case the transactionManager is in error states.
                forceClose = true;
            }
        }
        try {
            runOnce();
        } catch (Exception e) {
            log.error("Uncaught error in kafka producer I/O thread: ", e);
        }
    }

    // Forced close: give up on all pending requests
    if (forceClose) {
        // We need to fail all the incomplete transactional requests and batches and wake up the threads waiting on
        // the futures.
        if (transactionManager != null) {
            log.debug("Aborting incomplete transactional requests due to forced shutdown");
            transactionManager.close();
        }
        log.debug("Aborting incomplete batches due to forced shutdown");
        this.accumulator.abortIncompleteBatches();
    }
    try {
        // Close the NetworkClient
        this.client.close();
    } catch (Exception e) {
        log.error("Failed to close network client", e);
    }

    log.debug("Shutdown of Kafka producer I/O thread has completed.");
}
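
The three phases of run() (main loop, graceful drain, forced abort) can be sketched in isolation. The following is a minimal illustration, not Kafka code: DrainingWorker, enqueue(), initiateClose() and forceClose() are hypothetical names, and a plain string queue stands in for the RecordAccumulator.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical stand-in for the Sender's shutdown logic; not Kafka code.
final class DrainingWorker implements Runnable {
    private final Queue<String> pending = new ConcurrentLinkedQueue<>();
    // Only the thread executing run() writes to these two lists.
    final List<String> sent = new ArrayList<>();
    final List<String> aborted = new ArrayList<>();
    private volatile boolean running = true;
    private volatile boolean forceClose = false;

    void enqueue(String message) { pending.add(message); }

    // Graceful close: leave the main loop, then drain what is still queued.
    void initiateClose() { running = false; }

    // Forced close: leave the main loop and give up on what is still queued.
    void forceClose() { forceClose = true; running = false; }

    private void runOnce() {
        String message = pending.poll();
        if (message != null)
            sent.add(message);
        else
            Thread.yield(); // the real Sender blocks in poll() instead of spinning
    }

    @Override
    public void run() {
        // Phase 1: main loop, runs until a close is requested.
        while (running)
            runOnce();
        // Phase 2: graceful drain, skipped when the close is forced.
        while (!forceClose && !pending.isEmpty())
            runOnce();
        // Phase 3: a forced close fails everything that is left.
        if (forceClose) {
            String message;
            while ((message = pending.poll()) != null)
                aborted.add(message);
        }
    }

    public static void main(String[] args) {
        DrainingWorker worker = new DrainingWorker();
        worker.enqueue("a");
        worker.enqueue("b");
        worker.initiateClose();
        worker.run();
        System.out.println("sent=" + worker.sent + " aborted=" + worker.aborted);
    }
}
```

With a graceful close the queued messages end up in sent; calling forceClose() instead would move them to aborted, which loosely corresponds to accumulator.abortIncompleteBatches() in the listing above.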

The runOnce() method does the actual sending. It also contains the transaction handling; the part that sends messages looks like this:

void runOnce() {
    // ... transaction handling omitted ...

    long currentTimeMs = time.milliseconds();
    // sendProducerData() assembles the batches into ClientRequests and
    // **calls NetworkClient.send() to queue them for sending**
    long pollTimeout = sendProducerData(currentTimeMs);
    // **Call NetworkClient.poll() to send the requests and handle the responses**
    client.poll(pollTimeout, currentTimeMs);
}

Through runOnce(), the Sender thread keeps calling sendProducerData() to send the messages buffered in the RecordAccumulator. sendProducerData() is implemented as follows:

private long sendProducerData(long now) {
    // Fetch the metadata snapshot
    MetadataSnapshot metadataSnapshot = metadata.fetchMetadataSnapshot();
    // Determine which partitions have data that is ready to send
    RecordAccumulator.ReadyCheckResult result = this.accumulator.ready(metadataSnapshot, now);

    // If the leader is unknown for any partition, force a metadata update
    if (!result.unknownLeaderTopics.isEmpty()) {
        // The set of topics with unknown leader contains topics with leader election pending as well as
        // topics which may have expired. Add the topic again to metadata to ensure it is included
        // and request metadata update, since there are messages to send to the topic.
        for (String topic : result.unknownLeaderTopics)
            this.metadata.add(topic, now);

        log.debug("Requesting metadata update due to unknown leader topics from the batched records: {}",
            result.unknownLeaderTopics);
        this.metadata.requestUpdate(false);
    }

    // Filter the result further: check the connection to each node and remove nodes that are not ready
    Iterator<Node> iter = result.readyNodes.iterator();
    long notReadyTimeout = Long.MAX_VALUE;
    while (iter.hasNext()) {
        Node node = iter.next();
        if (!this.client.ready(node, now)) {
            // Update just the readyTimeMs of the latency stats, so that it moves forward
            // every time the batch is ready (then the difference between readyTimeMs and
            // drainTimeMs would represent how long data is waiting for the node).
            this.accumulator.updateNodeLatencyStats(node.id(), now, false);
            iter.remove();
            notReadyTimeout = Math.min(notReadyTimeout, this.client.pollDelayMs(node, now));
        } else {
            // Update both readyTimeMs and drainTimeMs, this would "reset" the node
            // latency.
            this.accumulator.updateNodeLatencyStats(node.id(), now, true);
        }
    }

    // Drain data from the RecordAccumulator into a map of node id -> list of ProducerBatch,
    // to be handed to the network layer and sent to the matching node
    Map<Integer, List<ProducerBatch>> batches = this.accumulator.drain(metadataSnapshot, result.readyNodes, this.maxRequestSize, now);
    addToInflightBatches(batches);
    if (guaranteeMessageOrder) {
        // Mute all the partitions drained
        for (List<ProducerBatch> batchList : batches.values()) {
            for (ProducerBatch batch : batchList)
                this.accumulator.mutePartition(batch.topicPartition);
        }
    }

    // Handle batches that have expired
    accumulator.resetNextBatchExpiryTime();
    List<ProducerBatch> expiredInflightBatches = getExpiredInflightBatches(now);
    List<ProducerBatch> expiredBatches = this.accumulator.expiredBatches(now);
    expiredBatches.addAll(expiredInflightBatches);

    if (!expiredBatches.isEmpty())
        log.trace("Expired {} batches in accumulator", expiredBatches.size());
    for (ProducerBatch expiredBatch : expiredBatches) {
        String errorMessage = "Expiring " + expiredBatch.recordCount + " record(s) for " + expiredBatch.topicPartition
            + ":" + (now - expiredBatch.createdMs) + " ms has passed since batch creation";
        failBatch(expiredBatch, new TimeoutException(errorMessage), false);
        if (transactionManager != null && expiredBatch.inRetry()) {
            // This ensures that no new batches are drained until the current in flight batches are fully resolved.
            transactionManager.markSequenceUnresolved(expiredBatch);
        }
    }
    
    // Update metrics
    sensors.updateProduceRequestMetrics(batches);

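    // pollTimeout is the smallest of: the delay until the next ready check, the delay until a
    // not-ready node can be retried, and the time until the next batch expires (never below 0)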
    long pollTimeout = Math.min(result.nextReadyCheckDelayMs, notReadyTimeout);
    pollTimeout = Math.min(pollTimeout, this.accumulator.nextExpiryTimeMs() - now);
    pollTimeout = Math.max(pollTimeout, 0);
    // If any node is ready, set pollTimeout to 0 so the messages are sent immediately
    if (!result.readyNodes.isEmpty()) {
        log.trace("Nodes with data ready to send: {}", result.readyNodes);
        // if some partitions are already ready to be sent, the select time would be 0;
        // otherwise if some partition already has some data accumulated but not ready yet,
        // the select time will be the time difference between now and its linger expiry time;
        // otherwise the select time will be the time difference between now and the metadata expiry time;
        pollTimeout = 0;
    }

    // Call sendProduceRequests() to queue the produce requests on the NetworkClient
    sendProduceRequests(batches, now);
    return pollTimeout;
}
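
The timeout arithmetic at the end of sendProducerData() is easy to check on its own. The sketch below restates it as a pure function; PollTimeoutSketch and its parameter names are hypothetical, and the values in main() are made up for illustration.

```java
// Hypothetical helper mirroring the timeout arithmetic in sendProducerData(); not Kafka code.
final class PollTimeoutSketch {
    static long pollTimeout(long nextReadyCheckDelayMs, long notReadyTimeoutMs,
                            long nextExpiryTimeMs, long nowMs, boolean anyNodeReady) {
        // A ready node means there is data to send right away.
        if (anyNodeReady)
            return 0L;
        long timeout = Math.min(nextReadyCheckDelayMs, notReadyTimeoutMs);
        timeout = Math.min(timeout, nextExpiryTimeMs - nowMs);
        // An expiry time in the past must not produce a negative timeout.
        return Math.max(timeout, 0L);
    }

    public static void main(String[] args) {
        long now = 1_000_000L;
        // No node is ready; the next batch expires in 20 ms, sooner than the 50 ms ready check.
        System.out.println(pollTimeout(50L, Long.MAX_VALUE, now + 20L, now, false)); // prints 20
        // A node is ready, so the poll must not block.
        System.out.println(pollTimeout(50L, Long.MAX_VALUE, now + 20L, now, true));  // prints 0
    }
}
```

The point of taking the minimum is that the Sender should wake up as soon as any of the three events can happen, and not block at all when data can already be sent.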

sendProduceRequests() walks the mapping from node to ProducerBatch list and prepares one request per destination node.

private void sendProduceRequests(Map<Integer, List<ProducerBatch>> collated, long now) {
    for (Map.Entry<Integer, List<ProducerBatch>> entry : collated.entrySet())
        sendProduceRequest(now, entry.getKey(), acks, requestTimeoutMs, entry.getValue());
}


private void sendProduceRequest(long now, int destination, short acks, int timeout, List<ProducerBatch> batches) {
    if (batches.isEmpty())
        return;

    final Map<TopicPartition, ProducerBatch> recordsByPartition = new HashMap<>(batches.size());

    // Find the lowest message format version (magic) in use
    byte minUsedMagic = apiVersions.maxUsableProduceMagic();
    for (ProducerBatch batch : batches) {
        if (batch.magic() < minUsedMagic)
            minUsedMagic = batch.magic();
    }
    ProduceRequestData.TopicProduceDataCollection tpd = new ProduceRequestData.TopicProduceDataCollection();
    for (ProducerBatch batch : batches) {
        TopicPartition tp = batch.topicPartition;
        MemoryRecords records = batch.records();

        // For backward compatibility, down-convert the records to the lowest message format version
        if (!records.hasMatchingMagic(minUsedMagic))
            records = batch.records().downConvert(minUsedMagic, 0, time).records();
        
        // Add the records to the TopicProduceDataCollection, grouped by topic and partition
        ProduceRequestData.TopicProduceData tpData = tpd.find(tp.topic());
        if (tpData == null) {
            tpData = new ProduceRequestData.TopicProduceData().setName(tp.topic());
            tpd.add(tpData);
        }
        tpData.partitionData().add(new ProduceRequestData.PartitionProduceData()
                .setIndex(tp.partition())
                .setRecords(records));
        recordsByPartition.put(tp, batch);
    }

    String transactionalId = null;
    if (transactionManager != null && transactionManager.isTransactional()) {
        transactionalId = transactionManager.transactionalId();
    }

    // Build the ProduceRequest.Builder, which carries the topic data
    ProduceRequest.Builder requestBuilder = ProduceRequest.forMagic(minUsedMagic,
            new ProduceRequestData()
                    .setAcks(acks)
                    .setTimeoutMs(timeout)
                    .setTransactionalId(transactionalId)
                    .setTopicData(tpd));

    // Set the callback, which invokes handleProduceResponse()
    RequestCompletionHandler callback = response -> handleProduceResponse(response, recordsByPartition, time.milliseconds());

    String nodeId = Integer.toString(destination);
    // Build the ClientRequest, which wraps the ProduceRequest.Builder
    // acks != 0 means a response from the broker is expected
    ClientRequest clientRequest = client.newClientRequest(nodeId, requestBuilder, now, acks != 0,
            requestTimeoutMs, callback);
    // **Call client.send() to queue the request for sending**
    client.send(clientRequest, now);
    log.trace("Sent produce request to {}: {}", nodeId, requestBuilder);
}
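
Two steps of sendProduceRequest() can be tried out without any Kafka classes: finding the lowest magic value and grouping partitions under their topic. The sketch below is only an illustration; Batch, minMagic() and partitionsByTopic() are hypothetical names, and a partition number stands in for the record data.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the grouping done in sendProduceRequest(); not Kafka code.
final class ProduceGroupingSketch {
    static final class Batch {
        final String topic;
        final int partition;
        final byte magic;
        Batch(String topic, int partition, int magic) {
            this.topic = topic;
            this.partition = partition;
            this.magic = (byte) magic;
        }
    }

    // Start from the highest magic the client could use and lower it to the oldest batch.
    static byte minMagic(byte maxUsableMagic, List<Batch> batches) {
        byte min = maxUsableMagic;
        for (Batch batch : batches)
            if (batch.magic < min)
                min = batch.magic;
        return min;
    }

    // One entry per topic, holding that topic's partitions in the order they were seen.
    static Map<String, List<Integer>> partitionsByTopic(List<Batch> batches) {
        Map<String, List<Integer>> byTopic = new LinkedHashMap<>();
        for (Batch batch : batches)
            byTopic.computeIfAbsent(batch.topic, t -> new ArrayList<>()).add(batch.partition);
        return byTopic;
    }

    public static void main(String[] args) {
        List<Batch> batches = Arrays.asList(
                new Batch("orders", 0, 2), new Batch("payments", 1, 1), new Batch("orders", 3, 2));
        System.out.println(minMagic((byte) 2, batches));  // prints 1
        System.out.println(partitionsByTopic(batches));   // prints {orders=[0, 3], payments=[1]}
    }
}
```

In the real method the grouped structure is the TopicProduceDataCollection, and any batch above the minimum magic is down-converted before it is added.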

2. NetworkClient

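NetworkClient handles the network operations. sendProducerData() in the Sender thread calls NetworkClient.send() (via sendProduceRequest()), which in turn calls doSend() to add the request to the inFlightRequests queue and then calls the underlying Selector.send() to queue it for sending.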

public class NetworkClient implements KafkaClient {
    // NetworkClient state
    private enum State {
        ACTIVE,
        CLOSING,
        CLOSED
    }

    private final Logger log;
    // The wrapped selector that performs the network I/O
    private final Selectable selector;
    // Handles metadata updates
    private final MetadataUpdater metadataUpdater;
    private final Random randOffset;
    // Connection state of every node in the cluster
    private final ClusterConnectionStates connectionStates;
    // InFlightRequests: requests that are being sent or are awaiting a response
    private final InFlightRequests inFlightRequests;
    private final int socketSendBuffer;
    private final int socketReceiveBuffer;
    // Client id
    private final String clientId;
    private int correlation;
    // Default timeout for a single request
    private final int defaultRequestTimeoutMs;
    // Backoff time before reconnecting to a node
    private final long reconnectBackoffMs;
    private final MetadataRecoveryStrategy metadataRecoveryStrategy;
    private final Time time;
    // When true, fetch a node's API versions the first time a connection to it is established
    private final boolean discoverBrokerVersions;
    // API versions of the known nodes
    private final ApiVersions apiVersions;
    // ApiVersions requests that still need to be sent, keyed by node id
    private final Map<String, ApiVersionsRequest.Builder> nodesNeedingApiVersionsFetch = new HashMap<>();
    // Responses for requests whose send was aborted
    private final List<ClientResponse> abortedSends = new LinkedList<>();
    private final Sensor throttleTimeSensor;
    private final AtomicReference<State> state;
    private final TelemetrySender telemetrySender;
}

send() calls doSend() to send the ClientRequest.

@Override
public void send(ClientRequest request, long now) {
    // **Call doSend() to send the ClientRequest**
    doSend(request, false, now);
}

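This doSend() mainly prepares for the send, checking things such as the connection state of the destination node and its ApiVersions, and then calls a second doSend() overload.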

private void doSend(ClientRequest clientRequest, boolean isInternalRequest, long now) {
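    // Make sure the NetworkClient is in the ACTIVE state
    ensureActive();
    // Node id of the destination node
    String nodeId = clientRequest.destination();
    // For external requests, check whether the request can be sent:
    // 1. the connectionState for the destination node is ready,
    // 2. the Selector's channel is ready,
    // 3. the inFlightRequests queue has not reached its limit.
    // Internal requests are already checked elsewhere, so only external requests are checked here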
    if (!isInternalRequest) {
        if (!canSendRequest(nodeId, now))
            throw new IllegalStateException("Attempt to send a request to node " + nodeId + " which is not ready.");
    }

    // The request builder held by the ClientRequest (a ProduceRequest.Builder on this path)
    AbstractRequest.Builder<?> builder = clientRequest.requestBuilder();
    // Look up the ApiVersions of the destination node
    try {
        NodeApiVersions versionInfo = apiVersions.get(nodeId);
        short version;
        // Note: if versionInfo is null, we have no server version information. This would be
        // the case when sending the initial ApiVersionRequest which fetches the version
        // information itself.  It is also the case when discoverBrokerVersions is set to false.
        if (versionInfo == null) {
            version = builder.latestAllowedVersion();
            if (discoverBrokerVersions && log.isTraceEnabled())
                log.trace("No version information found when sending {} with correlation id {} to node {}. " +
                        "Assuming version {}.", clientRequest.apiKey(), clientRequest.correlationId(), nodeId, version);
        } else {
            version = versionInfo.latestUsableVersion(clientRequest.apiKey(), builder.oldestAllowedVersion(),
                    builder.latestAllowedVersion());
        }

        // **Call the other doSend() overload to send the request**; this may throw UnsupportedVersionException
        // builder.build() builds the ProduceRequest
        doSend(clientRequest, isInternalRequest, now, builder.build(version));
    } catch (UnsupportedVersionException unsupportedVersionException) {
        // The version is unsupported: create a failed response and handle it according to the request type
        log.debug("Version mismatch when attempting to send {} with correlation id {} to {}", builder,
                clientRequest.correlationId(), clientRequest.destination(), unsupportedVersionException);
        ClientResponse clientResponse = new ClientResponse(clientRequest.makeHeader(builder.latestAllowedVersion()),
                clientRequest.callback(), clientRequest.destination(), now, now,
                false, unsupportedVersionException, null, null);

        if (!isInternalRequest)
            abortedSends.add(clientResponse);
        else if (clientRequest.apiKey() == ApiKeys.METADATA)
            metadataUpdater.handleFailedRequest(now, Optional.of(unsupportedVersionException));
        else if (isTelemetryApi(clientRequest.apiKey()) && telemetrySender != null)
            telemetrySender.handleFailedRequest(clientRequest.apiKey(), unsupportedVersionException);
    }
}
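
The version chosen in the else branch can be pictured as the top of the overlap between two ranges: the versions the broker supports and the versions the request builder allows. The sketch below assumes exactly that rule; VersionNegotiationSketch is a hypothetical name, int is used where Kafka uses short, and IllegalStateException stands in for UnsupportedVersionException.

```java
// Hypothetical sketch of picking a request version; not the Kafka implementation.
final class VersionNegotiationSketch {
    // Returns the highest version that lies inside both the broker's and the client's range.
    static int latestUsableVersion(int brokerMin, int brokerMax, int clientOldest, int clientLatest) {
        int lowest = Math.max(brokerMin, clientOldest);
        int highest = Math.min(brokerMax, clientLatest);
        if (lowest > highest)
            // Stand-in for Kafka's UnsupportedVersionException.
            throw new IllegalStateException("no common version: broker [" + brokerMin + ", " + brokerMax
                    + "], client [" + clientOldest + ", " + clientLatest + "]");
        return highest;
    }

    public static void main(String[] args) {
        // The broker is older than the client: use the broker's newest version.
        System.out.println(latestUsableVersion(0, 9, 3, 11));  // prints 9
        // The broker is newer than the client: use the client's newest version.
        System.out.println(latestUsableVersion(0, 12, 3, 11)); // prints 11
    }
}
```

When the ranges do not overlap, the exception corresponds to the catch block above, where the request is failed locally without ever reaching the network.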

The second doSend() wraps the request in an InFlightRequest, adds it to the inFlightRequests queue, and then calls selector.send() to queue it for sending.

private void doSend(ClientRequest clientRequest, boolean isInternalRequest, long now, AbstractRequest request) {
    // Id of the destination node
    String destination = clientRequest.destination();
    // Build the request header
    RequestHeader header = clientRequest.makeHeader(request.version());
    if (log.isDebugEnabled()) {
        log.debug("Sending {} request with header {} and timeout {} to node {}: {}",
            clientRequest.apiKey(), header, clientRequest.requestTimeoutMs(), destination, request);
    }
    // Build the Send object from the request.
    // request is declared as AbstractRequest; on this path the concrete class is ProduceRequest.
    // The Send object contains the request header and the data of the ProduceRequest
    Send send = request.toSend(header);
    // Build the InFlightRequest
    InFlightRequest inFlightRequest = new InFlightRequest(
            clientRequest,
            header,
            isInternalRequest,
            request,
            send,
            now);
    // Add the request to the inFlightRequests queue
    this.inFlightRequests.add(inFlightRequest);
    // Call selector.send() to queue the request for sending
    selector.send(new NetworkSend(clientRequest.destination(), send));
}
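
The role of the inFlightRequests queue can be sketched as one bounded FIFO per node: a request is recorded when it is queued, and the oldest one is removed when its response arrives. This is a simplified illustration under that assumption; InFlightSketch and its methods are hypothetical, a correlation id stands in for the whole request, and the real InFlightRequests class tracks more than this.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical per-node queue of outstanding requests; not the Kafka class.
final class InFlightSketch {
    private final int maxInFlightPerNode;
    private final Map<String, Deque<Integer>> byNode = new HashMap<>();

    InFlightSketch(int maxInFlightPerNode) {
        this.maxInFlightPerNode = maxInFlightPerNode;
    }

    // A node accepts another request only while it is below the limit.
    boolean canSendMore(String nodeId) {
        Deque<Integer> queue = byNode.get(nodeId);
        return queue == null || queue.size() < maxInFlightPerNode;
    }

    // Record a request as outstanding; the newest request sits at the head.
    void add(String nodeId, int correlationId) {
        if (!canSendMore(nodeId))
            throw new IllegalStateException("node " + nodeId + " has too many requests in flight");
        byNode.computeIfAbsent(nodeId, id -> new ArrayDeque<>()).addFirst(correlationId);
    }

    // A response always belongs to the oldest outstanding request, which sits at the tail.
    int completeNext(String nodeId) {
        Deque<Integer> queue = byNode.get(nodeId);
        if (queue == null || queue.isEmpty())
            throw new IllegalStateException("no request in flight for node " + nodeId);
        return queue.pollLast();
    }

    public static void main(String[] args) {
        InFlightSketch inFlight = new InFlightSketch(2);
        inFlight.add("1", 10);
        inFlight.add("1", 11);
        System.out.println(inFlight.canSendMore("1"));  // prints false
        System.out.println(inFlight.completeNext("1")); // prints 10
        System.out.println(inFlight.canSendMore("1"));  // prints true
    }
}
```

Bounding the queue per node is what lets the check in the first doSend() refuse a send while a node still has too many requests waiting for responses.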

NetworkClient.send() above only prepares the send. NetworkClient.poll() performs the actual send and, when responses arrive, handles them.

@Override
public List<ClientResponse> poll(long timeout, long now) {
    // Make sure the NetworkClient is in the ACTIVE state
    ensureActive();

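    // If any sends were aborted because of an unsupported version or a disconnect, return their responses right away
    if (!abortedSends.isEmpty()) {
        List<ClientResponse> responses = new ArrayList<>();
        // Add the aborted sends to responses
        handleAbortedSends(responses);
        // Complete the responses and invoke their callbacks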
        completeResponses(responses);
        return responses;
    }

    long metadataTimeout = metadataUpdater.maybeUpdate(now);
    long telemetryTimeout = telemetrySender != null ? telemetrySender.maybeUpdate(now) : Integer.MAX_VALUE;
    try {
        // **Call selector.poll() to perform the network I/O**
        this.selector.poll(Utils.min(timeout, metadataTimeout, telemetryTimeout, defaultRequestTimeoutMs));
    } catch (IOException e) {
        log.error("Unexpected error during I/O", e);
    }

    // process completed actions
    long updatedNow = this.time.milliseconds();
    List<ClientResponse> responses = new ArrayList<>();
    
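    // Handle requests whose send has completed:
    // 1. calls selector.completedSends()
    // 2. if no response is expected, removes the request from inFlightRequests and returns a successful response
    handleCompletedSends(responses, updatedNow);

    // Handle responses that were received:
    // 1. calls selector.completedReceives()
    // 2. removes the request from the inFlightRequests queue
    // 3. parses the response and handles it by type (metadata response, ApiVersions response, produce response, ...)
    handleCompletedReceives(responses, updatedNow);

    // Handle nodes that were disconnected
    handleDisconnections(responses, updatedNow);

    // Handle newly established connections
    handleConnections();

    // Send any pending ApiVersions requests
    handleInitiateApiVersionRequests(updatedNow);

    // Handle nodes whose connection attempt timed out
    handleTimedOutConnections(responses, updatedNow);

    // Handle requests in the inFlightRequests queue that timed out:
    // close the connection to the node and treat the node as disconnected
    handleTimedOutRequests(responses, updatedNow);

    // Complete the responses and invoke their callbacks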
    completeResponses(responses);

    return responses;
}
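
The split between send() and poll() is the central design point of this section: send() only queues work, and nothing is transmitted or called back until poll() runs. The toy client below shows just that contract. It performs no real I/O, and QueueThenPollSketch, Pending and the "ack:" response format are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical toy client showing the send()/poll() split; it performs no real I/O.
final class QueueThenPollSketch {
    private static final class Pending {
        final String payload;
        final Consumer<String> callback;
        Pending(String payload, Consumer<String> callback) {
            this.payload = payload;
            this.callback = callback;
        }
    }

    private final List<Pending> queued = new ArrayList<>();

    // send() only queues the request; nothing is transmitted and no callback fires yet.
    void send(String payload, Consumer<String> callback) {
        queued.add(new Pending(payload, callback));
    }

    // poll() "transmits" everything queued, collects the responses, then fires the callbacks.
    List<String> poll() {
        List<Pending> batch = new ArrayList<>(queued);
        queued.clear();
        List<String> responses = new ArrayList<>();
        for (Pending pending : batch)
            responses.add("ack:" + pending.payload); // stand-in for a broker response
        for (int i = 0; i < batch.size(); i++)
            batch.get(i).callback.accept(responses.get(i));
        return responses;
    }

    public static void main(String[] args) {
        QueueThenPollSketch client = new QueueThenPollSketch();
        List<String> seen = new ArrayList<>();
        client.send("hello", seen::add);
        System.out.println(seen);          // prints []
        System.out.println(client.poll()); // prints [ack:hello]
        System.out.println(seen);          // prints [ack:hello]
    }
}
```

As in NetworkClient.poll(), the callbacks run only after all responses of the round have been collected, which mirrors the final completeResponses(responses) call.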