Kafka Java Consumer: Message Fetching

tydhot

Article link: https://blog.csdn.net/weixin_40318210/article/details/94509124

Version: 2.4.0

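During startup, the Kafka consumer client calls ensureActiveGroup() to make sure it is an active member of its consumer group. Inside this method it sends a JoinGroup request to the Kafka broker cluster, and the join/sync exchange that follows tells the consumer which partitions of its subscribed topics have been assigned to it (the assignment itself arrives in the SyncGroup response). All subsequent fetches target only these assigned partitions. At this point the consumer does not yet know its consumed offset for those partitions, so before it fetches any messages it has to build an OffsetFetch request to obtain the committed offsets to resume from.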

private RequestFuture<Map<TopicPartition, OffsetAndMetadata>> sendOffsetFetchRequest(Set<TopicPartition> partitions) {
    Node coordinator = checkAndGetCoordinator();
    if (coordinator == null)
        return RequestFuture.coordinatorNotAvailable();

    log.debug("Fetching committed offsets for partitions: {}", partitions);
    // construct the request
    OffsetFetchRequest.Builder requestBuilder = new OffsetFetchRequest.Builder(this.groupId,
            new ArrayList<>(partitions));

    // send the request with a callback
    return client.send(coordinator, requestBuilder)
            .compose(new OffsetFetchResponseHandler());
}
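What happens with the result of that request can be sketched in isolation. The following is a simplified model rather than Kafka's own code: partitions are plain strings, and the class and method names are hypothetical. It assumes that a partition with a committed offset resumes from it, while a partition without one has to get its position some other way (in the real client this is governed by the auto.offset.reset setting).

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Simplified model, not Kafka's classes: partitions are plain strings and
// a missing committed offset is represented by an absent (or null) map entry.
final class InitialPositionSketch {

    /** Assigned partitions that have a committed offset, mapped to that offset. */
    static Map<String, Long> startPositions(Set<String> assigned, Map<String, Long> committed) {
        Map<String, Long> positions = new HashMap<>();
        for (String tp : assigned) {
            Long offset = committed.get(tp);
            if (offset != null) {
                positions.put(tp, offset); // resume from the committed offset
            }
        }
        return positions;
    }

    /** Assigned partitions with no committed offset; their position must come from elsewhere. */
    static Set<String> withoutCommittedOffset(Set<String> assigned, Map<String, Long> committed) {
        Set<String> missing = new HashSet<>(assigned);
        missing.removeAll(startPositions(assigned, committed).keySet());
        return missing;
    }
}
```

With partitions t-0 and t-1 assigned and only t-0 committed at offset 42, startPositions returns {t-0=42} and withoutCommittedOffset returns [t-1].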

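Each time the Kafka consumer fetches messages through poll(), it calls sendFetches() to try to fetch them.

Before any fetch request is sent, prepareFetchRequests() is called to build the requests.

Partitions whose data has already been fetched but not yet processed do not need to be fetched again for now. The partitions assigned to this consumer that remain after filtering those out are the ones used to prepare fetch requests.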

private List<TopicPartition> fetchablePartitions() {
    Set<TopicPartition> exclude = new HashSet<>();
    if (nextInLineRecords != null && !nextInLineRecords.isFetched) {
        exclude.add(nextInLineRecords.partition);
    }
    for (CompletedFetch completedFetch : completedFetches) {
        exclude.add(completedFetch.partition);
    }
    return subscriptions.fetchablePartitions(tp -> !exclude.contains(tp));
}
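The exclusion logic above can be tried out in isolation with a small stand-in. This is only a sketch: partitions are plain strings, the class and parameter names are hypothetical, and the extra state the real subscriptions.fetchablePartitions() checks (for example whether a partition is paused) is left out.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Simplified model, not Kafka's Fetcher: partitions are plain strings.
final class FetchableFilterSketch {

    /**
     * @param assigned     partitions assigned to this consumer, in iteration order
     * @param beingDrained partition whose records are only partly handed out, or null
     * @param buffered     partitions with a response buffered but not yet parsed
     * @return the assigned partitions that may be fetched again right now
     */
    static List<String> fetchable(List<String> assigned, String beingDrained,
                                  Collection<String> buffered) {
        Set<String> exclude = new HashSet<>(buffered);
        if (beingDrained != null) {
            exclude.add(beingDrained);
        }
        List<String> result = new ArrayList<>();
        for (String tp : assigned) {
            if (!exclude.contains(tp)) {
                result.add(tp);
            }
        }
        return result;
    }
}
```

For assigned partitions t-0, t-1, t-2 with t-0 being drained and t-2 buffered, only t-1 is returned.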

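Each of these partitions is then grouped under the broker node it will be read from, which is the preferred read replica if one is set and otherwise the node hosting the partition's leader replica, and a fetch request is prepared for each such node.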

for (TopicPartition partition : fetchablePartitions()) {
    // Use the preferred read replica if set, or the position's leader
    SubscriptionState.FetchPosition position = this.subscriptions.position(partition);
    Node node = selectReadReplica(partition, position.currentLeader.leader, currentTimeMs);

    if (node == null || node.isEmpty()) {
        metadata.requestUpdate();
    } else if (client.isUnavailable(node)) {
        client.maybeThrowAuthFailure(node);

        // If we try to send during the reconnect blackout window, then the request is just
        // going to be failed anyway before being sent, so skip the send for now
        log.trace("Skipping fetch for partition {} because node {} is awaiting reconnect backoff", partition, node);

    } else if (this.nodesWithPendingFetchRequests.contains(node.id())) {
        log.trace("Skipping fetch for partition {} because previous request to {} has not been processed", partition, node);
    } else {
        // if there is a leader and no in-flight requests, issue a new fetch
        FetchSessionHandler.Builder builder = fetchable.get(node);
        if (builder == null) {
            int id = node.id();
            FetchSessionHandler handler = sessionHandler(id);
            if (handler == null) {
                handler = new FetchSessionHandler(logContext, id);
                sessionHandlers.put(id, handler);
            }
            builder = handler.newBuilder();
            fetchable.put(node, builder);
        }

        builder.add(partition, new FetchRequest.PartitionData(position.offset,
                FetchRequest.INVALID_LOG_START_OFFSET, this.fetchSize, position.currentLeader.epoch));

        log.debug("Added {} fetch request for partition {} at position {} to node {}", isolationLevel,
            partition, position, node);
    }
}
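The grouping performed by this loop can be modelled with plain collections. The sketch below is an illustration under stated assumptions, not the client's implementation: nodes are integer ids, partitions are strings, the names are hypothetical, and the reconnect-backoff and authentication checks are omitted.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Simplified model of per-node grouping; not Kafka's FetchSessionHandler.
final class FetchGroupingSketch {

    /**
     * @param readNodeByPartition partition -> id of the node to read from (null if unknown)
     * @param pendingNodes        ids of nodes whose previous fetch is still unprocessed
     * @return node id -> partitions to put into the single request for that node
     */
    static Map<Integer, List<String>> group(Map<String, Integer> readNodeByPartition,
                                            Set<Integer> pendingNodes) {
        Map<Integer, List<String>> requests = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> entry : readNodeByPartition.entrySet()) {
            Integer node = entry.getValue();
            if (node == null) {
                continue; // node unknown: the real client requests a metadata update here
            }
            if (pendingNodes.contains(node)) {
                continue; // at most one outstanding fetch per node
            }
            requests.computeIfAbsent(node, id -> new ArrayList<>()).add(entry.getKey());
        }
        return requests;
    }
}
```

Partitions that map to the same node end up in one list, so one request per node is enough, and a node with a pending fetch gets no new request in this round.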

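As the code shows, fetches destined for the same broker are combined into a single request. The Kafka consumer client sends these fetch requests asynchronously and handles each response when it returns.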

long fetchOffset = requestData.fetchOffset;
FetchResponse.PartitionData<Records> fetchData = entry.getValue();

log.debug("Fetch {} at offset {} for partition {} returned fetch data {}",
        isolationLevel, fetchOffset, partition, fetchData);
completedFetches.add(new CompletedFetch(partition, fetchOffset, fetchData, metricAggregator,
        resp.requestHeader().apiVersion()));

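Each partition's data from an asynchronously received fetch response is wrapped in a CompletedFetch and buffered in the completedFetches collection until it is parsed.

Later, fetchedRecords() takes the buffered fetch results out of completedFetches and parses them, calling fetchRecords() to extract the required messages from each parsed result.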
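The "buffer now, parse later" pattern can be illustrated with standard library types. Note the substitution: this sketch uses java.util.concurrent.CompletableFuture in place of the client's own request future type, and the Completed class is a hypothetical stand-in for CompletedFetch that carries only a record count.

```java
import java.util.Queue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentLinkedQueue;

// Simplified model: the response callback only buffers, it does not parse.
final class AsyncFetchBufferSketch {

    /** Hypothetical stand-in for one partition's share of a fetch response. */
    static final class Completed {
        final String partition;
        final long fetchOffset;
        final int recordCount;

        Completed(String partition, long fetchOffset, int recordCount) {
            this.partition = partition;
            this.fetchOffset = fetchOffset;
            this.recordCount = recordCount;
        }
    }

    private final Queue<Completed> completedFetches = new ConcurrentLinkedQueue<>();

    /** Registers a callback that buffers the result once the response arrives. */
    void onResponse(String partition, long fetchOffset, CompletableFuture<Integer> response) {
        response.thenAccept(count ->
                completedFetches.add(new Completed(partition, fetchOffset, count)));
    }

    Queue<Completed> buffered() {
        return completedFetches;
    }
}
```

Nothing is buffered until the future completes; after that the entry waits in the queue until a later poll picks it up.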

while (recordsRemaining > 0) {
    if (nextInLineRecords == null || nextInLineRecords.isFetched) {
        CompletedFetch completedFetch = completedFetches.peek();
        if (completedFetch == null) break;

        try {
            nextInLineRecords = parseCompletedFetch(completedFetch);
        } catch (Exception e) {
            // Remove a completedFetch upon a parse with exception if (1) it contains no records, and
            // (2) there are no fetched records with actual content preceding this exception.
            // The first condition ensures that the completedFetches is not stuck with the same completedFetch
            // in cases such as the TopicAuthorizationException, and the second condition ensures that no
            // potential data loss due to an exception in a following record.
            FetchResponse.PartitionData partition = completedFetch.partitionData;
            if (fetched.isEmpty() && (partition.records == null || partition.records.sizeInBytes() == 0)) {
                completedFetches.poll();
            }
            throw e;
        }
        completedFetches.poll();
    } else {
        List<ConsumerRecord<K, V>> records = fetchRecords(nextInLineRecords, recordsRemaining);
        TopicPartition partition = nextInLineRecords.partition;
        if (!records.isEmpty()) {
            List<ConsumerRecord<K, V>> currentRecords = fetched.get(partition);
            if (currentRecords == null) {
                fetched.put(partition, records);
            } else {
                // this case shouldn't usually happen because we only send one fetch at a time per partition,
                // but it might conceivably happen in some rare cases (such as partition leader changes).
                // we have to copy to a new list because the old one may be immutable
                List<ConsumerRecord<K, V>> newRecords = new ArrayList<>(records.size() + currentRecords.size());
                newRecords.addAll(currentRecords);
                newRecords.addAll(records);
                fetched.put(partition, newRecords);
            }
            recordsRemaining -= records.size();
        }
    }
}

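The loop ends when the number of records collected reaches the maximum allowed for one poll (recordsRemaining drops to 0) or when completedFetches holds no more buffered fetch responses.

Here nextInLineRecords holds the parsed record set that is next in line to be handed out.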
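The interplay between the record limit, the queue of buffered responses, and the batch currently being drained can be sketched as follows. This is a simplified model with hypothetical names: records are strings, batches are flat lists, and the per-partition grouping and error handling of the real loop are omitted.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Simplified model of the drain loop; not Kafka's Fetcher.
final class DrainSketch {
    private final Deque<List<String>> completedFetches = new ArrayDeque<>();
    private List<String> nextInLine = new ArrayList<>(); // batch currently being drained
    private int nextIndex = 0;                           // first record not yet handed out

    void buffer(List<String> batch) {
        completedFetches.add(batch);
    }

    /** Returns at most maxPollRecords records, keeping any remainder for the next call. */
    List<String> poll(int maxPollRecords) {
        List<String> fetched = new ArrayList<>();
        int recordsRemaining = maxPollRecords;
        while (recordsRemaining > 0) {
            if (nextIndex >= nextInLine.size()) {
                // Current batch is used up: move on to the next buffered one, if any.
                List<String> next = completedFetches.poll();
                if (next == null) {
                    break;
                }
                nextInLine = next;
                nextIndex = 0;
            } else {
                int end = Math.min(nextInLine.size(), nextIndex + recordsRemaining);
                fetched.addAll(nextInLine.subList(nextIndex, end));
                recordsRemaining -= end - nextIndex;
                nextIndex = end;
            }
        }
        return fetched;
    }
}
```

With batches [a, b, c] and [d, e] buffered, successive calls to poll(2) return [a, b], then [c, d], then [e], then an empty list. The second call shows the leftover of one batch being served before the next batch is touched.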

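First, parseCompletedFetch() parses the fetch response at the head of completedFetches. Its main job is to check that the fetchOffset of the response matches the position the consumer expected, and to update values such as the high watermark (HW) in the consumer's local state. Once that is done, the fetch result is removed from completedFetches and stored in nextInLineRecords, from which the message records are read and the offset to consume next is advanced. The records obtained here are exactly the messages the Kafka consumer is after.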
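The offset check and state update described above can be reduced to a few lines. This is a minimal sketch with hypothetical names, tracking a single partition. It assumes that the records in a response have contiguous offsets, so the next position is simply fetchOffset plus the record count; the real client derives the next position from the records themselves.

```java
// Simplified single-partition model of the check done while parsing a fetch response.
final class ParseCheckSketch {
    private long position;            // next offset the consumer expects to read
    private long highWatermark = -1L; // last high watermark seen, -1 if none yet

    ParseCheckSketch(long initialPosition) {
        this.position = initialPosition;
    }

    /**
     * @return true if the response was accepted and the position advanced,
     *         false if it was discarded because its offset is not the expected one
     */
    boolean accept(long fetchOffset, int recordCount, long reportedHighWatermark) {
        if (fetchOffset != position) {
            return false; // stale response, e.g. the position changed after the request was sent
        }
        if (reportedHighWatermark >= 0) {
            highWatermark = reportedHighWatermark;
        }
        // Assumption: offsets are contiguous within the response.
        position = fetchOffset + recordCount;
        return true;
    }

    long position() {
        return position;
    }

    long highWatermark() {
        return highWatermark;
    }
}
```

Starting at position 100, a response for offset 90 is rejected and changes nothing, while a response for offset 100 with 5 records moves the position to 105 and records the reported high watermark.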