Kafka Source Code --- Consumer (3)

2. Offset operations

2.1 Committing offsets

 During normal consumption, and before a rebalance, the consumer must commit offsets to record its current consumption position. Committing offsets is also implemented by ConsumerCoordinator.

As described in part one, the consumer's SubscriptionState field uses a TopicPartitionState per TopicPartition to record its consumption state; its position field records the offset of the next message the consumer will fetch from the server. These positions make up the first parameter of the ConsumerCoordinator.commitOffset*() methods.
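As a user-level illustration of what that offsets map contains, here is a minimal sketch on the public KafkaConsumer API (the helper class and method names are made up for the example): the committed value is record.offset() + 1, i.e. the next position to read, which is exactly what the internal position field tracks. Internally the map ends up in sendOffsetCommitRequest(), shown below.

import java.util.HashMap;
import java.util.Map;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

class OffsetCommitSketch {
    // Illustrative helper: build the per-partition offsets map and commit it synchronously.
    static void commitProcessed(KafkaConsumer<String, String> consumer,
                                ConsumerRecords<String, String> records) {
        Map<TopicPartition, OffsetAndMetadata> offsets = new HashMap<>();
        for (ConsumerRecord<String, String> record : records) {
            // commit the offset of the *next* record to consume
            offsets.put(new TopicPartition(record.topic(), record.partition()),
                    new OffsetAndMetadata(record.offset() + 1));
        }
        consumer.commitSync(offsets); // reaches ConsumerCoordinator.sendOffsetCommitRequest(offsets)
    }
}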

private RequestFuture<Void> sendOffsetCommitRequest(final Map<TopicPartition, OffsetAndMetadata> offsets) {
    if (offsets.isEmpty())
        return RequestFuture.voidSuccess();

    Node coordinator = coordinator();
    if (coordinator == null)
        return RequestFuture.coordinatorNotAvailable();

    // create the offset commit request: build the per-partition data
    Map<TopicPartition, OffsetCommitRequest.PartitionData> offsetData = new HashMap<>(offsets.size());
    for (Map.Entry<TopicPartition, OffsetAndMetadata> entry : offsets.entrySet()) {
        OffsetAndMetadata offsetAndMetadata = entry.getValue();
        if (offsetAndMetadata.offset() < 0) {
            return RequestFuture.failure(new IllegalArgumentException("Invalid offset: " + offsetAndMetadata.offset()));
        }
        offsetData.put(entry.getKey(), new OffsetCommitRequest.PartitionData(
                offsetAndMetadata.offset(), offsetAndMetadata.metadata()));
    }
    // determine the generation
    final Generation generation;
    if (subscriptions.partitionsAutoAssigned())
        generation = generation();
    else
        generation = Generation.NO_GENERATION;

    if (generation == null)
        return RequestFuture.failure(new CommitFailedException());
    // build the request
    OffsetCommitRequest.Builder builder = new OffsetCommitRequest.Builder(this.groupId, offsetData).
            setGenerationId(generation.generationId).
            setMemberId(generation.memberId).
            setRetentionTime(OffsetCommitRequest.DEFAULT_RETENTION_TIME);

    log.trace("Sending OffsetCommit request with {} to coordinator {}", offsets, coordinator);
    // send the request and compose the response handler
    return client.send(coordinator, builder)
            .compose(new OffsetCommitResponseHandler(offsets));  // [drill in] handled below
}
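Note the generation check above: when partitions are auto-assigned and the generation is no longer valid (the member was removed by a rebalance), the commit fails with CommitFailedException. On the public API a caller sees it roughly like this (a sketch, assuming manual offset commits; the helper name is illustrative):

import org.apache.kafka.clients.consumer.CommitFailedException;
import org.apache.kafka.clients.consumer.KafkaConsumer;

class CommitFailureSketch {
    static void commitQuietly(KafkaConsumer<String, String> consumer) {
        try {
            consumer.commitSync();
        } catch (CommitFailedException e) {
            // the group rebalanced between poll() and the commit; the partitions now belong
            // to another member, which will re-consume these records
            System.err.println("offset commit failed after rebalance: " + e.getMessage());
        }
    }
}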

The handler processes the response:

for (Map.Entry<TopicPartition, Errors> entry : commitResponse.responseData().entrySet()) {
    TopicPartition tp = entry.getKey();
    OffsetAndMetadata offsetAndMetadata = this.offsets.get(tp);
    long offset = offsetAndMetadata.offset();

    Errors error = entry.getValue();
    if (error == Errors.NONE) {
        log.debug("Committed offset {} for partition {}", offset, tp);
        if (subscriptions.isAssigned(tp))
            // update the local cache only if the partition is still assigned
            subscriptions.committed(tp, offsetAndMetadata); // re-assigns the value held in SubscriptionState
    }
    // ... (error handling branches elided)
}

2.2 Fetching offsets

  After a rebalance finishes we want to start consuming, but at that point the consumer does not know how far its assigned partitions have already been consumed. So before consuming, it must fetch the committed offsets stored in Kafka's internal offsets topic.

 To do that, we send an OffsetFetchRequest.
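On the public API this corresponds to KafkaConsumer.committed(), which goes through the same ConsumerCoordinator path shown below. A small sketch for a manually assigned partition (the helper name is illustrative):

import java.util.Collections;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

class CommittedOffsetSketch {
    static void seekToCommitted(KafkaConsumer<String, String> consumer, TopicPartition tp) {
        OffsetAndMetadata committed = consumer.committed(tp); // null if nothing was ever committed
        if (committed != null)
            consumer.seek(tp, committed.offset());
        else
            consumer.seekToBeginning(Collections.singleton(tp)); // explicit fallback chosen for this sketch
    }
}

For subscribe()-based groups, refreshCommittedOffsetsIfNeeded() below does this automatically after a rebalance: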

public void refreshCommittedOffsetsIfNeeded() {
    if (subscriptions.refreshCommitsNeeded()) {
        Map<TopicPartition, OffsetAndMetadata> offsets = fetchCommittedOffsets(subscriptions.assignedPartitions()); // [drill in]
        for (Map.Entry<TopicPartition, OffsetAndMetadata> entry : offsets.entrySet()) {
            TopicPartition tp = entry.getKey();
            // verify assignment is still active
            if (subscriptions.isAssigned(tp))
                this.subscriptions.committed(tp, entry.getValue()); // update the committed offset
        }
        this.subscriptions.commitsRefreshed();
    }
}
public Map<TopicPartition, OffsetAndMetadata> fetchCommittedOffsets(Set<TopicPartition> partitions) {
    while (true) {
        ensureCoordinatorReady();
        // contact coordinator to fetch committed offsets
        RequestFuture<Map<TopicPartition, OffsetAndMetadata>> future = sendOffsetFetchRequest(partitions); // [drill in]
        client.poll(future);
        if (future.succeeded()) // return the offsets fetched from the server
            return future.value();
        if (!future.isRetriable())
            throw future.exception();
        time.sleep(retryBackoffMs);
    }
}

Creating the request; parameters: groupId and the partition list

private RequestFuture<Map<TopicPartition, OffsetAndMetadata>> sendOffsetFetchRequest(Set<TopicPartition> partitions) {
    Node coordinator = coordinator();
    if (coordinator == null)
        return RequestFuture.coordinatorNotAvailable();
    // construct the request
    OffsetFetchRequest.Builder requestBuilder = new OffsetFetchRequest.Builder(this.groupId,
            new ArrayList<>(partitions));
    // send the request with a callback
    return client.send(coordinator, requestBuilder)
            .compose(new OffsetFetchResponseHandler());
}

The handler processes the response:

Map<TopicPartition, OffsetAndMetadata> offsets = new HashMap<>(response.responseData().size()); // the committed offsets fetched from the server
for (Map.Entry<TopicPartition, OffsetFetchResponse.PartitionData> entry : response.responseData().entrySet()) {
    TopicPartition tp = entry.getKey();
    OffsetFetchResponse.PartitionData data = entry.getValue();
    if (data.hasError()) {
       ...
    } else if (data.offset >= 0) { // a valid committed offset
        // record the position with the offset (-1 indicates no committed offset to fetch)
        offsets.put(tp, new OffsetAndMetadata(data.offset, data.metadata));
    } else {
        log.debug("Found no committed offset for partition {}", tp);
    }
}
// propagate the offsets map; it is ultimately returned by fetchCommittedOffsets()
future.complete(offsets);

3. Fetcher

At this point everything is in place and the consumer can start pulling messages from the server. The consumer pulls messages with the Fetcher, via FetchRequest requests.

Field overview:

private final ConsumerNetworkClient client; // handles network communication
private final Time time;
private final int minBytes;        // the server does not respond immediately; it waits until at least minBytes of data can be returned (a server-side delayed operation)
private final int maxBytes;
private final int maxWaitMs;       // maximum time to wait for a response; the server uses it to decide when to respond
private final int fetchSize;       // maximum number of bytes fetched per partition in one request
private final long retryBackoffMs;
private final int maxPollRecords;  // maximum number of records returned per poll
private final boolean checkCrcs;
private final Metadata metadata;
private final FetchManagerMetrics sensors;
private final SubscriptionState subscriptions;  // tracks the consumption state of each partition
private final ConcurrentLinkedQueue<CompletedFetch> completedFetches;  // each response is first wrapped as a CompletedFetch and queued here, waiting to be parsed
private final BufferSupplier decompressionBufferSupplier = BufferSupplier.create();
// the two deserializers
private final ExtendedDeserializer<K> keyDeserializer;
private final ExtendedDeserializer<V> valueDeserializer;
private final IsolationLevel isolationLevel;
// holds the result of parsing a CompletedFetch (an inner class)
private PartitionRecords nextInLineRecords = null;
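These fields are populated from the consumer configuration. A small mapping sketch (the values shown are only the usual defaults; the authoritative ones are in ConsumerConfig for your client version):

import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;

class FetcherConfigSketch {
    static Properties fetchTuning() {
        Properties props = new Properties();
        props.put(ConsumerConfig.FETCH_MIN_BYTES_CONFIG, 1);                     // -> minBytes
        props.put(ConsumerConfig.FETCH_MAX_BYTES_CONFIG, 50 * 1024 * 1024);      // -> maxBytes
        props.put(ConsumerConfig.FETCH_MAX_WAIT_MS_CONFIG, 500);                 // -> maxWaitMs
        props.put(ConsumerConfig.MAX_PARTITION_FETCH_BYTES_CONFIG, 1024 * 1024); // -> fetchSize
        props.put(ConsumerConfig.CHECK_CRCS_CONFIG, true);                       // -> checkCrcs
        return props;
    }
}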

The Fetcher's core methods fall into three categories (a public-API sketch follows this list):

    1. Fetching messages.

    2. Updating offsets, i.e. the position field in TopicPartitionState.

    3. Retrieving metadata for a specified topic.
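Roughly, each category has a counterpart on the public KafkaConsumer API. A minimal sketch of that mapping (informal; it assumes tp is already assigned to this consumer, poll(long) matches the client version discussed here, and the helper name is illustrative):

import java.util.List;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.PartitionInfo;
import org.apache.kafka.common.TopicPartition;

class FetcherFacadeSketch {
    static void touchAllThree(KafkaConsumer<String, String> consumer, TopicPartition tp) {
        consumer.poll(100);                                    // 1. fetch messages (Fetcher.sendFetches / fetchedRecords)
        long position = consumer.position(tp);                 // 2. read the fetch position...
        consumer.seek(tp, position);                           //    ...and update it explicitly
        List<PartitionInfo> parts = consumer.partitionsFor(tp.topic()); // 3. topic metadata
        System.out.println(parts.size() + " partitions, next offset " + position);
    }
}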

3.1 Creating the FetchRequest

1. Select the fetchable partitions according to the current conditions.

2. Find the node hosting the leader replica of each fetchable partition.

3. If that node already has pending (unsent or in-flight) requests, skip it and send no FetchRequest.

4. Look up each partition's position via SubscriptionState.

5. Finally, group by node and package the TopicPartitions bound for the same node into one FetchRequest.

Let's look at createFetchRequests() first. As described in the steps above, it iterates over the fetchable partitions and then packages them into FetchRequest objects.

private Map<Node, FetchRequest.Builder> createFetchRequests() {
    // create the fetch info
    Cluster cluster = metadata.fetch(); // get the Kafka cluster metadata
    Map<Node, LinkedHashMap<TopicPartition, FetchRequest.PartitionData>> fetchable = new LinkedHashMap<>();
    // select the fetchable TopicPartitions and group them into fetchable
    for (TopicPartition partition : fetchablePartitions()) {
        Node node = cluster.leaderFor(partition); //获得leader副本所在node
        if (node == null) {
            metadata.requestUpdate();
        } else if (!this.client.hasPendingRequests(node)) { // any pending requests to this node?
            // if there is a leader and no in-flight requests, issue a new fetch
            LinkedHashMap<TopicPartition, FetchRequest.PartitionData> fetch = fetchable.get(node);
            if (fetch == null) { // initialize the per-node entry
                fetch = new LinkedHashMap<>();
                fetchable.put(node, fetch);
            }
            // record this partition's position
            long position = this.subscriptions.position(partition);
            fetch.put(partition, new FetchRequest.PartitionData(position, FetchRequest.INVALID_LOG_START_OFFSET,
                    this.fetchSize));
            log.debug("Added {} fetch request for partition {} at offset {} to node {}", isolationLevel,
                    partition, position, node);
        } else {
            log.trace("Skipping fetch for partition {} because there is an in-flight request to {}", partition, node);
        }
    }

    // create the fetches
    Map<Node, FetchRequest.Builder> requests = new HashMap<>();
    for (Map.Entry<Node, LinkedHashMap<TopicPartition, FetchRequest.PartitionData>> entry : fetchable.entrySet()) {
        Node node = entry.getKey();
        FetchRequest.Builder fetch = FetchRequest.Builder.forConsumer(this.maxWaitMs, this.minBytes,
                entry.getValue(), isolationLevel)
                .setMaxBytes(this.maxBytes);
        requests.put(node, fetch); // collect the per-node requests
    }
    return requests;
}

  Now let's look at the code that sends the FetchRequests. It follows the familiar pattern: requests are buffered in the unsent queue. Unlike the earlier *Request sends, send() here does not compose a response handler; it attaches a listener directly to process the response, presumably for better efficiency on the consumption path. Handling the response simply means wrapping the returned data into CompletedFetch objects and adding them to a queue, where they wait to be parsed.

public int sendFetches() {
    Map<Node, FetchRequest.Builder> fetchRequestMap = createFetchRequests(); // [drill in] build the per-node requests
    for (Map.Entry<Node, FetchRequest.Builder> fetchEntry : fetchRequestMap.entrySet()) {
        final FetchRequest.Builder request = fetchEntry.getValue();
        final Node fetchTarget = fetchEntry.getKey();
        // send + listen: the request for each node is buffered in unsent
        client.send(fetchTarget, request)
                .addListener(new RequestFutureListener<ClientResponse>() {
                    @Override  // [key] response callback
                    public void onSuccess(ClientResponse resp) {
                        // cast the response body to FetchResponse
                        FetchResponse response = (FetchResponse) resp.responseBody();
                        if (!matchesRequestedPartitions(request, response)) {
                            return;
                        }

                        Set<TopicPartition> partitions = new HashSet<>(response.responseData().keySet());
                        FetchResponseMetricAggregator metricAggregator = new FetchResponseMetricAggregator(sensors, partitions);
                        // iterate over the data in the response
                        for (Map.Entry<TopicPartition, FetchResponse.PartitionData> entry : response.responseData().entrySet()) {
                            TopicPartition partition = entry.getKey();
                            long fetchOffset = request.fetchData().get(partition).fetchOffset;
                            FetchResponse.PartitionData fetchData = entry.getValue();
                            // create a CompletedFetch and buffer it in the completedFetches queue
                            completedFetches.add(new CompletedFetch(partition, fetchOffset, fetchData, metricAggregator,
                                    resp.requestHeader().apiVersion()));
                        }

                        sensors.fetchLatency.record(resp.requestLatencyMs());
                    }
                });
    }
    return fetchRequestMap.size();
}

Most of the work is in handling the response; creating the request we already covered in the first step. The focus now shifts to how the responses buffered in completedFetches are parsed. This part is a bit more involved, so pay attention.

The parsing lives in fetchedRecords(), also in the Fetcher class. There is also a method named fetchRecords(); keep the two apart and don't confuse them.



public Map<TopicPartition, List<ConsumerRecord<K, V>>> fetchedRecords() {
    Map<TopicPartition, List<ConsumerRecord<K, V>>> fetched = new HashMap<>();
    int recordsRemaining = maxPollRecords;  // return at most maxPollRecords messages per call
        while (recordsRemaining > 0) {
            // no parsed result yet (e.g. the first fetch) or the previous one has been drained
            if (nextInLineRecords == null || nextInLineRecords.isFetched) {
                CompletedFetch completedFetch = completedFetches.peek();
                if (completedFetch == null) break;

                nextInLineRecords = parseCompletedFetch(completedFetch); //解析并赋值
                completedFetches.poll();
            } else {   // subsequent rounds: the offset must be checked              // [drill in]
                List<ConsumerRecord<K, V>> records = fetchRecords(nextInLineRecords, recordsRemaining);
                TopicPartition partition = nextInLineRecords.partition;
                if (!records.isEmpty()) {
                    List<ConsumerRecord<K, V>> currentRecords = fetched.get(partition);
                    // put the parsed records into the result map to be returned
                    if (currentRecords == null) {
                        fetched.put(partition, records);
                    } else {
                        List<ConsumerRecord<K, V>> newRecords = new ArrayList<>(records.size() + currentRecords.size());
                        newRecords.addAll(currentRecords);
                        newRecords.addAll(records);
                        fetched.put(partition, newRecords);
                    }
                    recordsRemaining -= records.size(); // the remaining budget only decreases here
                }
            }
        }
    return fetched;
}

The method takes no parameters; instead it reads the maxPollRecords field, which drives everything that follows. It is the maximum number of records returned by a single call (i.e. a single poll()), an upper bound rather than a guaranteed count, and it does not limit how many records one fetch response may carry.
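In configuration terms this field is max.poll.records. A small sketch of the guarantee (assuming props already carries the usual bootstrap/group/deserializer settings; the topic name is illustrative):

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

class MaxPollRecordsSketch {
    static void boundedPoll(Properties props) {
        props.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, 100); // -> Fetcher.maxPollRecords
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("demo-topic"));
            ConsumerRecords<String, String> records = consumer.poll(1000);
            // at most 100 records come back per poll; anything already fetched but not yet
            // handed out stays buffered in completedFetches / nextInLineRecords for the next poll
            assert records.count() <= 100;
        }
    }
}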

Into the loop: the loop condition is driven by maxPollRecords, and the remaining count only decreases inside the else{} block. The nextInLineRecords field tested in the if{} block holds the result of parsing a CompletedFetch; its type is PartitionRecords, an inner class of Fetcher. Its main fields:

private class PartitionRecords {
    private final TopicPartition partition;  // the corresponding partition
    private final CompletedFetch completedFetch; // the source CompletedFetch this was parsed from
    private final Iterator<? extends RecordBatch> batches;
    private final Set<Long> abortedProducerIds;    
    private final PriorityQueue<FetchResponse.AbortedTransaction> abortedTransactions;

    private int recordsRead;
    private int bytesRead;
    private RecordBatch currentBatch;   // a batch, mirroring the batches built on the producer side
    private Record lastRecord;
    private CloseableIterator<Record> records; // the records of the current batch
    private long nextFetchOffset;
    private boolean isFetched = false;
    private Exception cachedRecordException = null;
    private boolean corruptLastRecord = false;
...
}

It keeps a reference back to the CompletedFetch it was parsed from, and its fields are fairly complete; the key one is CloseableIterator<Record> records.

We can conclude that on the first fetch, nextInLineRecords is null, so we enter the if{} block, use peek() to take a CompletedFetch, and parse it. The parsing happens in parseCompletedFetch():


private PartitionRecords parseCompletedFetch(CompletedFetch completedFetch) {
    // unpack the information carried by the CompletedFetch
    TopicPartition tp = completedFetch.partition;
    FetchResponse.PartitionData partition = completedFetch.partitionData;
    long fetchOffset = completedFetch.fetchedOffset;
    // prepare the value to be returned
    PartitionRecords partitionRecords = null;
    Errors error = partition.error;

    try {
        if (!subscriptions.isFetchable(tp)) {
        } else if (error == Errors.NONE) { // [key] not much here, just a series of checks
            Long position = subscriptions.position(tp);
            Iterator<? extends RecordBatch> batches = partition.records.batches().iterator();
            partitionRecords = new PartitionRecords(tp, completedFetch, batches);  // assemble the result
            if (!batches.hasNext() && partition.records.sizeInBytes() > 0) {
                // a non-empty partition response with no complete batch means a single message
                // exceeded the fetch size (pre-v3 responses report it as "record too large"; elided)
                if (completedFetch.responseVersion < 3) {
                    Map<TopicPartition, Long> recordTooLargePartitions = Collections.singletonMap(tp, fetchOffset);
                } else {
                }
            }

           ....
    return partitionRecords;
}

Note that at this point nothing has been deserialized yet.

Back in the while() loop, this time we go into the else{} block: its first line calls fetchRecords() with the nextInLineRecords obtained in the if{} branch and the remaining record budget (recordsRemaining).

private List<ConsumerRecord<K, V>> fetchRecords(PartitionRecords partitionRecords, int maxRecords) {
    if (!subscriptions.isAssigned(partitionRecords.partition)) {
        // the partition is no longer assigned (e.g. a rebalance happened), so return an empty list
    } else {
        long position = subscriptions.position(partitionRecords.partition);
        if (!subscriptions.isFetchable(partitionRecords.partition)) { // check the partition is still fetchable
        } else if (partitionRecords.nextFetchOffset == position) { // check the offset matches the current position
            List<ConsumerRecord<K, V>> partRecords = partitionRecords.fetchRecords(maxRecords); // [drill in]

            long nextOffset = partitionRecords.nextFetchOffset;
            subscriptions.position(partitionRecords.partition, nextOffset);
            Long partitionLag = subscriptions.partitionLag(partitionRecords.partition, isolationLevel);
            if (partitionLag != null)
                this.sensors.recordPartitionLag(partitionRecords.partition, partitionLag);
            return partRecords;
        }
    }
    partitionRecords.drain(); // [drill in]
    return emptyList();
}

As you can see, there are really only two key calls; everything else is checking that the state is still valid. Let's step into the inner class's fetchRecords() method:

private List<ConsumerRecord<K, V>> fetchRecords(int maxRecords) {
    List<ConsumerRecord<K, V>> records = new ArrayList<>();
    try {
        // loop up to maxRecords; the index i itself is not used
        for (int i = 0; i < maxRecords; i++) {
            if (cachedRecordException == null) {
                corruptLastRecord = true;
                lastRecord = nextFetchedRecord(); // [key] pull the next record via the batch iterator
                corruptLastRecord = false;
            }
            if (lastRecord == null)
                break;
            // [drill in] parse into a ConsumerRecord
            records.add(parseRecord(partition, currentBatch, lastRecord));
            recordsRead++;
            bytesRead += lastRecord.sizeInBytes();
            nextFetchOffset = lastRecord.offset() + 1;
           
            cachedRecordException = null;
        }
    } // (catch/finally blocks elided)
    return records;
}

Once again, the key call is the argument to add(): parseRecord(). What matters is the lastRecord passed in. When the PartitionRecords instance was created in the if{} branch, its records field was still empty; it gets populated inside nextFetchedRecord(), which (roughly) walks the batches iterator, opens each RecordBatch and iterates over its records, skipping control records, data from aborted transactions, and anything before nextFetchOffset. We won't step through that iterator code here.

Let's go straight to parseRecord(), which turns a single Record into a ConsumerRecord:

private ConsumerRecord<K, V> parseRecord(TopicPartition partition,
                                         RecordBatch batch,
                                         Record record) {
    try {
        long offset = record.offset();
        long timestamp = record.timestamp();
        TimestampType timestampType = batch.timestampType();
        Headers headers = new RecordHeaders(record.headers());
        ByteBuffer keyBytes = record.key();
        // deserialize the key and value
        byte[] keyByteArray = keyBytes == null ? null : Utils.toArray(keyBytes);
        K key = keyBytes == null ? null : this.keyDeserializer.deserialize(partition.topic(), headers, keyByteArray);
        ByteBuffer valueBytes = record.value();
        byte[] valueByteArray = valueBytes == null ? null : Utils.toArray(valueBytes);
        V value = valueBytes == null ? null : this.valueDeserializer.deserialize(partition.topic(), headers, valueByteArray);

        // build the return value; it looks involved, but two of the arguments are just ternaries
        return new ConsumerRecord<>(partition.topic(), partition.partition(), offset,
                                    timestamp, timestampType, record.checksumOrNull(),
                                    keyByteArray == null ? ConsumerRecord.NULL_SIZE : keyByteArray.length,
                                    valueByteArray == null ? ConsumerRecord.NULL_SIZE : valueByteArray.length,
                                    key, value, headers);
    } // (catch block elided)
}

And the parse result, ConsumerRecord<K, V>, finally appears! Back in fetchedRecords(), we now have the records:

They are grouped by topic partition into the result map and returned. At this point our messages have been successfully parsed!
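To tie the internal path back to what a caller sees: poll() drives Fetcher.sendFetches(), the responses are buffered as CompletedFetch objects, and fetchedRecords() fills the ConsumerRecords handed to the application, while commitSync() goes through the ConsumerCoordinator path from section 2. A minimal end-to-end sketch on the public API (broker address, group id and topic are illustrative; poll(long) matches the client version discussed here, newer clients use poll(Duration)):

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class FetchPathSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "demo-group");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("demo-topic"));
            while (true) {
                // poll() -> Fetcher.sendFetches() -> completedFetches queue -> Fetcher.fetchedRecords()
                ConsumerRecords<String, String> records = consumer.poll(1000);
                for (ConsumerRecord<String, String> record : records)
                    System.out.printf("partition=%d offset=%d value=%s%n",
                            record.partition(), record.offset(), record.value());
                // ConsumerCoordinator.commitOffsetsSync() -> sendOffsetCommitRequest()
                consumer.commitSync();
            }
        }
    }
}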

