KafkaConsumer API


Consumer Groups and Topic Subscriptions

Kafka uses the concept of consumer groups to allow a pool of processes to divide the work of consuming and processing records. These processes can either be running on the same machine or they can be distributed over many machines to provide scalability and fault tolerance for processing. All consumer instances sharing the same group.id will be part of the same consumer group.


Each consumer in a group can dynamically set the list of topics it wants to subscribe to through one of the subscribe APIs. Kafka will deliver each message in the subscribed topics to one process in each consumer group. This is achieved by balancing the partitions between all members in the consumer group so that each partition is assigned to exactly one consumer in the group. So if there is a topic with four partitions, and a consumer group with two processes, each process would consume from two partitions.
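
The four-partition, two-process case above can be sketched with a small model. This is only an illustration of an even split, assuming a simple round-robin rule. The class EvenSplitSketch and its method are hypothetical names, and this is not Kafka's actual assignment logic, which is configurable and may distribute partitions differently.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model for illustration; not part of the Kafka client.
final class EvenSplitSketch {

    // Deals partitions 0..partitionCount-1 to the group members in turn,
    // so every partition ends up with exactly one owner.
    static Map<String, List<Integer>> assign(List<String> members, int partitionCount) {
        Map<String, List<Integer>> assignment = new LinkedHashMap<>();
        for (String member : members) {
            assignment.put(member, new ArrayList<>());
        }
        if (members.isEmpty()) {
            return assignment; // nobody to assign to
        }
        for (int partition = 0; partition < partitionCount; partition++) {
            String owner = members.get(partition % members.size());
            assignment.get(owner).add(partition);
        }
        return assignment;
    }
}
```

Calling assign with two members and four partitions gives each member two partitions. Calling it again with a different member list models what a rebalance has to achieve: every partition still has exactly one owner.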


Membership in a consumer group is maintained dynamically: if a process fails, the partitions assigned to it will be reassigned to other consumers in the same group. Similarly, if a new consumer joins the group, partitions will be moved from existing consumers to the new one. This is known as rebalancing the group and is discussed in more detail below. Group rebalancing is also used when new partitions are added to one of the subscribed topics or when a new topic matching a subscribed regex is created. The group will automatically detect the new partitions through periodic metadata refreshes and assign them to members of the group.


Conceptually you can think of a consumer group as being a single logical subscriber that happens to be made up of multiple processes. As a multi-subscriber system, Kafka naturally supports having any number of consumer groups for a given topic without duplicating data (additional consumers are actually quite cheap).


In addition, when group reassignment happens automatically, consumers can be notified through a ConsumerRebalanceListener, which allows them to finish necessary application-level logic such as state cleanup, manual offset commits, etc. See Storing Offsets Outside Kafka for more details.


It is also possible for the consumer to manually assign specific partitions (similar to the older “simple” consumer) using assign(Collection). In this case, dynamic partition assignment and consumer group coordination will be disabled.


Detecting Consumer Failures

After subscribing to a set of topics, the consumer will automatically join the group when poll(Duration) is invoked. The poll API is designed to ensure consumer liveness. As long as you continue to call poll, the consumer will stay in the group and continue to receive messages from the partitions it was assigned. Underneath the covers, the consumer sends periodic heartbeats to the server. If the consumer crashes or is unable to send heartbeats for a duration of session.timeout.ms, then the consumer will be considered dead and its partitions will be reassigned.


It is also possible that the consumer could encounter a “livelock” situation where it is continuing to send heartbeats, but no progress is being made. To prevent the consumer from holding onto its partitions indefinitely in this case, we provide a liveness detection mechanism using the max.poll.interval.ms setting. Basically if you don’t call poll at least as frequently as the configured max interval, then the client will proactively leave the group so that another consumer can take over its partitions. When this happens, you may see an offset commit failure (as indicated by a CommitFailedException thrown from a call to commitSync()). This is a safety mechanism which guarantees that only active members of the group are able to commit offsets. So to stay in the group, you must continue to call poll.
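
The two liveness rules described so far (heartbeats bounded by session.timeout.ms, polls bounded by max.poll.interval.ms) can be modelled in a few lines. This is a sketch under simplifying assumptions: time is passed in explicitly, and the class LivenessSketch is a hypothetical name, not how the client or broker is implemented.

```java
// Hypothetical model of the two liveness rules; not the Kafka client's code.
final class LivenessSketch {
    private final long sessionTimeoutMs;   // plays the role of session.timeout.ms
    private final long maxPollIntervalMs;  // plays the role of max.poll.interval.ms
    private long lastHeartbeatMs;
    private long lastPollMs;

    LivenessSketch(long sessionTimeoutMs, long maxPollIntervalMs, long nowMs) {
        this.sessionTimeoutMs = sessionTimeoutMs;
        this.maxPollIntervalMs = maxPollIntervalMs;
        this.lastHeartbeatMs = nowMs;
        this.lastPollMs = nowMs;
    }

    void heartbeat(long nowMs) { lastHeartbeatMs = nowMs; }

    void poll(long nowMs) { lastPollMs = nowMs; }

    // Crash case: no heartbeat arrived within the session timeout.
    boolean sessionExpired(long nowMs) {
        return nowMs - lastHeartbeatMs > sessionTimeoutMs;
    }

    // Livelock case: heartbeats may continue, but poll was not called in time.
    boolean pollIntervalExceeded(long nowMs) {
        return nowMs - lastPollMs > maxPollIntervalMs;
    }

    boolean inGroup(long nowMs) {
        return !sessionExpired(nowMs) && !pollIntervalExceeded(nowMs);
    }
}
```

A member that keeps sending heartbeats but stops polling still drops out of the group once the poll interval is exceeded, which is the livelock protection described above.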


The consumer provides two configuration settings to control the behavior of the poll loop:

  1. max.poll.interval.ms: By increasing the interval between expected polls, you can give the consumer more time to handle a batch of records returned from poll(Duration). The drawback is that increasing this value may delay a group rebalance since the consumer will only join the rebalance inside the call to poll. You can use this setting to bound the time to finish a rebalance, but you risk slower progress if the consumer cannot actually call poll often enough.
  2. max.poll.records: Use this setting to limit the total records returned from a single call to poll. This can make it easier to predict the maximum that must be handled within each poll interval. By tuning this value, you may be able to reduce the poll interval, which will reduce the impact of group rebalancing.
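
As a rough way to reason about these two settings together, the sketch below sets both and derives the average time available per record when a full batch is returned. The values 300000 and 500 are chosen for illustration only, and PollLoopConfigSketch and perRecordBudgetMs are hypothetical names.

```java
import java.util.Properties;

// Hypothetical helper for illustration; only the two property names come from Kafka.
final class PollLoopConfigSketch {

    static Properties pollLoopSettings() {
        Properties props = new Properties();
        props.setProperty("max.poll.interval.ms", "300000"); // 5 minutes between polls at most
        props.setProperty("max.poll.records", "500");        // at most 500 records per poll
        return props;
    }

    // Average processing time available per record if poll returns a full batch.
    static long perRecordBudgetMs(Properties props) {
        long intervalMs = Long.parseLong(props.getProperty("max.poll.interval.ms"));
        long maxRecords = Long.parseLong(props.getProperty("max.poll.records"));
        return intervalMs / maxRecords;
    }
}
```

With these values the budget is 600 ms per record. If processing a record can take longer than that on average, either raise max.poll.interval.ms or lower max.poll.records.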



For use cases where message processing time varies unpredictably, neither of these options may be sufficient. The recommended way to handle these cases is to move message processing to another thread, which allows the consumer to continue calling poll while the processor is still working. Some care must be taken to ensure that committed offsets do not get ahead of the actual position. Typically, you must disable automatic commits and manually commit processed offsets for records only after the thread has finished handling them (depending on the delivery semantics you need). Note also that you will need to pause the partition so that no new records are received from poll until after the thread has finished handling those previously returned.
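
One way to keep committed offsets from getting ahead of the records that have actually been processed is to track finished offsets per partition and only ever commit the end of the contiguous finished run. The sketch below is a minimal version of that idea. CompletedOffsetTracker is a hypothetical helper, not something the Kafka client provides, and it assumes offsets within the partition are handed out without gaps.

```java
import java.util.TreeSet;

// Hypothetical helper for a single partition; not part of the Kafka client.
final class CompletedOffsetTracker {
    private final TreeSet<Long> completed = new TreeSet<>();
    private long nextToCommit; // first offset that has not been fully processed

    CompletedOffsetTracker(long firstOffset) {
        this.nextToCommit = firstOffset;
    }

    // Called by a worker thread once the record at this offset is fully handled.
    synchronized void markDone(long offset) {
        completed.add(offset);
        // Advance over the contiguous run of finished offsets, if any.
        while (completed.remove(nextToCommit)) {
            nextToCommit++;
        }
    }

    // Offset that is safe to commit: everything before it has been processed.
    synchronized long safeCommitOffset() {
        return nextToCommit;
    }
}
```

If offsets 100 to 102 are in flight and only 102 has finished, the safe commit offset stays at 100. Once 100 and 101 finish as well, it moves to 103.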


Usage Examples

Automatic Offset Committing
     import java.time.Duration;
     import java.util.Arrays;
     import java.util.Properties;
     import org.apache.kafka.clients.consumer.ConsumerRecord;
     import org.apache.kafka.clients.consumer.ConsumerRecords;
     import org.apache.kafka.clients.consumer.KafkaConsumer;

     Properties props = new Properties();
     props.setProperty("bootstrap.servers", "localhost:9092");
     props.setProperty("group.id", "test");
     props.setProperty("enable.auto.commit", "true");
     props.setProperty("auto.commit.interval.ms", "1000");
     props.setProperty("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
     props.setProperty("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
     KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
     consumer.subscribe(Arrays.asList("foo", "bar"));
     while (true) {
         ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
         for (ConsumerRecord<String, String> record : records)
             System.out.printf("offset = %d, key = %s, value = %s%n", record.offset(), record.key(), record.value());
     }


Manual Offset Control
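     import java.time.Duration;
     import java.util.*;
     import org.apache.kafka.clients.consumer.ConsumerRecord;
     import org.apache.kafka.clients.consumer.ConsumerRecords;
     import org.apache.kafka.clients.consumer.KafkaConsumer;

     Properties props = new Properties();
     props.setProperty("bootstrap.servers", "localhost:9092");
     props.setProperty("group.id", "test");
     props.setProperty("enable.auto.commit", "false");
     props.setProperty("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
     props.setProperty("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
     KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
     consumer.subscribe(Arrays.asList("foo", "bar"));
     final int minBatchSize = 200;
     List<ConsumerRecord<String, String>> buffer = new ArrayList<>();
     while (true) {
         ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
         for (ConsumerRecord<String, String> record : records) {
             buffer.add(record);
         }
         if (buffer.size() >= minBatchSize) {
             insertIntoDb(buffer); // application-defined method (not part of Kafka) that persists the batch
             consumer.commitSync(); // commit only after the batch has been stored
             buffer.clear();
         }
     }
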

Manual Partition Assignment
     String topic = "foo";
     TopicPartition partition0 = new TopicPartition(topic, 0);
     TopicPartition partition1 = new TopicPartition(topic, 1);
     consumer.assign(Arrays.asList(partition0, partition1));

The group that the consumer specifies is still used for committing offsets, but now the set of partitions will only change with another call to assign. Manual partition assignment does not use group coordination, so consumer failures will not cause assigned partitions to be rebalanced. Each consumer acts independently even if it shares a groupId with another consumer. To avoid offset commit conflicts, you should usually ensure that the groupId is unique for each consumer instance.


Note that it isn’t possible to mix manual partition assignment (i.e. using assign) with dynamic partition assignment through topic subscription (i.e. using subscribe).


Storing Offsets Outside Kafka



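  • If the results of consumption are stored in a relational database, storing the offset in the same database allows the results and the offset to be committed in a single transaction. Either the transaction succeeds and the offset is updated to reflect what was consumed, or the results are not stored and the offset is not updated.


Each record comes with its own offset, so to manage offsets yourself you need to do the following:

  • Configure enable.auto.commit=false
  • Use the offset provided with each ConsumerRecord to save your position
  • On restart, restore the position of the consumer using seek(TopicPartition, long)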
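
The "results and offset in one transaction" idea can be illustrated with an in-memory stand-in for the database. This is a sketch under stated assumptions: OffsetStoreSketch is a hypothetical class, a synchronized method stands in for a real transaction, and each record is assumed to produce one result with contiguous offsets.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory stand-in for a transactional store; not a Kafka API.
final class OffsetStoreSketch {
    private final List<String> results = new ArrayList<>();
    private long nextOffset = 0L; // position to restore on restart

    // Stores the batch results and advances the offset together, or changes nothing.
    // Returns false for a replayed or out-of-order batch, so results are never duplicated.
    synchronized boolean storeBatch(List<String> batchResults, long firstOffset) {
        if (firstOffset != nextOffset) {
            return false;
        }
        results.addAll(batchResults);
        nextOffset = firstOffset + batchResults.size();
        return true;
    }

    // The value an application would pass to seek(TopicPartition, long) after a restart.
    synchronized long restartPosition() {
        return nextOffset;
    }

    synchronized int resultCount() {
        return results.size();
    }
}
```

Because the offset only moves when the results are stored, a batch that is redelivered after a crash is rejected instead of being applied twice.
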

Controlling The Consumer’s Position

In most use cases the consumer will simply consume records from beginning to end, periodically committing its position (either automatically or manually). However Kafka allows the consumer to manually control its position, moving forward or backwards in a partition at will. This means a consumer can re-consume older records, or skip to the most recent records without actually consuming the intermediate records.


The new position is specified with seek(TopicPartition, long). Special methods for seeking to the earliest and latest offsets the server maintains are also available (seekToBeginning(Collection) and seekToEnd(Collection) respectively).
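
The effect of moving the position can be shown on a toy in-memory log, where the list index plays the role of the offset. PositionSketch is a hypothetical model and deliberately simplified: it has a single partition and rejects offsets outside the log. It is not the behaviour of the real client in every case.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical single-partition model; the list index stands in for the offset.
final class PositionSketch {
    private final List<String> log;
    private int position = 0; // offset of the next record poll will return

    PositionSketch(List<String> log) {
        this.log = new ArrayList<>(log);
    }

    void seek(int offset) {
        if (offset < 0 || offset > log.size()) {
            throw new IllegalArgumentException("offset outside the log: " + offset);
        }
        position = offset;
    }

    void seekToBeginning() { position = 0; }

    void seekToEnd() { position = log.size(); }

    // Returns up to maxRecords records starting at the position, then advances it.
    List<String> poll(int maxRecords) {
        int end = Math.min(log.size(), position + maxRecords);
        List<String> batch = new ArrayList<>(log.subList(position, end));
        position = end;
        return batch;
    }
}
```

Seeking backwards makes the next poll return records that were already consumed, and seeking to the end skips everything in between.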


Consumption Flow Control

If a consumer is assigned multiple partitions to fetch data from, it will try to consume from all of them at the same time, effectively giving these partitions the same priority for consumption. However, in some cases consumers may want to first focus on fetching from some subset of the assigned partitions at full speed, and only start fetching other partitions when these partitions have little or no data to consume.


Kafka supports dynamic control of consumption flows through pause(Collection) and resume(Collection), which respectively pause consumption on the specified assigned partitions and resume consumption on the specified paused partitions in future poll(Duration) calls.
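
A toy model makes the pause and resume semantics concrete: paused partitions keep their backlog and simply contribute nothing to poll until resumed. FlowControlSketch is a hypothetical class that works on partition numbers and in-memory queues. It only mirrors the idea and does not use the Kafka client.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model of per-partition flow control; not a Kafka API.
final class FlowControlSketch {
    private final Map<Integer, Deque<String>> pending = new LinkedHashMap<>();
    private final Set<Integer> paused = new HashSet<>();

    void append(int partition, String record) {
        pending.computeIfAbsent(partition, p -> new ArrayDeque<>()).add(record);
    }

    void pause(int partition) { paused.add(partition); }

    void resume(int partition) { paused.remove(partition); }

    // Returns at most one record from each partition that is not paused.
    List<String> poll() {
        List<String> batch = new ArrayList<>();
        for (Map.Entry<Integer, Deque<String>> entry : pending.entrySet()) {
            boolean isPaused = paused.contains(entry.getKey());
            if (!isPaused && !entry.getValue().isEmpty()) {
                batch.add(entry.getValue().poll());
            }
        }
        return batch;
    }
}
```

Pausing partition 1 lets the caller drain partition 0 first. After resume, the records that accumulated on partition 1 are returned in order.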


Reading Transactional Messages

Multi-threaded Processing

The Kafka consumer is NOT thread-safe. All network I/O happens in the thread of the application making the call. It is the responsibility of the user to ensure that multi-threaded access is properly synchronized. Un-synchronized access will result in ConcurrentModificationException.
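
The ConcurrentModificationException mentioned above suggests that concurrent use is detected and rejected rather than serialized with a lock. The guard below sketches that general technique under stated assumptions: SingleThreadGuard is a hypothetical class, it is deliberately not re-entrant, and it should not be read as the consumer's actual implementation.

```java
import java.util.ConcurrentModificationException;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical guard that rejects concurrent use instead of blocking; not Kafka's code.
final class SingleThreadGuard {
    private static final long NO_OWNER = -1L;
    private final AtomicLong ownerThreadId = new AtomicLong(NO_OWNER);

    // Call at the start of every guarded operation.
    // Simplification: a second acquire without release is rejected, even from the owner.
    void acquire() {
        long me = Thread.currentThread().getId();
        if (!ownerThreadId.compareAndSet(NO_OWNER, me)) {
            throw new ConcurrentModificationException(
                    "guarded object is not safe for multi-threaded access");
        }
    }

    // Call in a finally block at the end of every guarded operation.
    void release() {
        ownerThreadId.set(NO_OWNER);
    }
}
```

An object protected this way fails fast when a second thread calls in while an operation is in progress, which is why access from multiple threads has to be synchronized by the caller.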




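Notes on group rebalancing:

  • Rebalancing only happens when topics are consumed through subscribe. Partitions assigned manually with assign are never rebalanced.
  • A rebalance is triggered when a consumer joins or leaves the group, when partitions are added to a subscribed topic, or when a new topic matching a subscribed regex is created.
  • max.poll.interval.ms, the maximum allowed interval between poll calls, affects how long a group rebalance can take.
  • Within a group, a partition is assigned to only one consumer, while one consumer may be assigned several partitions.
  • If a consumer does not send a heartbeat within session.timeout.ms, or does not call poll within max.poll.interval.ms, it is treated as having left the group and its partitions are reassigned to other consumers.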






