Kafka consumer group summary


License: CC 4.0 BY-SA


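This article describes how consumer groups in Kafka's high-level API work and what to watch out for. Consumer groups are coordinated through ZooKeeper, which handles load balancing and offset management automatically. The article covers the relationship between consumer threads and partitions, ordering guarantees, the effects of rebalancing, and the offset update strategy. The example shows how to create and run a pool of consumer threads.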

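Kafka's consumer API comes in two flavors, a high-level API and a low-level API. The demo in this article uses the high-level API, which spares you from maintaining consumption state, handling load balancing, or tracking offsets yourself.
Some caveats for the high-level API:
1. If a consumer group has more consumer threads than the topic has partitions, some threads will never receive any messages. Kafka does not allow concurrent consumption of a single partition within a group, so do not run more consumer threads than partitions.

2. If a consumer group has fewer consumer threads than partitions, some threads will receive messages from more than one partition. Ordering across partitions is not guaranteed; Kafka only guarantees ordering within a single partition.

3. Adding or removing consumers, brokers, or partitions triggers a rebalance, after which the partitions assigned to each consumer may change.

4. With the high-level interface, the consumer blocks when there is no data to fetch.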
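
To make caveats 1 to 3 concrete, here is a minimal sketch of a range-style split of partitions across consumer threads. It is only a model for illustration: the class `RangeAssignmentSketch` and its `assign` method are hypothetical names, and the real assignment inside Kafka may differ in detail. It shows threads left idle when there are more threads than partitions, threads owning several partitions when there are fewer, and ownership shifting when the thread count changes (as it would after a rebalance).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class RangeAssignmentSketch {

    // Splits partitions 0..numPartitions-1 into contiguous ranges, one range per thread.
    // The first (numPartitions % numThreads) threads get one extra partition.
    static Map<Integer, List<Integer>> assign(int numPartitions, int numThreads) {
        Map<Integer, List<Integer>> result = new LinkedHashMap<>();
        int base = numPartitions / numThreads;
        int extra = numPartitions % numThreads;
        for (int t = 0; t < numThreads; t++) {
            int start = base * t + Math.min(t, extra);
            int count = base + (t < extra ? 1 : 0);
            List<Integer> owned = new ArrayList<>();
            for (int p = start; p < start + count; p++) {
                owned.add(p);
            }
            result.put(t, owned);
        }
        return result;
    }

    public static void main(String[] args) {
        // More threads than partitions: threads 4 and 5 own nothing.
        System.out.println(assign(4, 6)); // {0=[0], 1=[1], 2=[2], 3=[3], 4=[], 5=[]}
        // Fewer threads than partitions: each thread reads two partitions.
        System.out.println(assign(4, 2)); // {0=[0, 1], 1=[2, 3]}
        // Adding a third thread changes who owns which partition.
        System.out.println(assign(4, 3)); // {0=[0, 1], 1=[2], 2=[3]}
    }
}
```

A thread that owns two partitions sees the messages of both interleaved, which is why ordering only holds per partition.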

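Key points about consumer groups (high-level API):
1. Topics are subscribed to per consumer group: the consumers in a group jointly consume a topic.
2. A consumer group consumes messages from the Kafka cluster through ZooKeeper, which coordinates the process.
Whereas the low-level API leaves offset management to you, the high-level API hands it to ZooKeeper. It does not update ZooKeeper after every message, though; it commits offsets periodically (every auto.commit.interval.ms, which the example below sets to 1000 ms). A restarted consumer may therefore receive messages it has already processed, and the same can happen when a partition leader changes. For this reason, it is best to wait a while (for example 10 s) before calling shutdown when closing a consumer.
3. One design goal of consumer groups is to let an application consume the data in a topic with multiple threads at once.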
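
Because offsets are committed periodically, a consumer has to tolerate replayed messages. One possible way to cope, sketched below under stated assumptions, is to remember the highest offset already processed per partition and skip anything at or below it. `DuplicateSkipSketch` and `handle` are hypothetical names, the sketch assumes the application can read each record's partition and offset, and the in-memory map would have to be persisted together with the processing results to survive a restart.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DuplicateSkipSketch {
    // Highest offset already processed, keyed by partition (in memory only in this sketch).
    private final Map<Integer, Long> lastProcessed = new HashMap<>();
    // Payloads that were actually processed, kept here so the effect is visible.
    private final List<String> handled = new ArrayList<>();

    // Returns true if the record was processed, false if it was skipped as a replay.
    boolean handle(int partition, long offset, String payload) {
        Long last = lastProcessed.get(partition);
        if (last != null && offset <= last) {
            return false; // already seen on this partition
        }
        handled.add(payload);
        lastProcessed.put(partition, offset);
        return true;
    }

    List<String> handled() {
        return handled;
    }

    public static void main(String[] args) {
        DuplicateSkipSketch sketch = new DuplicateSkipSketch();
        sketch.handle(0, 5, "a");
        sketch.handle(0, 6, "b");
        // Replay of offset 5 on partition 0, e.g. after a restart: skipped.
        System.out.println(sketch.handle(0, 5, "a")); // false
        // Same offset on another partition is a different record: processed.
        System.out.println(sketch.handle(1, 5, "c")); // true
        System.out.println(sketch.handled());         // [a, b, c]
    }
}
```

The check is per partition because offsets are only meaningful within a single partition.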

Example:

import kafka.consumer.ConsumerIterator;
import kafka.consumer.KafkaStream;

// Worker that drains one KafkaStream and prints every message it receives.
public class ConsumerTest implements Runnable {
    private final KafkaStream<byte[], byte[]> m_stream;
    private final int m_threadNumber;

    public ConsumerTest(KafkaStream<byte[], byte[]> a_stream, int a_threadNumber) {
        m_threadNumber = a_threadNumber;
        m_stream = a_stream;
    }

    @Override
    public void run() {
        ConsumerIterator<byte[], byte[]> it = m_stream.iterator();
        // hasNext() blocks while no message is available
        while (it.hasNext()) {
            System.out.println("Thread " + m_threadNumber + ": " + new String(it.next().message()));
        }
        System.out.println("Shutting down Thread: " + m_threadNumber);
    }
}

// Configure the ZooKeeper connection
private static ConsumerConfig createConsumerConfig(String a_zookeeper, String a_groupId) {
    Properties props = new Properties();
    props.put("zookeeper.connect", a_zookeeper);   // ZooKeeper connection string
    props.put("group.id", a_groupId);              // id of the consumer group
    props.put("zookeeper.session.timeout.ms", "400");
    props.put("zookeeper.sync.time.ms", "200");
    props.put("auto.commit.interval.ms", "1000");  // commit offsets every 1000 ms
    return new ConsumerConfig(props);
}

// Create a pool of consumer threads
public void run(int a_numThreads) {
    // Ask for a_numThreads streams for the topic
    Map<String, Integer> topicCountMap = new HashMap<String, Integer>();
    topicCountMap.put(topic, a_numThreads);
    Map<String, List<KafkaStream<byte[], byte[]>>> consumerMap = consumer.createMessageStreams(topicCountMap);
    List<KafkaStream<byte[], byte[]>> streams = consumerMap.get(topic);

    // Now launch all the threads, one per stream
    executor = Executors.newFixedThreadPool(a_numThreads);

    // Now create an object to consume the messages of each stream
    int threadNumber = 0;
    for (final KafkaStream<byte[], byte[]> stream : streams) {
        executor.submit(new ConsumerTest(stream, threadNumber));
        threadNumber++;
    }
}

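// Shut down after a while
try {
    Thread.sleep(10000);
} catch (InterruptedException ie) {
    // Restore the interrupt flag instead of silently swallowing it
    Thread.currentThread().interrupt();
}
example.shutdown();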
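
The article does not show the body of `shutdown()`. As a rough sketch of the thread-pool half of such a method, the hypothetical helper below stops accepting new work and waits a bounded time for the workers to finish. In a real consumer, the consumer connector would also need to be shut down so that the blocked iterators return; that part is left out here so that the sketch runs without a Kafka cluster.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class GracefulShutdownSketch {

    // Returns true if all worker threads finished within the timeout.
    static boolean shutdownAndAwait(ExecutorService executor, long timeoutMs) {
        executor.shutdown(); // no new tasks; running tasks continue
        try {
            if (executor.awaitTermination(timeoutMs, TimeUnit.MILLISECONDS)) {
                return true;
            }
            executor.shutdownNow(); // interrupt workers that are still running
            return false;
        } catch (InterruptedException e) {
            executor.shutdownNow();
            Thread.currentThread().interrupt(); // preserve the caller's interrupt
            return false;
        }
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        pool.submit(() -> System.out.println("worker done"));
        System.out.println("clean exit: " + shutdownAndAwait(pool, 5000));
    }
}
```

A `false` result means some workers were still busy at the timeout, which is a hint that offsets for in-flight messages may not have been committed.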

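3 comments

xxoo109 2017.05.23
The code appears to start a single consumer and then spin up N threads to process the data.

Vanquishing 2016.10.01
Great code, I learned a lot! One question: in Kafka, is the group concept only relevant to consumers, i.e. do only consumers have groups? From the code, it looks to me like this thread pool is a single consumer that just starts several threads to read different partitions, and there are never multiple consumers. Maybe I just did not understand it. Could you explain? Thanks.
- 骑驴小子 replying to Vanquishing 2018.04.28
  This code is pretty messy...
- 骑驴小子 replying to Vanquishing 2018.04.28
  Right, consumer groups only concern consumers. From my own experiments: within a consumer group, once one consumer has consumed a message, the others will not consume it. Consumers in the same group take turns consuming the messages of the same topic.
- 丶小柒灬 replying to Vanquishing 2018.01.17
  Creating a consumer means creating a consumer group. The threads inside a consumer group act as separate consumer instances that share one unique group id. When consuming a topic, all instances in the group divide up the work through rebalancing, each consuming its own partitions.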