How ZooKeeper and Kafka work together

Reposted 2016-05-31 19:17:54




First of all, ZooKeeper is needed only for the high-level consumer. SimpleConsumer does not require ZooKeeper to work.

The main reason ZooKeeper is needed for a high-level consumer is to track consumed offsets and handle load balancing.

Now in more detail.

Regarding offset tracking, imagine the following scenario: you start a consumer, consume 100 messages, and shut the consumer down. The next time you start the consumer, you'll probably want to resume from your last consumed offset (which is 100), and that means you have to store the maximum consumed offset somewhere. This is where ZooKeeper comes in: it stores offsets for every group/topic/partition. So the next time you start your consumer, it can ask, "Hey ZooKeeper, what offset should I start consuming from?" Kafka is actually moving towards being able to store offsets not only in ZooKeeper but in other stores as well (for now only the ZooKeeper and Kafka offset stores are available, and I'm not sure the Kafka store is fully implemented).
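
The resume-from-last-offset scenario above can be sketched in a few lines of Python. This is only an illustration, not Kafka or ZooKeeper client code: `InMemoryOffsetStore` and `run_consumer` are hypothetical names, a plain dict stands in for ZooKeeper, and the path-like key is an assumed layout used to show the group/topic/partition keying.

```python
class InMemoryOffsetStore:
    """Hypothetical stand-in for ZooKeeper: maps a path-like key to an offset."""

    def __init__(self):
        self._data = {}

    @staticmethod
    def _key(group, topic, partition):
        # Assumed, illustrative layout: one entry per group/topic/partition.
        return f"/consumers/{group}/offsets/{topic}/{partition}"

    def commit(self, group, topic, partition, next_offset):
        # Remember the offset the consumer should resume from.
        self._data[self._key(group, topic, partition)] = next_offset

    def fetch(self, group, topic, partition, default=0):
        # A group that has never committed starts from `default`.
        return self._data.get(self._key(group, topic, partition), default)


def run_consumer(store, log, group, topic, partition, max_messages):
    """Consume up to max_messages from `log`, resuming at the stored offset."""
    start = store.fetch(group, topic, partition)
    batch = log[start:start + max_messages]
    store.commit(group, topic, partition, start + len(batch))
    return batch


store = InMemoryOffsetStore()
log = [f"msg-{i}" for i in range(250)]  # one partition, modeled as a list

first = run_consumer(store, log, "g1", "events", 0, 100)   # first run
second = run_consumer(store, log, "g1", "events", 0, 100)  # after a "restart"
print(first[0], second[0])
```

Because the store outlives each `run_consumer` call, the second run picks up at offset 100 instead of re-reading the first 100 messages. A different group name would get its own key and start from the beginning.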

Regarding load balancing, the volume of messages produced can be too large for one machine to handle, and you'll probably want to add computing power at some point. Let's say you have a topic with 100 partitions, and you have 10 machines to handle that volume of messages. Several questions arise here:

  • how should these 10 machines divide the partitions among themselves?
  • what happens if one of the machines dies?
  • what happens if you want to add another machine?

And again, this is where ZooKeeper comes in: it tracks all consumers in a group, and each high-level consumer subscribes to changes in that group. The point is that when a consumer appears or disappears, ZooKeeper notifies all consumers and triggers a rebalance so that they split the partitions near-equally (i.e. to balance the load). This guarantees that if one consumer dies, the others will continue processing the partitions it owned.
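
To make the three questions concrete, here is a minimal sketch of a near-equal split, assuming a simple contiguous-range scheme over sorted consumer names. The function `assign_partitions` and the consumer names are hypothetical, and the scheme is an illustration of "near-equal" rather than a claim about the exact algorithm Kafka runs. A rebalance is modeled as recomputing the assignment whenever group membership changes.

```python
def assign_partitions(consumers, num_partitions):
    """Split partitions 0..num_partitions-1 into contiguous, near-equal ranges."""
    members = sorted(consumers)  # every member must compute the same order
    base, extra = divmod(num_partitions, len(members))
    assignment = {}
    start = 0
    for i, member in enumerate(members):
        # The first `extra` members take one additional partition each.
        count = base + (1 if i < extra else 0)
        assignment[member] = list(range(start, start + count))
        start += count
    return assignment


group = [f"consumer-{i:02d}" for i in range(10)]

# 10 machines, 100 partitions: 10 partitions each.
before = assign_partitions(group, 100)

# One machine dies: the remaining 9 recompute and absorb its partitions.
survivors = [c for c in group if c != "consumer-03"]
after_failure = assign_partitions(survivors, 100)

# A machine is added: all 11 recompute and each owns slightly fewer.
after_join = assign_partitions(group + ["consumer-10"], 100)

print(sorted(len(p) for p in after_failure.values()))
```

In every case each partition has exactly one owner and no consumer holds more than one partition above any other, which is the property the rebalance is meant to preserve.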

