This post is a partial translation of the official Kafka documentation.
----------------------------------------
4. Distribution
The partitions of the log are distributed over the servers in the Kafka cluster with each server handling data and requests for a share of the partitions. Each partition is replicated across a configurable number of servers for fault tolerance.
(The partitions of the log are spread over the servers in the Kafka cluster, each server handling data and requests for its share of the partitions. Each partition is replicated across a configurable number of servers for fault tolerance.)
Each partition has one server which acts as the “leader” and zero or more servers which act as “followers”. The leader handles all read and write requests for the partition while the followers passively replicate the leader. If the leader fails, one of the followers will automatically become the new leader. Each server acts as a leader for some of its partitions and a follower for others so load is well balanced within the cluster.
(Among the replicas of each partition, exactly one acts as the leader and the remaining zero or more act as followers. The leader handles all read and write requests for the partition, while the followers passively replicate its data. If the machine hosting the leader goes down, one of the followers automatically becomes the new leader. Each server acts as the leader for some of its partitions and a follower for others, so load is well balanced across the cluster.)
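The leader/follower failover described above can be sketched in a few lines. This is a toy model, not Kafka's actual controller logic; the class and broker names are hypothetical:

```python
# Toy model of one partition's replica set: the first live replica acts
# as leader; if the leader's broker fails, a surviving follower is promoted.
class PartitionReplicas:
    def __init__(self, brokers):
        self.replicas = list(brokers)      # replicas[0] starts as leader
        self.leader = self.replicas[0]

    def handle_broker_failure(self, broker):
        """Drop a failed broker; promote a follower if it was the leader."""
        self.replicas.remove(broker)
        if self.leader == broker:
            if not self.replicas:
                raise RuntimeError("all replicas lost")
            self.leader = self.replicas[0]

p = PartitionReplicas(["broker-1", "broker-2", "broker-3"])
p.handle_broker_failure("broker-1")   # the leader fails
print(p.leader)                       # prints "broker-2": a follower took over
```

Reads and writes keep flowing as long as any replica survives, which is exactly the fault-tolerance property the paragraph describes.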
5. Producers
Producers publish data to the topics of their choice. The producer is responsible for choosing which record to assign to which partition within the topic. This can be done in a round-robin fashion simply to balance load or it can be done according to some semantic partition function (say based on some key in the record). More on the use of partitioning in a second!
(Producers publish data to the topics of their choice, and the producer decides which partition within the topic each record is assigned to. This can be a simple round-robin policy to balance load, or a custom partition function, say one based on a key in the record.)
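Both partitioning strategies mentioned above fit in a small sketch. This is illustrative only: `zlib.crc32` stands in for the hash the real Kafka producer uses, and the partition count is assumed:

```python
import itertools
import zlib

NUM_PARTITIONS = 4                                 # assumed topic size
_round_robin = itertools.cycle(range(NUM_PARTITIONS))

def choose_partition(key):
    """Keyed records hash to a stable partition; keyless records
    are spread round-robin. (crc32 is a stand-in hash, not Kafka's.)"""
    if key is None:
        return next(_round_robin)
    return zlib.crc32(key.encode()) % NUM_PARTITIONS

# The same key always lands in the same partition:
assert choose_partition("user-42") == choose_partition("user-42")
# Keyless records cycle through the partitions in turn:
print([choose_partition(None) for _ in range(4)])   # [0, 1, 2, 3]
```

Hashing by key is what makes the per-key ordering discussed later possible: all records for one key go to one partition.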
6. Consumers
Consumers label themselves with a consumer group name, and each record published to a topic is delivered to one consumer instance within each subscribing consumer group. Consumer instances can be in separate processes or on separate machines.
(Consumers label themselves with a consumer group name, and each record published to a topic is delivered to exactly one consumer instance within each subscribing consumer group. Consumer instances can run in separate processes or on separate machines.)
If all the consumer instances have the same consumer group, then the records will effectively be load balanced over the consumer instances.
If all the consumer instances have different consumer groups, then each record will be broadcast to all the consumer processes.
(If all the consumer instances share one consumer group, the cluster effectively load-balances the records over those instances. If the instances belong to different consumer groups, each record is broadcast to every group, as the figure below shows:)
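The two delivery modes can be captured in one toy function: each subscribing group receives every record, but only one instance inside each group gets it. The group and instance names below are made up for illustration:

```python
# groups maps group name -> {instance: [partitions it owns]}.
# A record from some partition reaches exactly one instance per group.
def deliver(record_partition, groups):
    receivers = {}
    for name, assignment in groups.items():
        for instance, partitions in assignment.items():
            if record_partition in partitions:
                receivers[name] = instance
    return receivers

groups = {
    "A": {"a1": [0, 1], "a2": [2, 3]},                   # two instances
    "B": {"b1": [0], "b2": [1], "b3": [2], "b4": [3]},   # four instances
}
print(deliver(2, groups))   # {'A': 'a2', 'B': 'b3'}
```

One group gives you a load-balanced queue; many groups give you publish-subscribe, which is the point of the figure below.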
A two server Kafka cluster hosting four partitions (P0-P3) with two consumer groups. Consumer group A has two consumer instances and group B has four.
(As the figure shows: a cluster of two Kafka servers hosts four partitions (P0-P3) and serves two consumer groups; group A has two consumer instances and group B has four.)
More commonly, however, we have found that topics have a small number of consumer groups, one for each “logical subscriber”. Each group is composed of many consumer instances for scalability and fault tolerance. This is nothing more than publish-subscribe semantics where the subscriber is a cluster of consumers instead of a single process.
(More commonly, however, a topic has a small number of consumer groups, one for each "logical subscriber". Each group is composed of many consumer instances for scalability and fault tolerance; this is simply publish-subscribe semantics where the subscriber is a cluster of consumers rather than a single process.)
The way consumption is implemented in Kafka is by dividing up the partitions in the log over the consumer instances so that each instance is the exclusive consumer of a “fair share” of partitions at any point in time. This process of maintaining membership in the group is handled by the Kafka protocol dynamically. If new instances join the group they will take over some partitions from other members of the group; if an instance dies, its partitions will be distributed to the remaining instances.
(Kafka implements consumption by dividing the partitions of the log fairly among the consumer instances in a group, so that at any point in time each partition belongs to exactly one instance in the group. Group membership is maintained dynamically by the Kafka protocol: if new instances join the group they take over some partitions from other members, and if an instance dies its partitions are redistributed to the remaining instances.)
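The "fair share" division and the effect of a rebalance can be sketched with a simple round-robin style assignor. This is a simplification of what Kafka's assignors actually do; member names are hypothetical:

```python
# Deal partitions out to sorted members in turn, so each partition has
# exactly one owner and the load is spread as evenly as possible.
def assign(partitions, members):
    members = sorted(members)
    return {p: members[i % len(members)] for i, p in enumerate(partitions)}

partitions = [0, 1, 2, 3]
print(assign(partitions, ["c1", "c2"]))
# {0: 'c1', 1: 'c2', 2: 'c1', 3: 'c2'}

# A new member joins the group -> reassignment moves some partitions to it:
print(assign(partitions, ["c1", "c2", "c3"]))
# {0: 'c1', 1: 'c2', 2: 'c3', 3: 'c1'}
```

Note that with more members than partitions, some members would own nothing, which foreshadows the single-partition caveat below.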
Kafka only provides a total order over records within a partition, not between different partitions in a topic. Per-partition ordering combined with the ability to partition data by key is sufficient for most applications. However, if you require a total order over records this can be achieved with a topic that has only one partition, though this will mean only one consumer process per consumer group.
(Kafka only guarantees the order of records within a single partition, not across different partitions of a topic. For most applications, per-partition ordering combined with the ability to route records to partitions by key is sufficient. However, if you need a total order over all records, the workaround is a topic with a single partition, which means at most one consuming process per consumer group at any point in time. PS: the last sentence deserves an experiment when I find the time, e.g. give a topic two partitions but put three consumers in its group, and check whether one consumer receives no messages.)
7. Guarantees
At a high-level Kafka gives the following guarantees:
- Messages sent by a producer to a particular topic partition will be appended in the order they are sent. That is, if a record M1 is sent by the same producer as a record M2, and M1 is sent first, then M1 will have a lower offset than M2 and appear earlier in the log.
(Records are appended to a given partition in the order the producer sends them.)
- A consumer instance sees records in the order they are stored in the log.
(A consumer instance sees messages in the same order they are stored in the log file.)
- For a topic with replication factor N, we will tolerate up to N-1 server failures without losing any records committed to the log.
(For a topic with replication factor N, up to N-1 server failures can be tolerated without losing any record committed to the log.)