Kafka Made Simple: Workflow Analysis

Article link: https://blog.csdn.net/github_36444580/article/details/116162256

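Kafka Producer Workflow

Write Mode

The producer publishes messages to the broker in push mode. Each message is appended to a partition, so the write is a sequential disk write. Sequential disk writes can in some cases outperform even random memory access, which is one of the things that keeps Kafka's throughput high.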

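Partitions

Every message is sent to a topic. A topic is made up of partition logs, and each partition is physically a directory on disk. The structure is organized as follows:

[Figure omitted: a topic and its partition logs]

Within each partition the messages are ordered. Produced messages are continuously appended to the partition log, and each one is assigned an offset that is unique within that partition.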
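The append-and-offset behavior described above can be sketched with a toy in-memory log. This is only an illustration: the class name PartitionLogSketch and its methods are hypothetical and are not part of Kafka, which stores records in segment files on disk.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy in-memory stand-in for one partition log (hypothetical, not Kafka code). */
class PartitionLogSketch {
    private final List<String> records = new ArrayList<>();

    /** Appends a message to the end of the log and returns the offset assigned to it. */
    public long append(String message) {
        records.add(message);
        return records.size() - 1L;
    }

    /** Reads the message stored at the given offset. */
    public String read(long offset) {
        return records.get((int) offset);
    }
}
```

Because messages are only ever added at the end, offsets grow by one per message and never change once assigned, which is what lets a consumer remember its position as a single number.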

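1) Why partition

(1) It makes scaling out across the cluster easy. Each partition can be sized to fit the machine that hosts it, and a topic can consist of many partitions, so the cluster as a whole can handle data of any size.

(2) It increases concurrency, because reads and writes are done per partition.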

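2) How a partition is chosen

(1) If a partition is specified, it is used directly.

(2) If no partition is specified but a key is, the key is hashed to pick a partition.

(3) If neither a partition nor a key is specified, a partition is picked by round-robin. (This describes the older DefaultPartitioner; since Kafka 2.4 the default for keyless records is sticky partitioning.)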

DefaultPartitioner class:

    public int partition(String topic, Object key, byte[] keyBytes,
                         Object value, byte[] valueBytes, Cluster cluster) {
        List<PartitionInfo> partitions = cluster.partitionsForTopic(topic);
        int numPartitions = partitions.size();
        if (keyBytes == null) {
            int nextValue = nextValue(topic);
            List<PartitionInfo> availablePartitions = cluster.availablePartitionsForTopic(topic);
            if (availablePartitions.size() > 0) {
                int part = Utils.toPositive(nextValue) % availablePartitions.size();
                return availablePartitions.get(part).partition();
            } else {
                // no partitions are available, give a non-available partition
                return Utils.toPositive(nextValue) % numPartitions;
            }
        } else {
            // hash the keyBytes to choose a partition
            return Utils.toPositive(Utils.murmur2(keyBytes)) % numPartitions;
        }
    }
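To try the three selection rules without a Kafka dependency, here is a minimal standalone sketch. It makes several assumptions. PartitionChoiceSketch and choose are hypothetical names. Arrays.hashCode stands in for Kafka's murmur2 hash, so the partition numbers it yields will not match a real cluster. Partition availability is ignored.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicInteger;

/** Sketch of the three partition-selection rules (hypothetical helper, not Kafka's API). */
class PartitionChoiceSketch {
    // Round-robin counter used only for records that have neither partition nor key.
    private final AtomicInteger counter = new AtomicInteger(0);

    /**
     * @param explicitPartition partition chosen by the caller, or null
     * @param key               message key, or null
     * @param numPartitions     number of partitions in the topic
     */
    public int choose(Integer explicitPartition, String key, int numPartitions) {
        if (explicitPartition != null) {
            return explicitPartition; // rule 1: use the given partition as-is
        }
        if (key != null) {
            // rule 2: hash the key bytes (Arrays.hashCode is a stand-in for murmur2)
            int hash = Arrays.hashCode(key.getBytes(StandardCharsets.UTF_8));
            return (hash & 0x7fffffff) % numPartitions; // mask the sign bit, then take the modulus
        }
        // rule 3: round-robin over all partitions
        return (counter.getAndIncrement() & 0x7fffffff) % numPartitions;
    }
}
```

The useful property of rule 2 is that the same key always maps to the same partition as long as the partition count stays the same, so all messages for one key stay in order.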

Replicas (Replication)

A partition can have multiple replicas (set by default.replication.factor=N in server.properties). Without replication, once a broker goes down, the data of every partition on it can no longer be consumed, and producers can no longer write to those partitions. With replication, a leader has to be elected among the replicas of a partition. By default, producers and consumers interact only with that leader, while the other replicas act as followers and copy data from it.
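As a rough illustration of electing a leader among replicas, the sketch below picks the first replica in assignment order that is still in sync. The rule is a simplification and an assumption of this example, and LeaderElectionSketch and electLeader are hypothetical names rather than Kafka classes.

```java
import java.util.List;
import java.util.Optional;
import java.util.Set;

/** Simplified leader choice among replicas (illustrative only, not Kafka's controller code). */
class LeaderElectionSketch {
    /**
     * Returns the first broker id in assignment order that is still in sync,
     * or an empty Optional when no in-sync replica is left.
     */
    public static Optional<Integer> electLeader(List<Integer> assignedReplicas,
                                                Set<Integer> inSyncReplicas) {
        return assignedReplicas.stream()
                .filter(inSyncReplicas::contains)
                .findFirst();
    }
}
```

With replicas assigned to brokers 1, 2 and 3, and broker 1 down, the call returns broker 2 if it is still in sync. An empty result corresponds to the case where the partition has no eligible leader.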

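Write Flow

A producer writes a message as follows:

1) The producer first finds the leader of the partition. That information is kept in the ZooKeeper node "/brokers/.../state"; current clients obtain it through a metadata request to a broker rather than by reading ZooKeeper themselves.

2) The producer sends the message to that leader.

3) The leader writes the message to its local log.

4) The followers pull the message from the leader, write it to their local logs, and send an ACK to the leader.

5) Once the leader has received ACKs from all replicas in the ISR (the in-sync replica set), it advances the HW (high watermark, the offset of the last committed message) and sends an ACK to the producer.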
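Steps 2 to 5 can be simulated in a few lines. This is a sketch under stated assumptions, not Kafka's replication code. ReplicatedWriteSketch and its members are hypothetical names, every follower is assumed to be in the ISR, and the high watermark is modeled as the number of messages present on every replica.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of one partition with a leader and followers (illustrative only). */
class ReplicatedWriteSketch {
    /** One replica's local log; its size plays the role of the log end offset (LEO). */
    static class Replica {
        final List<String> log = new ArrayList<>();
        long logEndOffset() { return log.size(); }
    }

    final Replica leader = new Replica();
    final List<Replica> followers = new ArrayList<>();
    long highWatermark = 0;

    ReplicatedWriteSketch(int followerCount) {
        for (int i = 0; i < followerCount; i++) {
            followers.add(new Replica());
        }
    }

    /** Steps 2-3: the leader appends the message to its local log. */
    void produce(String message) {
        leader.log.add(message);
    }

    /** Step 4: a follower pulls whatever it is missing, which also acts as its ACK. */
    void fetch(Replica follower) {
        for (long i = follower.logEndOffset(); i < leader.logEndOffset(); i++) {
            follower.log.add(leader.log.get((int) i));
        }
        advanceHighWatermark();
    }

    /** Step 5: the HW moves up to the smallest LEO among the leader and all followers. */
    private void advanceHighWatermark() {
        long min = leader.logEndOffset();
        for (Replica f : followers) {
            min = Math.min(min, f.logEndOffset());
        }
        highWatermark = Math.max(highWatermark, min);
    }
}
```

With two followers, the high watermark stays at 0 after the first follower fetches a new message and only moves to 1 once the second follower has fetched it too. That is the point at which the leader in the flow above would acknowledge the producer.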

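How the Broker Stores Messages

Storage Layout

Physically, a topic is split into one or more partitions (set by num.partitions=3 in server.properties). Each partition corresponds to a folder that holds all of that partition's message and index files, for example: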

[atg@hadoop102 logs]$ ll
drwxrwxr-x. 2 atg atg  4096 8月   6 14:37 first-0
drwxrwxr-x. 2 atg atg  4096 8月   6 14:35 first-1
drwxrwxr-x. 2 atg atg  4096 8月   6 14:37 first-2
[atg@hadoop102 logs]$ cd first-0
[atg@hadoop102 first-0]$ ll
-rw-rw-r--. 1 atg atg 10485760 8月   6 14:33 00000000000000000000.index
-rw-rw-r--. 1 atg atg      219 8月   6 15:07 00000000000000000000.log
-rw-rw-r--. 1 atg atg 10485756 8月   6 14:33 00000000000000000000.timeindex
-rw-rw-r--. 1 atg atg        8 8月   6 14:37 leader-epoch-checkpoint
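The names in the listing follow a simple pattern: the folder is the topic name plus the partition number, and each segment file is named after the offset of its first message, zero-padded to 20 digits. The helper below reproduces that pattern. SegmentNamingSketch and its methods are hypothetical names used only for this illustration.

```java
/** Reproduces the directory and file naming seen in the listing (illustrative helper). */
class SegmentNamingSketch {
    /** Folder name of a partition, e.g. topic "first" and partition 0 give "first-0". */
    public static String partitionDir(String topic, int partition) {
        return topic + "-" + partition;
    }

    /** Segment file name: the base offset padded to 20 digits, then the suffix. */
    public static String segmentFile(long baseOffset, String suffix) {
        return String.format("%020d.%s", baseOffset, suffix);
    }
}
```

For example, segmentFile(0, "log") yields 00000000000000000000.log, matching the file shown in first-0.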