Chapter 1 Introduction
The previous article covered how Kafka fetches metadata; this one walks through the partition assignment rules on the producer side. These rules decide exactly which partition of a Topic on the Broker each record is sent to.
Chapter 2 Detailed Steps
2.1 Determining the partition in the doSend method
Using the metadata obtained earlier, the producer decides which partition the record goes to:
org.apache.kafka.clients.producer.KafkaProducer#doSend
//TODO Based on the metadata, choose the partition the record should be sent to; key topic: producer partition assignment rules!
int partition = partition(record, serializedKey, serializedValue, cluster);
tp = new TopicPartition(record.topic(), partition);
org.apache.kafka.clients.producer.KafkaProducer#partition
private int partition(ProducerRecord<K, V> record, byte[] serializedKey, byte[] serializedValue, Cluster cluster) {
    // If the record specifies a partition, return it directly; otherwise let the partitioner choose one
    Integer partition = record.partition();
    return partition != null ?
            partition :
            partitioner.partition(
                    record.topic(), record.key(), serializedKey, record.value(), serializedValue, cluster);
}
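To make the branch above concrete, here are the three ways a ProducerRecord can be built; the topic name "demo" and the key/value strings are assumptions for illustration:

// Partition 2 is given explicitly: record.partition() is non-null, so the partitioner is never consulted
new ProducerRecord<>("demo", 2, "user-1", "payload");
// No partition, but a key: the partitioner hashes the key (see 2.3)
new ProducerRecord<>("demo", "user-1", "payload");
// Neither partition nor key: the sticky partition cache decides (see 2.4)
new ProducerRecord<>("demo", "payload");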
2.2 Partitioner decides which partition the data goes to
Partitioner is an interface. Let's first look at the implementations that ship with the source; a sketch of a custom implementation follows the list.
KeyPartitioner: partitions by key, taking the key's hash modulo the partition count
DefaultPartitioner: the default rule; depending on whether a key is present, it either hashes the key or goes through StickyPartitionCache
UniformStickyPartitioner: goes through StickyPartitionCache regardless of whether a key is specified
RoundRobinPartitioner: round-robin across partitions
MockPartitioner: a mock for tests (in the 2.7.0 source it simply does return 0;)
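Since Partitioner is a public interface, you can also plug in your own rule via partitioner.class. Below is a minimal sketch, not production code: the class name AuditAwarePartitioner and the "audit" key prefix are made up for illustration, and keyless records just get a random partition rather than a sticky one.

import java.util.Map;
import java.util.concurrent.ThreadLocalRandom;
import org.apache.kafka.clients.producer.Partitioner;
import org.apache.kafka.common.Cluster;
import org.apache.kafka.common.utils.Utils;

// Hypothetical example: pin records whose String key starts with "audit" to
// partition 0, and hash everything else the way DefaultPartitioner does
public class AuditAwarePartitioner implements Partitioner {
    @Override
    public void configure(Map<String, ?> configs) { }

    @Override
    public int partition(String topic, Object key, byte[] keyBytes,
                         Object value, byte[] valueBytes, Cluster cluster) {
        int numPartitions = cluster.partitionsForTopic(topic).size();
        if (key instanceof String && ((String) key).startsWith("audit"))
            return 0;
        if (keyBytes == null) // no key: random partition (no stickiness in this sketch)
            return Utils.toPositive(ThreadLocalRandom.current().nextInt()) % numPartitions;
        return Utils.toPositive(Utils.murmur2(keyBytes)) % numPartitions;
    }

    @Override
    public void close() { }
}

Register it with props.put(ProducerConfig.PARTITIONER_CLASS_CONFIG, AuditAwarePartitioner.class.getName()); when building the producer.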
2.3 The DefaultPartitioner partitioning rule
Here we focus on the default partitioning rule; let's keep reading:
org.apache.kafka.clients.producer.internals.DefaultPartitioner#partition(java.lang.String, java.lang.Object, byte[], java.lang.Object, byte[], org.apache.kafka.common.Cluster, int)
public int partition(String topic, Object key, byte[] keyBytes, Object value, byte[] valueBytes, Cluster cluster,
                     int numPartitions) {
    if (keyBytes == null) {
        //TODO No key specified: fall back to the sticky partition cache
        return stickyPartitionCache.partition(topic, cluster);
    }
    // hash the keyBytes to choose a partition
    //TODO A key was specified: murmur2-hash it and take the result modulo the partition count
    return Utils.toPositive(Utils.murmur2(keyBytes)) % numPartitions;
}
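A quick way to convince yourself that the keyed branch is deterministic is to recompute the same expression the source uses. In the sketch below, the partition count of 3 and the sample keys are assumptions for illustration:

import java.nio.charset.StandardCharsets;
import org.apache.kafka.common.utils.Utils;

public class KeyHashDemo {
    public static void main(String[] args) {
        int numPartitions = 3; // assumed partition count
        for (String key : new String[]{"user-1", "user-2", "user-1"}) {
            byte[] keyBytes = key.getBytes(StandardCharsets.UTF_8);
            // Same formula as DefaultPartitioner: murmur2 hash, forced positive, modulo
            int partition = Utils.toPositive(Utils.murmur2(keyBytes)) % numPartitions;
            System.out.println(key + " -> partition " + partition);
        }
    }
}

Both "user-1" lines print the same partition, which is why records with the same key always land in the same partition, as long as the topic's partition count does not change.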
2.4 StickyPartitionCache
org.apache.kafka.clients.producer.internals.StickyPartitionCache
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ThreadLocalRandom;
import org.apache.kafka.common.Cluster;
import org.apache.kafka.common.PartitionInfo;
import org.apache.kafka.common.utils.Utils;

public class StickyPartitionCache {
    //TODO topic -> partition
    private final ConcurrentMap<String, Integer> indexCache;

    public StickyPartitionCache() {
        this.indexCache = new ConcurrentHashMap<>();
    }

    public int partition(String topic, Cluster cluster) {
        // If the cache already has an entry for this topic, return it directly
        Integer part = indexCache.get(topic);
        if (part == null) {
            //TODO Nothing cached yet
            return nextPartition(topic, cluster, -1);
        }
        return part;
    }

    public int nextPartition(String topic, Cluster cluster, int prevPartition) {
        // Fetch all partitions of this topic from the metadata
        List<PartitionInfo> partitions = cluster.partitionsForTopic(topic);
        Integer oldPart = indexCache.get(topic);
        Integer newPart = oldPart;
        // Check that the current sticky partition for the topic is either not set or that the partition that
        // triggered the new batch matches the sticky partition that needs to be changed.
        if (oldPart == null || oldPart == prevPartition) {
            // Fetch the available partitions of this topic from the metadata
            List<PartitionInfo> availablePartitions = cluster.availablePartitionsForTopic(topic);
            if (availablePartitions.size() < 1) {
                //TODO No available partition: pick one at random from all partitions (random value modulo total count)
                Integer random = Utils.toPositive(ThreadLocalRandom.current().nextInt());
                newPart = random % partitions.size();
            } else if (availablePartitions.size() == 1) {
                //TODO Exactly one available partition
                newPart = availablePartitions.get(0).partition();
            } else {
                //TODO Several available partitions: randomly pick one that differs from the cached one
                while (newPart == null || newPart.equals(oldPart)) {
                    Integer random = Utils.toPositive(ThreadLocalRandom.current().nextInt());
                    newPart = availablePartitions.get(random % availablePartitions.size()).partition();
                }
            }
            // Only change the sticky partition if it is null or prevPartition matches the current sticky partition.
            if (oldPart == null) {
                // First time: put it into the cache
                indexCache.putIfAbsent(topic, newPart);
            } else {
                // Update the cache
                indexCache.replace(topic, prevPartition, newPart);
            }
            //TODO Return the cached partition for this topic
            return indexCache.get(topic);
        }
        //TODO Already cached and different from prevPartition: return the cached partition for this topic
        return indexCache.get(topic);
    }
}
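A note on prevPartition: the -1 passed by partition() above means "no batch has completed yet". A real partition number arrives via DefaultPartitioner#onNewBatch, which KafkaProducer invokes when the batch for the current sticky partition fills up; in the 2.7.0 source this boils down to a one-line delegation, sketched here:

public void onNewBatch(String topic, Cluster cluster, int prevPartition) {
    // The batch on prevPartition is done; let the cache pick a different partition
    stickyPartitionCache.nextPartition(topic, cluster, prevPartition);
}

This is what makes the cache "sticky": a topic keeps one partition for a whole batch and only switches when that batch completes.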
To sum up, Kafka's producer partition assignment rules (a runnable demo follows the list):
- If a partition is specified, the record is sent straight to that partition;
- If no partition but a key is specified, the key is hashed and taken modulo the total partition count from the metadata;
- If neither is specified, sticky partitioning applies: a random available partition is chosen and reused until the current batch completes, then a new one is picked.
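A minimal sketch to observe the third rule. The broker address localhost:9092 and the multi-partition topic "demo" are assumptions; note that linger.ms is raised so that keyless records actually share a batch:

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class StickyDemo {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.LINGER_MS_CONFIG, "100"); // let keyless records share a batch
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            for (int i = 0; i < 5; i++) {
                final int n = i;
                // Async sends inside the linger window join one batch -> one sticky partition
                producer.send(new ProducerRecord<>("demo", "msg-" + n),
                        (md, e) -> System.out.println("record " + n + " -> partition "
                                + (md == null ? "?" : md.partition())));
            }
            producer.flush();
        }
    }
}

With the default linger.ms=0 and a synchronous send(...).get() per record, each record would close its own batch and onNewBatch would rotate the sticky partition on every send, so the stickiness would be invisible.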
Today, 2021/6/20, is Father's Day. Happy Father's Day to all fathers out there, and good health to you all!