Chapter 1 Introduction
The previous article covered how Kafka fetches metadata; this one walks through the partition assignment rules on the producer side. These rules decide exactly which partition of a Topic on the Broker each record is sent to.
Chapter 2 Detailed Steps
2.1 Determining the partition in the doSend method
Using the metadata obtained earlier, the producer decides which partition the record goes to:
org.apache.kafka.clients.producer.KafkaProducer#doSend
//TODO Based on the metadata, choose the partition the record should be sent to; key topic: producer partition assignment rules!
int partition = partition(record, serializedKey, serializedValue, cluster);
tp = new TopicPartition(record.topic(), partition);
org.apache.kafka.clients.producer.KafkaProducer#partition
private int partition(ProducerRecord<K, V> record, byte[] serializedKey, byte[] serializedValue, Cluster cluster) {
    // If the record specifies a partition, return it directly; otherwise let the partitioner choose one
    Integer partition = record.partition();
    return partition != null ?
            partition :
            partitioner.partition(
                    record.topic(), record.key(), serializedKey, record.value(), serializedValue, cluster);
}
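To make the branch above concrete, here are the three ways a ProducerRecord can be built; the topic name "demo" and the key/value strings are assumptions for illustration:

// Partition 2 is given explicitly: record.partition() is non-null, so the partitioner is never consulted
new ProducerRecord<>("demo", 2, "user-1", "payload");
// No partition, but a key: the partitioner hashes the key (see 2.3)
new ProducerRecord<>("demo", "user-1", "payload");
// Neither partition nor key: the sticky partition cache decides (see 2.4)
new ProducerRecord<>("demo", "payload");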
2.2 Partitioner decides which partition the data goes to
Partitioner is an interface. Let's first look at the implementations that ship with the source; a sketch of a custom implementation follows the list.
KeyPartitioner: partitions by key, taking the key's hash modulo the partition count
DefaultPartitioner: the default rule; depending on whether a key is present, it either hashes the key or goes through StickyPartitionCache
UniformStickyPartitioner: goes through StickyPartitionCache regardless of whether a key is specified
RoundRobinPartitioner: round-robin across partitions
MockPartitioner: a mock for tests (in the 2.7.0 source it simply does return 0;)
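Since Partitioner is a public interface, you can also plug in your own rule via partitioner.class. Below is a minimal sketch, not production code: the class name AuditAwarePartitioner and the "audit" key prefix are made up for illustration, and keyless records just get a random partition rather than a sticky one.

import java.util.Map;
import java.util.concurrent.ThreadLocalRandom;
import org.apache.kafka.clients.producer.Partitioner;
import org.apache.kafka.common.Cluster;
import org.apache.kafka.common.utils.Utils;

// Hypothetical example: pin records whose String key starts with "audit" to
// partition 0, and hash everything else the way DefaultPartitioner does
public class AuditAwarePartitioner implements Partitioner {
    @Override
    public void configure(Map<String, ?> configs) { }

    @Override
    public int partition(String topic, Object key, byte[] keyBytes,
                         Object value, byte[] valueBytes, Cluster cluster) {
        int numPartitions = cluster.partitionsForTopic(topic).size();
        if (key instanceof String && ((String) key).startsWith("audit"))
            return 0;
        if (keyBytes == null) // no key: random partition (no stickiness in this sketch)
            return Utils.toPositive(ThreadLocalRandom.current().nextInt()) % numPartitions;
        return Utils.toPositive(Utils.murmur2(keyBytes)) % numPartitions;
    }

    @Override
    public void close() { }
}

Register it with props.put(ProducerConfig.PARTITIONER_CLASS_CONFIG, AuditAwarePartitioner.class.getName()); when building the producer.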
2.3 The DefaultPartitioner partitioning rule
Here we focus on the default partitioning rule; let's keep reading:
org.apache.kafka.clients.producer.internals.DefaultPartitioner#partition(java.lang.String, java.lang.Object, byte[], java.lang.Object, byte[], org.apache.kafka.common.Cluster, int)
public int partition(String topic, Object key, byte[] keyBytes, Object value, byte[] valueBytes, Cluster cluster,
                     int numPartitions) {
    if (keyBytes == null) {
        //TODO No key specified: fall back to the sticky partition cache
        return stickyPartitionCache.partition(topic, cluster);
    }
    // hash the keyBytes to choose a partition
    //TODO A key was specified: murmur2-hash it and take the result modulo the partition count
    return Utils.toPositive(Utils.murmur2(keyBytes)) % numPartitions;
}
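A quick way to convince yourself that the keyed branch is deterministic is to recompute the same expression the source uses. In the sketch below, the partition count of 3 and the sample keys are assumptions for illustration:

import java.nio.charset.StandardCharsets;
import org.apache.kafka.common.utils.Utils;

public class KeyHashDemo {
    public static void main(String[] args) {
        int numPartitions = 3; // assumed partition count
        for (String key : new String[]{"user-1", "user-2", "user-1"}) {
            byte[] keyBytes = key.getBytes(StandardCharsets.UTF_8);
            // Same formula as DefaultPartitioner: murmur2 hash, forced positive, modulo
            int partition = Utils.toPositive(Utils.murmur2(keyBytes)) % numPartitions;
            System.out.println(key + " -> partition " + partition);
        }
    }
}

Both "user-1" lines print the same partition, which is why records with the same key always land in the same partition, as long as the topic's partition count does not change.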
2.4 StickyPartitionCache
org.apache.kafka.clients.producer.internals.StickyPartitionCache
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ThreadLocalRandom;
import org.apache.kafka.common.Cluster;
import org.apache.kafka.common.PartitionInfo;
import org.apache.kafka.common.utils.Utils;

public class StickyPartitionCache {
    //TODO topic -> partition
    private final ConcurrentMap<String, Integer> indexCache;

    public StickyPartitionCache() {
        this.indexCache = new ConcurrentHashMap<>();
    }

    public int partition(String topic, Cluster cluster) {
        // If the cache already has an entry for this topic, return it directly
        Integer part = indexCache.get(topic);
        if (part == null) {
            //TODO Nothing cached yet
            return nextPartition(topic, cluster, -1);
        }
        return part;
    }

    public int nextPartition(String topic, Cluster cluster, int prevPartition) {
        // Fetch all partitions of this topic from the metadata
        List<PartitionInfo> partitions = cluster.partitionsForTopic(topic);
        Integer oldPart = indexCache.get(topic);
        Integer newPart = oldPart;
        // Check that the current sticky partition for the topic is either not set or that the partition that
        // triggered the new batch matches the sticky partition that needs to be changed.
        if (oldPart == null || oldPart == prevPartition) {
            // Fetch the available partitions of this topic from the metadata
            List<PartitionInfo> availablePartitions = cluster.availablePartitionsForTopic(topic);
            if (availablePartitions.size() < 1) {
                //TODO No available partition: pick one at random from all partitions (random value modulo total count)
                Integer random = Utils.toPositive(ThreadLocalRandom.current().nextInt());
                newPart = random % partitions.size();
            } else if (availablePartitions.size() == 1) {
                //TODO Exactly one available partition
                newPart = availablePartitions.get(0).partition();
            } else {
                //TODO Several available partitions: randomly pick one that differs from the cached one
                while (newPart == null || newPart.equals(oldPart)) {
                    Integer random = Utils.toPositive(ThreadLocalRandom.current().nextInt());
                    newPart = availablePartitions.get(random % availablePartitions.size()).partition();
                }
            }
            // Only change the sticky partition if it is null or prevPartition matches the current sticky partition.
            if (oldPart == null) {
                // First time: put it into the cache
                indexCache.putIfAbsent(topic, newPart);
            } else {
                // Update the cache
                indexCache.replace(topic, prevPartition, newPart);
            }
            //TODO Return the cached partition for this topic
            return indexCache.get(topic);
        }
        //TODO Already cached and different from prevPartition: return the cached partition for this topic
        return indexCache.get(topic);
    }
}
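A note on prevPartition: the -1 passed by partition() above means "no batch has completed yet". A real partition number arrives via DefaultPartitioner#onNewBatch, which KafkaProducer invokes when the batch for the current sticky partition fills up; in the 2.7.0 source this boils down to a one-line delegation, sketched here:

public void onNewBatch(String topic, Cluster cluster, int prevPartition) {
    // The batch on prevPartition is done; let the cache pick a different partition
    stickyPartitionCache.nextPartition(topic, cluster, prevPartition);
}

This is what makes the cache "sticky": a topic keeps one partition for a whole batch and only switches when that batch completes.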
To sum up, Kafka's producer partition assignment rules (a runnable demo follows the list):
- If a partition is specified, the record is sent straight to that partition;
- If no partition but a key is specified, the key is hashed and taken modulo the total partition count from the metadata;
- If neither is specified, sticky partitioning applies: a random available partition is chosen and reused until the current batch completes, then a new one is picked.
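A minimal sketch to observe the third rule. The broker address localhost:9092 and the multi-partition topic "demo" are assumptions; note that linger.ms is raised so that keyless records actually share a batch:

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class StickyDemo {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.LINGER_MS_CONFIG, "100"); // let keyless records share a batch
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            for (int i = 0; i < 5; i++) {
                final int n = i;
                // Async sends inside the linger window join one batch -> one sticky partition
                producer.send(new ProducerRecord<>("demo", "msg-" + n),
                        (md, e) -> System.out.println("record " + n + " -> partition "
                                + (md == null ? "?" : md.partition())));
            }
            producer.flush();
        }
    }
}

With the default linger.ms=0 and a synchronous send(...).get() per record, each record would close its own batch and onNewBatch would rotate the sticky partition on every send, so the stickiness would be invisible.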
Today, 2021/6/20, is Father's Day. Happy Father's Day to all fathers out there, and good health to you all!