Understanding Kafka in Depth (3): A Source-Code Walkthrough of the Producer Partitioning Strategy

君子一笑

Category: MQ Message Middleware

Article link: https://blog.csdn.net/linux_one_piece/article/details/99732179

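Preface

Kafka organizes messages into topics and partitions: a topic has one or more partitions, and message data is stored in those partitions. A producer sends messages to a topic, and a consumer reads messages from the topics it is interested in. But which partition of the topic does a producer's message actually land in?
This article answers that question by reading the source code behind the producer's partitioning strategy.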

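Setting the number of partitions

There are two ways to set the number of partitions:

In the broker configuration file server.properties, which sets the default partition count per topic:

# default is 1
num.partitions=1

With the command-line tool that ships with Kafka. The new partition count must be greater than the current one, otherwise the change is rejected:

bin/kafka-topics.sh --zookeeper localhost:2181 --alter --topic topic-default --partitions 5

On newer Kafka releases, where kafka-topics.sh no longer accepts --zookeeper, pass --bootstrap-server with a broker address instead.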

The send methods of KafkaTemplate

The send overloads of the KafkaTemplate class (from Spring for Apache Kafka) look like this:

    public ListenableFuture<SendResult<K, V>> send(String topic, @Nullable V data) {
        ProducerRecord<K, V> producerRecord = new ProducerRecord(topic, data);
        return this.doSend(producerRecord);
    }

    public ListenableFuture<SendResult<K, V>> send(String topic, K key, @Nullable V data) {
        ProducerRecord<K, V> producerRecord = new ProducerRecord(topic, key, data);
        return this.doSend(producerRecord);
    }

    public ListenableFuture<SendResult<K, V>> send(String topic, Integer partition, K key, @Nullable V data) {
        ProducerRecord<K, V> producerRecord = new ProducerRecord(topic, partition, key, data);
        return this.doSend(producerRecord);
    }

    public ListenableFuture<SendResult<K, V>> send(String topic, Integer partition, Long timestamp, K key, @Nullable V data) {
        ProducerRecord<K, V> producerRecord = new ProducerRecord(topic, partition, timestamp, key, data);
        return this.doSend(producerRecord);
    }

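As these overloads show, a producer can name the target partition explicitly. Following the call chain further leads to the doSend() method of the KafkaProducer class:

    private Future<RecordMetadata> doSend(ProducerRecord<K, V> record, Callback callback) {
        ... // some code omitted
        int partition = this.partition(record, serializedKey, serializedValue, cluster);
        ... // some code omitted
    }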

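If the record specifies a partition, the message is sent straight to that partition; otherwise the configured partitioner picks one:

    private int partition(ProducerRecord<K, V> record, byte[] serializedKey, byte[] serializedValue, Cluster cluster) {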
        Integer partition = record.partition();
        return partition != null ? partition : this.partitioner.partition(record.topic(), record.key(), serializedKey, record.value(), serializedValue, cluster);
    }
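
The "explicit partition wins, otherwise ask the partitioner" rule can be sketched without any Kafka dependency. This is only an illustration under stated assumptions: PartitionChoiceSketch, choose and the fallback parameter are hypothetical names, and the ToIntFunction stands in for the configured partitioner.

```java
import java.util.function.ToIntFunction;

/** Hypothetical stand-in for the producer's partition-selection step. */
final class PartitionChoiceSketch {

    /**
     * Returns the explicit partition when one is set; otherwise delegates
     * to the fallback strategy, which plays the role of the partitioner.
     */
    static int choose(Integer explicitPartition, byte[] keyBytes, ToIntFunction<byte[]> fallback) {
        return explicitPartition != null ? explicitPartition : fallback.applyAsInt(keyBytes);
    }

    public static void main(String[] args) {
        ToIntFunction<byte[]> alwaysZero = key -> 0;          // trivial fallback for the demo
        System.out.println(choose(3, null, alwaysZero));      // explicit partition is used: 3
        System.out.println(choose(null, null, alwaysZero));   // fallback decides: 0
    }
}
```

Note that the fallback is never consulted when a partition is given, so an explicit partition bypasses any key-based or round-robin logic entirely.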

If the specified partition does not exist, the send fails with a TimeoutException once the wait for metadata expires:

    try {
        this.metadata.awaitUpdate(version, remainingWaitMs);
    } catch (TimeoutException var15) {
        throw new TimeoutException(String.format("Topic %s not present in metadata after %d ms.", topic, maxWaitMs));
    }
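
To make the timeout behaviour concrete, here is a minimal sketch of a deadline-bounded wait. It is not the KafkaProducer code: MetadataWaitSketch, awaitPartition and SketchTimeoutException are hypothetical names, the IntSupplier stands in for a metadata lookup, and the assumption that the wait continues until the requested partition is covered is mine, made only to illustrate how a nonexistent partition can end in a timeout.

```java
import java.util.function.IntSupplier;

final class MetadataWaitSketch {

    /** Hypothetical unchecked stand-in for Kafka's TimeoutException. */
    static final class SketchTimeoutException extends RuntimeException {
        SketchTimeoutException(String message) {
            super(message);
        }
    }

    /**
     * Polls the partition count until it covers the requested partition,
     * or throws once maxWaitMs has elapsed. Returns the observed count.
     */
    static int awaitPartition(String topic, int partition, IntSupplier partitionCount, long maxWaitMs) {
        long deadline = System.currentTimeMillis() + maxWaitMs;
        while (true) {
            int count = partitionCount.getAsInt();
            if (partition < count) {
                return count; // the requested partition exists
            }
            if (System.currentTimeMillis() >= deadline) {
                throw new SketchTimeoutException(
                        String.format("Topic %s not present in metadata after %d ms.", topic, maxWaitMs));
            }
            Thread.yield(); // a real client would block on a metadata update instead
        }
    }

    public static void main(String[] args) {
        // Partition 1 exists in a 3-partition topic, so this returns immediately.
        System.out.println(awaitPartition("topic-default", 1, () -> 3, 100));
    }
}
```

Calling awaitPartition with a partition index that the supplier never covers keeps polling until the deadline and then throws, which mirrors the symptom described above.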

The call this.partitioner.partition() delegates to the configured partitioner. In the client version examined here, the default implementation is DefaultPartitioner, whose partition() method looks like this:

    private final ConcurrentMap<String, AtomicInteger> topicCounterMap = new ConcurrentHashMap(); // per-topic counter

    public int partition(String topic, Object key, byte[] keyBytes, Object value, byte[] valueBytes, Cluster cluster) {
        List<PartitionInfo> partitions = cluster.partitionsForTopic(topic);
        int numPartitions = partitions.size();
        if (keyBytes == null) { // no key
            int nextValue = this.nextValue(topic); // the counter advances by 1 on every send
            List<PartitionInfo> availablePartitions = cluster.availablePartitionsForTopic(topic);
            if (availablePartitions.size() > 0) {
                int part = Utils.toPositive(nextValue) % availablePartitions.size(); // counter value modulo the number of available partitions
                return ((PartitionInfo)availablePartitions.get(part)).partition();
            } else {
                return Utils.toPositive(nextValue) % numPartitions; // no partition available: counter value modulo the total number of partitions
            }
        } else {
            return Utils.toPositive(Utils.murmur2(keyBytes)) % numPartitions; // murmur2 hash of the key modulo the number of partitions
        }
    }

    /**
     * Returns the topic's current counter value, then increments it.
     * The counter starts from a random integer.
     */
    private int nextValue(String topic) {
        AtomicInteger counter = (AtomicInteger)this.topicCounterMap.get(topic);
        if (null == counter) {
            counter = new AtomicInteger(ThreadLocalRandom.current().nextInt());
            AtomicInteger currentCounter = (AtomicInteger)this.topicCounterMap.putIfAbsent(topic, counter);
            if (currentCounter != null) {
                counter = currentCounter;
            }
        }
        return counter.getAndIncrement();
    }
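
The keyless branch above can be reproduced in a few lines of plain Java. The sketch below is an approximation under stated assumptions: RoundRobinSketch and nextPartition are hypothetical names, the counter starts at a caller-supplied value instead of a random one so the output is repeatable, and toPositive is written as a sign-bit mask, which is the behaviour the listing relies on. Masking rather than Math.abs matters because Math.abs(Integer.MIN_VALUE) is still negative and would produce an invalid partition index once the counter overflows.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

final class RoundRobinSketch {
    private final ConcurrentMap<String, AtomicInteger> counters = new ConcurrentHashMap<>();
    private final int start; // fixed start for a repeatable demo; the listing above uses a random start

    RoundRobinSketch(int start) {
        this.start = start;
    }

    /** Clears the sign bit so the result is never negative. */
    static int toPositive(int number) {
        return number & 0x7fffffff;
    }

    /** Advances the topic's counter and maps its previous value onto a partition index. */
    int nextPartition(String topic, int numPartitions) {
        AtomicInteger counter = counters.computeIfAbsent(topic, t -> new AtomicInteger(start));
        return toPositive(counter.getAndIncrement()) % numPartitions;
    }

    public static void main(String[] args) {
        RoundRobinSketch rr = new RoundRobinSketch(0);
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < 5; i++) {
            out.append(rr.nextPartition("topic-default", 3)).append(' ');
        }
        System.out.println(out.toString().trim()); // prints "0 1 2 0 1"
    }
}
```

Each topic gets its own counter, so traffic on one topic does not disturb the rotation on another.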

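The default partitioning strategy can therefore be summarized as follows:

If a partition is specified when sending, the message is delivered to that partition.
If no partition is specified but a key is, the partition is the murmur2 hash of the serialized key modulo the number of partitions.
If neither a partition nor a key is specified, a per-topic counter advances by 1 on every send, and its value modulo the number of available partitions selects the partition, which spreads messages round-robin.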
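
The three rules above fit into one small method. This is a sketch, not the Kafka implementation: DefaultStrategySketch is a hypothetical name, Arrays.hashCode is a stand-in for murmur2 (so the concrete partition numbers will differ from Kafka's), and a single counter replaces the per-topic map for brevity.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicInteger;

final class DefaultStrategySketch {
    private final AtomicInteger counter = new AtomicInteger(0); // single counter; Kafka keeps one per topic

    int partition(Integer explicitPartition, String key, int numPartitions) {
        if (explicitPartition != null) {
            return explicitPartition;                              // rule 1: explicit partition
        }
        if (key != null) {
            byte[] keyBytes = key.getBytes(StandardCharsets.UTF_8);
            int hash = Arrays.hashCode(keyBytes);                  // stand-in hash, NOT murmur2
            return (hash & 0x7fffffff) % numPartitions;            // rule 2: hash of the key
        }
        return (counter.getAndIncrement() & 0x7fffffff) % numPartitions; // rule 3: round-robin
    }

    public static void main(String[] args) {
        DefaultStrategySketch strategy = new DefaultStrategySketch();
        System.out.println(strategy.partition(4, "order-1", 5));    // explicit partition: 4
        // The same key maps to the same partition as long as numPartitions is unchanged.
        System.out.println(strategy.partition(null, "order-1", 5)
                == strategy.partition(null, "order-1", 5));         // true
        System.out.println(strategy.partition(null, null, 5));      // first keyless send: 0
    }
}
```

One consequence of rule 2 is worth keeping in mind: because the result depends on the partition count, increasing the number of partitions of a topic can send an existing key to a different partition than before.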
