When a Flink job consumes multiple Kafka topics, the assignment algorithm computes a separate start index per topic (derived from the topic name's hash) and then distributes that topic's partitions round-robin from there. With several topics, partitions from different topics can pile onto the same subtasks while other subtasks receive none, leaving some subtasks idle and causing data skew across subtasks.
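The skew can be reproduced with a small sketch that copies Flink's assignment formula (shown in full further below). The topic names "a" and "b" and the counts (2 partitions each, parallelism 4) are hypothetical, and the `assign` signature is simplified to take the topic and partition directly:

```java
import java.util.Arrays;

// Sketch reproducing the formula from KafkaTopicPartitionAssigner.assign()
// to show how two hypothetical topics ("a", "b") with 2 partitions each
// can skew across 4 parallel subtasks.
public class PartitionSkewDemo {

    // Same arithmetic as Flink's assigner, with a simplified signature.
    public static int assign(String topic, int partition, int numParallelSubtasks) {
        int startIndex = ((topic.hashCode() * 31) & 0x7FFFFFFF) % numParallelSubtasks;
        return (startIndex + partition) % numParallelSubtasks;
    }

    public static void main(String[] args) {
        int parallelism = 4;
        int[] load = new int[parallelism]; // partitions assigned to each subtask
        for (String topic : new String[] {"a", "b"}) {
            for (int p = 0; p < 2; p++) { // 2 partitions per topic
                load[assign(topic, p, parallelism)]++;
            }
        }
        // Each topic alone is balanced, but the two start indices overlap:
        // one subtask ends up with 2 partitions and another with 0.
        System.out.println(Arrays.toString(load)); // prints [1, 0, 1, 2]
    }
}
```

Within a single topic the round-robin is balanced; the skew comes entirely from different topics' start indices landing near each other.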
Related classes
FlinkKafkaConsumerBase
FlinkKafkaConsumer
AbstractPartitionDiscoverer
KafkaPartitionDiscoverer
KafkaTopicPartitionAssigner
Source code walkthrough
In the open() method of FlinkKafkaConsumerBase:
subscribedPartitionsToStartOffsets = new HashMap<>();
final List<KafkaTopicPartition> allPartitions = partitionDiscoverer.discoverPartitions();
discoverPartitions
List<KafkaTopicPartition> newDiscoveredPartitions;
newDiscoveredPartitions =
getAllPartitionsForTopics(topicsDescriptor.getFixedTopics());
getAllPartitionsForTopics
for (String topic : topics) {
    final List<PartitionInfo> kafkaPartitions = kafkaConsumer.partitionsFor(topic);
    System.out.println("getAllPartitionsForTopics=" + kafkaPartitions); // debug print added while tracing
    if (kafkaPartitions == null) {
        throw new RuntimeException(
                String.format(
                        "Could not fetch partitions for %s. Make sure that the topic exists.",
                        topic));
    }
    for (PartitionInfo partitionInfo : kafkaPartitions) {
        partitions.add(
                new KafkaTopicPartition(
                        partitionInfo.topic(), partitionInfo.partition()));
    }
}
Back in discoverPartitions, the discovered partitions are then filtered:
if (newDiscoveredPartitions == null || newDiscoveredPartitions.isEmpty()) {
    throw new RuntimeException(
            "Unable to retrieve any partitions with KafkaTopicsDescriptor: "
                    + topicsDescriptor);
} else {
    Iterator<KafkaTopicPartition> iter = newDiscoveredPartitions.iterator();
    KafkaTopicPartition nextPartition;
    while (iter.hasNext()) {
        nextPartition = iter.next();
        if (!setAndCheckDiscoveredPartition(nextPartition)) {
            iter.remove();
        }
    }
}
setAndCheckDiscoveredPartition (in AbstractPartitionDiscoverer)
public boolean setAndCheckDiscoveredPartition(KafkaTopicPartition partition) {
    if (isUndiscoveredPartition(partition)) {
        discoveredPartitions.add(partition);
        return KafkaTopicPartitionAssigner.assign(partition, numParallelSubtasks)
                == indexOfThisSubtask;
    }
    return false;
}
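So every parallel subtask runs the same discovery over the full partition list and keeps only the partitions whose assign() result matches its own index. A sketch of that per-subtask filtering, using a hypothetical topic name "orders" with 6 partitions and parallelism 3, and the same simplified assign signature:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the filtering done by setAndCheckDiscoveredPartition():
// each subtask sees the full discovered list but keeps only partitions
// whose assign() result equals its own index. Topic "orders" is hypothetical.
public class SubtaskFilterDemo {

    public static int assign(String topic, int partition, int numParallelSubtasks) {
        int startIndex = ((topic.hashCode() * 31) & 0x7FFFFFFF) % numParallelSubtasks;
        return (startIndex + partition) % numParallelSubtasks;
    }

    public static void main(String[] args) {
        int parallelism = 3;
        int partitions = 6;
        for (int subtask = 0; subtask < parallelism; subtask++) {
            List<Integer> kept = new ArrayList<>();
            for (int p = 0; p < partitions; p++) {
                if (assign("orders", p, parallelism) == subtask) {
                    kept.add(p); // analogous to discoveredPartitions.add(...)
                }
            }
            System.out.println("subtask " + subtask + " -> partitions " + kept);
        }
    }
}
```

For a single topic the result is always a disjoint, complete split: here each of the 3 subtasks keeps exactly 2 of the 6 partitions, whatever the topic's start index turns out to be.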
The assignment algorithm lives in KafkaTopicPartitionAssigner:
public static int assign(KafkaTopicPartition partition, int numParallelSubtasks) {
    int startIndex =
            ((partition.getTopic().hashCode() * 31) & 0x7FFFFFFF) % numParallelSubtasks;
    // here, the assumption is that the id of Kafka partitions are always ascending
    // starting from 0, and therefore can be used directly as the offset clockwise from the
    // start index
    return (startIndex + partition.getPartition()) % numParallelSubtasks;
}