When a Flink job consumes multiple Kafka topics, the assignment algorithm computes a separate start index per topic (derived from the topic name's hash) and then distributes that topic's partitions round-robin from there. With several topics, partitions from different topics can pile onto the same subtasks while other subtasks receive none, leaving some subtasks idle and causing data skew across subtasks.
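The skew can be reproduced with a small sketch that copies Flink's assignment formula (shown in full further below). The topic names "a" and "b" and the counts (2 partitions each, parallelism 4) are hypothetical, and the `assign` signature is simplified to take the topic and partition directly:

```java
import java.util.Arrays;

// Sketch reproducing the formula from KafkaTopicPartitionAssigner.assign()
// to show how two hypothetical topics ("a", "b") with 2 partitions each
// can skew across 4 parallel subtasks.
public class PartitionSkewDemo {

    // Same arithmetic as Flink's assigner, with a simplified signature.
    public static int assign(String topic, int partition, int numParallelSubtasks) {
        int startIndex = ((topic.hashCode() * 31) & 0x7FFFFFFF) % numParallelSubtasks;
        return (startIndex + partition) % numParallelSubtasks;
    }

    public static void main(String[] args) {
        int parallelism = 4;
        int[] load = new int[parallelism]; // partitions assigned to each subtask
        for (String topic : new String[] {"a", "b"}) {
            for (int p = 0; p < 2; p++) { // 2 partitions per topic
                load[assign(topic, p, parallelism)]++;
            }
        }
        // Each topic alone is balanced, but the two start indices overlap:
        // one subtask ends up with 2 partitions and another with 0.
        System.out.println(Arrays.toString(load)); // prints [1, 0, 1, 2]
    }
}
```

Within a single topic the round-robin is balanced; the skew comes entirely from different topics' start indices landing near each other.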
Related classes
FlinkKafkaConsumerBase
FlinkKafkaConsumer
AbstractPartitionDiscoverer
KafkaPartitionDiscoverer
KafkaTopicPartitionAssigner
Source code walkthrough
In the open() method of FlinkKafkaConsumerBase:
subscribedPartitionsToStartOffsets = new HashMap<>();
final List<KafkaTopicPartition> allPartitions = partitionDiscoverer.discoverPartitions();
discoverPartitions
List<KafkaTopicPartition> newDiscoveredPartitions;
newDiscoveredPartitions =
getAllPartitionsForTopics(topicsDescriptor.getFixedTopics());
getAllPartitionsForTopics
for (String topic : topics) {
    final List<PartitionInfo> kafkaPartitions = kafkaConsumer.partitionsFor(topic);
    System.out.println("getAllPartitionsForTopics=" + kafkaPartitions); // debug print added while tracing
    if (kafkaPartitions == null) {
        throw new RuntimeException(
                String.format(
                        "Could not fetch partitions for %s. Make sure that the topic exists.",
                        topic));
    }
    for (PartitionInfo partitionInfo : kafkaPartitions) {
        partitions.add(
                new KafkaTopicPartition(
                        partitionInfo.topic(), partitionInfo.partition()));
    }
}
Back in discoverPartitions, the discovered partitions are then filtered:
if (newDiscoveredPartitions == null || newDiscoveredPartitions.isEmpty()) {
    throw new RuntimeException(
            "Unable to retrieve any partitions with KafkaTopicsDescriptor: "
                    + topicsDescriptor);
} else {
    Iterator<KafkaTopicPartition> iter = newDiscoveredPartitions.iterator();
    KafkaTopicPartition nextPartition;
    while (iter.hasNext()) {
        nextPartition = iter.next();
        if (!setAndCheckDiscoveredPartition(nextPartition)) {
            iter.remove();
        }
    }
}
setAndCheckDiscoveredPartition (in AbstractPartitionDiscoverer)
public boolean setAndCheckDiscoveredPartition(KafkaTopicPartition partition) {
    if (isUndiscoveredPartition(partition)) {
        discoveredPartitions.add(partition);
        return KafkaTopicPartitionAssigner.assign(partition, numParallelSubtasks)
                == indexOfThisSubtask;
    }
    return false;
}
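So every parallel subtask runs the same discovery over the full partition list and keeps only the partitions whose assign() result matches its own index. A sketch of that per-subtask filtering, using a hypothetical topic name "orders" with 6 partitions and parallelism 3, and the same simplified assign signature:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the filtering done by setAndCheckDiscoveredPartition():
// each subtask sees the full discovered list but keeps only partitions
// whose assign() result equals its own index. Topic "orders" is hypothetical.
public class SubtaskFilterDemo {

    public static int assign(String topic, int partition, int numParallelSubtasks) {
        int startIndex = ((topic.hashCode() * 31) & 0x7FFFFFFF) % numParallelSubtasks;
        return (startIndex + partition) % numParallelSubtasks;
    }

    public static void main(String[] args) {
        int parallelism = 3;
        int partitions = 6;
        for (int subtask = 0; subtask < parallelism; subtask++) {
            List<Integer> kept = new ArrayList<>();
            for (int p = 0; p < partitions; p++) {
                if (assign("orders", p, parallelism) == subtask) {
                    kept.add(p); // analogous to discoveredPartitions.add(...)
                }
            }
            System.out.println("subtask " + subtask + " -> partitions " + kept);
        }
    }
}
```

For a single topic the result is always a disjoint, complete split: here each of the 3 subtasks keeps exactly 2 of the 6 partitions, whatever the topic's start index turns out to be.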
The assignment algorithm lives in KafkaTopicPartitionAssigner:
public static int assign(KafkaTopicPartition partition, int numParallelSubtasks) {
    int startIndex =
            ((partition.getTopic().hashCode() * 31) & 0x7FFFFFFF) % numParallelSubtasks;
    // here, the assumption is that the id of Kafka partitions are always ascending
    // starting from 0, and therefore can be used directly as the offset clockwise from the
    // start index
    return (startIndex + partition.getPartition()) % numParallelSubtasks;
}