Flink Kafka consumer: how partitions map to subtasks, and a walkthrough of FlinkKafkaConsumerBase
The FlinkKafkaConsumerBase class
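FlinkKafkaConsumerBase is the core class: FlinkKafkaConsumer08, FlinkKafkaConsumer09, FlinkKafkaConsumer010 and the other versioned consumers all inherit from it. Let's start with its constructor.
Pay attention to discoveryIntervalMillis, the interval for automatic partition discovery. Its default is public static final long PARTITION_DISCOVERY_DISABLED = Long.MIN_VALUE;, which means partitions are never discovered automatically. With that default, if partitions are added to the Kafka topic, the job has to be restarted before it sees them.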
public FlinkKafkaConsumerBase(
List<String> topics,
Pattern topicPattern,
KafkaDeserializationSchema<T> deserializer,
long discoveryIntervalMillis,
boolean useMetrics) {
this.topicsDescriptor = new KafkaTopicsDescriptor(topics, topicPattern);
this.deserializer = checkNotNull(deserializer, "valueDeserializer");
checkArgument(
discoveryIntervalMillis == PARTITION_DISCOVERY_DISABLED || discoveryIntervalMillis >= 0,
"Cannot define a negative value for the topic / partition discovery interval.");
this.discoveryIntervalMillis = discoveryIntervalMillis;
this.useMetrics = useMetrics;
}
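To make the default concrete, here is a minimal sketch of how a discovery interval could be resolved from consumer properties and validated with the same rule as the constructor's checkArgument. The property key flink.partition-discovery.interval-millis is the one the Flink Kafka connector reads; the class DiscoveryIntervalSketch and the helper resolveDiscoveryInterval are hypothetical names used only for illustration and are not part of Flink.

```java
import java.util.Properties;

public class DiscoveryIntervalSketch {
    // Same sentinel as FlinkKafkaConsumerBase.PARTITION_DISCOVERY_DISABLED.
    public static final long PARTITION_DISCOVERY_DISABLED = Long.MIN_VALUE;
    // Property key used by the Flink Kafka connector for partition discovery.
    public static final String KEY_DISCOVERY_INTERVAL = "flink.partition-discovery.interval-millis";

    // Hypothetical helper: a missing key means "discovery disabled";
    // any other value must be non-negative, as in the constructor check.
    public static long resolveDiscoveryInterval(Properties props) {
        String raw = props.getProperty(KEY_DISCOVERY_INTERVAL);
        long interval = (raw == null) ? PARTITION_DISCOVERY_DISABLED : Long.parseLong(raw.trim());
        if (interval != PARTITION_DISCOVERY_DISABLED && interval < 0) {
            throw new IllegalArgumentException(
                "Cannot define a negative value for the topic / partition discovery interval.");
        }
        return interval;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        // No key set: discovery stays disabled.
        System.out.println(resolveDiscoveryInterval(props) == PARTITION_DISCOVERY_DISABLED);
        // Ask for a discovery pass every 30 seconds.
        props.setProperty(KEY_DISCOVERY_INTERVAL, "30000");
        System.out.println(resolveDiscoveryInterval(props));
    }
}
```

In a real job the key would simply be set on the Properties object passed to the consumer, so that new partitions are picked up without a restart.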
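Next, look at the open method of FlinkKafkaConsumerBase. It initializes all partitions and contains the key code that maps partitions to subtasks. This is how the Flink consumer makes sure each partition is read by exactly one subtask (thread):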
public void open(Configuration configuration) throws Exception {
// determine the offset commit mode
this.offsetCommitMode = OffsetCommitModes.fromConfiguration(
getIsAutoCommitEnabled(),
enableCommitOnCheckpoints,
((StreamingRuntimeContext) getRuntimeContext()).isCheckpointingEnabled());
// create the partition discoverer
this.partitionDiscoverer = createPartitionDiscoverer(
topicsDescriptor,
getRuntimeContext().getIndexOfThisSubtask(),
getRuntimeContext().getNumberOfParallelSubtasks());
this.partitionDiscoverer.open();
subscribedPartitionsToStartOffsets = new HashMap<>();
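enableCommitOnCheckpoints defaults to true, so when checkpointing is enabled the offset commit mode is ON_CHECKPOINTS. Flink currently commits offsets to Kafka in one of three ways:
1. Checkpointing enabled: offsets are committed after a checkpoint completes
2. Checkpointing enabled, commit on checkpoint disabled: consumer group offsets are not committed
3. Checkpointing disabled: commits rely on the Kafka client's automatic (periodic) commit
A separate article will cover this in detail.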
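The decision among these three cases can be sketched as a small standalone function. This is a simplified model of what OffsetCommitModes.fromConfiguration decides, not a copy of Flink's code; the class name OffsetCommitModeSketch is hypothetical. It also makes one detail explicit: without checkpointing, nothing is committed if the Kafka client's auto commit is turned off.

```java
public class OffsetCommitModeSketch {
    // Mirrors the three outcomes described above.
    public enum OffsetCommitMode { DISABLED, ON_CHECKPOINTS, KAFKA_PERIODIC }

    public static OffsetCommitMode fromConfiguration(
            boolean enableAutoCommit,
            boolean enableCommitOnCheckpoints,
            boolean checkpointingEnabled) {
        if (checkpointingEnabled) {
            // With checkpointing on, the Kafka client's auto commit setting is ignored.
            return enableCommitOnCheckpoints
                    ? OffsetCommitMode.ON_CHECKPOINTS
                    : OffsetCommitMode.DISABLED;
        }
        // Without checkpointing, only the Kafka client's periodic auto commit is left.
        return enableAutoCommit ? OffsetCommitMode.KAFKA_PERIODIC : OffsetCommitMode.DISABLED;
    }

    public static void main(String[] args) {
        System.out.println(fromConfiguration(true, true, true));
        System.out.println(fromConfiguration(true, false, true));
        System.out.println(fromConfiguration(true, true, false));
    }
}
```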
Next comes a call to discoverPartitions, a method of the partition discoverer class. This is a key method, and it is the reason Flink can map each partition to exactly one subtask:
final List<KafkaTopicPartition> allPartitions = partitionDiscoverer.discoverPartitions();
if (restoredState != null) {
for (KafkaTopicPartition partition : allPartitions) {
if (!restoredState.containsKey(partition)) {
restoredState.put(partition, KafkaTopicPartitionStateSentinel.EARLIEST_OFFSET);
}
}
for (Map.Entry<KafkaTopicPartition, Long> restoredStateEntry : restoredState.entrySet()) {
// seed the partition discoverer with the union state while filtering out
// restored partitions that should not be subscribed by this subtask
if (KafkaTopicPartitionAssigner.assign(
//getNumberOfParallelSubtasks: the total number of parallel subtasks (the parallelism)
//getIndexOfThisSubtask: the index of this subtask
restoredStateEntry.getKey(), getRuntimeContext().getNumberOfParallelSubtasks())
== getRuntimeContext().getIndexOfThisSubtask()){
subscribedPartitionsToStartOffsets.put(restoredStateEntry.getKey(), restoredStateEntry.getValue());
}
}
if (filterRestoredPartitionsWithCurrentTopicsDescriptor) {
subscribedPartitionsToStartOffsets.entrySet().removeIf(entry -> {
if (!topicsDescriptor.isMatchingTopic(entry.getKey().getTopic())) {
LOG.warn(
"{} is removed from subscribed partitions since it is no longer associated with topics descriptor of current execution.",
entry.getKey());
return true;
}
return false;
});
}
LOG.info("Consumer subtask {} will start reading {} partitions with offsets in restored state: {}",
getRuntimeContext().getIndexOfThisSubtask(), subscribedPartitionsToStartOffsets.size(), subscribedPartitionsToStartOffsets);
Let's step into the discoverPartitions() method:
public List<KafkaTopicPartition> discoverPartitions() throws WakeupException, ClosedException {
if (!closed && !wakeup) {
try {
List<KafkaTopicPartition> newDiscoveredPartitions;
//This only checks whether fixed topic names or a regex pattern were passed in; in practice a concrete topic name is usually passed
// (1) get all possible partitions, based on whether we are subscribed to fixed topics or a topic pattern
if (topicsDescriptor.isFixedTopics()) {
//fetch all Kafka partitions of the fixed topics
newDiscoveredPartitions = getAllPartitionsForTopics(topicsDescriptor.getFixedTopics());
} else {
List<String> matchedTopics = getAllTopics();
// retain topics that match the pattern
Iterator<String> iter = matchedTopics.iterator();
while (iter.hasNext()) {
if (!topicsDescriptor.isMatchingTopic(iter.next())) {
iter.remove();
}
}
if (matchedTopics.size() != 0) {
// get partitions only for matched topics
newDiscoveredPartitions = getAllPartitionsForTopics(matchedTopics);
} else {
newDiscoveredPartitions = null;
}
}
//newDiscoveredPartitions now holds every Kafka partition, but they are not yet mapped to subtasks.
// If it is null or has size 0, the topic has no partitions and "Unable to retrieve any partitions" is thrown
// (2) eliminate partition that are old partitions or should not be subscribed by this subtask
if (newDiscoveredPartitions == null || newDiscoveredPartitions.isEmpty()) {
//if no Kafka partitions were found, initialization fails here
throw new RuntimeException("Unable to retrieve any partitions with KafkaTopicsDescriptor: " + topicsDescriptor);
} else {
//Note: the main purpose of the code below is to map partitions to this subtask.
// For the details, see setAndCheckDiscoveredPartition
Iterator<KafkaTopicPartition> iter = newDiscoveredPartitions.iterator();
KafkaTopicPartition nextPartition;
while (iter.hasNext()) {
nextPartition = iter.next();
if (!setAndCheckDiscoveredPartition(nextPartition)) {
iter.remove();
}
}
}
return newDiscoveredPartitions;
} catch (WakeupException e) {
// the actual topic / partition metadata fetching methods
// may be woken up midway; reset the wakeup flag and rethrow
wakeup = false;
throw e;
}
} else if (!closed && wakeup) {
// may have been woken up before the method call
wakeup = false;
throw new WakeupException();
} else {
throw new ClosedException();
}
}
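I have annotated the code above, so it is worth reading closely. The method ultimately returns the list of partitions that belong to this subtask only. The core of the algorithm is wrapped in setAndCheckDiscoveredPartition(), so let's look at it: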
public boolean setAndCheckDiscoveredPartition(KafkaTopicPartition partition) {
//if this partition has not been seen before, add it to the set of discovered partitions
if (isUndiscoveredPartition(partition)) {
discoveredPartitions.add(partition);
//map the Kafka partition to a subtask and check whether that subtask is indexOfThisSubtask
return KafkaTopicPartitionAssigner.assign(partition, numParallelSubtasks) == indexOfThisSubtask;
}
return false;
}
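The effect of the discoveredPartitions set is easiest to see in a reduced model. The sketch below is an assumption-laden simplification: it handles a single topic, represents partitions as plain ints, and the class DiscovererSketch is a hypothetical name rather than a Flink class. It shows that a repeated discovery pass returns only partitions that are both new and assigned to this subtask.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class DiscovererSketch {
    private final Set<Integer> discoveredPartitions = new HashSet<>();
    private final int indexOfThisSubtask;
    private final int numParallelSubtasks;
    private final int startIndex;

    public DiscovererSketch(String topic, int indexOfThisSubtask, int numParallelSubtasks) {
        this.indexOfThisSubtask = indexOfThisSubtask;
        this.numParallelSubtasks = numParallelSubtasks;
        // Same start index formula as KafkaTopicPartitionAssigner.assign.
        this.startIndex = ((topic.hashCode() * 31) & 0x7FFFFFFF) % numParallelSubtasks;
    }

    // True only for a partition seen for the first time AND owned by this subtask.
    public boolean setAndCheckDiscoveredPartition(int partition) {
        if (discoveredPartitions.add(partition)) {
            return (startIndex + partition) % numParallelSubtasks == indexOfThisSubtask;
        }
        return false;
    }

    // Simulates one discovery pass over a topic that currently has partitionCount partitions.
    public List<Integer> discover(int partitionCount) {
        List<Integer> newlyOwned = new ArrayList<>();
        for (int p = 0; p < partitionCount; p++) {
            if (setAndCheckDiscoveredPartition(p)) {
                newlyOwned.add(p);
            }
        }
        return newlyOwned;
    }

    public static void main(String[] args) {
        DiscovererSketch subtask1 = new DiscovererSketch("orders", 1, 3);
        System.out.println(subtask1.discover(6)); // first pass: this subtask's share of 6 partitions
        System.out.println(subtask1.discover(6)); // nothing new on a second pass
        System.out.println(subtask1.discover(9)); // topic grew to 9 partitions
    }
}
```

Note that every partition is recorded as discovered even when it belongs to another subtask, so later passes skip it cheaply.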
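This brings us to the actual assignment logic, which is why Flink can guarantee that each partition is read by exactly one subtask.
How it works:
int startIndex = ((partition.getTopic().hashCode() * 31) & 0x7FFFFFFF) % numParallelSubtasks;
(startIndex + partition.getPartition()) % numParallelSubtasks
numParallelSubtasks: the number of parallel subtasks, i.e. the parallelism configured in Flink
For example: 6 partitions and a parallelism of 3.
The partitions are then spread evenly, 2 per subtask.
Note, however, that if the parallelism is set higher than the number of partitions, some subtasks get no partition at all and sit idle, which wastes resources.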
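The 6-partition example can be checked with a small standalone program. The assign formula is taken from the two lines above; the class AssignmentSketch, the helper countPerSubtask and the topic name "orders" are assumptions made for illustration.

```java
import java.util.Arrays;

public class AssignmentSketch {
    // Same arithmetic as the formula above, with the topic name and partition id passed directly.
    public static int assign(String topic, int partition, int numParallelSubtasks) {
        int startIndex = ((topic.hashCode() * 31) & 0x7FFFFFFF) % numParallelSubtasks;
        return (startIndex + partition) % numParallelSubtasks;
    }

    // Hypothetical helper: how many partitions each subtask ends up with.
    public static int[] countPerSubtask(String topic, int numPartitions, int numParallelSubtasks) {
        int[] counts = new int[numParallelSubtasks];
        for (int p = 0; p < numPartitions; p++) {
            counts[assign(topic, p, numParallelSubtasks)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // 6 partitions, parallelism 3: every subtask gets 2 partitions.
        System.out.println(Arrays.toString(countPerSubtask("orders", 6, 3)));
        // 6 partitions, parallelism 8: two subtasks get nothing and stay idle.
        System.out.println(Arrays.toString(countPerSubtask("orders", 6, 8)));
    }
}
```

Because consecutive partition ids land on consecutive subtask indices, the per-subtask counts do not depend on the topic name; the name only decides which subtask partition 0 starts on.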
public class KafkaTopicPartitionAssigner {
/**
* Returns the index of the target subtask that a specific Kafka partition should be
* assigned to.
*
* <p>The resulting distribution of partitions of a single topic has the following contract:
* <ul>
* <li>1. Uniformly distributed across subtasks</li>
* <li>2. Partitions are round-robin distributed (strictly clockwise w.r.t. ascending
* subtask indices) by using the partition id as the offset from a starting index
* (i.e., the index of the subtask which partition 0 of the topic will be assigned to,
* determined using the topic name).</li>
* </ul>
*
* <p>The above contract is crucial and cannot be broken. Consumer subtasks rely on this
* contract to locally filter out partitions that it should not subscribe to, guaranteeing
* that all partitions of a single topic will always be assigned to some subtask in a
* uniformly distributed manner.
*
* @param partition the Kafka partition
* @param numParallelSubtasks total number of parallel subtasks
*
* @return index of the target subtask that the Kafka partition should be assigned to.
*/
public static int assign(KafkaTopicPartition partition, int numParallelSubtasks) {
int startIndex = ((partition.getTopic().hashCode() * 31) & 0x7FFFFFFF) % numParallelSubtasks;
// here, the assumption is that the id of Kafka partitions are always ascending
// starting from 0, and therefore can be used directly as the offset clockwise from the start index
return (startIndex + partition.getPartition()) % numParallelSubtasks;
}
}
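After these steps, the returned allPartitions contains all partitions that belong to this parallel subtask.
From here there are two cases: starting without restoring from a checkpoint, and restoring from a checkpoint.
How Flink sets up the tasks that consume Kafka partitions
The first step has already produced a List, allPartitions, which holds the partitions assigned to this subtask.
Going back to where we started, there is an if (restoredState != null) check. restoredState is the state Flink restored from a checkpoint. We first look at the case without a checkpoint: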
// use the partition discoverer to fetch the initial seed partitions,
// and set their initial offsets depending on the startup mode.
// for SPECIFIC_OFFSETS and TIMESTAMP modes, we set the specific offsets now;
// for other modes (EARLIEST, LATEST, and GROUP_OFFSETS), the offset is lazily determined
// when the partition is actually read.
switch (startupMode) {
case SPECIFIC_OFFSETS:
if (specificStartupOffsets == null) {
throw new IllegalStateException(
"Startup mode for the consumer set to " + StartupMode.SPECIFIC_OFFSETS +
", but no specific offsets were specified.");
}
for (KafkaTopicPartition seedPartition : allPartitions) {
Long specificOffset = specificStartupOffsets.get(seedPartition);
if (specificOffset != null) {
// since the specified offsets represent the next record to read, we subtract
// it by one so that the initial state of the consumer will be correct
subscribedPartitionsToStartOffsets.put(seedPartition, specificOffset - 1);
} else {
// default to group offset behaviour if the user-provided specific offsets
// do not contain a value for this partition
subscribedPartitionsToStartOffsets.put(seedPartition, KafkaTopicPartitionStateSentinel.GROUP_OFFSET);
}
}
break;
case TIMESTAMP:
if (startupOffsetsTimestamp == null) {
throw new IllegalStateException(
"Startup mode for the consumer set to " + StartupMode.TIMESTAMP +
", but no startup timestamp was specified.");
}
for (Map.Entry<KafkaTopicPartition, Long> partitionToOffset
: fetchOffsetsWithTimestamp(allPartitions, startupOffsetsTimestamp).entrySet()) {
subscribedPartitionsToStartOffsets.put(
partitionToOffset.getKey(),
(partitionToOffset.getValue() == null)
// if an offset cannot be retrieved for a partition with the given timestamp,
// we default to using the latest offset for the partition
? KafkaTopicPartitionStateSentinel.LATEST_OFFSET
// since the specified offsets represent the next record to read, we subtract
// it by one so that the initial state of the consumer will be correct
: partitionToOffset.getValue() - 1);
}
break;
default:
for (KafkaTopicPartition seedPartition : allPartitions) {
subscribedPartitionsToStartOffsets.put(seedPartition, startupMode.getStateSentinel());
}
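The "subtract one" detail in the SPECIFIC_OFFSETS branch is easy to miss, so here is a reduced sketch of it. Assumptions: partitions are plain ints, the class SpecificOffsetsSketch and method seedOffsets are hypothetical, and GROUP_OFFSET_SENTINEL is a placeholder value standing in for KafkaTopicPartitionStateSentinel.GROUP_OFFSET rather than the real constant.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SpecificOffsetsSketch {
    // Placeholder for the group-offset sentinel; NOT the real Flink constant.
    public static final long GROUP_OFFSET_SENTINEL = -1_000_000_000_000L;

    // userOffsets holds "next record to read" per partition, as supplied by the user.
    public static Map<Integer, Long> seedOffsets(List<Integer> allPartitions, Map<Integer, Long> userOffsets) {
        Map<Integer, Long> startOffsets = new HashMap<>();
        for (Integer partition : allPartitions) {
            Long next = userOffsets.get(partition);
            if (next != null) {
                // The stored state means "last record already read", hence next - 1.
                startOffsets.put(partition, next - 1);
            } else {
                // No user offset for this partition: fall back to the group offset.
                startOffsets.put(partition, GROUP_OFFSET_SENTINEL);
            }
        }
        return startOffsets;
    }

    public static void main(String[] args) {
        Map<Integer, Long> userOffsets = new HashMap<>();
        userOffsets.put(0, 100L); // start reading partition 0 at offset 100
        System.out.println(seedOffsets(Arrays.asList(0, 1), userOffsets));
    }
}
```

A sentinel has to be a value that can never collide with a real stored offset, which is why a user offset of 0 (stored as -1) rules out -1 as a sentinel.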
Next is the case with a checkpoint. It is broadly similar, with some extra validation of the restored state:
if (restoredState != null) {
for (KafkaTopicPartition partition : allPartitions) {
//check whether the partition is new (not in the restored state); if so, add it to restoredState and read it from the earliest offset
if (!restoredState.containsKey(partition)) {
restoredState.put(partition, KafkaTopicPartitionStateSentinel.EARLIEST_OFFSET);
}
}
for (Map.Entry<KafkaTopicPartition, Long> restoredStateEntry : restoredState.entrySet()) {
// seed the partition discoverer with the union state while filtering out
// restored partitions that should not be subscribed by this subtask
//check whether each partition in the restored state belongs to this subtask; if it does, add it to subscribedPartitionsToStartOffsets
if (KafkaTopicPartitionAssigner.assign(
//getNumberOfParallelSubtasks: the total number of parallel subtasks (the parallelism)
//getIndexOfThisSubtask: the index of this subtask
restoredStateEntry.getKey(), getRuntimeContext().getNumberOfParallelSubtasks())
== getRuntimeContext().getIndexOfThisSubtask()){
subscribedPartitionsToStartOffsets.put(restoredStateEntry.getKey(), restoredStateEntry.getValue());
}
}
//validation step: drop restored partitions whose topic no longer matches the current topics descriptor
if (filterRestoredPartitionsWithCurrentTopicsDescriptor) {
subscribedPartitionsToStartOffsets.entrySet().removeIf(entry -> {
if (!topicsDescriptor.isMatchingTopic(entry.getKey().getTopic())) {
LOG.warn(
"{} is removed from subscribed partitions since it is no longer associated with topics descriptor of current execution.",
entry.getKey());
return true;
}
return false;
});
}
LOG.info("Consumer subtask {} will start reading {} partitions with offsets in restored state: {}",
getRuntimeContext().getIndexOfThisSubtask(), subscribedPartitionsToStartOffsets.size(), subscribedPartitionsToStartOffsets);
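The restore path can be reduced to two steps: add newly discovered partitions with the earliest-offset sentinel, then keep only the entries assigned to this subtask. The sketch below models that under several assumptions: a single topic, partitions as plain ints, the start index passed in directly, and EARLIEST_OFFSET_SENTINEL as a placeholder for KafkaTopicPartitionStateSentinel.EARLIEST_OFFSET. RestoreSketch and restore are hypothetical names.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class RestoreSketch {
    // Placeholder for the earliest-offset sentinel; NOT the real Flink constant.
    public static final long EARLIEST_OFFSET_SENTINEL = -2_000_000_000_000L;

    public static Map<Integer, Long> restore(
            Map<Integer, Long> restoredState,      // union state: offsets of ALL partitions
            List<Integer> discoveredForThisSubtask, // result of discoverPartitions()
            int startIndex,
            int indexOfThisSubtask,
            int numParallelSubtasks) {
        Map<Integer, Long> merged = new HashMap<>(restoredState);
        // Step 1: partitions missing from the state are new and start from the earliest offset.
        for (Integer partition : discoveredForThisSubtask) {
            merged.putIfAbsent(partition, EARLIEST_OFFSET_SENTINEL);
        }
        // Step 2: keep only the partitions that the assignment formula gives to this subtask.
        Map<Integer, Long> subscribed = new HashMap<>();
        for (Map.Entry<Integer, Long> entry : merged.entrySet()) {
            if ((startIndex + entry.getKey()) % numParallelSubtasks == indexOfThisSubtask) {
                subscribed.put(entry.getKey(), entry.getValue());
            }
        }
        return subscribed;
    }

    public static void main(String[] args) {
        Map<Integer, Long> state = new HashMap<>();
        state.put(0, 10L);
        state.put(1, 20L);
        state.put(2, 30L);
        state.put(3, 40L);
        // Subtask 0 of 2; partition 4 was added to the topic after the checkpoint.
        System.out.println(restore(state, Arrays.asList(0, 2, 4), 0, 0, 2));
    }
}
```

Starting new partitions from the earliest offset means records written to them between the checkpoint and the restart are not skipped.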
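A word on startupMode first: it determines where consumption from Kafka starts.
StartupMode has the values listed below. To consume from specific offsets, use the SPECIFIC_OFFSETS mode. This article only discusses starting from the latest offset; the other modes will be covered in later articles.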
/** Start from committed offsets in ZK / Kafka brokers of a specific consumer group (default). */
GROUP_OFFSETS(KafkaTopicPartitionStateSentinel.GROUP_OFFSET),
/** Start from the earliest offset possible. */
EARLIEST(KafkaTopicPartitionStateSentinel.EARLIEST_OFFSET),
/** Start from the latest offset. */
LATEST(KafkaTopicPartitionStateSentinel.LATEST_OFFSET),
/**
* Start from user-supplied timestamp for each partition.
* Since this mode will have specific offsets to start with, we do not need a sentinel value;
* using Long.MIN_VALUE as a placeholder.
*/
TIMESTAMP(Long.MIN_VALUE),
/**
* Start from user-supplied specific offsets for each partition.
* Since this mode will have specific offsets to start with, we do not need a sentinel value;
* using Long.MIN_VALUE as a placeholder.
*/
SPECIFIC_OFFSETS(Long.MIN_VALUE);
/** The sentinel offset value corresponding to this startup mode. */
private long stateSentinel;
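Finally, a HashMap named subscribedPartitionsToStartOffsets is built.
It is built from the allPartitions list produced earlier. For every mode other than SPECIFIC_OFFSETS and TIMESTAMP, the value stored for each partition is the startup mode's sentinel rather than a real offset; with LATEST, that sentinel means "start from the latest offset". After that, every subtask enters the run method: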
@Override
public void run(SourceContext<T> sourceContext) throws Exception {
if (subscribedPartitionsToStartOffsets == null) {
throw new Exception("The partitions were not set for the consumer");
}
// initialize commit metrics and default offset callback method
this.successfulCommits = this.getRuntimeContext().getMetricGroup().counter(COMMITS_SUCCEEDED_METRICS_COUNTER);
this.failedCommits = this.getRuntimeContext().getMetricGroup().counter(COMMITS_FAILED_METRICS_COUNTER);
final int subtaskIndex = this.getRuntimeContext().getIndexOfThisSubtask();
this.offsetCommitCallback = new KafkaCommitCallback() {
@Override
public void onSuccess() {
successfulCommits.inc();
}
@Override
public void onException(Throwable cause) {
LOG.warn(String.format("Consumer subtask %d failed async Kafka commit.", subtaskIndex), cause);
failedCommits.inc();
}
};
// mark the subtask as temporarily idle if there are no initial seed partitions;
// once this subtask discovers some partitions and starts collecting records, the subtask's
// status will automatically be triggered back to be active.
if (subscribedPartitionsToStartOffsets.isEmpty()) {
sourceContext.markAsTemporarilyIdle();
}
LOG.info("Consumer subtask {} creating fetcher with offsets {}.",
getRuntimeContext().getIndexOfThisSubtask(), subscribedPartitionsToStartOffsets);
// from this point forward:
// - 'snapshotState' will draw offsets from the fetcher,
// instead of being built from `subscribedPartitionsToStartOffsets`
// - 'notifyCheckpointComplete' will start to do work (i.e. commit offsets to
// Kafka through the fetcher, if configured to do so)
this.kafkaFetcher = createFetcher(
sourceContext,
subscribedPartitionsToStartOffsets,
watermarkStrategy,
(StreamingRuntimeContext) getRuntimeContext(),
offsetCommitMode,
getRuntimeContext().getMetricGroup().addGroup(KAFKA_CONSUMER_METRICS_GROUP),
useMetrics);
if (!running) {
return;
}
// depending on whether we were restored with the current state version (1.3),
// remaining logic branches off into 2 paths:
// 1) New state - partition discovery loop executed as separate thread, with this
// thread running the main fetcher loop
// 2) Old state - partition discovery is disabled and only the main fetcher loop is executed
if (discoveryIntervalMillis == PARTITION_DISCOVERY_DISABLED) {
kafkaFetcher.runFetchLoop();
} else {
runWithPartitionDiscovery();
}
}
How the fetcher is actually created will be covered in a follow-up update.