Flink 消费 kafka 时如何处理空闲 task

最新推荐文章于 2023-07-05 19:34:17 发布

shengjk1

最新推荐文章于 2023-07-05 19:34:17 发布

阅读量598

点赞数

分类专栏： Apache Flink

本文链接：https://blog.csdn.net/jsjsjs1789/article/details/105098952

版权

Apache Flink 专栏收录该内容

57 篇文章 67 订阅

订阅专栏

我们都知道 flink 消费 kafka 是一个 partition 对应一个 task，但比如说 flink task 数多于 kafka partition 时。flink 是如何处理这个空闲的 task 的。

@Override
	public void run(SourceContext<T> sourceContext) throws Exception {
		if (subscribedPartitionsToStartOffsets == null) {
			throw new Exception("The partitions were not set for the consumer");
		}

		// initialize commit metrics and default offset callback method
		this.successfulCommits = this.getRuntimeContext().getMetricGroup().counter(COMMITS_SUCCEEDED_METRICS_COUNTER);
		this.failedCommits =  this.getRuntimeContext().getMetricGroup().counter(COMMITS_FAILED_METRICS_COUNTER);

		// offset commit 的回调方法，当 notifyCheckpointComplete 时，会调用此方法
		this.offsetCommitCallback = new KafkaCommitCallback() {
			@Override
			public void onSuccess() {
				successfulCommits.inc();
			}

			@Override
			public void onException(Throwable cause) {
				LOG.warn("Async Kafka commit failed.", cause);
				failedCommits.inc();
			}
		};

		// mark the subtask as temporarily idle if there are no initial seed partitions;
		// once this subtask discovers some partitions and starts collecting records, the subtask's
		// status will automatically be triggered back to be active.
		//标记为该 task 为空闲状态。什么样的场景会被标记为空闲状态呢？当 Flink 的并行度大于 partitions 数时，有一个 task 就会被标记为空闲状态
		//标记为空闲状态时，就会通知下游，我不在发送任何 recode 和 watermarks，可以理解为我不存在
		if (subscribedPartitionsToStartOffsets.isEmpty()) {
			sourceContext.markAsTemporarilyIdle();
		}

		......
	}

标记为空闲状态就完了