Flink Kafka consumer: subscription modes and dynamic partition fetching, a source code analysis

程序媛-yang

Article link: https://blog.csdn.net/weixin_38472282/article/details/106206405

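Kafka subscription modes

Before describing how the Flink Kafka consumer subscribes, let's first review Kafka's own subscription modes. There are two:
1. subscribe() comes with automatic consumer rebalancing: when consumers join or leave the group, the partition assignment is adjusted automatically.
2. assign() lets the caller specify exactly which partitions to pull data from. Because the assignment is defined by the caller, it does not get the rebalancing that subscribe() provides.
The Flink Kafka consumer uses the assign mode.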
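
To make the difference concrete, here is a toy model in plain Java. It does not use the Kafka client API: the class name, the member names and the round-robin rule are all assumptions made for illustration, standing in for whatever assignor a real consumer group would use. It only shows that group-managed ownership is recomputed when membership changes, while a pinned list is not.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model only: hypothetical names, no Kafka client involved.
final class SubscriptionModesSketch {

    /** subscribe()-style: ownership is recomputed from whoever is currently in the group. */
    static Map<String, List<Integer>> rebalance(List<String> members, int numPartitions) {
        Map<String, List<Integer>> ownership = new TreeMap<>();
        for (String member : members) {
            ownership.put(member, new ArrayList<>());
        }
        // Assumed round-robin rule; real assignors may distribute differently.
        for (int partition = 0; partition < numPartitions; partition++) {
            String owner = members.get(partition % members.size());
            ownership.get(owner).add(partition);
        }
        return ownership;
    }

    public static void main(String[] args) {
        // Group-managed: adding consumer "c2" moves partitions away from "c1".
        System.out.println(rebalance(Arrays.asList("c1"), 4));
        System.out.println(rebalance(Arrays.asList("c1", "c2"), 4));

        // assign()-style: the caller pins a fixed list and nothing recomputes it.
        List<Integer> pinned = Arrays.asList(0, 1);
        System.out.println("pinned partitions stay " + pinned);
    }
}
```

With assign(), any redistribution has to be done by the caller, which is the part Flink takes over itself.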

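The Flink Kafka consumer

In the Flink Kafka consumer, FlinkKafkaConsumerBase is the main class. Its open() method initializes a number of fields, among them subscribedPartitionsToStartOffsets. This map is later passed to createFetcher, which eventually ends up in AbstractFetcher. The fetcher starts the Kafka consumer thread and pulls data partition by partition, which is how dynamic partition assignment is implemented. The sections below look at the source of AbstractFetcher and the Kafka consumer thread in detail.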
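
Since subscribedPartitionsToStartOffsets only holds the partitions that belong to the current subtask, it helps to see how a partition could be mapped to a subtask. The sketch below is modeled on my recollection of Flink's KafkaTopicPartitionAssigner, so treat the exact formula as an assumption and check it against your Flink version. The class name, the helper partitionsFor and the topic name "orders" are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch under assumptions: the formula mirrors what KafkaTopicPartitionAssigner is believed to do.
final class PartitionAssignmentSketch {

    /** Returns the index of the subtask that should read the given partition. */
    static int assign(String topic, int partition, int numParallelSubtasks) {
        // Topic-dependent start index; the mask keeps it non-negative.
        int startIndex = ((topic.hashCode() * 31) & 0x7FFFFFFF) % numParallelSubtasks;
        // Partitions of the same topic are spread round-robin from that start index.
        return (startIndex + partition) % numParallelSubtasks;
    }

    /** Lists the partitions of one topic that a given subtask would keep. */
    static List<Integer> partitionsFor(String topic, int numPartitions,
                                       int subtaskIndex, int numParallelSubtasks) {
        List<Integer> owned = new ArrayList<>();
        for (int partition = 0; partition < numPartitions; partition++) {
            if (assign(topic, partition, numParallelSubtasks) == subtaskIndex) {
                owned.add(partition);
            }
        }
        return owned;
    }

    public static void main(String[] args) {
        // Hypothetical topic name and sizes, for illustration only.
        for (int subtask = 0; subtask < 3; subtask++) {
            System.out.println("subtask " + subtask + " -> " + partitionsFor("orders", 6, subtask, 3));
        }
    }
}
```

Because every subtask evaluates the same deterministic function, no coordination between subtasks is needed: each one can decide locally which partitions are its own.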

AbstractFetcher in detail

Fields

Watermark mode constants

	// Possible values of timestampWatermarkMode.
	private static final int NO_TIMESTAMPS_WATERMARKS = 0;
	private static final int PERIODIC_WATERMARKS = 1;
	private static final int PUNCTUATED_WATERMARKS = 2;
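
As a rough illustration of how these three constants could be used, the sketch below picks a mode depending on which watermark assigner, if any, was supplied. The class name and the selectMode helper are hypothetical, and the assigners are typed as plain Object to keep the example free of Flink dependencies. I believe the AbstractFetcher constructor follows the same logic, but that is an assumption to verify against the source.

```java
// Sketch under assumptions: selectMode is a hypothetical helper, not a Flink method.
final class WatermarkModeSketch {

    static final int NO_TIMESTAMPS_WATERMARKS = 0;
    static final int PERIODIC_WATERMARKS = 1;
    static final int PUNCTUATED_WATERMARKS = 2;

    /** Picks a mode from whichever assigner was supplied; supplying both is rejected. */
    static int selectMode(Object periodicAssigner, Object punctuatedAssigner) {
        if (periodicAssigner != null && punctuatedAssigner != null) {
            throw new IllegalArgumentException(
                    "Cannot have both periodic and punctuated watermarks");
        }
        if (periodicAssigner != null) {
            return PERIODIC_WATERMARKS;
        }
        if (punctuatedAssigner != null) {
            return PUNCTUATED_WATERMARKS;
        }
        // No assigner at all: records are emitted without timestamps or watermarks.
        return NO_TIMESTAMPS_WATERMARKS;
    }

    public static void main(String[] args) {
        System.out.println(selectMode(null, null));
        System.out.println(selectMode(new Object(), null));
        System.out.println(selectMode(null, new Object()));
    }
}
```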

Partition-related fields

	/** The source context to emit records and watermarks to. */
	protected final SourceContext<T> sourceContext;

	/** The lock that guarantees that record emission and state updates are atomic,
	 * from the view of taking a checkpoint. */
	private final Object checkpointLock;

	/** All partitions (and their state) that this fetcher is subscribed to. */
	private final List<KafkaTopicPartitionState<KPH>> subscribedPartitionStates;

	/**
	 * Queue of partitions that are not yet assigned to any Kafka clients for consuming.
	 * Kafka version-specific implementations of {@link AbstractFetcher#runFetchLoop()}
	 * should continuously poll this queue for unassigned partitions, and start consuming
	 * them accordingly.
	 *
	 * <p>All partitions added to this queue are guaranteed to have been added
	 * to {@link #subscribedPartitionStates} already.
	 */
	protected final ClosableBlockingQueue<KafkaTopicPartitionState<KPH>> unassignedPartitionsQueue;

	/** The mode describing whether the fetcher also generates timestamps and watermarks. */
	private final int timestampWatermarkMode;

	/**
	 * Optional timestamp extractor / watermark generator that will be run per Kafka partition,
	 * to exploit per-partition timestamp characteristics.
	 * The assigner is kept in serialized form, to deserialize it into multiple copies.
	 */
	private final SerializedValue<AssignerWithPeriodicWatermarks<T>> watermarksPeriodic;

	/**
	 * Optional timestamp extractor / watermark generator that will be run per Kafka partition,
	 * to exploit per-partition timestamp characteristics.
	 * The assigner is kept in serialized form, to deserialize it into multiple copies.
	 */
	private final SerializedValue<AssignerWithPunctuatedWatermarks<T>> watermarksPunctuated;

	/** User class loader used to deserialize watermark assigners. */
	private final ClassLoader userCodeClassLoader;

	/** Only relevant for punctuated watermarks: The current cross partition watermark. */
	private volatile long maxWatermarkSoFar = Long.MIN_VALUE;

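Of these fields, subscribedPartitionStates and unassignedPartitionsQueue are the two that matter most for partition handling. The Javadoc comments in the source above already explain them:
1. subscribedPartitionStates: "All partitions (and their state) that this fetcher is subscribed to." It holds the state of every partition this fetcher has subscribed to, together with the associated metadata.
2. unassignedPartitionsQueue: a Kafka consumer pulls data by partition, so it first has to be decided which partition belongs to which subtask. The strategy for assigning partitions to subtasks was covered in an earlier article. This queue holds the partitions that have not yet been assigned to a Kafka client for consumption.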
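
The hand-off through the queue can be sketched as follows. This is a minimal model, not Flink's code: it uses java.util.concurrent.LinkedBlockingQueue in place of ClosableBlockingQueue, represents partitions as plain strings, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Sketch under assumptions: a stand-in for the fetcher / consumer-thread hand-off.
final class UnassignedQueueSketch {

    /** Partitions owned by this subtask but not yet picked up by the consumer thread. */
    private final BlockingQueue<String> unassignedPartitionsQueue = new LinkedBlockingQueue<>();

    /** Partitions the consumer thread is currently reading. */
    private final List<String> assignedPartitions = new ArrayList<>();

    /** Fetcher side: called at startup and whenever discovery finds a new partition. */
    void addDiscoveredPartition(String partition) {
        unassignedPartitionsQueue.add(partition);
    }

    /** Consumer-thread side: drain without blocking; returns true if the assignment grew. */
    boolean pollNewPartitions() {
        List<String> newPartitions = new ArrayList<>();
        unassignedPartitionsQueue.drainTo(newPartitions);
        if (newPartitions.isEmpty()) {
            return false;
        }
        assignedPartitions.addAll(newPartitions);
        // A real consumer thread would now call KafkaConsumer#assign(...) with the full list.
        return true;
    }

    /** Returns a copy of the current assignment. */
    List<String> assigned() {
        return new ArrayList<>(assignedPartitions);
    }
}
```

The point of the queue is that partitions discovered at runtime can be handed to an already running consumer thread, which then extends its assignment on its next loop iteration.
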
To be continued...
