Reposted from: https://developer.aliyun.com/ask/361651
According to the Kafka source section of the official documentation, the option is described as follows:
scan.startup.mode
: optional, default: group-offsets, type: String. Startup mode for Kafka consumer, valid values are 'earliest-offset', 'latest-offset', 'group-offsets', 'timestamp' and 'specific-offsets'. See the following Start Reading Position for more details.
The Start Reading Position section reads as follows:
The config option scan.startup.mode specifies the startup mode for Kafka consumer. The valid enumerations are:
group-offsets
: start from committed offsets in ZK / Kafka brokers of a specific consumer group.
earliest-offset
: start from the earliest offset possible.
latest-offset
: start from the latest offset.
timestamp
: start from user-supplied timestamp for each partition.
specific-offsets
: start from user-supplied specific offsets for each partition.
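So latest-offset and group-offsets are two separate settings. If I configure latest-offset, consumption should start from the latest offsets, regardless of which group id is used and whatever offsets that group has already committed. I believe that much is not in doubt.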
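For reference, a table configured this way could be declared roughly as below. This is only a sketch: the table name `orders_source`, its columns, the topic `orders`, and the broker address are made up for illustration, and the helper simply assembles the DDL string rather than submitting it to Flink. The option keys themselves (`connector`, `topic`, `properties.bootstrap.servers`, `properties.group.id`, `scan.startup.mode`, `format`) are the ones the Kafka SQL connector documents.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

class KafkaTableDdl {
    /** Builds a CREATE TABLE statement for a Kafka-backed table with the given startup mode. */
    static String build(String startupMode, String groupId) {
        // LinkedHashMap keeps the options in insertion order for readable output.
        Map<String, String> options = new LinkedHashMap<>();
        options.put("connector", "kafka");
        options.put("topic", "orders");                                 // hypothetical topic
        options.put("properties.bootstrap.servers", "localhost:9092");  // hypothetical broker
        options.put("properties.group.id", groupId);
        options.put("scan.startup.mode", startupMode);
        options.put("format", "json");

        // Render each option as 'key' = 'value', one per line.
        String with = options.entrySet().stream()
                .map(e -> "  '" + e.getKey() + "' = '" + e.getValue() + "'")
                .collect(Collectors.joining(",\n"));

        // Table name and columns are hypothetical.
        return "CREATE TABLE orders_source (\n"
                + "  order_id STRING,\n"
                + "  amount DOUBLE\n"
                + ") WITH (\n" + with + "\n)";
    }

    public static void main(String[] args) {
        // The group id is still set, but latest-offset decides where a fresh start begins.
        System.out.println(build("latest-offset", "demo-group"));
    }
}
```

The resulting string would normally be passed to something like `TableEnvironment.executeSql`, which is left out here so the sketch runs without Flink on the classpath.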
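What I want to know is: with latest-offset configured, when a SQL job restarts automatically from a checkpoint, does it consume from the latest offsets or from the offsets stored in the checkpoint?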
- In the Flink DataStream implementation, consumption resumes from the checkpoint offset.
- Flink SQL behaves the same way: it resumes from the offsets saved in the last successful checkpoint.
DataStream API
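### When restoring from a checkpoint, the checkpoint offset is used
### When not restoring from a checkpoint, the offset stored in Kafka is used (auto.offset.reset applies only when Kafka holds no committed offset for the group)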
consumerprops.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
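The precedence described in the answers can be sketched as a small standalone model. This is not Flink's actual code: the class `StartOffsetModel`, the `resolve` method, and its parameters are hypothetical names, only three of the startup modes are modeled, and the checkpoint value is simplified to "the position to resume from" for a single partition.

```java
import java.util.OptionalLong;

class StartOffsetModel {
    enum StartupMode { GROUP_OFFSETS, EARLIEST_OFFSET, LATEST_OFFSET }

    /**
     * Simplified model of where one partition starts reading.
     *
     * @param checkpointOffset position restored from the last successful checkpoint, if any
     * @param mode             the configured scan.startup.mode
     * @param committedOffset  offset committed to Kafka for the consumer group, if any
     * @param earliest         earliest offset available in the partition
     * @param latest           latest offset in the partition
     * @param autoOffsetReset  value of auto.offset.reset ("earliest" or "latest")
     */
    static long resolve(OptionalLong checkpointOffset, StartupMode mode,
                        OptionalLong committedOffset, long earliest, long latest,
                        String autoOffsetReset) {
        // 1. A restored checkpoint always wins, whatever the startup mode says.
        if (checkpointOffset.isPresent()) {
            return checkpointOffset.getAsLong();
        }
        // 2. Otherwise the startup mode decides.
        switch (mode) {
            case EARLIEST_OFFSET:
                return earliest;
            case LATEST_OFFSET:
                return latest;
            default:
                // 3. group-offsets: use the committed offset, and fall back to
                //    auto.offset.reset only when the group has none.
                if (committedOffset.isPresent()) {
                    return committedOffset.getAsLong();
                }
                return "earliest".equals(autoOffsetReset) ? earliest : latest;
        }
    }

    public static void main(String[] args) {
        // Restart from a checkpoint with latest-offset configured: checkpoint offset is used.
        System.out.println(resolve(OptionalLong.of(120), StartupMode.LATEST_OFFSET,
                OptionalLong.of(100), 0L, 500L, "earliest")); // prints 120
        // Fresh start with latest-offset: the committed group offset is ignored.
        System.out.println(resolve(OptionalLong.empty(), StartupMode.LATEST_OFFSET,
                OptionalLong.of(100), 0L, 500L, "earliest")); // prints 500
    }
}
```

In this model the startup mode only matters for a job that starts without restored state, which matches both answers above.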