The Kafka source can act either as a bounded data source or as an unbounded one.
Example code
import org.apache.flink.api.common.RuntimeExecutionMode;
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.kafka.source.KafkaSource;
import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class KafkaBatchDemo {

    public static void main(String[] args) {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(1);
        // Run in batch mode
        env.setRuntimeMode(RuntimeExecutionMode.BATCH);
        // Bound the consumption range by a starting and an ending timestamp
        KafkaSource<String> source = KafkaSource.<String>builder()
                .setBootstrapServers("localhost:9092")
                .setTopics("input-topic")
                .setGroupId("my-group")
                .setStartingOffsets(OffsetsInitializer.timestamp(1657038028000L))
                .setBounded(OffsetsInitializer.timestamp(1657120828000L))
                .setValueOnlyDeserializer(new SimpleStringSchema())
                .build();
        env.fromSource(source, WatermarkStrategy.noWatermarks(), "Kafka Source").print();
        try {
            env.execute("batch-kafka-test");
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
How it works
Converting the timestamp into an offset
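Kafka itself performs this conversion: KafkaConsumer#offsetsForTimes returns, for each partition, the earliest offset whose record timestamp is at or after the requested timestamp, or null when no such record exists. Below is a minimal pure-Java sketch of that lookup semantics; the in-memory "partition log" map and the helper method are illustrative, not a Kafka API.

```java
import java.util.Map;
import java.util.TreeMap;

public class TimestampToOffset {

    // Mirror the semantics of KafkaConsumer#offsetsForTimes for one partition:
    // given the partition "log" (offset -> record timestamp, offsets ascending),
    // return the earliest offset whose timestamp is >= target, or null if none.
    static Long offsetForTimestamp(TreeMap<Long, Long> log, long targetTs) {
        for (Map.Entry<Long, Long> e : log.entrySet()) {
            if (e.getValue() >= targetTs) {
                return e.getKey();
            }
        }
        return null; // no record at or after targetTs
    }

    public static void main(String[] args) {
        TreeMap<Long, Long> log = new TreeMap<>();
        log.put(0L, 1000L); // offset 0 was produced at timestamp 1000
        log.put(1L, 2000L);
        log.put(2L, 3000L);

        System.out.println(offsetForTimestamp(log, 1500L)); // 1
        System.out.println(offsetForTimestamp(log, 9999L)); // null
    }
}
```

The timestamp-based OffsetsInitializer builds on this lookup for both ends of the range.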
The stopping offsets are assigned in the KafkaPartitionSplitReader class:
    private void acquireAndSetStoppingOffsets(
            List<TopicPartition> partitionsStoppingAtLatest,
            Set<TopicPartition> partitionsStoppingAtCommitted) {
        // Resolve the end offsets for partitions that stop at the latest offset
        Map<TopicPartition, Long> endOffset = consumer.endOffsets(partitionsStoppingAtLatest);
        stoppingOffsets.putAll(endOffset);
        if (!partitionsStoppingAtCommitted.isEmpty()) {
            consumer.committed(partitionsStoppingAtCommitted)
                    .forEach(
                            (tp, offsetAndMetadata) -> {
                                Preconditions.checkNotNull(
                                        offsetAndMetadata,
                                        String.format(
                                                "Partition %s should stop at committed offset. "
                                                        + "But there is no committed offset of this partition for group %s",
                                                tp, groupId));
                                stoppingOffsets.put(tp, offsetAndMetadata.offset());
                            });
        }
    }
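Once stoppingOffsets contains an entry for a partition, the reader only needs to compare each fetched record's offset against it and mark the split as finished when the position reaches the stopping offset. A toy simulation of that stopping check, using plain Java in place of the Flink and Kafka classes:

```java
import java.util.List;

public class StoppingOffsetDemo {

    // Consume records (represented by their offsets) until the position reaches
    // the stopping offset; records at or beyond it are not emitted, and the
    // split is then considered finished.
    static int consumeUntil(List<Long> offsets, long stoppingOffset) {
        int emitted = 0;
        for (long offset : offsets) {
            if (offset >= stoppingOffset) {
                break; // split finished: stopping offset reached
            }
            emitted++;
        }
        return emitted;
    }

    public static void main(String[] args) {
        // stopping offset 3 -> offsets 0, 1, 2 are emitted, then the split ends
        System.out.println(consumeUntil(List.of(0L, 1L, 2L, 3L, 4L), 3L)); // 3
    }
}
```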
Summary
If the job never finishes, one likely cause is that the requested time range contains no data: no end offset is produced, and the source effectively turns into an unbounded one.
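This observation can be sketched as a small simulation: when no record in the partition has a timestamp at or after the requested end timestamp, the lookup yields no stopping offset, and the reader keeps waiting for new data instead of finishing. The helper below is illustrative only, not the connector's actual code:

```java
import java.util.Map;
import java.util.TreeMap;

public class UnboundedPitfallDemo {

    // Resolve a stopping offset for an end timestamp: the earliest offset whose
    // record timestamp is >= endTs, or null when the range contains no such data.
    static Long resolveStoppingOffset(TreeMap<Long, Long> log, long endTs) {
        for (Map.Entry<Long, Long> e : log.entrySet()) {
            if (e.getValue() >= endTs) {
                return e.getKey();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        TreeMap<Long, Long> log = new TreeMap<>();
        log.put(0L, 1000L);
        log.put(1L, 2000L);
        log.put(2L, 3000L); // latest record was produced at timestamp 3000

        Long stop = resolveStoppingOffset(log, 9999L); // end ts is past all data
        if (stop == null) {
            // no stopping offset -> the source behaves as if it were unbounded
            System.out.println("unbounded");
        } else {
            System.out.println("bounded at " + stop);
        }
    }
}
```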