1. Main Text
Earlier, in "一文搞懂 Flink 处理 Barrier 全过程" (Understanding Flink's Barrier Handling in One Article), we covered how Flink processes barriers. Today we look at how Flink handles watermarks, using Flink consuming from Kafka as the example:
FlinkKafkaConsumer<String> consumer =
        new FlinkKafkaConsumer<>(topics, new SimpleStringSchema(), properties);
consumer.setStartFromLatest();
consumer.assignTimestampsAndWatermarks(new AscendingTimestampExtractor<String>() {
    @Override
    public long extractAscendingTimestamp(String element) {
        String locTime = "";
        try {
            Map map = Json2Others.json2map(element);
            locTime = map.get("locTime").toString();
        } catch (IOException e) {
            // malformed JSON: locTime stays empty and the parse below will fail fast
        }
        LocalDateTime startDateTime =
                LocalDateTime.parse(locTime, DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss"));
        return startDateTime.toInstant(OffsetDateTime.now().getOffset()).toEpochMilli();
    }
});
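The extractor above parses the event's locTime field, a "yyyy-MM-dd HH:mm:ss" string, into epoch milliseconds. That conversion can be exercised in isolation; this is a minimal sketch (the class and method names here are illustrative, not part of the original code, and the zone offset is passed in explicitly rather than taken from the system as above):

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

public class TimestampExtraction {
    private static final DateTimeFormatter FMT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    // Converts a "yyyy-MM-dd HH:mm:ss" string to epoch millis at the given offset
    static long toEpochMilli(String locTime, ZoneOffset offset) {
        return LocalDateTime.parse(locTime, FMT).toInstant(offset).toEpochMilli();
    }

    public static void main(String[] args) {
        // At UTC, 1970-01-01 00:00:01 is exactly 1000 ms after the epoch
        System.out.println(toEpochMilli("1970-01-01 00:00:01", ZoneOffset.UTC));
    }
}
```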
assignTimestampsAndWatermarks is what assigns watermarksPeriodic. When the KafkaFetcher is initialized (for background on KafkaFetcher, see "写给大忙人看的Flink 消费 Kafka" (Flink Consuming Kafka for Busy People)), it creates a PeriodicWatermarkEmitter:
// if we have periodic watermarks, kick off the interval scheduler
// when the fetcher is built, create and start a PeriodicWatermarkEmitter to emit watermarks periodically
if (timestampWatermarkMode == PERIODIC_WATERMARKS) {
    @SuppressWarnings("unchecked")
    PeriodicWatermarkEmitter periodicEmitter = new PeriodicWatermarkEmitter(
            subscribedPartitionStates,
            sourceContext,
            processingTimeProvider,
            autoWatermarkInterval);
    periodicEmitter.start();
}
The main job of PeriodicWatermarkEmitter is to emit watermarks periodically. The default interval is 200 ms: calling env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime); sets the auto-watermark interval to 200 ms, and it can be overridden via env.getConfig().setAutoWatermarkInterval(...).
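A minimal configuration sketch of the two calls involved (a fragment, assuming a standard StreamExecutionEnvironment; the 500 ms value is just an example):

```java
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
// Switching to event time implicitly sets the auto-watermark interval to 200 ms
env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);
// Optionally override the periodic emission interval (in milliseconds)
env.getConfig().setAutoWatermarkInterval(500);
```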
// called every autoWatermarkInterval milliseconds
@Override
public void onProcessingTime(long timestamp) throws Exception {
    long minAcrossAll = Long.MAX_VALUE;
    boolean isEffectiveMinAggregation = false;
    for (KafkaTopicPartitionState<?, ?> state : allPartitions) {
// we access the current watermark for the pe
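The loop above scans every partition's current watermark to compute the minimum across all partitions, which is what the emitter may then advance the operator watermark to. A standalone sketch of that min-aggregation, using a hypothetical PartitionState stand-in for Flink's internal KafkaTopicPartitionState:

```java
import java.util.Arrays;
import java.util.List;

public class WatermarkAggregation {
    // Hypothetical stand-in for Flink's internal partition state
    static class PartitionState {
        final long currentWatermark;
        PartitionState(long watermark) { this.currentWatermark = watermark; }
    }

    // Returns the min watermark across partitions, or Long.MIN_VALUE if there are none
    static long minWatermarkAcrossPartitions(List<PartitionState> partitions) {
        long minAcrossAll = Long.MAX_VALUE;
        boolean isEffectiveMinAggregation = false;
        for (PartitionState state : partitions) {
            minAcrossAll = Math.min(minAcrossAll, state.currentWatermark);
            isEffectiveMinAggregation = true;
        }
        return isEffectiveMinAggregation ? minAcrossAll : Long.MIN_VALUE;
    }

    public static void main(String[] args) {
        List<PartitionState> partitions = Arrays.asList(
                new PartitionState(1000L),
                new PartitionState(1500L),
                new PartitionState(800L));
        // The watermark can only advance as far as the slowest partition: 800
        System.out.println(minWatermarkAcrossPartitions(partitions));
    }
}
```

Taking the minimum is what makes the emitted watermark safe: a partition lagging behind holds the whole operator back, so no event from that partition can arrive "late" merely because another partition ran ahead.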