SystemProcessingTimeService
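getCurrentProcessingTime() returns System.currentTimeMillis().
ScheduledFuture<?> registerTimer(long timestamp, ProcessingTimeCallback callback) registers a task that runs no earlier than timestamp, where timestamp is the time at which the task should fire. If timestamp is earlier than the current time, the task runs 1 ms after the current time. As I read the source, the delay is computed as max(timestamp - now, 0) + 1 ms, so a timer for a future timestamp also fires 1 ms after the requested time rather than exactly at it. The extra millisecond aligns timers with watermark semantics: a watermark T means no more elements with a timestamp <= T will arrive, so a timer for T should fire only once T has passed. Scheduling is done with Java's ScheduledThreadPoolExecutor class. The ProcessingTimeCallback interface has a single method, onProcessingTime, which is what the scheduled task eventually calls.
ScheduledTask implements the Runnable interface.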
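The scheduling described above can be sketched in a few lines. This is a simplified illustration, not Flink's SystemProcessingTimeService: TimerCallback, TimerServiceSketch and delayMillis are hypothetical names, and the delay formula (always adding 1 ms) is my reading of the source rather than a guaranteed contract.

```java
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for Flink's ProcessingTimeCallback. */
interface TimerCallback {
    void onProcessingTime(long timestamp) throws Exception;
}

/** Simplified sketch of a processing-time timer service (assumption, not Flink code). */
class TimerServiceSketch {
    private final ScheduledThreadPoolExecutor executor = new ScheduledThreadPoolExecutor(1);

    long getCurrentProcessingTime() {
        return System.currentTimeMillis();
    }

    /** Delay before firing: never negative, plus 1 ms so the timer fires after the timestamp has passed. */
    static long delayMillis(long timestamp, long now) {
        return Math.max(timestamp - now, 0L) + 1L;
    }

    ScheduledFuture<?> registerTimer(long timestamp, TimerCallback callback) {
        long delay = delayMillis(timestamp, getCurrentProcessingTime());
        // Wrap the callback in a Runnable, much like ScheduledTask does.
        Runnable task = () -> {
            try {
                callback.onProcessingTime(timestamp);
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        };
        return executor.schedule(task, delay, TimeUnit.MILLISECONDS);
    }

    void shutdown() {
        executor.shutdownNow();
    }
}
```

With this sketch, a timer for a timestamp 500 ms in the past gets a delay of 1 ms, and a timer for a timestamp 500 ms in the future gets a delay of 501 ms.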
TimestampsAndWatermarksOperator
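Operates on a single stream: it extracts timestamps from events and generates watermarks. Initialization happens in open(), which reads the watermarkInterval parameter (default 0). When the interval is greater than 0, a timer that triggers periodic watermark generation is registered through registerTimer, and it is registered again every time it fires.
private transient TimestampAssigner<T> timestampAssigner; // extracts the timestamp from each element
private transient WatermarkGenerator<T> watermarkGenerator; // generates the watermarks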
public TimestampsAndWatermarksOperator(
        WatermarkStrategy<T> watermarkStrategy) {
    this.watermarkStrategy = checkNotNull(watermarkStrategy);
    this.chainingStrategy = ChainingStrategy.ALWAYS;
}

@Override
public void open() throws Exception {
    super.open();
    timestampAssigner = watermarkStrategy.createTimestampAssigner(this::getMetricGroup);
    watermarkGenerator = watermarkStrategy.createWatermarkGenerator(this::getMetricGroup);
    wmOutput = new WatermarkEmitter(output, getContainingTask().getStreamStatusMaintainer());
    watermarkInterval = getExecutionConfig().getAutoWatermarkInterval();
    if (watermarkInterval > 0) {
        final long now = getProcessingTimeService().getCurrentProcessingTime();
        getProcessingTimeService().registerTimer(now + watermarkInterval, this);
    }
}

@Override
public void processElement(final StreamRecord<T> element) throws Exception {
    final T event = element.getValue();
    final long previousTimestamp = element.hasTimestamp() ? element.getTimestamp() : Long.MIN_VALUE;
    final long newTimestamp = timestampAssigner.extractTimestamp(event, previousTimestamp);
    element.setTimestamp(newTimestamp);
    output.collect(element);
    watermarkGenerator.onEvent(event, newTimestamp, wmOutput);
}

@Override
public void onProcessingTime(long timestamp) throws Exception {
    watermarkGenerator.onPeriodicEmit(wmOutput);
    final long now = getProcessingTimeService().getCurrentProcessingTime();
    getProcessingTimeService().registerTimer(now + watermarkInterval, this);
}
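Note that the watermarkStrategy field is user-defined: it is the object passed in through assignTimestampsAndWatermarks.
open() uses this watermarkStrategy to initialize the two fields mentioned above, timestampAssigner and watermarkGenerator.
Note also that registerTimer is called in both open() and onProcessingTime(), and both calls use the configured auto-watermark interval as the delay. My understanding is that the call in open() is the initial registration and runs only once, while onProcessingTime() registers the next timer each time it fires, so it is scheduled repeatedly for as long as the operator runs.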
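The open()/onProcessingTime() interplay is a self-rescheduling one-shot timer. The sketch below reproduces only that pattern with a manually advanced clock so the behaviour is deterministic. ManualTimeService, TickCallback and PeriodicEmitterSketch are hypothetical names invented for this illustration, not Flink classes.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical callback type, standing in for ProcessingTimeCallback. */
interface TickCallback {
    void onProcessingTime(long timestamp);
}

/** Manually advanced clock plus timer queue (assumption: replaces the real timer service). */
class ManualTimeService {
    private static final class Timer {
        final long timestamp;
        final TickCallback callback;

        Timer(long timestamp, TickCallback callback) {
            this.timestamp = timestamp;
            this.callback = callback;
        }
    }

    private final PriorityQueue<Timer> timers =
            new PriorityQueue<>(Comparator.comparingLong((Timer t) -> t.timestamp));
    private long now;

    long getCurrentProcessingTime() {
        return now;
    }

    void registerTimer(long timestamp, TickCallback callback) {
        timers.add(new Timer(timestamp, callback));
    }

    /** Moves the clock forward, firing every timer that becomes due on the way. */
    void advanceTo(long newTime) {
        while (!timers.isEmpty() && timers.peek().timestamp <= newTime) {
            Timer t = timers.poll();
            now = t.timestamp;
            t.callback.onProcessingTime(t.timestamp);
        }
        now = newTime;
    }
}

class PeriodicEmitterSketch implements TickCallback {
    private final ManualTimeService timeService;
    private final long interval;
    final List<Long> emissions = new ArrayList<>();

    PeriodicEmitterSketch(ManualTimeService timeService, long interval) {
        this.timeService = timeService;
        this.interval = interval;
    }

    /** Initial registration: happens once, and only if periodic emission is enabled. */
    void open() {
        if (interval > 0) {
            timeService.registerTimer(timeService.getCurrentProcessingTime() + interval, this);
        }
    }

    /** Each firing does its work, then registers the next timer. */
    @Override
    public void onProcessingTime(long timestamp) {
        emissions.add(timestamp); // stands in for watermarkGenerator.onPeriodicEmit(wmOutput)
        timeService.registerTimer(timeService.getCurrentProcessingTime() + interval, this);
    }
}
```

With an interval of 200 and the clock advanced from 0 to 650, the callback fires at 200, 400 and 600. With an interval of 0, open() registers nothing and the callback never runs.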
WatermarkStrategy
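Extends TimestampAssignerSupplier and WatermarkGeneratorSupplier. Both follow the supplier pattern and return a TimestampAssigner and a WatermarkGenerator respectively.
Since Flink 1.11, assignTimestampsAndWatermarks accepts a WatermarkStrategy object for generating watermarks. What follows is a simplified reproduction of the related interfaces (in it, int stands in for Flink's MetricGroup and long stands in for WatermarkOutput). The way the interfaces are put to use here is well worth studying.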
@PublicEvolving
@FunctionalInterface
interface TimestampAssignerSupplier<T> {
    TimestampAssigner<T> createTimestampAssigner(Context context);

    static <T> TimestampAssignerSupplier<T> of(SerializableTimestampAssigner<T> assigner) {
        return new SupplierFromSerializableTimestampAssigner<>(assigner);
    }

    interface Context {
        int getMetricGroup();
    }

    class SupplierFromSerializableTimestampAssigner<T> implements TimestampAssignerSupplier<T> {
        private final SerializableTimestampAssigner<T> assigner;

        public SupplierFromSerializableTimestampAssigner(SerializableTimestampAssigner<T> assigner) {
            this.assigner = assigner;
        }

        @Override
        public TimestampAssigner<T> createTimestampAssigner(Context context) {
            return assigner;
        }
    }
}

@PublicEvolving
@FunctionalInterface
interface WatermarkGeneratorSupplier<T> {
    WatermarkGenerator<T> createWatermarkGenerator(Context context);

    interface Context {
        int getMetricGroup();
    }
}

@Public
@FunctionalInterface
interface TimestampAssigner<T> {
    long extractTimestamp(T element, long recordTimestamp);
}

@PublicEvolving
@FunctionalInterface
interface SerializableTimestampAssigner<T> extends TimestampAssigner<T>, Serializable {
}
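
@Public
interface WatermarkGenerator<T> {
    // In this reproduction `long output` is a placeholder for Flink's WatermarkOutput.
    void onEvent(T event, long eventTimestamp, long output);

    void onPeriodicEmit(long output);
}

@Public
class BoundedOutOfOrdernessWatermarks<T> implements WatermarkGenerator<T> {
    // Largest event timestamp seen so far.
    private long maxTimestamp;
    // Maximum out-of-orderness tolerated, in milliseconds.
    private final long outOfOrdernessMillis;

    public BoundedOutOfOrdernessWatermarks(Duration maxOutOfOrderness) {
        this.outOfOrdernessMillis = maxOutOfOrderness.toMillis();
        // Start value chosen so that maxTimestamp - outOfOrdernessMillis - 1 cannot underflow.
        this.maxTimestamp = Long.MIN_VALUE + outOfOrdernessMillis + 1;
    }

    @Override
    public void onEvent(T event, long eventTimestamp, long output) {
        maxTimestamp = Math.max(maxTimestamp, eventTimestamp);
    }

    @Override
    public void onPeriodicEmit(long output) {
        // With the long placeholder this assignment is local and invisible to the caller;
        // Flink's version passes the same value to output.emitWatermark(new Watermark(...)).
        output = maxTimestamp - outOfOrdernessMillis - 1;
    }
}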

@Public
final class WatermarkStrategyWithTimestampAssigner<T> implements WatermarkStrategy<T> {
    private final WatermarkStrategy<T> baseStrategy;
    private final TimestampAssignerSupplier<T> timestampAssigner;

    WatermarkStrategyWithTimestampAssigner(
            WatermarkStrategy<T> baseStrategy,
            TimestampAssignerSupplier<T> timestampAssigner) {
        this.baseStrategy = baseStrategy;
        this.timestampAssigner = timestampAssigner;
    }

    @Override
    public TimestampAssigner<T> createTimestampAssigner(TimestampAssignerSupplier.Context context) {
        return timestampAssigner.createTimestampAssigner(context);
    }

    @Override
    public WatermarkGenerator<T> createWatermarkGenerator(WatermarkGeneratorSupplier.Context context) {
        return baseStrategy.createWatermarkGenerator(context);
    }
}

@Public
interface WatermarkStrategy<T> extends TimestampAssignerSupplier<T>, WatermarkGeneratorSupplier<T> {
    @Override
    default TimestampAssigner<T> createTimestampAssigner(TimestampAssignerSupplier.Context context) {
        // Simplified for this reproduction; Flink's default returns a RecordTimestampAssigner instead of null.
        return null;
    }

    // The only abstract method, so a strategy can be written as a lambda.
    @Override
    WatermarkGenerator<T> createWatermarkGenerator(WatermarkGeneratorSupplier.Context context);

    static <T> WatermarkStrategy<T> forMonotonousTimestamps() {
        // Monotonous timestamps are the special case of zero out-of-orderness.
        return (context) -> new BoundedOutOfOrdernessWatermarks<>(Duration.ofMillis(0));
    }

    default WatermarkStrategy<T> withTimestampAssigner(TimestampAssignerSupplier<T> timestampAssigner) {
        return new WatermarkStrategyWithTimestampAssigner<>(this, timestampAssigner);
    }

    default WatermarkStrategy<T> withTimestampAssigner(SerializableTimestampAssigner<T> timestampAssigner) {
        return new WatermarkStrategyWithTimestampAssigner<>(this,
                TimestampAssignerSupplier.of(timestampAssigner));
    }
}
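The interplay of a lambda-friendly supplier interface, a static factory and a default method that wraps `this` can be isolated in a much smaller sketch. Everything here (MiniStrategy, MiniAssigner, MiniGenerator, MaxTimestampGenerator) is a hypothetical, trimmed-down stand-in invented for illustration. The default assigner that simply returns recordTimestamp is also an assumption of this sketch.

```java
import java.util.function.Supplier;

/** Hypothetical mini version of TimestampAssigner. */
interface MiniAssigner<T> {
    long extractTimestamp(T element, long recordTimestamp);
}

/** Hypothetical mini version of a stateful watermark generator. */
interface MiniGenerator {
    void onEvent(long eventTimestamp);

    long currentMax();
}

class MaxTimestampGenerator implements MiniGenerator {
    private long max = Long.MIN_VALUE;

    @Override
    public void onEvent(long eventTimestamp) {
        max = Math.max(max, eventTimestamp);
    }

    @Override
    public long currentMax() {
        return max;
    }
}

/** One abstract method, so a strategy can be a lambda or a method reference. */
@FunctionalInterface
interface MiniStrategy<T> {
    MiniGenerator createGenerator();

    /** Default used until withAssigner is called: keep the existing timestamp. */
    default MiniAssigner<T> createAssigner() {
        return (element, recordTimestamp) -> recordTimestamp;
    }

    static <T> MiniStrategy<T> ofGenerator(Supplier<MiniGenerator> supplier) {
        return supplier::get;
    }

    /** Returns a new strategy wrapping this one; the original is left untouched. */
    default MiniStrategy<T> withAssigner(MiniAssigner<T> assigner) {
        MiniStrategy<T> base = this;
        return new MiniStrategy<T>() {
            @Override
            public MiniGenerator createGenerator() {
                return base.createGenerator();
            }

            @Override
            public MiniAssigner<T> createAssigner() {
                return assigner;
            }
        };
    }
}
```

A strategy built as `MiniStrategy.<String>ofGenerator(MaxTimestampGenerator::new).withAssigner((s, prev) -> s.length())` hands out a fresh generator on every createGenerator() call. Handing out factories rather than instances matters because generators hold state, and each parallel operator instance needs its own.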
WatermarkGenerator
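This interface generates watermarks both in response to events and periodically (at a fixed interval).
onEvent is called for every event and lets the watermark generator inspect and remember event timestamps.
onPeriodicEmit is called periodically and emits a new watermark computed from the largest timestamp recorded so far and the allowed lateness.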
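The arithmetic behind the two callbacks is easier to follow with concrete numbers. The sketch below assumes a hypothetical WatermarkSink interface in place of Flink's WatermarkOutput and a hypothetical class name BoundedLatenessGenerator. The formula itself is the one from the reproduction above.

```java
import java.time.Duration;

/** Hypothetical sink standing in for Flink's WatermarkOutput. */
interface WatermarkSink {
    void emitWatermark(long watermarkTimestamp);
}

/** Sketch of a bounded out-of-orderness generator (illustrative, not Flink's class). */
class BoundedLatenessGenerator<T> {
    private final long outOfOrdernessMillis;
    private long maxTimestamp;

    BoundedLatenessGenerator(Duration maxOutOfOrderness) {
        this.outOfOrdernessMillis = maxOutOfOrderness.toMillis();
        // Start value chosen so the first emitted watermark is exactly Long.MIN_VALUE.
        this.maxTimestamp = Long.MIN_VALUE + outOfOrdernessMillis + 1;
    }

    /** Per event: only remember the largest timestamp; nothing is emitted here. */
    void onEvent(T event, long eventTimestamp, WatermarkSink output) {
        maxTimestamp = Math.max(maxTimestamp, eventTimestamp);
    }

    /** Periodically: watermark = largest timestamp - allowed lateness - 1. */
    void onPeriodicEmit(WatermarkSink output) {
        output.emitWatermark(maxTimestamp - outOfOrdernessMillis - 1);
    }
}
```

With an allowed lateness of 2 seconds and events stamped 10000 and 9500, the next periodic emit produces 10000 - 2000 - 1 = 7999. After an event stamped 12000 arrives, the following emit produces 9999. An out-of-order event such as 9500 never moves the watermark backwards, because only the maximum is tracked.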
TimestampAssigner
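This interface has a single method, long extractTimestamp(T element, long recordTimestamp). element is the object that is about to be given a timestamp. recordTimestamp is the timestamp assigned by a previous timestamp assigner, or Long.MIN_VALUE if none has been assigned.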
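A small sketch of how the recordTimestamp argument can be used as a fallback. EventTimeAssigner, SensorReading and AssignerDemo are hypothetical names for this illustration, and treating a non-positive eventTimeMillis as "not set" is purely an assumption of the example.

```java
/** Hypothetical copy of the single-method interface described above. */
@FunctionalInterface
interface EventTimeAssigner<T> {
    long NO_TIMESTAMP = Long.MIN_VALUE;

    long extractTimestamp(T element, long recordTimestamp);
}

/** Hypothetical event type. */
final class SensorReading {
    final String sensorId;
    final long eventTimeMillis; // non-positive means "not set" in this sketch

    SensorReading(String sensorId, long eventTimeMillis) {
        this.sensorId = sensorId;
        this.eventTimeMillis = eventTimeMillis;
    }
}

class AssignerDemo {
    /** Prefers the event's own time; otherwise keeps whatever was assigned upstream. */
    static final EventTimeAssigner<SensorReading> ASSIGNER =
            (reading, recordTimestamp) ->
                    reading.eventTimeMillis > 0 ? reading.eventTimeMillis : recordTimestamp;

    /** Picks the previous timestamp the way processElement does, then calls the assigner. */
    static long assign(SensorReading reading, boolean hasTimestamp, long existingTimestamp) {
        long previous = hasTimestamp ? existingTimestamp : EventTimeAssigner.NO_TIMESTAMP;
        return ASSIGNER.extractTimestamp(reading, previous);
    }
}
```

A reading that carries its own time gets that time. A reading without one keeps the upstream timestamp if there is one, and otherwise ends up with Long.MIN_VALUE, which downstream code can treat as "no timestamp".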