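1. Notes
The watermark API changed substantially in Flink 1.11 and 1.12: 1.11 introduced WatermarkStrategy, and 1.12 deprecated setStreamTimeCharacteristic and related methods and made event time the default.
2. Watermark generation interval
By default the interval is 0 for ProcessingTime; for the other time characteristics the watermark is updated periodically every 200 ms. This is probably why, in high-volume tests, some data is still accepted after the watermark should already have been reached.
ExecutionConfig exposes setAutoWatermarkInterval for setting the watermark generation interval (default 200 ms):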
env.getConfig.setAutoWatermarkInterval(200L)
In addition, the default value is fixed when setStreamTimeCharacteristic is called to set the time characteristic:
public void setStreamTimeCharacteristic(TimeCharacteristic characteristic) {
    this.timeCharacteristic = Preconditions.checkNotNull(characteristic);
    if (characteristic == TimeCharacteristic.ProcessingTime) {
        getConfig().setAutoWatermarkInterval(0);
    } else {
        getConfig().setAutoWatermarkInterval(200);
    }
}
3. Watermark initialization
Watermark extraction is configured by user code calling assignTimestampsAndWatermarks, which creates a TimestampsAndWatermarksOperator. That operator registers a timer task when it is opened.
assignTimestampsAndWatermarks
public SingleOutputStreamOperator<T> assignTimestampsAndWatermarks(
        WatermarkStrategy<T> watermarkStrategy) {
    final WatermarkStrategy<T> cleanedStrategy = clean(watermarkStrategy);
    final TimestampsAndWatermarksOperator<T> operator =
            new TimestampsAndWatermarksOperator<>(cleanedStrategy);
    // match parallelism to input, to have a 1:1 source -> timestamps/watermarks relationship and chain
    final int inputParallelism = getTransformation().getParallelism();
    return transform("Timestamps/Watermarks", getTransformation().getOutputType(), operator)
            .setParallelism(inputParallelism);
}
TimestampsAndWatermarksOperator
public void open() throws Exception {
    super.open();
    timestampAssigner = watermarkStrategy.createTimestampAssigner(this::getMetricGroup);
    watermarkGenerator = watermarkStrategy.createWatermarkGenerator(this::getMetricGroup);
    wmOutput = new WatermarkEmitter(output, getContainingTask().getStreamStatusMaintainer());
    watermarkInterval = getExecutionConfig().getAutoWatermarkInterval();
    if (watermarkInterval > 0) {
        final long now = getProcessingTimeService().getCurrentProcessingTime();
        getProcessingTimeService().registerTimer(now + watermarkInterval, this);
    }
}
ScheduledThreadPoolExecutor
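Timer tasks are ultimately executed by a ScheduledThreadPoolExecutor; the drift is typically only a little over 1 second after ten-odd minutes.
4. Periodic watermark scheduling
The initialization done for assignTimestampsAndWatermarks registers a timer task, and the timer task invokes a callback: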
public ScheduledFuture<?> registerTimer(long timestamp, ProcessingTimeCallback callback) {
    long delay = ProcessingTimeServiceUtil.getProcessingTimeDelay(timestamp, getCurrentProcessingTime());
    // we directly try to register the timer and only react to the status on exception
    // that way we save unnecessary volatile accesses for each timer
    try {
        return timerService.schedule(wrapOnTimerCallback(callback, timestamp), delay, TimeUnit.MILLISECONDS);
    }
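The callback invokes the onProcessingTime method of the watermark-assigning operator. The excerpts below use WatermarkAssignerOperator, the Table/SQL runtime counterpart of TimestampsAndWatermarksOperator: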
@Override
public void run() {
    if (serviceStatus.get() != STATUS_ALIVE) {
        return;
    }
    try {
        callback.onProcessingTime(nextTimestamp);
WatermarkAssignerOperator then emits the watermark:
public void onProcessingTime(long timestamp) throws Exception {
    advanceWatermark();
    if (idleTimeout > 0) {
        final long currentTime = getProcessingTimeService().getCurrentProcessingTime();
        if (currentTime - lastRecordTime > idleTimeout) {
            // mark the channel as idle to ignore watermarks from this channel
            streamStatusMaintainer.toggleStreamStatus(StreamStatus.IDLE);
        }
    }
    // register next timer
    long now = getProcessingTimeService().getCurrentProcessingTime();
    getProcessingTimeService().registerTimer(now + watermarkInterval, this);
}
private void advanceWatermark() {
    if (currentWatermark > lastWatermark) {
        lastWatermark = currentWatermark;
        // emit watermark
        output.emitWatermark(new Watermark(currentWatermark));
    }
}
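The self-rescheduling timer pattern above can be sketched in plain Java without any Flink dependency. This is a minimal illustration, not Flink's implementation: ManualTimerService and PeriodicWatermarkSketch are hypothetical names, and a manually advanced clock stands in for the real processing-time service so the behaviour is deterministic.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical stand-in for a processing-time service, driven by a manual clock. */
class ManualTimerService {
    private static final class Timer implements Comparable<Timer> {
        final long timestamp;
        final Runnable action;

        Timer(long timestamp, Runnable action) {
            this.timestamp = timestamp;
            this.action = action;
        }

        @Override
        public int compareTo(Timer other) {
            return Long.compare(timestamp, other.timestamp);
        }
    }

    private final PriorityQueue<Timer> timers = new PriorityQueue<>();
    private long now = 0L;

    long currentTime() {
        return now;
    }

    void registerTimer(long timestamp, Runnable action) {
        timers.add(new Timer(timestamp, action));
    }

    /** Advances the clock and fires every due timer in timestamp order. */
    void advanceTo(long newTime) {
        while (!timers.isEmpty() && timers.peek().timestamp <= newTime) {
            Timer timer = timers.poll();
            now = timer.timestamp;
            timer.action.run();
        }
        now = newTime;
    }
}

/** Sketch of a periodic emitter: track the max timestamp, emit on each tick, re-register. */
class PeriodicWatermarkSketch {
    final List<Long> emitted = new ArrayList<>();
    private final ManualTimerService timerService;
    private final long interval;
    private long currentWatermark = Long.MIN_VALUE;
    private long lastWatermark = Long.MIN_VALUE;

    PeriodicWatermarkSketch(ManualTimerService timerService, long interval) {
        this.timerService = timerService;
        this.interval = interval;
        // As in open(): an interval of 0 means no periodic timer is registered at all.
        if (interval > 0) {
            timerService.registerTimer(timerService.currentTime() + interval, this::onProcessingTime);
        }
    }

    void onEvent(long eventTimestamp) {
        currentWatermark = Math.max(currentWatermark, eventTimestamp);
    }

    private void onProcessingTime() {
        // Emit only when the watermark actually advanced.
        if (currentWatermark > lastWatermark) {
            lastWatermark = currentWatermark;
            emitted.add(currentWatermark);
        }
        // Register the next timer, one interval from now.
        timerService.registerTimer(timerService.currentTime() + interval, this::onProcessingTime);
    }
}
```

Each firing registers exactly one follow-up timer, so the chain keeps itself alive without a fixed-rate scheduler, and a tick without new data emits nothing.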
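WatermarkAssignerOperator's processElement updates the current watermark value. To guard against the periodic task not firing (for example when the CPU is busy), it also emits the watermark eagerly: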
@Override
public void processElement(StreamRecord<RowData> element) throws Exception {
    if (idleTimeout > 0) {
        // mark the channel active
        streamStatusMaintainer.toggleStreamStatus(StreamStatus.ACTIVE);
        lastRecordTime = getProcessingTimeService().getCurrentProcessingTime();
    }
    RowData row = element.getValue();
    if (row.isNullAt(rowtimeFieldIndex)) {
        throw new RuntimeException("RowTime field should not be null," +
                " please convert it to a non-null long value.");
    }
    Long watermark = watermarkGenerator.currentWatermark(row);
    if (watermark != null) {
        currentWatermark = Math.max(currentWatermark, watermark);
    }
    // forward element
    output.collect(element);
    // eagerly emit watermark to avoid period timer not called (this often happens when cpu load is high)
    // current_wm - last_wm > interval
    if (currentWatermark - lastWatermark > watermarkInterval) {
        advanceWatermark();
    }
}
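5. Per-record watermark scheduling
This one is straightforward. Like any other operator, TimestampsAndWatermarksOperator has a processElement method that runs for every record. It calls watermarkGenerator.onEvent, so depending on the user's implementation a watermark can be generated for every record: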
@Override
public void processElement(final StreamRecord<T> element) throws Exception {
    final T event = element.getValue();
    final long previousTimestamp = element.hasTimestamp() ? element.getTimestamp() : Long.MIN_VALUE;
    final long newTimestamp = timestampAssigner.extractTimestamp(event, previousTimestamp);
    element.setTimestamp(newTimestamp);
    output.collect(element);
    watermarkGenerator.onEvent(event, newTimestamp, wmOutput);
}
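To make the per-record path concrete, here is a small sketch of a generator that emits from onEvent only. It is kept dependency-free on purpose: SketchWatermarkOutput and SketchWatermarkGenerator are hypothetical, simplified stand-ins that mirror the two callbacks (onEvent and onPeriodicEmit) rather than Flink's own interfaces, and the "MARK" prefix is an invented convention for events that should trigger a watermark.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified stand-in for a watermark output (assumption, not Flink's type). */
interface SketchWatermarkOutput {
    void emitWatermark(long timestamp);
}

/** Simplified stand-in mirroring the two callbacks of a watermark generator. */
interface SketchWatermarkGenerator<T> {
    void onEvent(T event, long eventTimestamp, SketchWatermarkOutput output);

    void onPeriodicEmit(SketchWatermarkOutput output);
}

/** Emits a watermark for every event flagged as a marker (per-record style). */
class PunctuatedGeneratorSketch implements SketchWatermarkGenerator<String> {
    @Override
    public void onEvent(String event, long eventTimestamp, SketchWatermarkOutput output) {
        // Hypothetical convention: events starting with "MARK" carry a watermark.
        if (event.startsWith("MARK")) {
            output.emitWatermark(eventTimestamp);
        }
    }

    @Override
    public void onPeriodicEmit(SketchWatermarkOutput output) {
        // Nothing to do: this generator only reacts to events.
    }

    /** Feeds the events through the generator and collects the emitted watermarks. */
    static List<Long> run(String[] events, long[] timestamps) {
        List<Long> emitted = new ArrayList<>();
        PunctuatedGeneratorSketch generator = new PunctuatedGeneratorSketch();
        for (int i = 0; i < events.length; i++) {
            generator.onEvent(events[i], timestamps[i], ts -> emitted.add(ts));
        }
        return emitted;
    }
}
```

Because the work happens in onEvent, the watermark interval plays no role here; emitting on every single record is possible but adds a watermark to the stream per event.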
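6. Watermark propagation
Watermarks flow downstream with the stream, just like ordinary elements and stream status.
Within the stream a watermark is treated like data: records and watermarks are both subclasses of StreamElement.
[Figure: 1.png, StreamElement class hierarchy]
When a StreamTask is created it also sets up its processing function; processInput is the actual processing logic: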
protected StreamTask(
        Environment environment,
        @Nullable TimerService timerService,
        Thread.UncaughtExceptionHandler uncaughtExceptionHandler,
        StreamTaskActionExecutor actionExecutor,
        TaskMailbox mailbox) throws Exception {
    super(environment);
    this.configuration = new StreamConfig(getTaskConfiguration());
    this.recordWriter = createRecordWriterDelegate(configuration, environment);
    this.actionExecutor = Preconditions.checkNotNull(actionExecutor);
    this.mailboxProcessor = new MailboxProcessor(this::processInput, mailbox, actionExecutor);
When the StreamTask runs, it reaches the processing logic above through runMailboxLoop:
public final void invoke() throws Exception {
    try {
        beforeInvoke();
        runMailboxLoop();
processInput has several implementations:
[Figure: 4.jpg, implementations of processInput]
Taking StreamOneInputProcessor as an example, it ends up calling emitNext:
public InputStatus processInput() throws Exception {
    InputStatus status = input.emitNext(output);
    if (status == InputStatus.END_OF_INPUT) {
        operatorChain.endHeadOperatorInput(1);
    }
    return status;
}
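emitNext has several implementations:
[Figure: 5.jpg, implementations of emitNext]
Taking StreamTaskNetworkInput as an example, the implementation is as follows; processElement is the function that does the actual work: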
public InputStatus emitNext(DataOutput<T> output) throws Exception {
    while (true) {
        // get the stream element from the deserializer
        if (currentRecordDeserializer != null) {
            DeserializationResult result = currentRecordDeserializer.getNextRecord(deserializationDelegate);
            if (result.isBufferConsumed()) {
                currentRecordDeserializer.getCurrentBuffer().recycleBuffer();
                currentRecordDeserializer = null;
            }
            if (result.isFullRecord()) {
                processElement(deserializationDelegate.getInstance(), output);
                return InputStatus.MORE_AVAILABLE;
            }
        }
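currentRecordDeserializer.getNextRecord reads the serialized data and returns a result flag. If the bytes read so far do not form a complete record, nothing is processed until the whole record has been read; PARTIAL_RECORD denotes an incomplete record: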
public DeserializationResult getNextRecord(T target) throws IOException {
    // always check the non-spanning wrapper first.
    // this should be the majority of the cases for small records
    // for large records, this portion of the work is very small in comparison anyways
    if (nonSpanningWrapper.hasCompleteLength()) {
        return readNonSpanningRecord(target);
    } else if (nonSpanningWrapper.hasRemaining()) {
        nonSpanningWrapper.transferTo(spanningWrapper.lengthBuffer);
        return PARTIAL_RECORD;
    } else if (spanningWrapper.hasFullRecord()) {
        target.read(spanningWrapper.getInputView());
        spanningWrapper.transferLeftOverTo(nonSpanningWrapper);
        return nonSpanningWrapper.hasRemaining() ? INTERMEDIATE_RECORD_FROM_BUFFER : LAST_RECORD_FROM_BUFFER;
    } else {
        return PARTIAL_RECORD;
    }
}
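The idea of holding back a partial record until its remaining bytes arrive can be illustrated with a much simpler reassembler. The sketch below is not Flink's deserializer: RecordReassemblerSketch is a hypothetical class, and it assumes each record is framed as a 4-byte big-endian length followed by that many payload bytes.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

/** Reassembles length-prefixed records that may span several buffers (illustrative only). */
class RecordReassemblerSketch {
    // Bytes received so far that do not yet form a complete record.
    private final ByteArrayOutputStream pending = new ByteArrayOutputStream();

    /** Appends one buffer and returns every record that is now complete. */
    List<byte[]> addBuffer(byte[] buffer) {
        pending.write(buffer, 0, buffer.length);
        ByteBuffer data = ByteBuffer.wrap(pending.toByteArray());
        List<byte[]> records = new ArrayList<>();
        while (data.remaining() >= 4) {
            data.mark();
            int length = data.getInt();
            if (data.remaining() < length) {
                // Partial record: rewind to the length prefix and wait for more bytes.
                data.reset();
                break;
            }
            byte[] record = new byte[length];
            data.get(record);
            records.add(record);
        }
        // Keep only the unconsumed tail for the next call.
        pending.reset();
        pending.write(data.array(), data.position(), data.remaining());
        return records;
    }
}
```

A call that returns an empty list corresponds to the PARTIAL_RECORD case: the caller has nothing to process yet and simply feeds the next buffer.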
processElement is implemented as follows and dispatches on the element type (record, watermark, and so on):
private void processElement(StreamElement recordOrMark, DataOutput<T> output) throws Exception {
    if (recordOrMark.isRecord()) {
        output.emitRecord(recordOrMark.asRecord());
    } else if (recordOrMark.isWatermark()) {
        statusWatermarkValve.inputWatermark(recordOrMark.asWatermark(), lastChannel);
    } else if (recordOrMark.isLatencyMarker()) {
        output.emitLatencyMarker(recordOrMark.asLatencyMarker());
    } else if (recordOrMark.isStreamStatus()) {
        statusWatermarkValve.inputStreamStatus(recordOrMark.asStreamStatus(), lastChannel);
    } else {
        throw new UnsupportedOperationException("Unknown type of StreamElement");
    }
}
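7. Watermark alignment
Alignment is the process of choosing a watermark when there are multiple inputs. Continuing from above, statusWatermarkValve.inputWatermark is where watermarks are handled: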
public void inputWatermark(Watermark watermark, int channelIndex) throws Exception {
    // ignore the input watermark if its input channel, or all input channels are idle (i.e. overall the valve is idle).
    if (lastOutputStreamStatus.isActive() && channelStatuses[channelIndex].streamStatus.isActive()) {
        long watermarkMillis = watermark.getTimestamp();
        // if the input watermark's value is less than the last received watermark for its input channel, ignore it also.
        if (watermarkMillis > channelStatuses[channelIndex].watermark) {
            channelStatuses[channelIndex].watermark = watermarkMillis;
            // previously unaligned input channels are now aligned if its watermark has caught up
            if (!channelStatuses[channelIndex].isWatermarkAligned && watermarkMillis >= lastOutputWatermark) {
                channelStatuses[channelIndex].isWatermarkAligned = true;
            }
            // now, attempt to find a new min watermark across all aligned channels
            findAndOutputNewMinWatermarkAcrossAlignedChannels();
        }
    }
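InputChannelStatus (the element type of channelStatuses) stores the watermark state of every input channel. It has three main members:
watermark (timestamp of the last watermark received on the channel)
streamStatus (status of the stream: active or idle)
isWatermarkAligned (whether the channel is aligned, i.e. its watermark has caught up with the watermark already emitted)
A channel is left out of alignment in two cases: its status is idle, or it has just returned to active and its watermark has not yet caught up with the overall output watermark.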
protected static class InputChannelStatus {
    protected long watermark;
    protected StreamStatus streamStatus;
    protected boolean isWatermarkAligned;
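findAndOutputNewMinWatermarkAcrossAlignedChannels is where alignment finally happens: it takes the minimum watermark across all aligned input channels and, if that minimum is larger than the last watermark emitted, emits it as the current watermark: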
private void findAndOutputNewMinWatermarkAcrossAlignedChannels() throws Exception {
    long newMinWatermark = Long.MAX_VALUE;
    boolean hasAlignedChannels = false;
    // determine new overall watermark by considering only watermark-aligned channels across all channels
    for (InputChannelStatus channelStatus : channelStatuses) {
        if (channelStatus.isWatermarkAligned) {
            hasAlignedChannels = true;
            newMinWatermark = Math.min(channelStatus.watermark, newMinWatermark);
        }
    }
    // we acknowledge and output the new overall watermark if it really is aggregated
    // from some remaining aligned channel, and is also larger than the last output watermark
    if (hasAlignedChannels && newMinWatermark > lastOutputWatermark) {
        lastOutputWatermark = newMinWatermark;
        output.emitWatermark(new Watermark(lastOutputWatermark));
    }
}
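The alignment rules can be tried out with a small stand-alone model. The sketch below is a simplified illustration under stated assumptions, not the valve's actual code: WatermarkValveSketch, markIdle and markActive are hypothetical names, watermarks are plain long values, and the stream-status handling is reduced to the two cases described above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Plain-Java model of min-watermark alignment across input channels (illustrative only). */
class WatermarkValveSketch {
    private final long[] channelWatermarks;
    private final boolean[] channelActive;
    private final boolean[] channelAligned;
    private long lastOutputWatermark = Long.MIN_VALUE;
    final List<Long> emitted = new ArrayList<>();

    WatermarkValveSketch(int numChannels) {
        channelWatermarks = new long[numChannels];
        channelActive = new boolean[numChannels];
        channelAligned = new boolean[numChannels];
        // Every channel starts active and aligned, with no watermark seen yet.
        Arrays.fill(channelWatermarks, Long.MIN_VALUE);
        Arrays.fill(channelActive, true);
        Arrays.fill(channelAligned, true);
    }

    void inputWatermark(long watermark, int channel) {
        // Ignore idle channels and watermarks that do not advance the channel.
        if (!channelActive[channel] || watermark <= channelWatermarks[channel]) {
            return;
        }
        channelWatermarks[channel] = watermark;
        // A previously unaligned channel is aligned again once it has caught up.
        if (!channelAligned[channel] && watermark >= lastOutputWatermark) {
            channelAligned[channel] = true;
        }
        emitNewMinAcrossAlignedChannels();
    }

    void markIdle(int channel) {
        channelActive[channel] = false;
        channelAligned[channel] = false;
        // The remaining aligned channels may now allow the watermark to advance.
        emitNewMinAcrossAlignedChannels();
    }

    void markActive(int channel) {
        channelActive[channel] = true;
        if (channelWatermarks[channel] >= lastOutputWatermark) {
            channelAligned[channel] = true;
        }
    }

    private void emitNewMinAcrossAlignedChannels() {
        long newMin = Long.MAX_VALUE;
        boolean hasAligned = false;
        for (int i = 0; i < channelWatermarks.length; i++) {
            if (channelAligned[i]) {
                hasAligned = true;
                newMin = Math.min(newMin, channelWatermarks[i]);
            }
        }
        if (hasAligned && newMin > lastOutputWatermark) {
            lastOutputWatermark = newMin;
            emitted.add(newMin);
        }
    }
}
```

With two channels, nothing is emitted until both have reported a watermark, and the output always follows the slower aligned channel. A channel that comes back from idle stays excluded until its own watermark reaches the value already emitted.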
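8. Watermark attachment
The call output.emitWatermark(new Watermark(currentWatermark)); is the step that hands the watermark to the operator chain.
After the Task starts, StreamTask initialization creates the OperatorChain object, which also carries watermarks. OperatorChain is the object that operates on the whole set of chained operators: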
protected void beforeInvoke() throws Exception {
    disposedOperators = false;
    LOG.debug("Initializing {}.", getName());
    operatorChain = new OperatorChain<>(this, recordWriter);
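9. Idle partitions
A downstream operator computes its watermark as the minimum over the watermarks of all its parallel upstream sources. If one partition or shard of the source sends no events for a while, the WatermarkGenerator gets no new data from which to produce a watermark. This becomes a problem when other partitions are still sending events, because the stalled partition holds the minimum back.
To solve this, you can use a WatermarkStrategy that detects idle inputs and marks them as idle: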
WatermarkStrategy
.<Tuple2<Long, String>>forBoundedOutOfOrderness(Duration.ofSeconds(20))
.withIdleness(Duration.ofMinutes(1));
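The effect of marking a partition idle can be shown with a little arithmetic in plain Java. This is a sketch under assumptions, not what withIdleness does internally: IdlePartitionSketch and both of its methods are hypothetical, times are milliseconds, and returning Long.MIN_VALUE when every partition is idle is just this sketch's choice.

```java
/** Illustrates how excluding idle partitions lets the combined watermark advance. */
class IdlePartitionSketch {
    /** A partition counts as idle when it has produced no record for longer than the timeout. */
    static boolean[] detectIdle(long[] lastRecordTimes, long now, long idleTimeout) {
        boolean[] idle = new boolean[lastRecordTimes.length];
        for (int i = 0; i < lastRecordTimes.length; i++) {
            idle[i] = now - lastRecordTimes[i] > idleTimeout;
        }
        return idle;
    }

    /** Minimum watermark over the partitions that are not idle. */
    static long combinedWatermark(long[] watermarks, boolean[] idle) {
        long min = Long.MAX_VALUE;
        boolean anyActive = false;
        for (int i = 0; i < watermarks.length; i++) {
            if (!idle[i]) {
                anyActive = true;
                min = Math.min(min, watermarks[i]);
            }
        }
        // Sketch-only convention: no active partition means no usable watermark.
        return anyActive ? min : Long.MIN_VALUE;
    }
}
```

For example, with partition watermarks of 1000 and 50000, the combined watermark stays at 1000 as long as both partitions count as active. Once the first partition has been silent for longer than the timeout and is excluded, the combined value becomes 50000.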