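1. Time
Before discussing watermarks, it helps to understand the three notions of time in Flink: Event Time, Ingestion Time, and Processing Time:
(Image from the official Flink website)
As the figure above shows, the three notions are easy to tell apart:
- Event time: the time at which the event actually occurred. The data usually carries it, so an event timestamp can be extracted from each event;
- Ingestion time: the time at which the event enters Flink, i.e. the timestamp assigned at the source operator;
- Processing time: the timestamp at which an operator processes the event.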
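To make the distinction concrete, here is a minimal plain-Java sketch that does not use Flink at all. The class `ThreeTimesSketch`, the `Event` type, and the methods `ingest` and `processingLag` are hypothetical names invented for illustration. The clock values are passed in as parameters so the example is deterministic.

```java
// Plain-Java sketch (no Flink dependency); every name here is hypothetical.
final class ThreeTimesSketch {

    /** A record that carries its own event timestamp, in epoch milliseconds. */
    static final class Event {
        final String payload;
        final long eventTime;      // set by the producer when the event happened
        long ingestionTime = -1L;  // stamped once, when the source reads the record

        Event(String payload, long eventTime) {
            this.payload = payload;
            this.eventTime = eventTime;
        }
    }

    /** Source step: stamp the record with the clock reading at which it enters the pipeline. */
    static Event ingest(Event e, long sourceClockMillis) {
        e.ingestionTime = sourceClockMillis;
        return e;
    }

    /** Operator step: processing time is the operator's clock; return how far it is past event time. */
    static long processingLag(Event e, long operatorClockMillis) {
        return operatorClockMillis - e.eventTime;
    }

    public static void main(String[] args) {
        Event e = ingest(new Event("click", 1_000L), 1_250L);
        long lag = processingLag(e, 1_400L);
        System.out.println("eventTime=" + e.eventTime
                + " ingestionTime=" + e.ingestionTime
                + " processingLag=" + lag);
    }
}
```

Only the event time travels with the data. The other two values depend on when the record happens to reach the source and the operator, which is why replaying the same data gives the same event times but different ingestion and processing times.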
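2. Watermark
The watermark is a mechanism Flink introduced to support event-time window computation. A watermark is itself an object that carries a timestamp, as the definition of the Watermark class and its constructor shows: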
package org.apache.flink.streaming.api.watermark;

import org.apache.flink.annotation.PublicEvolving;
import org.apache.flink.streaming.runtime.streamrecord.StreamElement;

/**
 * A Watermark tells operators that no elements with a timestamp older or equal
 * to the watermark timestamp should arrive at the operator. Watermarks are emitted at the
 * sources and propagate through the operators of the topology. Operators must themselves emit
 * watermarks to downstream operators using
 * {@link org.apache.flink.streaming.api.operators.Output#emitWatermark(Watermark)}. Operators that
 * do not internally buffer elements can always forward the watermark that they receive. Operators
 * that buffer elements, such as window operators, must forward a watermark after emission of
 * elements that is triggered by the arriving watermark.
 *
 * <p>In some cases a watermark is only a heuristic and operators should be able to deal with
 * late elements. They can either discard those or update the result and emit updates/retractions
 * to downstream operations.
 *
 * <p>When a source closes it will emit a final watermark with timestamp {@code Long.MAX_VALUE}.
 * When an operator receives this it will know that no more input will be arriving in the future.
 */
@PublicEvolving
public final class Watermark extends StreamElement {

    /** The watermark that signifies end-of-event-time. */
    public static final Watermark MAX_WATERMARK = new Watermark(Long.MAX_VALUE);

    // ------------------------------------------------------------------------

    /** The timestamp of the watermark in milliseconds. */
    private final long timestamp;

    /**
     * Creates a new watermark with the given timestamp in milliseconds.
     */
    public Watermark(long timestamp) {
        this.timestamp = timestamp;
    }

    /**
     * Returns the timestamp associated with this {@link Watermark} in milliseconds.
     */
    public long getTimestamp() {
        return timestamp;
    }

    // ------------------------------------------------------------------------

    @Override
    public boolean equals(Object o) {
        return this == o ||
                o != null && o.getClass() == Watermark.class && ((Watermark) o).timestamp == this.timestamp;
    }

    @Override
    public int hashCode() {
        return (int) (timestamp ^ (timestamp >>> 32));
    }

    @Override
    public String toString() {
        return "Watermark @ " + timestamp;
    }
}
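Watermarks are inserted into the data stream by the source operator or by a watermark generator.
What a watermark does: a watermark with timestamp t tells Flink operators that all elements with a timestamp less than or equal to t are considered to have arrived, so no further elements that old are expected. As the class comment notes, a watermark is sometimes only a heuristic, so operators should still be able to deal with late elements.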
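One common way to produce such timestamps is to track the largest event timestamp seen so far and subtract a fixed tolerance for out-of-order data. The sketch below shows that idea in plain Java with no Flink dependency. The class `WatermarkSketch` and its methods are hypothetical and written only for illustration. They are not the Flink API, and the tolerance of 200 ms is an arbitrary assumption.

```java
// Plain-Java sketch of a bounded-out-of-orderness watermark; all names are hypothetical.
final class WatermarkSketch {

    private final long maxOutOfOrdernessMillis; // assumed upper bound on how late an event may be
    private long maxSeenTimestamp;              // largest event timestamp observed so far

    WatermarkSketch(long maxOutOfOrdernessMillis) {
        this.maxOutOfOrdernessMillis = maxOutOfOrdernessMillis;
        // Start just high enough that currentWatermark() equals Long.MIN_VALUE without underflow.
        this.maxSeenTimestamp = Long.MIN_VALUE + maxOutOfOrdernessMillis + 1;
    }

    /** Called for every event; out-of-order events never move the maximum backwards. */
    void onEvent(long eventTimestamp) {
        maxSeenTimestamp = Math.max(maxSeenTimestamp, eventTimestamp);
    }

    /** Timestamp t such that elements with timestamp <= t are no longer expected. */
    long currentWatermark() {
        return maxSeenTimestamp - maxOutOfOrdernessMillis - 1;
    }

    /** An element is late if its timestamp is older than or equal to the current watermark. */
    boolean isLate(long eventTimestamp) {
        return eventTimestamp <= currentWatermark();
    }

    public static void main(String[] args) {
        WatermarkSketch wm = new WatermarkSketch(200L);
        long[] eventTimestamps = {1_000L, 900L, 1_500L}; // 900 arrives out of order
        for (long ts : eventTimestamps) {
            boolean late = wm.isLate(ts);
            wm.onEvent(ts);
            System.out.println("event=" + ts + " late=" + late
                    + " watermark=" + wm.currentWatermark());
        }
    }
}
```

The watermark only moves forward, because it is derived from a running maximum. The out-of-order event at 900 does not pull it back, and it is not treated as late because it is still newer than the watermark at the moment it arrives.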