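At its core, a watermark exists to deal with late and out-of-order data, and it does so together with windows. Late data is unavoidable: network latency, machine load, and many other factors can hold an event back.
We cannot eliminate the delay, but we do have to decide how delayed data is handled when it occurs: it can simply be dropped, or it can be routed to a side output and, for example, written to a file.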
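Watermarks are attached to the stream where it is read from the source. A watermark with timestamp t declares that all data with event time up to t has arrived, and once the watermark passes the end of a window, that window fires.
Of course, this does not mean that everything has really arrived; it only holds in theory. An element that shows up afterwards and belongs to a window that has already fired is late: it is dropped or, if configured, written to a side output.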
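To make the difference between out-of-order data and late data concrete, here is a minimal plain-Java sketch. It is a simulation, not Flink code: the class `LatenessSketch` and its methods are hypothetical names, the window offset is assumed to be 0, allowed lateness is assumed to be 0, and the watermark is advanced after every element rather than periodically as a real job would do.

```java
import java.util.ArrayList;
import java.util.List;

/** Plain-Java simulation (no Flink dependency) of how a watermark separates out-of-order from late elements. */
final class LatenessSketch {

    /** Start of the tumbling window that contains the timestamp (window offset 0 assumed). */
    static long windowStart(long timestampMs, long windowSizeMs) {
        return timestampMs - Math.floorMod(timestampMs, windowSizeMs);
    }

    /**
     * Returns the timestamps that arrive after their window has already fired.
     * Simplification: the watermark advances after every element, not on a timer.
     */
    static List<Long> lateEvents(long[] arrivalOrder, long windowSizeMs, long maxDelayMs) {
        List<Long> late = new ArrayList<>();
        long maxTimestampSeen = 0L; // same starting value as currentTimeStamp in the demo job
        for (long ts : arrivalOrder) {
            long watermark = maxTimestampSeen - maxDelayMs;
            long lastTimestampInWindow = windowStart(ts, windowSizeMs) + windowSizeMs - 1;
            // The element is late only if its own window is already behind the watermark.
            if (lastTimestampInWindow <= watermark) {
                late.add(ts);
            }
            maxTimestampSeen = Math.max(maxTimestampSeen, ts);
        }
        return late;
    }

    public static void main(String[] args) {
        long[] arrivalOrder = {1000L, 4000L, 12000L, 3000L, 16000L, 5000L};
        System.out.println(lateEvents(arrivalOrder, 10_000L, 5_000L)); // [5000]
        System.out.println(lateEvents(arrivalOrder, 10_000L, 0L));     // [3000, 5000]
    }
}
```

With a 5-second delay, the element with timestamp 3000 arrives out of order but is still accepted, because its window [0, 10000) has not fired yet. Only 5000, which arrives after the watermark has moved past that window, is treated as late.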
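A watermark can be configured with a delay, for example 5 seconds. Window firing is then delayed by the same amount: a window that would otherwise fire is held open for 5 more seconds of event time.
In plain words: I will give you 5 more seconds, and if the data still has not arrived by then, I stop waiting for it.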
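The effect of the delay can be worked out with simple arithmetic. The sketch below is plain Java with hypothetical names (`FiringTimeSketch`, `firingEventTime`). It assumes the watermark rule used in this post (largest event time seen minus the allowed delay) and assumes that a window [start, end) fires once the watermark reaches end - 1, the last timestamp that belongs to the window.

```java
/** Arithmetic sketch: when does a tumbling event-time window fire under a fixed watermark delay? */
final class FiringTimeSketch {

    /** Watermark rule from the demo: largest event time seen minus the allowed delay. */
    static long watermark(long maxTimestampSeenMs, long maxDelayMs) {
        return maxTimestampSeenMs - maxDelayMs;
    }

    /** A window [start, end) fires once the watermark reaches end - 1, its last timestamp. */
    static boolean fires(long windowEndMs, long watermarkMs) {
        return watermarkMs >= windowEndMs - 1;
    }

    /** Smallest event timestamp whose arrival makes the window fire. */
    static long firingEventTime(long windowEndMs, long maxDelayMs) {
        return windowEndMs - 1 + maxDelayMs;
    }

    public static void main(String[] args) {
        long windowEnd = 10_000L; // window [0, 10000)
        System.out.println(firingEventTime(windowEnd, 0L));     // 9999
        System.out.println(firingEventTime(windowEnd, 5_000L)); // 14999
        System.out.println(fires(windowEnd, watermark(14_998L, 5_000L))); // false
        System.out.println(fires(windowEnd, watermark(14_999L, 5_000L))); // true
    }
}
```

Note that the extra 5 seconds are measured in event time, not wall-clock time: the window [0, 10000) stays open until an element with a timestamp of at least 14999 has been seen, however long that takes.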
import org.apache.flink.api.common.functions.FoldFunction;
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.TimeCharacteristic;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.AssignerWithPeriodicWatermarks;
import org.apache.flink.streaming.api.watermark.Watermark;
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;
import javax.annotation.Nullable;

// Demo: 10-second tumbling event-time windows driven by a periodic watermark assigner.
// This uses the legacy watermark API (AssignerWithPeriodicWatermarks);
// newer Flink releases provide WatermarkStrategy for the same purpose.
public class waterMark {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.createLocalEnvironment();
        env.enableCheckpointing(1000); // checkpoint every 1000 ms
        env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);

        // Each input line is expected to look like "key,eventTimeMillis".
        DataStream<String> dataStream = env
                .socketTextStream("localhost", 9900)
                .assignTimestampsAndWatermarks(new AssignerWithPeriodicWatermarks<String>() {
                    long currentTimeStamp = 0L; // largest event time seen so far
                    long maxDelayAllowed = 0L;  // tolerated delay in ms, e.g. 5000L for 5 seconds
                    long currentWaterMark;

                    // Called periodically by Flink to obtain the current watermark.
                    @Nullable
                    @Override
                    public Watermark getCurrentWatermark() {
                        currentWaterMark = currentTimeStamp - maxDelayAllowed;
                        return new Watermark(currentWaterMark);
                    }

                    // Called for every element to extract its event time.
                    @Override
                    public long extractTimestamp(String s, long l) {
                        String[] arr = s.split(",");
                        long timeStamp = Long.parseLong(arr[1]);
                        currentTimeStamp = Math.max(timeStamp, currentTimeStamp);
                        // Prints the most recently emitted watermark, which may lag behind this element.
                        System.out.println("Key:" + arr[0] + ",EventTime:" + timeStamp + ",Watermark:" + currentWaterMark);
                        return timeStamp;
                    }
                });

        dataStream.map(new MapFunction<String, Tuple2<String, String>>() {
                    @Override
                    public Tuple2<String, String> map(String s) throws Exception {
                        String[] arr = s.split(",");
                        return new Tuple2<>(arr[0], arr[1]);
                    }
                })
                .keyBy(0) // key by the first tuple field
                .window(TumblingEventTimeWindows.of(Time.seconds(10)))
                // Concatenate the timestamps of all elements that fall into the same key and window.
                .fold("Start:", new FoldFunction<Tuple2<String, String>, String>() {
                    @Override
                    public String fold(String s, Tuple2<String, String> o) throws Exception {
                        return s + " - " + o.f1;
                    }
                })
                .print();

        env.execute("WaterMark Test Demo");
    }
}
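To get a feel for what the job computes per key and window, the fold step can be imitated in plain Java without a Flink dependency. This is only a sketch under assumptions: `FoldSketch` and the `key@windowStart` labels are hypothetical, the window offset is taken to be 0, and every element is assumed to arrive before its window fires, so nothing is dropped as late.

```java
import java.util.Map;
import java.util.TreeMap;

/** Plain-Java sketch of the per-key, per-window fold performed by the job; no Flink dependency. */
final class FoldSketch {

    /** Folds "key,timestamp" lines into one string per (key, window), in arrival order. */
    static Map<String, String> foldByWindow(String[] lines, long windowSizeMs) {
        Map<String, String> results = new TreeMap<>();
        for (String line : lines) {
            String[] fields = line.split(",");
            String key = fields[0];
            long ts = Long.parseLong(fields[1]);
            long start = ts - Math.floorMod(ts, windowSizeMs);
            String slot = key + "@" + start; // hypothetical label for the (key, window) pair
            // Same accumulator logic as the FoldFunction: start from "Start:", append " - " + value.
            results.put(slot, results.getOrDefault(slot, "Start:") + " - " + fields[1]);
        }
        return results;
    }

    public static void main(String[] args) {
        String[] lines = {"a,1000", "b,2000", "a,4000", "a,12000"};
        foldByWindow(lines, 10_000L)
                .forEach((slot, folded) -> System.out.println(slot + " -> " + folded));
    }
}
```

For the sample lines, key `a` contributes to two windows (starting at 0 and at 10000) and key `b` to one. In the real job the order in which the window results are printed may differ, and each result appears only once the watermark has passed the end of its window.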