Flink Watermarks
Introduction
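A watermark is Flink's mechanism for delaying when a window is triggered, and it is normally used together with EventTime.
The basic idea is to wait a little for data that arrives late.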
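Usage
1. Set EventTime as the time characteristic: env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)
2. Extract the EventTime from each record and use it as the record's timestamp
3. Set the maximum out-of-orderness delay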
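watermark = largest EventTime seen so far - delay
When a window fires: once watermark >= the window's end boundary (strictly, Flink's event-time trigger compares against the last millisecond that belongs to the window, i.e. end - 1 ms)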
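The formula and the firing rule above can be traced with a small plain-Java simulation. This is a minimal sketch, not Flink code: the class name WatermarkSketch, its methods, and the sample timestamps are all hypothetical. It assumes the watermark is recomputed on every record (Flink emits watermarks periodically) and that a window [start, end) fires once the watermark reaches end - 1 ms.

```java
public class WatermarkSketch {
    private final long maxDelayMs;
    // Largest event time observed so far; the watermark never moves backwards.
    private long maxTimestampSeen = Long.MIN_VALUE;

    public WatermarkSketch(long maxDelayMs) {
        this.maxDelayMs = maxDelayMs;
    }

    /** Feeds one record's event time and returns the resulting watermark. */
    public long onEvent(long eventTimeMs) {
        maxTimestampSeen = Math.max(maxTimestampSeen, eventTimeMs);
        return maxTimestampSeen - maxDelayMs;
    }

    /** A window [start, end) fires once the watermark reaches end - 1 ms. */
    public static boolean windowFires(long watermarkMs, long windowEndMs) {
        return watermarkMs >= windowEndMs - 1;
    }

    public static void main(String[] args) {
        WatermarkSketch sketch = new WatermarkSketch(2_000L); // assumed 2 s delay
        long windowEnd = 5_000L;                              // window [0, 5000)
        long[] eventTimes = {1_000L, 4_000L, 3_000L, 5_000L, 7_000L};
        for (long t : eventTimes) {
            long wm = sketch.onEvent(t);
            System.out.println("event=" + t + " watermark=" + wm
                    + " fires=" + windowFires(wm, windowEnd));
        }
    }
}
```

With a 2-second delay, the window [0, 5000) stays open until the record stamped 7000 arrives, so the out-of-order record stamped 3000 is still counted.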
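Code implementation:
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;

import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.streaming.api.TimeCharacteristic;
import org.apache.flink.streaming.api.datastream.AllWindowedStream;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.timestamps.BoundedOutOfOrdernessTimestampExtractor;
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.api.windowing.windows.TimeWindow;

public class Watermark1 {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Flink uses ProcessingTime as the time characteristic by default (in versions before 1.12)
        env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime); // use EventTime as the time characteristic
        // The time string must be converted to a timestamp (epoch milliseconds)
        // Sample input, one record per line:
        // 2020-03-01 00:00:00,1
        // 2020-03-01 00:00:04,2
        // 2020-03-01 00:00:05,3
        DataStreamSource<String> lines = env.socketTextStream("localhost", 8888);
        // Extract the EventTime from each record; the maximum out-of-orderness delay is 0 seconds
        SingleOutputStreamOperator<String> dataStreamWithWaterMark = lines.assignTimestampsAndWatermarks(
                new BoundedOutOfOrdernessTimestampExtractor<String>(Time.seconds(0)) {
                    private final SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");

                    @Override
                    public long extractTimestamp(String element) {
                        // The first comma-separated field is the event time
                        String[] fields = element.split(",");
                        String dateStr = fields[0];
                        try {
                            Date date = sdf.parse(dateStr);
                            return date.getTime();
                        } catch (ParseException e) {
                            // Keep the original exception as the cause
                            throw new RuntimeException("Failed to parse event time: " + dateStr, e);
                        }
                    }
                });
        // The second comma-separated field is the number to sum
        SingleOutputStreamOperator<Integer> nums = dataStreamWithWaterMark.map(new MapFunction<String, Integer>() {
            @Override
            public Integer map(String value) throws Exception {
                String[] fields = value.split(",");
                String numStr = fields[1];
                return Integer.parseInt(numStr.trim());
            }
        });
        // When windowing a stream that has not been grouped with keyBy (a non-keyed stream), call windowAll
        AllWindowedStream<Integer, TimeWindow> window = nums
                .windowAll(TumblingEventTimeWindows.of(Time.seconds(5)));
        SingleOutputStreamOperator<Integer> sum = window.sum(0);
        sum.print();
        env.execute();
    }
}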
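To check what this job should produce for the three sample lines in the comments, the window assignment can be worked through outside Flink. The following is a rough plain-Java sketch under stated assumptions: the class name TumblingWindowSketch and its helper methods are hypothetical, the 5-second tumbling windows are assumed to be aligned to the epoch with no offset, and the times are parsed as UTC so the result is reproducible (the job above uses the JVM's default time zone).

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Map;
import java.util.TreeMap;

public class TumblingWindowSketch {
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    /** Parses "yyyy-MM-dd HH:mm:ss" as UTC (an assumption) into epoch milliseconds. */
    public static long toEpochMillis(String text) {
        return LocalDateTime.parse(text, FORMAT).toInstant(ZoneOffset.UTC).toEpochMilli();
    }

    /** Start of the epoch-aligned tumbling window that contains the timestamp. */
    public static long windowStart(long timestampMs, long sizeMs) {
        return timestampMs - Math.floorMod(timestampMs, sizeMs);
    }

    /** Sums the second field of each "time,number" line per window start. */
    public static Map<Long, Integer> sumPerWindow(String[] lines, long sizeMs) {
        Map<Long, Integer> sums = new TreeMap<>();
        for (String line : lines) {
            String[] fields = line.split(",");
            long start = windowStart(toEpochMillis(fields[0].trim()), sizeMs);
            sums.merge(start, Integer.parseInt(fields[1].trim()), Integer::sum);
        }
        return sums;
    }

    public static void main(String[] args) {
        String[] lines = {
            "2020-03-01 00:00:00,1",
            "2020-03-01 00:00:04,2",
            "2020-03-01 00:00:05,3"
        };
        long sizeMs = 5_000L;
        sumPerWindow(lines, sizeMs).forEach((start, sum) ->
                System.out.println("window [" + start + ", " + (start + sizeMs) + ") sum=" + sum));
    }
}
```

The first two records fall into the window starting at 00:00:00 and sum to 3, while the third record opens the next window. In the job above, with a delay of 0, the record stamped 00:00:05 is also the one that advances the watermark far enough to fire the first window; the second window stays open until a later record, for example one stamped 00:00:10, moves the watermark forward again.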