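Things to watch out for:
1. Set the parallelism to 1.
2. Start nc first and send some data, even if it is only a single record (why?), and only then launch the program from IDEA.
Prepare some sample input in advance: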
01,1586489566000
01,1586489567000
01,1586489568000
01,1586489569000
01,1586489570000
01,1586489571000
01,1586489572000
01,1586489573000
01,1586489574000
01,1586489575000
01,1586489576000
01,1586489577000
01,1586489578000
01,1586489579000
01,1586489589000
Code:
import org.apache.flink.api.common.eventtime.*;
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.configuration.ConfigConstants;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.TimeCharacteristic;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.datastream.KeyedStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.datastream.WindowedStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.windowing.WindowFunction;
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.api.windowing.windows.TimeWindow;
import org.apache.flink.util.Collector;
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
public class WatermarkDemo {
    public static void main(String[] args) throws Exception {
        Configuration config = new Configuration();
        // config.setBoolean(ConfigConstants.LOCAL_START_WEBSERVER, true); // tried this to get the web UI when running inside IDEA; it made no difference
        // StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        StreamExecutionEnvironment env = StreamExecutionEnvironment.createLocalEnvironmentWithWebUI(config);
        env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);
        // env.setStreamTimeCharacteristic(TimeCharacteristic.ProcessingTime);
        // env.getConfig().setAutoWatermarkInterval(1000L);
        env.setParallelism(1);
        DataStreamSource<String> data = env.socketTextStream("hdp-1", 7777);
        SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss.SSS");
        // The next line would execute only once, when the job graph is built
        // System.out.println("----------------------" + data.toString());

        // Parse each "key,timestamp" line into a (key, timestamp) tuple
        SingleOutputStreamOperator<Tuple2<String, Long>> maped = data.map(new MapFunction<String, Tuple2<String, Long>>() {
            @Override
            public Tuple2<String, Long> map(String value) throws Exception {
                String[] split = value.split(",");
                return new Tuple2<String, Long>(split[0], Long.valueOf(split[1]));
            }
        });

        SingleOutputStreamOperator<Tuple2<String, Long>> watermarks = maped.assignTimestampsAndWatermarks(new WatermarkStrategy<Tuple2<String, Long>>() {
            @Override
            public WatermarkGenerator<Tuple2<String, Long>> createWatermarkGenerator(WatermarkGeneratorSupplier.Context context) {
                return new WatermarkGenerator<Tuple2<String, Long>>() {
                    // Caution: this initial value is the cause of the problem analyzed in the 2021.1.13 note below
                    private long maxTimeStamp = Long.MIN_VALUE;

                    @Override
                    public void onEvent(Tuple2<String, Long> event, long eventTimestamp, WatermarkOutput output) {
                        // Track the largest event timestamp seen so far
                        maxTimeStamp = Math.max(maxTimeStamp, event.f1);
                        System.out.println("maxTimeStamp:" + maxTimeStamp + "...format:" + sdf.format(maxTimeStamp));
                    }

                    @Override
                    public void onPeriodicEmit(WatermarkOutput output) {
                        // System.out.println(".....onPeriodicEmit....");
                        long maxOutOfOrderness = 1000;
                        Watermark watermark = new Watermark(maxTimeStamp - maxOutOfOrderness);
                        // System.out.println("watermark time:" + watermark.getTimestamp() + ",eventtime=" + eventtime);
                        output.emitWatermark(watermark);
                    }
                };
            }
        }.withTimestampAssigner(new SerializableTimestampAssigner<Tuple2<String, Long>>() {
            @Override
            public long extractTimestamp(Tuple2<String, Long> element, long recordTimestamp) {
                return element.f1;
            }
        }));

        KeyedStream<Tuple2<String, Long>, String> keyed = watermarks.keyBy(value -> value.f0);
        // System.out.println("...keyed:" + keyed);
        WindowedStream<Tuple2<String, Long>, String, TimeWindow> windowed = keyed.window(TumblingEventTimeWindows.of(Time.seconds(5)));
        // WindowedStream<Tuple2<String, Long>, String, TimeWindow> windowed = keyed.timeWindow(Time.seconds(5));

        SingleOutputStreamOperator<String> result = windowed.apply(new WindowFunction<Tuple2<String, Long>, String, String, TimeWindow>() {
            @Override
            public void apply(String s, TimeWindow window, Iterable<Tuple2<String, Long>> input, Collector<String> out) throws Exception {
                System.out.println("..." + sdf.format(window.getStart()));
                String key = s;
                // Collect and sort the timestamps that fell into this window
                Iterator<Tuple2<String, Long>> iterator = input.iterator();
                ArrayList<Long> list = new ArrayList<>();
                while (iterator.hasNext()) {
                    Tuple2<String, Long> next = iterator.next();
                    list.add(next.f1);
                }
                Collections.sort(list);
                String summary = "key:" + key + "..." + "list.size:" + list.size() + "...list.first:" + sdf.format(list.get(0)) + "...list.last:" + sdf.format(list.get(list.size() - 1)) + "...window.start:" + sdf.format(window.getStart()) + "..window.end:" + sdf.format(window.getEnd());
                out.collect(summary);
            }
        });
        result.print();
        env.execute();
    }
}
2021/1/11
How window triggering works
Window assigner: TumblingEventTimeWindows.of(Time.seconds(5)) creates a new tumbling window every 5 seconds
WindowedStream<Tuple2<String, Long>, String, TimeWindow> windowed = keyed.window(TumblingEventTimeWindows.of(Time.seconds(5)));
How exactly are elements assigned to windows?
A {@link WindowAssigner} that windows elements into windows based on the timestamp of the
elements. Windows cannot overlap.
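To make the assignment concrete, here is a minimal sketch in plain Java (no Flink dependency) of how a timestamp maps to a 5-second tumbling window. The formula is what I believe TimeWindow.getWindowStartWithOffset computes; treat that as an assumption to check against your Flink version. The class and method names are hypothetical.

```java
public class WindowStartSketch {
    // Start (in ms) of the tumbling window that contains `timestamp`.
    // Assumed to match Flink's window-start arithmetic; verify against the source.
    public static long windowStart(long timestamp, long offset, long windowSize) {
        return timestamp - (timestamp - offset + windowSize) % windowSize;
    }

    public static void main(String[] args) {
        long size = 5000L; // Time.seconds(5)
        long[] samples = {1586489566000L, 1586489569000L, 1586489570000L, 1586489589000L};
        for (long ts : samples) {
            long start = windowStart(ts, 0L, size);
            long end = start + size; // exclusive end
            // maxTimestamp is the last millisecond that still belongs to the window
            System.out.println(ts + " -> [" + start + ", " + end + "), maxTimestamp=" + (end - 1));
        }
    }
}
```

With the sample data above, 1586489566000 through 1586489569000 land in the window [1586489565000, 1586489570000), and 1586489570000 opens the next one.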
First, trace into TumblingEventTimeWindows and look at its trigger:
@Override
public Trigger<Object, TimeWindow> getDefaultTrigger(StreamExecutionEnvironment env) {
    return EventTimeTrigger.create();
}
It uses EventTimeTrigger. Tracing further into that class shows the firing logic:
@Override
public TriggerResult onElement(Object element, long timestamp, TimeWindow window, TriggerContext ctx) throws Exception {
    if (window.maxTimestamp() <= ctx.getCurrentWatermark()) {
        // if the watermark is already past the window fire immediately
        return TriggerResult.FIRE;
    } else {
        ctx.registerEventTimeTimer(window.maxTimestamp());
        return TriggerResult.CONTINUE;
    }
}
@Override
public TriggerResult onEventTime(long time, TimeWindow window, TriggerContext ctx) {
    return time == window.maxTimestamp() ?
        TriggerResult.FIRE :
        TriggerResult.CONTINUE;
}
The trigger shows that FIRE can only be returned from a call to onElement or onEventTime.
onElement
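Start with onElement, which is called for every record that arrives in the stream. Its logic is:
If the window's maximum timestamp is less than or equal to the current watermark, fire the computation.
Otherwise, register an event-time timer at the window's maximum timestamp.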
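The firing condition can be replayed against the sample input with a small plain-Java simulation. This is only a sketch under simplifying assumptions: it is not Flink code, it pretends the watermark (largest timestamp seen minus 1000 ms) is updated immediately after every record rather than periodically, and the class and method names are hypothetical.

```java
public class TriggerSketch {
    // Returns the timestamp of the first record after which
    // windowMaxTimestamp <= watermark holds, or -1 if that never happens.
    public static long firstFiringEvent(long[] events, long windowMaxTimestamp, long maxOutOfOrderness) {
        long maxSeen = 0L;
        for (long ts : events) {
            maxSeen = Math.max(maxSeen, ts);
            long watermark = maxSeen - maxOutOfOrderness; // assumed to advance per record
            if (windowMaxTimestamp <= watermark) {
                return ts;
            }
        }
        return -1L;
    }

    public static void main(String[] args) {
        long[] events = {
            1586489566000L, 1586489567000L, 1586489568000L, 1586489569000L,
            1586489570000L, 1586489571000L, 1586489572000L, 1586489573000L,
            1586489574000L, 1586489575000L, 1586489576000L, 1586489577000L,
            1586489578000L, 1586489579000L, 1586489589000L
        };
        // maxTimestamp of each 5-second window is its end minus 1 ms
        long[] windowMax = {1586489569999L, 1586489574999L, 1586489579999L, 1586489589999L};
        for (long max : windowMax) {
            System.out.println("window ending at " + (max + 1) + " fires after record "
                    + firstFiringEvent(events, max, 1000L));
        }
    }
}
```

Under these assumptions the first window fires only once the record 1586489571000 has arrived, because the watermark must reach 1586489569999, and the last window never fires because no later record pushes the watermark past its end.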
2021.1.13
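Finally found the cause: overflow!
private long maxTimeStamp = Long.MIN_VALUE;
Here maxTimeStamp is initialized to the smallest value a long can hold.
Watermark watermark = new Watermark(maxTimeStamp - maxOutOfOrderness);
Here 1000 is then subtracted from that minimum value. The subtraction overflows and the result is wrong, in a rather deceptive way. For the record:
The hexadecimal value
0x8000000000000000
is, in decimal,
-9223372036854775808
Mathematically, the result should be:
Watermark watermark = new Watermark(maxTimeStamp - maxOutOfOrderness);
maxTimeStamp - maxOutOfOrderness
-9223372036854775808 - 1000 = -9223372036854776808
But that value does not fit in a long, so the subtraction wraps around and the actual result is:
maxTimeStamp - maxOutOfOrderness
-9223372036854775808 - 1000 = 9223372036854774808 (that is, Long.MAX_VALUE - 999)
So the watermark timestamp becomes 9223372036854774808.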
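The wrap-around is easy to reproduce outside Flink. The sketch below uses only the JDK; the class and method names are hypothetical, and Math.subtractExact is included only to show how the same subtraction can be made to fail loudly instead of wrapping silently.

```java
public class OverflowSketch {
    // Same arithmetic as in onPeriodicEmit, with plain long subtraction.
    public static long wrappedWatermark(long maxTimeStamp, long maxOutOfOrderness) {
        return maxTimeStamp - maxOutOfOrderness;
    }

    public static void main(String[] args) {
        long maxTimeStamp = Long.MIN_VALUE;
        long maxOutOfOrderness = 1000L;

        // Bit pattern of Long.MIN_VALUE
        System.out.println("hex: 0x" + Long.toHexString(maxTimeStamp));

        // Plain subtraction silently wraps to a huge positive number
        long wrapped = wrappedWatermark(maxTimeStamp, maxOutOfOrderness);
        System.out.println("wrapped: " + wrapped);
        System.out.println("equals Long.MAX_VALUE - 999: " + (wrapped == Long.MAX_VALUE - 999));

        // subtractExact refuses to wrap and throws instead
        try {
            Math.subtractExact(maxTimeStamp, maxOutOfOrderness);
        } catch (ArithmeticException e) {
            System.out.println("subtractExact threw: " + e.getMessage());
        }
    }
}
```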
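This timestamp lies hundreds of millions of years in the future, so:
every later, correct watermark passed to output.emitWatermark(watermark) is smaller than the one already emitted. Flink ignores it, and it never takes effect!
The straightforward fix is to change the code to: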
private long maxTimeStamp = 0L;
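Starting from 0L works for this data because all timestamps are positive. An alternative that I believe is closer to what Flink's built-in bounded-out-of-orderness generator does (an assumption worth checking in the Flink source) is to start just far enough above Long.MIN_VALUE that the subtraction cannot overflow. The sketch below is plain Java with hypothetical names, not the Flink WatermarkGenerator interface.

```java
public class SafeInitSketch {
    static final long MAX_OUT_OF_ORDERNESS = 1000L;

    // Leave room for the later subtraction so it can never go below Long.MIN_VALUE
    private long maxTimeStamp = Long.MIN_VALUE + MAX_OUT_OF_ORDERNESS;

    // Mirrors onEvent: remember the largest timestamp seen so far
    public void onEvent(long eventTimestamp) {
        maxTimeStamp = Math.max(maxTimeStamp, eventTimestamp);
    }

    // Mirrors the value handed to new Watermark(...) in onPeriodicEmit
    public long currentWatermark() {
        return maxTimeStamp - MAX_OUT_OF_ORDERNESS;
    }

    public static void main(String[] args) {
        SafeInitSketch gen = new SafeInitSketch();
        // Before any record: the smallest possible watermark, not a wrapped one
        System.out.println("before first record: " + gen.currentWatermark());
        gen.onEvent(1586489566000L);
        System.out.println("after first record:  " + gen.currentWatermark());
    }
}
```

If no custom logic is needed, WatermarkStrategy.forBoundedOutOfOrderness(Duration.ofSeconds(1)) combined with withTimestampAssigner should give the same behavior without a hand-written generator.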