Flink Development - Sliding Windows (SlidingWindows)
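A sliding window is a time-based window. Its window assigner assigns each incoming element to windows of a fixed length, and an additional slide parameter (also called the slide step) controls how often a new window starts. When the slide is smaller than the window length, consecutive windows overlap, so a single element belongs to more than one window.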
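To make the overlap concrete, here is a minimal plain-Java sketch (no Flink dependency) of what a sliding-window assigner does with one element. The class and method names are hypothetical and the offset is assumed to be 0; to my understanding the loop follows the same idea as Flink's sliding assigners: find the latest window start aligned to the slide, then step back one slide at a time while the window still covers the element.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java sketch of sliding-window assignment (hypothetical names, not Flink code).
class SlidingAssignSketch {

    // Returns the start of every window [start, start + size) that contains ts,
    // newest first. Window starts are multiples of slide (offset assumed to be 0).
    static List<Long> windowStarts(long ts, long size, long slide) {
        List<Long> starts = new ArrayList<>();
        // Latest window start at or before ts; floorMod also handles negative ts.
        long lastStart = ts - Math.floorMod(ts, slide);
        // Step back one slide at a time while the window still covers ts.
        for (long start = lastStart; start > ts - size; start -= slide) {
            starts.add(start);
        }
        return starts;
    }

    public static void main(String[] args) {
        long size = 10, slide = 5; // same shape as of(Time.seconds(10), Time.seconds(5)), in seconds
        for (long ts : new long[] {3, 7, 12}) {
            StringBuilder line = new StringBuilder("t=" + ts + " ->");
            for (long start : windowStarts(ts, size, slide)) {
                line.append(" [").append(start).append(", ").append(start + size).append(")");
            }
            System.out.println(line);
        }
    }
}
```

For t=7 it reports the windows [5, 15) and [0, 10): with a 10-second window and a 5-second slide, every element lands in size / slide = 2 windows.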
1. Non-Keyed Sliding Windows
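When the of method of a sliding window assigner is given two arguments, the first is the window length and the second is the slide (the slide step). For example, with SlidingEventTimeWindows.of(Time.seconds(10), Time.seconds(5)), each element is assigned by its event time, and window start times are aligned to integer multiples of the slide (counted from the epoch). So [1:00:00.000, 1:00:09.999] is one window and [1:00:05.000, 1:00:14.999] is the next; the two overlap, and new windows keep being created as time advances.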
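The alignment can be checked with clock times. This plain-Java sketch (hypothetical names, offset assumed to be 0) takes an element at 1:00:07.000, rounds its timestamp down to a multiple of the 5-second slide and prints every 10-second window that contains it, with inclusive end times as in the example. It uses milliseconds since midnight instead of epoch milliseconds, which gives the same alignment here because a day is a whole number of 5-second slides.

```java
import java.time.LocalTime;
import java.time.format.DateTimeFormatter;

// Plain-Java sketch of how sliding-window starts line up with the slide (offset assumed 0).
class WindowAlignmentSketch {
    static final DateTimeFormatter FMT = DateTimeFormatter.ofPattern("H:mm:ss.SSS");

    // Rounds a timestamp down to the nearest multiple of the slide.
    static long alignedStart(long tsMillis, long slideMillis) {
        return tsMillis - Math.floorMod(tsMillis, slideMillis);
    }

    // Formats milliseconds since midnight as a clock time, e.g. 1:00:05.000.
    static String format(long millisOfDay) {
        return LocalTime.ofNanoOfDay(millisOfDay * 1_000_000L).format(FMT);
    }

    public static void main(String[] args) {
        long size = 10_000, slide = 5_000;                          // 10 s windows, 5 s slide
        long ts = LocalTime.of(1, 0, 7).toNanoOfDay() / 1_000_000L; // element at 1:00:07.000
        // Every window that contains ts, newest first; ends printed inclusively (start + size - 1 ms).
        for (long start = alignedStart(ts, slide); start > ts - size; start -= slide) {
            System.out.println("[" + format(start) + ", " + format(start + size - 1) + "]");
        }
    }
}
```

It prints [1:00:05.000, 1:00:14.999] and then [1:00:00.000, 1:00:09.999]: the windows start on multiples of the slide, not at the element's own timestamp.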
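import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.common.functions.ReduceFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.assigners.SlidingProcessingTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;

public class NonKeyedSlidingWindowDemo {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Read lines from a socket (start the server first with: nc -lp 8888)
        DataStreamSource<String> socketStream = env.socketTextStream("localhost", 8888);
        // Turn every line into (word, 1)
        SingleOutputStreamOperator<Tuple2<String, Integer>> wordAndOne = socketStream.map(new MapFunction<String, Tuple2<String, Integer>>() {
            @Override
            public Tuple2<String, Integer> map(String s) throws Exception {
                return Tuple2.of(s, 1);
            }
        });
        // Old API (deprecated since Flink 1.12):
        // env.setStreamTimeCharacteristic(TimeCharacteristic.ProcessingTime);
        // SingleOutputStreamOperator<Tuple2<String, Integer>> sum = wordAndOne.timeWindowAll(Time.seconds(10), Time.seconds(5)).sum(1);
        // Non-keyed sliding window, 10 s long, a new window every 5 s; windowAll runs with parallelism 1
        SingleOutputStreamOperator<Tuple2<String, Integer>> reduce = wordAndOne
                .windowAll(SlidingProcessingTimeWindows.of(Time.seconds(10), Time.seconds(5)))
                .reduce(new ReduceFunction<Tuple2<String, Integer>>() {
                    @Override
                    public Tuple2<String, Integer> reduce(Tuple2<String, Integer> s1, Tuple2<String, Integer> s2) throws Exception {
                        // Keep s1's word and add up the counts of all elements in the window
                        s1.f1 = s1.f1 + s2.f1;
                        return s1;
                    }
                });
        reduce.print();
        env.execute("");
    }
}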
Input (a trailing "--5s" marks a pause of about 5 seconds before the next line was typed):
C:\Users\zhibai>nc -lp 8888
hadoop
flink
spark --5s
spark
spark
spark
flink --5s
flink --5s
flink --5s
Output (the number before ">" identifies the parallel print subtask that emitted the record):
6> (hadoop,3)
7> (hadoop,7)
8> (spark,6)
1> (flink,3)
2> (flink,1)
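Because windowAll puts every word into the same window, these results are not per-word counts. The ReduceFunction keeps the word of s1 and only adds up the counts, and as far as I know Flink reduces a window incrementally, so s1 is the running aggregate that started with the first element to arrive in that window. The plain-Java sketch below (hypothetical names, no Flink dependency) replays that reduce over one window's contents, presumably hadoop, flink and spark for the first window of the run above.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;

// Plain-Java replay of the example's ReduceFunction over one window (not Flink code).
class WindowAllReduceSketch {

    static Map.Entry<String, Integer> reduceWindow(String... wordsInArrivalOrder) {
        Map.Entry<String, Integer> acc = null; // plays the role of s1
        for (String word : wordsInArrivalOrder) {
            Map.Entry<String, Integer> next = new SimpleEntry<>(word, 1); // map(): (word, 1)
            if (acc == null) {
                acc = next;                                     // first element starts the aggregate
            } else {
                acc.setValue(acc.getValue() + next.getValue()); // s1.f1 = s1.f1 + s2.f1
            }
        }
        return acc;
    }

    public static void main(String[] args) {
        Map.Entry<String, Integer> result = reduceWindow("hadoop", "flink", "spark");
        System.out.println("(" + result.getKey() + "," + result.getValue() + ")");
    }
}
```

It prints (hadoop,3): the count covers all three words, while the word is simply the first one in the window. A per-word count would need keyBy before the window, as in the next section.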
2. Keyed Sliding Windows
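import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.java.functions.KeySelector;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.datastream.KeyedStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.datastream.WindowedStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.assigners.SlidingProcessingTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.api.windowing.windows.TimeWindow;

public class KeyedSlidingWindowDemo {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        DataStreamSource<String> socketStream = env.socketTextStream("localhost", 8888);
        // Each input line looks like "word count", e.g. "spark 3"
        SingleOutputStreamOperator<Tuple2<String, Integer>> wordAndOne = socketStream.map(new MapFunction<String, Tuple2<String, Integer>>() {
            @Override
            public Tuple2<String, Integer> map(String s) throws Exception {
                String[] fields = s.split(" ");
                return Tuple2.of(fields[0], Integer.parseInt(fields[1]));
            }
        });
        // Partition by word so that every word gets its own windows
        KeyedStream<Tuple2<String, Integer>, String> keyedStream = wordAndOne.keyBy(new KeySelector<Tuple2<String, Integer>, String>() {
            @Override
            public String getKey(Tuple2<String, Integer> s) throws Exception {
                return s.f0;
            }
        });
        // Keyed sliding processing-time window: 30 s long, sliding every 10 s
        WindowedStream<Tuple2<String, Integer>, String, TimeWindow> processingWindow = keyedStream.window(SlidingProcessingTimeWindows.of(Time.seconds(30), Time.seconds(10)));
        // Sum the counts (field 1) per word and window
        processingWindow.sum(1).print();
        env.execute("");
    }
}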
Input (a trailing "--10s" marks a pause of about 10 seconds before the next line was typed):
C:\Users\zhibai>nc -lp 8888
hadoop 1
flink 1 --10s
flink 1
spark 3
spark 2
hadoop 3
flink 5 --10s
spark 3 --10s
flink 4
spark 3 --10s
flink 1
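Output (each word is summed separately per window; with a 30-second window sliding every 10 seconds, each element is counted in three consecutive windows):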
7> (flink,1)
8> (hadoop,1)
7> (flink,7)
8> (hadoop,4)
1> (spark,5)
8> (hadoop,4)
1> (spark,8)
7> (flink,7)
7> (flink,11)
8> (hadoop,3)
1> (spark,11)
1> (spark,6)
7> (flink,5)
1> (spark,3)
7> (flink,5)
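The keyed computation can be simulated the same way. This plain-Java sketch (hypothetical names, made-up arrival times in seconds, no Flink dependency) adds each (word, count) event to every 30-second window that covers it, with window starts on multiples of the 10-second slide, and sums the counts per window and word, which is what keyBy(...).window(...).sum(1) produces for each key.

```java
import java.util.Map;
import java.util.TreeMap;

// Plain-Java simulation of keyBy(word) + sliding window + sum(1); not Flink code.
class KeyedSlidingSumSketch {

    // Returns window start -> (word -> summed count). Windows are [start, start + size),
    // starts are multiples of slide, and each event is added to every window covering it.
    static TreeMap<Long, TreeMap<String, Integer>> windowSums(
            long[] ts, String[] words, int[] counts, long size, long slide) {
        TreeMap<Long, TreeMap<String, Integer>> byWindow = new TreeMap<>();
        for (int i = 0; i < ts.length; i++) {
            long lastStart = ts[i] - Math.floorMod(ts[i], slide);
            for (long start = lastStart; start > ts[i] - size; start -= slide) {
                byWindow.computeIfAbsent(start, s -> new TreeMap<>())
                        .merge(words[i], counts[i], Integer::sum); // sum per word and window
            }
        }
        return byWindow;
    }

    public static void main(String[] args) {
        long size = 30, slide = 10;
        // Hypothetical arrival times in seconds, with (word, count) pairs like the input above.
        long[] ts = {101, 112, 113, 114, 125};
        String[] words = {"hadoop", "flink", "spark", "spark", "flink"};
        int[] counts = {1, 1, 3, 2, 5};
        for (Map.Entry<Long, TreeMap<String, Integer>> w : windowSums(ts, words, counts, size, slide).entrySet()) {
            System.out.println("[" + w.getKey() + ", " + (w.getKey() + size) + ") " + w.getValue());
        }
    }
}
```

In the real job each (word, window) sum is printed when processing time passes the end of that window, which is why the same word shows up in several consecutive output lines above with different totals.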
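The of method of SlidingWindows can also take a third argument, an offset that shifts where windows start. By default windows are aligned to the epoch in UTC (UTC+0), so in any other time zone an offset is needed if windows should line up with local clock boundaries. China, for example, is at UTC+8, so aligning windows to local midnight takes an offset of Time.hours(-8). Flink only accepts an offset whose absolute value is smaller than the slide (otherwise it throws an IllegalArgumentException), so an 8-hour offset needs a slide longer than 8 hours.
// Window length 1 day, sliding every 12 hours; the -8 hour offset makes windows start at 00:00 and 12:00 China time (UTC+8)
WindowedStream<Tuple2<String, Integer>, String, TimeWindow> offsetWindow = keyedStream.window(SlidingProcessingTimeWindows.of(Time.days(1), Time.hours(12), Time.hours(-8)));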
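The effect of an offset can also be sketched in plain Java. As far as I know, Flink's sliding assigners place window starts at offset plus whole multiples of the slide and reject any offset whose absolute value is not smaller than the slide; the hypothetical windowStarts method below mimics both rules (it is not Flink code). With a 1-day window, a 12-hour slide and a -8-hour offset, an event at 18:00 China time (UTC+8) falls into the windows that start at local 00:00 and 12:00.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.ZoneOffset;
import java.util.ArrayList;
import java.util.List;

// Plain-Java sketch of sliding windows with an offset (hypothetical names, not Flink code).
class SlidingOffsetSketch {

    // Starts (epoch ms, newest first) of every window containing ts. Starts are
    // offset + k * slide; offsets with abs(offset) >= slide are rejected.
    static List<Long> windowStarts(long ts, long size, long slide, long offset) {
        if (Math.abs(offset) >= slide || size <= 0) {
            throw new IllegalArgumentException("requires abs(offset) < slide and size > 0");
        }
        List<Long> starts = new ArrayList<>();
        long lastStart = ts - Math.floorMod(ts - offset, slide);
        for (long start = lastStart; start > ts - size; start -= slide) {
            starts.add(start);
        }
        return starts;
    }

    public static void main(String[] args) {
        long size = Duration.ofDays(1).toMillis();
        long slide = Duration.ofHours(12).toMillis();
        long offset = Duration.ofHours(-8).toMillis();                  // shift for UTC+8
        long ts = Instant.parse("2024-01-01T10:00:00Z").toEpochMilli(); // 18:00 in UTC+8
        ZoneOffset china = ZoneOffset.ofHours(8);
        for (long start : windowStarts(ts, size, slide, offset)) {
            // Print each window in UTC+8 local time.
            System.out.println(Instant.ofEpochMilli(start).atOffset(china)
                    + " -> " + Instant.ofEpochMilli(start + size).atOffset(china));
        }
    }
}
```

By the same rule, a -8-hour offset combined with a 1-hour slide is rejected, because the offset is not smaller than the slide.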