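Defining the window assigner is the first step in building a window operator; a window assigner essentially specifies the type of window. By what drives them, windows are divided into time windows and count windows. By their assignment rules, they are further divided into four kinds: tumbling, sliding, session, and global windows.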
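Apart from global windows, which have to be customized (their default trigger never fires, so they need a user-defined trigger), Flink ships built-in assigners for all of these types. The sections below briefly demonstrate how to use them.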
Count windows
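Count windows are themselves implemented on top of global windows; you simply call the countWindow() method. Tumbling and sliding count windows are demonstrated below.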
Tumbling count window
Just call countWindow() on the keyed stream (the output of keyBy) with a single argument: the window size.
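Semantically, countWindow(5) keeps a buffer per key. It fires once that key has collected 5 elements and then purges the buffer. Internally, Flink builds this from a global window with a purging count trigger.

The plain-Java simulation below sketches that behavior; it is not Flink code. It assumes each element is reduced to a String key and a long ts, and the class name TumblingCountWindowSim is hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java simulation of countWindow(size); NOT Flink code.
// Elements are reduced to (key, ts); TumblingCountWindowSim is a hypothetical name.
final class TumblingCountWindowSim {
    private final int size;
    // One buffer per key, like the per-key window state of a keyed stream
    private final Map<String, List<Long>> buffers = new HashMap<>();
    private final List<String> fired = new ArrayList<>();

    TumblingCountWindowSim(int size) {
        this.size = size;
    }

    // Add one element; when its key has `size` buffered elements, emit them and purge
    void add(String key, long ts) {
        List<Long> buffer = buffers.computeIfAbsent(key, k -> new ArrayList<>());
        buffer.add(ts);
        if (buffer.size() == size) {
            fired.add(key + "=" + buffer);
            buffer.clear(); // purge after firing, so consecutive windows never overlap
        }
    }

    List<String> fired() {
        return fired;
    }

    public static void main(String[] args) {
        TumblingCountWindowSim sim = new TumblingCountWindowSim(5);
        for (long ts = 1; ts <= 7; ts++) {
            sim.add("s1", ts);
        }
        sim.add("s2", 1L); // s2 has only one element, so no window fires for it
        System.out.println(sim.fired()); // [s1=[1, 2, 3, 4, 5]]
    }
}
```

Because the buffer is purged after each firing, consecutive tumbling count windows never share elements. A key that has not yet reached 5 elements (s2 above) produces no output.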
public static void main(String[] args) throws Exception {
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.setParallelism(1);
DataStreamSource<String> sensorDS = env.socketTextStream("node1", 7777);
KeyedStream<WaterSensor, String> sensorKS = sensorDS
.map(new WaterSensorMapFunction())
.keyBy(new KeySelector<WaterSensor, String>() {
@Override
public String getKey(WaterSensor waterSensor) throws Exception {
return waterSensor.getId();
}
});
WindowedStream<WaterSensor, String, GlobalWindow> sensorWS = sensorKS
// Tumbling window: a single argument, the window size of 5 elements
.countWindow(5);
// Sliding window: pass two arguments; the second one is the slide
SingleOutputStreamOperator<String> process = sensorWS.process(new ProcessWindowFunction<WaterSensor, String, String, GlobalWindow>() {
@Override
public void process(String s, Context context, Iterable<WaterSensor> iterable, Collector<String> collector) throws Exception {
// For a GlobalWindow, maxTimestamp() returns Long.MAX_VALUE, so this "max time" is a far-future placeholder
long maxTs = context.window().maxTimestamp();
String maxTime = DateFormatUtils.format(maxTs, "yyyy-MM-dd HH:mm:ss.SSS");
long count = iterable.spliterator().estimateSize();
collector.collect("key=" + s + ", window max time=" + maxTime + ", contains " + count + " elements ===> " + iterable.toString());
}
});
process.print();
env.execute();
}
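The window function above counts elements with iterable.spliterator().estimateSize(). That value is only an estimate. For an Iterable that is not backed by a sized collection, the default spliterator reports an unknown size, and estimateSize() returns Long.MAX_VALUE. It may well be exact for the Iterable Flink passes in, but the Iterable contract does not promise that.

A simple iteration-based count avoids the question. WindowCount is a hypothetical helper name, not a Flink API.

```java
import java.util.Arrays;
import java.util.List;

// Counts window elements by iterating instead of relying on estimateSize().
// WindowCount is a hypothetical helper name, not a Flink API.
final class WindowCount {
    static long count(Iterable<?> elements) {
        long n = 0;
        for (Object ignored : elements) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        List<String> list = Arrays.asList("a", "b", "c");
        // A bare Iterable (not a Collection): its default spliterator has unknown size
        Iterable<String> bare = list::iterator;
        System.out.println(list.spliterator().estimateSize()); // 3
        System.out.println(bare.spliterator().estimateSize()); // 9223372036854775807 (Long.MAX_VALUE)
        System.out.println(count(bare));                       // 3
    }
}
```

Inside process(), one could then write `long count = WindowCount.count(iterable);` in place of the estimateSize() call.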
Now let's look at the output of the tumbling count window.
Sliding count window
As with tumbling windows, you call countWindow() directly, but with two arguments: the first is the window size and the second is the slide.
public static void main(String[] args) throws Exception {
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.setParallelism(1);
DataStreamSource<String> sensorDS = env.socketTextStream("node1", 7777);
KeyedStream<WaterSensor, String> sensorKS = sensorDS
.map(new WaterSensorMapFunction())
.keyBy(new KeySelector<WaterSensor, String>() {
@Override
public String getKey(WaterSensor waterSensor) throws Exception {
return waterSensor.getId();
}
});
WindowedStream<WaterSensor, String, GlobalWindow> sensorWS = sensorKS
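// Tumbling window: a single argument, the window size of 5 elements
// .countWindow(5);
// Sliding window: two arguments, a window size of 5 elements and a slide of 2 elements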
.countWindow(5,2);
SingleOutputStreamOperator<String> process = sensorWS.process(new ProcessWindowFunction<WaterSensor, String, String, GlobalWindow>() {
@Override
public void process(String s, Context context, Iterable<WaterSensor> iterable, Collector<String> collector) throws Exception {
long maxTs = context.window().maxTimestamp();
String maxTime = DateFormatUtils.format(maxTs, "yyyy-MM-dd HH:mm:ss.SSS");
long count = iterable.spliterator().estimateSize();
collector.collect("key=" + s + ", window max time=" + maxTime + ", contains " + count + " elements ===> " + iterable.toString());
}
});
process.print();
env.execute();
}
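As the output shows, a window fires as soon as the first two elements arrive. In other words, each key fires a window every two elements, and each window holds at most the five most recent elements.
The third window contains the elements with ts 2 through 6, which confirms this.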
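The firing pattern can be reproduced with a small simulation. Flink implements countWindow(size, slide) as a global window with two parts:

- a count evictor that keeps the last size elements;
- a count trigger that fires every slide elements.

The plain-Java sketch below models a single key, and SlidingCountWindowSim is a hypothetical name. It trims the buffer as elements arrive rather than at firing time, which yields the same window contents.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Plain-Java simulation of countWindow(size, slide) for a single key; NOT Flink code.
// SlidingCountWindowSim is a hypothetical name.
final class SlidingCountWindowSim {
    private final int size;
    private final int slide;
    private final Deque<Long> buffer = new ArrayDeque<>();
    private final List<List<Long>> fired = new ArrayList<>();
    private int sinceLastFire = 0;

    SlidingCountWindowSim(int size, int slide) {
        this.size = size;
        this.slide = slide;
    }

    void add(long ts) {
        buffer.addLast(ts);
        // Keep only the newest `size` elements (the role of the count evictor)
        if (buffer.size() > size) {
            buffer.removeFirst();
        }
        // Fire every `slide` elements (the role of the count trigger)
        if (++sinceLastFire == slide) {
            fired.add(new ArrayList<>(buffer));
            sinceLastFire = 0;
        }
    }

    List<List<Long>> fired() {
        return fired;
    }

    public static void main(String[] args) {
        SlidingCountWindowSim sim = new SlidingCountWindowSim(5, 2);
        for (long ts = 1; ts <= 6; ts++) {
            sim.add(ts);
        }
        System.out.println(sim.fired()); // [[1, 2], [1, 2, 3, 4], [2, 3, 4, 5, 6]]
    }
}
```

With ts values 1 to 6 it emits [1, 2], then [1, 2, 3, 4], then [2, 3, 4, 5, 6], matching the third window described above.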
Time windows
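Time can be either processing time or event time; this part demonstrates processing time only. Time windows come in three types, tumbling, sliding, and session, each implemented below.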
Tumbling processing-time window
The assigner is provided by the TumblingProcessingTimeWindows class through its static method .of().
public static void main(String[] args) throws Exception {
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.setParallelism(1);
DataStreamSource<String> sensorDS = env.socketTextStream("node1", 7777);
KeyedStream<WaterSensor, String> sensorKS = sensorDS
.map(new WaterSensorMapFunction())
.keyBy(new KeySelector<WaterSensor, String>() {
@Override
public String getKey(WaterSensor waterSensor) throws Exception {
return waterSensor.getId();
}
});
WindowedStream<WaterSensor, String, TimeWindow> sensorWS = sensorKS
// Tumbling window: one argument, the window size of 10 s
.window(TumblingProcessingTimeWindows.of(Time.seconds(10)));
SingleOutputStreamOperator<String> process = sensorWS.process(new ProcessWindowFunction<WaterSensor, String, String, TimeWindow>() {
@Override
public void process(String s, Context context, Iterable<WaterSensor> iterable, Collector<String> collector) throws Exception {
// The context exposes the window object as well as other facilities such as side outputs
long startTs = context.window().getStart();
long endTs = context.window().getEnd();
String windowStart = DateFormatUtils.format(startTs, "yyyy-MM-dd HH:mm:ss.SSS");
String windowEnd = DateFormatUtils.format(endTs, "yyyy-MM-dd HH:mm:ss.SSS");
long count = iterable.spliterator().estimateSize();
collector.collect("key=" + s + " window [" + windowStart + "," + windowEnd + ") contains " + count + " elements ===> " + iterable.toString());
}
});
process.print();
env.execute();
}
of() also accepts an optional second argument: the offset.
WindowedStream<WaterSensor, String, TimeWindow> sensorWS = sensorKS
// Tumbling window: one argument, the window size of 10 s
// An optional second argument sets the offset (here 2 s)
.window(TumblingProcessingTimeWindows.of(Time.seconds(10),Time.seconds(2)));
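The offset shifts where window boundaries fall. The sketch below computes the start of the tumbling window that contains a given timestamp, in milliseconds since the epoch. The formula is assumed to mirror Flink's TimeWindow.getWindowStartWithOffset, and TumblingWindowMath is a hypothetical class name.

```java
// Computes the tumbling window that a timestamp falls into; NOT Flink code.
// The formula is assumed to mirror TimeWindow.getWindowStartWithOffset in Flink.
final class TumblingWindowMath {
    // Start (ms) of the window of length `size` containing `timestamp`, shifted by `offset`
    static long windowStart(long timestamp, long offset, long size) {
        long remainder = (timestamp - offset) % size;
        if (remainder < 0) {
            remainder += size; // Java's % keeps the sign of the dividend
        }
        return timestamp - remainder;
    }

    public static void main(String[] args) {
        long size = 10_000L; // Time.seconds(10)
        long ts = 25_000L;   // a processing time 25 s after the epoch
        long noOffset = windowStart(ts, 0L, size);
        long withOffset = windowStart(ts, 2_000L, size); // Time.seconds(2)
        System.out.println("[" + noOffset + ", " + (noOffset + size) + ")");     // [20000, 30000)
        System.out.println("[" + withOffset + ", " + (withOffset + size) + ")"); // [22000, 32000)
    }
}
```

Without an offset, 10 s windows start at multiples of 10 s since the epoch (UTC). With a 2 s offset, every boundary moves 2 s later. The Flink documentation uses the same mechanism with a negative offset, for example Time.hours(-8), to align day-long windows with local midnight in UTC+8.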
With the offset added, the output is as follows:
Sliding processing-time window
The assigner is provided by the SlidingProcessingTimeWindows class, again through its static method .of(). It takes two arguments, in order: the window size and the slide.
public static void main(String[] args) throws Exception {
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.setParallelism(1);
DataStreamSource<String> sensorDS = env.socketTextStream("node1", 7777);
KeyedStream<WaterSensor, String> sensorKS = sensorDS
.map(new WaterSensorMapFunction())
.keyBy(new KeySelector<WaterSensor, String>() {
@Override
public String getKey(WaterSensor waterSensor) throws Exception {
return waterSensor.getId();
}
});
WindowedStream<WaterSensor, String, TimeWindow> sensorWS = sensorKS
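// Tumbling window: one argument, the window size of 10 s
// An optional second argument sets the offset
// .window(TumblingProcessingTimeWindows.of(Time.seconds(10),Time.seconds(2)));
// Sliding window: the first argument is the window size, the second is the slide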
.window(SlidingProcessingTimeWindows.of(Time.seconds(10),Time.seconds(5)));
SingleOutputStreamOperator<String> process = sensorWS.process(new ProcessWindowFunction<WaterSensor, String, String, TimeWindow>() {
@Override
public void process(String s, Context context, Iterable<WaterSensor> iterable, Collector<String> collector) throws Exception {
// The context exposes the window object as well as other facilities such as side outputs
long startTs = context.window().getStart();
long endTs = context.window().getEnd();
String windowStart = DateFormatUtils.format(startTs, "yyyy-MM-dd HH:mm:ss.SSS");
String windowEnd = DateFormatUtils.format(endTs, "yyyy-MM-dd HH:mm:ss.SSS");
long count = iterable.spliterator().estimateSize();
collector.collect("key=" + s + " window [" + windowStart + "," + windowEnd + ") contains " + count + " elements ===> " + iterable.toString());
}
});
process.print();
env.execute();
}
Judging by the processing times, a window fires every 5 s, and each window contains the data of the most recent 10 s.
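Each element therefore belongs to size / slide = 2 windows. The sketch below lists every sliding window that contains a given timestamp. The loop is assumed to mirror how Flink's SlidingProcessingTimeWindows assigns windows: align the timestamp to the slide, then step back one slide at a time while the window still covers it. SlidingWindowMath is a hypothetical class name.

```java
import java.util.ArrayList;
import java.util.List;

// Lists every sliding window [start, start + size) that contains a timestamp; NOT Flink code.
// The loop is assumed to mirror how SlidingProcessingTimeWindows assigns windows.
final class SlidingWindowMath {
    static List<long[]> assign(long timestamp, long size, long slide, long offset) {
        // Start of the latest window containing the timestamp: align to the slide
        long remainder = (timestamp - offset) % slide;
        if (remainder < 0) {
            remainder += slide;
        }
        long lastStart = timestamp - remainder;
        List<long[]> windows = new ArrayList<>();
        // Walk back one slide at a time while the window still covers the timestamp
        for (long start = lastStart; start > timestamp - size; start -= slide) {
            windows.add(new long[] {start, start + size});
        }
        return windows;
    }

    public static void main(String[] args) {
        // size 10 s, slide 5 s, no offset: each element lands in size / slide = 2 windows
        for (long[] w : assign(27_000L, 10_000L, 5_000L, 0L)) {
            System.out.println("[" + w[0] + ", " + w[1] + ")");
        }
        // prints [25000, 35000) then [20000, 30000)
    }
}
```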
Likewise, sliding windows also accept an offset argument, just like tumbling windows; this is not demonstrated again here.
Processing-time session window
The assigner is provided by the ProcessingTimeSessionWindows class through its static method .withGap() or .withDynamicGap().
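.withGap() takes a Time argument, size, which is the session timeout, i.e. the session gap. Here the gap is 5 s: if no new data arrives for the key within 5 s of its last element, the window closes.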
public static void main(String[] args) throws Exception {
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.setParallelism(1);
DataStreamSource<String> sensorDS = env.socketTextStream("node1", 7777);
KeyedStream<WaterSensor, String> sensorKS = sensorDS
.map(new WaterSensorMapFunction())
.keyBy(new KeySelector<WaterSensor, String>() {
@Override
public String getKey(WaterSensor waterSensor) throws Exception {
return waterSensor.getId();
}
});
WindowedStream<WaterSensor, String, TimeWindow> sensorWS = sensorKS
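// Tumbling window: one argument, the window size of 10 s
// An optional second argument sets the offset
// .window(TumblingProcessingTimeWindows.of(Time.seconds(10),Time.seconds(2)));
// Sliding window: the first argument is the window size, the second is the slide
// .window(SlidingProcessingTimeWindows.of(Time.seconds(10),Time.seconds(5)));
// Session window with a fixed 5 s gap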
.window(ProcessingTimeSessionWindows.withGap(Time.seconds(5)));
SingleOutputStreamOperator<String> process = sensorWS.process(new ProcessWindowFunction<WaterSensor, String, String, TimeWindow>() {
@Override
public void process(String s, Context context, Iterable<WaterSensor> iterable, Collector<String> collector) throws Exception {
// The context exposes the window object as well as other facilities such as side outputs
long startTs = context.window().getStart();
long endTs = context.window().getEnd();
String windowStart = DateFormatUtils.format(startTs, "yyyy-MM-dd HH:mm:ss.SSS");
String windowEnd = DateFormatUtils.format(endTs, "yyyy-MM-dd HH:mm:ss.SSS");
long count = iterable.spliterator().estimateSize();
collector.collect("key=" + s + " window [" + windowStart + "," + windowEnd + ") contains " + count + " elements ===> " + iterable.toString());
}
});
process.print();
env.execute();
}
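As the output above shows, the windows do not all span the same length of time. Each session window runs from its first element until one gap after its last element, so its length depends on how the data arrives.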
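The varying lengths follow from how sessions are built: each element opens a window [t, t + gap), and windows that overlap are merged into one session. The plain-Java sketch below groups the sorted arrival times of a single key this way. It is a simplified model rather than Flink's merging code, and SessionWindowSim is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

// Groups sorted arrival times (ms) of one key into session windows; NOT Flink code.
// SessionWindowSim is a hypothetical name.
final class SessionWindowSim {
    static List<long[]> sessions(long[] sortedTimes, long gap) {
        List<long[]> result = new ArrayList<>();
        long[] current = null; // {start, end} of the open session
        for (long t : sortedTimes) {
            if (current != null && t <= current[1]) {
                // Arrived within the gap: the element's window [t, t + gap) merges into the session
                current[1] = t + gap;
            } else {
                if (current != null) {
                    result.add(current); // gap exceeded: the previous session is closed
                }
                current = new long[] {t, t + gap};
            }
        }
        if (current != null) {
            result.add(current);
        }
        return result;
    }

    public static void main(String[] args) {
        long gap = 5_000L; // withGap(Time.seconds(5))
        long[] arrivals = {0L, 3_000L, 7_000L, 20_000L};
        for (long[] s : sessions(arrivals, gap)) {
            System.out.println("[" + s[0] + ", " + s[1] + ") length " + (s[1] - s[0]) + " ms");
        }
        // [0, 12000) length 12000 ms, then [20000, 25000) length 5000 ms
    }
}
```

Three elements arriving within 5 s of each other form one 12 s session. An isolated element yields a session exactly one gap (5 s) long.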
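In addition, withDynamicGap() lets you define logic that extracts the session gap from each element.
In the example below, every incoming element sets the gap to its ts value interpreted as seconds (ts × 1000 ms).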
WindowedStream<WaterSensor, String, TimeWindow> sensorWS = sensorKS
// Session window with a dynamic gap: every incoming element updates the gap
.window(ProcessingTimeSessionWindows.withDynamicGap(new SessionWindowTimeGapExtractor<WaterSensor>() {
@Override
public long extract(WaterSensor waterSensor) {
return waterSensor.getTs()*1000L;
}
}));
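With a dynamic gap, each element opens a window whose length comes from the extractor, and overlapping windows still merge. The sketch below applies the same rule as the extractor above (gap = ts seconds) to a hypothetical Event class standing in for WaterSensor. DynamicGapSessionSim and its fields are illustrative names, and the model is a simplification of Flink's behavior.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.ToLongFunction;

// Session windows with a per-element gap, for one key; NOT Flink code.
final class DynamicGapSessionSim {
    // Hypothetical stand-in for WaterSensor: processing time of arrival plus its ts field
    static final class Event {
        final long arrivalMs;
        final long ts;

        Event(long arrivalMs, long ts) {
            this.arrivalMs = arrivalMs;
            this.ts = ts;
        }
    }

    // Each element opens [arrival, arrival + gap(element)); overlapping windows merge
    static List<long[]> sessions(List<Event> sortedByArrival, ToLongFunction<Event> gapExtractor) {
        List<long[]> result = new ArrayList<>();
        long[] current = null;
        for (Event e : sortedByArrival) {
            long start = e.arrivalMs;
            long end = start + gapExtractor.applyAsLong(e);
            if (current != null && start <= current[1]) {
                // A small gap must not shrink a session that a larger gap already extended
                current[1] = Math.max(current[1], end);
            } else {
                if (current != null) {
                    result.add(current);
                }
                current = new long[] {start, end};
            }
        }
        if (current != null) {
            result.add(current);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Event> events = Arrays.asList(new Event(0L, 1L), new Event(1_500L, 5L), new Event(6_000L, 1L));
        // Same rule as the extractor above: gap = ts seconds
        for (long[] s : sessions(events, e -> e.ts * 1000L)) {
            System.out.println("[" + s[0] + ", " + s[1] + ")");
        }
        // [0, 1000) then [1500, 7000)
    }
}
```

Note the Math.max call. An element with a short gap may arrive inside a session that a longer gap has already extended; it does not shorten that session.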