[Date] 2022.05.24 (Tuesday)
[Title] [Getting Started with Flink (4)] Flink's Windows API
This column collects notes and mind maps from the 尚硅谷 (Atguigu) Flink course.
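Introduction
A window is a way to cut unbounded data into finite chunks for processing. The core idea is to split an infinite stream into finite-size "buckets" and run computations on each bucket. Tumbling windows and sliding windows are the most common kinds.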
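To make the bucket idea concrete, here is a minimal plain-Java sketch (no Flink dependency) of how a timestamp could be mapped to a tumbling window and to the sliding windows that contain it. The names `WindowBuckets`, `tumblingStart`, and `slidingStarts` are made up for illustration, and the arithmetic assumes non-negative millisecond timestamps with no window offset; Flink's own assigners also handle offsets and other details.

```java
import java.util.ArrayList;
import java.util.List;

public class WindowBuckets {
    // Start of the tumbling window [start, start + size) that contains ts.
    static long tumblingStart(long ts, long size) {
        return ts - (ts % size);
    }

    // Starts of every sliding window of length `size`, advancing by `slide`,
    // that contains ts. A single element can belong to several windows.
    static List<Long> slidingStarts(long ts, long size, long slide) {
        List<Long> starts = new ArrayList<>();
        long lastStart = ts - (ts % slide);
        for (long start = lastStart; start > ts - size; start -= slide) {
            starts.add(start);
        }
        return starts;
    }

    public static void main(String[] args) {
        // 15 s tumbling window: timestamp 17 s falls into [15 s, 30 s).
        System.out.println(tumblingStart(17_000L, 15_000L));          // prints 15000
        // 15 s window sliding every 5 s: the element lands in three windows.
        System.out.println(slidingStarts(17_000L, 15_000L, 5_000L));  // prints [15000, 10000, 5000]
    }
}
```

A tumbling window is the special case where the slide equals the window size, so each element lands in exactly one bucket.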
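1. Window
1.1 Overview
1.2 Creating different types of windows
Tumbling windows
Sliding windows
Session windows
2. Window API
2.1 Overview of opening a window
2.2 Window function
Incremental aggregation function (aggregate method) example
- Counting the number of records: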
package apitest.window;

import apitest.beans.SensorReading;
import org.apache.flink.api.common.functions.AggregateFunction;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.assigners.TumblingProcessingTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;

public class WindowTest1_TimeWindow {
    public static void main(String[] args) throws Exception {
        // Create the execution environment
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Set parallelism to 1 so the output is easy to read
        env.setParallelism(1);
        // // Read data from a file
        // DataStream<String> dataStream = env.readTextFile("/tmp/Flink_Tutorial/src/main/resources/sensor.txt");
        // Read data from a socket text stream
        DataStream<String> inputStream = env.socketTextStream("localhost", 7777);
        // Convert each line to a SensorReading
        // (valueOf replaces the deprecated new Long(...) / new Double(...) constructors)
        DataStream<SensorReading> dataStream = inputStream.map(line -> {
            String[] fields = line.split(",");
            return new SensorReading(fields[0], Long.valueOf(fields[1]), Double.valueOf(fields[2]));
        });

        // Window test
        // 1. Incremental aggregation function (simply counts the sensor readings in each key group)
        // keyBy with a key selector replaces the deprecated string-based keyBy("id")
        DataStream<Integer> resultStream = dataStream.keyBy(SensorReading::getId)
                // Alternative window definitions:
                // .countWindow(10, 2);
                // .window(EventTimeSessionWindows.withGap(Time.minutes(1)));
                // .window(TumblingProcessingTimeWindows.of(Time.seconds(15)))
                // .timeWindow(Time.seconds(15)) // marked @Deprecated, no longer recommended
                .window(TumblingProcessingTimeWindows.of(Time.seconds(15)))
                .aggregate(new AggregateFunction<SensorReading, Integer, Integer>() {
                    // Create a fresh accumulator
                    @Override
                    public Integer createAccumulator() {
                        return 0;
                    }

                    // Add one to the running count for every incoming element
                    @Override
                    public Integer add(SensorReading value, Integer accumulator) {
                        return accumulator + 1;
                    }

                    // Return the final result
                    @Override
                    public Integer getResult(Integer accumulator) {
                        return accumulator;
                    }

                    // Merge two accumulators (time windows rarely need this;
                    // session windows may, because they can be merged)
                    @Override
                    public Integer merge(Integer a, Integer b) {
                        return a + b;
                    }
                });

        resultStream.print("result");
        env.execute();
    }
}
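The counting example uses `Integer` for both the accumulator and the result, which hides why the two types are separate. As a sketch under stated assumptions, the plain-Java class below mimics the same four callbacks for an average temperature, where the accumulator holds a running sum and count and the result is a single `double`. `AverageSketch` and `AvgAccumulator` are hypothetical names and nothing here calls Flink; it only illustrates the shape such an aggregate could take.

```java
public class AverageSketch {
    // Accumulator: a running sum and count, so the state stays constant-size
    // no matter how many elements the window receives.
    static final class AvgAccumulator {
        final double sum;
        final long count;

        AvgAccumulator(double sum, long count) {
            this.sum = sum;
            this.count = count;
        }
    }

    // Mirrors createAccumulator(): start from an empty state.
    static AvgAccumulator createAccumulator() {
        return new AvgAccumulator(0.0, 0L);
    }

    // Mirrors add(): fold one temperature into the running state.
    static AvgAccumulator add(double temperature, AvgAccumulator acc) {
        return new AvgAccumulator(acc.sum + temperature, acc.count + 1);
    }

    // Mirrors getResult(): turn the accumulator into the output value.
    static double getResult(AvgAccumulator acc) {
        return acc.count == 0 ? 0.0 : acc.sum / acc.count;
    }

    // Mirrors merge(): combine two partial states, as merging windows would.
    static AvgAccumulator merge(AvgAccumulator a, AvgAccumulator b) {
        return new AvgAccumulator(a.sum + b.sum, a.count + b.count);
    }

    public static void main(String[] args) {
        AvgAccumulator left = add(37.0, add(35.0, createAccumulator()));
        AvgAccumulator right = add(30.0, createAccumulator());
        System.out.println(getResult(left));                // prints 36.0
        System.out.println(getResult(merge(left, right)));  // prints 34.0
    }
}
```

Because only the sum and count are kept, each element can be discarded as soon as it has been added, which is the point of incremental aggregation.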
Full window function (apply method) example
- Counting the number of records:
package apitest.window;

import apitest.beans.SensorReading;
import org.apache.commons.collections.IteratorUtils;
import org.apache.flink.api.java.tuple.Tuple3;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.windowing.WindowFunction;
import org.apache.flink.streaming.api.windowing.assigners.TumblingProcessingTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.api.windowing.windows.TimeWindow;
import org.apache.flink.util.Collector;

public class WindowTest1_TimeWindow {
    public static void main(String[] args) throws Exception {
        // Create the execution environment
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Set parallelism to 1 so the output is easy to read
        env.setParallelism(1);
        // // Read data from a file
        // DataStream<String> dataStream = env.readTextFile("/tmp/Flink_Tutorial/src/main/resources/sensor.txt");
        // Read data from a socket text stream
        DataStream<String> inputStream = env.socketTextStream("localhost", 7777);
        // Convert each line to a SensorReading
        // (valueOf replaces the deprecated new Long(...) / new Double(...) constructors)
        DataStream<SensorReading> dataStream = inputStream.map(line -> {
            String[] fields = line.split(",");
            return new SensorReading(fields[0], Long.valueOf(fields[1]), Double.valueOf(fields[2]));
        });

        // 2. Full window function (WindowFunction or ProcessWindowFunction; the latter exposes more context)
        SingleOutputStreamOperator<Tuple3<String, Long, Integer>> resultStream2 = dataStream.keyBy(SensorReading::getId)
                .window(TumblingProcessingTimeWindows.of(Time.seconds(15)))
                // .process(new ProcessWindowFunction<SensorReading, Object, String, TimeWindow>() {
                // })
                .apply(new WindowFunction<SensorReading, Tuple3<String, Long, Integer>, String, TimeWindow>() {
                    @Override
                    public void apply(String key, TimeWindow window, Iterable<SensorReading> input, Collector<Tuple3<String, Long, Integer>> out) throws Exception {
                        String id = key;
                        // End timestamp of the window that just fired
                        long windowEnd = window.getEnd();
                        // All elements of the window are buffered, so the count is just the collection size
                        int count = IteratorUtils.toList(input.iterator()).size();
                        out.collect(new Tuple3<>(id, windowEnd, count));
                    }
                });

        resultStream2.print("result2");
        env.execute();
    }
}
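Counting can also be done incrementally, so it does not show when a full window function is actually required. A computation such as the median is a clearer case, because it needs every element of the window at once. The plain-Java sketch below is only an illustration: `FullWindowSketch` and `median` are hypothetical names, and the `Iterable<Double>` stands in for the `input` parameter that `apply` receives.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class FullWindowSketch {
    // Computes the median of all values buffered for one window.
    // Unlike a count or a sum, this cannot be folded one element at a time
    // into a constant-size accumulator.
    static double median(Iterable<Double> input) {
        List<Double> values = new ArrayList<>();
        for (Double v : input) {
            values.add(v);
        }
        if (values.isEmpty()) {
            throw new IllegalArgumentException("empty window");
        }
        Collections.sort(values);
        int mid = values.size() / 2;
        return values.size() % 2 == 1
                ? values.get(mid)
                : (values.get(mid - 1) + values.get(mid)) / 2.0;
    }

    public static void main(String[] args) {
        System.out.println(median(Arrays.asList(36.5, 35.8, 37.1)));       // prints 36.5
        System.out.println(median(Arrays.asList(36.0, 35.0, 38.0, 37.0))); // prints 36.5
    }
}
```

The price is memory: the whole window has to be held until it fires, which is why the incremental form is preferable whenever the computation allows it.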
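3. Other optional APIs
Example:
// 3. Other optional APIs
OutputTag<SensorReading> outputTag = new OutputTag<SensorReading>("late") {
};
SingleOutputStreamOperator<SensorReading> sumStream = dataStream.keyBy(SensorReading::getId)
        .timeWindow(Time.seconds(15)) // kept as in the course; marked @Deprecated, prefer .window(...)
        // .trigger() // trigger, usually not needed
        // .evictor() // evictor, usually not needed
        .allowedLateness(Time.minutes(1)) // accept data that is up to 1 minute late (its event time falls inside the window, but it arrives after the window has closed; a 1-minute buffer)
        .sideOutputLateData(outputTag) // data that is more than 1 minute late is collected in this side output
        .sum("temperature"); // sum the temperature field within each window (this applies to the window, not to the side output)
// The side output can be read with sumStream.getSideOutput(outputTag). A separate program can later
// combine it with the earlier window results (think of it as a batch-style repair for data that arrived too late).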
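The interplay between the window end, the allowed lateness, and the watermark can be sketched in plain Java. This is a simplified model rather than Flink's implementation: `LatenessSketch`, `Route`, and `route` are hypothetical names, and the exact boundary handling is an assumption. Allowed lateness only has an effect with event-time windows.

```java
public class LatenessSketch {
    enum Route { ON_TIME, LATE_BUT_ACCEPTED, SIDE_OUTPUT }

    // Decides what happens to an element that belongs to the window ending at
    // windowEnd, given the current watermark (all values in milliseconds).
    static Route route(long windowEnd, long allowedLateness, long watermark) {
        long maxTimestamp = windowEnd - 1; // last timestamp that belongs to the window
        if (watermark < maxTimestamp) {
            return Route.ON_TIME;           // window has not fired yet
        }
        if (watermark < maxTimestamp + allowedLateness) {
            return Route.LATE_BUT_ACCEPTED; // window fired, but its state is still kept
        }
        return Route.SIDE_OUTPUT;           // window state is gone; route to the "late" tag
    }

    public static void main(String[] args) {
        long windowEnd = 15_000L;
        long lateness = 60_000L;
        System.out.println(route(windowEnd, lateness, 10_000L)); // prints ON_TIME
        System.out.println(route(windowEnd, lateness, 20_000L)); // prints LATE_BUT_ACCEPTED
        System.out.println(route(windowEnd, lateness, 80_000L)); // prints SIDE_OUTPUT
    }
}
```

In this model an accepted late element causes the window to emit an updated result, while an element routed to the side output never touches the window result at all.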
4. Window API overview
Overall mind map