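Window overview:
A window is a way of slicing an unbounded data stream into bounded chunks: stream records are distributed into buckets of finite size, and each bucket is analyzed on its own.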
Types:
Time window: buckets records by time span, for example 8:00-9:00, 9:00-10:00
Count window: buckets records by the number of elements
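The two bucketing rules above can be sketched in plain Java without any Flink dependency. The class and method names (`WindowBucketSketch`, `tumblingWindowStart`, `countWindows`) are made up for illustration and are not Flink APIs; dropping the trailing incomplete bucket is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class for illustration only; not part of Flink.
class WindowBucketSketch {
    // Start of the fixed-size time bucket [start, start + sizeMs) containing the timestamp.
    static long tumblingWindowStart(long timestampMs, long sizeMs) {
        return timestampMs - Math.floorMod(timestampMs, sizeMs);
    }

    // Splits elements into consecutive buckets of countSize elements each.
    // Assumption of this sketch: a trailing bucket that never fills up is dropped.
    static <T> List<List<T>> countWindows(List<T> elements, int countSize) {
        List<List<T>> buckets = new ArrayList<>();
        for (int i = 0; i + countSize <= elements.size(); i += countSize) {
            buckets.add(new ArrayList<>(elements.subList(i, i + countSize)));
        }
        return buckets;
    }

    public static void main(String[] args) {
        long hour = 60 * 60 * 1000L;
        long halfPastEight = 8 * hour + 30 * 60 * 1000L; // 08:30 as millis since midnight
        // 08:30 falls into the 8:00-9:00 bucket
        System.out.println(tumblingWindowStart(halfPastEight, hour) / hour); // prints 8
        // five elements, buckets of two: the fifth element stays in an unfinished bucket
        System.out.println(countWindows(Arrays.asList(1, 2, 3, 4, 5), 2)); // prints [[1, 2], [3, 4]]
    }
}
```

A time bucket is determined by the record's timestamp alone, while a count bucket depends only on arrival order.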
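Usage:
In Flink, .window() defines a window, and an aggregation or other processing operation is then applied to that window. The window() method can only be called after keyBy (non-keyed streams use windowAll() instead).
Flink also provides the shortcuts .timeWindow and .countWindow for defining time windows and count windows. (.timeWindow is deprecated in Flink 1.12 and later in favor of .window() with an explicit window assigner.)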
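To make the effect of keyBy followed by a time window concrete, here is a minimal plain-Java simulation: records are grouped first by key, then by the 15-second bucket their timestamp falls into. This is only a sketch of the semantics, not how Flink implements it; `KeyedWindowSketch` and `countPerKeyAndWindow` are hypothetical names, and the sample keys and timestamps are invented.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical simulation of "keyBy + tumbling time window + count"; not a Flink API.
class KeyedWindowSketch {
    // keys[i] and timestamps[i] describe the i-th record.
    // Result: key -> (window start -> number of records in that window).
    static Map<String, Map<Long, Integer>> countPerKeyAndWindow(
            String[] keys, long[] timestamps, long sizeMs) {
        Map<String, Map<Long, Integer>> counts = new TreeMap<>();
        for (int i = 0; i < keys.length; i++) {
            long windowStart = timestamps[i] - Math.floorMod(timestamps[i], sizeMs);
            counts.computeIfAbsent(keys[i], k -> new TreeMap<>())
                  .merge(windowStart, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] keys = {"sensor_1", "sensor_2", "sensor_1", "sensor_1"};
        long[] timestamps = {1_000L, 2_000L, 14_000L, 16_000L};
        System.out.println(countPerKeyAndWindow(keys, timestamps, 15_000L));
        // prints {sensor_1={0=2, 15000=1}, sensor_2={0=1}}
    }
}
```

Each key gets its own independent set of windows, which is why the window is defined on the keyed stream.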
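Computation:
Incremental aggregation function: updates the result as each record arrives (e.g. ReduceFunction, AggregateFunction)
Full-window function: first collects all records of the window, then iterates over them when the window is evaluated (e.g. WindowFunction, ProcessWindowFunction)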
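The difference between the two computation styles is mainly what state is kept per window. The following plain-Java sketch computes an average both ways; the class names are hypothetical and the sample values are invented. The incremental variant keeps only a running sum and count, while the full-window variant buffers every element until the window is evaluated.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the two aggregation styles; not Flink classes.
class AggregationStyleSketch {
    // Incremental style: constant-size state, updated once per record.
    static final class IncrementalAverage {
        private double sum;
        private long count;

        void add(double value) { // called for every arriving record
            sum += value;
            count++;
        }

        double result() { // called when the window fires
            return count == 0 ? 0.0 : sum / count;
        }
    }

    // Full-window style: state grows with the window, work happens at the end.
    static final class BufferedAverage {
        private final List<Double> buffer = new ArrayList<>();

        void add(double value) { // only stores the record
            buffer.add(value);
        }

        double result() { // iterates over all buffered records when the window fires
            double sum = 0.0;
            for (double v : buffer) {
                sum += v;
            }
            return buffer.isEmpty() ? 0.0 : sum / buffer.size();
        }

        int bufferedElements() {
            return buffer.size();
        }
    }

    public static void main(String[] args) {
        IncrementalAverage incremental = new IncrementalAverage();
        BufferedAverage buffered = new BufferedAverage();
        for (double temperature : new double[] {35.0, 36.0, 37.0}) {
            incremental.add(temperature);
            buffered.add(temperature);
        }
        System.out.println(incremental.result());        // prints 36.0
        System.out.println(buffered.result());           // prints 36.0
        System.out.println(buffered.bufferedElements()); // prints 3
    }
}
```

Both give the same answer here; the buffered style costs more memory but can compute results that need all elements at once, such as a median.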
Demo
Incremental aggregation mode (the accumulator is updated for every record; the result is emitted when the window ends)
package window;
import beans.SensorReading;
import org.apache.flink.api.common.functions.AggregateFunction;
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.common.functions.ReduceFunction;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.assigners.EventTimeSessionWindows;
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
import org.apache.flink.streaming.api.windowing.assigners.TumblingProcessingTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;
public class WindowTest1 {
public static void main(String[] args) throws Exception{
//Get the current execution environment
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
//Set the parallelism
env.setParallelism(1);
//Read data from a file. NOTE: the demo uses a text file to simulate a stream, which is not a good fit: the file is read so quickly that bucketing is pointless, and the job can finish before a 15-second window closes
DataStream<String> inputStream = env.readTextFile("D:\\idle\\FlinkTest\\src\\main\\resources");
//Convert each line into a SensorReading
DataStream<SensorReading> dataStream = inputStream.map(new MapFunction<String, SensorReading>() {
public SensorReading map(String line) throws Exception {
String[] fields = line.split(",");
return new SensorReading(fields[0], Long.valueOf(fields[1]), Double.valueOf(fields[2]));
}
});
//Window test: incremental aggregation over 15-second buckets
DataStream<Integer> resultStream = dataStream.keyBy("id").timeWindow(Time.seconds(15)).aggregate(new AggregateFunction<SensorReading, Integer, Integer>() {
public Integer createAccumulator() {
return 0; //initial value
}
public Integer add(SensorReading sensorReading, Integer accumulator) {
return accumulator + 1; //add 1 for each incoming record
}
public Integer getResult(Integer accumulator) {
return accumulator; //the result that is emitted
}
public Integer merge(Integer a, Integer b) {
return a + b; //combine two partial counts; used when windows are merged (e.g. session windows)
}
});
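// .timeWindow(Time.seconds(15)); //tumbling time window: splits the data by a fixed window length, with no overlap
// .timeWindow(Time.seconds(15), Time.seconds(5)) //sliding window: defined by a fixed window length and a slide interval; windows can overlap
// .window(EventTimeSessionWindows.withGap(Time.minutes(10))); //session window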
resultStream.print();
env.execute();
}
}
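The commented-out alternatives in the demo mention that sliding windows can overlap. As a rough plain-Java sketch of what that means, the helper below lists every window a single timestamp belongs to, given a window size and a slide interval. `SlidingWindowSketch` and `windowStarts` are hypothetical names; the arithmetic mirrors the usual size/slide definition and is not copied from Flink's source.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of sliding-window assignment; not a Flink class.
class SlidingWindowSketch {
    // Returns the start of every window [start, start + sizeMs) that contains
    // timestampMs, newest window first. Windows start at multiples of slideMs.
    static List<Long> windowStarts(long timestampMs, long sizeMs, long slideMs) {
        List<Long> starts = new ArrayList<>();
        long lastStart = timestampMs - Math.floorMod(timestampMs, slideMs);
        for (long start = lastStart; start > timestampMs - sizeMs; start -= slideMs) {
            starts.add(start);
        }
        return starts;
    }

    public static void main(String[] args) {
        // size 15 s, slide 5 s: a record at 17 s belongs to three overlapping windows
        System.out.println(windowStarts(17_000L, 15_000L, 5_000L)); // prints [15000, 10000, 5000]
        // slide == size degenerates to a tumbling window: exactly one bucket
        System.out.println(windowStarts(17_000L, 15_000L, 15_000L)); // prints [15000]
    }
}
```

With a 15-second size and a 5-second slide, each record is counted in size / slide = 3 windows, so a result is produced every 5 seconds over the last 15 seconds of data.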
Full-window mode (all records of the window are collected first; the result is computed and emitted when the window ends)
package window;
import beans.SensorReading;
import org.apache.commons.collections.IteratorUtils;
import org.apache.flink.api.common.functions.AggregateFunction;
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.common.functions.ReduceFunction;
import org.apache.flink.api.java.tuple.Tuple;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.windowing.WindowFunction;
import org.apache.flink.streaming.api.windowing.assigners.EventTimeSessionWindows;
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
import org.apache.flink.streaming.api.windowing.assigners.TumblingProcessingTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.api.windowing.windows.TimeWindow;
import org.apache.flink.util.Collector;
public class WindowTest1 {
public static void main(String[] args) throws Exception{
//Get the current execution environment
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
//Set the parallelism
env.setParallelism(1);
//Read data from a file. NOTE: the demo uses a text file to simulate a stream, which is not a good fit: the file is read so quickly that bucketing is pointless, and the job can finish before a 15-second window closes
DataStream<String> inputStream = env.readTextFile("D:\\idle\\FlinkTest\\src\\main\\resources");
//Convert each line into a SensorReading
DataStream<SensorReading> dataStream = inputStream.map(new MapFunction<String, SensorReading>() {
public SensorReading map(String line) throws Exception {
String[] fields = line.split(",");
return new SensorReading(fields[0], Long.valueOf(fields[1]), Double.valueOf(fields[2]));
}
});
//Full-window function: receives all records of the window at once
dataStream.keyBy("id").timeWindow(Time.seconds(15)).apply(new WindowFunction<SensorReading, Integer, Tuple, TimeWindow>() {
public void apply(Tuple tuple, TimeWindow window, Iterable<SensorReading> input, Collector<Integer> out) throws Exception {
String id = tuple.getField(0); //the key can be read from the tuple
Long end = window.getEnd(); //end timestamp of the window
Integer count = IteratorUtils.toList(input.iterator()).size(); //count the records buffered in the window
out.collect(count);
}
}).print();
env.execute();
}
}