- Requirement: using an event-time (EventTime) tumbling window of 5 seconds, compute an aggregate WordCount.
`Prepare the data`
1000,a,1
2000,a,1
5000,a,1
9999,a,1
11000,a,2
14000,b,1
14999,b,1
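To see what the 5-second tumbling windows should produce for this sample, the records can be bucketed by hand. The following is a minimal plain-Java sketch (no Flink involved); the class and method names are hypothetical, and it assumes the third column is a count to be summed and that windows are aligned to multiples of 5000 ms.

```java
import java.util.Map;
import java.util.TreeMap;

/** Plain-Java illustration (no Flink): buckets "timestamp,word,count" lines into 5-second tumbling windows. */
class SampleWindowCount {
    static final long WINDOW_SIZE_MS = 5000L;

    /** Returns windowStart -> (word -> summed count). Assumes the third field is a count. */
    static Map<Long, Map<String, Integer>> countByWindow(String[] lines) {
        Map<Long, Map<String, Integer>> result = new TreeMap<>();
        for (String line : lines) {
            String[] fields = line.split(",");
            long timestamp = Long.parseLong(fields[0]);
            String word = fields[1];
            int count = Integer.parseInt(fields[2]);
            // Round the timestamp down to a multiple of the window size (offset 0).
            long windowStart = timestamp - (timestamp % WINDOW_SIZE_MS);
            result.computeIfAbsent(windowStart, k -> new TreeMap<>())
                  .merge(word, count, Integer::sum);
        }
        return result;
    }

    public static void main(String[] args) {
        String[] sample = {
            "1000,a,1", "2000,a,1", "5000,a,1", "9999,a,1",
            "11000,a,2", "14000,b,1", "14999,b,1"
        };
        // Prints one line per window, e.g. "[0, 5000) -> {a=2}"
        countByWindow(sample).forEach((start, counts) ->
            System.out.println("[" + start + ", " + (start + WINDOW_SIZE_MS) + ") -> " + counts));
    }
}
```

Under these assumptions the sample falls into three windows: [0, 5000) with a=2, [5000, 10000) with a=2, and [10000, 15000) with a=2 and b=2. Note that 5000 belongs to the second window because windows are left-closed and right-open.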
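- Window statistics based on event time (EventTime) require three steps, marked `step1` to `step3` in the code: set the time characteristic to EventTime, specify the event-time field, and define the window: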
package com.itszt.flink.task;
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.java.functions.KeySelector;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.TimeCharacteristic;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.source.SourceFunction;
import org.apache.flink.streaming.api.functions.timestamps.BoundedOutOfOrdernessTimestampExtractor;
import org.apache.flink.streaming.api.functions.windowing.ProcessWindowFunction;
import org.apache.flink.streaming.api.windowing.assigners.TumblingProcessingTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.api.windowing.windows.TimeWindow;
import org.apache.flink.util.Collector;
import java.util.Arrays;
import java.util.List;
import java.util.Random;
import java.util.concurrent.TimeUnit;
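/**
 * @DESC [Mastery] Flink Time: event-time example [hands-on coding]
 * Requirement: using an event-time (EventTime) tumbling window of 5 seconds, compute an aggregate WordCount.
 * Window demo: tumbling event-time window (Tumbling EventTime Window) that counts word frequencies within each window.
 */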
public class StreamTumblingEventTimeWindow {
    public static void main(String[] args) throws Exception {
        // 1 - Set up the execution environment
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // TODO: step1. Set the time characteristic to event time (EventTime)
        env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);
        // 2 - Data source
        // DataStreamSource<String> inputDataStream = env.socketTextStream("localhost", 9999);
        DataStreamSource<String> inputDataStream = env.addSource(new SourceFunction<String>() {
            // Set to false by cancel() so the emit loop can stop
            private volatile boolean running = true;

            @Override
            public void run(SourceContext<String> sourceContext) throws Exception {
                // Emits records of the form "timestamp,word" with timestamps 1000, 1500, 2000, 2500, ...
                long startTime = 1000;
                List<String> list = Arrays.asList("a", "b");
                Random random = new Random();
                while (running) {
                    int index = random.nextInt(2);
                    sourceContext.collect(startTime + "," + list.get(index));
                    startTime += 500;
                    TimeUnit.MILLISECONDS.sleep(10);
                }
            }

            @Override
            public void cancel() {
                running = false;
            }
        });
        inputDataStream.print();
        // TODO: step2. Specify the event-time field; its value must be a long (epoch milliseconds)
        SingleOutputStreamOperator<String> timeDataStream = inputDataStream
                // Watermarks trail the largest timestamp seen by 2 seconds, so records up to 2 seconds out of order still reach their window
                .assignTimestampsAndWatermarks(new BoundedOutOfOrdernessTimestampExtractor<String>(Time.seconds(2)) {
                    @Override
                    public long extractTimestamp(String line) {
                        return Long.parseLong(line.split(",")[0]);
                    }
                });
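        // 3 - Transformation
        SingleOutputStreamOperator<String> sumDataStream = timeDataStream
                // Group by the word, i.e. the second comma-separated field
                .keyBy((KeySelector<String, String>) s -> s.split(",")[1])
                // TODO: step3. Define the window: 5 seconds, left-closed and right-open [start, end)
                // Processing-time alternative (not used here): .window(TumblingProcessingTimeWindows.of(Time.seconds(5)))
                .timeWindow(Time.seconds(5))
                // Aggregate the records inside each window
                .process(new ProcessWindowFunction<String, String, String, TimeWindow>() {
                    @Override
                    public void process(String word, Context context, Iterable<String> elements, Collector<String> out) throws Exception {
                        // Count the records for this word in the current window
                        long count = 0;
                        for (String ignored : elements) {
                            count++;
                        }
                        TimeWindow window = context.window();
                        out.collect("[" + window.getStart() + ", " + window.getEnd() + ") " + word + " = " + count);
                    }
                });
        // 4 - Sink: print the per-window word counts
        sumDataStream.print();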
        // 5 - Execute the job; execute() blocks until the job finishes or is cancelled
        env.execute();
    }
}
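The 2-second bound passed to `BoundedOutOfOrdernessTimestampExtractor` also decides when a window's result appears: the watermark is the largest timestamp seen so far minus 2000 ms, and an event-time window fires once the watermark reaches the window's last timestamp (end minus 1). The sketch below is an idealized per-record model in plain Java, not Flink code; the class and method names are hypothetical. Real Flink emits watermarks periodically rather than after every record, so actual firing can lag slightly behind what this model shows.

```java
/** Idealized model (no Flink) of when 5-second event-time windows fire with a 2-second out-of-orderness bound. */
class WatermarkFiringSketch {
    static final long WINDOW_SIZE_MS = 5000L;
    static final long MAX_OUT_OF_ORDERNESS_MS = 2000L;

    /** Watermark after a record: largest timestamp seen so far minus the allowed out-of-orderness. */
    static long watermark(long maxTimestampSeen) {
        return maxTimestampSeen - MAX_OUT_OF_ORDERNESS_MS;
    }

    /** A window [start, start + size) may fire once the watermark reaches its last timestamp, end - 1. */
    static boolean canFire(long windowStart, long watermark) {
        return watermark >= windowStart + WINDOW_SIZE_MS - 1;
    }

    public static void main(String[] args) {
        // Timestamps taken from the sample data
        long[] timestamps = {1000, 2000, 5000, 9999, 11000, 14000, 14999};
        long maxSeen = Long.MIN_VALUE;
        long nextWindowStart = 0;
        for (long ts : timestamps) {
            maxSeen = Math.max(maxSeen, ts);
            long wm = watermark(maxSeen);
            System.out.println("record " + ts + " -> watermark " + wm);
            // Fire every window whose end has been passed by the watermark
            while (canFire(nextWindowStart, wm)) {
                System.out.println("  fires window [" + nextWindowStart + ", "
                        + (nextWindowStart + WINDOW_SIZE_MS) + ")");
                nextWindowStart += WINDOW_SIZE_MS;
            }
        }
    }
}
```

In this model the window [0, 5000) fires when record 9999 arrives (watermark 7999), and [5000, 10000) fires at record 14000 (watermark 12000). The window [10000, 15000) does not fire within the sample, because it would need a record with a timestamp of at least 16999 or the end of the stream.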
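2. How is the start time of an EventTime window determined in Flink Time?
- In event-time window analysis, how is the start time of the first window determined?
First record: 1970-01-01 08:00:01,a,1
Start time of the first window: 1970-01-01 08:00:00
First record: 1970-01-01 08:18:31,a,4
Start time of the first window: 1970-01-01 08:18:30
Suppose the first record is 1000,a,3; the computed start time of the first window is then 1970-01-01 08:00:00 (times shown in UTC+8). In each case the start is the record's timestamp rounded down to a multiple of the 5-second window size, not the timestamp of the first record itself.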
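The alignment in these examples can be reproduced with a few lines of arithmetic. The sketch below is plain Java with hypothetical class and method names; the formula is modelled on what Flink's `TimeWindow.getWindowStartWithOffset` is understood to compute, which should be treated as an assumption rather than a quotation of the library source. The offset is taken to be 0 and timestamps are formatted in UTC+8 to match the examples.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

/** Plain-Java sketch of tumbling-window start alignment (assumed to mirror Flink's behaviour). */
class WindowStartSketch {
    /** Rounds a non-negative timestamp down to the nearest window boundary, shifted by offset. */
    static long windowStart(long timestamp, long offset, long windowSize) {
        return timestamp - (timestamp - offset + windowSize) % windowSize;
    }

    /** Formats epoch milliseconds as a UTC+8 wall-clock time, as used in the examples. */
    static String formatUtcPlus8(long epochMillis) {
        return DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss")
                .format(Instant.ofEpochMilli(epochMillis).atOffset(ZoneOffset.ofHours(8)));
    }

    public static void main(String[] args) {
        long windowSize = 5000L; // 5-second tumbling window
        // 1000 ms is 1970-01-01 08:00:01 in UTC+8; 1111000 ms is 1970-01-01 08:18:31
        for (long ts : new long[] {1000L, 1111000L}) {
            long start = windowStart(ts, 0L, windowSize);
            System.out.println(formatUtcPlus8(ts) + " -> window starts at "
                    + formatUtcPlus8(start) + " (" + start + " ms)");
        }
    }
}
```

For the first record 1000 this gives 1000 - (6000 % 5000) = 0, i.e. 1970-01-01 08:00:00, and for 1111000 it gives 1110000, i.e. 1970-01-01 08:18:30, matching the two examples.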