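Flink Time: An Event-Time Example in Detail

  • Requirement: compute a WordCount over 5-second tumbling event-time windows (EventTime Tumbling Window).
`Sample data` (each line is: timestamp in milliseconds, word, count)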
1000,a,1
2000,a,1
5000,a,1
9999,a,1
11000,a,2
14000,b,1
14999,b,1
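To see which window each sample record belongs to, the assignment can be worked out by hand. The following is a minimal plain-Java sketch, not Flink code: the class `WindowAssignmentSketch` and its methods are made up for illustration, and it assumes 5-second windows aligned to the epoch with no offset.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, for illustration only (not part of Flink).
class WindowAssignmentSketch {

    /** Start of the tumbling window that contains a non-negative timestamp. */
    static long windowStart(long timestampMs, long windowSizeMs) {
        return timestampMs - (timestampMs % windowSizeMs);
    }

    /** Sums the count field of "timestamp,word,count" records per window and word. */
    static Map<String, Long> countPerWindow(List<String> lines, long windowSizeMs) {
        Map<String, Long> result = new LinkedHashMap<>();
        for (String line : lines) {
            String[] fields = line.split(",");
            long start = windowStart(Long.parseLong(fields[0]), windowSizeMs);
            // Windows are half-open: [start, start + size)
            String key = "[" + start + "," + (start + windowSizeMs) + ") " + fields[1];
            result.merge(key, Long.parseLong(fields[2]), Long::sum);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
                "1000,a,1", "2000,a,1", "5000,a,1", "9999,a,1",
                "11000,a,2", "14000,b,1", "14999,b,1");
        countPerWindow(sample, 5000L).forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

For the sample data this prints `[0,5000) a = 2`, `[5000,10000) a = 2`, `[10000,15000) a = 2` and `[10000,15000) b = 2`. Note that 5000 falls into the second window and 9999 is still inside it, because each window is closed on the left and open on the right.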
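  • Aggregating over event-time windows takes three steps:

    1. Set the time characteristic of the job to event time (TimeCharacteristic.EventTime).
    2. Assign each record a timestamp taken from its event-time field, which must be a long value, and generate watermarks from it.
    3. Apply a 5-second tumbling window to the keyed stream and aggregate the records inside each window.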

package com.itszt.flink.task;

import org.apache.flink.api.java.functions.KeySelector;
import org.apache.flink.streaming.api.TimeCharacteristic;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.source.SourceFunction;
import org.apache.flink.streaming.api.functions.timestamps.BoundedOutOfOrdernessTimestampExtractor;
import org.apache.flink.streaming.api.functions.windowing.ProcessWindowFunction;
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.api.windowing.windows.TimeWindow;
import org.apache.flink.util.Collector;

import java.util.Arrays;
import java.util.List;
import java.util.Random;
import java.util.concurrent.TimeUnit;

/**
 * @DESC Flink Time: event-time example
 * Requirement: WordCount over 5-second tumbling event-time windows (Tumbling EventTime Window).
 * Records are grouped by word, and each window reports how often every word occurred.
 */
public class StreamTumblingEventTimeWindow {
    public static void main(String[] args) throws Exception {
        // 1. Set up the environment
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // TODO: step1. Use event time (EventTime) as the time characteristic
        env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);

        // 2. Source
        //DataStreamSource<String> inputDataStream = env.socketTextStream("localhost", 9999);
        DataStreamSource<String> inputDataStream = env.addSource(new SourceFunction<String>() {
            // Cleared by cancel() so that the emit loop can stop
            private volatile boolean running = true;

            @Override
            public void run(SourceContext<String> sourceContext) throws Exception {
                // Records have the form "timestamp,word".
                // Timestamps advance by 500 ms: 1000, 1500, 2000, 2500, 3000, 3500, 4000, ...
                long startTime = 1000;

                List<String> list = Arrays.asList("a", "b");
                Random random = new Random();
                while (running) {
                    int index = random.nextInt(list.size());
                    sourceContext.collect(startTime + "," + list.get(index));
                    startTime += 500;
                    TimeUnit.MILLISECONDS.sleep(10);
                }
            }

            @Override
            public void cancel() {
                running = false;
            }
        });

        inputDataStream.print();

        // TODO: step2. Set the event-time field; its type must be long
        SingleOutputStreamOperator<String> timeDataStream = inputDataStream
                // Tolerate records that arrive up to 2 seconds out of order
                .assignTimestampsAndWatermarks(new BoundedOutOfOrdernessTimestampExtractor<String>(Time.seconds(2)) {
                    @Override
                    public long extractTimestamp(String line) {
                        return Long.parseLong(line.split(",")[0]);
                    }
                });

        // 3. Transformation
        SingleOutputStreamOperator<String> sumDataStream = timeDataStream
                // Group by word (the second field of each record)
                .keyBy((KeySelector<String, String>) line -> line.split(",")[1])
                // TODO: step3. Set the window to 5 seconds; windows are closed on the left, open on the right
                .window(TumblingEventTimeWindows.of(Time.seconds(5)))
                // Aggregate the records in each window: one record counts as one occurrence
                .process(new ProcessWindowFunction<String, String, String, TimeWindow>() {
                    @Override
                    public void process(String word, Context context, Iterable<String> elements, Collector<String> out) throws Exception {
                        long count = 0;
                        for (String ignored : elements) {
                            count++;
                        }
                        TimeWindow window = context.window();
                        out.collect("[" + window.getStart() + ", " + window.getEnd() + ") " + word + " -> " + count);
                    }
                });

        // 4. Sink: print the per-window counts
        sumDataStream.print();

        // 5. Execute the job
        env.execute();
    }
}
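When a window actually produces output depends on the watermark, not on the arrival of the first record past the window end. The sketch below is a plain-Java simulation under stated assumptions, not Flink code: the class `WatermarkFiringSketch` is hypothetical, the watermark is taken as the largest timestamp seen so far minus the allowed out-of-orderness (the rule used by `BoundedOutOfOrdernessTimestampExtractor`), and it is advanced after every record, whereas Flink emits periodic watermarks at a configurable interval. A window is treated as ready once the watermark reaches its last timestamp, `end - 1`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical simulation, for illustration only (not part of Flink).
class WatermarkFiringSketch {

    /** Returns one log line per fired window or dropped late record. */
    static List<String> simulate(long[] timestamps, long windowSizeMs, long maxOutOfOrdernessMs) {
        List<String> log = new ArrayList<>();
        TreeSet<Long> openWindows = new TreeSet<>();   // start times of windows holding data
        long maxTimestamp = Long.MIN_VALUE;
        long watermark = Long.MIN_VALUE;

        for (long ts : timestamps) {
            long start = ts - (ts % windowSizeMs);
            long lastTimestamp = start + windowSizeMs - 1;
            if (lastTimestamp <= watermark) {
                // The window of this record has already fired
                log.add("event " + ts + " is late and dropped");
            } else {
                openWindows.add(start);
            }

            // Advance the watermark: largest timestamp seen minus the tolerated delay
            maxTimestamp = Math.max(maxTimestamp, ts);
            watermark = maxTimestamp - maxOutOfOrdernessMs;

            // Fire every open window whose last timestamp the watermark has reached
            while (!openWindows.isEmpty() && openWindows.first() + windowSizeMs - 1 <= watermark) {
                long fired = openWindows.pollFirst();
                log.add("window [" + fired + "," + (fired + windowSizeMs) + ") fires after event "
                        + ts + " (watermark " + watermark + ")");
            }
        }
        return log;
    }

    public static void main(String[] args) {
        long[] sample = {1000, 2000, 5000, 9999, 11000, 14000, 14999};
        simulate(sample, 5000L, 2000L).forEach(System.out::println);
    }
}
```

With the sample timestamps, a 5-second window and a 2-second tolerance, the simulation reports that `[0,5000)` fires after event 9999 (watermark 7999) and `[5000,10000)` fires after event 14000 (watermark 12000). The window `[10000,15000)` stays open because no record pushes the watermark to 14999.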


2. Flink Time: how is the start time of an EventTime window determined?

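  • In an event-time window job, how is the start time of the first window determined? It is not the timestamp of the first record. The timestamp is rounded down to a multiple of the window size (5 seconds here). Times below are shown in UTC+8.
First record: 1970-01-01 08:00:01,a,1
	Start of the first window: 1970-01-01 08:00:00

First record: 1970-01-01 08:18:31,a,4
	Start of the first window: 1970-01-01 08:18:30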


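Suppose the first record is 1000,a,3. Its timestamp of 1000 ms rounds down to 0 ms for a 5-second window, so the first window starts at 1970-01-01 08:00:00 (UTC+8).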
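The rounding can be checked with a few lines of arithmetic. The sketch below is a standalone re-implementation that follows the arithmetic of Flink's `TimeWindow.getWindowStartWithOffset`; the class `WindowStartSketch` and the `describe` helper are made up for illustration, and a zero window offset is assumed.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical helper, for illustration only (not part of Flink).
class WindowStartSketch {

    private static final DateTimeFormatter FORMAT = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    /** Rounds a timestamp down to the start of its window, honoring an optional offset. */
    static long windowStartWithOffset(long timestampMs, long offsetMs, long windowSizeMs) {
        long remainder = (timestampMs - offsetMs) % windowSizeMs;
        // Java's % can return a negative value, so shift it back into [0, windowSize)
        return remainder < 0
                ? timestampMs - (remainder + windowSizeMs)
                : timestampMs - remainder;
    }

    /** Formats a record timestamp and its window start in UTC+8. */
    static String describe(long timestampMs, long windowSizeMs) {
        long start = windowStartWithOffset(timestampMs, 0L, windowSizeMs);
        String time = Instant.ofEpochMilli(start).atOffset(ZoneOffset.ofHours(8)).format(FORMAT);
        return timestampMs + " -> window start " + start + " (" + time + ")";
    }

    public static void main(String[] args) {
        System.out.println(describe(1000L, 5000L));     // first record at 08:00:01
        System.out.println(describe(1111000L, 5000L));  // first record at 08:18:31
    }
}
```

Running it prints `1000 -> window start 0 (1970-01-01 08:00:00)` and `1111000 -> window start 1110000 (1970-01-01 08:18:30)`, which matches both examples above.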
