Flink静态Session Windows

1.Session Windows
会话窗口分配器按活动会话对元素进行分组。与翻滚窗口和滑动窗口相比,会话窗口不重叠并且没有固定的开始和结束时间。当会话窗口在一段时间内没有接收到元素时,即当发生不活动的间隙时,会话窗口关闭。会话窗口分配器可以设置静态会话间隙和动态会话间隙(其中会话间隙为session gap)。如下图:
session
2.使用Flink自带的示例:

package org.apache.flink.streaming.examples.windowing;

import org.apache.flink.api.java.tuple.Tuple3;
import org.apache.flink.api.java.utils.ParameterTool;
import org.apache.flink.streaming.api.TimeCharacteristic;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.source.SourceFunction;
import org.apache.flink.streaming.api.watermark.Watermark;
import org.apache.flink.streaming.api.windowing.assigners.EventTimeSessionWindows;
import org.apache.flink.streaming.api.windowing.time.Time;

import java.util.ArrayList;
import java.util.List;

/**
 * An example of session windowing that keys events by ID and groups and counts them in
 * session with gaps of 3 milliseconds.
 */
public class SessionWindowing {

	@SuppressWarnings("serial")
	public static void main(String[] args) throws Exception {

		final ParameterTool params = ParameterTool.fromArgs(args);
		final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

		env.getConfig().setGlobalJobParameters(params);
		env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);
		env.setParallelism(1);

		final boolean fileOutput = params.has("output");

		final List<Tuple3<String, Long, Integer>> input = new ArrayList<>();

		input.add(new Tuple3<>("c", 1L, 1));
		input.add(new Tuple3<>("b", 1L, 1));
		input.add(new Tuple3<>("b", 2L, 1));
		input.add(new Tuple3<>("b", 5L, 1));
		input.add(new Tuple3<>("c", 6L, 1));
		// We expect to detect the session "a" earlier than this point (the old
		// functionality can only detect here when the next starts)
		input.add(new Tuple3<>("a", 10L, 1));
		// We expect to detect session "b" and "c" at this point as well
		input.add(new Tuple3<>("c", 11L, 1));

		DataStream<Tuple3<String, Long, Integer>> source = env
				.addSource(new SourceFunction<Tuple3<String, Long, Integer>>() {
					private static final long serialVersionUID = 1L;

					@Override
					public void run(SourceContext<Tuple3<String, Long, Integer>> ctx) throws Exception {
						for (Tuple3<String, Long, Integer> value : input) {

							if (value.f1==2) {
								ctx.collectWithTimestamp(value, value.f1 + 2);
								ctx.emitWatermark(new Watermark(value.f1 + 2));
							}
							else
							{
								ctx.collectWithTimestamp(value, value.f1);
								ctx.emitWatermark(new Watermark(value.f1 ));
							}

						}
						ctx.emitWatermark(new Watermark(Long.MAX_VALUE));
					}
					@Override
					public void cancel() {
					}
				});
		DataStream<Tuple3<String, Long, Integer>> aggregated = source
				.keyBy(0)
				.window(EventTimeSessionWindows.withGap(Time.milliseconds(2L)))
				.sum(2);

		if (fileOutput) {
			aggregated.writeAsText(params.get("output"));
		} else {
			System.out.println("Printing result to stdout. Use --output to specify output path.");
			aggregated.print();
		}

		env.execute();
	}
}

原始数据:
input
自定义source并且定义了eventtime和Watermark
source
窗口函数时间指定为eventtime,会话间隔sessionggap=2秒
window
3 结果:
result1
result
这里解释一下key=b,当(“b”, 1L, 1)到达创建一个窗口(“b”, 1L, 1),eventtime(1),watermark(1),当(“b”, 2L, 1)到达创建一个窗口(“b”, 2L, 1),eventtime(4),watermark(4),由于watermark(4)>eventtime(1)+sessiongap(2)于是触发计算(b,1,1)并关闭窗口(“b”, 1L, 1)。当(“b”, 5L, 1)到达创建一个窗口(“b”, 2L, 1),eventtime(5),watermark(5),由于watermark(5)<eventtime(4)+sessiongap(2)于是触发两个窗口合并操作(“b”,2,1+1)

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值