Flink countWindow 使用

1. 说明

countWindows 包括滚动窗口类型和滑动窗口类型。以下通过代码和输出来说明 countWindows()逻辑。数据源代码:

    public static class StreamDataSource extends RichParallelSourceFunction<Tuple2<String, String>> {
        private volatile boolean running = true;

        @Override
        public void run(SourceContext<Tuple2<String, String>> ctx) throws InterruptedException {

            Tuple2[] elements = new Tuple2[]{
                Tuple2.of("a", "1"),
                Tuple2.of("a", "2"),
                Tuple2.of("a", "3"),
                Tuple2.of("a", "4"),
                Tuple2.of("a", "5"),
                Tuple2.of("a", "6"),
                Tuple2.of("b", "7"),
                Tuple2.of("b", "8"),
                Tuple2.of("b", "9"),
                Tuple2.of("b", "0")
            };

            int count = 0;
            while (running && count < elements.length) {
                ctx.collect(new Tuple2<>((String) elements[count].f0, (String) elements[count].f1));
                count++;
                Thread.sleep(1000);
            }
        }

        @Override
        public void cancel() {
            running = false;
        }
    }

2. 滚动计数窗口

设置 windowSize=3,即每3个元素进来计算一次,这个好像比较简单,不额外说明了。

2.1 代码

import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.api.java.utils.ParameterTool;
import org.apache.flink.api.common.functions.ReduceFunction;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.source.RichParallelSourceFunction;

public class FlinkCountWindowDemo {

    public static void main(String[] args) throws Exception {
        final ParameterTool params = ParameterTool.fromArgs(args);
        final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.getConfig().setGlobalJobParameters(params);
        env.setParallelism(1);
        final int windowSize = params.getInt("window", 3);

        // read source data
        DataStreamSource<Tuple2<String, String>> inStream = env.addSource(new StreamDataSource());

        // calculate
        DataStream<Tuple2<String, String>> outStream = inStream
                                                           .keyBy(0)
                                                           .countWindow(windowSize)
                                                           .reduce(
                                                               new ReduceFunction<Tuple2<String, String>>() {
                                                                   @Override
                                                                   public Tuple2<String, String> reduce(Tuple2<String, String> value1, Tuple2<String, String> value2) throws Exception {
                                                                       return Tuple2.of(value1.f0, value1.f1 + "" + value2.f1);
                                                                   }
                                                               }
                                                           );
        outStream.print();
        env.execute("WindowWordCount");
    }
}

2.2 输出

(a,123)
(a,456)
(b,789)

2.3 说明

结果显示丢了一条数据Tuple2.of("b", "0"),因为最后一条数据已经无法被触发计算了。输出符合预期。

3. 滑动计数窗口

盗用Flink 原理与实现:Window 机制中的一张图,假设有一个滑动计数窗口,每2个元素计算一次最近4个元素的总和,那么窗口工作示意图如下所示:
这里写图片描述

以下代码是字符串相加示例,和这个图的逻辑几乎一致。以下输出正好可以验证这个图。

3.1 代码

import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.api.java.utils.ParameterTool;
import org.apache.flink.api.common.functions.ReduceFunction;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.source.RichParallelSourceFunction;

public class FlinkCountWindowDemo {

    public static void main(String[] args) throws Exception {
        final ParameterTool params = ParameterTool.fromArgs(args);
        final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.getConfig().setGlobalJobParameters(params);
        env.setParallelism(1);
        final int windowSize = params.getInt("window", 3);
        final int slideSize = params.getInt("slide", 2);

        // read source data
        DataStreamSource<Tuple2<String, String>> inStream = env.addSource(new StreamDataSource());

        // calculate
        DataStream<Tuple2<String, String>> outStream = inStream
                                                           .keyBy(0)
                                                           .countWindow(windowSize, slideSize)
                                                           .reduce(
                                                               new ReduceFunction<Tuple2<String, String>>() {
                                                                   @Override
                                                                   public Tuple2<String, String> reduce(Tuple2<String, String> value1, Tuple2<String, String> value2) throws Exception {
                                                                       return Tuple2.of(value1.f0, value1.f1 + "" + value2.f1);
                                                                   }
                                                               }
                                                           );
        outStream.print();
        env.execute("WindowWordCount");
    }
}

3.2 输出

(a,12)
(a,234)
(a,456)
(b,78)
(b,890)

3.3 说明

其实就是每进来两个元素,就对最近的三个元素计算一遍,结果符合预期的。

  • 8
    点赞
  • 9
    收藏
    觉得还不错? 一键收藏
  • 1
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论 1
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值