Flink Tumbling Windows Join

Preface

As mentioned earlier, the official example (Windows Join Example) is actually an inner join.
GitHub repository: flink-learn
Now let's look at the Tumbling Window Join.

Translation of the official documentation

Tumbling Window Join

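When performing a tumbling window join, all elements with a common key and a common tumbling window are joined as pairwise combinations and passed on to a JoinFunction or FlatJoinFunction. Because this behaves like an inner join, elements of one stream that do not have elements from the other stream in their tumbling window are not emitted!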
[Figure: tumbling window join of an orange and a green stream]
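As illustrated in the figure, we define a tumbling window with a size of 2 milliseconds, which results in windows of the form [0,1], [2,3], …. The image shows the pairwise combinations of all elements in each window that will be passed on to the JoinFunction. Note that nothing is emitted in the tumbling window [6,7], because no elements exist in the green stream to be joined with the orange elements ⑥ and ⑦.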
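A small plain-Java simulation (no Flink dependency) makes the pairing rule concrete. It assigns each element to the window starting at `ts - ts % size`, which is what Flink's tumbling assigner computes for a zero offset and non-negative timestamps, and pairs every orange element with every green element of the same window. The class name `TumblingJoinSketch` and the element timestamps are hypothetical (they are not read from the figure), and keys are ignored, i.e. all elements are assumed to share one key.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class TumblingJoinSketch {
    // Start of the tumbling window containing ts (zero offset, non-negative timestamps).
    static long windowStart(long ts, long size) {
        return ts - ts % size;
    }

    // Groups element timestamps by the start of their window, in window order.
    static Map<Long, List<Long>> byWindow(long[] timestamps, long size) {
        Map<Long, List<Long>> windows = new TreeMap<>();
        for (long ts : timestamps) {
            windows.computeIfAbsent(windowStart(ts, size), k -> new ArrayList<>()).add(ts);
        }
        return windows;
    }

    // Inner-join semantics: every orange element is paired with every green
    // element of the same window; windows without green elements emit nothing.
    static List<String> join(long[] orange, long[] green, long size) {
        Map<Long, List<Long>> greenWindows = byWindow(green, size);
        List<String> out = new ArrayList<>();
        for (Map.Entry<Long, List<Long>> w : byWindow(orange, size).entrySet()) {
            List<Long> matches = greenWindows.get(w.getKey());
            if (matches == null) {
                continue;
            }
            for (long left : w.getValue()) {
                for (long right : matches) {
                    out.add(left + "," + right); // "first,second", as in the official JoinFunction
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical elements, identified by their timestamps in ms.
        long[] orange = {0, 1, 2, 3, 4, 5, 6, 7};
        long[] green = {0, 1, 3, 4, 5};
        join(orange, green, 2).forEach(System.out::println);
    }
}
```

With these inputs the window starting at 6 yields no pairs, just as [6,7] emits nothing in the figure.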

```java
import org.apache.flink.api.common.functions.JoinFunction;
import org.apache.flink.api.java.functions.KeySelector;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;
...
DataStream<Integer> orangeStream = ...
DataStream<Integer> greenStream = ...

orangeStream.join(greenStream)
    .where(<KeySelector>)
    .equalTo(<KeySelector>)
    // 2 ms tumbling windows, matching the figure
    .window(TumblingEventTimeWindows.of(Time.milliseconds(2)))
    .apply(new JoinFunction<Integer, Integer, String>() {
        @Override
        public String join(Integer first, Integer second) {
            return first + "," + second;
        }
    });
```

The key element in the example above is the JoinFunction, so that is what we will write.
As usual, we go step by step:
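  • 1. Set up the Flink environment
```java
/*
 * Create the Flink execution environment. Wrapped in a method so the setup
 * does not have to be rewritten every time.
 * Note: getExecutionEnvironment() returns a new environment on every call,
 * so call getEnv() once per job and reuse the result.
 */
public static StreamExecutionEnvironment getEnv() {
    final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    // Window on the timestamps carried by the records (event time)
    env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);
    env.setParallelism(1);
    return env;
}
```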
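  • 2. Set up the data sources and watermarks
```java
// Create the environment once and add both sources to it
final StreamExecutionEnvironment env = getEnv();
DataStream<Tuple3<String, String, Long>> leftSource =
        env.addSource(new StreamDataSource()).name("Demo Source");
DataStream<Tuple3<String, String, Long>> rightSource =
        env.addSource(new StreamDataSource1()).name("Demo Source");

// Use field f2 as the event timestamp and emit watermarks that tolerate
// 5100 ms of out-of-orderness
static DataStream<Tuple3<String, String, Long>>
getDataStream(DataStream<Tuple3<String, String, Long>> source) {
    long delay = 5100L;
    return source.assignTimestampsAndWatermarks(
            new BoundedOutOfOrdernessTimestampExtractor<Tuple3<String, String, Long>>(Time.milliseconds(delay)) {
                private static final long serialVersionUID = 518406720598977074L;

                @Override
                public long extractTimestamp(Tuple3<String, String, Long> element) {
                    return element.f2;
                }
            }
    );
}
```
  • 3. The join
```java
int windowSize = 10; // window size in seconds
// Join elements with equal keys (f0) that fall into the same tumbling window
JoinUtil.getDataStream(leftSource).join(JoinUtil.getDataStream(rightSource))
        .where(new LeftSelectKey())
        .equalTo(new RightSelectKey())
        .window(TumblingEventTimeWindows.of(Time.seconds(windowSize)))
        // Anonymous class instead of a lambda: with javac, Flink cannot see the
        // Tuple5 type parameters of a lambda and fails to extract the result type
        .apply(new JoinFunction<Tuple3<String, String, Long>, Tuple3<String, String, Long>,
                Tuple5<String, String, String, Long, Long>>() {
            @Override
            public Tuple5<String, String, String, Long, Long> join(
                    Tuple3<String, String, Long> first, Tuple3<String, String, Long> second) {
                // (key, left value, right value, left timestamp, right timestamp)
                return new Tuple5<>(first.f0, first.f1, second.f1, first.f2, second.f2);
            }
        })
        .print();
```
  • 4. Run the program
```java
// Execute the same environment the sources were added to
env.execute("TimeWindowDemo");
```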
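The 5100 ms bound decides which elements still reach their window. As I understand `BoundedOutOfOrdernessTimestampExtractor`, its watermark is the largest timestamp seen so far minus the bound, and a window `[start, end)` with no allowed lateness counts as late once the watermark reaches `end - 1`. The plain-Java sketch below (hypothetical class `WatermarkSketch`, not Flink code) replays the left stream's timestamps from `StreamDataSource` in emission order under that model. It is a simplification: it looks at one input only (the join operator uses the minimum watermark of both inputs) and assumes each periodic watermark has been emitted before the next element arrives, which the one-second sleep in the sources makes plausible.

```java
class WatermarkSketch {
    // Flags elements whose tumbling window is already late when they arrive.
    // Model: watermark = (max timestamp seen so far) - delay; a window [start, end)
    // is late once the watermark reaches end - 1 (no allowed lateness).
    static boolean[] lateFlags(long[] timestamps, long delay, long windowSize) {
        boolean[] late = new boolean[timestamps.length];
        long maxTs = Long.MIN_VALUE + delay; // initial value avoids underflow
        for (int i = 0; i < timestamps.length; i++) {
            long watermark = maxTs - delay; // produced by the elements seen so far
            long windowEnd = timestamps[i] - timestamps[i] % windowSize + windowSize;
            late[i] = windowEnd - 1 <= watermark;
            maxTs = Math.max(maxTs, timestamps[i]);
        }
        return late;
    }

    // Event timestamps of StreamDataSource, in emission order.
    static final long[] LEFT = {1000000050000L, 1000000054000L, 1000000079900L,
            1000000115000L, 1000000100000L, 1000000108000L};

    public static void main(String[] args) {
        for (long delay : new long[]{5100L, 5000L}) {
            boolean[] late = lateFlags(LEFT, delay, 10_000L);
            for (int i = 0; i < LEFT.length; i++) {
                System.out.println("delay=" + delay + " ts=" + LEFT[i] + " late=" + late[i]);
            }
        }
    }
}
```

Under this model, with 5100 ms the watermark after `1000000115000` is `1000000109900`, still below `1000000109999`, the last timestamp of window `[1000000100000, 1000000110000)`, so both `b` elements are accepted; with 5000 ms the watermark would be `1000000110000` and both would count as late.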

Data sources

StreamDataSource
```java
package com.king.learn.Flink.streaming.join.source;

import org.apache.flink.api.java.tuple.Tuple3;
import org.apache.flink.streaming.api.functions.source.RichParallelSourceFunction;
import static com.king.learn.Flink.streaming.join.JoinUtil.test;

/**
 * @Author: king
 * @Date: 2019-01-14
 * @Desc: TODO Data source
 */

public class StreamDataSource extends RichParallelSourceFunction<Tuple3<String, String, Long>> {
    private volatile boolean running = true;

    @Override
    public void run(SourceContext<Tuple3<String, String, Long>> ctx) throws Exception {
        // (key, value, event timestamp in ms)
        Tuple3[] elements = new Tuple3[]{
                Tuple3.of("a", "1", 1000000050000L),
                Tuple3.of("a", "2", 1000000054000L),
                Tuple3.of("a", "3", 1000000079900L),
                Tuple3.of("a", "4", 1000000115000L),
                Tuple3.of("b", "5", 1000000100000L),
                Tuple3.of("b", "6", 1000000108000L)
        };
        test(ctx, elements, running);
    }

    @Override
    public void cancel() {
        running = false;
    }
}
```
StreamDataSource1
```java
package com.king.learn.Flink.streaming.join.source;

import org.apache.flink.api.java.tuple.Tuple3;
import org.apache.flink.streaming.api.functions.source.RichParallelSourceFunction;
import static com.king.learn.Flink.streaming.join.JoinUtil.test;

/**
 * @Author: king
 * @Date: 2019-01-14
 * @Desc: TODO
 */

public class StreamDataSource1 extends RichParallelSourceFunction<Tuple3<String, String, Long>> {
    private static final long serialVersionUID = -8338462943401114121L;
    private volatile boolean running = true;

    @Override
    public void run(SourceContext<Tuple3<String, String, Long>> ctx) throws Exception {
        Tuple3[] elements = new Tuple3[]{
                Tuple3.of("a", "hangzhou", 1000000059000L),
                Tuple3.of("b", "beijing", 1000000105000L),
        };
        test(ctx, elements, running);
    }

    @Override
    public void cancel() {
        running = false;
    }
}
```
StreamDataSource2
```java
package com.king.learn.Flink.streaming.join.source;

import org.apache.flink.api.java.tuple.Tuple3;
import org.apache.flink.streaming.api.functions.source.RichParallelSourceFunction;
import org.apache.flink.streaming.api.functions.source.SourceFunction;
import static com.king.learn.Flink.streaming.join.JoinUtil.test;

/**
 * @Author: king
 * @Date: 2019-01-14
 * @Desc: TODO
 */

public class StreamDataSource2 extends RichParallelSourceFunction<Tuple3<String, String, Long>> {
    private volatile boolean running = true;

    @Override
    public void run(SourceFunction.SourceContext<Tuple3<String, String, Long>> ctx) throws InterruptedException {
        Tuple3[] elements = new Tuple3[]{
                Tuple3.of("a", "beijing", 1000000058000L),
                Tuple3.of("c", "beijing", 1000000055000L),
                Tuple3.of("d", "beijing", 1000000106000L),
        };
        test(ctx, elements, running);
    }

    @Override
    public void cancel() {
        running = false;
    }
}
```
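Which elements can meet depends on which 10 s window each timestamp lands in. The plain-Java sketch below (hypothetical class `WindowAssignmentSketch`, no Flink dependency) lists the window for every element of the three sources, using `ts - ts % size`, the tumbling-window arithmetic for a zero offset and non-negative timestamps.

```java
class WindowAssignmentSketch {
    // Start of the tumbling window containing ts (zero offset, non-negative timestamps).
    static long windowStart(long ts, long size) {
        return ts - ts % size;
    }

    public static void main(String[] args) {
        long size = 10_000L; // windowSize = 10 s, as in the demos
        // {source, key, value, timestamp}, copied from the three source classes
        String[][] rows = {
                {"StreamDataSource", "a", "1", "1000000050000"},
                {"StreamDataSource", "a", "2", "1000000054000"},
                {"StreamDataSource", "a", "3", "1000000079900"},
                {"StreamDataSource", "a", "4", "1000000115000"},
                {"StreamDataSource", "b", "5", "1000000100000"},
                {"StreamDataSource", "b", "6", "1000000108000"},
                {"StreamDataSource1", "a", "hangzhou", "1000000059000"},
                {"StreamDataSource1", "b", "beijing", "1000000105000"},
                {"StreamDataSource2", "a", "beijing", "1000000058000"},
                {"StreamDataSource2", "c", "beijing", "1000000055000"},
                {"StreamDataSource2", "d", "beijing", "1000000106000"},
        };
        for (String[] r : rows) {
            long start = windowStart(Long.parseLong(r[3]), size);
            System.out.println(r[0] + " (" + r[1] + "," + r[2] + ") -> [" + start + ", " + (start + size) + ")");
        }
    }
}
```

For example, `(a,3)` at `1000000079900` is alone in `[1000000070000, 1000000080000)`, so an inner join emits nothing for it, while a left join emits it with a `null` right side.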

Helper class JoinUtil

```java
package com.king.learn.Flink.streaming.join;

import org.apache.flink.api.java.tuple.Tuple3;
import org.apache.flink.streaming.api.TimeCharacteristic;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.source.SourceFunction;
import org.apache.flink.streaming.api.functions.timestamps.BoundedOutOfOrdernessTimestampExtractor;
import org.apache.flink.streaming.api.windowing.time.Time;

/**
 * @Author: king
 * @Date: 2019-01-14
 * @Desc: TODO
 */

public class JoinUtil {
    /*
     * Create the Flink execution environment (event time, parallelism 1).
     * getExecutionEnvironment() returns a new environment on every call, so call
     * getEnv() once per job and reuse the returned object.
     */
    public static StreamExecutionEnvironment getEnv() {
        final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);
        env.setParallelism(1);
        return env;
    }

    // Use field f2 as the event timestamp and emit watermarks that tolerate 5100 ms of out-of-orderness
    static DataStream<Tuple3<String, String, Long>>
    getDataStream(DataStream<Tuple3<String, String, Long>> source) {
        long delay = 5100L;
        return source.assignTimestampsAndWatermarks(
                new BoundedOutOfOrdernessTimestampExtractor<Tuple3<String, String, Long>>(Time.milliseconds(delay)) {
                    private static final long serialVersionUID = 518406720598977074L;

                    @Override
                    public long extractTimestamp(Tuple3<String, String, Long> element) {
                        return element.f2;
                    }
                }
        );
    }

    /*
     * Emit the elements one per second. Note: `running` is passed by value, so a later
     * cancel() does not stop this loop; it ends once all elements have been emitted.
     */
    public static void test(SourceFunction.SourceContext<Tuple3<String, String, Long>> ctx, Tuple3[] elements, boolean running) throws InterruptedException {
        int count = 0;
        while (running && count < elements.length) {
            ctx.collect(new Tuple3<>((String) elements[count].f0, (String) elements[count].f1, (long) elements[count].f2));
            count++;
            Thread.sleep(1000);
        }
    }
}
```
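A caveat about `test(...)`: Java passes `boolean running` by value, so when Flink calls `cancel()` and sets the source's field to `false`, the loop never sees it; it only stops because the array runs out. One possible fix, sketched here in plain Java, is to pass a `java.util.function.BooleanSupplier` such as `() -> running` that re-reads the volatile field on each iteration. `CancellableEmitSketch`, `emitByValue`, `emitWithSupplier` and the `Consumer` standing in for `SourceContext.collect` are hypothetical names for illustration; a real helper would also keep its `Thread.sleep`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BooleanSupplier;
import java.util.function.Consumer;

class CancellableEmitSketch {
    private volatile boolean running = true;

    // Like JoinUtil.test: the flag is a copy, so later changes are invisible here.
    static <T> void emitByValue(T[] elements, boolean running, Consumer<T> collector) {
        int count = 0;
        while (running && count < elements.length) {
            collector.accept(elements[count++]);
        }
    }

    // Hypothetical alternative: re-reads the flag through a supplier on every iteration.
    static <T> void emitWithSupplier(T[] elements, BooleanSupplier isRunning, Consumer<T> collector) {
        int count = 0;
        while (isRunning.getAsBoolean() && count < elements.length) {
            collector.accept(elements[count++]);
        }
    }

    // Emits elements and clears the flag after the second one; returns what was emitted.
    List<String> run(String[] elements, boolean useSupplier) {
        List<String> out = new ArrayList<>();
        Consumer<String> collector = e -> {
            out.add(e);
            if (out.size() == 2) {
                running = false; // what cancel() does; Flink would call it from another thread
            }
        };
        if (useSupplier) {
            emitWithSupplier(elements, () -> running, collector);
        } else {
            emitByValue(elements, running, collector);
        }
        return out;
    }

    public static void main(String[] args) {
        String[] elements = {"a", "b", "c", "d"};
        System.out.println(new CancellableEmitSketch().run(elements, false));
        System.out.println(new CancellableEmitSketch().run(elements, true));
    }
}
```

The by-value variant emits all four elements; the supplier variant stops after two.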

InnerJoin

```java
package com.king.learn.Flink.streaming.join;

import com.king.learn.Flink.streaming.join.source.StreamDataSource;
import com.king.learn.Flink.streaming.join.source.StreamDataSource1;
import org.apache.flink.api.common.functions.JoinFunction;
import org.apache.flink.api.java.functions.KeySelector;
import org.apache.flink.api.java.tuple.Tuple3;
import org.apache.flink.api.java.tuple.Tuple5;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;
import static com.king.learn.Flink.streaming.join.JoinUtil.getEnv;

/**
 * @Author: king
 * @Date: 2019-01-14
 * @Desc: TODO Inner Join
 */

public class FlinkTumblingWindowsInnerJoinDemo {
    public static void main(String[] args) throws Exception {
        int windowSize = 10; // window size in seconds
        // Create the environment once: sources, join and sink must all live in it
        final StreamExecutionEnvironment env = getEnv();
        // Set up the data sources
        DataStream<Tuple3<String, String, Long>> leftSource =
                env.addSource(new StreamDataSource()).name("Demo Source");
        DataStream<Tuple3<String, String, Long>> rightSource =
                env.addSource(new StreamDataSource1()).name("Demo Source");
        // Join elements with equal keys that fall into the same tumbling window
        JoinUtil.getDataStream(leftSource).join(JoinUtil.getDataStream(rightSource))
                .where(new LeftSelectKey())
                .equalTo(new RightSelectKey())
                .window(TumblingEventTimeWindows.of(Time.seconds(windowSize)))
                // Anonymous class instead of a lambda so Flink can extract the Tuple5 result type
                .apply(new JoinFunction<Tuple3<String, String, Long>, Tuple3<String, String, Long>,
                        Tuple5<String, String, String, Long, Long>>() {
                    @Override
                    public Tuple5<String, String, String, Long, Long> join(
                            Tuple3<String, String, Long> first, Tuple3<String, String, Long> second) {
                        // (key, left value, right value, left timestamp, right timestamp)
                        return new Tuple5<>(first.f0, first.f1, second.f1, first.f2, second.f2);
                    }
                })
                .print();

        env.execute("TimeWindowDemo");
    }

    // Join key of the left stream: field f0
    public static class LeftSelectKey implements KeySelector<Tuple3<String, String, Long>, String> {
        private static final long serialVersionUID = 3962206049185587477L;

        @Override
        public String getKey(Tuple3<String, String, Long> w) {
            return w.f0;
        }
    }

    // Join key of the right stream: field f0
    public static class RightSelectKey implements KeySelector<Tuple3<String, String, Long>, String> {
        private static final long serialVersionUID = -5385125386985167962L;

        @Override
        public String getKey(Tuple3<String, String, Long> w) {
            return w.f0;
        }
    }

}
```
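To predict what this demo prints, the following plain-Java sketch (hypothetical class `InnerJoinSketch`, no Flink dependency) applies the inner-join rule to the demo data: group both sides by key and 10 s window, then emit the cross product of matching groups as `(key, left value, right value, left ts, right ts)`, the field order of the Tuple5 above. It assumes no element is dropped as late and ignores the order in which Flink fires windows.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class InnerJoinSketch {
    // Mirrors Tuple3<String, String, Long>: (key, value, event timestamp).
    static final class Event {
        final String key;
        final String value;
        final long ts;

        Event(String key, String value, long ts) {
            this.key = key;
            this.value = value;
            this.ts = ts;
        }
    }

    // Groups events by "key@windowStart", keeping arrival order.
    static Map<String, List<Event>> group(List<Event> events, long size) {
        Map<String, List<Event>> groups = new LinkedHashMap<>();
        for (Event e : events) {
            long start = e.ts - e.ts % size;
            groups.computeIfAbsent(e.key + "@" + start, k -> new ArrayList<>()).add(e);
        }
        return groups;
    }

    // Cross product of matching (key, window) groups, one string per joined pair.
    static List<String> innerJoin(List<Event> left, List<Event> right, long size) {
        Map<String, List<Event>> rightGroups = group(right, size);
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<Event>> g : group(left, size).entrySet()) {
            List<Event> matches = rightGroups.getOrDefault(g.getKey(), Collections.emptyList());
            for (Event l : g.getValue()) {
                for (Event r : matches) {
                    out.add("(" + l.key + "," + l.value + "," + r.value + "," + l.ts + "," + r.ts + ")");
                }
            }
        }
        return out;
    }

    // Demo data from StreamDataSource (left) and StreamDataSource1 (right).
    static List<String> demo() {
        List<Event> left = Arrays.asList(
                new Event("a", "1", 1000000050000L), new Event("a", "2", 1000000054000L),
                new Event("a", "3", 1000000079900L), new Event("a", "4", 1000000115000L),
                new Event("b", "5", 1000000100000L), new Event("b", "6", 1000000108000L));
        List<Event> right = Arrays.asList(
                new Event("a", "hangzhou", 1000000059000L), new Event("b", "beijing", 1000000105000L));
        return innerJoin(left, right, 10_000L);
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```

It yields four rows: `a/1` and `a/2` with `hangzhou`, and `b/5` and `b/6` with `beijing`; `a/3` and `a/4` have no partner in their windows and are dropped.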

LeftJoin

```java
package com.king.learn.Flink.streaming.join;

import com.king.learn.Flink.streaming.join.source.StreamDataSource;
import com.king.learn.Flink.streaming.join.source.StreamDataSource1;
import org.apache.flink.api.common.functions.CoGroupFunction;
import org.apache.flink.api.java.functions.KeySelector;
import org.apache.flink.api.java.tuple.Tuple3;
import org.apache.flink.api.java.tuple.Tuple5;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.util.Collector;
import static com.king.learn.Flink.streaming.join.JoinUtil.getEnv;

/**
 * @Author: king
 * @Date: 2019-01-14
 * @Desc: TODO Left Outer Join
 */

public class FlinkTumblingWindowsLeftJoinDemo {
    public static void main(String[] args) throws Exception {
        int windowSize = 10; // window size in seconds
        // Create the environment once: sources, join and sink must all live in it
        final StreamExecutionEnvironment env = getEnv();
        // Set up the data sources
        DataStream<Tuple3<String, String, Long>> leftSource =
                env.addSource(new StreamDataSource()).name("Demo Source");
        DataStream<Tuple3<String, String, Long>> rightSource =
                env.addSource(new StreamDataSource1()).name("Demo Source");
        // coGroup: called once per key and window, with either side possibly empty
        JoinUtil.getDataStream(leftSource).coGroup(JoinUtil.getDataStream(rightSource))
                .where(new LeftSelectKey()).equalTo(new RightSelectKey())
                .window(TumblingEventTimeWindows.of(Time.seconds(windowSize)))
                .apply(new LeftJoin())
                .print();

        env.execute("TimeWindowDemo");
    }

    public static class LeftJoin implements CoGroupFunction<Tuple3<String, String, Long>, Tuple3<String, String, Long>, Tuple5<String, String, String, Long, Long>> {
        private static final long serialVersionUID = 3583938761914965374L;

        @Override
        public void coGroup(Iterable<Tuple3<String, String, Long>> leftElements, Iterable<Tuple3<String, String, Long>> rightElements, Collector<Tuple5<String, String, String, Long, Long>> out) {
            for (Tuple3<String, String, Long> leftElem : leftElements) {
                boolean hadElements = false;
                // One output row per matching right element
                for (Tuple3<String, String, Long> rightElem : rightElements) {
                    out.collect(new Tuple5<>(leftElem.f0, leftElem.f1, rightElem.f1, leftElem.f2, rightElem.f2));
                    hadElements = true;
                }
                // No match: keep the left element, padding the right side with "null" / -1
                if (!hadElements) {
                    out.collect(new Tuple5<>(leftElem.f0, leftElem.f1, "null", leftElem.f2, -1L));
                }
            }
        }
    }

    public static class LeftSelectKey implements KeySelector<Tuple3<String, String, Long>, String> {
        private static final long serialVersionUID = -4996755192016797420L;

        @Override
        public String getKey(Tuple3<String, String, Long> w) {
            return w.f0;
        }
    }

    public static class RightSelectKey implements KeySelector<Tuple3<String, String, Long>, String> {
        private static final long serialVersionUID = -4959317241606342598L;

        @Override
        public String getKey(Tuple3<String, String, Long> w) {
            return w.f0;
        }
    }
}
```

As the example shows, the left join is implemented with coGroup() rather than join().
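`coGroup()` hands the function all left and all right elements of one key and one window, and either side may be empty; that is what makes the outer variants possible, whereas a `JoinFunction` is only ever called with matched pairs. Below is a plain-Java sketch of `LeftJoin`'s rule applied to the same demo data (hypothetical class `LeftJoinSketch`, no Flink dependency, assuming no element is dropped as late).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class LeftJoinSketch {
    // Mirrors Tuple3<String, String, Long>: (key, value, event timestamp).
    static final class Event {
        final String key;
        final String value;
        final long ts;

        Event(String key, String value, long ts) {
            this.key = key;
            this.value = value;
            this.ts = ts;
        }
    }

    // Groups events by "key@windowStart", keeping arrival order.
    static Map<String, List<Event>> group(List<Event> events, long size) {
        Map<String, List<Event>> groups = new LinkedHashMap<>();
        for (Event e : events) {
            groups.computeIfAbsent(e.key + "@" + (e.ts - e.ts % size), k -> new ArrayList<>()).add(e);
        }
        return groups;
    }

    // Same rule as LeftJoin.coGroup(): one row per (left, right) pair, or one row
    // padded with "null" / -1 when the right side of the group is empty.
    static void coGroup(List<Event> lefts, List<Event> rights, List<String> out) {
        for (Event l : lefts) {
            boolean hadElements = false;
            for (Event r : rights) {
                out.add(row(l.key, l.value, r.value, l.ts, r.ts));
                hadElements = true;
            }
            if (!hadElements) {
                out.add(row(l.key, l.value, "null", l.ts, -1L));
            }
        }
    }

    // Right-only groups produce no rows in a left join, so iterating left groups suffices.
    static List<String> leftJoin(List<Event> left, List<Event> right, long size) {
        Map<String, List<Event>> rightGroups = group(right, size);
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<Event>> g : group(left, size).entrySet()) {
            coGroup(g.getValue(), rightGroups.getOrDefault(g.getKey(), Collections.emptyList()), out);
        }
        return out;
    }

    static String row(String key, String leftValue, String rightValue, long leftTs, long rightTs) {
        return "(" + key + "," + leftValue + "," + rightValue + "," + leftTs + "," + rightTs + ")";
    }

    // Demo data from StreamDataSource (left) and StreamDataSource1 (right).
    static List<String> demo() {
        List<Event> left = Arrays.asList(
                new Event("a", "1", 1000000050000L), new Event("a", "2", 1000000054000L),
                new Event("a", "3", 1000000079900L), new Event("a", "4", 1000000115000L),
                new Event("b", "5", 1000000100000L), new Event("b", "6", 1000000108000L));
        List<Event> right = Arrays.asList(
                new Event("a", "hangzhou", 1000000059000L), new Event("b", "beijing", 1000000105000L));
        return leftJoin(left, right, 10_000L);
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```

Compared with the inner join, the two unmatched left elements now appear as `(a,3,null,1000000079900,-1)` and `(a,4,null,1000000115000,-1)`.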

OuterJoin

```java
package com.king.learn.Flink.streaming.join;

import com.king.learn.Flink.streaming.join.bean.Element;
import com.king.learn.Flink.streaming.join.source.StreamDataSource1;
import com.king.learn.Flink.streaming.join.source.StreamDataSource2;
import org.apache.flink.api.common.functions.CoGroupFunction;
import org.apache.flink.api.java.functions.KeySelector;
import org.apache.flink.api.java.tuple.Tuple3;
import org.apache.flink.api.java.tuple.Tuple5;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.util.Collector;
import java.util.HashMap;
import java.util.HashSet;
import static com.king.learn.Flink.streaming.join.JoinUtil.getEnv;

/**
 * @Author: king
 * @Date: 2019-01-14
 * @Desc: TODO Full outer join: join keys must not repeat on either side
 */

public class FlinkTumblingWindowsOuterJoinDemo {
    public static void main(String[] args) throws Exception {
        int windowSize = 10; // window size in seconds
        // Create the environment once: sources, join and sink must all live in it
        final StreamExecutionEnvironment env = getEnv();
        // Set up the data sources
        DataStream<Tuple3<String, String, Long>> leftSource =
                env.addSource(new StreamDataSource1()).name("Demo Source");
        DataStream<Tuple3<String, String, Long>> rightSource =
                env.addSource(new StreamDataSource2()).name("Demo Source");
        // coGroup: called once per key and window, with either side possibly empty
        JoinUtil.getDataStream(leftSource).coGroup(JoinUtil.getDataStream(rightSource))
                .where(new LeftSelectKey()).equalTo(new RightSelectKey())
                .window(TumblingEventTimeWindows.of(Time.seconds(windowSize)))
                .apply(new OuterJoin())
                .print();

        env.execute("TimeWindowDemo");
    }

    public static class OuterJoin implements CoGroupFunction<Tuple3<String, String, Long>, Tuple3<String, String, Long>, Tuple5<String, String, String, Long, Long>> {
        private static final long serialVersionUID = 844632302486386586L;

        @Override
        public void coGroup(Iterable<Tuple3<String, String, Long>> leftElements, Iterable<Tuple3<String, String, Long>> rightElements, Collector<Tuple5<String, String, String, Long, Long>> out) {
            // One entry per key and side: a repeated key overwrites the earlier element
            HashMap<String, Element> left = new HashMap<>();
            HashMap<String, Element> right = new HashMap<>();
            HashSet<String> set = new HashSet<>();

            for (Tuple3<String, String, Long> leftElem : leftElements) {
                set.add(leftElem.f0);
                left.put(leftElem.f0, new Element(leftElem.f1, leftElem.f2));
            }

            for (Tuple3<String, String, Long> rightElem : rightElements) {
                set.add(rightElem.f0);
                right.put(rightElem.f0, new Element(rightElem.f1, rightElem.f2));
            }

            // One row per key, padding a missing side with "null" / -1
            for (String key : set) {
                Element leftElem = getHashMapByDefault(left, key, new Element("null", -1L));
                Element rightElem = getHashMapByDefault(right, key, new Element("null", -1L));

                out.collect(new Tuple5<>(key, leftElem.getName(), rightElem.getName(), leftElem.getNumber(), rightElem.getNumber()));
            }
        }

        private Element getHashMapByDefault(HashMap<String, Element> map, String key, Element defaultValue) {
            return map.get(key) == null ? defaultValue : map.get(key);
        }
    }

    public static class LeftSelectKey implements KeySelector<Tuple3<String, String, Long>, String> {
        private static final long serialVersionUID = -8189893569324632208L;

        @Override
        public String getKey(Tuple3<String, String, Long> w) {
            return w.f0;
        }
    }

    public static class RightSelectKey implements KeySelector<Tuple3<String, String, Long>, String> {
        private static final long serialVersionUID = 2249963842374426629L;

        @Override
        public String getKey(Tuple3<String, String, Long> w) {
            return w.f0;
        }
    }

}
```

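This one is also implemented with coGroup(), but it has a limitation: the code above requires that neither stream contains duplicate join keys, i.e. a join key value must not repeat. Because coGroup() is called once per key and window, each HashMap holds at most one entry, and if a key does repeat within a window, `put` keeps only the last element from that side. The code uses two source classes, StreamDataSource1 and StreamDataSource2 (both shown above), and a POJO class, Element, shown below: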

bean Element

```java
package com.king.learn.Flink.streaming.join.bean;

/**
 * @Author: king
 * @Date: 2019-01-14
 * @Desc: TODO
 */

public class Element {
    public String name;
    public long number;

    public Element() {
    }

    public Element(String name, long number) {
        this.name = name;
        this.number = number;
    }

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }

    public long getNumber() {
        return number;
    }

    public void setNumber(long number) {
        this.number = number;
    }

    @Override
    public String toString() {
        return this.name + ":" + this.number;
    }
}
```
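The outer join's limitation follows from its two `HashMap`s: every element in a `coGroup()` call has the same key, so `put` keeps only the last element of each side. A minimal sketch of one alternative that keeps duplicates, written in plain Java with lists standing in for the two `Iterable`s (`FullOuterCoGroupSketch`, `Row` and `fullOuter` are hypothetical names): emit the cross product when both sides have elements, and pad the missing side with `"null"` / `-1`, as the demo does, when one side is empty. Inside a real `CoGroupFunction`, the same three cases would collect `Tuple5` values instead of strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class FullOuterCoGroupSketch {
    // One side's element within a coGroup call: (value, event timestamp).
    static final class Row {
        final String value;
        final long ts;

        Row(String value, long ts) {
            this.value = value;
            this.ts = ts;
        }
    }

    // All rows share one key and one window, as in a single coGroup() call.
    // Duplicates are kept: both sides non-empty -> cross product; otherwise the
    // missing side is padded with "null" / -1 like the OuterJoin demo.
    static List<String> fullOuter(String key, List<Row> left, List<Row> right) {
        List<String> out = new ArrayList<>();
        if (right.isEmpty()) {
            for (Row l : left) {
                out.add(format(key, l.value, "null", l.ts, -1L));
            }
        } else if (left.isEmpty()) {
            for (Row r : right) {
                out.add(format(key, "null", r.value, -1L, r.ts));
            }
        } else {
            for (Row l : left) {
                for (Row r : right) {
                    out.add(format(key, l.value, r.value, l.ts, r.ts));
                }
            }
        }
        return out;
    }

    static String format(String key, String leftValue, String rightValue, long leftTs, long rightTs) {
        return "(" + key + "," + leftValue + "," + rightValue + "," + leftTs + "," + rightTs + ")";
    }

    public static void main(String[] args) {
        // Key "a" from StreamDataSource1 / StreamDataSource2, plus a hypothetical
        // second left element with the same key to show that duplicates survive.
        List<Row> left = Arrays.asList(new Row("hangzhou", 1000000059000L), new Row("extra", 1000000051000L));
        List<Row> right = Arrays.asList(new Row("beijing", 1000000058000L));
        fullOuter("a", left, right).forEach(System.out::println);
        // Key "c" exists only on the right side (StreamDataSource2).
        fullOuter("c", Collections.<Row>emptyList(), Arrays.asList(new Row("beijing", 1000000055000L)))
                .forEach(System.out::println);
    }
}
```

The hypothetical extra `a` element on the left now produces its own row instead of overwriting `hangzhou`.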