How Flink joins three real-time streams at once, plus left join and right join

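Just a few minutes until boarding. I'm in the departure lounge for my flight from Harbin to Beijing. I'll get home late tonight and will be too busy tomorrow to post, so I squeezed in the time to write this up.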

How can Flink join three real-time streams at the same time? The overall idea is:

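• Give all three streams the same time characteristic.
• Give them the same window (same type and size), so that when a window closes, all three streams fire for that window together.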
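A little arithmetic shows why identical windows line up. The Flink-free sketch below uses a hypothetical class, `WindowAlignment`, and makes three assumptions:

- Windows are tumbling with offset 0.
- Timestamps are non-negative milliseconds, so the window start is the timestamp rounded down to a multiple of the window size. For this case it should agree with Flink's `TimeWindow.getWindowStartWithOffset`.
- Window results are stamped with the window's max timestamp (end minus 1 ms), which is my understanding of how Flink's window operator behaves.

```java
public class WindowAlignment {
    // Start of the tumbling window (offset 0) that contains `timestamp`.
    static long windowStart(long timestamp, long sizeMs) {
        return timestamp - Math.floorMod(timestamp, sizeMs);
    }

    // Largest timestamp still inside that window (window end - 1 ms).
    // Assumption: an event-time window operator stamps its results with this value.
    static long windowMaxTimestamp(long timestamp, long sizeMs) {
        return windowStart(timestamp, sizeMs) + sizeMs - 1;
    }

    public static void main(String[] args) {
        long size = 6_000L; // 6-second windows, as in the Scala example below
        long[] ts = {12_500L, 14_000L, 17_999L}; // one element from each of three streams
        for (long t : ts) {
            System.out.println(windowStart(t, size)); // 12000 for all three
        }
        // A join result stamped with the window's max timestamp maps back to the same window.
        long resultTs = windowMaxTimestamp(12_500L, size);
        System.out.println(windowStart(resultTs, size) == windowStart(12_500L, size)); // true
    }
}
```

Elements from three different streams with timestamps of 12 500, 14 000 and 17 999 ms all fall into the window that starts at 12 000 ms. A result stamped 17 999 ms falls there too, which is why chained event-time windows stay aligned.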

Flink's join operator combines only two streams at a time. So you first join two of the streams, then join that result with the third stream.

import org.apache.flink.streaming.api.TimeCharacteristic
import org.apache.flink.streaming.api.functions.AssignerWithPeriodicWatermarks
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.watermark.Watermark
import org.apache.flink.streaming.api.windowing.assigners.TumblingProcessingTimeWindows
import org.apache.flink.streaming.api.windowing.time.Time
import org.apache.flink.util.Collector
object FlinkWindow {
  // Periodic watermark assigner. Note that it stamps every element with the current
  // wall-clock time (System.currentTimeMillis), not with a time taken from the data.
  class MyTimeTimestampsAndWatermarks extends AssignerWithPeriodicWatermarks[(String, Int)] with Serializable {
    val maxOutOfOrderness = 3500L // 3.5 seconds
    var currentMaxTimestamp: Long = _

    override def extractTimestamp(element: (String, Int), previousElementTimestamp: Long): Long = {
      val timestamp = System.currentTimeMillis()
      currentMaxTimestamp = Math.max(timestamp, currentMaxTimestamp)
      timestamp
    }

    override def getCurrentWatermark(): Watermark = {
      // Watermark = highest timestamp seen so far minus the out-of-orderness bound
      new Watermark(currentMaxTimestamp - maxOutOfOrderness)
    }
  }
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    env.setStreamTimeCharacteristic(TimeCharacteristic.ProcessingTime)

    // Stream 1: words read from socket port 9001, mapped to (word, 1).
    // Timestamps/watermarks are assigned, but the processing-time windows below do not use them.
    val input = env.socketTextStream("localhost", 9001)
    val inputMap = input.flatMap(f => f.split("\\W+"))
      .map(word => (word, 1))
      .assignTimestampsAndWatermarks(new MyTimeTimestampsAndWatermarks())
    inputMap.print()

    // Stream 2: port 9002
    val input1 = env.socketTextStream("localhost", 9002)
    val inputMap1 = input1.flatMap(f => f.split("\\W+"))
      .map(word => (word, 1))
      .assignTimestampsAndWatermarks(new MyTimeTimestampsAndWatermarks())
    inputMap1.print()

    // Stream 3: port 9003
    val input2 = env.socketTextStream("localhost", 9003)
    val inputMap2 = input2.flatMap(f => f.split("\\W+"))
      .map(word => (word, 1))
      .assignTimestampsAndWatermarks(new MyTimeTimestampsAndWatermarks())
    inputMap2.print()

    // Step 1: join stream 1 with stream 2 on the word, in 6-second tumbling windows
    val aa = inputMap.join(inputMap1)
      .where(_._1).equalTo(_._1)
      .window(TumblingProcessingTimeWindows.of(Time.seconds(6)))
      .apply { (t1: (String, Int), t2: (String, Int), out: Collector[(String, Int, Int)]) =>
        out.collect((t1._1, t1._2, t2._2))
      }
    aa.print()

    // Step 2: join the step-1 result with stream 3, using the same window size.
    // Caveat: with processing-time windows, step-1 results are assigned to a window by
    // their arrival time and can land in the next window. Event-time windows keep both
    // steps aligned, because window results carry the window's max timestamp.
    val cc = aa.join(inputMap2)
      .where(_._1).equalTo(_._1)
      .window(TumblingProcessingTimeWindows.of(Time.seconds(6)))
      .apply { (t1: (String, Int, Int), t2: (String, Int), out: Collector[(String, Int, Int, Int)]) =>
        out.collect((t1._1, t1._2, t1._3, t2._2))
      }
    cc.print()

    env.execute()
  }
}
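Within a single window, the two chained joins above reduce to two nested-loop inner joins. The sketch below is plain Java with no Flink dependency. The class `CascadedWindowJoin` and its helpers are hypothetical names, and each list stands for the buffered contents of one 6-second window on ports 9001, 9002 and 9003.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class CascadedWindowJoin {
    // Turns one window's buffer of (word, count) elements into mutable rows [word, count].
    static List<List<Object>> rows(List<Map.Entry<String, Integer>> elems) {
        List<List<Object>> out = new ArrayList<>();
        for (Map.Entry<String, Integer> e : elems) {
            List<Object> row = new ArrayList<>();
            row.add(e.getKey());
            row.add(e.getValue());
            out.add(row);
        }
        return out;
    }

    // Inner join on the word (row element 0): every matching pair yields a new row with
    // the right-hand count appended, e.g. [a, 1] + (a, 2) -> [a, 1, 2].
    // Rows without a match are dropped, mirroring the inner-join semantics of a window join.
    static List<List<Object>> join(List<List<Object>> left, List<Map.Entry<String, Integer>> right) {
        List<List<Object>> out = new ArrayList<>();
        for (List<Object> l : left) {
            for (Map.Entry<String, Integer> r : right) {
                if (l.get(0).equals(r.getKey())) {
                    List<Object> row = new ArrayList<>(l);
                    row.add(r.getValue());
                    out.add(row);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Contents of the same window on the three sockets
        List<Map.Entry<String, Integer>> s1 = List.of(Map.entry("a", 1), Map.entry("b", 1));
        List<Map.Entry<String, Integer>> s2 = List.of(Map.entry("a", 1), Map.entry("c", 1));
        List<Map.Entry<String, Integer>> s3 = List.of(Map.entry("a", 1), Map.entry("b", 1));

        List<List<Object>> step1 = join(rows(s1), s2); // like `aa`: only "a" is in both
        List<List<Object>> step2 = join(step1, s3);    // like `cc`
        System.out.println(step2); // [[a, 1, 1, 1]]
    }
}
```

A word reaches the final output only if it appears in the same window of all three streams. Here "b" is in streams 1 and 3 but not in stream 2, so it is already dropped at step 1.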

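For left join and right join, the Flink documentation does not spell out an implementation. The join operator cannot do it either, because a window join emits only matching pairs. Use coGroup instead: for each key in a window, it hands you all elements from both sides, including keys that appear on only one side. The code below implements a left join; adapt it for a right join by swapping the roles of the two sides.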

import util.source.StreamDataSource1;	
import util.source.StreamDataSource;	
import org.apache.flink.api.common.functions.CoGroupFunction;	
import org.apache.flink.api.java.functions.KeySelector;	
import org.apache.flink.api.java.tuple.Tuple3;	
import org.apache.flink.api.java.tuple.Tuple5;	
import org.apache.flink.streaming.api.TimeCharacteristic;	
import org.apache.flink.streaming.api.datastream.DataStream;	
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;	
import org.apache.flink.streaming.api.functions.timestamps.BoundedOutOfOrdernessTimestampExtractor;	
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;	
import org.apache.flink.streaming.api.windowing.time.Time;	
import org.apache.flink.util.Collector;	
public class FlinkTumblingWindowsLeftJoinDemo {
    public static void main(String[] args) throws Exception {
        int windowSize = 10;   // tumbling window size in seconds
        long delay = 5100L;    // maximum out-of-orderness in milliseconds
        final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);
        env.setParallelism(1);
        // Data sources: StreamDataSource and StreamDataSource1 are the author's own sources (not shown),
        // emitting Tuple3<key, value, event timestamp>
        DataStream<Tuple3<String, String, Long>> leftSource = env.addSource(new StreamDataSource()).name("Left Source");
        DataStream<Tuple3<String, String, Long>> rightSource = env.addSource(new StreamDataSource1()).name("Right Source");
        // Timestamps and watermarks: event time is field f2, allowing `delay` ms of out-of-orderness
        DataStream<Tuple3<String, String, Long>> leftStream = leftSource.assignTimestampsAndWatermarks(
            new BoundedOutOfOrdernessTimestampExtractor<Tuple3<String, String, Long>>(Time.milliseconds(delay)) {
                @Override
                public long extractTimestamp(Tuple3<String, String, Long> element) {
                    return element.f2;
                }
            }
        );
        DataStream<Tuple3<String, String, Long>> rightStream = rightSource.assignTimestampsAndWatermarks(
            new BoundedOutOfOrdernessTimestampExtractor<Tuple3<String, String, Long>>(Time.milliseconds(delay)) {
                @Override
                public long extractTimestamp(Tuple3<String, String, Long> element) {
                    return element.f2;
                }
            }
        );
        // Left join implemented with coGroup over event-time tumbling windows
        leftStream.coGroup(rightStream)
            .where(new LeftSelectKey()).equalTo(new RightSelectKey())
            .window(TumblingEventTimeWindows.of(Time.seconds(windowSize)))
            .apply(new LeftJoin())
            .print();
        env.execute("TimeWindowDemo");
    }
    // Left join: for each key in a window, coGroup receives all left and all right elements.
    // Each left element is emitted once per matching right element, or once with
    // "null" / -1L placeholders when the right side has no element for that key.
    public static class LeftJoin implements CoGroupFunction<Tuple3<String, String, Long>, Tuple3<String, String, Long>, Tuple5<String, String, String, Long, Long>> {
        @Override
        public void coGroup(Iterable<Tuple3<String, String, Long>> leftElements, Iterable<Tuple3<String, String, Long>> rightElements, Collector<Tuple5<String, String, String, Long, Long>> out) {
            for (Tuple3<String, String, Long> leftElem : leftElements) {
                boolean hadElements = false;
                for (Tuple3<String, String, Long> rightElem : rightElements) {
                    // Output fields: key, left value, right value, left timestamp, right timestamp
                    out.collect(new Tuple5<>(leftElem.f0, leftElem.f1, rightElem.f1, leftElem.f2, rightElem.f2));
                    hadElements = true;
                }
                if (!hadElements) {
                    out.collect(new Tuple5<>(leftElem.f0, leftElem.f1, "null", leftElem.f2, -1L));
                }
            }
        }
    }
    public static class LeftSelectKey implements KeySelector<Tuple3<String, String, Long>, String> {	
        @Override	
        public String getKey(Tuple3<String, String, Long> w) {	
            return w.f0;	
        }	
    }	
    public static class RightSelectKey implements KeySelector<Tuple3<String, String, Long>, String> {
        @Override
        public String getKey(Tuple3<String, String, Long> w) {
            return w.f0;
        }
    }
}
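The code above implements only the left join. A right join uses the same coGroup with the roles swapped: iterate over the right elements, and fill the left fields with placeholders when nothing matches.

The sketch below simulates this in plain Java without Flink, so it runs on its own. All names in it are hypothetical:

- `Event` stands in for `Tuple3<String, String, Long>`.
- `coGroupRightJoin` imitates how coGroup calls the function once per key present on either side.
- The output strings follow the same field order as `LeftJoin`'s `Tuple5`.

In the Flink job, the body of `rightJoin` would go into a `CoGroupFunction` shaped like `LeftJoin`.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class CoGroupJoinSketch {
    // Stand-in for Tuple3<String, String, Long>: key, value, event timestamp
    static final class Event {
        final String key;
        final String value;
        final long ts;
        Event(String key, String value, long ts) { this.key = key; this.value = value; this.ts = ts; }
    }

    // Groups one side of a window by key, keeping arrival order.
    static Map<String, List<Event>> byKey(List<Event> events) {
        Map<String, List<Event>> groups = new LinkedHashMap<>();
        for (Event e : events) groups.computeIfAbsent(e.key, k -> new ArrayList<>()).add(e);
        return groups;
    }

    // Right join for one key: every right element is emitted, paired with each left element,
    // or with "null" / -1 placeholders when the left side is empty.
    // Row format: key,leftValue,rightValue,leftTs,rightTs
    static void rightJoin(List<Event> left, List<Event> right, List<String> out) {
        for (Event r : right) {
            boolean matched = false;
            for (Event l : left) {
                out.add(r.key + "," + l.value + "," + r.value + "," + l.ts + "," + r.ts);
                matched = true;
            }
            if (!matched) out.add(r.key + ",null," + r.value + ",-1," + r.ts);
        }
    }

    // Imitates coGroup over one window: one call per key found on either side,
    // with an empty list for the side that has no elements for that key.
    static List<String> coGroupRightJoin(List<Event> leftWindow, List<Event> rightWindow) {
        Map<String, List<Event>> left = byKey(leftWindow);
        Map<String, List<Event>> right = byKey(rightWindow);
        Set<String> keys = new LinkedHashSet<>(left.keySet());
        keys.addAll(right.keySet());
        List<String> out = new ArrayList<>();
        for (String k : keys) rightJoin(left.getOrDefault(k, List.of()), right.getOrDefault(k, List.of()), out);
        return out;
    }

    public static void main(String[] args) {
        List<Event> left = List.of(new Event("a", "L1", 1000L), new Event("b", "L2", 2000L));
        List<Event> right = List.of(new Event("a", "R1", 1500L), new Event("c", "R2", 2500L));
        coGroupRightJoin(left, right).forEach(System.out::println);
        // a,L1,R1,1000,1500
        // c,null,R2,-1,2500
    }
}
```

Key "b" exists only on the left, so a right join emits nothing for it. Key "c" exists only on the right, so it is emitted with the left-side placeholders.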
