Flink Window Computation: Joining

Window Join

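A window join joins the elements of two streams that share the same key and lie in the same window. The windows are defined with a window assigner and are evaluated on elements from both streams. The elements from both sides are then passed to a user-defined JoinFunction or FlatJoinFunction, where the user can emit the results that satisfy the join condition.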

stream.join(otherStream)
	 .where(<KeySelector>)
	 .equalTo(<KeySelector>)
	 .window(<WindowAssigner>)
	 .apply(<JoinFunction>)

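Note:

  • Building the pairwise combinations of the elements of the two streams behaves like an inner join: an element of one stream is not emitted if the other stream has no matching element to join it with.
  • Elements that do get joined carry the largest timestamp that still lies in the respective window. For example, a window with the boundaries [5, 10) produces joined elements with a timestamp of 9.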
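The timestamp rule in the second note can be checked with a little arithmetic. The following is a minimal sketch in plain Scala with no Flink dependency; the object and method names are made up for illustration, and it assumes windows without an offset and non-negative timestamps.

```scala
object TumblingWindowMath {
  /** Start of the tumbling window (no offset) that contains `timestamp`. */
  def windowStart(timestamp: Long, size: Long): Long =
    timestamp - (timestamp % size)

  /** Largest timestamp that still lies inside [start, start + size). */
  def maxTimestamp(start: Long, size: Long): Long =
    start + size - 1

  def main(args: Array[String]): Unit = {
    val size = 5L
    for (ts <- Seq(5L, 7L, 9L)) {
      val start = windowStart(ts, size)
      println(s"ts=$ts window=[$start,${start + size}) joined timestamp=${maxTimestamp(start, size)}")
    }
  }
}
```

All three sample timestamps fall into the window [5, 10), so a join result built from any of them would carry timestamp 9.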

Tumbling Window Join

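In a tumbling window join, all elements with a common key and a common tumbling window are joined as pairwise combinations and passed to a JoinFunction or FlatJoinFunction. Because this behaves like an inner join, an element of one stream is not emitted if the other stream has no element in the same tumbling window.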
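To make the inner-join behaviour concrete, here is a small simulation in plain Scala, without Flink. It is only a sketch: the helper `join`, the sample records for "lisi", "pear" and "kiwi", and the 2-second window size are assumptions chosen to mirror the input format used in this article.

```scala
object TumblingJoinSketch {
  // (userId, name, eventTimeMillis) and (item, userId, eventTimeMillis)
  type User = (String, String, Long)
  type Order = (String, String, Long)

  /** Pairs users and orders that share a user ID and fall into the same tumbling window. */
  def join(users: Seq[User], orders: Seq[Order], size: Long): Seq[(String, String, String)] = {
    def window(ts: Long): Long = ts - (ts % size) // window start, no offset
    for {
      u <- users
      o <- orders
      if u._1 == o._2 && window(u._3) == window(o._3)
    } yield (u._1, u._2, o._1)
  }

  def main(args: Array[String]): Unit = {
    val users = Seq(("001", "zhangsan", 1000L), ("002", "lisi", 2500L))
    val orders = Seq(("apple", "001", 1500L), ("pear", "001", 2100L), ("kiwi", "003", 2600L))
    join(users, orders, 2000L).foreach(println)
  }
}
```

Only user 001 and the apple order share both key and window [0, 2000). The pear order has the right key but lies in the next window, and user 002 and the kiwi order have no partner at all, so none of them produce output.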

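import java.text.SimpleDateFormat

import org.apache.flink.streaming.api.TimeCharacteristic
import org.apache.flink.streaming.api.functions.AssignerWithPeriodicWatermarks
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.watermark.Watermark
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows
import org.apache.flink.streaming.api.windowing.time.Time
import org.apache.flink.util.Collector

// The enclosing object's name was missing from the original listing; this one is assumed.
object FlinkTumblingWindowJoin {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    env.setParallelism(1)
    env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)
    // Input format: 001 zhangsan <timestamp>
    val stream1 = env.socketTextStream("CentOS", 9999)
      .map(_.split("\\s+"))
      .map(ts => (ts(0), ts(1), ts(2).toLong))
      .assignTimestampsAndWatermarks(new TumblingAssignerWithPeriodicWatermarks) // assign watermarks
    // Input format: apple 001 <timestamp>
    val stream2 = env.socketTextStream("CentOS", 8888)
      .map(_.split("\\s+"))
      .map(ts => (ts(0), ts(1), ts(2).toLong))
      .assignTimestampsAndWatermarks(new TumblingAssignerWithPeriodicWatermarks) // assign watermarks
    stream1.join(stream2)
      .where(t => t._1)   // user ID in stream1
      .equalTo(t => t._2) // user ID in stream2
      .window(TumblingEventTimeWindows.of(Time.seconds(2)))
      .apply((s1, s2, out: Collector[(String, String, String)]) => {
        out.collect((s1._1, s1._2, s2._1))
      })
      .print("join result")
    env.execute("FlinkTumblingWindowJoin")
  }
}

class TumblingAssignerWithPeriodicWatermarks extends AssignerWithPeriodicWatermarks[(String, String, Long)] {
  var maxAllowOrderness = 2000L // tolerated out-of-orderness in milliseconds
  var maxSeenEventTime = 0L     // do not start at Long.MinValue: the subtraction below would overflow
  var sdf = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss")

  // Called periodically by the system to compute the current watermark
  override def getCurrentWatermark: Watermark = {
    new Watermark(maxSeenEventTime - maxAllowOrderness)
  }

  // Extracts the event time and updates the value the watermark is derived from
  override def extractTimestamp(element: (String, String, Long),
                                previousElementTimestamp: Long): Long = {
    // Always keep the largest event time seen so far
    maxSeenEventTime = Math.max(maxSeenEventTime, element._3)
    println("ET:" + (element._1, element._2, sdf.format(element._3)) + " WM:" + sdf.format(maxSeenEventTime - maxAllowOrderness))
    element._3
  }
}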

Sliding Window Join

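In a sliding window join, all elements with a common key and a common sliding window are joined as pairwise combinations and passed to a JoinFunction or FlatJoinFunction. An element of one stream is not emitted if the other stream has no element in the current sliding window. Note that some elements may be joined in one sliding window but not in another.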
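Why two elements can join in one sliding window but not in another becomes clearer once the windows of each element are listed. The sketch below does this in plain Scala, without Flink; the names are illustrative, and it assumes windows without an offset and non-negative timestamps, with a size of 4 seconds and a slide of 2 seconds.

```scala
object SlidingWindowSketch {
  /** Start timestamps of all sliding windows (no offset) that contain `ts`. */
  def windowStarts(ts: Long, size: Long, slide: Long): List[Long] = {
    val lastStart = ts - (ts % slide) // latest window that starts at or before ts
    Iterator.iterate(lastStart)(_ - slide).takeWhile(_ > ts - size).toList
  }

  def main(args: Array[String]): Unit = {
    val (size, slide) = (4000L, 2000L)
    val a = windowStarts(3000L, size, slide)
    val b = windowStarts(5000L, size, slide)
    println(s"element at 3000 is in windows starting at $a")
    println(s"element at 5000 is in windows starting at $b")
    println(s"shared windows start at ${a.intersect(b)}")
  }
}
```

An element at 3000 ms lies in [0, 4000) and [2000, 6000), while one at 5000 ms lies in [2000, 6000) and [4000, 8000). With equal keys they are joined in the shared window [2000, 6000) only.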

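import java.text.SimpleDateFormat

import org.apache.flink.streaming.api.TimeCharacteristic
import org.apache.flink.streaming.api.functions.AssignerWithPeriodicWatermarks
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.watermark.Watermark
import org.apache.flink.streaming.api.windowing.assigners.SlidingEventTimeWindows
import org.apache.flink.streaming.api.windowing.time.Time
import org.apache.flink.util.Collector

// The enclosing object's name was missing from the original listing; this one is assumed.
object FlinkSlidlingWindowJoin {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    env.setParallelism(1)
    env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)
    // Input format: 001 zhangsan <timestamp>
    val stream1 = env.socketTextStream("CentOS", 9999)
      .map(_.split("\\s+"))
      .map(ts => (ts(0), ts(1), ts(2).toLong))
      .assignTimestampsAndWatermarks(new SlidlingAssignerWithPeriodicWatermarks)
    // Input format: apple 001 <timestamp>
    val stream2 = env.socketTextStream("CentOS", 8888)
      .map(_.split("\\s+"))
      .map(ts => (ts(0), ts(1), ts(2).toLong))
      .assignTimestampsAndWatermarks(new SlidlingAssignerWithPeriodicWatermarks)
    stream1.join(stream2)
      .where(t => t._1)   // user ID in stream1
      .equalTo(t => t._2) // user ID in stream2
      .window(SlidingEventTimeWindows.of(Time.seconds(4), Time.seconds(2))) // size 4s, slide 2s
      .apply((s1, s2, out: Collector[(String, String, String)]) => {
        out.collect((s1._1, s1._2, s2._1))
      })
      .print("join result")
    env.execute("FlinkSlidlingWindowJoin")
  }
}

class SlidlingAssignerWithPeriodicWatermarks extends AssignerWithPeriodicWatermarks[(String, String, Long)] {
  var maxAllowOrderness = 2000L // tolerated out-of-orderness in milliseconds
  var maxSeenEventTime = 0L     // do not start at Long.MinValue: the subtraction below would overflow
  var sdf = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss")

  // Called periodically by the system to compute the current watermark
  override def getCurrentWatermark: Watermark = {
    new Watermark(maxSeenEventTime - maxAllowOrderness)
  }

  // Extracts the event time and updates the value the watermark is derived from
  override def extractTimestamp(element: (String, String, Long),
                                previousElementTimestamp: Long): Long = {
    // Always keep the largest event time seen so far
    maxSeenEventTime = Math.max(maxSeenEventTime, element._3)
    println("ET:" + (element._1, element._2, sdf.format(element._3)) + " WM:" + sdf.format(maxSeenEventTime - maxAllowOrderness))
    element._3
  }
}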

Session Window Join

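In a session window join, all elements with the same key that satisfy the session criteria when "combined" are joined as pairwise combinations and passed to a JoinFunction or FlatJoinFunction. This is again an inner join, so a session window that contains elements from only one stream emits no output.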
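The word "combined" matters here: sessions are formed over the elements of both streams together, and only then is each session checked for partners from both sides. The following plain-Scala sketch, without Flink, illustrates this for the elements of a single key. The names and the "A"/"B" stream tags are made up, and treating two events that are exactly `gap` apart as one session is an assumption of this sketch.

```scala
object SessionWindowSketch {
  // (stream tag, eventTimeMillis); all events are assumed to share one key
  type Event = (String, Long)

  /** Splits events into sessions; a new session starts when the distance to the previous event exceeds `gap`. */
  def sessions(events: Seq[Event], gap: Long): List[List[Event]] =
    events.sortBy(_._2).foldLeft(List.empty[List[Event]]) {
      case (current :: done, e) if e._2 - current.head._2 <= gap => (e :: current) :: done
      case (acc, e) => List(e) :: acc
    }.map(_.reverse).reverse

  /** An inner join can only emit something if both streams are present in the session. */
  def emitsOutput(session: List[Event]): Boolean =
    session.exists(_._1 == "A") && session.exists(_._1 == "B")

  def main(args: Array[String]): Unit = {
    val events = Seq(("A", 1000L), ("B", 2000L), ("A", 6000L), ("A", 7500L))
    sessions(events, 2000L).foreach(s => println(s"$s emits output: ${emitsOutput(s)}"))
  }
}
```

With a 2-second gap the sample splits into two sessions. The first contains one element from each stream and yields a join result; the second contains only elements of stream A and yields nothing.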

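import java.text.SimpleDateFormat

import org.apache.flink.streaming.api.TimeCharacteristic
import org.apache.flink.streaming.api.functions.AssignerWithPeriodicWatermarks
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.watermark.Watermark
import org.apache.flink.streaming.api.windowing.assigners.EventTimeSessionWindows
import org.apache.flink.streaming.api.windowing.time.Time
import org.apache.flink.util.Collector

// The enclosing object's name was missing from the original listing; this one is assumed.
object FlinkSessionWindowJoin {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    env.setParallelism(1)
    env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)
    // Input format: 001 zhangsan <timestamp>
    val stream1 = env.socketTextStream("CentOS", 9999)
      .map(_.split("\\s+"))
      .map(ts => (ts(0), ts(1), ts(2).toLong))
      .assignTimestampsAndWatermarks(new SessionAssignerWithPeriodicWatermarks)
    // Input format: apple 001 <timestamp>
    val stream2 = env.socketTextStream("CentOS", 8888)
      .map(_.split("\\s+"))
      .map(ts => (ts(0), ts(1), ts(2).toLong))
      .assignTimestampsAndWatermarks(new SessionAssignerWithPeriodicWatermarks)
    stream1.join(stream2)
      .where(t => t._1)   // user ID in stream1
      .equalTo(t => t._2) // user ID in stream2
      .window(EventTimeSessionWindows.withGap(Time.seconds(2)))
      .apply((s1, s2, out: Collector[(String, String, String)]) => {
        out.collect((s1._1, s1._2, s2._1))
      })
      .print("join result")
    env.execute("FlinkSessionWindowJoin")
  }
}

class SessionAssignerWithPeriodicWatermarks extends AssignerWithPeriodicWatermarks[(String, String, Long)] {
  var maxAllowOrderness = 2000L // tolerated out-of-orderness in milliseconds
  var maxSeenEventTime = 0L     // do not start at Long.MinValue: the subtraction below would overflow
  var sdf = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss")

  // Called periodically by the system to compute the current watermark
  override def getCurrentWatermark: Watermark = {
    new Watermark(maxSeenEventTime - maxAllowOrderness)
  }

  // Extracts the event time and updates the value the watermark is derived from
  override def extractTimestamp(element: (String, String, Long),
                                previousElementTimestamp: Long): Long = {
    // Always keep the largest event time seen so far
    maxSeenEventTime = Math.max(maxSeenEventTime, element._3)
    println("ET:" + (element._1, element._2, sdf.format(element._3)) + " WM:" + sdf.format(maxSeenEventTime - maxAllowOrderness))
    element._3
  }
}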

Interval Join

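An interval join joins the elements of two streams (call them A and B for now) that share a common key. An element of A and an element of B are joined when the timestamp of the B element lies in an interval relative to the timestamp of the A element: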

b.timestamp ∈ [a.timestamp + lowerBound; a.timestamp + upperBound]
or, equivalently,
a.timestamp + lowerBound <= b.timestamp <= a.timestamp + upperBound

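Here a and b are elements of A and B that share a common key. The lower and upper bound can each be negative or positive, as long as the lower bound is always less than or equal to the upper bound. The interval join currently performs inner joins only.
When a pair of elements is passed to the ProcessJoinFunction, the pair is assigned the larger of the two elements' timestamps, which can be accessed through ProcessJoinFunction.Context.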

  • Reminder: the interval join currently supports event time only.
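The join condition and the timestamp rule above can be written down directly. This is a minimal plain-Scala sketch without Flink; the function names are invented for illustration, and the bounds of 0 and 2000 ms are an assumption that corresponds to an inclusive interval of 0 to 2 seconds.

```scala
object IntervalJoinSketch {
  /** True if b's timestamp lies in [a + lower, a + upper], both bounds inclusive. */
  def inInterval(a: Long, b: Long, lower: Long, upper: Long): Boolean =
    a + lower <= b && b <= a + upper

  /** A joined pair carries the larger of the two timestamps. */
  def joinedTimestamp(a: Long, b: Long): Long = math.max(a, b)

  def main(args: Array[String]): Unit = {
    val (lower, upper) = (0L, 2000L)
    val a = 1000L
    for (b <- Seq(999L, 1000L, 3000L, 3001L)) {
      val joined = inInterval(a, b, lower, upper)
      println(s"a=$a b=$b joined=$joined" + (if (joined) s" timestamp=${joinedTimestamp(a, b)}" else ""))
    }
  }
}
```

For an A element at 1000 ms, B elements at 1000 ms and 3000 ms are joined because both bounds are inclusive, while 999 ms and 3001 ms fall just outside the interval.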
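import java.text.SimpleDateFormat

import org.apache.flink.streaming.api.TimeCharacteristic
import org.apache.flink.streaming.api.functions.AssignerWithPeriodicWatermarks
import org.apache.flink.streaming.api.functions.co.ProcessJoinFunction
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.watermark.Watermark
import org.apache.flink.streaming.api.windowing.time.Time
import org.apache.flink.util.Collector

// The enclosing object's name was missing from the original listing; this one is assumed.
object FlinkIntervalJoin {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    env.setParallelism(1)
    env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)
    // Input format: 001 zhangsan <timestamp>
    val stream1 = env.socketTextStream("CentOS", 9999)
      .map(_.split("\\s+"))
      .map(ts => (ts(0), ts(1), ts(2).toLong))
      .assignTimestampsAndWatermarks(new IntervaAssignerWithPeriodicWatermarks)
      .keyBy(t => t._1) // user ID in stream1
    // Input format: apple 001 <timestamp>
    val stream2 = env.socketTextStream("CentOS", 8888)
      .map(_.split("\\s+"))
      .map(ts => (ts(0), ts(1), ts(2).toLong))
      .assignTimestampsAndWatermarks(new IntervaAssignerWithPeriodicWatermarks)
      .keyBy(t => t._2) // user ID in stream2
    stream1.intervalJoin(stream2)
      .between(Time.seconds(0), Time.seconds(2)) // both bounds are inclusive by default
      //.lowerBoundExclusive() excludes the lower bound
      //.upperBoundExclusive() excludes the upper bound
      .process(new ProcessJoinFunction[(String, String, Long), (String, String, Long), String] {
        override def processElement(left: (String, String, Long),
                                    right: (String, String, Long),
                                    ctx: ProcessJoinFunction[(String, String, Long), (String, String, Long), String]#Context,
                                    out: Collector[String]): Unit = {
          println("l:" + ctx.getLeftTimestamp + ",r:" + ctx.getRightTimestamp + ",t:" + ctx.getTimestamp)
          out.collect(left._1 + " " + left._2 + " " + right._1)
        }
      })
      .print()
    env.execute("FlinkIntervalJoin")
  }
}

class IntervaAssignerWithPeriodicWatermarks extends AssignerWithPeriodicWatermarks[(String, String, Long)] {
  var maxAllowOrderness = 2000L // tolerated out-of-orderness in milliseconds
  var maxSeenEventTime = 0L     // do not start at Long.MinValue: the subtraction below would overflow
  var sdf = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss")

  // Called periodically by the system to compute the current watermark
  override def getCurrentWatermark: Watermark = {
    new Watermark(maxSeenEventTime - maxAllowOrderness)
  }

  // Extracts the event time and updates the value the watermark is derived from
  override def extractTimestamp(element: (String, String, Long),
                                previousElementTimestamp: Long): Long = {
    // Always keep the largest event time seen so far
    maxSeenEventTime = Math.max(maxSeenEventTime, element._3)
    println("ET:" + (element._1, element._2, sdf.format(element._3)) + " WM:" + sdf.format(maxSeenEventTime - maxAllowOrderness))
    element._3
  }
}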

