Flink Time Semantics and Watermark

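EventTime:
The time at which the event was created. It is usually carried by a timestamp inside the event itself; in collected log data, for example, every log record stores its own generation time, and Flink reads that value through a timestamp assigner. Example: the time at which a user clicked a link on a website.
env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)
IngestionTime:
The time at which the source operator on a Flink node receives the record. Example: the moment a source consumes a record from Kafka.
env.setStreamTimeCharacteristic(TimeCharacteristic.IngestionTime)
ProcessingTime:
The local system time of the machine running each time-based operator, so it depends on the machine. In the Flink versions this API targets (before 1.12), Processing Time is the default time characteristic. Example: the time at which a timeWindow operator receives the record.
env.setStreamTimeCharacteristic(TimeCharacteristic.ProcessingTime)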

package flink.chapter5WaterMark
import org.apache.flink.streaming.api.scala.StreamExecutionEnvironment
import org.apache.flink.api.scala._
import org.apache.flink.streaming.api.TimeCharacteristic
object WaterMark_Demo {
  def main(args: Array[String]): Unit = {
    // Build the stream execution environment
    val env: StreamExecutionEnvironment = StreamExecutionEnvironment.getExecutionEnvironment
    // Select EventTime as the time characteristic for the job
    env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)
  }
}

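Window assigners used with watermarks (these differ from timeWindow; do not confuse the two). Every window is a left-closed, right-open interval, e.g. [a, 2000).
Tumbling .window(TumblingEventTimeWindows.of(Time.seconds(5)))
Sliding .window(SlidingEventTimeWindows.of(Time.seconds(10),Time.seconds(5)))
Session .window(EventTimeSessionWindows.withGap(Time.seconds(5)))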
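To make the left-closed, right-open rule concrete, here is a minimal sketch in plain Scala (no Flink dependency) of how a timestamp maps to a tumbling window. The object name `TumblingBounds` and its helpers are hypothetical names chosen for illustration, and the sketch assumes non-negative millisecond timestamps and a zero window offset.

```scala
object TumblingBounds {
  // Start of the tumbling window that contains `ts`
  // (assumes ts >= 0, milliseconds, no window offset).
  def windowStart(ts: Long, sizeMs: Long): Long = ts - (ts % sizeMs)

  // The window [start, end) that `ts` is assigned to.
  def windowFor(ts: Long, sizeMs: Long): (Long, Long) = {
    val start = windowStart(ts, sizeMs)
    (start, start + sizeMs)
  }

  // Left-closed, right-open membership test: start is included, end is not.
  def contains(window: (Long, Long), ts: Long): Boolean =
    ts >= window._1 && ts < window._2

  def main(args: Array[String]): Unit = {
    val size = 5000L // 5 seconds, matching Time.seconds(5) in the listing above
    Seq(0L, 4999L, 5000L, 12345L).foreach { ts =>
      val (start, end) = windowFor(ts, size)
      println(s"ts=$ts -> [$start, $end)")
    }
  }
}
```

A record stamped exactly 5000 therefore belongs to [5000, 10000) and not to [0, 5000).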

Tumbling window example with watermarks

package flink.chapter5WaterMark
import org.apache.flink.api.java.tuple.Tuple
import org.apache.flink.api.scala._
import org.apache.flink.streaming.api.TimeCharacteristic
import org.apache.flink.streaming.api.functions.timestamps.BoundedOutOfOrdernessTimestampExtractor
import org.apache.flink.streaming.api.scala.{DataStream, KeyedStream, StreamExecutionEnvironment, WindowedStream}
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows
import org.apache.flink.streaming.api.windowing.time.Time
import org.apache.flink.streaming.api.windowing.windows.TimeWindow
import scala.collection.mutable

object TumblingWindow_Demo {
  def main(args: Array[String]): Unit = {
    // Build the stream execution environment
    val env: StreamExecutionEnvironment = StreamExecutionEnvironment.getExecutionEnvironment
    // Use a parallelism of 1 so the output is easy to follow
    env.setParallelism(1)
    // Use event time
    env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)
    // Read lines from a socket
    val data: DataStream[String] = env.socketTextStream("hadoop101",9999)
    // Parse each line of the form "<word> <timestamp>" into (word, timestamp, 1)
    val file: DataStream[(String, Long, Int)] = data.map(text => {
      val arr: Array[String] = text.split(" ")
      (arr(0), arr(1).toLong, 1)
    })
    // Assign timestamps and watermarks; the constructor argument is the watermark delay
    // (how far the watermark lags behind the largest timestamp seen so far)
    val fileDataStream: DataStream[(String, Long, Int)] = file.assignTimestampsAndWatermarks(new BoundedOutOfOrdernessTimestampExtractor[(String, Long, Int)](Time.seconds(2)) {
      // Extract the event timestamp (milliseconds) from the record
      override def extractTimestamp(t: (String, Long, Int)): Long = {
        t._2
      }
    })
    // Partition the stream by key (field 0, the word)
    val keyData: KeyedStream[(String, Long, Int), Tuple] = fileDataStream.keyBy(0)
    // Print every incoming record
    keyData.print("keyed:")
    // Assign 2-second tumbling event-time windows
    val window: WindowedStream[(String, Long, Int), Tuple, TimeWindow] = keyData.window(TumblingEventTimeWindows.of(Time.seconds(2)))
    // Collect the timestamps that fall into each window
    val result: DataStream[mutable.HashSet[Long]] = window.fold(new mutable.HashSet[Long]()) {
      case (set, (word, ts, count)) => set += ts
    }
    // Print the per-window result
    result.print("window::")
    env.execute()
  }
}
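The interaction between the 2-second watermark delay and the 2-second tumbling window can be traced by hand. The sketch below is a simplified, Flink-free simulation, not Flink's implementation: it assumes the watermark is updated after every record to (largest timestamp seen) minus the delay, whereas Flink emits watermarks periodically, and it assumes a window [start, end) fires once the watermark reaches end - 1. `WatermarkSim` and `firedWindows` are hypothetical names.

```scala
import scala.collection.mutable.ArrayBuffer

object WatermarkSim {
  // Returns (triggering timestamp, fired window) pairs in firing order.
  // Simplification: the watermark advances after every record.
  def firedWindows(timestamps: Seq[Long], delayMs: Long, sizeMs: Long): List[(Long, (Long, Long))] = {
    var watermark = Long.MinValue
    var open = Set.empty[Long] // start times of windows holding data that have not fired yet
    val fired = ArrayBuffer.empty[(Long, (Long, Long))]
    for (ts <- timestamps) {
      val start = ts - (ts % sizeMs)
      // A record is late (and dropped) if its window has already been passed by the watermark.
      if (start + sizeMs - 1 > watermark) open += start
      // The watermark never moves backwards.
      watermark = math.max(watermark, ts - delayMs)
      // Fire every open window whose last timestamp (end - 1) the watermark has reached.
      val ready = open.filter(s => s + sizeMs - 1 <= watermark).toList.sorted
      ready.foreach(s => fired += ((ts, (s, s + sizeMs))))
      open --= ready
    }
    fired.toList
  }

  def main(args: Array[String]): Unit = {
    val input = Seq(1000L, 1500L, 3000L, 3999L, 2500L, 6000L)
    firedWindows(input, delayMs = 2000L, sizeMs = 2000L).foreach {
      case (ts, (start, end)) => println(s"record $ts fires window [$start, $end)")
    }
  }
}
```

Under these assumptions the window [0, 2000) fires only when the record stamped 3999 arrives (watermark 1999), and the out-of-order record 2500 is still accepted into [2000, 4000), which fires when 6000 arrives.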

Sliding window example with watermarks

package flink.chapter5WaterMark
import org.apache.flink.api.java.tuple.Tuple
import org.apache.flink.api.scala._
import org.apache.flink.streaming.api.TimeCharacteristic
import org.apache.flink.streaming.api.functions.timestamps.BoundedOutOfOrdernessTimestampExtractor
import org.apache.flink.streaming.api.scala.{DataStream, KeyedStream, StreamExecutionEnvironment, WindowedStream}
import org.apache.flink.streaming.api.windowing.assigners.SlidingEventTimeWindows
import org.apache.flink.streaming.api.windowing.time.Time
import org.apache.flink.streaming.api.windowing.windows.TimeWindow
import scala.collection.mutable

object SlidingWindow_Demo {
  def main(args: Array[String]): Unit = {
    // Build the stream execution environment
    val env: StreamExecutionEnvironment = StreamExecutionEnvironment.getExecutionEnvironment
    // Use a parallelism of 1 so the output is easy to follow
    env.setParallelism(1)
    // Use event time
    env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)
    // Read lines from a socket
    val data: DataStream[String] = env.socketTextStream("hadoop101",9999)
    // Parse each line of the form "<word> <timestamp>" into (word, timestamp, 1)
    val file: DataStream[(String, Long, Int)] = data.map(text => {
      val arr: Array[String] = text.split(" ")
      (arr(0), arr(1).toLong, 1)
    })
    // Assign timestamps and watermarks; the constructor argument is the watermark delay
    val fileDataStream: DataStream[(String, Long, Int)] = file.assignTimestampsAndWatermarks(new BoundedOutOfOrdernessTimestampExtractor[(String, Long, Int)](Time.seconds(2)) {
      // Extract the event timestamp (milliseconds) from the record
      override def extractTimestamp(t: (String, Long, Int)): Long = {
        t._2
      }
    })
    // Partition the stream by key (field 0, the word)
    val keyed: KeyedStream[(String, Long, Int), Tuple] = fileDataStream.keyBy(0)
    // Print every incoming record
    keyed.print("keyed:")
    // Window size 2 s, slide 2 s; with equal size and slide this behaves like a tumbling window
    val windows: WindowedStream[(String, Long, Int), Tuple, TimeWindow] = keyed.window(SlidingEventTimeWindows.of(Time.seconds(2),Time.seconds(2)))
    // Collect the timestamps that fall into each window
    val result: DataStream[mutable.HashSet[Long]] = windows.fold(new mutable.HashSet[Long]()) {
      case (set, (word, ts, count)) => set += ts
    }
    // Print the collected timestamps
    result.print("windows:::")
    // Start the job
    env.execute()
  }
}
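Unlike a tumbling window, a sliding window can place one record in several windows. The following Flink-free sketch shows the assignment arithmetic under the assumption of millisecond timestamps, non-negative values, and a zero offset; `SlidingAssign` and `windowsFor` are hypothetical names used only for this illustration.

```scala
object SlidingAssign {
  // All windows [start, start + size) that contain `ts`, newest first.
  // Assumes ts >= 0, milliseconds, no window offset.
  def windowsFor(ts: Long, sizeMs: Long, slideMs: Long): List[(Long, Long)] = {
    val lastStart = ts - (ts % slideMs) // latest window start that is <= ts
    Iterator.iterate(lastStart)(_ - slideMs)
      .takeWhile(start => start > ts - sizeMs) // window must still cover ts
      .map(start => (start, start + sizeMs))
      .toList
  }

  def main(args: Array[String]): Unit = {
    // size 10 s, slide 5 s: every record lands in two overlapping windows
    println(windowsFor(12000L, 10000L, 5000L))
    // size 2 s, slide 2 s: exactly one window per record, i.e. tumbling behavior
    println(windowsFor(3000L, 2000L, 2000L))
  }
}
```

With size 10 s and slide 5 s, a record stamped 12000 belongs to both [10000, 20000) and [5000, 15000). With size and slide both set to 2 s, as in the example above, each record belongs to a single window.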

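Session window example with watermarks
A session window fires once the EventTime difference between two adjacent records exceeds the configured gap. When a watermark is used, firing is postponed even after that condition is met: the window is triggered only when the watermark, which lags behind by the configured delay, reaches it.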

package flink.chapter5WaterMark
import org.apache.flink.api.java.tuple.Tuple
import org.apache.flink.api.scala._
import org.apache.flink.streaming.api.TimeCharacteristic
import org.apache.flink.streaming.api.functions.timestamps.BoundedOutOfOrdernessTimestampExtractor
import org.apache.flink.streaming.api.scala.{DataStream, KeyedStream, StreamExecutionEnvironment, WindowedStream}
import org.apache.flink.streaming.api.windowing.assigners.EventTimeSessionWindows
import org.apache.flink.streaming.api.windowing.time.Time
import org.apache.flink.streaming.api.windowing.windows.TimeWindow

object SessionWindow_Demo {
  def main(args: Array[String]): Unit = {
    // Build the Flink stream execution environment
    val env: StreamExecutionEnvironment = StreamExecutionEnvironment.getExecutionEnvironment
    // Use a parallelism of 1 so the output is easy to follow
    env.setParallelism(1)
    // Use event time
    env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)
    // Read lines from a socket
    val data: DataStream[String] = env.socketTextStream("hadoop101",9999)
    // Parse each line of the form "<word> <timestamp>" into (word, timestamp, 1)
    val file: DataStream[(String, Long, Int)] = data.map(text => {
      val arr: Array[String] = text.split(" ")
      (arr(0), arr(1).toLong, 1)
    })
    // Assign timestamps and watermarks with a 2-second watermark delay
    val fileDataStream: DataStream[(String, Long, Int)] = file.assignTimestampsAndWatermarks(new BoundedOutOfOrdernessTimestampExtractor[(String, Long, Int)](Time.seconds(2)) {
      // Extract the event timestamp (milliseconds) from the record
      override def extractTimestamp(t: (String, Long, Int)): Long = {
        t._2
      }
    })
    // Partition the stream by key (field 0, the word)
    val keyed: KeyedStream[(String, Long, Int), Tuple] = fileDataStream.keyBy(0)
    // Print every incoming record
    keyed.print("keyed:")
    // Use session windows with a 2-second gap
    val window: WindowedStream[(String, Long, Int), Tuple, TimeWindow] = keyed.window(EventTimeSessionWindows.withGap(Time.seconds(2)))
    // Count the occurrences per session; 0L is only a placeholder for the timestamp field
    window.reduce((text1,text2)=>{
      (text1._1,0L,text1._3+text2._3)
    }).map(_._3).print("window::")
    // Start the job
    env.execute()
  }
}

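Records separated by more than two seconds fall into different session windows, and each window fires after a further 2-second watermark delay. Windows are left-closed, right-open intervals.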
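The grouping rule can be sketched without Flink. The code below is a simplified illustration, assuming the timestamps of one key are already known and that a new session starts whenever the difference to the previous timestamp exceeds the gap, as described in the text; `SessionSketch`, `sessions`, and `sessionWindow` are hypothetical names, and Flink's actual window-merging logic is not reproduced here.

```scala
object SessionSketch {
  // Splits the timestamps of one key into sessions: a new session starts
  // whenever the difference to the previous timestamp exceeds `gapMs`.
  def sessions(timestamps: Seq[Long], gapMs: Long): List[List[Long]] =
    timestamps.sorted.foldLeft(List.empty[List[Long]]) {
      // `current.head` is the most recent timestamp of the session being built
      case (current :: done, ts) if ts - current.head <= gapMs => (ts :: current) :: done
      case (acc, ts) => List(ts) :: acc
    }.map(_.reverse).reverse

  // Session window as a left-closed, right-open interval:
  // [first timestamp, last timestamp + gap).
  def sessionWindow(session: List[Long], gapMs: Long): (Long, Long) =
    (session.head, session.last + gapMs)

  def main(args: Array[String]): Unit = {
    val gap = 2000L
    sessions(Seq(1000L, 2000L, 6000L, 7500L, 12000L), gap).foreach { s =>
      val (start, end) = sessionWindow(s, gap)
      println(s"session [$start, $end) holds ${s.size} record(s): $s")
    }
  }
}
```

For this input the sketch yields three sessions with 2, 2, and 1 records, which corresponds to the counts the reduce step in the example would print once each window fires.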