Implementing Timed Window Computation with KeyedProcessFunction

乖乖猪001

Categories: flink, big data. Tags: flink

Article link: https://blog.csdn.net/xiaozhaoshigedasb/article/details/112238460

1. EventTime

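// Imports needed by this snippet
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.functions.timestamps.BoundedOutOfOrdernessTimestampExtractor
import org.apache.flink.streaming.api.windowing.time.Time

// AdData is assumed to be a case class with fields (id: Int, devId: String, time: Long),
// as inferred from how it is used below; AdKey wraps the id used for keying.

// After the data has been cleaned and filtered, assign timestamps and generate watermarks
// (events may arrive up to 1 minute out of order)
val ds = env.addSource(consumer)
  .map(x => {
    val s = x.split(",")
    AdData(s(0).toInt, s(1), s(2).toLong)
  })
  .assignTimestampsAndWatermarks(new BoundedOutOfOrdernessTimestampExtractor[AdData](Time.minutes(1)) {
    override def extractTimestamp(element: AdData): Long = element.time
  })
  .keyBy(x => {
    AdKey(x.id)
  })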
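
To make the watermark step more concrete, here is a minimal plain-Scala model of how a bounded-out-of-orderness watermark trails the largest event time seen so far. It is a simplified sketch, not Flink's implementation: `WatermarkModel`, `BoundedWatermark` and `observe` are hypothetical names, and real Flink emits watermarks periodically rather than on every record.

```scala
// Simplified model of a bounded-out-of-orderness watermark (no Flink dependency).
// All names in this object are illustrative assumptions, not Flink API.
object WatermarkModel {
  final class BoundedWatermark(maxOutOfOrdernessMs: Long) {
    // Largest event time seen so far; initialised so the first subtraction cannot underflow.
    private var maxTimestamp: Long = Long.MinValue + maxOutOfOrdernessMs

    // Record an event time and return the resulting watermark:
    // the largest timestamp seen so far minus the allowed out-of-orderness.
    def observe(eventTimeMs: Long): Long = {
      maxTimestamp = math.max(maxTimestamp, eventTimeMs)
      maxTimestamp - maxOutOfOrdernessMs
    }
  }

  def main(args: Array[String]): Unit = {
    val wm = new BoundedWatermark(60000L) // 1 minute, as in Time.minutes(1)
    println(wm.observe(120000L)) // event at 00:02:00
    println(wm.observe(90000L))  // an older event does not move the watermark back
  }
}
```

The point of the model is that the watermark never decreases and always lags the newest event by the configured delay, which is what the lateness check in the process function relies on.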

class Distinct1ProcessFunction extends KeyedProcessFunction[AdKey, AdData, Void] {
  var devIdState: MapState[String, Int] = _
  var devIdStateDesc: MapStateDescriptor[String, Int] = _

  var countState: ValueState[Long] = _
  var countStateDesc: ValueStateDescriptor[Long] = _

  override def open(parameters: Configuration): Unit = {

    devIdStateDesc = new MapStateDescriptor[String, Int]("devIdState", TypeInformation.of(classOf[String]), TypeInformation.of(classOf[Int]))
    devIdState = getRuntimeContext.getMapState(devIdStateDesc)

    countStateDesc = new ValueStateDescriptor[Long]("countState", TypeInformation.of(classOf[Long]))
    countState = getRuntimeContext.getState(countStateDesc)
  }

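  // Called once for every record
  override def processElement(value: AdData, ctx: KeyedProcessFunction[AdKey, AdData, Void]#Context, out: Collector[Void]): Unit = {
    // Compute the end of the window that contains this record's event time.
    // Example: if value.time is 10:20:40 and the span is Time.hours(1).toMilliseconds (1 hour),
    // TimeWindow.getWindowStartWithOffset(value.time, 0, Time.hours(1).toMilliseconds) returns 10:00:00.
    // Adding one hour gives the window end time for this event: 11:00:00.
    val endTime = TimeWindow.getWindowStartWithOffset(value.time, 0,
      Time.hours(1).toMilliseconds) + Time.hours(1).toMilliseconds
    // Get the current watermark
    val currW = ctx.timerService().currentWatermark()
    // Badly delayed records could corrupt results that were already computed, so apply a lateness check
    // similar to the one in the window mechanism and drop late records (they could also be handled
    // separately through an OutputTag side output).
    // A record is late if the end time of its window <= the current watermark; late records are dropped.
    // Example: with the watermark at 11:01:00, a record with event time 11:02:00 has endTime 12:00:00, which is fine.
    // A record with event time 10:58:00 then has endTime 11:00:00, so watermark > endTime and it is late.
    // Why +1: if the watermark is 10:00:00 and a record with endTime 10:00:00 arrives, the computation would
    // already count as triggered; with +1 the threshold becomes 10:00:01 (the same timestamp the timer is
    // registered at), so the record is still processed normally.
    if (endTime + 1 <= currW) {
      println("late data:" + value)
      return
    }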

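    val devId = value.devId
    devIdState.get(devId) match {
      case 1 => {
        // This devId has already been seen; nothing to do
      }
      case _ => {
        // Not present yet, i.e. the first record with this devId for the current key
        devIdState.put(devId, 1)
        val c = countState.value()
        countState.update(c + 1)
        // A timer has to be registered the first time; Flink deduplicates repeated
        // registrations for the same key and timestamp
        ctx.timerService().registerEventTimeTimer(endTime + 1)
      }
    }
    println(countState.value())
  }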

  override def onTimer(timestamp: Long, ctx: KeyedProcessFunction[AdKey, AdData, Void]#OnTimerContext, out: Collector[Void]): Unit = {
    println(timestamp + " exec clean~~~")
    println(countState.value())
    devIdState.clear()
    countState.clear()
  }
}
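
The window-end arithmetic and the lateness check used in `processElement` can be tried out without a Flink cluster. The sketch below is plain Scala under stated assumptions: `windowStart` reproduces what I understand `TimeWindow.getWindowStartWithOffset` to compute for non-negative timestamps, and `WindowMath`, `windowEnd`, `isLate` and `at` are hypothetical helper names, with times expressed as milliseconds since midnight for readability.

```scala
// Plain-Scala sketch of the window-end and late-data arithmetic (no Flink dependency).
// All names here are illustrative; times are milliseconds since midnight.
object WindowMath {
  val HourMs: Long = 60L * 60L * 1000L

  // Start of the tumbling window containing `timestamp`
  // (assumed equivalent to TimeWindow.getWindowStartWithOffset for non-negative inputs).
  def windowStart(timestamp: Long, offset: Long, size: Long): Long =
    timestamp - (timestamp - offset + size) % size

  // End of the tumbling window containing `timestamp`.
  def windowEnd(timestamp: Long, size: Long): Long =
    windowStart(timestamp, 0L, size) + size

  // Same test as in processElement: the record is late once the watermark
  // has reached the timer timestamp endTime + 1.
  def isLate(eventTime: Long, watermark: Long, size: Long): Boolean =
    windowEnd(eventTime, size) + 1 <= watermark

  // Helper: hh:mm:ss as milliseconds since midnight.
  def at(h: Int, m: Int, s: Int): Long = ((h * 60L + m) * 60L + s) * 1000L

  def main(args: Array[String]): Unit = {
    val watermark = at(11, 1, 0)
    println(windowEnd(at(10, 20, 40), HourMs) == at(11, 0, 0)) // 10:20:40 falls in [10:00, 11:00)
    println(isLate(at(10, 58, 0), watermark, HourMs))          // window ended before the watermark
    println(isLate(at(11, 2, 0), watermark, HourMs))           // window still open
  }
}
```

The three checks correspond to the examples given in the comments of `processElement`: a 10:20:40 event belongs to the window ending at 11:00:00, a 10:58:00 event is late once the watermark is 11:01:00, and an 11:02:00 event is not.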

2. ProcessTime

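// In the KeyedProcessFunction, use the following code to compute the wall-clock-aligned end time for the timer.
// Example: if the current time is 10:05:23 and the window span is 1 minute, the window start is 10:05:00.
// Example: if the current time is 10:13:21 and the window span is 10 minutes, the window start is 10:10:00.
val ttlTime = TimeWindow.getWindowStartWithOffset(System.currentTimeMillis(), 0,
    Time.minutes(1).toMilliseconds) + Time.minutes(1).toMilliseconds
// AdKey(x.id,endTime)   <- stray fragment in the original post, apparently from a keyBy whose key includes the window end time
ctx.timerService().registerProcessingTimeTimer(ttlTime)
// For subsequent computations, it is enough to call ctx.timerService().registerProcessingTimeTimer(ttlTime) again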
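
One way to read the note about subsequent computations is that each time the timer fires, the next wall-clock-aligned trigger is computed and registered again. The following plain-Scala sketch shows only that arithmetic; `AlignedTimer`, `nextAlignedTrigger` and `at` are hypothetical names, and times are milliseconds since midnight rather than real `System.currentTimeMillis()` values.

```scala
// Sketch of computing wall-clock-aligned trigger times (no Flink dependency).
// Names are illustrative; inputs are assumed to be non-negative milliseconds.
object AlignedTimer {
  // Next boundary strictly after the start of the interval that contains `nowMs`.
  // For non-negative inputs this equals window start + interval.
  def nextAlignedTrigger(nowMs: Long, intervalMs: Long): Long =
    nowMs - (nowMs % intervalMs) + intervalMs

  // Helper: hh:mm:ss as milliseconds since midnight.
  def at(h: Int, m: Int, s: Int): Long = ((h * 60L + m) * 60L + s) * 1000L

  def main(args: Array[String]): Unit = {
    val minute = 60L * 1000L
    val first = nextAlignedTrigger(at(10, 5, 23), minute) // first timer for a record seen at 10:05:23
    val second = nextAlignedTrigger(first, minute)        // re-registration when the first timer fires
    println(first == at(10, 6, 0))
    println(second == at(10, 7, 0))
  }
}
```

In a Flink job the value returned by `nextAlignedTrigger` would be passed to `registerProcessingTimeTimer`, once in `processElement` and again in `onTimer` if the computation should repeat.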

3. Using neither EventTime nor ProcessTime

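// This case is simpler: keyBy does not need to include the time
xxx.keyBy(x=>{
        AdKey(x.id)
      })
// When setting the timer in the subsequent KeyedProcessFunction, just use the current time plus the run interval
val ttlTime: Long = System.currentTimeMillis() + 15 * 60 * 1000 // run every 15 minutes
ctx.timerService().registerProcessingTimeTimer(ttlTime)
// The difference from ProcessTime is that here the 15 minutes start counting when the first record for each key arrives, which leads to this situation:
// e.g. key_a's data arrives at 10:00:00 and key_b's data arrives at 10:01:23
// then key_a's timer fires at 10:15:00 and key_b's timer fires at 10:16:23
// This approach guarantees that every key receives a full 15 minutes of data, whereas ProcessTime fires on wall-clock boundaries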
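
The contrast between the two processing-time strategies can be checked with the example from the comments above. This is a plain-Scala sketch with hypothetical names (`TimerStrategies`, `relativeTrigger`, `alignedTrigger`, `at`); times are milliseconds since midnight.

```scala
// Compares "first record + interval" timers with wall-clock-aligned timers.
// No Flink dependency; all names are illustrative.
object TimerStrategies {
  val IntervalMs: Long = 15L * 60L * 1000L // 15 minutes

  // Section 3 approach: each key starts counting at its own first record.
  def relativeTrigger(firstSeenMs: Long): Long = firstSeenMs + IntervalMs

  // Section 2 approach: every key fires on the next 15-minute boundary.
  def alignedTrigger(firstSeenMs: Long): Long =
    firstSeenMs - (firstSeenMs % IntervalMs) + IntervalMs

  // Helper: hh:mm:ss as milliseconds since midnight.
  def at(h: Int, m: Int, s: Int): Long = ((h * 60L + m) * 60L + s) * 1000L

  def main(args: Array[String]): Unit = {
    val keyA = at(10, 0, 0)  // key_a's first record
    val keyB = at(10, 1, 23) // key_b's first record
    println(relativeTrigger(keyA) == at(10, 15, 0))
    println(relativeTrigger(keyB) == at(10, 16, 23))
    println(alignedTrigger(keyA) == alignedTrigger(keyB)) // both fire at 10:15:00
  }
}
```

With relative timers key_b gets its full 15 minutes but fires 83 seconds after key_a; with aligned timers both keys fire together at 10:15:00, and key_b has collected slightly less than 15 minutes of data.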
