Using AggregateFunction and ProcessFunction

zzz0286

Categories: Spark, Flink

Article link: https://blog.csdn.net/xz370057448/article/details/105746320


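AggregateFunction works like an accumulator: each incoming element is folded into an intermediate accumulator value, and the result is read from that accumulator when the window fires.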

import org.apache.flink.api.common.functions.AggregateFunction
import org.apache.flink.streaming.api.functions.source.SourceFunction
import org.apache.flink.streaming.api.scala.{DataStream, StreamExecutionEnvironment}
import org.apache.flink.streaming.api.windowing.time.Time
import org.apache.flink.api.scala._

object aggregateTest {
  def main(args: Array[String]): Unit = {

    // Set up the execution environment
    val env: StreamExecutionEnvironment = StreamExecutionEnvironment.createLocalEnvironment()
    // Set the parallelism
    env.setParallelism(1)

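    // Set up the source: emit the same sentence repeatedly until the job is cancelled
    val sourcestream: DataStream[String] = env.addSource(new SourceFunction[String] {
      // Flag flipped by cancel() so that run() can leave its loop
      @volatile private var running = true

      override def run(ctx: SourceFunction.SourceContext[String]): Unit = {
        while (running) {
          ctx.collect("hello hello hadoop spark hello")
          // Pause between emissions so the source does not flood the pipeline
          Thread.sleep(10000)
        }
      }

      override def cancel(): Unit = {
        running = false
      }
    })
    sourcestream.print("source stream")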

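    sourcestream
      .flatMap(_.split(" "))
      .map((_, 1L))
      // Key by the word itself; a key selector is type-safe, unlike the positional keyBy(0)
      .keyBy(_._1)
      // Size and slide are both 3 seconds, so this behaves like a tumbling window
      .timeWindow(Time.seconds(3), Time.seconds(3))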
      .aggregate(new AggregateFunction[(String, Long), (String, Long), (String, Long)] {
        override def createAccumulator(): (String, Long) = ("", 0L)

        override def add(value: (String, Long), accumulator: (String, Long)): (String, Long) = {
          (value._1, accumulator._2 + value._2)
        }

        override def getResult(accumulator: (String, Long)): (String, Long) = accumulator

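        override def merge(a: (String, Long), b: (String, Long)): (String, Long) = {
          // Keep whichever key is non-empty, since a fresh accumulator carries ""
          (if (a._1.nonEmpty) a._1 else b._1, a._2 + b._2)
        }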
      }).print()

    env.execute("word count")

  }
}

// Demo adapted from an example found online
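In the demo above the input, accumulator, and output types are all (String, Long). The three type parameters of AggregateFunction can differ, though, which is where the accumulator idea becomes more visible. Below is a minimal sketch, assuming flink-core is on the classpath; the class name AverageAggregate and the sample values are hypothetical and not part of the original demo. It calls the four methods by hand, without starting a job, to show the order in which Flink would use them.

```scala
import org.apache.flink.api.common.functions.AggregateFunction

// Hypothetical example: averages Long values using a (sum, count) accumulator
class AverageAggregate extends AggregateFunction[Long, (Long, Long), Double] {

  // Start every window with an empty (sum, count) pair
  override def createAccumulator(): (Long, Long) = (0L, 0L)

  // Fold one element into the accumulator
  override def add(value: Long, acc: (Long, Long)): (Long, Long) =
    (acc._1 + value, acc._2 + 1L)

  // Turn the accumulator into the final result; an empty accumulator yields 0.0
  override def getResult(acc: (Long, Long)): Double =
    if (acc._2 == 0L) 0.0 else acc._1.toDouble / acc._2

  // Combine two partial accumulators (needed for merging windows such as session windows)
  override def merge(a: (Long, Long), b: (Long, Long)): (Long, Long) =
    (a._1 + b._1, a._2 + b._2)
}

object AverageAggregateDemo {
  def main(args: Array[String]): Unit = {
    val agg = new AverageAggregate
    // Simulate what happens inside one window: create, add each element, read the result
    val acc = Seq(2L, 4L, 9L).foldLeft(agg.createAccumulator())((a, v) => agg.add(v, a))
    println(agg.getResult(acc)) // prints 5.0
  }
}
```

Because only the small accumulator is kept per window, rather than every element, this style of aggregation tends to use far less state than buffering the whole window.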

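ProcessFunction is Flink's lowest-level API. Compared with the regular DataStream operators, it gives access to more information, such as element timestamps and watermarks, and it can react to specific events through timers.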
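As a rough illustration of that extra information, the sketch below reads the element timestamp and the current watermark inside processElement. It assumes Flink 1.x with the Scala-friendly Java ProcessFunction class on the classpath; the class name TimestampTagger and the output format are hypothetical.

```scala
import org.apache.flink.streaming.api.functions.ProcessFunction
import org.apache.flink.util.Collector

// Hypothetical example: tags every element with its timestamp and the current watermark
class TimestampTagger extends ProcessFunction[String, String] {

  override def processElement(
      value: String,
      ctx: ProcessFunction[String, String]#Context,
      out: Collector[String]): Unit = {
    // Element timestamp as java.lang.Long; null when no timestamp has been assigned
    val ts = ctx.timestamp()
    // Current watermark as seen by this operator
    val wm = ctx.timerService().currentWatermark()
    out.collect(s"$value ts=$ts wm=$wm")
  }
}
```

It could be attached with something like sourcestream.process(new TimestampTagger). In a processing-time job with no timestamp assigner, the timestamp is expected to be null.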

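A ProcessFunction is typically built around three methods:

open(): called once when the function is initialized. Here it is used to create a ListState that stores the data.

processElement(v: IN, ctx: Context, out: Collector[OUT]): called for every element. Here it adds the element to the ListState.

onTimer(timestamp: Long, ctx: OnTimerContext, out: Collector[OUT]): called when a registered timer fires. Here it emits the result.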
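The three methods above can be sketched as follows. This is a minimal sketch, not a definitive implementation: it assumes Flink 1.x with the Scala API, and it uses KeyedProcessFunction because ListState and timers require a keyed stream. The class name BufferAndFlush, the state name "buffer", and the 3-second interval are all hypothetical choices.

```scala
import org.apache.flink.api.common.state.{ListState, ListStateDescriptor}
import org.apache.flink.api.scala._
import org.apache.flink.configuration.Configuration
import org.apache.flink.streaming.api.functions.KeyedProcessFunction
import org.apache.flink.util.Collector

import scala.collection.JavaConverters._

// Hypothetical example: buffers (word, count) pairs per key and emits their sum on a timer
class BufferAndFlush
    extends KeyedProcessFunction[String, (String, Long), (String, Long)] {

  // Flush interval in milliseconds (assumed value)
  private val intervalMs = 3000L

  private var buffer: ListState[(String, Long)] = _

  // open(): create the ListState once per parallel instance
  override def open(parameters: Configuration): Unit = {
    val descriptor = new ListStateDescriptor[(String, Long)](
      "buffer", createTypeInformation[(String, Long)])
    buffer = getRuntimeContext.getListState(descriptor)
  }

  // processElement(): store the element and make sure a timer is registered
  override def processElement(
      value: (String, Long),
      ctx: KeyedProcessFunction[String, (String, Long), (String, Long)]#Context,
      out: Collector[(String, Long)]): Unit = {
    buffer.add(value)
    val now = ctx.timerService().currentProcessingTime()
    // Align to the next interval boundary; timers with the same key and timestamp are deduplicated
    val fireAt = (now / intervalMs + 1) * intervalMs
    ctx.timerService().registerProcessingTimeTimer(fireAt)
  }

  // onTimer(): emit the sum of everything buffered for the current key, then reset
  override def onTimer(
      timestamp: Long,
      ctx: KeyedProcessFunction[String, (String, Long), (String, Long)]#OnTimerContext,
      out: Collector[(String, Long)]): Unit = {
    // Guard against a null iterable, which an empty ListState may return
    val buffered = Option(buffer.get()).map(_.asScala.toList).getOrElse(Nil)
    if (buffered.nonEmpty) {
      out.collect((ctx.getCurrentKey, buffered.map(_._2).sum))
    }
    buffer.clear()
  }
}
```

Applied to the word stream from the demo, usage might look like .map((_, 1L)).keyBy(_._1).process(new BufferAndFlush).print(). Unlike the AggregateFunction version, this keeps every element in state until the timer fires, which is more flexible but also more expensive.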
