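AggregateFunction works like an accumulator: it creates an accumulator for each window, folds every incoming element into it with add(), and reads the final value with getResult() when the window fires.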
import org.apache.flink.api.common.functions.AggregateFunction
import org.apache.flink.streaming.api.functions.source.SourceFunction
import org.apache.flink.streaming.api.scala.{DataStream, StreamExecutionEnvironment}
import org.apache.flink.streaming.api.windowing.time.Time
import org.apache.flink.api.scala._

object aggregateTest {
  def main(args: Array[String]): Unit = {
    // Set up a local execution environment
    val env: StreamExecutionEnvironment = StreamExecutionEnvironment.createLocalEnvironment()
    // Set the parallelism
    env.setParallelism(1)
    // Define the source: it emits the same sentence over and over
    val sourcestream: DataStream[String] = env.addSource(new SourceFunction[String] {
      // Flag flipped by cancel() so that run() can exit its loop
      @volatile private var running = true

      override def run(ctx: SourceFunction.SourceContext[String]): Unit = {
        while (running) {
          ctx.collect("hello hello hadoop spark hello")
          // Pause between emissions so the source does not flood the job
          Thread.sleep(10000)
        }
      }

      override def cancel(): Unit = {
        running = false
      }
    })
    sourcestream.print("source stream")
    sourcestream
      .flatMap(_.split(" "))
      .map((_, 1L))
      // Key by the word (first tuple field)
      .keyBy(_._1)
      // Window size 3 s, slide 3 s
      .timeWindow(Time.seconds(3), Time.seconds(3))
      .aggregate(new AggregateFunction[(String, Long), (String, Long), (String, Long)] {
        // Empty accumulator: no word yet, count 0
        override def createAccumulator(): (String, Long) = ("", 0L)

        // Fold one element into the accumulator
        override def add(value: (String, Long), accumulator: (String, Long)): (String, Long) = {
          (value._1, accumulator._2 + value._2)
        }

        // Emit the accumulator as the window result
        override def getResult(accumulator: (String, Long)): (String, Long) = accumulator

        // Combine two partial accumulators for the same key
        override def merge(a: (String, Long), b: (String, Long)): (String, Long) = {
          (a._1, a._2 + b._2)
        }
      }).print()
    env.execute("word count")
  }
}
// Demo adapted from an example found online
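To make the accumulator idea concrete without starting a Flink job, here is a minimal plain-Scala sketch of the same four-method contract. The trait SimpleAggregate and the objects AverageAggregate and AccumulatorLifecycle are hypothetical names made up for this illustration; they only mirror the shape of Flink's AggregateFunction and are not part of Flink. The sketch uses an average, so the accumulator type (sum, count) differs from the result type.

```scala
// Hypothetical stand-in mirroring the four methods of Flink's AggregateFunction,
// so the sketch runs without Flink on the classpath.
trait SimpleAggregate[IN, ACC, OUT] {
  def createAccumulator(): ACC
  def add(value: IN, accumulator: ACC): ACC
  def getResult(accumulator: ACC): OUT
  def merge(a: ACC, b: ACC): ACC
}

// Average of Long values: the accumulator is (sum, count), the result is a Double.
object AverageAggregate extends SimpleAggregate[Long, (Long, Long), Double] {
  override def createAccumulator(): (Long, Long) = (0L, 0L)

  override def add(value: Long, acc: (Long, Long)): (Long, Long) =
    (acc._1 + value, acc._2 + 1L)

  override def getResult(acc: (Long, Long)): Double =
    if (acc._2 == 0L) 0.0 else acc._1.toDouble / acc._2

  override def merge(a: (Long, Long), b: (Long, Long)): (Long, Long) =
    (a._1 + b._1, a._2 + b._2)
}

object AccumulatorLifecycle {
  // Roughly what happens for one key in one window:
  // start from an empty accumulator, then add each element in turn.
  def aggregateWindow[IN, ACC, OUT](agg: SimpleAggregate[IN, ACC, OUT], elements: Seq[IN]): ACC =
    elements.foldLeft(agg.createAccumulator())((acc, v) => agg.add(v, acc))

  def main(args: Array[String]): Unit = {
    val left = aggregateWindow(AverageAggregate, Seq(1L, 2L, 3L))
    val right = aggregateWindow(AverageAggregate, Seq(10L))
    println(AverageAggregate.getResult(left))                                // 2.0
    println(AverageAggregate.getResult(AverageAggregate.merge(left, right))) // 4.0
  }
}
```

Only the small accumulator is kept per key and window, not the individual elements, which is why this style of aggregation is usually cheaper than buffering the whole window.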
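ProcessFunction is Flink's lowest-level API. Compared with the regular DataStream API operators, it gives access to more information, such as timestamps, watermarks, and other specific events like timers.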
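A typical ProcessFunction uses three methods:
open(): called once when the function is initialized; here it creates a ListState used to store the data (kept on the heap unless another state backend is configured)
processElement(v: IN, ctx: Context, out: Collector[OUT]): called for every element; here it adds the element to the ListState
onTimer(timestamp: Long, ctx: OnTimerContext, out: Collector[OUT]): called when a previously registered timer fires; here it emits the result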
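The three methods above can be sketched as follows. This is a minimal illustration, not a definitive implementation: it assumes a Flink 1.x release that still ships the Scala API and the open(Configuration) signature, and it uses KeyedProcessFunction because keyed state and timers are only available on a keyed stream. The class name BufferAndFlush, the state name "buffer", and the 3-second flush interval are assumptions chosen for the example.

```scala
import org.apache.flink.api.common.state.{ListState, ListStateDescriptor}
import org.apache.flink.api.scala._
import org.apache.flink.configuration.Configuration
import org.apache.flink.streaming.api.functions.KeyedProcessFunction
import org.apache.flink.util.Collector

import scala.collection.JavaConverters._

// Hypothetical example: buffers (word, count) pairs per key and emits their sum every 3 seconds.
class BufferAndFlush extends KeyedProcessFunction[String, (String, Long), (String, Long)] {

  // Assumed flush interval in milliseconds
  private val intervalMs = 3000L

  @transient private var buffer: ListState[(String, Long)] = _

  // Called once per parallel instance: obtain the ListState handle
  override def open(parameters: Configuration): Unit = {
    buffer = getRuntimeContext.getListState(
      new ListStateDescriptor[(String, Long)]("buffer", createTypeInformation[(String, Long)]))
  }

  // Called for every element: store it and make sure a timer is pending
  override def processElement(
      value: (String, Long),
      ctx: KeyedProcessFunction[String, (String, Long), (String, Long)]#Context,
      out: Collector[(String, Long)]): Unit = {
    buffer.add(value)
    // Align to the next interval boundary; Flink keeps at most one timer per key and timestamp,
    // so elements arriving in the same interval share a single timer.
    val now = ctx.timerService().currentProcessingTime()
    ctx.timerService().registerProcessingTimeTimer(now - (now % intervalMs) + intervalMs)
  }

  // Called when the timer fires: emit the sum for the current key and reset the buffer
  override def onTimer(
      timestamp: Long,
      ctx: KeyedProcessFunction[String, (String, Long), (String, Long)]#OnTimerContext,
      out: Collector[(String, Long)]): Unit = {
    val total = buffer.get().asScala.map(_._2).sum
    out.collect((ctx.getCurrentKey, total))
    buffer.clear()
  }
}
```

It could be attached to the word stream from the earlier demo with something like sourcestream.flatMap(_.split(" ")).map((_, 1L)).keyBy(_._1).process(new BufferAndFlush). Unlike the AggregateFunction version, this keeps every element in state until the timer fires, which costs more memory but leaves full control over when and what to emit.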