Flink DataStream Operators and Examples

Contents

1. Classification
  DataStream
  keyedStream
  window Stream
Key Examples
  DataStream
    Process (ProcessFunction)
    WindowAll: DataStream → AllWindowedStream
  keyedStream
  window Stream
    (1) Keyed and non-keyed windows
    (2) Predefined window assigners
      Tumbling windows
      Sliding windows
    Example: tumbling processing-time window
    Window Reduce
    Window Reduce + ProcessWindowFunction
    Window Fold
    Window Fold + ProcessWindowFunction
    Window Aggregate + ProcessWindowFunction
    Window process
    Window apply

1. Classification

DataStream

Map
DataStream → DataStream

FlatMap
DataStream → DataStream

Filter
DataStream → DataStream

KeyBy
DataStream → KeyedStream

WindowAll
DataStream → AllWindowedStream

Process (instantiates a ProcessFunction and handles each element individually)

DataStream → DataStream

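keyedStream [all stateful, rolling aggregations]

Reduce   (compare with Window Reduce)
KeyedStream → DataStream

Fold   (compare with Window Fold; deprecated in later Flink releases in favor of aggregate)
KeyedStream → DataStream

Aggregations (including sum/min/max)   (compare with Aggregations on windows)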
KeyedStream → DataStream

Window
KeyedStream → WindowedStream

Process

KeyedStream → DataStream
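
To make the "compare with Window Reduce" notes concrete, here is a minimal sketch in plain Scala collections (no Flink API involved). `Result` is a hypothetical stand-in for the article's `result` case class, and the object and method names are made up for illustration. It models a rolling reduce on a keyed stream, which emits an updated value for every incoming element, against a window reduce, which emits one value when the window fires.

```scala
object RollingReduceSketch {
  // Hypothetical stand-in for the article's `result` case class.
  final case class Result(itemId: Long, count: Long)

  private def sum(a: Result, b: Result): Result =
    Result(a.itemId, a.count + b.count)

  // Rolling reduce for ONE key: one output per input element,
  // each output being the running total so far.
  def rollingReduce(events: List[Result]): List[Result] = events match {
    case Nil          => Nil
    case head :: tail => tail.scanLeft(head)(sum)
  }

  // Window reduce over the same elements: a single value,
  // produced once when the window fires.
  def windowReduce(events: List[Result]): Option[Result] =
    events.reduceOption(sum)
}
```

With three elements of count 1 for the same key, the rolling variant yields counts 1, 2, 3, while the window variant yields only the final 3.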

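window Stream [aggregations scoped to each window; nothing carries over between windows]

Window Reduce    (use Window Reduce + ProcessWindowFunction if you also need the window metadata)
WindowedStream → DataStream

Window Fold       (use Window Fold + ProcessWindowFunction if you also need the window metadata)
WindowedStream → DataStream

Aggregations on windows   (use Window Aggregate + ProcessWindowFunction if you also need the window metadata)
WindowedStream → DataStream

window Process (processes the window's elements once the window fires)

WindowedStream → DataStream

Window Apply (processes the window's elements once the window fires; uses the legacy WindowFunction, an older form of ProcessWindowFunction without per-window keyed state)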
WindowedStream → DataStream
AllWindowedStream → DataStream

Key Examples

DataStream

Process

DataStream → DataStream

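import org.apache.flink.streaming.api.functions.ProcessFunction
import org.apache.flink.streaming.api.scala._
import org.apache.flink.util.Collector

val processStream: DataStream[result] = dataStream
      .process(new getAllFunction)

// Converts each UserBehavior into a result
class getAllFunction extends ProcessFunction[UserBehavior, result] {
  override def processElement(value: UserBehavior,
                              ctx: ProcessFunction[UserBehavior, result]#Context,
                              out: Collector[result]): Unit = {
    // Called once for every element
    value match {
      case behavior: UserBehavior => {
        out.collect(result(behavior.itemId, behavior.count))
      }
      // Only reachable when the input is null
      case _ => print("no way")
    }
  }
}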

WindowAll
DataStream → AllWindowedStream

    val resultDataStream: DataStream[String] = processStream
      .windowAll(TumblingProcessingTimeWindows.of(Time.seconds(5)))
      .apply((_: TimeWindow, input: Iterable[result], out: Collector[String]) => {
        out.collect(input.mkString(","))
      })
    resultDataStream.print()
    // Output: result(1715,1),result(1715,1),result(1715,1),result(1716,1),result(1716,1)

keyedStream

See the keyBy usage in the window Stream examples.

window Stream

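(1) Keyed and non-keyed windows.

The choice is between keyBy and windowAll. On a keyed stream, the window computation runs in parallel across multiple tasks, because each logical keyed stream is processed independently of the others. On a non-keyed stream (windowAll), the original stream is not split into logical streams, so all window logic runs in a single task with a parallelism of 1.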
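
The difference can be pictured with plain Scala collections. This is only a rough model under the assumption that one window's contents fit in a `Seq`; it does not use the Flink API, and `Result`, `keyed`, and `nonKeyed` are hypothetical names.

```scala
object KeyedVsNonKeyedSketch {
  // Hypothetical stand-in for the article's `result` case class.
  final case class Result(itemId: Long, count: Long)

  // keyBy: elements are split into one logical group per key;
  // each group can be handled by a different parallel task.
  def keyed(events: Seq[Result]): Map[Long, Seq[Result]] =
    events.groupBy(_.itemId)

  // windowAll: no split at all, so there is exactly one group
  // and therefore only one task can work on it.
  def nonKeyed(events: Seq[Result]): Seq[Seq[Result]] =
    Seq(events)
}
```

For five events spread over two item ids, `keyed` returns two independent groups, whereas `nonKeyed` always returns a single group containing all five events.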

(2) Predefined window assigners

Tumbling windows

Tumbling event-time windows
input
    .keyBy(<key selector>)
    .window(TumblingEventTimeWindows.of(Time.seconds(5)))
    .<windowed transformation>(<window function>); 

Tumbling processing-time windows
input
    .keyBy(<key selector>)
    .window(TumblingProcessingTimeWindows.of(Time.seconds(5)))
    .<windowed transformation>(<window function>);
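
As a rough illustration of how a tumbling assigner places an element, the sketch below computes the window boundaries by hand. It assumes epoch-aligned windows with no offset and non-negative millisecond timestamps; `TumblingWindowMath` and its methods are hypothetical helpers, not Flink classes.

```scala
object TumblingWindowMath {
  // Start of the tumbling window containing the timestamp:
  // round the timestamp down to a multiple of the window size.
  def windowStart(timestampMs: Long, sizeMs: Long): Long =
    timestampMs - (timestampMs % sizeMs)

  // Exclusive end of that window.
  def windowEnd(timestampMs: Long, sizeMs: Long): Long =
    windowStart(timestampMs, sizeMs) + sizeMs
}
```

For example, with 5-second windows the timestamp 1461756879000 falls into the window starting at 1461756875000 and ending at 1461756880000. Every element belongs to exactly one tumbling window.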

Sliding windows

Sliding event-time windows
input
    .keyBy(<key selector>)
    .window(SlidingEventTimeWindows.of(Time.seconds(10), Time.seconds(5)))
    .<windowed transformation>(<window function>);

Sliding processing-time windows
input
    .keyBy(<key selector>)
    .window(SlidingProcessingTimeWindows.of(Time.seconds(10), Time.seconds(5)))
    .<windowed transformation>(<window function>);
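
With a sliding assigner an element can belong to several overlapping windows (size / slide of them when the size is a multiple of the slide). The following sketch enumerates them by hand, assuming epoch-aligned windows with no offset and non-negative millisecond timestamps; `SlidingWindowMath` is a hypothetical helper, not a Flink class.

```scala
object SlidingWindowMath {
  // Start timestamps of all sliding windows that contain the element,
  // newest window first.
  def windowStarts(timestampMs: Long, sizeMs: Long, slideMs: Long): List[Long] = {
    // Latest window start that is <= the timestamp.
    val lastStart = timestampMs - (timestampMs % slideMs)
    // Step back one slide at a time while the window still covers the timestamp.
    Iterator
      .iterate(lastStart)(_ - slideMs)
      .takeWhile(start => start > timestampMs - sizeMs)
      .toList
  }
}
```

For a 10-second window sliding every 5 seconds, the timestamp 1461756879000 lands in two windows, starting at 1461756875000 and 1461756870000.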

Example: tumbling processing-time window

    //windowStream
    val windowStream: WindowedStream[(String, Long, Int), Tuple, TimeWindow] = textKeyStream.
      window(TumblingProcessingTimeWindows.of(Time.seconds(10)))

    //    textKeyStream.print("windowStream:")
    //windowStream:> (000002,1461756879000,1)
    //windowStream:> (000002,1461756879001,1)
    //windowStream:> (000002,1461756879002,1)

Window Reduce    (use Window Reduce + ProcessWindowFunction if you also need the window metadata)

WindowedStream → DataStream

    val reduceValue: DataStream[result] = dataStream
      .process(new getLastFunction)
      .keyBy("itemId")
      .window(TumblingProcessingTimeWindows.of(Time.seconds(15)))
      .reduce { (v1, v2) => result(v1.itemId, v1.count + v2.count) }
    reduceValue.print()

Window Reduce + ProcessWindowFunction

WindowedStream → DataStream: adds the window as a parameter and changes the element type of the resulting DataStream

    val reduceWindowFunctionData: DataStream[String] = dataStream
      .process(new getLastFunction)
      .keyBy("itemId")
      .window(TumblingProcessingTimeWindows.of(Time.seconds(15)))
      .reduce((v1, v2) => result(v1.itemId, v1.count + v2.count)
        , (key: Tuple, window: TimeWindow, input: Iterable[result], out: Collector[String]) => {
          input.foreach(ele=>out.collect((s"${window.getStart}, $ele")))
        }
      )
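
When a reduce is combined with a window function, the elements are pre-aggregated incrementally and the window function receives only the single reduced value per key, together with the window. A minimal model of that behavior in plain Scala, with no Flink API (`Result`, `Window`, and `fire` are hypothetical names), could look like this:

```scala
object ReduceWithWindowInfoSketch {
  // Hypothetical stand-ins for the article's `result` class and Flink's TimeWindow.
  final case class Result(itemId: Long, count: Long)
  final case class Window(start: Long, end: Long)

  // Models what happens for ONE key when the window fires.
  def fire(window: Window, contents: Seq[Result]): Seq[String] = {
    // Step 1: incremental reduce collapses the window contents to one value.
    val preAggregated: Option[Result] =
      contents.reduceOption((v1, v2) => Result(v1.itemId, v1.count + v2.count))
    // Step 2: the window function sees that one value plus the window metadata.
    preAggregated.toSeq.map(ele => s"${window.start}, $ele")
  }
}
```

Two elements of count 1 in the window [1575213330000, 1575213345000) produce the single string "1575213330000, Result(1715,2)".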

Window Fold    (use Window Fold + ProcessWindowFunction if you also need the window metadata)

WindowedStream → DataStream

    val foldValue: DataStream[result] = dataStream
      .process(new getLastFunction)
      .keyBy("itemId")
      .window(TumblingProcessingTimeWindows.of(Time.seconds(15)))
      .fold(result(111,333)){(original:result,ele:result)=>{
        result(ele.itemId,original.count+ele.count)
      }}
    foldValue.print()

Window Fold + ProcessWindowFunction

WindowedStream → DataStream. The initial value is used only once per key (per window), at the start of the fold.

    val foldWindowFunctionData: DataStream[String] = dataStream
      .process(new getLastFunction)
      .keyBy("itemId")
      .window(TumblingProcessingTimeWindows.of(Time.seconds(15)))
      .fold(result(111, 333), (original: result, ele: result) => {
        result(ele.itemId, original.count + ele.count)
      }, (key: Tuple, window: TimeWindow, input: Iterable[result], out: Collector[String]) => {
        val ele = input.iterator.next()
        out.collect((s"${window.getEnd}, $ele"))
      })
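
The role of the initial value can be checked with an ordinary `foldLeft`. This sketch reuses the seed `(111, 333)` and the fold logic from the snippets above but runs on a plain `Seq` instead of a Flink window; `Result` and `foldWindow` are hypothetical names.

```scala
object WindowFoldSketch {
  // Hypothetical stand-in for the article's `result` case class.
  final case class Result(itemId: Long, count: Long)

  // Same seed as in the article's fold examples.
  val initial: Result = Result(111, 333)

  // Models the fold over ONE key's window contents:
  // the seed enters the computation exactly once, before the first element.
  def foldWindow(contents: Seq[Result]): Result =
    contents.foldLeft(initial) { (original, ele) =>
      Result(ele.itemId, original.count + ele.count)
    }
}
```

Folding two elements of count 1 for item 1715 gives `Result(1715, 335)`: the seed's count of 333 is added once, and its itemId of 111 is replaced by the element's itemId.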

Window Aggregate + ProcessWindowFunction


    // Inside main()
    val aggregateData: DataStream[String] = dataStream
      .process(new getLastFunction)
      .keyBy("itemId")
      .window(TumblingProcessingTimeWindows.of(Time.seconds(15)))
      .aggregate(new CountAggregate,new MyProcessWindowFunction)

    aggregateData.print()

case class result(itemId: Long, count: Long)

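// AggregateFunction<IN, ACC, OUT>
// ACC createAccumulator(): initial value of the accumulator
// ACC add(IN value, ACC accumulator): how each input element is combined with the accumulator
// ACC merge(ACC a, ACC b): how two accumulators produced by add are merged
// OUT getResult(ACC accumulator): turns the final accumulator into the result that is emitted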

class CountAggregate extends AggregateFunction[result, Long, String] {
  override def createAccumulator() = 6L

  override def add(value: result, accumulator:Long) =
    value.count+accumulator

  override def getResult(accumulator: Long) = "windows count is:"+accumulator.toString

  override def merge(a: Long, b: Long) =
    a+b
}

class MyProcessWindowFunction extends ProcessWindowFunction[String, String, Tuple, TimeWindow] {

  def process(key: Tuple, context: Context, input: Iterable[String], out: Collector[String]) = {
    val count = input.iterator.next()
    out.collect("window end is :"+context.window.getEnd+"key is :"+key+count)
  }
}

Output:

window end is :1575213345000key is :(1715)windows count is:7
window end is :1575213345000key is :(1713)windows count is:7
window end is :1575213345000key is :(1716)windows count is:8

Window process, using ProcessWindowFunction

//windows process
val processData: DataStream[String] = dataStream
      .process(new getLastFunction)
      .keyBy("itemId")
      .window(TumblingProcessingTimeWindows.of(Time.seconds(15)))
      // process the data of each key's window
      .process(new ProcessProcessWindowFunction)
    processData.print()


// key is the key selected by keyBy
class ProcessProcessWindowFunction extends ProcessWindowFunction[result, String, Tuple, TimeWindow] {
  def process(key: Tuple, context: Context, input: Iterable[result], out: Collector[String]) = {
    input.foreach(ele => {
      out.collect("window end is :" + context.window.getEnd + "key is :" + key)
    })
  }
}

Window apply, using WindowFunction. It also processes the window's elements; it is the older counterpart of process and has no per-window keyed state.

//windows apply
    val windowApplyData: DataStream[String] = dataStream
      .process(new getLastFunction)
      .keyBy("itemId")
      .window(TumblingProcessingTimeWindows.of(Time.seconds(15)))
      .apply(new ApplyWindowFunction)
    windowApplyData.print()

class ApplyWindowFunction extends WindowFunction[result, String, Tuple, TimeWindow] {
  override def apply(key: Tuple, window: TimeWindow, input: Iterable[result], out: Collector[String]): Unit = {
    input.foreach(ele => {
      out.collect("window end is :" + window.getEnd + "key is :" + key)
    })
  }
}
