Contents
WindowAll: DataStream → AllWindowedStream
Window Apply (handles the type conversion when the window fires)
Window Reduce (or, if the window itself is needed, Window Reduce + ProcessWindowFunction)
Window Fold (or, if the window itself is needed, Window Fold + ProcessWindowFunction)
Window Reduce + ProcessWindowFunction
Window Fold + ProcessWindowFunction
Window Aggregate + ProcessWindowFunction
1. Classification
DataStream
Map
DataStream → DataStream
FlatMap
DataStream → DataStream
Filter
DataStream → DataStream
KeyBy
DataStream → KeyedStream
WindowAll
DataStream → AllWindowedStream
Process (instantiates a ProcessFunction that handles each element)
DataStream → DataStream
KeyedStream [all of these are stateful aggregations]
Reduce (compare with Window Reduce)
KeyedStream → DataStream
Fold (compare with Window Fold)
KeyedStream → DataStream
Aggregations (including sum/min/max) (compare with Aggregations on windows)
KeyedStream → DataStream
Window
KeyedStream → WindowedStream
Process
KeyedStream → DataStream
WindowedStream [aggregations are scoped to one window; no result carries over between windows]
Window Reduce (or, if the window itself is needed, Window Reduce + ProcessWindowFunction)
WindowedStream → DataStream
Window Fold (or, if the window itself is needed, Window Fold + ProcessWindowFunction)
WindowedStream → DataStream
Aggregations on windows (or, if the window itself is needed, Window Aggregate + ProcessWindowFunction)
WindowedStream → DataStream
Window Process (processes each element once the window fires)
WindowedStream → DataStream
Window Apply (processes each element once the window fires; this is the legacy counterpart of ProcessWindowFunction and has no per-window keyed state)
WindowedStream → DataStream
AllWindowedStream → DataStream
Key examples
DataStream
Process
DataStream → DataStream
import org.apache.flink.streaming.api.functions.ProcessFunction
import org.apache.flink.streaming.api.scala._
import org.apache.flink.util.Collector

val processStream: DataStream[result] = dataStream
  .process(new getAllFunction)

// Converts each UserBehavior into a result
class getAllFunction extends ProcessFunction[UserBehavior, result] {
  override def processElement(value: UserBehavior,
                              ctx: ProcessFunction[UserBehavior, result]#Context,
                              out: Collector[result]): Unit = {
    // Handle each element individually
    value match {
      case behavior: UserBehavior =>
        out.collect(result(behavior.itemId, behavior.count))
      case _ => print("no way")
    }
  }
}
WindowAll
DataStream → AllWindowedStream
val resultDataStream: DataStream[String] = processStream
  .windowAll(TumblingProcessingTimeWindows.of(Time.seconds(5)))
  .apply((_: TimeWindow, input: Iterable[result], out: Collector[String]) => {
    out.collect(input.mkString(","))
  })
resultDataStream.print()
// Output: result(1715,1),result(1715,1),result(1715,1),result(1716,1),result(1716,1)
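KeyedStream
See the keyBy usage in the WindowedStream section.
WindowedStream
(1) Keyed and non-keyed windows
keyBy versus windowAll: on a keyed stream, the window computation is executed in parallel by multiple tasks, because each logical keyed stream is processed independently of the others. On a non-keyed stream (windowAll), the original stream is not split into multiple logical streams, so all window logic runs in a single task with a parallelism of 1.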
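The difference between the two can be sketched without a Flink cluster. The snippet below uses plain Scala collections to model the contents of one window; `Event`, `keyedSums` and `globalSum` are hypothetical names introduced only for this illustration and are not part of the Flink API.

```scala
object KeyedVsNonKeyedSketch {
  // Hypothetical event type, used only for this illustration.
  final case class Event(itemId: Long, count: Long)

  // Keyed: each key forms its own logical stream and gets its own result,
  // so the per-key work could be spread over several parallel tasks.
  def keyedSums(window: Seq[Event]): Map[Long, Long] =
    window.groupBy(_.itemId).map { case (key, events) => key -> events.map(_.count).sum }

  // Non-keyed (windowAll): all elements of the window pass through one task.
  def globalSum(window: Seq[Event]): Long =
    window.map(_.count).sum

  def main(args: Array[String]): Unit = {
    val window = Seq(Event(1715, 1), Event(1715, 1), Event(1716, 1))
    println(keyedSums(window)) // one entry per itemId
    println(globalSum(window)) // a single value for the whole window
  }
}
```

In the keyed version the groups never interact, which is what allows Flink to evaluate them in parallel; the non-keyed version has nothing to partition on.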
(2) Predefined window assigners
Tumbling windows
Tumbling event-time windows
input
.keyBy(<key selector>)
.window(TumblingEventTimeWindows.of(Time.seconds(5)))
.<windowed transformation>(<window function>);
Tumbling processing-time windows
input
.keyBy(<key selector>)
.window(TumblingProcessingTimeWindows.of(Time.seconds(5)))
.<windowed transformation>(<window function>);
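To make the tumbling assignment concrete, here is a minimal sketch of the arithmetic in plain Scala. It assumes windows are aligned to the epoch with no offset and that timestamps are non-negative milliseconds; `windowStart` and `windowEnd` are hypothetical helper names, not Flink calls.

```scala
object TumblingWindowSketch {
  // Start of the tumbling window that contains `timestamp` (no offset assumed).
  def windowStart(timestamp: Long, sizeMs: Long): Long =
    timestamp - (timestamp % sizeMs)

  // Exclusive end of that same window.
  def windowEnd(timestamp: Long, sizeMs: Long): Long =
    windowStart(timestamp, sizeMs) + sizeMs

  def main(args: Array[String]): Unit = {
    val ts = 1461756879000L // a timestamp taken from the sample records in this post
    val size = 5000L        // Time.seconds(5) expressed in milliseconds
    println(s"[${windowStart(ts, size)}, ${windowEnd(ts, size)})")
  }
}
```

Because the windows do not overlap, every element belongs to exactly one of them.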
Sliding windows
Sliding event-time windows
input
.keyBy(<key selector>)
.window(SlidingEventTimeWindows.of(Time.seconds(10), Time.seconds(5)))
.<windowed transformation>(<window function>);
Sliding processing-time windows
input
.keyBy(<key selector>)
.window(SlidingProcessingTimeWindows.of(Time.seconds(10), Time.seconds(5)))
.<windowed transformation>(<window function>);
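With a sliding assigner one element can fall into several overlapping windows (size / slide of them when the size is a multiple of the slide). The sketch below lists those windows in plain Scala, again assuming epoch alignment with no offset; `windowsFor` is a hypothetical helper, not a Flink call.

```scala
object SlidingWindowSketch {
  // All [start, end) windows of length sizeMs, advancing by slideMs,
  // that contain `timestamp` (non-negative milliseconds assumed).
  def windowsFor(timestamp: Long, sizeMs: Long, slideMs: Long): Seq[(Long, Long)] = {
    // The latest window start that is not after the timestamp.
    val lastStart = timestamp - (timestamp % slideMs)
    Iterator.iterate(lastStart)(_ - slideMs)
      .takeWhile(start => start > timestamp - sizeMs) // window must still cover the timestamp
      .map(start => (start, start + sizeMs))
      .toList
  }

  def main(args: Array[String]): Unit = {
    // 10 s windows sliding every 5 s, as in the snippets above.
    windowsFor(12000L, 10000L, 5000L).foreach(println)
  }
}
```

For a 10-second window sliding every 5 seconds, each element is therefore evaluated twice, once per window that covers it.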
Example: tumbling processing-time window
//windowStream
val windowStream: WindowedStream[(String, Long, Int), Tuple, TimeWindow] = textKeyStream.
window(TumblingProcessingTimeWindows.of(Time.seconds(10)))
// textKeyStream.print("windowStream:")
//windowStream:> (000002,1461756879000,1)
//windowStream:> (000002,1461756879001,1)
//windowStream:> (000002,1461756879002,1)
Window Reduce (or, if the window itself is needed, Window Reduce + ProcessWindowFunction)
WindowedStream → DataStream
val reduceValue: DataStream[result] = dataStream
  .process(new getLastFunction)
  .keyBy("itemId")
  .window(TumblingProcessingTimeWindows.of(Time.seconds(15)))
  .reduce { (v1, v2) => result(v1.itemId, v1.count + v2.count) }
reduceValue.print()
Window Reduce + ProcessWindowFunction
WindowedStream → DataStream: adds the window parameter and changes the element type of the resulting DataStream
val reduceWindowFunctionData: DataStream[String] = dataStream
  .process(new getLastFunction)
  .keyBy("itemId")
  .window(TumblingProcessingTimeWindows.of(Time.seconds(15)))
  .reduce(
    (v1, v2) => result(v1.itemId, v1.count + v2.count),
    (key: Tuple, window: TimeWindow, input: Iterable[result], out: Collector[String]) => {
      input.foreach(ele => out.collect(s"${window.getStart}, $ele"))
    }
  )
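The behaviour of this combination can be sketched in plain Scala: the reduce function collapses each key's elements into a single value, and the window function then sees only that one value together with the window. `Result` and `Window` below are hypothetical stand-ins for the post's `result` class and Flink's `TimeWindow`.

```scala
object ReduceThenWindowFunctionSketch {
  // Stand-ins used only for this illustration.
  final case class Result(itemId: Long, count: Long)
  final case class Window(start: Long, end: Long)

  // Per key: reduce all elements to one value, then format it with the window start.
  def reduceThenFormat(window: Window, elements: Seq[Result]): Map[Long, String] =
    elements.groupBy(_.itemId).map { case (key, perKey) =>
      val reduced = perKey.reduce((v1, v2) => Result(v1.itemId, v1.count + v2.count))
      key -> s"${window.start}, $reduced"
    }

  def main(args: Array[String]): Unit = {
    val elements = Seq(Result(1715, 1), Result(1715, 1), Result(1716, 1))
    reduceThenFormat(Window(0L, 15000L), elements).values.foreach(println)
  }
}
```

This is why the `input` iterable in the window function above holds just one element per key and window: the reduction has already happened by the time it is called.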
Window Fold (or, if the window itself is needed, Window Fold + ProcessWindowFunction)
WindowedStream → DataStream
val foldValue: DataStream[result] = dataStream
  .process(new getLastFunction)
  .keyBy("itemId")
  .window(TumblingProcessingTimeWindows.of(Time.seconds(15)))
  .fold(result(111, 333)) { (original: result, ele: result) =>
    result(ele.itemId, original.count + ele.count)
  }
foldValue.print()
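The effect of the initial value is easy to overlook, so here is a plain-Scala sketch of the same fold applied to the contents of one window. `Result` is a hypothetical stand-in for the post's `result` class, and `foldLeft` plays the role of Flink's fold.

```scala
object WindowFoldSketch {
  // Stand-in for the post's `result` case class.
  final case class Result(itemId: Long, count: Long)

  // Same initial value as in the snippet above.
  val initial: Result = Result(111, 333)

  // Per key: start from `initial`, then fold in every element of the window.
  def foldPerKey(elements: Seq[Result]): Map[Long, Result] =
    elements.groupBy(_.itemId).map { case (key, perKey) =>
      key -> perKey.foldLeft(initial)((acc, ele) => Result(ele.itemId, acc.count + ele.count))
    }

  def main(args: Array[String]): Unit = {
    foldPerKey(Seq(Result(1715, 1), Result(1715, 1), Result(1716, 1))).values.foreach(println)
  }
}
```

The initial itemId (111) is overwritten by the first element, but the initial count (333) stays in every total. As far as I am aware, later Flink releases deprecate fold in favour of aggregate, so this pattern is mostly relevant to older versions.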
Window Fold + ProcessWindowFunction
WindowedStream → DataStream: the initial value is applied only once per key and window, before the first element is folded in
val foldWindowFunctionData: DataStream[String] = dataStream
  .process(new getLastFunction)
  .keyBy("itemId")
  .window(TumblingProcessingTimeWindows.of(Time.seconds(15)))
  .fold(result(111, 333), (original: result, ele: result) => {
    result(ele.itemId, original.count + ele.count)
  }, (key: Tuple, window: TimeWindow, input: Iterable[result], out: Collector[String]) => {
    // input holds the single folded value for this key and window
    val ele = input.iterator.next()
    out.collect(s"${window.getEnd}, $ele")
  })
Window Aggregate + ProcessWindowFunction
main{
val aggregateData: DataStream[String] = dataStream
.process(new getLastFunction)
.keyBy("itemId")
.window(TumblingProcessingTimeWindows.of(Time.seconds(15)))
.aggregate(new CountAggregate,new MyProcessWindowFunction)
aggregateData.print()
}
case class result(itemId: Long, count: Long)

// AggregateFunction<IN, ACC, OUT>
// ACC createAccumulator(): the initial value of the accumulator
// ACC add(IN value, ACC accumulator): how each input record is folded into the accumulator
// ACC merge(ACC a, ACC b): how two accumulators produced by add are combined
// OUT getResult(ACC accumulator): how the final accumulator is turned into the result emitted for the current key and window
class CountAggregate extends AggregateFunction[result, Long, String] {
  // Starts at 6 rather than 0, so every window total below is offset by 6
  override def createAccumulator(): Long = 6L
  override def add(value: result, accumulator: Long): Long =
    value.count + accumulator
  override def getResult(accumulator: Long): String = "windows count is:" + accumulator.toString
  override def merge(a: Long, b: Long): Long =
    a + b
}

class MyProcessWindowFunction extends ProcessWindowFunction[String, String, Tuple, TimeWindow] {
  override def process(key: Tuple, context: Context, input: Iterable[String], out: Collector[String]): Unit = {
    // input holds the single string produced by CountAggregate.getResult
    val count = input.iterator.next()
    out.collect("window end is :" + context.window.getEnd + "key is :" + key + count)
  }
}
Output:
window end is :1575213345000key is :(1715)windows count is:7
window end is :1575213345000key is :(1713)windows count is:7
window end is :1575213345000key is :(1716)windows count is:8
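The totals of 7 and 8 follow from the accumulator starting at 6. The sketch below replays the createAccumulator / add / getResult sequence in plain Scala; `SimpleAggregate` is a hypothetical stand-in for the AggregateFunction contract (merge is left out because it is not exercised here), and `Result` stands in for the post's `result` class.

```scala
object AggregateLifecycleSketch {
  // Stand-in for the post's `result` case class.
  final case class Result(itemId: Long, count: Long)

  // Minimal stand-in for the AggregateFunction contract (not the Flink interface itself).
  trait SimpleAggregate[IN, ACC, OUT] {
    def createAccumulator(): ACC
    def add(value: IN, accumulator: ACC): ACC
    def getResult(accumulator: ACC): OUT
  }

  // Same logic as CountAggregate above, including the initial value of 6.
  object CountAggregate extends SimpleAggregate[Result, Long, String] {
    def createAccumulator(): Long = 6L
    def add(value: Result, accumulator: Long): Long = value.count + accumulator
    def getResult(accumulator: Long): String = "windows count is:" + accumulator
  }

  // Replays the lifecycle for the elements of one key and one window.
  def run[IN, ACC, OUT](agg: SimpleAggregate[IN, ACC, OUT], elements: Seq[IN]): OUT =
    agg.getResult(elements.foldLeft(agg.createAccumulator())((acc, e) => agg.add(e, acc)))

  def main(args: Array[String]): Unit = {
    println(run(CountAggregate, Seq(Result(1715, 1))))                  // one element in the window
    println(run(CountAggregate, Seq(Result(1716, 1), Result(1716, 1)))) // two elements in the window
  }
}
```

Under this reading, keys 1715 and 1713 presumably each had one element in the window and key 1716 had two; an accumulator starting at 0 would report the plain counts instead.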
Window Process, using ProcessWindowFunction
// window process
val processData: DataStream[String] = dataStream
  .process(new getLastFunction)
  .keyBy("itemId")
  .window(TumblingProcessingTimeWindows.of(Time.seconds(15)))
  // process the data of each key partition
  .process(new ProcessProcessWindowFunction)
processData.print()

// key is the key selected by keyBy
class ProcessProcessWindowFunction extends ProcessWindowFunction[result, String, Tuple, TimeWindow] {
  override def process(key: Tuple, context: Context, input: Iterable[result], out: Collector[String]): Unit = {
    // emits one line per element buffered in the window
    input.foreach(ele => {
      out.collect("window end is :" + context.window.getEnd + "key is :" + key)
    })
  }
}
Window Apply, using WindowFunction: this also processes each element, but it is the older counterpart of process and has no per-window keyed state
// window apply
val windowApplyData: DataStream[String] = dataStream
  .process(new getLastFunction)
  .keyBy("itemId")
  .window(TumblingProcessingTimeWindows.of(Time.seconds(15)))
  .apply(new ApplyWindowFunction)
windowApplyData.print()

class ApplyWindowFunction extends WindowFunction[result, String, Tuple, TimeWindow] {
  override def apply(key: Tuple, window: TimeWindow, input: Iterable[result], out: Collector[String]): Unit = {
    // emits one line per element buffered in the window
    input.foreach(ele => {
      out.collect("window end is :" + window.getEnd + "key is :" + key)
    })
  }
}