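Learning Flink: Window Functions
A window function defines the computation applied to the data collected in a window. Window functions fall into two categories: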
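- Incremental aggregate functions:
  - The window does not keep the raw records. It keeps only an intermediate result, and each new record is folded into that result.
  - Computation runs as each record arrives, so the state stays small.
  - This article focuses on AggregateFunction.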

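- Full window functions:
  - The window keeps all raw records and aggregates over the full set when the window fires.
  - Records are collected first, and all of them are traversed at computation time.
  - This makes requirements such as sorting the records in a window possible.
  - This article focuses on ProcessWindowFunction.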
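The difference in what each kind of function keeps as state can be sketched without Flink at all. The following is only an illustration under that assumption: `AvgAcc`, `addToAcc`, and `fullWindowAvg` are hypothetical names, not Flink APIs. The incremental style holds just a count and a sum, while the full-window style holds every reading until the window fires.

```scala
// Flink-free illustration; AvgAcc, addToAcc and fullWindowAvg are hypothetical names.
object WindowStateSketch {
  // Incremental style: the only state kept per window is (count, sum).
  final case class AvgAcc(count: Int, sum: Double) {
    def average: Double = sum / count
  }

  // Fold one new reading into the running state; the reading itself is not stored.
  def addToAcc(acc: AvgAcc, temp: Double): AvgAcc =
    AvgAcc(acc.count + 1, acc.sum + temp)

  // Full-window style: every reading is buffered, then traversed when the window fires.
  def fullWindowAvg(buffer: Seq[Double]): Double =
    buffer.sum / buffer.size

  def main(args: Array[String]): Unit = {
    val readings = Seq(38.0, 34.0, 38.5)

    val incremental = readings.foldLeft(AvgAcc(0, 0.0))(addToAcc)
    val full = fullWindowAvg(readings)

    println(s"incremental: state = $incremental, average = ${incremental.average}")
    println(s"full window: buffered ${readings.size} readings, average = $full")
  }
}
```

Both paths arrive at the same average. The trade-off is that the buffered version can also answer questions that need all records at once, such as sorting, at the cost of state that grows with the window.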

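Incremental aggregate function: an AggregateFunction example
Requirement: every 5 seconds, compute the average temperature of everyone who passed each infrared thermometer.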
import org.apache.flink.api.common.functions.AggregateFunction
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.time.Time

/**
 * @param id   ID of the infrared thermometer
 * @param name name of the person passing the thermometer
 * @param temp temperature measured for that person
 */
case class Person(id: String, name: String, temp: Double)

val env = StreamExecutionEnvironment.getExecutionEnvironment
val ds = env.socketTextStream("bigdata1", 9999)
ds.filter(_ != "").map(x => {
  val arr = x.split(",")
  val id = arr(0)
  val name = arr(1)
  val temp = arr(2).toDouble
  Person(id, name, temp)
}).keyBy(_.id)
  // timeWindow is deprecated since Flink 1.12; there, the processing-time
  // equivalent is window(TumblingProcessingTimeWindows.of(Time.seconds(5)))
  .timeWindow(Time.seconds(5))
  .aggregate(
    /**
     * IN:  type of the values being aggregated (input values): Person
     * ACC: type of the accumulator (intermediate aggregate state): (String, Int, Double)
     * OUT: type of the aggregated result: (String, Double)
     */
    new AggregateFunction[Person, (String, Int, Double), (String, Double)] {
      // Initialize the accumulator
      override def createAccumulator(): (String, Int, Double) = ("", 0, 0.0)

      // Add the input value to the accumulator and return the new accumulator
      override def add(value: Person, accumulator: (String, Int, Double)): (String, Int, Double) = {
        // Number of people who have passed this thermometer so far
        val cnt = accumulator._2 + 1
        // Sum of their temperatures
        val sum = accumulator._3 + value.temp
        (value.id, cnt, sum)
      }

      // Produce the aggregated result: the average temperature, rounded to two decimals
      override def getResult(accumulator: (String, Int, Double)): (String, Double) = {
        val avg_temp = "%.2f".format(accumulator._3 / accumulator._2).toDouble
        (accumulator._1, avg_temp)
      }

      // Merge two accumulators and return one holding the combined state
      override def merge(a: (String, Int, Double), b: (String, Int, Double)): (String, Int, Double) = {
        // Take the id from whichever side has already seen a record
        val id = if (a._1.nonEmpty) a._1 else b._1
        (id, a._2 + b._2, a._3 + b._3)
      }
    }).print("Average temperature per infrared thermometer, every 5 seconds")

// Nothing runs until the job is submitted
env.execute()
Test data:
thermometer-1,A,38.0
thermometer-1,B,34.0
thermometer-2,F,36.0
thermometer-1,C,38.3
thermometer-2,D,37.1
thermometer-2,E,35.0
Test result: [screenshot omitted]
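As a sanity check on the test data, the accumulator steps can be replayed by hand outside Flink. This is a minimal sketch, assuming all six records arrive within the same 5-second window (in a real run that depends on how quickly they are typed into the socket). `AccumulatorReplay`, `round2`, and `averages` are hypothetical names, the thermometer ids are written as `thermometer-1` and `thermometer-2`, and rounding uses `math.round` rather than string formatting.

```scala
// Flink-free replay of the accumulator logic; all names here are hypothetical.
object AccumulatorReplay {
  type Acc = (String, Int, Double) // (thermometer id, head count, temperature sum)

  val empty: Acc = ("", 0, 0.0)

  // Mirrors add(): bump the count and extend the sum.
  def add(id: String, temp: Double, acc: Acc): Acc = (id, acc._2 + 1, acc._3 + temp)

  // Mirrors merge(): combine two partial accumulators for the same key.
  def merge(a: Acc, b: Acc): Acc =
    (if (a._1.nonEmpty) a._1 else b._1, a._2 + b._2, a._3 + b._3)

  // Round to two decimal places.
  def round2(x: Double): Double = math.round(x * 100) / 100.0

  // Mirrors getResult(): the average, rounded to two decimals.
  def result(acc: Acc): (String, Double) = (acc._1, round2(acc._3 / acc._2))

  val testData: Seq[(String, Double)] = Seq(
    ("thermometer-1", 38.0), ("thermometer-1", 34.0), ("thermometer-2", 36.0),
    ("thermometer-1", 38.3), ("thermometer-2", 37.1), ("thermometer-2", 35.0))

  // Group by key (the keyBy step) and fold each group through add().
  def averages(records: Seq[(String, Double)]): Map[String, Double] =
    records.groupBy(_._1).map { case (id, rs) =>
      result(rs.foldLeft(empty) { case (acc, (_, t)) => add(id, t, acc) })
    }

  def main(args: Array[String]): Unit = {
    averages(testData).toSeq.sortBy(_._1).foreach(println)
    // prints (thermometer-1,36.77) then (thermometer-2,36.03)

    // Merging two partial accumulators gives the same average as one pass.
    val left = add("thermometer-1", 38.0, empty)
    val right = add("thermometer-1", 34.0, add("thermometer-1", 38.3, empty))
    println(result(merge(left, right)))
  }
}
```

If the records straddle a window boundary, each window reports the average of only the records it received, so a live run can legitimately print different values.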

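Full window function: a ProcessWindowFunction example
Requirement: compute the average temperature of everyone who passed each infrared thermometer within each 5-second window.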
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.scala.function.ProcessWindowFunction
import org.apache.flink.streaming.api.windowing.time.Time
import org.apache.flink.streaming.api.windowing.windows.TimeWindow
import org.apache.flink.util.Collector

/**
 * @param id   ID of the infrared thermometer
 * @param name name of the person passing the thermometer
 * @param temp temperature measured for that person
 */
case class Person(id: String, name: String, temp: Double)

val env = StreamExecutionEnvironment.getExecutionEnvironment
val ds = env.socketTextStream("bigdata1", 9999)
ds.filter(_ != "").map(x => {
  val arr = x.split(",")
  val id = arr(0)
  val name = arr(1)
  val temp = arr(2).toDouble
  Person(id, name, temp)
}).keyBy(_.id)
  // 5-second window, matching the requirement and the print label
  .timeWindow(Time.seconds(5))
  .process[(String, Double)](
    /**
     * Base abstract class for functions that are evaluated over keyed (grouped)
     * windows using a context for retrieving extra information.
     *
     * IN:  type of the input value: Person
     * OUT: type of the output value: (String, Double)
     * KEY: type of the keyBy key: String
     * W:   type of the window: TimeWindow
     */
    new ProcessWindowFunction[Person, (String, Double), String, TimeWindow] {
      /**
       * Evaluates the window and outputs none or several elements.
       *
       * @param key      the key for which this window is evaluated
       * @param context  the context in which the window is being evaluated
       * @param elements the elements in the window being evaluated
       * @param out      a collector for emitting elements
       * @throws Exception the function may throw exceptions to fail the program and trigger recovery
       */
      override def process(key: String, context: Context, elements: Iterable[Person], out: Collector[(String, Double)]): Unit = {
        // Running sum of temperatures
        var temp = 0.0
        // Number of people who passed this thermometer in the window
        val cnt = elements.size
        // Traverse every buffered element in the window
        elements.foreach(per => {
          temp = temp + per.temp
        })
        // Average temperature, rounded to two decimals
        val avg_temp = "%.2f".format(temp / cnt).toDouble
        // Emit the result
        out.collect((key, avg_temp))
      }
    }).print("Average temperature per infrared thermometer within 5 seconds")

// Nothing runs until the job is submitted
env.execute()
Test data:
thermometer-1,A,38.0
thermometer-1,B,34.0
thermometer-2,F,36.0
thermometer-1,C,38.3
thermometer-2,D,37.1
thermometer-2,E,35.0
Test result: [screenshot omitted]
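The introduction notes that a full window function can also handle requirements such as sorting, because every record is available when the window fires. The sketch below illustrates that idea without Flink: it shows what the body of `process` could do with `elements`, here ranking one thermometer's readings by temperature. `FullWindowSortSketch` and `hottestFirst` are hypothetical names, and the three records are taken from the test data.

```scala
// Flink-free sketch of a sort over buffered window contents; names are hypothetical.
object FullWindowSortSketch {
  final case class Person(id: String, name: String, temp: Double)

  // Sort the buffered readings by temperature, highest first, and keep the top n.
  // This needs all elements at once, which an incremental accumulator does not retain.
  def hottestFirst(elements: Iterable[Person], n: Int): List[Person] =
    elements.toList.sortBy(p => -p.temp).take(n)

  def main(args: Array[String]): Unit = {
    val window = List(
      Person("thermometer-1", "A", 38.0),
      Person("thermometer-1", "B", 34.0),
      Person("thermometer-1", "C", 38.3))

    hottestFirst(window, 2).foreach(println)
    // prints Person(thermometer-1,C,38.3) then Person(thermometer-1,A,38.0)
  }
}
```

Inside a real ProcessWindowFunction the same ranking would be applied to the `elements` argument, and each ranked record would be emitted through `out.collect`.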