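- Flink provides three kinds of windows: time windows, count windows, and session windows.
- Time windows are aligned to the epoch rather than to the first event, and each window is left-closed and right-open: `[start, end)`. A one-day window therefore starts at 00:00 UTC, which is 08:00 in UTC+8 (Beijing time). To aggregate a full local calendar day (00:00 to 24:00) in UTC+8, set an offset: `window(TumblingEventTimeWindows.of(Time.days(1), Time.hours(-8)))`.
- Session windows have no fixed start time. A session collects the events that arrive close together and ends once no further event arrives within the configured gap.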
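The alignment rule and the effect of the `-8` hour offset can be checked with plain arithmetic. The following is a minimal sketch, not Flink's own code: `WindowAlignment` and `windowStart` are hypothetical names, and the formula is assumed to match how Flink aligns tumbling windows to the epoch. The timestamp is the first `ts` value from the sample data, converted to milliseconds.

```scala
object WindowAlignment {
  val HourMs: Long = 60L * 60L * 1000L
  val DayMs: Long = 24L * HourMs

  // Start of the window that contains timestampMs, for windows of sizeMs
  // aligned to the epoch and shifted by offsetMs (hypothetical helper).
  def windowStart(timestampMs: Long, offsetMs: Long, sizeMs: Long): Long =
    timestampMs - Math.floorMod(timestampMs - offsetMs, sizeMs)

  def main(args: Array[String]): Unit = {
    val ts = 1609776001000L // 2021-01-04 16:00:01 UTC = 2021-01-05 00:00:01 UTC+8

    // No offset: the day window starts at 00:00 UTC, i.e. 08:00 in UTC+8.
    println(windowStart(ts, 0L, DayMs)) // 1609718400000

    // Offset of -8 hours: the day window starts at 00:00 in UTC+8.
    println(windowStart(ts, -8L * HourMs, DayMs)) // 1609776000000
  }
}
```

The window end is `start + size` and is exclusive, which is what makes the window left-closed and right-open.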
package com.transform

import org.apache.flink.streaming.api.TimeCharacteristic
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.assigners.EventTimeSessionWindows
import org.apache.flink.streaming.api.windowing.time.Time

object WindowDemo {

  case class User(id: Int, sex: String, name: String, age: Int, ts: Long)

  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)
    env.setParallelism(1)

    // Parse each CSV line into a User, use ts as the event time, and key the stream by sex.
    val dataStream = env.readTextFile("src/main/resources/window.csv")
      .map(x => {
        val arr = x.split(",")
        User(arr(0).toInt, arr(1), arr(2), arr(3).toInt, arr(4).toLong)
      })
      .assignAscendingTimestamps(_.ts * 1000L) // epoch seconds -> milliseconds
      .keyBy(_.sex)

    // Tumbling time window: sum field 3 (age) per key over each 3-second window.
    val tumblingWindowDataStream = dataStream
      .timeWindow(Time.seconds(3))
      .sum(3)
      .print("时间滚动窗口") // the label means "tumbling time window"

    // Sliding time window: 10 seconds long, evaluated every 5 seconds.
    val sliceWindowDataStream = dataStream
      .timeWindow(Time.seconds(10), Time.seconds(5))
      .sum(3)

    // Tumbling count window: fires each time a key has collected 5 elements.
    val cTumblingWindowDataStream = dataStream
      .countWindow(5)
      .sum(3)

    // Sliding count window: up to the last 10 elements, evaluated every 5 elements.
    val cSliceWindowDataStream = dataStream
      .countWindow(10, 5)
      .sum(3)

    // Session window: a session ends once the key sees no event within the 1-second gap.
    val sectionWindowDataStream = dataStream
      .window(EventTimeSessionWindows.withGap(Time.seconds(1)))
      .sum(3)

    env.execute("Window Job")
  }
}
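To make the session-window idea concrete, here is a minimal sketch in plain Scala (no Flink dependency) that groups timestamps into sessions by gap. `SessionSketch` and `sessions` are hypothetical names. Treating two events exactly one gap apart as the same session is an assumption about the boundary case, not something the code above states.

```scala
object SessionSketch {
  // Group timestamps into sessions: a new session starts when the distance
  // to the previous event is larger than gapMs. Events exactly gapMs apart
  // stay in the same session (assumed boundary behaviour).
  def sessions(timestampsMs: Seq[Long], gapMs: Long): Vector[Vector[Long]] =
    timestampsMs.sorted.foldLeft(Vector.empty[Vector[Long]]) { (acc, ts) =>
      acc.lastOption match {
        case Some(current) if ts - current.last <= gapMs =>
          acc.init :+ (current :+ ts) // extend the open session
        case _ =>
          acc :+ Vector(ts) // start a new session
      }
    }

  def main(args: Array[String]): Unit = {
    val events = Seq(1000L, 2000L, 3000L, 7000L, 8000L, 20000L)
    sessions(events, 1000L).foreach(s => println(s.mkString("[", ", ", "]")))
  }
}
```

Unlike the tumbling and sliding windows, the session boundaries here depend only on the data, so no two keys need to share the same window start.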
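- Sample input, `src/main/resources/window.csv` (columns: id, sex, name, age, ts in epoch seconds):
1,男,张三,10,1609776001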
2,男,刚刚,10,1609776002
3,男,熊熊,10,1609776003
4,男,李四,10,1609776004
5,男,王五,10,1609776005
6,男,小明,10,1609776006
7,男,小明,10,1609776007
8,男,小明,10,1609776008
9,男,小明,10,1609776009
10,男,小明,10,1609776010
11,女,红红,20,1609776011
- Output of the 3-second tumbling time window, the only stream that calls `print`:
时间滚动窗口> User(1,男,张三,20,1609776001)
时间滚动窗口> User(3,男,熊熊,30,1609776003)
时间滚动窗口> User(6,男,小明,30,1609776006)
时间滚动窗口> User(9,男,小明,20,1609776009)
时间滚动窗口> User(11,女,红红,20,1609776011)
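The printed result can be reproduced without Flink. The sketch below is a plain Scala simulation, not the Flink job itself: it assigns each sample record to a 3-second epoch-aligned window, groups by `sex` and window start, and sums `age` while keeping the other fields of the first record in each group, as the output above shows. `TumblingSumCheck` and `tumblingSum` are hypothetical names, and timestamps are kept in seconds for brevity.

```scala
object TumblingSumCheck {
  case class User(id: Int, sex: String, name: String, age: Int, ts: Long)

  // The eleven records from window.csv.
  val names = Vector("张三", "刚刚", "熊熊", "李四", "王五", "小明", "小明", "小明", "小明", "小明")
  val users: Vector[User] =
    names.zipWithIndex.map { case (n, i) => User(i + 1, "男", n, 10, 1609776001L + i) } :+
      User(11, "女", "红红", 20, 1609776011L)

  // Per key and per window of sizeSec seconds: keep the first record, sum the ages.
  def tumblingSum(input: Seq[User], sizeSec: Long): Vector[User] =
    input
      .groupBy(u => (u.sex, u.ts - Math.floorMod(u.ts, sizeSec)))
      .values
      .map(group => group.head.copy(age = group.map(_.age).sum))
      .toVector
      .sortBy(_.ts)

  def main(args: Array[String]): Unit =
    tumblingSum(users, 3L).foreach(u => println(s"时间滚动窗口> $u"))
}
```

The first window holds only two records because 1609776000 is a multiple of 3, so the window `[1609776000, 1609776003)` contains the events at `...001` and `...002` but not the one at `...003`.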