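Recently, while developing an application, I needed real-time counting and also had to fill in durations according to the designed time windows. Since Flink was required for this, the overall architecture is roughly as follows: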
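The design idea is this: read data from Kafka to produce an intermediate DataStream[Message], then split that intermediate stream into two. One stream is used for real-time counting. The other assigns messages to windows by EventTime, using tumbling windows (TumblingEventTimeWindows in the code). For each window it takes all of the window's data plus the cached historical data to form the complete data for that window, sorts it, and then iterates over the messages to compute durations per window.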
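As a rough illustration of how an event-time tumbling window groups messages, here is a minimal plain-Scala sketch with no Flink dependency. `Event`, `windowStart` and `assign` are hypothetical names made up for this example, and the alignment of windows to the epoch with a half-open range [start, end) reflects my understanding of Flink's default behavior, not something shown in this post.

```scala
// Plain-Scala sketch (no Flink dependency) of tumbling event-time window assignment.
object TumblingWindowSketch {
  // Hypothetical stand-in for the post's Message type; only the fields needed here.
  final case class Event(key: String, eventTime: Long)

  /** Start (ms) of the epoch-aligned tumbling window of size `sizeMs` that contains `timestamp`. */
  def windowStart(timestamp: Long, sizeMs: Long): Long =
    timestamp - java.lang.Math.floorMod(timestamp, sizeMs)

  /** Groups events by (key, window start), mimicking keyBy followed by a tumbling window. */
  def assign(events: Seq[Event], sizeMs: Long): Map[(String, Long), Seq[Event]] =
    events.groupBy(e => (e.key, windowStart(e.eventTime, sizeMs)))

  def main(args: Array[String]): Unit = {
    val events = Seq(Event("a", 1000L), Event("a", 4999L), Event("a", 5000L), Event("b", 7200L))
    // Print each (key, window) group with the number of events that fell into it.
    assign(events, 5000L).toSeq.sortBy(_._1).foreach { case ((key, start), es) =>
      println(s"key=$key window=[$start, ${start + 5000L}) count=${es.size}")
    }
  }
}
```

With 5-second windows, the events at 1000 ms and 4999 ms share the window starting at 0, while the event at 5000 ms opens the next window. The actual job code follows.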
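import java.util.Date

import org.apache.flink.api.common.restartstrategy.RestartStrategies
import org.apache.flink.api.java.tuple.Tuple
import org.apache.flink.api.java.utils.ParameterTool
import org.apache.flink.runtime.state.filesystem.FsStateBackend
import org.apache.flink.streaming.api.TimeCharacteristic
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows
import org.apache.flink.streaming.api.windowing.time.Time
import org.apache.flink.streaming.api.windowing.windows.TimeWindow
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer010
import org.apache.flink.util.Collector

// Message, MessageSchema, CustomWatermarkExtractor, ProxyImpl, EventMap, DateUtil,
// BeamSink and SimpleEventBeamFactory are project-specific classes that are not shown here.

object CtiReportRealTime {
  def main(args: Array[String]): Unit = {
    val parameterTool = ParameterTool.fromArgs(args)
    // Get the Flink execution environment
    val env: StreamExecutionEnvironment = StreamExecutionEnvironment.getExecutionEnvironment
    // Enable sysout logging (the job prints to stdout further down)
    env.getConfig.enableSysoutLogging()
    // Restart strategy: fixed delay, up to 4 attempts, 1000 ms apart
    env.getConfig.setRestartStrategy(RestartStrategies.fixedDelayRestart(4, 1000))
    // Store checkpoints on the local file system
    env.setStateBackend(new FsStateBackend("file:///tmp/flink-checkpoints"))
    // Take a checkpoint every 5 s
    env.enableCheckpointing(5000)
    // Make the parameters globally available
    env.getConfig.setGlobalJobParameters(parameterTool)
    // Use event time as the time characteristic
    env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)
    // Kafka consumer with timestamps and watermarks assigned at the source
    val flinkKafkaConsumer = new FlinkKafkaConsumer010[Message[Object]](
      parameterTool.getRequired("input-topic"),
      new MessageSchema,
      parameterTool.getProperties
    ).assignTimestampsAndWatermarks(new CustomWatermarkExtractor)
    // Read the input stream from Kafka
    val inputStream: DataStream[Message[Object]] = env.addSource(flinkKafkaConsumer)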
    // Counting stream
    val counterMap: DataStream[EventMap] =
      inputStream.keyBy("mainType", "extType")
        .flatMap((message: Message[Object], collector: Collector[EventMap]) => {
          val proxy: ProxyImpl = new ProxyImpl
          val eventMap: EventMap = proxy.count(message)
          // Emit only non-empty results
          if (eventMap != null && !eventMap.isEmpty) {
            collector.collect(eventMap)
          }
        })
    // Timing stream: 5-second tumbling event-time windows per key
    val timerMap: DataStream[EventMap] =
      inputStream.keyBy("mainType", "extType", "vccId", "modeParm")
        .window(TumblingEventTimeWindows.of(Time.seconds(5)))
        .apply(function = (tuple: Tuple, timeWindow: TimeWindow,
                           iterable: Iterable[Message[Object]], collect: Collector[EventMap]) => {
          // Lets the Java list below be traversed with a Scala for loop
          import scala.collection.JavaConverters._
          // Window start time
          val startWindow = timeWindow.getStart
          println(DateUtil.getFormatHour(new Date(startWindow)))
          // Window end time
          val endWindow = timeWindow.getEnd
          // Initialize the proxy helper
          val proxy = new ProxyImpl
          val messages: java.util.ArrayList[Message[_]] = new java.util.ArrayList[Message[_]]()
          iterable.foreach(message => messages.add(message))
          // Add the cached historical messages, if any
          val cacheMessages = proxy.read(messages.get(0), startWindow)
          if (cacheMessages != null && cacheMessages.size() > 0) {
            messages.addAll(cacheMessages)
          }
          // The design sorts the combined messages before traversal; that step is not shown in this excerpt.
          var lastMessage: Message[_] = null
          for (msg <- messages.asScala) {
            val eventMap = proxy.timer(startWindow, endWindow, msg, lastMessage)
            if (eventMap != null && !eventMap.isEmpty) {
              collect.collect(eventMap)
            }
            lastMessage = msg
          }
          // Handle the tail of the window, then save the last message together with the window end time
          if (lastMessage != null) {
            val eventMap = proxy.timer(startWindow, endWindow, proxy.createMessage(lastMessage), lastMessage)
            if (eventMap != null && !eventMap.isEmpty) {
              collect.collect(eventMap)
            }
            proxy.save(lastMessage, endWindow)
          }
        })
    counterMap.print()
    timerMap.print()
    // Write both result streams to the sink
    counterMap.addSink(new BeamSink[EventMap](new SimpleEventBeamFactory))
    timerMap.addSink(new BeamSink[EventMap](new SimpleEventBeamFactory))
    env.execute("cti report real time stream")
  }
}
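Because the code involves a lot of business logic, this post is only a record of how I used Flink and of the key pieces of code. It cannot be run as-is.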
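To make the per-window duration idea a bit more concrete, below is a minimal plain-Scala sketch of one way it could work. It is not the logic inside ProxyImpl, which is not shown in this post. `Event`, `durations`, and the rule that a state lasts until the next message or the window end are all assumptions made for illustration.

```scala
// Plain-Scala sketch of per-window duration accounting with a message carried over
// from an earlier window. All names and rules here are illustrative assumptions.
object WindowDurationSketch {
  // Hypothetical message: the state it switches to and its event time in ms.
  final case class Event(state: String, time: Long)

  /**
   * Returns the time spent in each state inside [start, end), plus the event
   * to carry into the next window (the latest one seen, if any).
   */
  def durations(start: Long, end: Long, carried: Option[Event],
                inWindow: Seq[Event]): (Map[String, Long], Option[Event]) = {
    // Clamp a timestamp into the window so carried-over events count from the window start.
    def clip(t: Long): Long = math.max(start, math.min(end, t))
    // Combine historical and current messages, then sort by event time.
    val sorted = (carried.toSeq ++ inWindow).sortBy(_.time)
    // Each state lasts until the next event; the last one lasts until the window end.
    val until = sorted.drop(1).map(_.time) :+ end
    val totals = sorted.zip(until).foldLeft(Map.empty[String, Long]) {
      case (acc, (event, stop)) =>
        val d = clip(stop) - clip(event.time)
        if (d > 0) acc.updated(event.state, acc.getOrElse(event.state, 0L) + d) else acc
    }
    (totals, sorted.lastOption)
  }

  def main(args: Array[String]): Unit = {
    val (totals, carry) = durations(
      5000L, 10000L,
      carried = Some(Event("talking", 3000L)),
      inWindow = Seq(Event("idle", 8000L), Event("ringing", 6000L)))
    println(totals)
    println(carry)
  }
}
```

In this example the carried "talking" state is counted only from the window start (5000 ms) until the first in-window message at 6000 ms, so the per-state totals add up to exactly the 5-second window length.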