Use case
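Let's start with a real business scenario to understand the retraction mechanism in Flink SQL: counting how many devices are online and how many are offline. Device status events are collected upstream and sent to Kafka. When a device first reports an online status, the online count goes up by 1. When the same device later reports that it has gone offline, the online count must go down by 1 and the offline count must go up by 1. In other words, a count that has already been emitted has to be corrected, not just appended to. Let's look at how to implement this requirement in Flink SQL.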
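To make the retraction concrete, here is a minimal plain-Java model (no Flink dependency) of the changelog that a two-level aggregation, latest status per device followed by a count per status, would produce. Each message is written as `+(status,count)` for an addition or `-(status,count)` for a retraction, in the spirit of Flink's retract stream. The class `RetractDemo`, its methods, and the encoding 1 = online / 0 = offline are hypothetical illustrations, not Flink internals.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory model of a two-level GROUP BY that emits retractions. */
class RetractDemo {
    // Inner aggregation state: devId -> latest status
    private final Map<String, Integer> latestStatus = new HashMap<>();
    // Outer aggregation state: status -> number of devices currently in that status
    private final Map<Integer, Long> countByStatus = new HashMap<>();
    // Changelog of the outer aggregation, e.g. "+(1,1)" or "-(1,1)"
    final List<String> changelog = new ArrayList<>();

    void onEvent(String devId, int status) {
        Integer old = latestStatus.put(devId, status);
        if (old != null && old == status) {
            return; // in this model, an unchanged status changes nothing downstream
        }
        if (old != null) {
            // The inner query retracts (devId, old), so the count for 'old' shrinks
            update(old, -1);
        }
        // The inner query adds (devId, status), so the count for 'status' grows
        update(status, +1);
    }

    private void update(int status, long delta) {
        Long before = countByStatus.get(status);
        if (before != null) {
            changelog.add("-(" + status + "," + before + ")"); // retract the previously emitted count
        }
        long after = (before == null ? 0L : before.longValue()) + delta;
        if (after == 0) {
            countByStatus.remove(status); // group is empty: only the retraction is emitted
        } else {
            countByStatus.put(status, after);
            changelog.add("+(" + status + "," + after + ")"); // emit the new count
        }
    }
}
```

Feeding device `d1` online and then offline yields `+(1,1)`, `-(1,1)`, `+(0,1)`: the online count is retracted before the offline count appears, and a downstream sink has to handle both kinds of messages.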
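// Imports for the Scala DataStream API and the legacy (Flink 1.5-1.8 era) Table API used below
import java.util.Properties
import org.apache.flink.api.common.serialization.SimpleStringSchema
import org.apache.flink.streaming.api.TimeCharacteristic
import org.apache.flink.streaming.api.functions.timestamps.BoundedOutOfOrdernessTimestampExtractor
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.time.Time
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer011
import org.apache.flink.table.api.TableEnvironment
import org.apache.flink.table.api.scala._
import org.apache.kafka.clients.consumer.ConsumerConfig
// One device status event, parsed from a "devId,status,times" line
case class DevData(devId: String, status: Int, times: Long)

val env = StreamExecutionEnvironment.getExecutionEnvironment
val tabEnv = TableEnvironment.getTableEnvironment(env)
// Register the custom aggregate UDF that returns a device's most recent status
tabEnv.registerFunction("latestTimeUdf", new LatestTimeUdf())
// Use event time so that 'rt.rowtime below is backed by the record timestamps
env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)
val kafkaConfig = new Properties()
kafkaConfig.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092")
kafkaConfig.put(ConsumerConfig.GROUP_ID_CONFIG, "test1")
val consumer = new FlinkKafkaConsumer011[String]("topic1", new SimpleStringSchema, kafkaConfig)
// Parse each CSV record and assign event-time timestamps, tolerating 1 s of out-of-orderness
val ds = env.addSource(consumer)
  .map(x => {
    val a = x.split(",")
    DevData(a(0), a(1).toInt, a(2).toLong)
  })
  .assignTimestampsAndWatermarks(new BoundedOutOfOrdernessTimestampExtractor[DevData](Time.milliseconds(1000)) {
    override def extractTimestamp(element: DevData): Long = element.times
  })
tabEnv.registerDataStream("tbl1", ds, 'devId, 'status, 'times, 'rt.rowtime)
// Inner query: latest status per device; outer query: number of devices per status (updated via retraction)
val dw = tabEnv.sqlQuery(
  """
    |select st, count(*) from (
    |  select latestTimeUdf(status, times) st, devId from tbl1 group by devId
    |) a group by st
  """.stripMargin)
// A retract sink receives (true, row) for additions and (false, row) for retractions
dw.writeToSink(new PaulRetractStreamTableSink)
env.execute()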
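`PaulRetractStreamTableSink` is the author's own sink. Whatever system it writes to, a retract sink receives `(flag, row)` pairs in order and has to apply them: `true` adds a row, `false` retracts a row emitted earlier. The following is a hypothetical sketch of that apply step over an in-memory map keyed by status; the class name `RetractSinkView` and the map-based storage are assumptions for illustration only.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory view a retract sink could maintain: status -> current device count. */
class RetractSinkView {
    final Map<Integer, Long> view = new HashMap<>();

    /** Applies one retract-stream message for the row (status, count). */
    void apply(boolean isAdd, int status, long count) {
        if (isAdd) {
            view.put(status, count);    // add message: this is now the current count for the status
        } else {
            view.remove(status, count); // retract message: delete the exact row emitted earlier
        }
    }
}
```

Keying the view by status mirrors the outer `group by st`, so after the messages `+(1,1)`, `-(1,1)`, `+(0,1)` the view holds only `0 -> 1`.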
Custom UDF to get the latest device status
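As a dependency-free illustration of what such a function has to do, the sketch below models only its accumulator: it keeps the status carrying the largest `times` seen so far for one device, so a late event with an older timestamp cannot overwrite a newer status. In Flink this logic would sit in a class extending `AggregateFunction`, spread across `createAccumulator()`, `accumulate(acc, status, times)` and `getValue(acc)`; the class `LatestStatusAcc` here is a hypothetical plain-Java stand-in.

```java
/** Hypothetical accumulator for "status with the latest event time" of one device. */
class LatestStatusAcc {
    Integer status;               // latest status seen; null until the first event
    long times = Long.MIN_VALUE;  // event time of that status

    void accumulate(int newStatus, long eventTime) {
        // Only a newer (or equally new) event replaces the stored status;
        // on a tie, the later-arriving event wins
        if (eventTime >= times) {
            status = newStatus;
            times = eventTime;
        }
    }

    Integer getValue() {
        return status;
    }
}
```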
public class Lates