Issues with rowtime and proctime in Flink SQL

Flink SQL version: 1.12.3

Step 1: Create the Kafka table

String desc = "CREATE TABLE KafkaTable (\n" +
        "  ts AS LOCALTIMESTAMP,\n" +
        "  `body` STRING,\n" +
        "  WATERMARK FOR ts AS ts - INTERVAL '1' SECOND\n" +
        ") WITH (\n" +
        "  'connector' = 'kafka',\n" +
        "  'topic' = 'IRT_Dev_Status_STD',\n" +
        "  'properties.bootstrap.servers' = '10.70.27.2:9092',\n" +
        "  'properties.group.id' = 'testGroup',\n" +
        "  'value.format' = 'raw'\n" +
        ")";
bsTableEnv.executeSql(desc);
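The snippets in this post assume a streaming table environment named bsTableEnv. For completeness, here is a minimal setup sketch (the class name is taken from the stack traces below; the rest is standard Flink 1.12 boilerplate):

import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;

public class FlinkSqlKafka01 {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Blink planner in streaming mode (the default planner in 1.12)
        EnvironmentSettings settings = EnvironmentSettings.newInstance()
                .useBlinkPlanner().inStreamingMode().build();
        StreamTableEnvironment bsTableEnv = StreamTableEnvironment.create(env, settings);
        // ... the DDL, queries, and DataStream conversions from this post go here ...
    }
}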

Step 2: Extract deviceId, tenantId, deviceTypeId, and Timestamp_CloudM2M from body

// tupleType was not shown in the original; presumably it describes the single STRING column:
TupleTypeInfo<Tuple1<String>> tupleType = new TupleTypeInfo<>(Types.STRING);
DataStream<Tuple1<String>> dss = bsTableEnv.toAppendStream(bsTableEnv.sqlQuery("select body from KafkaTable"), tupleType);
DataStream<Tuple4<String, String, String, Long>> tranDS = dss.map(new MapFunction<Tuple1<String>, Tuple4<String, String, String, Long>>() {
    DateFormat df = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss.SSS");
    @Override
    public Tuple4<String, String, String, Long> map(Tuple1<String> stringTuple1) throws Exception {
        JSONObject js = new JSONObject(stringTuple1.f0);
        return new Tuple4<>(js.getString("deviceId"), js.getString("tenantId"), js.getString("deviceTypeId"),
                df.parse(js.getJSONObject("status").getString("Timestamp_CloudM2M")).getTime());
    }
});
bsTableEnv.createTemporaryView("deviceInfo", tranDS, $("deviceId"), $("tenantId"), $("deviceTypeId"), $("ts"));
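For context, the map function assumes body is a JSON document shaped roughly like this (a hypothetical sample: the deviceId value is borrowed from the output at the end of the post, the other values are invented):

{
  "deviceId": "shipTest1_1_shipTest1",
  "tenantId": "tenant-1",
  "deviceTypeId": "ship",
  "status": {
    "Timestamp_CloudM2M": "2021-05-18 01:27:05.123"
  }
}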

Step 3: Using a fixed 10-second window, count the records uploaded by each deviceId

Table rs = bsTableEnv.sqlQuery("select count(*), deviceId, TUMBLE_START(ts, INTERVAL '10' SECOND) from deviceInfo GROUP BY TUMBLE(ts, INTERVAL '10' SECOND), deviceId");
rs.execute().print();
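(A version note: TUMBLE in the GROUP BY clause plus TUMBLE_START in the select list is the group-window syntax available in Flink 1.12; the newer windowing table-valued functions only arrived in 1.13.)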

The above is the example I wrote following my own intuition, but getting to a correct result turned out to be a bumpy ride.

Running it throws the first exception:

Exception in thread "main" org.apache.flink.table.api.ValidationException: SQL validation failed. From line 1, column 91 to line 1, column 122: Cannot apply '$TUMBLE' to arguments of type '$TUMBLE(<BIGINT>, <INTERVAL SECOND>)'. Supported form(s): '$TUMBLE(<DATETIME>, <DATETIME_INTERVAL>)'
'$TUMBLE(<DATETIME>, <DATETIME_INTERVAL>, <TIME>)'
    at org.apache.flink.table.planner.calcite.FlinkPlannerImpl.org$apache$flink$table$planner$calcite$FlinkPlannerImpl$$validate(FlinkPlannerImpl.scala:152)
    at org.apache.flink.table.planner.calcite.FlinkPlannerImpl.validate(FlinkPlannerImpl.scala:111)
    at org.apache.flink.table.planner.operations.SqlToOperationConverter.convert(SqlToOperationConverter.java:193)
    at org.apache.flink.table.planner.delegation.ParserImpl.parse(ParserImpl.java:78)
    at org.apache.flink.table.api.internal.TableEnvironmentImpl.sqlQuery(TableEnvironmentImpl.java:639)
    at flink.demo.FlinkSqlKafka01.main(FlinkSqlKafka01.java:58)
Caused by: org.apache.calcite.runtime.CalciteContextException: From line 1, column 91 to line 1, column 122: Cannot apply '$TUMBLE' to arguments of type '$TUMBLE(<BIGINT>, <INTERVAL SECOND>)'. Supported form(s): '$TUMBLE(<DATETIME>, <DATETIME_INTERVAL>)'

Analysis 1: Following the hint in the exception, change the type of the column passed to TUMBLE from BIGINT to a DATETIME type, i.e. use java.sql.Timestamp instead of Long:

DataStream<Tuple4<String, String, String, Timestamp>> tranDS = dss.map(new MapFunction<Tuple1<String>, Tuple4<String, String, String, Timestamp>>() {
    @Override
    public Tuple4<String, String, String, Timestamp> map(Tuple1<String> stringTuple1) throws Exception {
        JSONObject js = new JSONObject(stringTuple1.f0);
        return new Tuple4<>(js.getString("deviceId"), js.getString("tenantId"), js.getString("deviceTypeId"),
                Timestamp.valueOf(js.getJSONObject("status").getString("Timestamp_CloudM2M")));
    }
});
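One caveat: Timestamp.valueOf only accepts the JDBC escape format yyyy-[m]m-[d]d hh:mm:ss[.f...]; if Timestamp_CloudM2M ever arrives in another pattern, it is safer to keep the SimpleDateFormat parse from before and wrap the result in new Timestamp(millis).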

Running it now throws a second exception:

Exception in thread "main" org.apache.flink.table.api.TableException: Window aggregate can only be defined over a time attribute column, but TIMESTAMP(3) encountered.
    at org.apache.flink.table.planner.plan.rules.logical.StreamLogicalWindowAggregateRule.getInAggregateGroupExpression(StreamLogicalWindowAggregateRule.scala:50)
    at org.apache.flink.table.planner.plan.rules.logical.LogicalWindowAggregateRuleBase.onMatch(LogicalWindowAggregateRuleBase.scala:81)
    at org.apache.calcite.plan.AbstractRelOptPlanner.fireRule(AbstractRelOptPlanner.java:333)
    at org.apache.calcite.plan.hep.HepPlanner.applyRule(HepPlanner.java:542)

Analysis 2: A look at the planner source shows that the column passed to TUMBLE must not merely be a timestamp, it must be a time attribute, i.e. of type TimeIndicatorRelDataType. A plain TIMESTAMP(3) produced by a DataStream-to-Table conversion carries no such marker; only columns declared via .rowtime()/.proctime() (or via WATERMARK/PROCTIME() in DDL) get it:

val timeAttribute = windowExpression.operands.get(0)
if (!FlinkTypeFactory.isTimeIndicatorType(timeAttribute.getType)) {
  throw new TableException(s"Window aggregate can only be defined over a " +
    s"time attribute column, but ${timeAttribute.getType} encountered.")
}

def isTimeIndicatorType(relDataType: RelDataType): Boolean = relDataType match {
  case _: TimeIndicatorRelDataType => true
  case _ => false
}

I then looked at the official usage example:

tableEnv.createTemporaryView("Orders", ds, $("user"), $("product"), $("amount"), $("proctime").proctime(), $("rowtime").rowtime());

and kept hunting for a fix, changing the code as follows:

bsTableEnv.createTemporaryView("deviceInfo", tranDS, $("deviceId"), $("tenantId"), $("deviceTypeId"), $("ts"), $("proctime").proctime(), $("rowtime").rowtime());
Table rs = bsTableEnv.sqlQuery("select count(*), deviceId, TUMBLE_START(proctime, INTERVAL '10' SECOND) from deviceInfo GROUP BY TUMBLE(proctime, INTERVAL '10' SECOND), deviceId");

Running again, it still fails, now with a third exception:

java.lang.NullPointerException
    at SourceConversion$11.processElement(Unknown Source)
    at org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.pushToOperator(CopyingChainingOutput.java:71)
    at org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.collect(CopyingChainingOutput.java:46)
    at org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.collect(CopyingChainingOutput.java:26)
    at org.apache.flink.streaming.api.operators.CountingOutput.collect(CountingOutput.java:52)
    at org.apache.flink.streaming.api.operators.CountingOutput.collect(CountingOutput.java:30)
    at org.apache.flink.streaming.api.operators.StreamMap.processElement(StreamMap.java:41)
    at org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.pushToOperator(CopyingChainingOutput.java:71)
    at org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.collect(CopyingChainingOutput.java:46)
    at org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.collect(CopyingChainingOutput.java:26)
    at org.apache.flink.streaming.api.operators.CountingOutput.collect(CountingOutput.java:52)
    at org.apache.flink.streaming.api.operators.CountingOutput.collect(CountingOutput.java:30)
    at SinkConversion$4.processElement(Unknown Source)
    at org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.pushToOperator(CopyingChainingOutput.java:71)
    at org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.collect(CopyingChainingOutput.java:46)
    at org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.collect(CopyingChainingOutput.java:26)
    at org.apache.flink.streaming.api.operators.CountingOutput.collect(CountingOutput.java:52)
    at org.apache.flink.streaming.api.operators.CountingOutput.collect(CountingOutput.java:30)
    at org.apache.flink.streaming.api.operators.StreamSourceContexts$ManualWatermarkContext.processAndCollectWithTimestamp(StreamSourceContexts.java:310)
    at org.apache.flink.streaming.api.operators.StreamSourceContexts$WatermarkContext.collectWithTimestamp(StreamSourceContexts.java:409)
    at org.apache.flink.streaming.connectors.kafka.internals.AbstractFetcher.emitRecordsWithTimestamps(AbstractFetcher.java:365)
    at org.apache.flink.streaming.connectors.kafka.internals.KafkaFetcher.partitionConsumerRecordsHandler(KafkaFetcher.java:183)
    at org.apache.flink.streaming.connectors.kafka.internals.KafkaFetcher.runFetchLoop(KafkaFetcher.java:142)
    at org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumerBase.run(FlinkKafkaConsumerBase.java:826)
    at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:100)
    at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:63)
    at org.apache.flink.streaming.runtime.tasks.SourceStreamTask$LegacySourceFunctionThread.run(SourceStreamTask.java:215)

Analysis 3: Presumably I was using the API incorrectly, or my understanding of proctime (processing time) and rowtime (event time) was incomplete. In hindsight the likely culprit is the $("rowtime").rowtime() column: it tells Flink to read each StreamRecord's timestamp during the conversion, but nothing had assigned timestamps to tranDS yet, so the generated SourceConversion operator trips over a null timestamp even though the query itself only windows on proctime. At the time, I simply kept looking for answers.

I changed the code to:

bsTableEnv.createTemporaryView("deviceInfo", tranDS, $("deviceId"), $("tenantId"), $("deviceTypeId"), $("ts").rowtime(), $("proctime").proctime());

This still failed with a fourth exception (unsurprising in hindsight: a replaced .rowtime() column is likewise filled from the StreamRecord timestamp, which was still missing). I kept experimenting, and then a document I was reading finally provided the insight:

When using Event Time there is one restriction, because Event Time is not available on demand the way Processing Time is. If you want to obtain a Table from a DataStream, you must ensure in advance that the original DataStream already carries record timestamps and watermarks. If the table is produced by a TableSource, you must ensure that the ingested data contains a time field of type long or timestamp. Concretely, to register a table from a DataStream you proceed as with proctime: just append ".rowtime" to a column name. Note that with Processing Time the newly added field must be the last field of the whole schema, whereas with Event Time you may instead replace an existing column, and Flink will automatically convert that column into the required rowtime type. If the table comes from a TableSource, it suffices to implement the DefinedRowtimeAttributes interface. One more point: the DataStream API side does not support multiple Event Time (rowtime) attributes at once, but at the Table level multiple rowtime columns can in theory coexist, because DefinedRowtimeAttributes returns a List of rowtime descriptors; further improvements or optimizations may be built on this in the future.
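Restated as code, the rules from that passage look roughly like this (a sketch, assuming a DataStream<Tuple3<String, Integer, Long>> ds with fields user, amount, ts; the table names are illustrative):

// proctime is available on demand, but the attribute must be the LAST field:
tableEnv.createTemporaryView("T1", ds, $("user"), $("amount"), $("ts"), $("proctime").proctime());

// rowtime requires that `ds` already carries record timestamps and watermarks
// (assignTimestampsAndWatermarks); the attribute is filled from the StreamRecord timestamp.

// Variant 1: "rowtime" does not exist in the schema, so it is appended as a new field:
tableEnv.createTemporaryView("T2", ds, $("user"), $("amount"), $("ts"), $("rowtime").rowtime());

// Variant 2: "ts" exists in the schema, so it is replaced in place:
tableEnv.createTemporaryView("T3", ds, $("user"), $("amount"), $("ts").rowtime());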

With that, I revised the code logic once more:

First: assign timestamps and watermarks to the transformed DataStream

DataStream<Tuple4<String, String, String, Long>> tranDS = dss.map(new MapFunction<Tuple1<String>, Tuple4<String, String, String, Long>>() {
    DateFormat df = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss.SSS");
    @Override
    public Tuple4<String, String, String, Long> map(Tuple1<String> stringTuple1) throws Exception {
        JSONObject js = new JSONObject(stringTuple1.f0);
        return new Tuple4<>(js.getString("deviceId"), js.getString("tenantId"), js.getString("deviceTypeId"),
                df.parse(js.getJSONObject("status").getString("Timestamp_CloudM2M")).getTime());
    }
}).assignTimestampsAndWatermarks(
        // allow no out-of-orderness; the record timestamp is the epoch-millis
        // value extracted from Timestamp_CloudM2M (field f3)
        WatermarkStrategy.<Tuple4<String, String, String, Long>>forBoundedOutOfOrderness(Duration.ofSeconds(0))
                .withTimestampAssigner(new SerializableTimestampAssigner<Tuple4<String, String, String, Long>>() {
                    @Override
                    public long extractTimestamp(Tuple4<String, String, String, Long> t, long recordTimestamp) {
                        return t.f3;
                    }
                }));
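A note on the Duration.ofSeconds(0) bound: zero out-of-orderness is effectively the same as WatermarkStrategy.forMonotonousTimestamps(), i.e. it assumes events arrive in timestamp order, and any record that arrives behind the watermark is dropped from its window. For genuinely out-of-order data, a larger bound (say, Duration.ofSeconds(5)) trades latency for completeness.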

Second: declare the event-time and processing-time attributes when creating the table

bsTableEnv.createTemporaryView("deviceInfo", tranDS, $("deviceId"), $("tenantId"), $("deviceTypeId"), $("ts"), $("proctime").proctime(), $("rowtime").rowtime());
// alternative: GROUP BY TUMBLE(proctime, INTERVAL '10' SECOND), deviceId
Table rs = bsTableEnv.sqlQuery("select count(*), deviceId, TUMBLE_START(rowtime, INTERVAL '10' SECOND) from deviceInfo GROUP BY TUMBLE(rowtime, INTERVAL '10' SECOND), deviceId");
rs.execute().print();

And this time, to my surprise, it worked:

| +I |                    1 |          shipTest1_1_shipTest1 |     2021-05-18T01:27:10 |
| +I |                    1 |          shipTest6_1_shipTest6 |     2021-05-18T01:27:10 |
| +I |                    1 |          shipTest5_1_shipTest5 |     2021-05-18T01:27:10 |
| +I |                    1 |  3192200500001_1_3192200500001 |     2021-05-18T01:27:10 |
| +I |                    2 |          shipTest8_1_shipTest8 |     2021-05-18T01:27:10 |
| +I |                    1 |          shipTest4_1_shipTest4 |     2021-05-18T01:27:10 |
| +I |                    1 |          shipTest9_1_shipTest9 |     2021-05-18T01:27:10 |
| +I |                   12 |  1217740007674_1_1217740007674 |     2021-05-18T01:27:10 |
| +I |                    1 |          shipTest2_1_shipTest2 |     2021-05-18T01:27:10 |
| +I |                    1 |          shipTest7_1_shipTest7 |     2021-05-18T01:27:10 |
| +I |                    1 |        shipTest10_1_shipTest10 |     2021-05-18T01:27:10 |
| +I |                    1 |          shipTest3_1_shipTest3 |     2021-05-18T01:27:10 |
| +I |                    1 |  3192200500001_1_3192200500001 |     2021-05-18T01:27:20 |
| +I |                   10 |  1217740007674_1_1217740007674 |     2021-05-18T01:27:20 |
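Reading the output: the first column is the changelog kind (+I = insert), followed by count(*), deviceId, and the TUMBLE_START window start; each device's records are counted per 10-second event-time window.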

The job ran successfully.

Takeaway: adopting a new technology always comes with a few pitfalls and problems, but with persistence you will always find a solution.
