1. Description of the Window Error
In a stream-processing project on Flink 1.9.0 with the blink planner (i.e., the Blink SQL engine),
we added a Kafka010TableSource. The table is created with the following DDL (including the event-time field and watermark specification):
create table kafka_source (
  `messageKey` VARBINARY,
  message VARBINARY,
  topic VARCHAR,
  `partition` INT,
  `offset` BIGINT,
  id VARCHAR,
  TsField TIMESTAMP,
  WATERMARK FOR TsField AS withOffset(TsField, 1000)
) with (
  type = 'kafka010',
  topic = 'Test',
  `group.id` = 'groupId',
  format = 'json'
);
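The withOffset(TsField, 1000) clause in this Blink DDL declares a periodic bounded-out-of-orderness watermark: watermark = max(TsField seen so far) - 1000 ms. As a point of reference, the same semantics on the DataStream API would look roughly like the sketch below; the kafkaStream variable and the field index 6 for TsField (its position in the DDL above) are assumptions for illustration:

```scala
import java.sql.Timestamp

import org.apache.flink.streaming.api.functions.timestamps.BoundedOutOfOrdernessTimestampExtractor
import org.apache.flink.streaming.api.windowing.time.Time
import org.apache.flink.types.Row

// Equivalent of `WATERMARK FOR TsField AS withOffset(TsField, 1000)`:
// a periodic watermark that trails the largest TsField seen by 1000 ms.
val withWatermarks = kafkaStream.assignTimestampsAndWatermarks(
  new BoundedOutOfOrdernessTimestampExtractor[Row](Time.milliseconds(1000)) {
    override def extractTimestamp(row: Row): Long =
      // TsField is assumed to be field 6 of the Row, matching the DDL column order
      row.getField(6).asInstanceOf[Timestamp].getTime
  })
```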
In the connector descriptor we specify the rowtime schema and create the source:
var schemaDesc = new Schema()
schemaDesc = schemaDesc
  .field(wmAlias.getOrElse("watermark"), Types.SQL_TIMESTAMP)
  .rowtime(new Rowtime()
    .timestampsFromField(wmField.get)                // event-time field, e.g. TsField
    .watermarksPeriodicBounded(wmOffset.get.toLong)) // bounded delay in ms, e.g. 1000
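For context, a complete wiring of this Schema/Rowtime descriptor with the Flink 1.9 Table API descriptors might look like the following sketch. The Kafka and Json descriptor settings simply mirror the DDL above, and the tableEnv variable (a StreamTableEnvironment) plus the concrete field names are assumptions for illustration, not the project's actual code:

```scala
import org.apache.flink.table.api.Types
import org.apache.flink.table.descriptors.{Json, Kafka, Rowtime, Schema}

// Sketch: declare a rowtime attribute on the schema and register the source.
val schemaDesc = new Schema()
  .field("id", Types.STRING)
  .field("TsField", Types.SQL_TIMESTAMP)
  .rowtime(new Rowtime()
    .timestampsFromField("TsField")     // read event time from TsField
    .watermarksPeriodicBounded(1000L))  // watermark = max(TsField) - 1000 ms

tableEnv
  .connect(new Kafka()
    .version("0.10")
    .topic("Test")
    .property("group.id", "groupId"))
  .withFormat(new Json())
  .withSchema(schemaDesc)
  .inAppendMode()
  .registerTableSource("kafka_source")
```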
Executing the following window query then fails:
SELECT
  id,
  TUMBLE_START(TsField, INTERVAL '4' SECOND) AS windowStart,
  COUNT(*) AS countA
FROM $tableName
GROUP BY TUMBLE(TsField, INTERVAL '4' SECOND), id
The error message:
org.apache.flink.table.api.ValidationException: Window can only be defined over a time attribute column.
This exception is thrown from the StreamLogicalWindowAggregateRule#getTimeFieldReference() method in the Flink source. It essentially means that the column passed to a window function must be a time attribute. Below, I analyze the cause of the problem.
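The check that produces this message boils down to a time-indicator test on the window operand's type. Paraphrased as a standalone sketch (this is a simplification for illustration, not the verbatim Flink method):

```scala
import org.apache.calcite.rex.RexNode
import org.apache.flink.table.api.ValidationException
import org.apache.flink.table.planner.calcite.FlinkTypeFactory

// Simplified paraphrase of the validation inside
// StreamLogicalWindowAggregateRule#getTimeFieldReference():
// the operand of TUMBLE/HOP/SESSION must carry a time-indicator type
// (rowtime or proctime), not a plain TIMESTAMP.
def requireTimeAttribute(windowOperand: RexNode): Unit = {
  if (!FlinkTypeFactory.isTimeIndicatorType(windowOperand.getType)) {
    throw new ValidationException(
      "Window can only be defined over a time attribute column.")
  }
}
```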
2. Analysis of the Related Flink Source Code
2.1 The Role of StreamLogicalWindowAggregateRule
StreamLogicalWindowAggregateRule is itself a Rule (a rule applied to logical plan nodes). For the logical plan produced by Calcite, the onMatch() method of its parent class LogicalWindowAggregateRuleBase parses and converts the aggregation and window information, and finally emits a (transformed) logical plan, still as a RelNode.
The StreamLogicalWindowAggregateRule rule is registered in DEFAULT_REWRITE_RULES of FlinkStreamRuleSets; the relevant rules are:
val DEFAULT_REWRITE_RULES: RuleSet = RuleSets.ofList((
  PREDICATE_SIMPLIFY_EXPRESSION_RULES.asScala ++
  REWRITE_COALESCE_RULES.asScala ++
  REDUCE_EXPRESSION_RULES.asScala ++
  List(
    StreamLogicalWindowAggregateRule.INSTANCE,
    ProjectToWindowRule.PROJECT,
    WindowPropertiesRules.WINDOW_PROPERTIES_RULE,
    WindowPropertiesRules.WINDOW_PROPERTIES_HAVING_RULE,
    ...
)