A table schema built via JsonRowSerializationSchema.Builder(String schema).build() fails to match the result of a SQL query

Background: while building an aggregation, I registered a DataStream as a Table, ran a simple SQL query over it, and wrote the result to Elasticsearch.
The code:

```java
import java.sql.Timestamp;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;

import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.api.java.typeutils.RowTypeInfo;
import org.apache.flink.formats.json.JsonRowSchemaConverter;
import org.apache.flink.formats.json.JsonRowSerializationSchema;
import org.apache.flink.streaming.api.TimeCharacteristic;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.source.SourceFunction;
import org.apache.flink.streaming.connectors.elasticsearch.ElasticsearchUpsertTableSinkBase.Host;
import org.apache.flink.streaming.connectors.elasticsearch.util.IgnoringFailureHandler;
import org.apache.flink.streaming.connectors.elasticsearch6.Elasticsearch6UpsertTableSink;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.TableSchema;
import org.apache.flink.table.api.java.StreamTableEnvironment;
import org.apache.flink.types.Row;
import org.elasticsearch.common.xcontent.XContentType;

import lombok.AllArgsConstructor;
import lombok.Data;
import lombok.NoArgsConstructor;

public class AggregationFunction {

    public static void main(String[] args) throws Exception {
        String sql = "SELECT count(*) as cnt, TUMBLE_START(rowtime, INTERVAL '10' SECOND) as tumTime "
                + "FROM abc GROUP BY TUMBLE(rowtime, INTERVAL '10' SECOND)";

        StreamExecutionEnvironment senv = StreamExecutionEnvironment.getExecutionEnvironment();
        senv.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);
        StreamTableEnvironment tenv = StreamTableEnvironment.create(senv);

        // A simple finite test source emitting User records
        DataStream<User> source = senv.addSource(new SourceFunction<User>() {
            @Override
            public void run(SourceContext<User> sourceContext) throws Exception {
                int i = 1000;
                String[] names = {"Hanmeimei", "Lilei"};
                while (i > 1) {
                    sourceContext.collect(new User(names[i % 2], i, new Timestamp(System.currentTimeMillis())));
                    Thread.sleep(10);
                    i--;
                }
            }

            @Override
            public void cancel() {
            }
        });

        tenv.registerDataStream("abc", source, "name, age, timestamp, rowtime.rowtime");
        Table table = tenv.sqlQuery(sql);

        List<Host> hosts = Arrays.asList(new Host("10.20.128.210", 19201, "http"));
        // Derive the sink's field types from a JSON schema string
        TypeInformation<Row> typeInformation = JsonRowSchemaConverter.convert("{" +
                "  type: 'object'," +
                "  properties: {" +
                "    cnt: {" +
                "      type: 'number'" +
                "    }," +
                "    tumTime: {" +
                "      type: 'string'," +
                "      format: 'date-time'" +
                "    }" +
                "  }" +
                "}");
        RowTypeInfo typeInfo = (RowTypeInfo) typeInformation;
        TypeInformation<?>[] typeInformations = typeInfo.getFieldTypes();
        String[] fieldNames = typeInfo.getFieldNames();
        TableSchema.Builder builder = TableSchema.builder();
        for (int i = 0; i < typeInformations.length; i++) {
            builder.field(fieldNames[i], typeInformations[i]);
        }

        Elasticsearch6UpsertTableSink establesink = new Elasticsearch6UpsertTableSink(
                true,
                builder.build(),
                hosts,
                "aggregation",
                "data",
                "$",
                "n/a",
                new JsonRowSerializationSchema.Builder(typeInformation).build(),
                XContentType.JSON,
                new IgnoringFailureHandler(),
                new HashMap<>()
        );
        tenv.registerTableSink("aggregationTableSink", establesink);
        table.insertInto("aggregationTableSink");
        // Required to actually run the job; the validation error shown below
        // is already thrown at insertInto(), before execution starts.
        senv.execute();
    }

    @Data
    @AllArgsConstructor
    @NoArgsConstructor
    public static class User {
        private String name;
        private Integer age;
        private Timestamp timestamp;
    }
}
```
Run it and see:

```
Exception in thread "main" org.apache.flink.table.api.ValidationException: Field types of query result and registered TableSink [aggregationTableSink] do not match.
Query result schema: [cnt: Long, tumTime: Timestamp]
TableSink schema: [cnt: BigDecimal, tumTime: Timestamp]
```

A ValidationException: the schema types do not match. The log makes the mismatch clear. The SQL

```sql
SELECT count(*) as cnt, TUMBLE_START(rowtime, INTERVAL '10' SECOND) as tumTime FROM abc GROUP BY TUMBLE(rowtime, INTERVAL '10' SECOND)
```

produces the schema `[cnt: Long, tumTime: Timestamp]`, while the JSON description

```
{
  type: 'object',
  properties: {
    cnt: {
      type: 'number'
    },
    tumTime: {
      type: 'string',
      format: 'date-time'
    }
  }
}
```

yields the schema `[cnt: BigDecimal, tumTime: Timestamp]`.
Checking the JSON Format section of the official documentation:
| JSON schema | Flink SQL |
| --- | --- |
| object | ROW |
| boolean | BOOLEAN |
| array | ARRAY[_] |
| number | DECIMAL |
| integer | DECIMAL |
| string | VARCHAR |
| string with format: date-time | TIMESTAMP |
| string with format: date | DATE |
| string with format: time | TIME |
| string with encoding: base64 | ARRAY[TINYINT] |
| null | NULL (unsupported yet) |
I seem to have discovered something remarkable: every numeric type (and there are only two, `number` and `integer`) is converted to DECIMAL. So there is no way for me to declare a Long field in the schema? I wasn't ready to accept that, so I went to the source. Only the key part is shown below; if you need the rest, see `JsonRowSchemaConverter#convertType(String location, JsonNode node, JsonNode root)`.
```java
private static final String TYPE_NULL = "null";
private static final String TYPE_BOOLEAN = "boolean";
private static final String TYPE_OBJECT = "object";
private static final String TYPE_ARRAY = "array";
private static final String TYPE_NUMBER = "number";
private static final String TYPE_INTEGER = "integer";
private static final String TYPE_STRING = "string";

for (String type : types) {
    // set field type
    switch (type) {
        case TYPE_NULL:
            typeSet.add(Types.VOID);
            break;
        case TYPE_BOOLEAN:
            typeSet.add(Types.BOOLEAN);
            break;
        case TYPE_STRING:
            if (node.has(FORMAT)) {
                typeSet.add(convertStringFormat(location, node.get(FORMAT)));
            } else if (node.has(CONTENT_ENCODING)) {
                typeSet.add(convertStringEncoding(location, node.get(CONTENT_ENCODING)));
            } else {
                typeSet.add(Types.STRING);
            }
            break;
        case TYPE_NUMBER:
            typeSet.add(Types.BIG_DEC);
            break;
        case TYPE_INTEGER:
            // use BigDecimal for easier interoperability
            // without affecting the correctness of the result
            typeSet.add(Types.BIG_DEC);
            break;
        case TYPE_OBJECT:
            typeSet.add(convertObject(location, node, root));
            break;
        case TYPE_ARRAY:
            typeSet.add(convertArray(location, node, root));
            break;
        default:
            throw new IllegalArgumentException(
                "Unsupported type '" + node.get(TYPE).asText() + "' in node: " + location);
    }
}
```
Exactly as the documentation says: both TYPE_NUMBER and TYPE_INTEGER are converted to BIG_DEC. Is it unfair of me to call this a bug at this point?
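The in-source comment ("use BigDecimal for easier interoperability without affecting the correctness of the result") hints at the rationale: JSON Schema's `number` and `integer` carry no width or precision bound, so only an arbitrary-precision type is guaranteed lossless. A self-contained illustration of what a fixed-width type would risk (plain Java, no Flink involved; the class name is mine):

```java
import java.math.BigDecimal;

public class NumberWidthDemo {
    public static void main(String[] args) {
        // 2^53 + 1: a legal JSON integer, but not representable as a double
        String big = "9007199254740993";

        double asDouble = Double.parseDouble(big);
        BigDecimal asDecimal = new BigDecimal(big);

        System.out.println(asDouble == 9007199254740992d);          // true: last digit silently lost
        System.out.println(asDecimal.toPlainString().equals(big));  // true: exact round-trip
    }
}
```

This explains the conservative mapping, though it does nothing to help when the query side produces a Long.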
How to fix it

1 Use cast() in the SQL

```sql
SELECT cast(count(*) as decimal) as cnt, TUMBLE_START(rowtime, INTERVAL '10' SECOND) as tumTime FROM abc GROUP BY TUMBLE(rowtime, INTERVAL '10' SECOND)
```

count(*) returns a BIGINT (Long); casting it to DECIMAL makes the query result match the JSON-derived sink schema.
2 Don't construct the schema from a JSON string at all. The table returned by `Table table = tenv.sqlQuery(sql);` already exposes its schema via `table.getSchema()`. Be aware that some of its field types are for internal use only, such as InternalTypeInfo, and need converting. (This problem exists in 1.8; 1.9 changed this by using DataType instead of TypeInformation to describe field types.)
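A sketch of that second option against the Flink 1.8 APIs. This is an assumption-laden fragment, not a tested fix: it reuses `table`, `hosts`, and the sink parameters from the snippet above, and it guesses that the internal types to swap out are time indicators (`TimeIndicatorTypeInfo`), which can be replaced by the public `Types.SQL_TIMESTAMP`:

```java
// Derive the sink schema from the query result instead of a JSON string
TableSchema querySchema = table.getSchema();

TableSchema.Builder builder = TableSchema.builder();
TypeInformation<?>[] fieldTypes = querySchema.getFieldTypes();
String[] fieldNames = querySchema.getFieldNames();
for (int i = 0; i < fieldTypes.length; i++) {
    TypeInformation<?> fieldType = fieldTypes[i];
    // Internal time-indicator types are not accepted by sinks;
    // map them to the public SQL timestamp type (assumed conversion)
    if (fieldType instanceof TimeIndicatorTypeInfo) {
        fieldType = Types.SQL_TIMESTAMP;
    }
    builder.field(fieldNames[i], fieldType);
}
TableSchema sinkSchema = builder.build();

Elasticsearch6UpsertTableSink establesink = new Elasticsearch6UpsertTableSink(
        true,
        sinkSchema,
        hosts,
        "aggregation",
        "data",
        "$",
        "n/a",
        // build the serializer from the same row type, so it matches the query result
        new JsonRowSerializationSchema.Builder(sinkSchema.toRowType()).build(),
        XContentType.JSON,
        new IgnoringFailureHandler(),
        new HashMap<>()
);
```

Because both the sink schema and the serialization schema now come from the query's own row type, the `cnt: Long` field no longer collides with a DECIMAL declared elsewhere.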