Table 和 tableSink schema不匹配问题

使用SQL 查询获取一个table结果,但通过JsonRowSerializationSchem.Builder(string schema).build()构造出来的table schema 无法与之相匹配

背景 做一个统计,随将Datastream 注册成了Table,并使用SQL获取一个简单的查询结果。最后将数据写入Es。
上代码:

public class AggregationFunction {
    public static void main(String[] args) {
        String sql = "SELECT count(*) as cnt, TUMBLE_START(rowtime, INTERVAL '10' SECOND) as tumTime FROM abc GROUP BY TUMBLE(rowtime, INTERVAL '10' SECOND)";
        StreamExecutionEnvironment senv = StreamExecutionEnvironment.getExecutionEnvironment();
        senv.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);
        StreamTableEnvironment tenv = StreamTableEnvironment.create(senv);
        DataStream<User> source = senv.addSource(new SourceFunction<User>() {
            @Override
            public void run(SourceContext<User> sourceContext) throws Exception {
                int i = 1000;
                String[] names = {"Hanmeimei", "Lilei"};
                while (i > 1) {
                    sourceContext.collect(new User(names[i%2], i, new Timestamp(System.currentTimeMillis())));
                    Thread.sleep(10);
                    i--;
                }
            }
            @Override
            public void cancel() {

            }
        });
        tenv.registerDataStream("abc", source, "name, age, timestamp, rowtime.rowtime");
        Table table = tenv.sqlQuery(sql);
        List<Host> hosts = Arrays.asList(new Host("10.20.128.210", 19201, "http"));
        TypeInformation<Row> typeInformation = JsonRowSchemaConverter.convert("{" +
                "    type:'object'," +
                "    properties:{" +
                "        cnt: {" +
                "            type: 'number'" +
                "        }," +
                "        tumTime:{" +
                "            type:'string'," +
                "            format:'date-time'" +
                "        }" +
                "    }" +
                "}");
        RowTypeInfo typeInfo = (RowTypeInfo) typeInformation;
        TypeInformation<?>[] typeInformations = typeInfo.getFieldTypes();

        String[] fieldNames = typeInfo.getFieldNames();
        TableSchema.Builder builder = TableSchema.builder();
        for (int i = 0; i < typeInformations.length; i ++) {
            builder.field(fieldNames[i], typeInformations[i]);
        }
        Elasticsearch6UpsertTableSink establesink = new Elasticsearch6UpsertTableSink(
                true,
                builder.build(),
                hosts,
                "aggregation",
                "data",
                "$",
                "n/a",
                new JsonRowSerializationSchema.Builder(typeInformation).build(),
                XContentType.JSON,
                new IgnoringFailureHandler(),
                new HashMap<>()
        );
        tenv.registerTableSink("aggregationTableSink", establesink);
        table.insertInto("aggregationTableSink");
    }


    @Data
    @AllArgsConstructor
    @NoArgsConstructor
    public static class User {
        private String name;

        private Integer age;

        private Timestamp timestamp;
    }

}

直接运行看看结果

Exception in thread "main" org.apache.flink.table.api.ValidationException: Field types of query result and registered TableSink [aggregationTableSink] do not match.
Query result schema: [cnt: Long, tumTime: Timestamp]
TableSink schema:    [cnt: BigDecimal, tumTime: Timestamp]

抛出异常schema类型不匹配,从日志也可以看出来,SQL

SELECT count(*) as cnt, TUMBLE_START(rowtime, INTERVAL '10' SECOND) as tumTime FROM abc GROUP BY TUMBLE(rowtime, INTERVAL '10' SECOND)

schema [cnt: Long, tumTime: Timestamp]
而使用的json描述的

{
    type:'object',
    properties:{
        cnt: {
            type: 'number'
        },
        tumTime:{
            type:'string',
            format:'date-time'
        }
    }
}

schema [cnt: BigDecimal, tumTime: Timestamp]
查看官网的JSON Format

JSON schemaFlink SQL
objectROW
booleanBOOLEAN
arrayARRAY[_]
numberDECIMAL
integerDECIMAL
stringVARCHAR
stringwith format: date-time TIMESTAMP
stringwith format: date DATE
stringwith format: time TIME
stringwith encoding: base64 ARRAY[TINYINT]
nullNULL (unsupported yet)

我好像发现不得了的事情,不管是什么类型的数字(虽然只有两种number,integer),都被转成了DECIMAL。这么说来我没法设置Long类型的schema字段咯?我当然不服我又去源码看看,没有贴出全部,贴出重要代码。有需要的可以查看源码JsonRowSchemaConverter.class#TypeInformation<?> convertType(String location, JsonNode node, JsonNode root)

private static final String TYPE_NULL = "null";
private static final String TYPE_BOOLEAN = "boolean";
private static final String TYPE_OBJECT = "object";
private static final String TYPE_ARRAY = "array";
private static final String TYPE_NUMBER = "number";
private static final String TYPE_INTEGER = "integer";
private static final String TYPE_STRING = "string";


        for (String type : types) {
				// set field type
				switch (type) {
					case TYPE_NULL:
						typeSet.add(Types.VOID);
						break;
					case TYPE_BOOLEAN:
						typeSet.add(Types.BOOLEAN);
						break;
					case TYPE_STRING:
						if (node.has(FORMAT)) {
							typeSet.add(convertStringFormat(location, node.get(FORMAT)));
						} else if (node.has(CONTENT_ENCODING)) {
							typeSet.add(convertStringEncoding(location, node.get(CONTENT_ENCODING)));
						} else {
							typeSet.add(Types.STRING);
						}
						break;
					case TYPE_NUMBER:
						typeSet.add(Types.BIG_DEC);
						break;
					case TYPE_INTEGER:
						// use BigDecimal for easier interoperability
						// without affecting the correctness of the result
						typeSet.add(Types.BIG_DEC);
						break;
					case TYPE_OBJECT:
						typeSet.add(convertObject(location, node, root));
						break;
					case TYPE_ARRAY:
						typeSet.add(convertArray(location, node, root));
						break;
					default:
						throw new IllegalArgumentException(
							"Unsupported type '" + node.get(TYPE).asText() + "' in node: " + location);
				}
			}
		}

果然跟文档上说的一样,TYPE_NUMBER,TYPE_INTEGER都被转成的decimal。现在我说它是Bug不过分吧?

如何解决

1 在SQL中使用cast()函数

SELECT cast(count(*) as decimal) as cnt, TUMBLE_START(rowtime, INTERVAL '10' SECOND) as tumTime FROM abc GROUP BY TUMBLE(rowtime, INTERVAL '10' SECOND)

2 不使用JSON string 来构造schema Table table = tenv.sqlQuery(sql);返回的table可以通过table.getSchema()获取schema。但注意有些类型是内部使用的,如InternalTypeInfo.注意转换(目前这个问题在1.8存在,1.9修改了使用DataType替代TypeInformation描述字段类型)。

  • 1
    点赞
  • 1
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值