A table schema built via JsonRowSerializationSchema.Builder(String schema).build() fails to match the result of a SQL query

Background: while building an aggregation, I registered a DataStream as a Table, ran a simple SQL query over it, and wrote the result to Elasticsearch.
The code:

```java
import java.sql.Timestamp;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;

import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.api.java.typeutils.RowTypeInfo;
import org.apache.flink.formats.json.JsonRowSchemaConverter;
import org.apache.flink.formats.json.JsonRowSerializationSchema;
import org.apache.flink.streaming.api.TimeCharacteristic;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.source.SourceFunction;
import org.apache.flink.streaming.connectors.elasticsearch.ElasticsearchUpsertTableSinkBase.Host;
import org.apache.flink.streaming.connectors.elasticsearch.util.IgnoringFailureHandler;
import org.apache.flink.streaming.connectors.elasticsearch6.Elasticsearch6UpsertTableSink;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.TableSchema;
import org.apache.flink.table.api.java.StreamTableEnvironment;
import org.apache.flink.types.Row;
import org.elasticsearch.common.xcontent.XContentType;

import lombok.AllArgsConstructor;
import lombok.Data;
import lombok.NoArgsConstructor;

public class AggregationFunction {

    public static void main(String[] args) throws Exception {
        String sql = "SELECT count(*) as cnt, TUMBLE_START(rowtime, INTERVAL '10' SECOND) as tumTime "
                + "FROM abc GROUP BY TUMBLE(rowtime, INTERVAL '10' SECOND)";

        StreamExecutionEnvironment senv = StreamExecutionEnvironment.getExecutionEnvironment();
        senv.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);
        StreamTableEnvironment tenv = StreamTableEnvironment.create(senv);

        // A simple finite test source emitting User records
        DataStream<User> source = senv.addSource(new SourceFunction<User>() {
            @Override
            public void run(SourceContext<User> sourceContext) throws Exception {
                int i = 1000;
                String[] names = {"Hanmeimei", "Lilei"};
                while (i > 1) {
                    sourceContext.collect(new User(names[i % 2], i, new Timestamp(System.currentTimeMillis())));
                    Thread.sleep(10);
                    i--;
                }
            }

            @Override
            public void cancel() {
            }
        });

        tenv.registerDataStream("abc", source, "name, age, timestamp, rowtime.rowtime");
        Table table = tenv.sqlQuery(sql);

        List<Host> hosts = Arrays.asList(new Host("10.20.128.210", 19201, "http"));
        // Derive the sink's field types from a JSON schema string
        TypeInformation<Row> typeInformation = JsonRowSchemaConverter.convert("{" +
                "  type: 'object'," +
                "  properties: {" +
                "    cnt: {" +
                "      type: 'number'" +
                "    }," +
                "    tumTime: {" +
                "      type: 'string'," +
                "      format: 'date-time'" +
                "    }" +
                "  }" +
                "}");
        RowTypeInfo typeInfo = (RowTypeInfo) typeInformation;
        TypeInformation<?>[] typeInformations = typeInfo.getFieldTypes();
        String[] fieldNames = typeInfo.getFieldNames();
        TableSchema.Builder builder = TableSchema.builder();
        for (int i = 0; i < typeInformations.length; i++) {
            builder.field(fieldNames[i], typeInformations[i]);
        }

        Elasticsearch6UpsertTableSink establesink = new Elasticsearch6UpsertTableSink(
                true,
                builder.build(),
                hosts,
                "aggregation",
                "data",
                "$",
                "n/a",
                new JsonRowSerializationSchema.Builder(typeInformation).build(),
                XContentType.JSON,
                new IgnoringFailureHandler(),
                new HashMap<>()
        );
        tenv.registerTableSink("aggregationTableSink", establesink);
        table.insertInto("aggregationTableSink");
        // Required to actually run the job; the validation error shown below
        // is already thrown at insertInto(), before execution starts.
        senv.execute();
    }

    @Data
    @AllArgsConstructor
    @NoArgsConstructor
    public static class User {
        private String name;
        private Integer age;
        private Timestamp timestamp;
    }
}
```
Run it and see:

```
Exception in thread "main" org.apache.flink.table.api.ValidationException: Field types of query result and registered TableSink [aggregationTableSink] do not match.
Query result schema: [cnt: Long, tumTime: Timestamp]
TableSink schema: [cnt: BigDecimal, tumTime: Timestamp]
```

A ValidationException: the schema types do not match. The log makes the mismatch clear. The SQL

```sql
SELECT count(*) as cnt, TUMBLE_START(rowtime, INTERVAL '10' SECOND) as tumTime FROM abc GROUP BY TUMBLE(rowtime, INTERVAL '10' SECOND)
```

produces the schema `[cnt: Long, tumTime: Timestamp]`, while the JSON description

```
{
  type: 'object',
  properties: {
    cnt: {
      type: 'number'
    },
    tumTime: {
      type: 'string',
      format: 'date-time'
    }
  }
}
```

yields the schema `[cnt: BigDecimal, tumTime: Timestamp]`.
Checking the JSON Format section of the official documentation:
| JSON schema | Flink SQL |
| --- | --- |
| object | ROW |
| boolean | BOOLEAN |
| array | ARRAY[_] |
| number | DECIMAL |
| integer | DECIMAL |
| string | VARCHAR |
| string with format: date-time | TIMESTAMP |
| string with format: date | DATE |
| string with format: time | TIME |
| string with encoding: base64 | ARRAY[TINYINT] |
| null | NULL (unsupported yet) |
I seem to have discovered something remarkable: every numeric type (and there are only two, `number` and `integer`) is converted to DECIMAL. So there is no way for me to declare a Long field in the schema? I wasn't ready to accept that, so I went to the source. Only the key part is shown below; if you need the rest, see `JsonRowSchemaConverter#convertType(String location, JsonNode node, JsonNode root)`.
```java
private static final String TYPE_NULL = "null";
private static final String TYPE_BOOLEAN = "boolean";
private static final String TYPE_OBJECT = "object";
private static final String TYPE_ARRAY = "array";
private static final String TYPE_NUMBER = "number";
private static final String TYPE_INTEGER = "integer";
private static final String TYPE_STRING = "string";

for (String type : types) {
    // set field type
    switch (type) {
        case TYPE_NULL:
            typeSet.add(Types.VOID);
            break;
        case TYPE_BOOLEAN:
            typeSet.add(Types.BOOLEAN);
            break;
        case TYPE_STRING:
            if (node.has(FORMAT)) {
                typeSet.add(convertStringFormat(location, node.get(FORMAT)));
            } else if (node.has(CONTENT_ENCODING)) {
                typeSet.add(convertStringEncoding(location, node.get(CONTENT_ENCODING)));
            } else {
                typeSet.add(Types.STRING);
            }
            break;
        case TYPE_NUMBER:
            typeSet.add(Types.BIG_DEC);
            break;
        case TYPE_INTEGER:
            // use BigDecimal for easier interoperability
            // without affecting the correctness of the result
            typeSet.add(Types.BIG_DEC);
            break;
        case TYPE_OBJECT:
            typeSet.add(convertObject(location, node, root));
            break;
        case TYPE_ARRAY:
            typeSet.add(convertArray(location, node, root));
            break;
        default:
            throw new IllegalArgumentException(
                "Unsupported type '" + node.get(TYPE).asText() + "' in node: " + location);
    }
}
```
Exactly as the documentation says: both TYPE_NUMBER and TYPE_INTEGER are converted to BIG_DEC. Is it unfair of me to call this a bug at this point?
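The in-source comment ("use BigDecimal for easier interoperability without affecting the correctness of the result") hints at the rationale: JSON Schema's `number` and `integer` carry no width or precision bound, so only an arbitrary-precision type is guaranteed lossless. A self-contained illustration of what a fixed-width type would risk (plain Java, no Flink involved; the class name is mine):

```java
import java.math.BigDecimal;

public class NumberWidthDemo {
    public static void main(String[] args) {
        // 2^53 + 1: a legal JSON integer, but not representable as a double
        String big = "9007199254740993";

        double asDouble = Double.parseDouble(big);
        BigDecimal asDecimal = new BigDecimal(big);

        System.out.println(asDouble == 9007199254740992d);          // true: last digit silently lost
        System.out.println(asDecimal.toPlainString().equals(big));  // true: exact round-trip
    }
}
```

This explains the conservative mapping, though it does nothing to help when the query side produces a Long.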
How to fix it

1 Use cast() in the SQL

```sql
SELECT cast(count(*) as decimal) as cnt, TUMBLE_START(rowtime, INTERVAL '10' SECOND) as tumTime FROM abc GROUP BY TUMBLE(rowtime, INTERVAL '10' SECOND)
```

count(*) returns a BIGINT (Long); casting it to DECIMAL makes the query result match the JSON-derived sink schema.
2 Don't construct the schema from a JSON string at all. The table returned by `Table table = tenv.sqlQuery(sql);` already exposes its schema via `table.getSchema()`. Be aware that some of its field types are for internal use only, such as InternalTypeInfo, and need converting. (This problem exists in 1.8; 1.9 changed this by using DataType instead of TypeInformation to describe field types.)
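A sketch of that second option against the Flink 1.8 APIs. This is an assumption-laden fragment, not a tested fix: it reuses `table`, `hosts`, and the sink parameters from the snippet above, and it guesses that the internal types to swap out are time indicators (`TimeIndicatorTypeInfo`), which can be replaced by the public `Types.SQL_TIMESTAMP`:

```java
// Derive the sink schema from the query result instead of a JSON string
TableSchema querySchema = table.getSchema();

TableSchema.Builder builder = TableSchema.builder();
TypeInformation<?>[] fieldTypes = querySchema.getFieldTypes();
String[] fieldNames = querySchema.getFieldNames();
for (int i = 0; i < fieldTypes.length; i++) {
    TypeInformation<?> fieldType = fieldTypes[i];
    // Internal time-indicator types are not accepted by sinks;
    // map them to the public SQL timestamp type (assumed conversion)
    if (fieldType instanceof TimeIndicatorTypeInfo) {
        fieldType = Types.SQL_TIMESTAMP;
    }
    builder.field(fieldNames[i], fieldType);
}
TableSchema sinkSchema = builder.build();

Elasticsearch6UpsertTableSink establesink = new Elasticsearch6UpsertTableSink(
        true,
        sinkSchema,
        hosts,
        "aggregation",
        "data",
        "$",
        "n/a",
        // build the serializer from the same row type, so it matches the query result
        new JsonRowSerializationSchema.Builder(sinkSchema.toRowType()).build(),
        XContentType.JSON,
        new IgnoringFailureHandler(),
        new HashMap<>()
);
```

Because both the sink schema and the serialization schema now come from the query's own row type, the `cnt: Long` field no longer collides with a DECIMAL declared elsewhere.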