1.系统表概述
系统表包含每个表的元数据信息,例如创建的快照和使用的配置,可以通过批处理查询访问系统表。
目前,Flink、Spark和Trino支持查询系统表,在某些情况下,表名需要用反引号括起来,以避免语法解析冲突:
SELECT * FROM my_catalog.my_db.`MyTable$snapshots`;
2.Snapshots Table
可以通过快照表查询表的快照历史信息,包括快照中的record数量。
SELECT * FROM MyTable$snapshots;
/*
+--------------+------------+-----------------+-------------------+--------------+-------------------------+--------------------------------+------------------------------- +--------------------------------+---------------------+---------------------+-------------------------+-------------------+--------------------+----------------+
| snapshot_id | schema_id | commit_user | commit_identifier | commit_kind | commit_time | base_manifest_list | delta_manifest_list | changelog_manifest_list | total_record_count | delta_record_count | changelog_record_count | added_file_count | delete_file_count | watermark |
+--------------+------------+-----------------+-------------------+--------------+-------------------------+--------------------------------+------------------------------- +--------------------------------+---------------------+---------------------+-------------------------+-------------------+--------------------+----------------+
| 2 | 0 | 7ca4cd28-98e... | 2 | APPEND | 2022-10-26 11:44:15.600 | manifest-list-31323d5f-76e6... | manifest-list-31323d5f-76e6... | manifest-list-31323d5f-76e6... | 2 | 2 | 0 | 2 | 0 | 1666755855600 |
| 1 | 0 | 870062aa-3e9... | 1 | APPEND | 2022-10-26 11:44:15.148 | manifest-list-31593d5f-76e6... | manifest-list-31593d5f-76e6... | manifest-list-31593d5f-76e6... | 1 | 1 | 0 | 1 | 0 | 1666755855148 |
+--------------+------------+-----------------+-------------------+--------------+-------------------------+--------------------------------+------------------------------- +--------------------------------+---------------------+---------------------+-------------------------+-------------------+--------------------+----------------+
2 rows in set
*/
通过查询快照表,可以了解有关该表的提交和过期信息以及通过时间旅行查询数据的筛选信息。
3.Schemas Table
可以通过模式表查询表的历史模式。
SELECT * FROM MyTable$schemas;
/*
+-----------+--------------------------------+----------------+--------------+---------+---------+-------------------------+
| schema_id | fields | partition_keys | primary_keys | options | comment | update_time |
+-----------+--------------------------------+----------------+--------------+---------+---------+-------------------------+
| 0 | [{"id":0,"name":"word","typ... | [] | ["word"] | {} | | 2022-10-28 11:44:20.600 |
| 1 | [{"id":0,"name":"word","typ... | [] | ["word"] | {} | | 2022-10-27 11:44:15.600 |
| 2 | [{"id":0,"name":"word","typ... | [] | ["word"] | {} | | 2022-10-26 11:44:10.600 |
+-----------+--------------------------------+----------------+--------------+---------+---------+-------------------------+
3 rows in set
*/
可以 Join 快照表和模式表,以获取给定快照的字段信息。
SELECT s.snapshot_id, t.schema_id, t.fields
FROM MyTable$snapshots s JOIN MyTable$schemas t
ON s.schema_id=t.schema_id where s.snapshot_id=100;
4.Options Table
可以通过选项表查询从DDL指定的表的配置信息,未显示的配置将是默认值。
SELECT * FROM MyTable$options;
/*
+------------------------+--------------------+
| key | value |
+------------------------+--------------------+
| snapshot.time-retained | 5 h |
+------------------------+--------------------+
1 rows in set
*/
5.Audit log Table
如果需要查询表的changelog,可以使用audit_log系统表,通过audit_log表,可以在获取表的增量数据时获取rowkind列,可以使用此列进行筛选和其他操作。
rowkind
有四个值:
+I
:插入操作。-U
:使用更新行的上一个内容更新操作。+U
:使用更新行的新内容更新操作。-D
:删除操作。
SELECT * FROM MyTable$audit_log;
/*
+------------------+-----------------+-----------------+
| rowkind | column_0 | column_1 |
+------------------+-----------------+-----------------+
| +I | ... | ... |
+------------------+-----------------+-----------------+
| -U | ... | ... |
+------------------+-----------------+-----------------+
| +U | ... | ... |
+------------------+-----------------+-----------------+
3 rows in set
*/
6.Read-optimized Table
如果需要极端的读取性能,并且可以接受读取到略微旧的数据【不是最新】,可以使用ro
(读取优化)系统表,读取优化系统表仅通过扫描不需要合并的文件来提高读取性能。
对于主键表,ro
系统表仅扫描最高级别的文件,也就是说,ro
系统表只产生最新的full compaction的结果【即新来的数据可能还没合并所以无法读取到】。
注意:不同的桶可能在不同的时间进行full compaction,因此不同键的值可能来自不同的快照。
对于append tables,由于所有文件都可以在不合并的情况下读取,ro
系统表的作用就像普通的append tables一样。
SELECT * FROM MyTable$ro;
7.Files Table
可以使用特定的快照查询表的文件。
-- Query the files of latest snapshot
SELECT * FROM MyTable$files;
/*
+-----------+--------+--------------------------------+-------------+-----------+-------+--------------+--------------------+---------+---------+------------------------+-------------------------+-------------------------+---------------------+---------------------+-----------------------+
| partition | bucket | file_path | file_format | schema_id | level | record_count | file_size_in_bytes | min_key | max_key | null_value_counts | min_value_stats | max_value_stats | min_sequence_number | max_sequence_number | creation_time |
+-----------+--------+--------------------------------+-------------+-----------+-------+--------------+--------------------+---------+---------+------------------------+-------------------------+-------------------------+---------------------+---------------------+-----------------------+
| [3] | 0 | data-8f64af95-29cc-4342-adc... | orc | 0 | 0 | 1 | 593 | [c] | [c] | {cnt=0, val=0, word=0} | {cnt=3, val=33, word=c} | {cnt=3, val=33, word=c} | 1691551246234 | 1691551246637 |2023-02-24T16:06:21.166|
| [2] | 0 | data-8b369068-0d37-4011-aa5... | orc | 0 | 0 | 1 | 593 | [b] | [b] | {cnt=0, val=0, word=0} | {cnt=2, val=22, word=b} | {cnt=2, val=22, word=b} | 1691551246233 | 1691551246732 |2023-02-24T16:06:21.166|
| [2] | 0 | data-83aa7973-060b-40b6-8c8... | orc | 0 | 0 | 1 | 605 | [d] | [d] | {cnt=0, val=0, word=0} | {cnt=2, val=32, word=d} | {cnt=2, val=32, word=d} | 1691551246267 | 1691551246798 |2023-02-24T16:06:21.166|
| [5] | 0 | data-3d304f4a-bcea-44dc-a13... | orc | 0 | 0 | 1 | 593 | [c] | [c] | {cnt=0, val=0, word=0} | {cnt=5, val=51, word=c} | {cnt=5, val=51, word=c} | 1691551246788 | 1691551246152 |2023-02-24T16:06:21.166|
| [1] | 0 | data-10abb5bc-0170-43ae-b6a... | orc | 0 | 0 | 1 | 595 | [a] | [a] | {cnt=0, val=0, word=0} | {cnt=1, val=11, word=a} | {cnt=1, val=11, word=a} | 1691551246722 | 1691551246273 |2023-02-24T16:06:21.166|
| [4] | 0 | data-2c9b7095-65b7-4013-a7a... | orc | 0 | 0 | 1 | 593 | [a] | [a] | {cnt=0, val=0, word=0} | {cnt=4, val=12, word=a} | {cnt=4, val=12, word=a} | 1691551246321 | 1691551246109 |2023-02-24T16:06:21.166|
+-----------+--------+--------------------------------+-------------+-----------+-------+--------------+--------------------+---------+---------+------------------------+-------------------------+-------------------------+---------------------+---------------------+-----------------------+
6 rows in set
*/
-- You can also query the files with specific snapshot
SELECT * FROM MyTable$files /*+ OPTIONS('scan.snapshot-id'='1') */;
/*
+-----------+--------+--------------------------------+-------------+-----------+-------+--------------+--------------------+---------+---------+------------------------+-------------------------+-------------------------+---------------------+---------------------+-----------------------+
| partition | bucket | file_path | file_format | schema_id | level | record_count | file_size_in_bytes | min_key | max_key | null_value_counts | min_value_stats | max_value_stats | min_sequence_number | max_sequence_number | creation_time |
+-----------+--------+--------------------------------+-------------+-----------+-------+--------------+--------------------+---------+---------+------------------------+-------------------------+-------------------------+---------------------+---------------------+-----------------------+
| [3] | 0 | data-8f64af95-29cc-4342-adc... | orc | 0 | 0 | 1 | 593 | [c] | [c] | {cnt=0, val=0, word=0} | {cnt=3, val=33, word=c} | {cnt=3, val=33, word=c} | 1691551246234 | 1691551246637 |2023-02-24T16:06:21.166|
| [2] | 0 | data-8b369068-0d37-4011-aa5... | orc | 0 | 0 | 1 | 593 | [b] | [b] | {cnt=0, val=0, word=0} | {cnt=2, val=22, word=b} | {cnt=2, val=22, word=b} | 1691551246233 | 1691551246732 |2023-02-24T16:06:21.166|
| [1] | 0 | data-10abb5bc-0170-43ae-b6a... | orc | 0 | 0 | 1 | 595 | [a] | [a] | {cnt=0, val=0, word=0} | {cnt=1, val=11, word=a} | {cnt=1, val=11, word=a} | 1691551246267 | 1691551246798 |2023-02-24T16:06:21.166|
+-----------+--------+--------------------------------+-------------+-----------+-------+--------------+--------------------+---------+---------+------------------------+-------------------------+-------------------------+---------------------+---------------------+-----------------------+
3 rows in set
*/
8.Tags Table
可以通过标签表查询表的标签历史信息,包括哪些快照是基于标签的快照以及快照的一些历史信息,还可以按名称获取所有标签名称和特定标签数据的时间旅行。
SELECT * FROM MyTable$tags;
/*
+----------+-------------+-----------+-------------------------+--------------+
| tag_name | snapshot_id | schema_id | commit_time | record_count |
+----------+-------------+-----------+-------------------------+--------------+
| tag1 | 1 | 0 | 2023-06-28 14:55:29.344 | 3 |
| tag3 | 3 | 0 | 2023-06-28 14:58:24.691 | 7 |
+----------+-------------+-----------+-------------------------+--------------+
2 rows in set
*/
9.Consumers Table
可以查询包含下一个快照的所有消费者。
SELECT * FROM MyTable$consumers;
/*
+-------------+------------------+
| consumer_id | next_snapshot_id |
+-------------+------------------+
| id1 | 1 |
| id2 | 3 |
+-------------+------------------+
2 rows in set
*/
10.Manifests Table(清单表)
可以查询最新快照或当前表的指定快照中包含的所有清单文件。
-- Query the manifest of latest snapshot
SELECT * FROM MyTable$manifests;
/*
+--------------------------------+-------------+------------------+-------------------+---------------+
| file_name | file_size | num_added_files | num_deleted_files | schema_id |
+--------------------------------+-------------+------------------+-------------------+---------------+
| manifest-f4dcab43-ef6b-4713... | 12365| 40 | 0 | 0 |
| manifest-f4dcab43-ef6b-4713... | 1648 | 1 | 0 | 0 |
+--------------------------------+-------------+------------------+-------------------+---------------+
2 rows in set
*/
-- You can also query the manifest with specified snapshot
SELECT * FROM MyTable$manifests /*+ OPTIONS('scan.snapshot-id'='1') */;
/*
+--------------------------------+-------------+------------------+-------------------+---------------+
| file_name | file_size | num_added_files | num_deleted_files | schema_id |
+--------------------------------+-------------+------------------+-------------------+---------------+
| manifest-f4dcab43-ef6b-4713... | 12365| 40 | 0 | 0 |
+--------------------------------+-------------+------------------+-------------------+---------------+
1 rows in set
*/
11.Aggregation fields Table
可以通过聚合字段表查询表的历史聚合信息。
SELECT * FROM MyTable$aggregation_fields;
/*
+------------+-----------------+--------------+--------------------------------+---------+
| field_name | field_type | function | function_options | comment |
+------------+-----------------+--------------+--------------------------------+---------+
| product_id | BIGINT NOT NULL | [] | [] | <NULL> |
| price | INT | [true,count] | [fields.price.ignore-retrac... | <NULL> |
| sales | BIGINT | [sum] | [fields.sales.aggregate-fun... | <NULL> |
+------------+-----------------+--------------+--------------------------------+---------+
3 rows in set
*/
12.Partitions Table
可以查询表的分区文件。
SELECT * FROM MyTable$partitions;
/*
+---------------+----------------+--------------------+
| partition | record_count | file_size_in_bytes|
+---------------+----------------+--------------------+
| [1] | 1 | 645 |
+---------------+----------------+--------------------+
*/
13.Global System Table
a) 概述
全局系统表包含paimon中存在的所有表的统计信息,为了方便搜索,创建了一个名为sys
的数据库。
USE sys;
SHOW TABLES;
b) ALL Options Table
显示数据库的表配置。
SELECT * FROM sys.all_table_options;
/*
+---------------+--------------------------------+--------------------------------+------------------+
| database_name | table_name | key | value |
+---------------+--------------------------------+--------------------------------+------------------+
| my_db | Orders_orc | bucket | -1 |
| my_db | Orders2 | bucket | -1 |
| my_db | Orders2 | sink.parallelism | 7 |
| my_db2| OrdersSum | bucket | 1 |
+---------------+--------------------------------+--------------------------------+------------------+
7 rows in set
*/
c) Catalog Options Table
可以通过Catalog配置表查询Catalog的选项信息,未显示的选项将是默认值。
SELECT * FROM sys.catalog_options;
/*
+-----------+---------------------------+
| key | value |
+-----------+---------------------------+
| warehouse | hdfs:///path/to/warehouse |
+-----------+---------------------------+
1 rows in set
*/