Hive Metastore 表结构

文章详细展示了HiveMetaStore中关于Catalogs、DBS、TBLS、PARTITION_KEYS、PARTITIONS、SDS等表的结构和内容,包括数据库和表的组织、分区信息以及序列化详情,揭示了Hive管理元数据的方式。

摘要生成于 C知道 ,由 DeepSeek-R1 满血版支持, 前往体验 >

Hive MetaStore 的ER 图如下。
在这里插入图片描述
部分表结构和说明。

CTLGS(CATALOGS)

catalogs 可以隔离元数据。默认只有1行。一个 CATALOG 可以有多个数据库。

mysql> DESC CTLGS;
+--------------+---------------+------+-----+---------+-------+
| Field        | Type          | Null | Key | Default | Extra |
+--------------+---------------+------+-----+---------+-------+
| CTLG_ID      | bigint(20)    | NO   | PRI | NULL    |       |
| NAME         | varchar(256)  | YES  | UNI | NULL    |       |
| DESC         | varchar(4000) | YES  |     | NULL    |       |
| LOCATION_URI | varchar(4000) | NO   |     | NULL    |       |
+--------------+---------------+------+-----+---------+-------+

记录示例

mysql> SELECT * FROM CTLGS \G
*************************** 1. row ***************************
     CTLG_ID: 1
        NAME: hive
        DESC: Default catalog for Hive
LOCATION_URI: bos://bmr-rd-wh/houzhizhen/warehouse
1 row in set (0.00 sec)

DBS(数据库表)

每个数据库一条记录。

+-----------------+---------------+------+-----+---------+-------+
| Field           | Type          | Null | Key | Default | Extra |
+-----------------+---------------+------+-----+---------+-------+
| DB_ID           | bigint(20)    | NO   | PRI | NULL    |       |
| DESC            | varchar(4000) | YES  |     | NULL    |       |
| DB_LOCATION_URI | varchar(4000) | NO   |     | NULL    |       |
| NAME            | varchar(128)  | YES  | MUL | NULL    |       |
| OWNER_NAME      | varchar(128)  | YES  |     | NULL    |       |
| OWNER_TYPE      | varchar(10)   | YES  |     | NULL    |       |
| CTLG_NAME       | varchar(256)  | NO   | MUL | hive    |       |
+-----------------+---------------+------+-----+---------+-------+
  • 记录示例
mysql> SELECT * FROM DBS WHERE NAME='tpcds_bos_orc_3' \G;
*************************** 1. row ***************************
          DB_ID: 7
           DESC: NULL
DB_LOCATION_URI: bos://bmr-rd-wh/houzhizhen/warehouse/tpcds_bos_orc_3.db
           NAME: tpcds_bos_orc_3
     OWNER_NAME: hive
     OWNER_TYPE: USER
      CTLG_NAME: hive

TBLS(表记录)

  • 表结构
mysql> DESC TBLS;
+--------------------+--------------+------+-----+---------+-------+
| Field              | Type         | Null | Key | Default | Extra |
+--------------------+--------------+------+-----+---------+-------+
| TBL_ID             | bigint(20)   | NO   | PRI | NULL    |       |
| CREATE_TIME        | int(11)      | NO   |     | NULL    |       |
| DB_ID              | bigint(20)   | YES  | MUL | NULL    |       |
| LAST_ACCESS_TIME   | int(11)      | NO   |     | NULL    |       |
| OWNER              | varchar(767) | YES  |     | NULL    |       |
| OWNER_TYPE         | varchar(10)  | YES  |     | NULL    |       |
| RETENTION          | int(11)      | NO   |     | NULL    |       |
| SD_ID              | bigint(20)   | YES  | MUL | NULL    |       |
| TBL_NAME           | varchar(256) | YES  | MUL | NULL    |       |
| TBL_TYPE           | varchar(128) | YES  |     | NULL    |       |
| VIEW_EXPANDED_TEXT | mediumtext   | YES  |     | NULL    |       |
| VIEW_ORIGINAL_TEXT | mediumtext   | YES  |     | NULL    |       |
| IS_REWRITE_ENABLED | bit(1)       | NO   |     | b'0'    |       |
+--------------------+--------------+------+-----+---------+-------+
  • 记录示例
mysql> select * from TBLS WHERE DB_ID=7 AND TBL_NAME='store_sales' \G
*************************** 1. row ***************************
            TBL_ID: 42
       CREATE_TIME: 1688096043
             DB_ID: 7
  LAST_ACCESS_TIME: 0
             OWNER: hive
        OWNER_TYPE: USER
         RETENTION: 0
             SD_ID: 42
          TBL_NAME: store_sales
          TBL_TYPE: MANAGED_TABLE
VIEW_EXPANDED_TEXT: NULL
VIEW_ORIGINAL_TEXT: NULL
IS_REWRITE_ENABLED:  
1 row in set (0.00 sec)

PARTITION_KEYS(表的分区字段)

mysql> SELECT * FROM PARTITION_KEYS WHERE TBL_ID=42 ORDER BY INTEGER_IDX;
+--------+--------------+-----------------+-----------+-------------+
| TBL_ID | PKEY_COMMENT | PKEY_NAME       | PKEY_TYPE | INTEGER_IDX |
+--------+--------------+-----------------+-----------+-------------+
|     42 | NULL         | ss_sold_date_sk | bigint    |           0 |
+--------+--------------+-----------------+-----------+-------------+

PARTITIONS(分区信息)

+------------------+--------------+------+-----+---------+-------+
| Field            | Type         | Null | Key | Default | Extra |
+------------------+--------------+------+-----+---------+-------+
| PART_ID          | bigint(20)   | NO   | PRI | NULL    |       |
| CREATE_TIME      | int(11)      | NO   |     | NULL    |       |
| LAST_ACCESS_TIME | int(11)      | NO   |     | NULL    |       |
| PART_NAME        | varchar(767) | YES  | MUL | NULL    |       |
| SD_ID            | bigint(20)   | YES  | MUL | NULL    |       |
| TBL_ID           | bigint(20)   | YES  | MUL | NULL    |       |
+------------------+--------------+------+-----+---------+-------+
  • 记录示例
    可以看到,同一张表,不同分区的SD_ID 不一样。表的 SD_ID 是42,这两个分区的SD_ID分别为 43 和 44。
mysql> select * from PARTITIONS where TBL_ID=42 limit 2 \G
*************************** 1. row ***************************
         PART_ID: 1
     CREATE_TIME: 0
LAST_ACCESS_TIME: 0
       PART_NAME: ss_sold_date_sk=__HIVE_DEFAULT_PARTITION__
           SD_ID: 43
          TBL_ID: 42
*************************** 2. row ***************************
         PART_ID: 2
     CREATE_TIME: 0
LAST_ACCESS_TIME: 0
       PART_NAME: ss_sold_date_sk=2452641
           SD_ID: 44
          TBL_ID: 42

PARTITION_KEY_VALS

mysql> DESC PARTITION_KEY_VALS;
+--------------+--------------+------+-----+---------+-------+
| Field        | Type         | Null | Key | Default | Extra |
+--------------+--------------+------+-----+---------+-------+
| PART_ID      | bigint(20)   | NO   | PRI | NULL    |       |
| PART_KEY_VAL | varchar(256) | YES  |     | NULL    |       |
| INTEGER_IDX  | int(11)      | NO   | PRI | NULL    |       |
+--------------+--------------+------+-----+---------+-------+
3 rows in set (0.00 sec)
  • 记录示例
    以下记录说明,分区的值是 2452641,由于分区 KEY 是 ss_sold_date_sk。
mysql> select * from PARTITION_KEY_VALS where PART_ID=2 order by INTEGER_IDX;
+---------+--------------+-------------+
| PART_ID | PART_KEY_VAL | INTEGER_IDX |
+---------+--------------+-------------+
|       2 | 2452641      |           0 |
+---------+--------------+-------------+
1 row in set (0.00 sec)

SDS(序列化信息)

mysql> DESC SDS;
+---------------------------+---------------+------+-----+---------+-------+
| Field                     | Type          | Null | Key | Default | Extra |
+---------------------------+---------------+------+-----+---------+-------+
| SD_ID                     | bigint(20)    | NO   | PRI | NULL    |       |
| CD_ID                     | bigint(20)    | YES  | MUL | NULL    |       |
| INPUT_FORMAT              | varchar(4000) | YES  |     | NULL    |       |
| IS_COMPRESSED             | bit(1)        | NO   |     | NULL    |       |
| IS_STOREDASSUBDIRECTORIES | bit(1)        | NO   |     | NULL    |       |
| LOCATION                  | varchar(4000) | YES  |     | NULL    |       |
| NUM_BUCKETS               | int(11)       | NO   |     | NULL    |       |
| OUTPUT_FORMAT             | varchar(4000) | YES  |     | NULL    |       |
| SERDE_ID                  | bigint(20)    | YES  | MUL | NULL    |       |
+---------------------------+---------------+------+-----+---------+-------+
  • 记录示例
    只要表的字段不变,分区对应的 CD_ID 不变。
mysql> select * from SDS WHERE SD_ID in(42,43,44)\G
*************************** 1. row ***************************
                    SD_ID: 42
                    CD_ID: 42
             INPUT_FORMAT: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
            IS_COMPRESSED:  
IS_STOREDASSUBDIRECTORIES:  
                 LOCATION: bos://bmr-rd-wh/houzhizhen/warehouse/tpcds_bos_orc_3.db/store_sales
              NUM_BUCKETS: -1
            OUTPUT_FORMAT: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
                 SERDE_ID: 42
*************************** 2. row ***************************
                    SD_ID: 43
                    CD_ID: 42
             INPUT_FORMAT: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
            IS_COMPRESSED:  
IS_STOREDASSUBDIRECTORIES:  
                 LOCATION: bos://bmr-rd-wh/houzhizhen/warehouse/tpcds_bos_orc_3.db/store_sales/ss_sold_date_sk=__HIVE_DEFAULT_PARTITION__
              NUM_BUCKETS: -1
            OUTPUT_FORMAT: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
                 SERDE_ID: 43
*************************** 3. row ***************************
                    SD_ID: 44
                    CD_ID: 42
             INPUT_FORMAT: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
            IS_COMPRESSED:  
IS_STOREDASSUBDIRECTORIES:  
                 LOCATION: bos://bmr-rd-wh/houzhizhen/warehouse/tpcds_bos_orc_3.db/store_sales/ss_sold_date_sk=2452641
              NUM_BUCKETS: -1
            OUTPUT_FORMAT: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
                 SERDE_ID: 44

CDS

从 SDS 表的每条记录都有一个 CD_ID。cds 表非常简单,只有一个字段。

+-------+------------+------+-----+---------+-------+
| Field | Type       | Null | Key | Default | Extra |
+-------+------------+------+-----+---------+-------+
| CD_ID | bigint(20) | NO   | PRI | NULL    |       |
+-------+------------+------+-----+---------+-------+

COLUMNS_V2

表的列信息

mysql> desc COLUMNS_V2;
+-------------+--------------+------+-----+---------+-------+
| Field       | Type         | Null | Key | Default | Extra |
+-------------+--------------+------+-----+---------+-------+
| CD_ID       | bigint(20)   | NO   | PRI | NULL    |       |
| COMMENT     | varchar(256) | YES  |     | NULL    |       |
| COLUMN_NAME | varchar(128) | NO   | PRI | NULL    |       |
| TYPE_NAME   | mediumtext   | YES  |     | NULL    |       |
| INTEGER_IDX | int(11)      | NO   |     | NULL    |       |
+-------------+--------------+------+-----+---------+-------+
5 rows in set (0.00 sec)
mysql> select * from COLUMNS_V2 where CD_ID=42 ORDER BY INTEGER_IDX;
+-------+---------+-----------------------+--------------+-------------+
| CD_ID | COMMENT | COLUMN_NAME           | TYPE_NAME    | INTEGER_IDX |
+-------+---------+-----------------------+--------------+-------------+
|    42 | NULL    | ss_sold_time_sk       | bigint       |           0 |
|    42 | NULL    | ss_item_sk            | bigint       |           1 |
|    42 | NULL    | ss_customer_sk        | bigint       |           2 |
|    42 | NULL    | ss_cdemo_sk           | bigint       |           3 |
|    42 | NULL    | ss_hdemo_sk           | bigint       |           4 |
|    42 | NULL    | ss_addr_sk            | bigint       |           5 |
|    42 | NULL    | ss_store_sk           | bigint       |           6 |
|    42 | NULL    | ss_promo_sk           | bigint       |           7 |
|    42 | NULL    | ss_ticket_number      | bigint       |           8 |
|    42 | NULL    | ss_quantity           | int          |           9 |
|    42 | NULL    | ss_wholesale_cost     | decimal(7,2) |          10 |
|    42 | NULL    | ss_list_price         | decimal(7,2) |          11 |
|    42 | NULL    | ss_sales_price        | decimal(7,2) |          12 |
|    42 | NULL    | ss_ext_discount_amt   | decimal(7,2) |          13 |
|    42 | NULL    | ss_ext_sales_price    | decimal(7,2) |          14 |
|    42 | NULL    | ss_ext_wholesale_cost | decimal(7,2) |          15 |
|    42 | NULL    | ss_ext_list_price     | decimal(7,2) |          16 |
|    42 | NULL    | ss_ext_tax            | decimal(7,2) |          17 |
|    42 | NULL    | ss_coupon_amt         | decimal(7,2) |          18 |
|    42 | NULL    | ss_net_paid           | decimal(7,2) |          19 |
|    42 | NULL    | ss_net_paid_inc_tax   | decimal(7,2) |          20 |
|    42 | NULL    | ss_net_profit         | decimal(7,2) |          21 |
+-------+---------+-----------------------+--------------+-------------+
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值