1. 数据类型
官网文档:https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Types
Hive数据类型 | 长度 | 例子 |
---|---|---|
TINYINT | -128 to 127 | 20 |
SMALINT | -32,768 to 32,767 | 20 |
INT | -2,147,483,648 to 2,147,483,647 | 20 |
BIGINT | -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807 | 20 |
BOOLEAN | 布尔类型 | TRUE FALSE |
FLOAT | 4-byte single precision floating point number | 3.14159 |
DOUBLE | 8-byte double precision floating point number | 3.14159 |
STRING | 字符 | ‘lalal” |
TIMESTAMP | 时间 | |
DATE | 时间 | |
BINARY | 字节数组 |
复杂类型数据:
arrays: ARRAY<data_type> (Note: negative values and non-constant expressions are allowed as of Hive 0.14.)
maps: MAP<primitive_type, data_type> (Note: negative values and non-constant expressions are allowed as of Hive 0.14.)
structs: STRUCT<col_name : data_type [COMMENT col_comment], …>
2. DDL
2.1 create table
创建的表和分区信息分别存储在元数据库的tbls和partitions中
mysql> select * from tbls;
+--------+-------------+-------+------------------+-------+------------+-----------+-------+-----------------+----------------+--------------------+--------------------+
| TBL_ID | CREATE_TIME | DB_ID | LAST_ACCESS_TIME | OWNER | OWNER_TYPE | RETENTION | SD_ID | TBL_NAME | TBL_TYPE | VIEW_EXPANDED_TEXT | VIEW_ORIGINAL_TEXT |
+--------+-------------+-------+------------------+-------+------------+-----------+-------+-----------------+----------------+--------------------+--------------------+
| 8 | 1577179125 | 6 | 0 | wzj | USER | 0 | 8 | test_w | EXTERNAL_TABLE | NULL | NULL |
| 9 | 1577179135 | 6 | 0 | wzj | USER | 0 | 9 | test_n | EXTERNAL_TABLE | NULL | NULL |
| 10 | 1577179616 | 6 | 0 | wzj | USER | 0 | 10 | test_cpy | MANAGED_TABLE | NULL | NULL |
| 11 | 1577179667 | 6 | 0 | wzj | USER | 0 | 11 | test_copy | MANAGED_TABLE | NULL | NULL |
| 12 | 1577181315 | 6 | 0 | wzj | USER | 0 | 12 | emp | MANAGED_TABLE | NULL | NULL |
| 16 | 1577243920 | 6 | 0 | wzj | USER | 0 | 16 | partitiom_test | MANAGED_TABLE | NULL | NULL |
| 17 | 1577252656 | 6 | 0 | wzj | USER | 0 | 18 | order_partition | MANAGED_TABLE | NULL | NULL |
+--------+-------------+-------+------------------+-------+------------+-----------+-------+-----------------+----------------+--------------------+--------------------+
7 rows in set (0.00 sec)
mysql> select * from partitions;
+---------+-------------+------------------+--------------------------+-------+--------+
| PART_ID | CREATE_TIME | LAST_ACCESS_TIME | PART_NAME | SD_ID | TBL_ID |
+---------+-------------+------------------+--------------------------+-------+--------+
| 1 | 1577244373 | 0 | month=20191101/org=henan | 17 | 16 |
| 2 | 1577252735 | 0 | event_month=2014-05 | 19 | 17 |
+---------+-------------+------------------+--------------------------+-------+--------+
2 rows in set (0.00 sec)
- 默认创建的是内部表
- 内部表删除时: HDFS + META 都被删除
- 外部表删除时: HDFS不删除 仅META被删除
CREATE [EXTERNAL(外部表)] TABLE [IF NOT EXISTS] [db_name.]table_name -- (Note: TEMPORARY available in Hive 0.14.0 and later)
[(col_name data_type # 列的名称及类型 [column_constraint_specification] # 列的约束 [COMMENT col_comment], ...# 列的注释
[constraint_specification])] # 表的约束
[COMMENT table_comment]表的注释
[PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)]# 分区
[CLUSTERED BY (col_name, col_name, ...) [SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS]#创建分桶表
[SKEWED BY (col_name, col_name, ...) -- (No