创建表
CREATE [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name
[(col_name data_type [column_constraint_specification] [COMMENT col_comment], ... [constraint_specification])]
[COMMENT table_comment]
[PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)]
[CLUSTERED BY (col_name, col_name, ...)
[SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS]
[SKEWED BY (col_name, col_name, ...) ON ((col_value, col_value, ...), (col_value, col_value, ...), ...)
[STORED AS DIRECTORIES]]
[
[ROW FORMAT row_format]
[STORED AS file_format]
]
[LOCATION hdfs_path]
[TBLPROPERTIES (property_name=property_value, ...)]
[AS select_statement];
CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name
LIKE existing_table_or_view_name
[LOCATION hdfs_path];
data_type
: primitive_type
| array_type
| map_type
| struct_type
| union_type
primitive_type
: TINYINT
| SMALLINT
| INT
| BIGINT
| BOOLEAN
| FLOAT
| DOUBLE
| DOUBLE PRECISION
| STRING
| BINARY
| TIMESTAMP
| DECIMAL
| DECIMAL(precision, scale)
| DATE
| VARCHAR
| CHAR
array_type
: ARRAY < data_type >
map_type
: MAP < primitive_type, data_type >
struct_type
: STRUCT < col_name : data_type [COMMENT col_comment], ...>
union_type
: UNIONTYPE < data_type, data_type, ... >
row_format
: DELIMITED [FIELDS TERMINATED BY char [ESCAPED BY char]] [COLLECTION ITEMS TERMINATED BY char]
[MAP KEYS TERMINATED BY char] [LINES TERMINATED BY char]
[NULL DEFINED AS char]
| SERDE serde_name [WITH SERDEPROPERTIES (property_name=property_value, property_name=property_value, ...)]
file_format:
: SEQUENCEFILE
| TEXTFILE
| RCFILE
| ORC
| PARQUET
| AVRO
| JSONFILE
| INPUTFORMAT input_format_classname OUTPUTFORMAT output_format_classname
column_constraint_specification:
: [ PRIMARY KEY|UNIQUE|NOT NULL|DEFAULT [default_value]|CHECK [check_expression] ENABLE|DISABLE NOVALIDATE RELY/NORELY ]
default_value:
: [ LITERAL|CURRENT_USER()|CURRENT_DATE()|CURRENT_TIMESTAMP()|NULL ]
constraint_specification:
: [, PRIMARY KEY (col_name, ...) DISABLE NOVALIDATE RELY/NORELY ]
[, PRIMARY KEY (col_name, ...) DISABLE NOVALIDATE RELY/NORELY ]
[, CONSTRAINT constraint_name FOREIGN KEY (col_name, ...) REFERENCES table_name(col_name, ...) DISABLE NOVALIDATE
[, CONSTRAINT constraint_name UNIQUE (col_name, ...) DISABLE NOVALIDATE RELY/NORELY ]
[, CONSTRAINT constraint_name CHECK [check_expression] ENABLE|DISABLE NOVALIDATE RELY/NORELY ]
CREATE TABLE 创建一个具有给定名称的表。
如果已存在同名的表或视图,则会引发错误。可以使用 IF NOT EXISTS 跳过错误
表名和列名不区分大小写,但属性名称区分大小写。
EXTERNAL:创建外部表
PARTITIONED BY:创建分区表
CLUSTERED BY (col_name, col_name, …) [SORTED BY (col_name [ASC|DESC], …)] INTO num_buckets BUCKETS:创建分桶(排序)表
SKEWED BY (col_name, col_name, …) ON ((col_value, col_value, …), (col_value, col_value, …), …) [STORED AS DIRECTORIES] :建表解决数据倾斜,STORED AS DIRECTORIES是可选择的,它指定了使用列桶表
ROW FORMAT row_format:定义行的格式
STORED AS file_format:定义文件格式
LOCATION hdfs_path:指定表的位置信息
TBLPROPERTIES (property_name=property_value, …):额外属性
AS select_statement:通过查询语句建表
管理表和外部表
管理表
默认创建的表都是管理表,也称为内部表。Hive默认情况下会将数据存储在hive.metastore.warehouse.dir所定义目录的子目录下。当删除一个管理表时,Hive会删除这个表中的数据及元数据。
外部表
删除外部表时,不会删除这份数据,但是描述表的元数据信息会被删除。
管理表与外部表的相互转换
查询表的类型
desc formatted table_name;
修改管理表为外部表
alter table table_name set tblproperties('EXTERNAL'='TRUE');
修改外部表为管理表
alter table table_name set tblproperties('EXTERNAL'='FALSE');
ROW FORMAT row_format
创建数据格式是每个字段以,为分隔符的表table1
CREATE TABLE IF NOT EXISTS TABLE1(id int,name string) DELIMITED FIELDS TERMINATED BY ',';