Hive Basics: DDL Table Statements Summary (1)

DDL-table

CREATE

Syntax

CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name    -- (Note: TEMPORARY available in Hive 0.14.0 and later)
  [(col_name data_type [column_constraint_specification] [COMMENT col_comment], ... [constraint_specification])]
  [COMMENT table_comment]
  [PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)]
  [CLUSTERED BY (col_name, col_name, ...) [SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS]
  [SKEWED BY (col_name, col_name, ...)                  -- (Note: Available in Hive 0.10.0 and later)]
     ON ((col_value, col_value, ...), (col_value, col_value, ...), ...)
     [STORED AS DIRECTORIES]
  [
   [ROW FORMAT row_format] 
   [STORED AS file_format]
     | STORED BY 'storage.handler.class.name' [WITH SERDEPROPERTIES (...)]  -- (Note: Available in Hive 0.6.0 and later)
  ]
  [LOCATION hdfs_path]
  [TBLPROPERTIES (property_name=property_value, ...)]   -- (Note: Available in Hive 0.6.0 and later)
  [AS select_statement];   -- (Note: Available in Hive 0.5.0 and later; not supported for external tables)

Notes
Table types
CREATE TEMPORARY TABLE: temporary table that exists only within the current session; once the session is closed, the table is dropped
CREATE EXTERNAL TABLE: external table whose file location must be specified explicitly; dropping the table does not delete the data
CREATE TABLE: managed (internal) table; dropping the table also deletes the data files in HDFS
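A minimal sketch contrasting the three table types; the demo_db database, table names, columns, and the LOCATION path below are made up for illustration:

-- Managed (internal) table: DROP TABLE also removes the data files in HDFS
CREATE TABLE IF NOT EXISTS demo_db.user_info (
  id   BIGINT,
  name STRING
);

-- External table: dropping it only removes the metadata; the files under LOCATION remain
CREATE EXTERNAL TABLE IF NOT EXISTS demo_db.user_info_ext (
  id   BIGINT,
  name STRING
)
LOCATION '/data/user_info_ext';

-- Temporary table: visible only in the current session and dropped automatically when it ends
CREATE TEMPORARY TABLE tmp_user_info AS
SELECT id, name FROM demo_db.user_info;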

ROW FORMAT

row_format
  : DELIMITED [FIELDS TERMINATED BY char [ESCAPED BY char]]
        [COLLECTION ITEMS TERMINATED BY char]
        [MAP KEYS TERMINATED BY char]
        [LINES TERMINATED BY char]
        [NULL DEFINED AS char]   -- (Note: Available in Hive 0.13 and later)
  | SERDE serde_name [WITH SERDEPROPERTIES (property_name=property_value, property_name=property_value, ...)]

Specifies the format of each row in the table; it can take one of two forms, DELIMITED or SERDE.
ROW FORMAT DELIMITED: each row is a record whose fields are split by a given delimiter.
For example, ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' means that the fields of each row are separated by commas.
ROW FORMAT SERDE: the row format is handled by a SerDe class. Hive ships built-in SerDes for Avro, ORC, RegEx, Thrift, Parquet, CSV, and JSON (JsonSerDe), and also supports custom SerDes (extend the abstract class AbstractSerDe and implement initialize and deserialize).

SerDe is short for "Serializer and Deserializer".
Hive uses the SerDe (and FileFormat) to read and write table rows.
Read path: HDFS files --> InputFileFormat --> <key, value> --> Deserializer --> Row object
Write path: Row object --> Serializer --> <key, value> --> OutputFileFormat --> HDFS files
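A sketch of both forms. The table names and columns are made up; the JSON variant assumes the JsonSerDe shipped in Hive's hcatalog module (org.apache.hive.hcatalog.data.JsonSerDe) is available on the classpath:

-- DELIMITED: plain text, one record per line, fields separated by commas
CREATE TABLE demo_db.access_log_csv (
  ip  STRING,
  url STRING,
  ts  STRING
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ','
  LINES TERMINATED BY '\n'
STORED AS TEXTFILE;

-- SERDE: each line is a JSON object parsed by the built-in JsonSerDe
CREATE TABLE demo_db.access_log_json (
  ip  STRING,
  url STRING,
  ts  STRING
)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
STORED AS TEXTFILE;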

Storage Format
The default storage format is plain text (TEXTFILE), controlled by the hive.default.fileformat configuration.
The other storage formats are listed in the excerpt from the official documentation below.

file_format
  : SEQUENCEFILE
  | TEXTFILE       -- (Default, depending on hive.default.fileformat configuration)
  | RCFILE         -- (Note: Available in Hive 0.6.0 and later)
  | ORC            -- (Note: Available in Hive 0.11.0 and later)
  | PARQUET        -- (Note: Available in Hive 0.13.0 and later)
  | AVRO           -- (Note: Available in Hive 0.14.0 and later)
  | JSONFILE       -- (Note: Available in Hive 4.0.0 and later)
  | INPUTFORMAT input_format_classname OUTPUTFORMAT output_format_classname
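For example, an ORC-backed table; this is a sketch, and the table name and the orc.compress setting are illustrative:

CREATE TABLE demo_db.events_orc (
  event_id   BIGINT,
  event_type STRING
)
STORED AS ORC
TBLPROPERTIES ('orc.compress'='SNAPPY');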

Partitioned By
Creates a partitioned table.
Each partition column corresponds to one directory level in the table's HDFS path (see the sketch below).
For example, PARTITIONED BY (year string, month string, day string)
maps to an HDFS path such as hdfs:///year=2019/month=06/day=01 (three directory levels), while
PARTITIONED BY (day string)
maps to an HDFS path such as hdfs:///day=2019-06-01 (a single directory level).
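A sketch of a three-level partitioned table; the demo_db.page_view table and its columns are hypothetical:

CREATE TABLE demo_db.page_view (
  user_id BIGINT,
  url     STRING
)
PARTITIONED BY (year STRING, month STRING, day STRING)
STORED AS ORC;

-- Adding this partition creates the directory year=2019/month=06/day=01 under the table location
ALTER TABLE demo_db.page_view ADD IF NOT EXISTS PARTITION (year='2019', month='06', day='01');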

Summary of partition commands (a combined example on a sample table follows this list)
1. List partitions
Command: SHOW PARTITIONS db_name.table_name;
Lists all partitions of the table.
2. Rename a partition
Command: ALTER TABLE db_name.table_name PARTITION (day='2019-06-01') RENAME TO PARTITION (day='2018-06-01');
Renaming a partition of an external table does not change the path of the data files;
for a managed table, the data files are moved to the new path.
3. Repair partitions
Command: MSCK REPAIR TABLE db_name.table_name;
Registers partitions whose directories exist on the corresponding HDFS path but are missing from the metastore.
Repair parameter:
The number of partitions repaired per batch can be set with set hive.msck.repair.batch.size=10 to improve repair efficiency; the default value is 0, which repairs all partitions in one batch and can be slow when there are many partitions.
4. Drop partitions
Command: ALTER TABLE table_name DROP [IF EXISTS] PARTITION partition_spec[, PARTITION partition_spec, ...] [PURGE];
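Putting the commands together against the hypothetical demo_db.page_view table sketched earlier:

-- 1. List all partitions of the table
SHOW PARTITIONS demo_db.page_view;

-- 2. Rename a partition (the full partition spec must be given)
ALTER TABLE demo_db.page_view PARTITION (year='2019', month='06', day='01')
  RENAME TO PARTITION (year='2019', month='06', day='02');

-- 3. Register directories that exist in HDFS but are missing from the metastore
SET hive.msck.repair.batch.size=10;   -- optional: repair 10 partitions per batch
MSCK REPAIR TABLE demo_db.page_view;

-- 4. Drop a partition; for a managed table, PURGE deletes the data immediately instead of moving it to trash
ALTER TABLE demo_db.page_view DROP IF EXISTS PARTITION (year='2019', month='06', day='02') PURGE;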
