002、DML
符号解释:
( ):里面的参数选择一个作为此次命令参数。
[ ]: 里面的参数可选可不选。
| :
几个优化设置:
set hive.cli.print.header=true; : 查询语句显示表头
LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename [PARTITION (partcol1=val1, partcol2=val2 ...)]
参数解释:
LOCAL:添加此参数代表是从本地文件装载数据,不添加代表从HDFS上面装载
filepath:如果这里是本地路径,那么数据被上传到到HDFS上来。而如果这是HDFS路径,则数据只是被转移到新的路径下。
OVERWRITE:是否覆盖表中原有数据
PARTITION:是否包含分区信息,如果表具有分区,而load命令没有分区,则加载将转换为INSERT AS SELECT,并假设最后一组列是分区列。
dfs -get <tablePath> <localPath>
因为HIVe表的数据就保存对应的HDFS目录下,直接复制出就行
INSERT OVERWRITE [LOCAL] DIRECTORY directory1 [ROW FORMAT row_format] [STORED AS file_format] SELECT ... FROM ...
LOCAL:指定以后就是存储到本地目录
ROW FORMAT row_format: 行的格式化
STORED AS file_format: 文件的格式化
FROM from_statement
INSERT OVERWRITE [LOCAL] DIRECTORY directory1 select_statement1
[INSERT OVERWRITE [LOCAL] DIRECTORY directory2 select_statement2] ...
select_statement: select .... where....
INSERT (OVERWRITE|INTO) TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...) [IF NOT EXISTS]] select_statement1 FROM from_statement;
参数解释:
OVERWRITE|INTO: 选择overwrite会覆盖全表
PARTITION (partcol1=val1, partcol2=val2 ...):选择分区信息
IF NOT EXISTS:哈哈哈哈哈
select_statement1 FROM from_statement:查询语句
FROM from_statement
INSERT OVERWRITE TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...) [IF NOT EXISTS]] select_statement1
[INSERT OVERWRITE TABLE tablename2 [PARTITION ... [IF NOT EXISTS]] select_statement2]
参数不解释..
作用: 如果每次插入分区数据都要手动设定分区类型,操作实在非常麻烦,在设置动态分区以后,就可以根据字段值自动创建分区.在动态插入分区前,需要设置以下参数.
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
UPDATE tablename SET column = value [, column = value ...] [WHERE expression]
Hive里面的数据更新都是以列为基本单位,不推荐使用
DELETE FROM tablename [WHERE expression]
Hive里面的数据删除都是以表为基本单位,不推荐使用
SELECT ... FROM .... TABLESAMPLE (BUCKET bucketId OUT OF bucketNum [ON colName])
参数解释:
bucketId:抽取的bucket的编号
bucketNum:表示bucket的数量
ON colName: 列名,也可以用.rand()随机列
SELECT ... FROM .... TABLESAMPLE(num percent);
SELECT ... FROM .... TABLESAMPLE(size);
SELECT ... FROM .... TABLESAMPLE(num row);