一、文件数据插入表(Loading files into tables)
只是复制/移动,不做任何变换
1.1 语法(Syntax)
LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename [PARTITION (partcol1=val1, partcol2=val2 ...)]
1.2 概述(Synopsis)
- filepath:文件路径(文件或者文件夹)
- 相对路径: project/data1
- 绝对路径:/user/hive/project/data1
- URI: hdfs://namenode:9000/user/hive/project/data1
- target:目标表
- 非分区表
- 分区表的话必须指定某一个分区
- 有LOCAL:本地路径模式(文件或文件夹)
- 相对路径,相对于当前工作目录
- 绝对路径
- URI: file:///user/hive/project/data1
- 有LOCAL是copy,没LOCAL是move
- 无LOCAL:
- 全URI:这个好说
- hdfs还是了local参考Namenode下配置文件fs.default.name的URI
- 相对路径:相对于/user/
- 从以上两部组成的地址中move数据
- OVERWRITE:覆盖写
- 有OVERWRITE覆盖写,没有OVERWRITE追加写
1.3 注意(Notes)
- filepath不允许包含子目录
- 没有LOCAL时,数据和表应该位于同一个文件系统
二、查询结果插入表(Inserting data into Hive Tables from queries)
2.1 语法(Syntax)
标准语法:
INSERT OVERWRITE TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...) [IF NOT EXISTS]] select_statement1 FROM from_statement;
INSERT INTO TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...)] select_statement1 FROM from_statement;
Hive extension (multiple inserts):
FROM from_statement
INSERT OVERWRITE TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...) [IF NOT EXISTS]] select_statement1
[INSERT OVERWRITE TABLE tablename2 [PARTITION ... [IF NOT EXISTS]] select_statement2]
[INSERT INTO TABLE tablename2 [PARTITION ...] select_statement2] ...;
FROM from_statement
INSERT INTO TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...)] select_statement1
[INSERT INTO TABLE tablename2 [PARTITION ...] select_statement2]
[INSERT OVERWRITE TABLE tablename2 [PARTITION ... [IF NOT EXISTS]] select_statement2] ...;
Hive extension (dynamic partition inserts):
INSERT OVERWRITE TABLE tablename PARTITION (partcol1[=val1], partcol2[=val2] ...) select_statement FROM from_statement;
INSERT INTO TABLE tablename PARTITION (partcol1[=val1], partcol2[=val2] ...) select_statement FROM from_statement;
2.2 概述(Synopsis)
- INSERT OVERWRITE:覆盖表或分区
- INSERT INTO:追加表或分区
- IF NOT EXISTS:避免错误
- OutputFormat:按照表的定义
- PARTITION:分区表必须指定分区
- Multiple insert:一次扫描多表插入
2.3 动态分区(Dynamic Partition)
FROM page_view_stg pvs
INSERT OVERWRITE TABLE page_view PARTITION(dt='2008-06-08', country)
SELECT pvs.viewTime, pvs.userid, pvs.page_url, pvs.referrer_url, null, null, pvs.ip, pvs.cnt
- 插入时不指定分区的具体值,而是指定一个列
- 默认不支持动态分区的插入操作,需要hive.exec.dynamic.partition
三、查询数据到文件(Writing data into the filesystem from queries)
3.1 语法(Syntax)
标准语法:
INSERT OVERWRITE [LOCAL] DIRECTORY directory1
[ROW FORMAT row_format] [STORED AS file_format] (Note: Only available starting with Hive 0.11.0)
SELECT ... FROM ...
Hive extension (multiple inserts):
FROM from_statement
INSERT OVERWRITE [LOCAL] DIRECTORY directory1 select_statement1
[INSERT OVERWRITE [LOCAL] DIRECTORY directory2 select_statement2] ...
row_format
: DELIMITED [FIELDS TERMINATED BY char [ESCAPED BY char]] [COLLECTION ITEMS TERMINATED BY char]
[MAP KEYS TERMINATED BY char] [LINES TERMINATED BY char]
[NULL DEFINED AS char] (Note: Only available starting with Hive 0.13)
3.2 概述(Synopsis)
- DIRECTORY:可以是URI,不是的参考Namenode下配置文件fs.default.name的URI
- LOCAL:使用本地文件系统
- row_format:默认^A分隔,也可也可自己指定
四、插入值到表(Inserting values into tables from SQL)
4.1 语法(Syntax)
标准语法:
INSERT INTO TABLE tablename [PARTITION (partcol1[=val1], partcol2[=val2] ...)] VALUES values_row [, values_row ...]
Where values_row is:
( value [, value ...] )
where a value is either null or any valid SQL literal
4.2 概述(Synopsis)
- 不支持部分列插入
- 不支持复杂类型插入
- 分区表:静态用时不用插入分区列,动态用时需要插入分区列
4.3 举例
CREATE TABLE students (name VARCHAR(64), age INT, gpa DECIMAL(3, 2))
CLUSTERED BY (age) INTO 2 BUCKETS STORED AS ORC;
INSERT INTO TABLE students
VALUES ('fred flintstone', 35, 1.28), ('barney rubble', 32, 2.32);
CREATE TABLE pageviews (userid VARCHAR(64), link STRING, came_from STRING)
PARTITIONED BY (datestamp STRING) CLUSTERED BY (userid) INTO 256 BUCKETS STORED AS ORC;
INSERT INTO TABLE pageviews PARTITION (datestamp = '2014-09-23')
VALUES ('jsmith', 'mail.com', 'sports.com'), ('jdoe', 'mail.com', null);
INSERT INTO TABLE pageviews PARTITION (datestamp)
VALUES ('tjohnson', 'sports.com', 'finance.com', '2014-09-23'), ('tlee', 'finance.com', null, '2014-09-21');
五、升级(Update)
5.1 语法(Syntax)
UPDATE tablename SET column = value [, column = value ...] [WHERE expression]
5.2 概述(Synopsis)
- 分区表不支持升级
- 分桶表不支持升级
- 只删除where语句匹配的行
六、删除(DELETE)
6.1 语法(Syntax)
DELETE FROM tablename [WHERE expression]
6.2 概述(Synopsis)
- 只删除where语句匹配的行