Hive Data Manipulation Language
Hive数据操作语言
Loading files into tables
Hive does not do any transformation while loading data into tables. Load operations are currently pure copy/move operations that move datafiles into locations corresponding to Hive tables.
在将数据加载到表中时,Hive不会进行任何转换。加载操作当前是纯复制/移动操作,它将数据文件移动到与Hive表对应的位置。
LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename [PARTITION (partcol1=val1, partcol2=val2 ...)]
LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename [PARTITION (partcol1=val1, partcol2=val2 ...)] [INPUTFORMAT 'inputformat' SERDE 'serde'] (3.0 or later)
例:
LOCAL:本地系统,如果没有local那么就是指的是HDFS的路径
OVERWRTITE:是否数据覆盖,如果没有那么就是数据追加
LOAD DATA LOCAL INPATH '/home/haoop/data/emp.txt' OVERWRITE INTO TABLE emp;
LOAD DATA INPATH 'hdfs://hadoop000:8020/data/emp.txt' INTO TABLE emp;
LOAD DATA LOCAL INPATH ''/home/haoop/data/emp.txt' OVERWRITE INTO TABLE emp;
filepath可以是:
-
相对路径,如 project/data1
-
一条绝对的道路,如 /user/hive/project/data1
-
带有scheme的完整URI和(可选)权限,例如 hdfs://namenode:9000/user/hive/project/data1
-
如果指定了关键字LOCAL,则:
load命令将在本地文件系统中查找filepath。如果指定了相对路径,则将相对于用户的当前工作目录进行解释。用户也可以为本地文件指定完整的URI
- 例如:file:///user/hive/project/data1 load命令将尝试将filepath所寻址的所有文件复制到目标文件系统。通过查看表的location属性来推断目标文件系统。然后,复制的数据文件将移动到表中。
注意:如果对HiveServer2实例运行此命令,则本地路径引用HiveServer2实例上的路径。HiveServer2必须具有访问该文件的适当权限。 -
如果未指定关键字LOCAL ,则Hive将使用filepath的完整URI(如果指定了一个),或者将应用以下规则:
如果未指定scheme或authority,Hive将使用fs.default.name指定Namenode URI
的hadoop配置变量中的方案和权限。 如果路径不是绝对的,那么Hive将相对于它解释它 /user/
Hive会将filepath所寻址的文件移动到表(或分区)中 -
如果使用OVERWRITE关键字,则将删除目标表(或分区)的内容,并替换为filepath引用的文件;
否则filepath引用的文件将被添加到表中。
Inserting data into Hive Tables from queries
Query Results can be inserted into tables by using the insert clause.
Standard syntax:
INSERT OVERWRITE TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...) [IF NOT EXISTS]] select_statement1 FROM from_statement;
INSERT INTO TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...)] select_statement1 FROM from_statement;
Hive extension (multiple inserts):
FROM from_statement
INSERT OVERWRITE TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...) [IF NOT EXISTS]] select_statement1
[INSERT OVERWRITE TABLE tablename2 [PARTITION ... [IF NOT EXISTS]] select_statement2]
[INSERT INTO TABLE tablename2 [PARTITION ...] select_statement2] ...;
FROM from_statement
INSERT INTO TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...)] select_statement1
[INSERT INTO TABLE tablename2 [PARTITION ...] select_statement2]
[INSERT OVERWRITE TABLE tablename2 [PARTITION ... [IF NOT EXISTS]] select_statement2] ...;
Hive extension (dynamic partition inserts):
INSERT OVERWRITE TABLE tablename PARTITION (partcol1[=val1], partcol2[=val2] ...) select_statement FROM from_statement;
INSERT INTO TABLE tablename PARTITION (partcol1[=val1], partcol2[=val2] ...) select_statement FROM from_statement;
create table emp1 as select * from emp;
create table emp2 as select empno,ename,job,deptno from emp;
INSERT OVERWRITE LOCAL DIRECTORY '/tmp/hive/' ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' select empno,ename,sal,deptno from emp;
总结:hive处理数据非常慢。只是适合离线处理数据。