1、Loading Data into Managed Tables
Local data:

LOAD DATA LOCAL INPATH '${env:HOME}/california-employees' -- usually a directory
OVERWRITE INTO TABLE employees
PARTITION (country = 'US', state = 'CA');
When the load is run without OVERWRITE and a file in the input directory has the same name as a file already under the employees table directory, the existing file is not overwritten; instead the newly loaded file is renamed, as shown below:
- Before the load, the Hive table test's directory contains one file, awk (screenshot omitted).
- An awk file is loaded again from the local filesystem (screenshot omitted).
- Afterwards the table directory contains two files: awk and awk_copy_1 (screenshot omitted).
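The rename only happens because the OVERWRITE keyword is omitted; with OVERWRITE, the old contents are removed first. A minimal sketch of the behavior above, using a hypothetical /tmp/data.txt file:

```sql
-- First load: data.txt lands in the table's directory as-is.
LOAD DATA LOCAL INPATH '/tmp/data.txt' INTO TABLE test;

-- Second load of a file with the same name: the existing file is kept
-- and the newly loaded one is stored as data.txt_copy_1.
LOAD DATA LOCAL INPATH '/tmp/data.txt' INTO TABLE test;
```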
HDFS data:

LOAD DATA INPATH '${env:HOME}/california-employees' -- an HDFS directory
OVERWRITE INTO TABLE employees
PARTITION (country = 'US', state = 'CA');
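One practical difference between the two forms: with LOCAL, Hive copies the files into the table's directory, while without LOCAL the files are moved within HDFS, so the source path is emptied. A sketch, assuming the data was staged at a hypothetical /tmp/staging path:

```sql
-- Stage the data in HDFS first (outside Hive):
--   hadoop fs -put california-employees /tmp/staging/california-employees

LOAD DATA INPATH '/tmp/staging/california-employees'
OVERWRITE INTO TABLE employees
PARTITION (country = 'US', state = 'CA');

-- After this statement the files have been MOVED into the partition
-- directory; they no longer exist under /tmp/staging.
```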
2、Inserting Data into Tables from Queries
(1) INSERT OVERWRITE TABLE employees
PARTITION (country = 'US', state = 'OR')
SELECT * FROM staged_employees se
WHERE se.cnty = 'US' AND se.st = 'OR';
(2) Scan the source data once and insert into several different partitions (or tables) in a single statement:

FROM staged_employees se
INSERT OVERWRITE TABLE employees
  PARTITION (country = 'US', state = 'OR')
  SELECT * WHERE se.cnty = 'US' AND se.st = 'OR'
INSERT OVERWRITE TABLE employees
  PARTITION (country = 'US', state = 'CA')
  SELECT * WHERE se.cnty = 'US' AND se.st = 'CA'
INSERT OVERWRITE TABLE employees
  PARTITION (country = 'US', state = 'IL')
  SELECT * WHERE se.cnty = 'US' AND se.st = 'IL';
(3) Dynamic Partition Inserts:
The partition values are taken from the query itself, so you do not need a separate INSERT clause for every partition. The statement in (2) can be rewritten as:

INSERT OVERWRITE TABLE employees
PARTITION (country, state)
SELECT ..., se.cnty, se.st
FROM staged_employees se;

or, mixing static and dynamic partition keys:

INSERT OVERWRITE TABLE employees
PARTITION (country = 'US', state)
SELECT ..., se.cnty, se.st
FROM staged_employees se
WHERE se.cnty = 'US';

Note: the static partition keys (country) must come before the dynamic partition keys (state).

3、Creating Tables and Loading Them in One Query
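Dynamic partition inserts are governed by a few Hive settings, and depending on your Hive version you may need to enable them before the statements above will run (the property names are standard Hive configuration; the limit values below are only illustrative):

```sql
SET hive.exec.dynamic.partition = true;             -- enable dynamic partition inserts
SET hive.exec.dynamic.partition.mode = nonstrict;   -- allow all partition keys to be dynamic
SET hive.exec.max.dynamic.partitions = 1000;        -- cap on total partitions created
SET hive.exec.max.dynamic.partitions.pernode = 100; -- cap per mapper/reducer node
```

In strict mode (the default in many versions) at least one static partition key is required, which guards against a query accidentally creating a huge number of partitions.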
CREATE TABLE ca_employees
AS SELECT name, salary, address
FROM employees
WHERE state = 'CA';

This feature (CREATE TABLE ... AS SELECT) can't be used with external tables.
4、Exporting Data
(1) If the data files are already in the format and layout you need, you can export Hive data by simply copying the files:

hadoop fs -cp source_path target_path
(2) Otherwise, use the INSERT … DIRECTORY form:

INSERT OVERWRITE LOCAL DIRECTORY '/tmp/ca_employees'
SELECT name, salary, address
FROM employees
WHERE state = 'CA';
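As with the multi-partition inserts in section 2, a single scan of the source table can feed several export directories at once (the /tmp paths below are illustrative):

```sql
FROM employees se
INSERT OVERWRITE LOCAL DIRECTORY '/tmp/or_employees'
  SELECT * WHERE se.state = 'OR'
INSERT OVERWRITE LOCAL DIRECTORY '/tmp/ca_employees'
  SELECT * WHERE se.state = 'CA'
INSERT OVERWRITE LOCAL DIRECTORY '/tmp/il_employees'
  SELECT * WHERE se.state = 'IL';
```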