Reading Notes on Programming Hive (Part 4)


will的成长之路


Original article: https://blog.csdn.net/matthewei6/article/details/65029043


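Chapter 5 HiveQL: Data Manipulation
Since Hive has no row-level insert, update, or delete operations, the only way to put data into a table is a "bulk" load operation, or simply writing files into the correct directory by some other means. (This describes the Hive versions covered by the book; later releases added row-level INSERT ... VALUES, UPDATE, and DELETE for transactional tables.)
I. Loading Data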
LOAD DATA LOCAL INPATH '${env:HOME}/california-employees'
OVERWRITE INTO TABLE employees
PARTITION (country='US', state='CA');
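1. If the partition directory does not exist, Hive first creates it and then copies the data into it.
2. If the target table is not partitioned, omit the PARTITION clause.
3. LOAD DATA LOCAL ... copies data from the local filesystem to the target location in HDFS; LOAD DATA ... without LOCAL moves data that is already in HDFS to the target location (the path after INPATH is then an HDFS path). Either way, the target location is an HDFS directory.
4. The OVERWRITE keyword deletes the data already in the target directory before copying in the new data. Without it, existing data is kept; if a file with the same name already exists, the new file is copied in under the original name plus a sequence-number suffix.
5. The path given to INPATH must not contain any subdirectories.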
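The five rules above can be modelled with ordinary local directories. The following is a minimal Python sketch, not Hive's implementation: the function load_data, the directory layout, and the `_copy_N` rename pattern are assumptions made for illustration.

```python
import os
import shutil
import tempfile


def load_data(src_dir, table_dir, overwrite=False, local=True):
    """Toy model of LOAD DATA [LOCAL] INPATH src [OVERWRITE] INTO TABLE ...

    Plain local directories stand in for the source path and for the
    table or partition directory (an assumption made for illustration).
    """
    entries = sorted(os.listdir(src_dir))
    # Rule 5: the source path must not contain subdirectories.
    if any(os.path.isdir(os.path.join(src_dir, e)) for e in entries):
        raise ValueError("INPATH must not contain subdirectories")
    # Rule 1: create the partition directory if it does not exist yet.
    os.makedirs(table_dir, exist_ok=True)
    # Rule 4: OVERWRITE first deletes what is already in the target.
    if overwrite:
        for old in os.listdir(table_dir):
            os.remove(os.path.join(table_dir, old))
    for name in entries:
        # Rule 4 (no OVERWRITE): on a name clash, add a sequence suffix.
        # The "_copy_N" form is a hypothetical stand-in for Hive's naming.
        target, n = name, 1
        while os.path.exists(os.path.join(table_dir, target)):
            target, n = f"{name}_copy_{n}", n + 1
        src = os.path.join(src_dir, name)
        dst = os.path.join(table_dir, target)
        # Rule 3: LOCAL copies the file; without LOCAL it is moved.
        if local:
            shutil.copy(src, dst)
        else:
            shutil.move(src, dst)
    return sorted(os.listdir(table_dir))


with tempfile.TemporaryDirectory() as root:
    src = os.path.join(root, "california-employees")
    part = os.path.join(root, "employees", "country=US", "state=CA")
    os.makedirs(src)
    with open(os.path.join(src, "data.txt"), "w") as f:
        f.write("John\n")
    first = load_data(src, part)                  # creates the partition dir
    second = load_data(src, part)                 # keeps old file, renames new
    third = load_data(src, part, overwrite=True)  # wipes, then loads
    moved = load_data(src, part, local=False)     # moves instead of copying
    src_left = os.listdir(src)
```

Copying versus moving is the only difference between the LOCAL and non-LOCAL paths in this model, which mirrors point 3.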
II. Inserting Data into Tables from Queries
insert overwrite/into table employees
partition (country='US', state='OR')
select * from staged_employees se
where se.cnty='US' and se.st='OR';
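1. OVERWRITE first deletes the data already in the target directory (here, the target partition).
2. INTO appends the new data and deletes nothing.
3. To populate 50 state partitions this way, you would have to run 50 statements, scanning staged_employees 50 times.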
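A toy Python model makes the cost in point 3 and the OVERWRITE/INTO difference concrete. CountingTable and insert_select are hypothetical helpers and the rows are made up; each call stands for one INSERT ... SELECT statement, and each statement reads the whole source table.

```python
from collections import defaultdict


class CountingTable:
    """A source table that counts how many times it is fully scanned."""

    def __init__(self, rows):
        self.rows = rows
        self.scans = 0

    def scan(self):
        self.scans += 1
        return list(self.rows)


def insert_select(target, partition, source, predicate, overwrite):
    """Toy INSERT OVERWRITE|INTO TABLE t PARTITION (...) SELECT * ... WHERE ..."""
    selected = [row for row in source.scan() if predicate(row)]
    if overwrite:
        target[partition] = selected          # OVERWRITE replaces the partition
    else:
        target[partition].extend(selected)    # INTO appends to it


staged_employees = CountingTable([
    {"name": "Ann", "cnty": "US", "st": "OR"},
    {"name": "Bob", "cnty": "US", "st": "CA"},
    {"name": "Cid", "cnty": "US", "st": "IL"},
])
employees = defaultdict(list)
# One INSERT statement per state: every statement rescans the source.
for state in ["OR", "CA", "IL"]:
    insert_select(employees, ("US", state), staged_employees,
                  lambda r, s=state: r["cnty"] == "US" and r["st"] == s,
                  overwrite=True)
print(staged_employees.scans)  # 3

# Running the OR statement again with INTO appends instead of replacing.
insert_select(employees, ("US", "OR"), staged_employees,
              lambda r: r["cnty"] == "US" and r["st"] == "OR",
              overwrite=False)
```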

from staged_employees se
insert overwrite/into table employees
    partition (country='US', state='OR')
    select * where se.cnty='US' and se.st='OR'
insert overwrite/into table employees1
    partition (country='US', state='CA')
    select * where se.cnty='US' and se.st='CA'
insert overwrite/into table employees2
    partition (country='US', state='IL')
    select * where se.cnty='US' and se.st='IL';

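1. This form scans staged_employees only once.
2. The parallel INSERT clauses are not mutually exclusive (they are not an if/else chain), so a single row can satisfy the conditions of several INSERT clauses and be inserted into several tables.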
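The single-pass behaviour can be sketched the same way. In this hypothetical Python model (multi_insert and the sample rows are illustrative assumptions), the source rows are read once and every row is tested against every INSERT branch; the second predicate is deliberately broader than the per-state ones in the notes, to show a row landing in two targets.

```python
def multi_insert(source_rows, branches):
    """Toy FROM src INSERT ... WHERE p1 INSERT ... WHERE p2 ...

    branches is a list of (target_rows, predicate) pairs. The source is
    read in a single pass and every row is tested against every branch.
    """
    rows_read = 0
    for row in source_rows:
        rows_read += 1
        for target_rows, predicate in branches:
            if predicate(row):       # not if/else: all branches are checked
                target_rows.append(row)
    return rows_read


staged_employees = [
    {"name": "Ann", "cnty": "US", "st": "OR"},
    {"name": "Bob", "cnty": "US", "st": "CA"},
]
oregon, all_us = [], []
rows_read = multi_insert(staged_employees, [
    (oregon, lambda r: r["cnty"] == "US" and r["st"] == "OR"),
    (all_us, lambda r: r["cnty"] == "US"),   # overlaps with the branch above
])
```

Each source row is read exactly once (rows_read equals the number of rows), yet Ann ends up in both targets.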

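Dynamic partitions: Hive infers the names of the partitions to create from the values returned by the query and creates those partitions automatically.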
insert overwrite/into table employees
partition (country,state)
select ......,se.cnty,se.st
from staged_employees se;
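Partition columns are matched by position: country and state take their values from the last two columns of the SELECT clause (se.cnty and se.st), regardless of those columns' names.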
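Positional matching can be illustrated with a small Python sketch. The function assign_dynamic_partitions and the sample rows are hypothetical; it assumes Hive's usual column=value directory naming for partitions and shows that the SELECT column names (cnty, st) play no role, only their position.

```python
def assign_dynamic_partitions(rows, partition_cols):
    """Group SELECT output rows into partition directories by position."""
    k = len(partition_cols)
    partitions = {}
    for row in rows:
        # The last k values feed PARTITION (...) in order; names don't matter.
        data, values = row[:-k], row[-k:]
        path = "/".join(f"{col}={val}" for col, val in zip(partition_cols, values))
        partitions.setdefault(path, []).append(data)
    return partitions


# Hypothetical output of: select name, salary, se.cnty, se.st from ...
rows = [
    ("John", 100000.0, "US", "CA"),
    ("Mary", 80000.0, "US", "OR"),
    ("Todd", 70000.0, "US", "CA"),
]
parts = assign_dynamic_partitions(rows, ["country", "state"])
```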

Mixing dynamic and static partitions
insert overwrite/into table employees
partition (country = 'US',state)
select ......,se.cnty,se.st
from staged_employees se
where se.cnty='US';
country is a static partition column and state is a dynamic one.

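In the Hive version covered by the book, dynamic partitioning is disabled by default (property hive.exec.dynamic.partition). Once enabled, it runs in strict mode by default (hive.exec.dynamic.partition.mode=strict) to keep a design mistake from generating a huge number of partitions: at least one partition column must be static, so dynamic and static partitions must be mixed. Setting the mode to nonstrict removes this restriction.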
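The strict-mode rule amounts to a simple check on the PARTITION clause. The Python function below is a hypothetical sketch, not Hive code; its mode argument mirrors the hive.exec.dynamic.partition.mode property.

```python
def check_partition_spec(spec, mode="strict"):
    """Toy check of the strict-mode rule for PARTITION (...) clauses.

    spec is a list of (column, value) pairs; value None marks a dynamic
    column whose value comes from the SELECT, anything else is static.
    mode mirrors hive.exec.dynamic.partition.mode ("strict"/"nonstrict").
    """
    static = [col for col, val in spec if val is not None]
    dynamic = [col for col, val in spec if val is None]
    if mode == "strict" and dynamic and not static:
        raise ValueError("strict mode: at least one partition column must be static")
    return static, dynamic


# partition (country = 'US', state): mixed, accepted in strict mode
print(check_partition_spec([("country", "US"), ("state", None)]))
```

The mixed spec prints (['country'], ['state']); a spec such as partition (country, state) with both columns dynamic raises ValueError unless mode is "nonstrict".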

III. Creating a Table and Inserting Data in One Query
create table ca_employees
as select name,salary,address
from employees
where state = 'CA';
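The main use case is extracting the subset of data you need from a large, wide table.
This feature cannot be used with external tables; an external table can simply point its LOCATION at different files instead.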
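A create-table-as-select can be pictured as a filter followed by a projection, with the new table's columns taken from the SELECT list. This Python sketch uses a hypothetical helper and made-up rows; the address column is simplified to a plain string.

```python
def create_table_as_select(source_rows, select_cols, predicate):
    """Toy CREATE TABLE new AS SELECT cols FROM src WHERE ...

    The new table's columns are exactly the SELECT list, and its rows are
    the filtered, projected rows of the source.
    """
    rows = [tuple(row[c] for c in select_cols) for row in source_rows if predicate(row)]
    return {"columns": list(select_cols), "rows": rows}


# Hypothetical wide employees table (only a few of its columns shown).
employees = [
    {"name": "John", "salary": 100000.0, "address": "1 Main St", "state": "CA", "dept": "eng"},
    {"name": "Mary", "salary": 80000.0, "address": "2 Oak Ave", "state": "OR", "dept": "ops"},
]
ca_employees = create_table_as_select(
    employees, ["name", "salary", "address"], lambda r: r["state"] == "CA")
```

Columns not named in the SELECT list (state, dept) simply do not exist in the new table.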

IV. Exporting Data
1. Copy the files as they are
hadoop fs -cp source_path target_path
2. Export selected columns
insert overwrite [local] directory '/.....'
select name,salary,address
from employees
where state='CA';
3. Just as you can insert into several tables in one pass, you can export data to several directories.
from staged_employees se
insert overwrite directory '/...../or_employees'
    select * where se.cnty='US' and se.st='OR'
insert overwrite directory '/...../ca_employees'
    select * where se.cnty='US' and se.st='CA'
insert overwrite directory '/...../il_employees'
    select * where se.cnty='US' and se.st='IL';
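Per the Hive documentation, data written by INSERT OVERWRITE [LOCAL] DIRECTORY is serialized as text, with columns separated by ^A (Ctrl-A) and rows separated by newlines. Below is a minimal Python sketch for reading such an export back, assuming that default format; read_export_dir, the file name, and the skipping of files whose names start with "." or "_" are assumptions for illustration.

```python
import os
import tempfile


def read_export_dir(path, field_sep="\x01"):
    """Read text files written by INSERT OVERWRITE [LOCAL] DIRECTORY.

    Assumes Hive's default text output, where columns are separated by
    the ^A control character, chr(1); nested types are left as raw strings.
    """
    rows = []
    for name in sorted(os.listdir(path)):
        full = os.path.join(path, name)
        # Skip hidden or marker files (names starting with "." or "_").
        if name.startswith((".", "_")) or not os.path.isfile(full):
            continue
        with open(full, encoding="utf-8") as f:
            for line in f:
                line = line.rstrip("\n")
                if line:
                    rows.append(line.split(field_sep))
    return rows


with tempfile.TemporaryDirectory() as out_dir:
    # Hypothetical output file; Hive typically names them like 000000_0.
    with open(os.path.join(out_dir, "000000_0"), "w", encoding="utf-8") as f:
        f.write("\x01".join(["John", "100000.0", "1 Main St"]) + "\n")
        f.write("\x01".join(["Todd", "70000.0", "2 Elm St"]) + "\n")
    rows = read_export_dir(out_dir)
```

All values come back as strings; converting salary back to a number is left to the caller.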

