The Four Table Types in Hive

followmefollowme

Article link: https://blog.csdn.net/followmefollowme/article/details/108592687


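1. Managed (internal) tables
Managed table: also called an internal table; this is the default table type that Hive creates.
Characteristic: when the table definition is dropped (e.g. drop table stu), the table's data is deleted with it (both the HDFS data directory and the table's metadata in the metastore database are removed).
Use case: mostly for non-business data, such as a lookup table of provinces and their area codes. Business data collected every day should preferably not be stored in a managed table; define it as an external table instead, so that the data survives even if the table is dropped.
Syntax:
create table tableNameA (fieldOne dataType, ...)
row format delimited
fields terminated by '<field delimiter>';
Example: create a managed table and load data:
create table order2 (id int, name string, totalvalue double) row format delimited fields terminated by ',';
-- insert a row from the command line
insert into order2 (id, name, totalvalue) values (1, 'zhangsan', 3000.35);
-- load a local file
load data local inpath '/opt/mysoft/order' into table order2;
-- load a file that is already on HDFS
load data inpath 'hdfs://qiku2:9000/order' into table order2;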
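Before relying on the drop behavior described above, it can help to confirm which kind of table was actually created. A minimal sketch, reusing the order2 table from this section; the exact layout of the output varies between Hive versions.

```sql
-- Show the table's metadata. The "Table Type" row should read MANAGED_TABLE,
-- and the "Location" row points at the table's directory on HDFS.
describe formatted order2;

-- Dropping a managed table removes the metadata and that HDFS directory.
drop table order2;
```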

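2. External tables
External table: only a reference to the corresponding files on HDFS; when the table definition is dropped (drop table stu), the data in the table still exists.
Characteristic: dropping the table definition removes only the metadata; the data files remain on HDFS.
Use case: mostly for business data, such as collected business records, so that the data is still there even if the table is dropped.
Syntax:
create external table tableNameA (fieldOne dataType, ...)
row format delimited
fields terminated by '<field delimiter>'
location '<HDFS directory>';
Example: create an external table:
create external table kong (id int, name string) row format delimited fields terminated by ',' location '/guangzhou';
-- load a local file
load data local inpath '/opt/mysoft/steu2.txt' into table kong;
-- load a file that is already on HDFS
load data inpath 'hdfs://qiku2:9000/order' into table kong;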
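The difference from a managed table can be observed directly from the Hive command line. The following is a sketch using the kong table and the /guangzhou location from this section; dfs is the Hive CLI's pass-through to HDFS shell commands, and what the listing shows depends on what was loaded.

```sql
-- The "Table Type" row should read EXTERNAL_TABLE.
describe formatted kong;

-- Drop only the table definition.
drop table kong;

-- The data files should still be listed under the table's location.
dfs -ls /guangzhou;

-- Re-creating the definition over the same location makes the data queryable again.
create external table kong (id int, name string) row format delimited fields terminated by ',' location '/guangzhou';
select * from kong;
```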

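3. Partitioned tables
Partitioned table: stores data in separate directories according to the partition value (the column declared with partitioned by) given when the data is inserted.
Use case: a partitioned table can keep each day's collected data apart; specifying the partition in queries and statistics improves query efficiency.
Syntax:
create table order3 (id int, name string)
partitioned by (riqi string)
row format delimited
fields terminated by ',';
Example: create a partitioned table:
create table order3 (id int, name string) partitioned by (riqi string) row format delimited fields terminated by ',';
-- load data into specific partitions
load data local inpath '/opt/mysoft/3.txt' into table order3 partition (riqi='20191009');
load data local inpath '/opt/mysoft/3.txt' into table order3 partition (riqi='20191008');
-- list the partitions the table already has
show partitions order3;
-- manually create a partition (an empty directory)
alter table order3 add partition (riqi='20190308');
-- drop a partition (its data is deleted as well)
alter table order3 drop partition (riqi='20190308');
-- query using the partition column as a condition
select * from order3 where riqi='20190409';
select * from order3 where riqi='20190409' and id=102;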
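The directory-per-partition layout described in this section can be inspected from the Hive command line. A small sketch: the path /user/hive/warehouse is Hive's default warehouse directory and is an assumption here, as is the table living in the default database.

```sql
-- List the table directory. Each partition shows up as a subdirectory
-- named after its value, for example riqi=20191008 and riqi=20191009.
-- (The warehouse path is an assumed default, not taken from the article.)
dfs -ls /user/hive/warehouse/order3;

-- explain prints the query plan without running the query; with a filter on
-- the partition column, Hive only needs to read the matching directory.
explain select * from order3 where riqi='20191009';
```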

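4. Bucketed tables
Bucketed table: hashes a column value of each row and stores the rows in different files; bucket index = hash of the value % total number of buckets.
Characteristic: rows are placed into different files according to the hash of the bucketing column.
Use case: because each row goes to a file chosen by its hash value, bucketed tables are mostly used for data sampling.
Syntax:
create table sheng3 (id int, name string) clustered by (id) into 5 buckets row format delimited fields terminated by ',';
-- clustered by (id): the column used to assign rows to buckets
-- into 5 buckets: split the data into 5 bucket files, each holding different rows
Example: create a bucketed table:
-- the field delimiter must match the delimiter used by the source data
create table sheng3 (id int, name string) clustered by (id) into 5 buckets row format delimited fields terminated by ',';
Inserting data:
-- Do not use load data for this. The rows do end up in the table, but they are not distributed into bucket files according to the bucketing rule. The following statement is the one to avoid:
load data local inpath '/opt/mysoft/3.txt' into table sheng3;
-- Before inserting, turn on bucketing enforcement, otherwise the data is not bucketed (this setting applies to Hive 1.x; from Hive 2.0 onward bucketing is always enforced and the property no longer exists)
set hive.enforce.bucketing=true;
insert into table sheng3 select * from kong;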
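The sampling use case mentioned in this section can be sketched with Hive's tablesample clause, which reads a chosen bucket instead of the whole table. The example assumes the sheng3 table has been filled with the bucketed insert above; the warehouse path is Hive's default and is an assumption.

```sql
-- Read only bucket 1 of 5, bucketing on id (roughly one fifth of the rows
-- when the ids are evenly spread).
select * from sheng3 tablesample(bucket 1 out of 5 on id);

-- After a bucketed insert the table directory typically holds one file per bucket.
-- (Assumed default warehouse path and default database.)
dfs -ls /user/hive/warehouse/sheng3;
```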