Using Hive

quwei1114

Article link: https://blog.csdn.net/quwei1114/article/details/123811806

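I. Creating tables (external and managed tables)

1. Differences between managed and external tables

External table: when an external table is dropped in Hive, its data is not deleted; Hive does not manage the lifecycle of an external table's data.

Managed table (internal table): when a managed table is dropped in Hive, its data is deleted as well; Hive manages the lifecycle of a managed table's data.

View a table's details: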

desc formatted student5;

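Drop a table: drop table student;

For a company, data is extremely important. Should raw data be kept in external tables or in managed tables? In external tables.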
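
The difference can be checked directly. The following is a minimal sketch, not taken from the original text: the table name student_ext is hypothetical, and the location reuses the /user/zkpk/input path that appears later in this article.

```sql
-- Hypothetical external table; the name student_ext is an assumption.
create external table if not exists student_ext(
  id int,
  name string
)
row format delimited fields terminated by '\t'
location '/user/zkpk/input';

-- The "Table Type" field should read EXTERNAL_TABLE
-- (it reads MANAGED_TABLE for a managed table).
desc formatted student_ext;

-- Dropping an external table removes only its metadata.
drop table student_ext;

-- The files under the table's location should still be listed afterwards.
dfs -ls /user/zkpk/input;
```

Running the same steps without the external keyword should show the opposite behavior: after drop table, the table's directory is removed along with the metadata.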

2. Ways to create a table (three)

Method 1: plain table creation (the most commonly used)

create [external] table student(

id int,

name string

)

row format delimited fields terminated by '\t'

-- storage format
stored as textfile

-- storage location
location '/user/zkpk/input';

Method 2: create a table from a query result. Keyword: as select

create table student2 as select id,name from student;

Method 3: copy a table's structure without copying its data. Keyword: like

create table student3 like student;
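
A quick way to see the difference between the last two methods, sketched here under the assumption that student already holds some rows when student2 and student3 are created:

```sql
-- student2 was created with "as select", so it holds the rows selected from student.
select count(*) from student2;

-- student3 was created with "like", so it has the same columns but no rows.
select count(*) from student3;

-- The column list of student3 should match that of student.
desc student3;
```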

II. Partitioned tables

1. Create a partitioned table. Keyword: partitioned by

Syntax:

create table student(

id int,

name string

)

partitioned by(month string)

row format delimited fields terminated by '\t';

2. Load data into a partitioned table

load data local inpath '/home/zkpk/datas/student.txt' into table student partition(month='202203');

3. Query a partitioned table

select * from student where month='202203';

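Combined query (data for February and March). Keyword: union

select * from student where month='202202'

union select * from student where month='202203';

4. Why create partitioned tables: a query that filters on the partition column reads only the matching partitions, which reduces the amount of data scanned and improves query efficiency.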
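
The effect can be inspected on the student table from the steps above. This is a sketch rather than part of the original article; the exact directory layout depends on the warehouse configuration.

```sql
-- Each partition is stored in its own directory under the table's location,
-- for example .../student/month=202203.
show partitions student;

-- The filter is on the partition column, so only the month=202203
-- partition needs to be read.
select count(*) from student where month='202203';

-- explain dependency reports the tables and partitions a query reads
-- (see the input_partitions field of its output).
explain dependency select * from student where month='202203';
```

A filter on an ordinary column such as id gives Hive no way to skip directories, so every partition is scanned.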

Content from March 3, 2022:

I. Operations on partitioned tables

5. Modify a partitioned table (add partitions). Keyword: alter...add

Add a single partition:

alter table dept_partition add partition(month='202204');

Add multiple partitions, separated by spaces:

alter table dept_partition add partition(month='202205') partition(month='202206');

6. Drop partitions. Keyword: alter...drop

Drop a single partition:

alter table dept_partition drop partition(month='202206');

Drop multiple partitions, separated by commas:

alter table dept_partition drop partition(month='202204'),partition(month='202205');

7. List partitions

show partitions dept_partition;
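
When these statements are run from a script that may be executed more than once, the optional if not exists and if exists clauses are useful. A short sketch, reusing the dept_partition table from the steps above:

```sql
-- "if not exists" avoids an error when the partition has already been added.
alter table dept_partition add if not exists partition(month='202204');

-- "if exists" makes the drop a no-op when the partition is already gone.
alter table dept_partition drop if exists partition(month='202204');

-- Confirm the resulting list of partitions.
show partitions dept_partition;
```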

II. Creating a two-level partitioned table:

create table dept_partition2(

deptno int,

dname string,

loc string

)

partitioned by(month string,day string)

row format delimited fields terminated by '\t';

Load data into the two-level partitioned table:

load data local inpath '/home/zkpk/datas/dept.txt' into table dept_partition2

partition(month='202203',day='02');
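
Once the data is loaded, both partition columns can be used as filters. A minimal sketch against the dept_partition2 table defined above:

```sql
-- Partitions of a two-level table are listed in the form month=.../day=...
show partitions dept_partition2;

-- Filtering on both partition columns restricts the read to a single day directory.
select * from dept_partition2 where month='202203' and day='02';

-- Filtering on month alone covers every day partition under that month.
select * from dept_partition2 where month='202203';
```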

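III. Associating data with a partitioned table (three methods)

Method 1: upload the data, then repair the partitions: msck repair table <table name>

1. Create the directory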

hive(default)>dfs -mkdir -p /user/hive/warehouse/student2/month=202203/day=03;

2. Create the table student2 in Hive

hive(default)>create table student2(

id int,

name string

)

partitioned by(month string,day string)

row format delimited fields terminated by '\t';

3. Upload the data

hive(default)>dfs -put /home/zkpk/datas/student.txt /user/hive/warehouse/student2/month=202203/day=03;

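4. Query the data (nothing is returned, because the metastore does not yet know about the new partition directory)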

hive(default)>select * from student2;

5. Repair the partitions

hive(default)>msck repair table student2;
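
msck repair table scans the table's directory for partition directories that are missing from the metastore and registers them. The result can be verified as follows (a sketch; the statements are shown without the hive(default)> prompt):

```sql
-- After the repair, the partition found on HDFS should be listed,
-- in the form month=202203/day=03.
show partitions student2;

-- The same query as in step 4 should now return the uploaded rows.
select * from student2;
```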

Method 2: upload the data, then add the partition: alter table <table name> add partition(...)

1. Create the directory

[zkpk@master ~]$hadoop fs -mkdir -p /user/hive/warehouse/dept_partition3/month=202203/day=03

2. Create the table dept_partition3 in Hive

hive(default)>create table dept_partition3(

deptno int,

dname string,

loc string

)

partitioned by(month string,day string)

row format delimited fields terminated by '\t';

3. Upload the data

[zkpk@master ~]$hadoop fs -put /home/zkpk/datas/dept.txt /user/hive/warehouse/dept_partition3/month=202203/day=03

4. Query the data (nothing is returned)
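
The source text breaks off at this step. Going by the heading of Method 2, what follows is presumably to register the partition by hand with alter table ... add partition. A sketch under that assumption, not the author's original steps:

```sql
-- Before the partition is registered, the query returns no rows.
select * from dept_partition3;

-- Assumed from the Method 2 heading: add the partition that matches
-- the directory created in step 1.
alter table dept_partition3 add partition(month='202203', day='03');

-- The uploaded rows should now be visible.
select * from dept_partition3;
```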
