Understanding Hive Commands in Depth: Partitioned Tables and Bucketed Tables

Latest recommended article published on 2023-11-19 12:48:20

Crazy_Clown


Category: Big Data. Tags: hive partitioning and bucketing

Article link: https://blog.csdn.net/Crazy_Clown/article/details/93137965


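I. Loading data

1. load data local inpath 'local path' into table database_name.table_name; (the bare table name also works once that database has been selected with use)
Loads a file from the local file system into the Hive warehouse.

Example: load data local inpath '/root/test.txt' into table t.user01;

2. load data inpath 'hdfs://hostname:port/path' into table database_name.table_name;
Loads a file that is already on the HDFS cluster into the Hive warehouse.

Example: load data inpath 'hdfs://node01:9000/user/test.txt' into table t.user01;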

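3. The load data command therefore comes in two forms, load data local inpath and load data inpath. With local, the path refers to the local file system and the file is copied into the table's directory; without local, the path refers to HDFS and the file is moved there. In both cases Hive simply places the file as-is, much like an upload, without parsing or transforming its contents.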

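4. insert into: appends rows to a table. It works for both managed (internal) and external tables. On a partitioned table the rows also need a target partition, which is normally given with a partition(...) clause.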
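As a minimal sketch of item 4: the table t.user_plain below is a hypothetical non-partitioned table invented for this illustration, while t.user01 is the partitioned table used throughout this article. The insert ... values form assumes Hive 0.14 or later.

```sql
-- Hypothetical non-partitioned table, used only for this illustration.
create table if not exists t.user_plain (
  id int,
  name string
);

-- Plain insert: appends one row to the table.
insert into table t.user_plain values (1, 'tom');

-- Partitioned table: the target partition is named explicitly.
insert into table t.user01 partition (one=10) values (2, 'jerry');
```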

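II. Hive partitions (data is stored in separate directories)

- Static partitioning

1. The partition columns must be declared when the table is defined (a partition column must not share a name with any regular column of the table).

2. Creating a single-partition table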

create table user01(
id int,
name string
)
partitioned by (one int);
Load data: load data local inpath '/root/test.txt' into table t.user01 partition(one=10);

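This is a single-partition table, partitioned by one. The table schema has three columns (id, name, one), and the data is split into directories by the value of one.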
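One way to check the result, sketched under the assumption that the load into partition one=10 succeeded and that the current database is t:

```sql
-- List the partitions of the table; after the load above this shows one=10.
show partitions user01;

-- The partition column can be queried like any other column.
-- Filtering on it lets Hive read only the matching directory.
select id, name, one from user01 where one = 10;
```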

3. Creating a two-level partitioned table

Create the table
create table user02(
id int,
name string
)
partitioned by (one int,sex string);
Load data: load data local inpath '/root/test2.txt' into table t.user02 partition(one=10,sex='man');

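This is a two-level partitioned table, partitioned by one and sex. The table schema has four columns (id, name, one, sex); the data is split first into directories by one, and then into subdirectories by sex.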
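The directory nesting can be inspected from inside the Hive CLI. The sketch below assumes the default warehouse location /user/hive/warehouse and a database named t; the path will differ if hive.metastore.warehouse.dir is set to something else.

```sql
-- Each partition is listed in the form one=<value>/sex=<value>.
show partitions user02;

-- List the files under the nested partition directory (the path is an assumption).
dfs -ls /user/hive/warehouse/t.db/user02/one=10/sex=man;

-- Filtering on both partition columns reads only that subdirectory.
select id, name from user02 where one = 10 and sex = 'man';
```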

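- Dynamic partitioning

1. Enable dynamic partitioning

Method 1: edit the configuration file conf/hive-site.xml (permanent)
Method 2: use the set command inside hive: set hive.exec.dynamic.partition=true;  -- enable dynamic partitioning
set hive.exec.dynamic.partition.mode=nonstrict;  -- the default is strict, which requires at least one static partition
Method 3: set the properties when starting hive (similar to method 2), using hive --hiveconf instead of set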
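A short sketch of method 2 as it might be typed in a Hive session. Running set with only the property name prints the current value, which is a convenient way to confirm that the change took effect.

```sql
-- Enable dynamic partitioning for the current session only.
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;

-- Print the current values to confirm.
set hive.exec.dynamic.partition;
set hive.exec.dynamic.partition.mode;
```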

2. Creating a two-level partitioned table
The data format of my person.txt file:
(Figure: sample rows from person.txt)

Create the table
create table person01(
id int,
name string,
hobbys array<string>,
address map<string,string>
)
partitioned by (age int, sex string)
row format delimited
fields terminated by ','
collection items terminated by '-'
map keys terminated by ':'
lines terminated by '\n';

Load data: load data local inpath '/root/person.txt' into table t.person01 partition(age=18,sex='man');

Clone a table with the same schema as the one above
create table person02 like person01;  -- the new table contains no data

Write the data
from person01
insert overwrite table person02
partition (age,sex)
select * distribute by age,sex
sort by id asc;
The five lines above form a single statement; running it fills person02 and completes the operation.


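***Writing data from one table into another***
from table_name1                                             -- an existing table that already contains data
insert overwrite table table_name2                           -- overwrite replaces any existing data
partition (partition_col1, partition_col2, ...)              -- the target partition columns
select * distribute by partition_col1, partition_col2, ...   -- distribute by routes the rows by partition value
sort by column_name asc;                                     -- sorting, asc or desc (optional, depending on business needs)
Note: this is a single statement.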
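The same idea also works with one static and one dynamic partition column, which is the combination strict mode expects. The sketch below reuses person01 and person02; the value 18 is simply the partition that was loaded into person01. Static partition columns have to be listed before dynamic ones, and the dynamic column has to be the last column of the select list.

```sql
-- age is fixed (static); sex is taken from the last selected column (dynamic).
insert overwrite table person02
partition (age=18, sex)
select id, name, hobbys, address, sex
from person01
where age = 18;

-- Confirm which partitions were created.
show partitions person02;
```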

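- Partition operations

Adding a partition
alter table table_name add partition (partition_col1=value1, partition_col2=value2, ...);
In other words, on a table with several partition columns a new partition cannot be added by naming only the new level: the spec has to include the existing, higher-level partition columns as well, so that every partition column gets a value.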
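For instance, with the two-level table user02 (the values 20 and 'woman' are made up for the illustration):

```sql
-- Both partition columns get a value; this creates the directory one=20/sex=woman.
alter table user02 add partition (one=20, sex='woman');

-- The new, still empty partition now shows up in the listing.
show partitions user02;
```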

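Dropping a partition
alter table table_name drop partition (partition_col1=value1, partition_col2=value2, ...);
Dropping removes every existing partition that matches the spec. If the spec names only some of the partition columns, all sub-partitions underneath are dropped as well.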
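A sketch of both cases on the two-level table user02; the partition values are illustrative. The if exists clause is optional and only avoids an error when nothing matches. Because user02 is a managed table, the data files of a dropped partition are removed together with its metadata.

```sql
-- Full spec: drops exactly one partition.
alter table user02 drop if exists partition (one=10, sex='man');

-- Partial spec: drops every partition under one=20, whatever the value of sex.
alter table user02 drop if exists partition (one=20);
```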

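III. Hive bucketing with clustered by (data is stored in a fixed number of buckets)

My sample data file:
(Figure: contents of the sample data file)
1. Enable bucketing

set hive.enforce.bucketing=true;
This is needed on Hive 1.x. From Hive 2.0 on, bucketing is always enforced and the property no longer exists.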

2. Create the bucketed table

Create the table
create table psnbucket (
id int,
name string,
age int
)
clustered by (id) into 4 buckets
row format delimited 
fields terminated by ',';

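Clone a table with the same schema as the one above
create table psnbucket02 like psnbucket;  -- the new table contains no data

Load data
psnbucket02 has to hold the rows before this step, for example loaded with load data as in section I. load data only places the file and does not distribute rows into buckets; the insert ... select below is what hashes each row on id into one of the 4 buckets.
insert into table psnbucket select * from psnbucket02;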
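To see how rows map to buckets, the following sketch may help. It assumes the classic bucketing scheme, in which the bucket index is the non-negative hash of the clustering column modulo the number of buckets, and the hash of an int is the int itself. Newer Hive versions can use a different hash function for bucketing, so treat the computed column as illustrative only. The warehouse path and the database name t are also assumptions.

```sql
-- Expected 0-based bucket index per row under the classic hash scheme.
select id, name, pmod(hash(id), 4) as expected_bucket
from psnbucket;

-- A table with 4 buckets is normally stored as 4 files, one per bucket.
dfs -ls /user/hive/warehouse/t.db/psnbucket;
```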

Sampling
select * from psnbucket tablesample(bucket 1 out of 4 on id);   -- rows in the first bucket
select * from psnbucket tablesample(bucket 2 out of 4 on id);   -- rows in the second bucket
select * from psnbucket tablesample(bucket 3 out of 4 on id);   -- rows in the third bucket
select * from psnbucket tablesample(bucket 4 out of 4 on id);   -- rows in the fourth bucket
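The general form is tablesample(bucket x out of y on col). When y equals the number of buckets, as in the queries above, exactly bucket x is read. As a hedged illustration of other values of y on this 4-bucket table, following the behaviour described in the Hive sampling documentation:

```sql
-- y = 2 (half the bucket count): reads two buckets, the 1st and the 3rd.
select * from psnbucket tablesample(bucket 1 out of 2 on id);

-- y = 8 (twice the bucket count): reads about half of the 1st bucket.
select * from psnbucket tablesample(bucket 1 out of 8 on id);
```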
