Hive partitioned tables: static and dynamic partitions

塞上江南o

Last modified 2024-03-08 23:29:00

First published 2019-09-22 14:51:20

Original link: https://blog.csdn.net/qq_43192537/article/details/101157026

Benefits of partitioned tables

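They avoid full table scans. When a query filters on the partition column, Hive reads only the matching partition directories instead of the whole table.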

Hive static partitions

With static partitioning, the user specifies the value of the partition column by hand when loading data.

Creating a partitioned table

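(1) Key clause: partitioned by (field string)

(2) The partition column must not have the same name as a column already defined in the table.
(3) The partition column is a virtual column. Its values are not stored in the underlying data files; they are encoded in the partition directory name.
(4) Each partition value gets its own directory for its data.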

create table dept_partition(
 deptno int,
 dname string, 
 loc string
)
partitioned by (month string) -- month must not be a column already defined in the table
row format delimited fields terminated by '\t';
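To see how Hive records the new table, you can run a describe statement right after creating it. This is a minimal sketch and assumes the dept_partition table above exists. The output should list month under a separate partition-information section, which matches point (3): month is metadata, not a column stored in the data files.

```sql
-- Columns of dept_partition, followed by a "# Partition Information" section listing month
describe dept_partition;

-- More detail: table type, location on HDFS, input/output formats, etc.
describe formatted dept_partition;
```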

Loading data into a partition manually (load data)

load data local inpath '/opt/modules/input/dept.txt' into table dept_partition partition(month='2021-02-11');  
-- Data can also be loaded with insert + select; see the insert + select example below
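You can check point (4) from the Hive CLI. Each partition should appear as its own directory, named column=value, under the table directory. This sketch assumes the warehouse path /user/hive/warehouse that this post uses later. On another setup, the table directory may differ.

```sql
-- After the load, the table directory should contain a subdirectory month=2021-02-11
dfs -ls /user/hive/warehouse/dept_partition;

-- The loaded file (dept.txt) should sit inside that partition directory
dfs -ls /user/hive/warehouse/dept_partition/month=2021-02-11;
```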

Querying an old partition and inserting its rows into a new partition (insert + select)

insert overwrite table dept_partition partition (month = '2021-02-12')
select deptno,
       dname,
       loc
from dept_partition
where month = '2021-02-11';
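insert overwrite replaces whatever the target partition already contains. To append instead, use insert into with the same partition clause. In this sketch, the partition column month is not in the select list because its value is fixed in the partition clause.

```sql
-- Appends the rows of 2021-02-11 to partition 2021-02-12 instead of replacing its contents
insert into table dept_partition partition (month = '2021-02-12')
select deptno,
       dname,
       loc
from dept_partition
where month = '2021-02-11';
```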

Listing a table's partitions

-- dept_partition is the name of the partitioned table
show partitions dept_partition;
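show partitions also accepts a partition spec that narrows the listing, which helps on tables with many partitions. A sketch:

```sql
-- List only the partition whose month is 2021-02-11 (empty result if it does not exist)
show partitions dept_partition partition (month = '2021-02-11');
```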

Querying data in a partitioned table

Single-partition query

select * from dept_partition where month='2021-02-11';

Multi-partition query

select * from dept_partition where month='2021-02-11'
union all -- a plain union would also remove duplicate rows
select * from dept_partition where month='2021-02-12';
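Both branches read the same table, so you can get the rows of both partitions with a single filter on the partition column. Like union all, this keeps duplicate rows, and Hive should still prune the scan to the two listed partitions. A sketch:

```sql
-- Multi-partition query with one scan restricted to the two partitions
select *
from dept_partition
where month in ('2021-02-11', '2021-02-12');
```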

Adding a partition

alter table dept_partition add partition(month='2021-02-13');
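You can add several partitions in one statement, and if not exists avoids an error when a partition is already present. Unlike drop partition, add partition separates the partition specs with spaces, not commas. The dates in this sketch are only examples.

```sql
-- Adds two (empty) partitions at once; no error if either already exists
alter table dept_partition add if not exists
partition (month = '2021-02-14')
partition (month = '2021-02-15');
```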

A partition added this way contains no data. There are two ways to put data into it:

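Use load data or insert + select (both shown above)
Place the data files in the partition's HDFS directory yourself. If Hive does not know about the directory yet, register it with msck as described next.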

Repairing partitions with hive msck

1. Create the partition directory

dfs -mkdir -p /user/hive/warehouse/dept_partition/month=2021-02-21;

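Here /user/hive/warehouse is the Hive warehouse path on my setup (hive.metastore.warehouse.dir). Then upload the data file into the new directory: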

dfs -put /opt/modules/input/dept.txt  /user/hive/warehouse/dept_partition/month=2021-02-21;

Query the data

select * from dept_partition where month='2021-02-21';

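Result: no rows are returned. The directory exists on HDFS, but the metastore has no record of partition month=2021-02-21, so Hive does not read it.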

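Repair the metadata. msck scans the table directory and adds the partitions that exist on HDFS but are missing from the metastore.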

msck repair table dept_partition;

Query again. This time the rows should be returned.
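The re-run query is the same statement as before. For a single known directory, an explicit add partition has the same effect as msck, without scanning the whole table directory. Without a location clause, Hive uses the default directory month=2021-02-21 under the table path, which already holds the uploaded file. A sketch:

```sql
-- Should now return the rows from dept.txt
select * from dept_partition where month = '2021-02-21';

-- Alternative to msck for one directory: register it explicitly
-- (if not exists makes this a no-op when msck has already added it)
alter table dept_partition add if not exists partition (month = '2021-02-21');
```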

Dropping partitions

Dropping a single partition

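-- For a managed table this removes both the partition's data and its metadata; for an external table only the metadata is removed
alter table dept_partition drop partition(month='2021-02-16');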

Dropping several partitions (the partition specs are separated by commas)

alter table dept_partition 
drop
partition ( month = '2021-02-11' ),
partition(month = '2021-02-12');

Or, with a comparison on the partition column:

alter table dept_partition drop partition ( month <= '2021-02-12' );
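Dropping a partition that does not exist raises an error, and if exists suppresses it. For a managed table, purge deletes the data immediately instead of moving it to the HDFS trash. Per the Hive DDL documentation, purge is available in Hive 1.2.0 and later. A sketch:

```sql
-- No error if the partition is already gone; the data skips the trash
alter table dept_partition drop if exists partition (month = '2021-02-13') purge;
```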

Renaming a partitioned table

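-- dept_partition is the old name, dept_partition_renamed is the new name
-- (dept_partition2 is avoided here because that name is used by the two-level table created later)
alter table dept_partition rename to dept_partition_renamed;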
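Besides renaming the whole table, you can rename a single partition, which changes its partition value. For a managed table, Hive should also rename the partition's directory. This sketch assumes it runs while the table is still named dept_partition (before the table rename above) and that partition 2021-02-21 from the msck section exists. The target value 2021-02-22 is only an example.

```sql
-- Change the partition value of one partition
alter table dept_partition partition (month = '2021-02-21')
rename to partition (month = '2021-02-22');
```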

Changing a partition's file format

alter table table_name partition (dt='2008-08-09') set fileformat file_format;
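Here is a concrete instance of the template above, assuming dept_partition exists and using a hypothetical partition 2021-03-01 whose files are already in ORC format. The statement only updates the metastore entry and does not convert files. The files in the partition must therefore already be in the new format for queries to read them correctly.

```sql
-- Mark one partition as ORC; existing files in it are NOT rewritten
alter table dept_partition partition (month = '2021-03-01') set fileformat orc;
```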

Changing a partition's location

alter table table_name partition (dt='2008-08-09') set location "new location";
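Changing the location also only repoints the metastore entry; it does not move any data. In this sketch, the HDFS URI (including the namenode:8020 authority) is a hypothetical placeholder. The target directory must already contain the partition's files.

```sql
-- Point partition 2021-02-21 at another directory (hypothetical path); data is not moved
alter table dept_partition partition (month = '2021-02-21')
set location 'hdfs://namenode:8020/user/hive/archive/dept_partition/month=2021-02-21';
```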

Creating a two-level partitioned table in Hive

About two-level partitioned tables:

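A multi-level partitioned table is simply a tree of nested directories. The first-level partition is a directory under the table directory, and the second-level partition is a subdirectory inside it.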

Creating a two-level partitioned table

1. SQL

create table dept_partition2
(
    deptno int,
    dname  string,
    loc    string
)
    partitioned by (month string, day string)
    row format delimited fields terminated by '\t';

2. Load data into the partitioned table

load data local inpath '/opt/modules/input/dept.txt' into table dept_partition2 partition (month='2021-03-11', day='01');

Querying a multi-level partitioned table

select * from dept_partition2 where month='2021-03-11' and day='01';
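Each level appears as one component of the partition name and as one level of nested directories. This sketch inspects both and assumes the default warehouse path.

```sql
-- Should list a partition of the form month=2021-03-11/day=01
show partitions dept_partition2;

-- The data file sits two directory levels below the table directory
dfs -ls /user/hive/warehouse/dept_partition2/month=2021-03-11/day=01;

-- Filtering on the first level only reads every day=... subdirectory under month=2021-03-11
select * from dept_partition2 where month = '2021-03-11';
```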

Hive dynamic partitions

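With dynamic partitioning, Hive infers the partition column values from the query results instead of having them written out by hand. It relies on the insert + select syntax.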

Before using dynamic partitions, complete the following two steps:

1) Enable dynamic partitioning (the default is true, i.e. already enabled)

hive (default)> set hive.exec.dynamic.partition=true;

2) Switch to non-strict mode

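The dynamic partition mode defaults to strict, which requires at least one partition column to have a static value.
nonstrict allows all partition columns to be dynamic.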

hive (default)> set hive.exec.dynamic.partition.mode=nonstrict;
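Two further limits often matter once non-strict mode is on. Hive caps how many dynamic partitions one statement may create in total (default 1000) and per mapper or reducer (default 100). This sketch raises both for a large backfill; the values are only examples.

```sql
-- Max dynamic partitions a single insert statement may create (default 1000)
set hive.exec.max.dynamic.partitions=2000;
-- Max dynamic partitions each mapper/reducer may create (default 100)
set hive.exec.max.dynamic.partitions.pernode=500;
```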

1. Create the partitioned table

This works the same way as creating a static partitioned table.

create table dept_par
(
    deptno string, -- department number
    dname  string -- department name
)
    partitioned by (change_dt string)  -- date the department was reorganized
    row format delimited fields terminated by '\t';

2. Query table dept and insert the rows dynamically into the matching partitions of dept_par

insert into table dept_par partition (change_dt)
select deptno ,
       dname  ,
       change_dt 
from dept;

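① Columns deptno and dname are read from dept and inserted into the regular columns of dept_par.
② The last selected column, change_dt, supplies the partition value, and each distinct value becomes its own partition. Hive matches dynamic partition columns by position (the last columns of the select list, in the order given in partition(...)), not by name.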
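In strict mode, the same kind of insert still works as long as at least one partition column is static, and static columns must come before dynamic ones in the partition clause. This sketch writes into the two-level table dept_partition2 and assumes a hypothetical source table dept_day with columns deptno, dname, loc and day.

```sql
-- month is static, day is dynamic: allowed even with hive.exec.dynamic.partition.mode=strict
insert overwrite table dept_partition2 partition (month = '2021-03-11', day)
select deptno,
       dname,
       loc,
       day   -- last column feeds the dynamic partition column day
from dept_day;
```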
