Hive: Partitioned Tables

Article link: https://blog.csdn.net/luoyunfan6/article/details/98877346


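Why create partitioned tables: the amount of data in a single table keeps growing over time. Partitions are introduced so that queries can avoid full table scans.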

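How do Hive partitions differ from MySQL partitioned tables?

Hive partitions on a field outside the table's data columns, while MySQL partitions on a field inside the table.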

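Details of Hive partitioned tables

1. A Hive partition field is a pseudo-column: it is not physically stored in the table's data files, but it can be used in queries, for example to filter.

2. A table can have multiple partitions, a partition can in turn have sub-partitions, and every partition exists as a directory.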

How to partition?

By business line, region, year, month, day, gender, and so on.

Keyword: partitioned by

One-level partitions:

First run use databasename; to switch to the target database.


-- Create the partitioned table
    create table if not exists comm(
	    id int,
	    comment String,
	    dt String
    )
    partitioned by(year String)
    row format delimited fields terminated by '\t'
;
-- Show partition information (empty until a partition is added or loaded)
    show partitions comm;

-- Load data from a local file into the year=2019 partition of comm
load data local inpath 'youFilepath' into table comm partition(year='2019');

-- Query the contents of the partition

select * from comm where year = '2019';
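
To make the pseudo-column and directory points above concrete, here is a minimal sketch that assumes the comm table and the year=2019 partition created above. The HDFS path is an assumption (a default warehouse layout); the real location depends on your hive.metastore.warehouse.dir setting and database name.

```sql
-- The partition column is listed after the regular columns,
-- in a separate "Partition Information" section.
describe comm;

-- year can be selected and filtered like an ordinary column,
-- although its value comes from the directory name, not from the data file.
select id, comment, year from comm where year = '2019';

-- List the table directory from the Hive CLI; each partition shows up
-- as a subdirectory such as year=2019.
-- Assumed path: replace it with your own warehouse location.
dfs -ls /user/hive/warehouse/databasename.db/comm;
```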

Partition operations

-- Add partitions

-- Add a single partition:

alter table comm add partition(year='2018');

-- Add several partitions at once

alter table comm add partition(year='2020') partition(year='2017');

-- Rename a partition
alter table comm partition(year='2020') rename to partition(year='2016');

-- Point a partition at existing data:

alter table comm partition(year='2016') set location 'hdfs://xxx/user/hive/warehouse/xxx.db/xxx';
-- (This is the corresponding path in HDFS; a partition is really just a directory in HDFS.)

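-- Show partitions

show partitions comm;

-- Drop partitions

Drop a single partition: alter table comm drop partition(year='2018');

Drop several partitions at once: alter table comm drop partition(year='2018'), partition(year='2019');

-- Show the structure of the partitioned table

desc formatted comm;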
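
Adding a partition that already exists, or dropping one that does not, makes the statement fail, which is awkward in scripts that get re-run. A small sketch of the guarded forms, assuming the comm table above:

```sql
-- Add the partition only if it is not already registered.
alter table comm add if not exists partition(year='2018');

-- Drop the partition without failing when it is absent.
alter table comm drop if exists partition(year='2018');
```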

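Two-level partitions:

-- Create
-- Note: if the one-level comm table above still exists, drop it first or use
-- another name; otherwise "if not exists" silently skips this statement.
create table if not exists comm(
	id int,
	comment String,
	dt String
)
partitioned by(year String,month String)
row format delimited fields terminated by '\t'
;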
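
As a rough sketch of how a two-level table is used, assuming the two-level comm table above was actually created (and not skipped because an older comm existed). 'youFilepath' is the same placeholder path used earlier and the month value '08' is only an example.

```sql
-- Load a local file into the leaf partition year=2019/month=08.
load data local inpath 'youFilepath' into table comm partition(year='2019', month='08');

-- Partitions are listed in the form year=.../month=...
show partitions comm;

-- Filtering on both levels reads a single leaf directory.
select * from comm where year = '2019' and month = '08';

-- Filtering on the first level only reads every month directory under year=2019.
select * from comm where year = '2019';
```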

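Static, dynamic, and mixed partitions

Static partition: the partition values are known in advance, and data can be loaded with load.

Dynamic partition: the partition values are not known in advance and are derived from the data itself; load cannot be used, so the data is written with insert ... select.

Mixed partition: static and dynamic partition columns are used together in the same statement.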

Dynamic partition example:

Create the source data table:

create table if not exists comm_tmpl(
 	id int,
 	comment String,
 	year String,
 	month String
)
row format delimited fields terminated by '\t'
;
-- Load data into the source table (not the partitioned table)

load data local inpath 'youFilepath' into table comm_tmpl;

Create the partitioned table:

create table if not exists comm3(
 	id int,
 	comment String
)
partitioned by(year String,month int)
row format delimited fields terminated by '\t'
;

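Insert data into the partitioned table dynamically:

First, hive.exec.dynamic.partition.mode (strict/nonstrict) must be changed to nonstrict.

Command: set hive.exec.dynamic.partition.mode=nonstrict;

-- Insert the data; the partition columns must come last in the select list, in partition order

insert into table comm3 partition(year,month)
select id,comment,year,month from comm_tmpl
;
-- When the statement finishes, the rows are partitioned by the year and month values found in comm_tmpl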
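
The mixed case described earlier can be sketched the same way, assuming the comm3 and comm_tmpl tables above. The year value '2019' is only an example. Static partition columns have to come before dynamic ones in the partition clause, and because one column is static this form does not need nonstrict mode.

```sql
-- year is fixed statically; month is resolved dynamically from the
-- last column of the select list.
insert into table comm3 partition(year='2019', month)
select id, comment, month
from comm_tmpl
where year = '2019';

-- Check which partitions were created.
show partitions comm3;
```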

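Properties related to dynamic partitioning:

hive.exec.dynamic.partition=true  whether dynamic partitioning is allowed

hive.exec.dynamic.partition.mode=strict/nonstrict  strict requires at least one static partition column; nonstrict allows all partition columns to be dynamic

hive.exec.max.dynamic.partitions=1000  maximum number of dynamic partitions in total

hive.exec.max.dynamic.partitions.pernode=100  maximum number of dynamic partitions allowed on a single node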
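
For reference, a session-level sketch that sets the properties listed above before a dynamic-partition insert. The numeric values simply repeat the ones in the list; raise them only if a job really creates more partitions.

```sql
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.exec.max.dynamic.partitions=1000;
set hive.exec.max.dynamic.partitions.pernode=100;

-- Running set with only the property name prints its current value.
set hive.exec.dynamic.partition.mode;
```

These settings last only for the current session; putting them in hive-site.xml makes them permanent.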

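Statements that strict mode (hive.mapred.mode=strict) refuses to run:

1. Cartesian product queries (a join with no join condition)
select
c3.*,
c4.*
from comm c4
join   comm1 c3
;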

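2. Queries on a partitioned table that have no where clause, or whose where clause does not filter on a partition field

(allowed)
select *
from comm
where year = '2016'
;
(not allowed)
select *
from comm
;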

3. order by without limit

(allowed, because a limit is present)
select *
from comm
where year = '2019'
order by id desc
limit 2
;
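
A sketch of how the three rules look in one session. It assumes comm1 has an id column (hypothetical; the original does not show its schema). hive.mapred.mode is the property used by older Hive releases; newer releases split these checks into separate hive.strict.checks.* properties, so treat the exact property name as version-dependent.

```sql
set hive.mapred.mode=strict;

-- Accepted: the join has an on condition, the partitioned table is
-- filtered on its partition field, and order by is paired with limit.
select c4.id, c4.comment
from comm c4
join comm1 c3 on c4.id = c3.id   -- comm1.id is an assumed column
where c4.year = '2019'
order by c4.id desc
limit 10;
```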