Hive Partitions (Part 2)

杨大大慌

Article link: https://blog.csdn.net/e3hhhh/article/details/100714041

Setting up dynamic partitions

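Partition types
Static partition: data is loaded into a partition whose value is specified explicitly in the statement.
Dynamic partition: the partition values are not known in advance, so Hive creates partitions from the values of the partition column in the data.
Mixed partition: some partition columns are static and the others are dynamic.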
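The dynamic and mixed examples in this post all load data through a staging table. A static load is different: the partition value is written directly into the statement. The sketch below is a minimal example that assumes a hypothetical table `st_part` and a hypothetical local file `/hivedata/t3.txt` containing comma-separated `id,name` rows. Neither name comes from the original.

```sql
-- Hypothetical table: dt is declared in partitioned by, not in the column list
create table st_part(
id int,
name string
)
partitioned by (dt string)
row format delimited fields terminated by ',';

-- Static partition: the value is fixed in the statement, so every row
-- in the (hypothetical) file is placed in the dt='2019-09-10' partition
load data local inpath '/hivedata/t3.txt' into table st_part partition (dt='2019-09-10');

-- Should list the single partition dt=2019-09-10
show partitions st_part;
```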
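Dynamic partition properties
hive.exec.dynamic.partition=true/false --whether dynamic partitioning is enabled
hive.exec.dynamic.partition.mode=strict/nonstrict --strict: at least one partition column must be static; nonstrict: all partition columns may be dynamic
hive.exec.max.dynamic.partitions=1000 --maximum total number of dynamic partitions a statement may create (default 1000)
hive.exec.max.dynamic.partitions.pernode=100 --maximum number of dynamic partitions each mapper/reducer may create (default 100)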
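These properties are usually set for the current session, right before a dynamic insert. The sketch below shows one way to do that. The limit values 2000 and 500 are illustrative assumptions only; the original does not recommend them.

```sql
-- Enable dynamic partitioning for this session
set hive.exec.dynamic.partition=true;
-- Let every partition column be dynamic (strict would require at least one static column)
set hive.exec.dynamic.partition.mode=nonstrict;
-- Illustrative limits for an insert expected to create many partitions
set hive.exec.max.dynamic.partitions=2000;
set hive.exec.max.dynamic.partitions.pernode=500;

-- Passing only the property name prints its current value
set hive.exec.dynamic.partition.mode;
```

If an insert would exceed either limit, Hive fails the statement instead of silently creating fewer partitions. Raising the limits is therefore a deliberate decision.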
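In dynamic partition strict mode (hive.exec.dynamic.partition.mode=strict), an insert must specify at least one static partition column. The three query restrictions below come from a separate setting, hive.mapred.mode=strict, which blocks:

Queries on a partitioned table whose where clause does not filter on a partition column
Joins that form a Cartesian product because they have no on (or where) condition
order by queries without a limit clause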
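The sketch below shows how `hive.mapred.mode=strict` treats each kind of query. It assumes hypothetical tables: `part_tbl` (partitioned by `dt`), `t1`, and `t2`. Depending on the Hive version, these checks may instead be controlled by the finer-grained `hive.strict.checks.*` properties.

```sql
set hive.mapred.mode=strict;

-- Rejected: part_tbl is partitioned by dt, but no partition column is filtered
select * from part_tbl where id = 1;
-- Accepted: the dt predicate limits the scan to one partition
select * from part_tbl where dt = '2019-09-10' and id = 1;

-- Rejected: a join with no on condition is a Cartesian product
select * from t1 join t2;
-- Accepted
select * from t1 join t2 on t1.id = t2.id;

-- Rejected: order by without limit sends every row through a single reducer
select * from part_tbl where dt = '2019-09-10' order by id;
-- Accepted
select * from part_tbl where dt = '2019-09-10' order by id limit 10;
```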

Creating a dynamically partitioned table
create table dy_part1(
id int,
name string
)
partitioned by (dt string)
row format delimited fields terminated by ','
;
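1. First create a staging table:

create table temp_part(
id int,
name string,
dt string
)
row format delimited fields terminated by ',';
2. Load the data into the staging table:
load data local inpath '/hivedata/t4.txt' into table temp_part;
3. Load the data into the partitioned table dynamically (dt is the only partition column, so nonstrict mode is required):
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
insert into dy_part1 partition(dt) select id,name,dt from temp_part;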
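After the dynamic insert, you can check the result from the Hive CLI, as sketched below. Three assumptions apply. The partition value `'2019-09-10'` is hypothetical, because the contents of `t4.txt` are not shown. The HDFS path assumes the default warehouse directory `/user/hive/warehouse`. The table is assumed to live in the `default` database.

```sql
-- One partition is created per distinct dt value in temp_part
show partitions dy_part1;

-- Each partition is a directory named dt=<value> under the table directory
dfs -ls /user/hive/warehouse/dy_part1;

-- Filtering on dt reads only the matching partition directory
select * from dy_part1 where dt = '2019-09-10';
```

Hive maps the dynamic partition column by position, not by name. That is why dt must be the last expression in the select list of the insert.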

Mixed partitions

Create the table:

create table dy_part2(
id int,
name string
)
partitioned by (year string,month string,day string)
row format delimited fields terminated by ','
;

Create the staging table:

create table temp_part2(
id int,
name string,
year string,
month string,
day string
)
row format delimited fields terminated by ','
;
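Load the data (year is static; month and day are dynamic and must come last in the select list, in the same order as in the partition clause):
-- every selected row goes into year=2018, regardless of its own year value
insert into dy_part2 partition (year='2018',month,day)
select id,name,month,day from temp_part2;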
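In a mixed insert, the static partition columns must come before the dynamic ones in the `partition` clause. Because year is static here, the insert above also runs under `hive.exec.dynamic.partition.mode=strict`. The sketch below works against the tables above; the month value `'09'` is hypothetical.

```sql
-- Partitions are nested directories, e.g. year=2018/month=09/day=10
show partitions dy_part2;

-- Filtering on leading partition columns prunes to the matching subdirectories
select * from dy_part2 where year = '2018' and month = '09';

-- Not allowed: a dynamic column (year) cannot be the parent of a static one (month)
-- insert into dy_part2 partition (year, month='09', day)
-- select id, name, year, day from temp_part2;
```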

Notes on partitioned tables

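Hive partitions use a column outside the table's column list. The partition column is a pseudo-column that is not stored in the data files, but it can still be used to filter queries.
Avoid Chinese (non-ASCII) characters in partition columns.
Dynamic partitioning is generally not recommended, because it runs a MapReduce job to process the data. If too many partitions are created, the resulting directories and files can become a performance bottleneck for the NameNode and the ResourceManager. Before using dynamic partitioning, estimate the number of partitions as closely as possible.
Partitions can be modified either through the metadata (the metastore) or by changing the data on HDFS directly.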
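The sketch below shows both routes for changing partitions, using `dy_part1` from above. Three things in it are assumptions: the partition values, the local file `/hivedata/dt_0914.txt` (assumed to contain `id,name` rows), and the warehouse path `/user/hive/warehouse`.

```sql
-- Metadata route: change partitions through the metastore
alter table dy_part1 add partition (dt='2019-09-12');
alter table dy_part1 partition (dt='2019-09-12') rename to partition (dt='2019-09-13');
-- For a managed table, dropping a partition also deletes its data directory
alter table dy_part1 drop partition (dt='2019-09-13');

-- HDFS route: create the partition directory and upload data directly...
dfs -mkdir -p /user/hive/warehouse/dy_part1/dt=2019-09-14;
dfs -put /hivedata/dt_0914.txt /user/hive/warehouse/dy_part1/dt=2019-09-14/;
-- ...then register directories the metastore does not know about yet
msck repair table dy_part1;
```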
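Differences between Hive and MySQL partitions
MySQL partitions on a column that is part of the table; Hive partitions on a column declared outside the table's column list.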
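The difference shows up directly in the DDL. Below is a minimal sketch with hypothetical table names. The MySQL statement uses RANGE partitioning only as one example of MySQL's partitioning types.

```sql
-- MySQL: the partitioning expression uses dt, an ordinary column of the table
CREATE TABLE my_part (
  id INT,
  name VARCHAR(50),
  dt DATE
)
PARTITION BY RANGE (YEAR(dt)) (
  PARTITION p2018 VALUES LESS THAN (2019),
  PARTITION p2019 VALUES LESS THAN (2020)
);

-- Hive: dt is declared only in partitioned by; listing it in the
-- column list as well fails because the column would be repeated
create table hive_part (
  id int,
  name string
)
partitioned by (dt string);
```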