Hive Partitioned Tables: enabling dynamic partitioning (hive.exec.dynamic.partition=true)

 

1. Partitioned Tables

Create a partitioned table:
CREATE TABLE order_created_partition (
    order_number STRING
  , event_time  STRING
)
PARTITIONED BY (event_month string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
A two-level partition can also be declared: PARTITIONED BY (event_month string, step string)
Partition types:
  • Static partition: data is loaded into an explicitly named partition
  • Dynamic partition: partition values are not known in advance; partitions are created from the values in the data
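As a sketch of the two-level layout mentioned above (the table name and the step values are hypothetical, chosen only to illustrate the note):

```sql
-- Hypothetical two-level partitioned table, per the PARTITIONED BY note above
CREATE TABLE order_created_2level (
    order_number STRING
  , event_time  STRING
)
PARTITIONED BY (event_month string, step string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';

-- Loading into one leaf partition creates a nested directory such as
--   .../order_created_2level/event_month=2014-05/step=1/
LOAD DATA LOCAL INPATH '/home/spark/software/data/order_created.txt'
OVERWRITE INTO TABLE order_created_2level PARTITION (event_month='2014-05', step='1');
```

Each partition column adds one directory level under the table's warehouse path, so two-level partitions nest one directory inside the other.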
 

2. Partitioned Table DDL

(1) Show partitions
show partitions part_table;
(2) Add partitions
Add a single partition:
alter table part_table add partition(country='india');
Add multiple partitions in one statement:
alter table part_table add partition(country='india') partition(country='korea');
Add a partition pointing at an existing data location:
alter table part_table add partition(country='india') location '/user/hive/warehouse/xxx';
(3) Drop partitions
alter table part_table drop partition(country='india');
Drop multiple partitions:
alter table part_table drop partition(country='india'), partition(country='korea');

Example, where store_sales is the large fact table in TPC-DS:
alter table store_sales drop partition(ss_sold_date_sk=2451181),partition(ss_sold_date_sk=2451546),partition(ss_sold_date_sk=2452277);
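Hive's drop-partition syntax also accepts comparison operators, so a run of consecutive date partitions like the ones above can be dropped with a single range predicate; a sketch (the cutoff value here is made up for illustration):

```sql
-- Drops every partition whose ss_sold_date_sk is below the cutoff
alter table store_sales drop partition (ss_sold_date_sk < 2451547);
```

This is usually more convenient than enumerating each partition when pruning old date ranges.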
 

3. Loading Data into Partitions

(1) Load from a local or HDFS path (with the LOCAL keyword the path is on the local filesystem; without it, the path is on HDFS)
 
load data local inpath '/home/spark/software/data/order_created.txt' overwrite into table order_created_partition PARTITION(event_month='2014-05');
select * from order_created_partition where event_month='2014-05' limit 5;
 
 
(2) Manually upload a file to HDFS, then register the data with the target partition
Create the directory event_month=2014-06 under /user/hive/warehouse/order_created_partition on HDFS:
hadoop fs -mkdir /user/hive/warehouse/order_created_partition/event_month=2014-06

Copy the data into the newly created directory. At this point queries still return nothing, because the metastore does not yet know about the partition:
hadoop fs -put /home/spark/software/data/order_created.txt /user/hive/warehouse/order_created_partition/event_month=2014-06

Register the new partition in the metastore:
msck repair table order_created_partition;
or
alter table order_created_partition add partition(event_month='2014-06');
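Either way, once the partition is registered the data becomes visible; a quick check, assuming the upload above succeeded:

```sql
show partitions order_created_partition;   -- event_month=2014-06 should now be listed
select * from order_created_partition where event_month='2014-06' limit 5;
```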
 

4. Dynamic Partitions

Partition values are not known in advance; partitions are created from the values in the data:
CREATE TABLE order_created_dynamic_partition (
    order_number STRING
  , event_time  STRING
)
PARTITIONED BY (event_month string);
Enable dynamic partitioning:
 
set hive.exec.dynamic.partition=true;            -- default: false
set hive.exec.dynamic.partition.mode=nonstrict;  -- default: strict
set hive.exec.max.dynamic.partitions=10000;      -- maximum number of dynamic partitions; 10000 is plenty for daily partitioning

Dynamic partitions cannot be populated with load data; load the data with insert instead:
insert into table order_created_dynamic_partition PARTITION (event_month)
select order_number, event_time, substr(event_time, 1, 7) as event_month from order_created;
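The dynamic partition value is taken from the last column of the SELECT. Here substr keeps the first seven characters (Hive's substr is 1-based), so an event_time such as '2014-05-18 10:03:01' lands in partition event_month='2014-05':

```sql
select substr('2014-05-18 10:03:01', 1, 7);  -- '2014-05'
```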
Commonly used partition-related settings (hive-site.xml):
  <property>
    <name>hive.exec.dynamic.partition</name>
    <value>true</value>
    <description>Whether or not to allow dynamic partitions in DML/DDL.</description>
  </property>
  <property>
    <name>hive.exec.dynamic.partition.mode</name>
    <value>nonstrict</value>
    <description>
      In strict mode, the user must specify at least one static partition
      in case the user accidentally overwrites all partitions.
      In nonstrict mode all partitions are allowed to be dynamic.
    </description>
  </property>
  <property>
    <name>hive.exec.max.dynamic.partitions</name>
    <value>20000</value>
    <description>Maximum number of dynamic partitions allowed to be created in total.</description>
  </property>
  <property>
    <name>hive.exec.max.dynamic.partitions.pernode</name>
    <value>10000</value>
    <description>Maximum number of dynamic partitions allowed to be created in each mapper/reducer node.</description>
  </property>
  <property>
    <name>hive.exec.max.created.files</name>
    <value>100000</value>
    <description>Maximum number of HDFS files created by all mappers/reducers in a MapReduce job.</description>
  </property>
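As the strict-mode description above says, at least one static partition must be specified on insert when hive.exec.dynamic.partition.mode is strict. A sketch of such a mixed insert, using hypothetical table names (event_month static, step dynamic):

```sql
-- some_2level_table and staging_table are hypothetical examples;
-- event_month is pinned (static), step comes from the query (dynamic)
insert into table some_2level_table PARTITION (event_month='2014-05', step)
select order_number, event_time, step from staging_table;
```

In nonstrict mode both partition columns could instead be left dynamic.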

 

 
 