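Scenario: the source Hive table tbl1 has three partition levels: dt, hour, and proj_id. The goal is to copy all data under dt='20180305' into a new table, tbl2.
Steps:
1. Create the new table tbl2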
create table tbl2 like tbl1;
2. Enable dynamic partitioning and set the resource queue
set mapreduce.job.queuename=root.offline.hdp_teu_dpd.normal;
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.exec.max.dynamic.partitions.pernode=6000;
set hive.exec.max.dynamic.partitions=60000;
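These set commands only last for the Hive session in which they are run, so they have to be repeated in every new session. One way to avoid retyping them is to keep them in an initialization file and load it with the Hive CLI's -i option. The following is only a sketch: the file name dynpart_init.hql and the function write_init_file are hypothetical names chosen for illustration.

```shell
#!/bin/bash
# Hypothetical helper: writes the session settings from step 2 into an init file.
write_init_file() {
  local path="$1"
  # Quoted heredoc delimiter: the settings are written literally, with no expansion.
  cat > "$path" <<'EOF'
set mapreduce.job.queuename=root.offline.hdp_teu_dpd.normal;
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.exec.max.dynamic.partitions.pernode=6000;
set hive.exec.max.dynamic.partitions=60000;
EOF
}

# dynpart_init.hql is an assumed file name, not something the original setup requires.
write_init_file dynpart_init.hql
# A later call can then load the settings before running a statement, for example:
#   hive -i dynpart_init.hql -e "show partitions tbl2;"
```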
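3. Import the old table's data into the new table (dynamic partitioning: the last three columns returned by the query on tbl1, i.e. the three partition columns, are automatically used as the partition values for tbl2)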
insert overwrite table tbl2 partition(dt,hour,proj_id) select * from tbl1 where dt='20180305' and hour='01' and proj_id='3289916654594';
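The statement above copies a single hour and project. For the scenario stated at the top (everything under dt='20180305'), dropping the hour and proj_id filters should be enough, since dynamic partitioning creates one tbl2 partition per distinct (dt, hour, proj_id) combination. A minimal sketch, assuming the number of partitions for that day stays within the limits set in step 2; the function name build_day_import is hypothetical:

```shell
#!/bin/bash
# Hypothetical helper: builds the HiveQL that copies one whole day from tbl1 into tbl2.
build_day_import() {
  local dt="$1"
  # Unquoted heredoc delimiter: ${dt} is expanded, the single quotes stay literal.
  cat <<EOF
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
insert overwrite table tbl2 partition(dt,hour,proj_id)
select * from tbl1 where dt='${dt}';
EOF
}

# Print the statements for review first. Once they look right, they can be run with:
#   hive -e "$(build_day_import 20180305)"
build_day_import 20180305
```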
P.S. If the requirement is instead to import the data for dt='20180305' and proj_id='3289916654594' with hour from '00' to '19', a small script is enough:
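#!/bin/bash
# Loop over the zero-padded hours 00..19 (seq -w pads every number to the same width).
for hour in $(seq -w 0 19)
do
  # Each hive -e call starts a new session, so the settings are repeated here.
  hive -e "set mapreduce.job.queuename=root.offline.hdp_teu_dpd.normal;
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
insert overwrite table tbl2 partition(dt,hour,proj_id) select * from tbl1 where dt='20180305' and hour='$hour' and proj_id='3289916654594';"
done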
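The loop starts one Hive job per hour. If hour is stored as a zero-padded two-character string (which the '00' to '19' values suggest, but which is an assumption here), string comparison keeps the hours in order, so the whole range could also be covered by a single statement. The sketch below only builds and prints that statement; the function name build_range_import and its parameters are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: builds one HiveQL statement covering a range of hours.
# Arguments: dt, proj_id, first hour, last hour (hours zero-padded, e.g. 00 and 19).
build_range_import() {
  local dt="$1" proj_id="$2" from="$3" to="$4"
  cat <<EOF
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
insert overwrite table tbl2 partition(dt,hour,proj_id)
select * from tbl1
where dt='${dt}' and proj_id='${proj_id}' and hour>='${from}' and hour<='${to}';
EOF
}

# Print the statements for review. To execute them, pass the output to hive -e:
#   hive -e "$(build_range_import 20180305 3289916654594 00 19)"
build_range_import 20180305 3289916654594 00 19
```

Compared with the loop, this trades twenty small jobs for one larger job; the loop may still be preferable when each hour needs to be retried independently.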