本文转自:http://www.geedoo.info/hive-dynamic-partitions-and-static-partitioning.html
参考:http://www.crazyant.net/1197.html
参考:http://www.crazyant.net/1197.html
HIVE默认是静态分区。但是有时候可能需要动态创建不同的分区来区分不同的分类。
HIVE中创建分区表没有什么复杂的分区类型(范围分区、列表分区、hash分区、混合分区等)。分区列也不是表中的一个实际的字段,而是一个或者多个伪列。意思是说在表的数据文件中实际上并不保存分区列的信息与数据。
工作中使用动态分区的例子:
因为要给一个已经创建的表增加分区字段(原表没有分区或者需要多增加几个分区,好多文档的增加分区都默认是增加分区字段的值),而从文档中没有增加分区字段的内容,所以只好做表数据迁移了。
下面是表的SQL语句:
旧表结构
CREATE EXTERNAL TABLE db_stat.beauty_day_imei(imei string,mid string,f string,v string,s string,hid string,ip string,openudid string,tuid string,
count
int,active int)
PARTITIONED BY (time string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY
'`'
STORED AS TEXTFILE;
|
新表结构
CREATE EXTERNAL TABLE db_stat.beauty_day_imei2(imei string,mid string,f string,v string,s string,hid string,ip string,openudid string,tuid string,
count
int,active int)
PARTITIONED BY (type string,time string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY
'`'
STORED AS TEXTFILE;
|
从旧表转移数据到新表
insert overwrite table db_stat.beauty_day_imei2 partition(type=
'base'
,time)
select * from db_stat.beauty_day_imei ;
|
注意上面的time字段。在旧表(db_stat.beauty_day_imei)中time是分区字段,在新表(db_stat.beauty_day_imei2)中type和time是分区字段。time数据直接从旧表中取得,自动作为新表time的分区。而实质存储的数据中没用type和time字段数据。
旧表结构
CREATE EXTERNAL TABLE db_stat.beauty_all_imei(imei string,clicks int,regisday int,modday int,userday int)
PARTITIONED BY ( time string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY
'`'
STORED AS RCFILE;
|
新表结构
CREATE EXTERNAL TABLE db_stat.beauty_all_imei2(imei string,clicks int,regisday int,modday int,userday int)
PARTITIONED BY (type string, pt string , time string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY
'`'
STORED AS RCFILE;
|
从旧表迁移数据到新表
insert overwrite table db_stat.beauty_all_imei2 partition(type=
'base'
,pt=
'android'
, time)
select * from db_stat.beauty_all_imei ;
|
参考:
http://blog.csdn.net/jiedushi/article/details/7356015
http://blog.csdn.net/iquicksandi/article/details/8535956