http://hugh-wangp.iteye.com/blog/1612268
http://blog.csdn.net/opensure/article/details/46537969
- 使用静态分区,创建分区表p_test:
在创建分区表之前需要去人参数是否如下:
0: jdbc:hive2://10.8.1.2:6611/> set hive.exec.dynamic.partition.mode; +------------------------------------------+--+ | set | +------------------------------------------+--+ | hive.exec.dynamic.partition.mode=strict | +------------------------------------------+--+ 1 row selected (0.013 seconds) 0: jdbc:hive2://10.8.1.2:6611/> set hive.exec.dynamic.partition; +-----------------------------------+--+ | set | +-----------------------------------+--+ | hive.exec.dynamic.partition=true | +-----------------------------------+--+
如果不是需要修改参数,创建分区表
create table p_test(id string,name string) partitioned by (p_day string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
添加静态分区
alter table p_test add partition (p_day='20170619');
查看分区表是否创建好,及存储目录
0: jdbc:hive2://10.78.152.62:21066/> show partitions p_test; +-----------------+--+ | partition | +-----------------+--+ | p_day=20170619 | +-----------------+--+ 1 row selected (0.143 seconds) 0: jdbc:hive2://10.78.152.62:21066/> [tt@jh01 service]$ hadoop fs -ls /jrc/jrc_hive_db/p_test 17/06/20 21:27:44 INFO hdfs.PeerCache: SocketCache disabled. Found 1 items drwxrwx---+ - jrc jrc_group 0 2017-06-20 21:24 /jrc/jrc_hive_db/p_test/p_day=20170619
指定分区插入已经存在的分区中数据:
insert into p_test partition(p_day='20170619') select '1' as id,'name1' as name;
指定不存在的分区,并插入数据:
0: jdbc:hive2://10.8.1.2:6611/> insert into p_test partition(p_day='20170620') 0: jdbc:hive2://10.8.1.2:6611/> select '1' as id,'name1' as name; 0: jdbc:hive2://10.8.1.2:6611/> select * from p_test; +------------+--------------+---------------+--+ | p_test.id | p_test.name | p_test.p_day | +------------+--------------+---------------+--+ | 1 | name1 | 20170619 | | 1 | name1 | 20170620 | +------------+--------------+---------------+--+ 2 rows selected (1.8 seconds) 0: jdbc:hive2://10.8.1.2:6611/> show partitions p_test; +-----------------+--+ | partition | +-----------------+--+ | p_day=20170619 | | p_day=20170620 | +-----------------+--+ 2 rows selected (0.281 seconds) 0: jdbc:hive2://10.8.1.2:6611/>
动态分区插入:
0: jdbc:hive2://10.8.1.2:6611/> insert into p_test partition(p_day) 0: jdbc:hive2://10.8.1.2:6611/> select '3' as id,'name3' as name,'20170621' as p_day; Error: Error while compiling statement: FAILED: SemanticException [Error 10096]: Dynamic partition strict mode requires at least one static partition column. To turn this off set hive.exec.dynamic.partition.mode=nonstrict (state=42000,code=10096)
插入失败,根据错误提示需要修改参数:set hive.exec.dynamic.partition.mode=nonstrict
set hive.exec.dynamic.partition.mode=nonstrict; insert into p_test partition(p_day) select '3' as id,'name3' as name,'20170621' as p_day; 0: jdbc:hive2://10.8.1.2:6611/> select * from p_test; +------------+--------------+---------------+--+ | p_test.id | p_test.name | p_test.p_day | +------------+--------------+---------------+--+ | 1 | name1 | 20170619 | | 1 | name1 | 20170620 | | 3 | name3 | 20170621 | +------------+--------------+---------------+--+
示例二:
已知表mitem表包含了p_ciy,p_hour字段,我们新建一张分区表mitem_partition(分区字段包括:p_city,p_hour),将mitem数据导入mitem_partition.
create table mitem_partition ( id int, ueid string, time timestamp, srp double, srq double, sul double, nid int, nrp double ) partitioned by (p_city string,p_hour string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ','; set hive.exec.dynamic.partition.mode=nonstrict;
insert into mitem_partition partition(p_city,p_hour) select (cast(split(s_cgi,'-')[2] as int)*256+cast(split(s_cgi,'-')[3] as int)) as id,ueid,time,srp-141 as srp,srq*0.5-20 as srq, sul-11 as sul,(cast(split(n_cgi,'-')[2] as int)*256+cast(split(n_cgi,'-')[3] as int)) as nid,nrp-141 as nrp, p_city,p_hour from mitem where p_city='wuhan' and p_hour>='2017101200' and p_hour<'2017101300'
备注:
1)如果分区表中想删除某个字段就比较费事。