Hive：动静态分区

最新推荐文章于 2024-11-14 00:00:00 发布

weixin_34138255

最新推荐文章于 2024-11-14 00:00:00 发布

阅读量87

点赞数

文章标签：大数据

http://hugh-wangp.iteye.com/blog/1612268

http://blog.csdn.net/opensure/article/details/46537969

使用静态分区，创建分区表p_test：

在创建分区表之前需要去人参数是否如下：

0: jdbc:hive2://10.8.1.2:6611/> set hive.exec.dynamic.partition.mode;
+------------------------------------------+--+
|                   set                    |
+------------------------------------------+--+
| hive.exec.dynamic.partition.mode=strict  |
+------------------------------------------+--+
1 row selected (0.013 seconds)
0: jdbc:hive2://10.8.1.2:6611/> set hive.exec.dynamic.partition;
+-----------------------------------+--+
|                set                |
+-----------------------------------+--+
| hive.exec.dynamic.partition=true  |
+-----------------------------------+--+

如果不是需要修改参数，创建分区表

create table p_test(id string,name string)
partitioned by (p_day string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

添加静态分区

alter table p_test add partition (p_day='20170619');

查看分区表是否创建好，及存储目录

0: jdbc:hive2://10.78.152.62:21066/> show partitions p_test;
+-----------------+--+
|    partition    |
+-----------------+--+
| p_day=20170619  |
+-----------------+--+
1 row selected (0.143 seconds)
0: jdbc:hive2://10.78.152.62:21066/> 

[tt@jh01 service]$ hadoop fs -ls /jrc/jrc_hive_db/p_test
17/06/20 21:27:44 INFO hdfs.PeerCache: SocketCache disabled.
Found 1 items
drwxrwx---+  - jrc jrc_group          0 2017-06-20 21:24 /jrc/jrc_hive_db/p_test/p_day=20170619

指定分区插入已经存在的分区中数据：

insert into p_test partition(p_day='20170619')
select '1' as id,'name1' as name;

指定不存在的分区，并插入数据:

0: jdbc:hive2://10.8.1.2:6611/> insert into p_test partition(p_day='20170620')
0: jdbc:hive2://10.8.1.2:6611/> select '1' as id,'name1' as name;
0: jdbc:hive2://10.8.1.2:6611/> select * from p_test;
+------------+--------------+---------------+--+
| p_test.id  | p_test.name  | p_test.p_day  |
+------------+--------------+---------------+--+
| 1          | name1        | 20170619      |
| 1          | name1        | 20170620      |
+------------+--------------+---------------+--+
2 rows selected (1.8 seconds)
0: jdbc:hive2://10.8.1.2:6611/> show partitions p_test;
+-----------------+--+
|    partition    |
+-----------------+--+
| p_day=20170619  |
| p_day=20170620  |
+-----------------+--+
2 rows selected (0.281 seconds)
0: jdbc:hive2://10.8.1.2:6611/>

动态分区插入：

0: jdbc:hive2://10.8.1.2:6611/> insert into p_test partition(p_day)
0: jdbc:hive2://10.8.1.2:6611/> select '3' as id,'name3' as name,'20170621' as p_day;
Error: Error while compiling statement: FAILED: SemanticException [Error 10096]: Dynamic partition strict mode requires at least one static partition column. 
To turn this off set hive.exec.dynamic.partition.mode=nonstrict (state=42000,code=10096)

插入失败，根据错误提示需要修改参数：set hive.exec.dynamic.partition.mode=nonstrict

set hive.exec.dynamic.partition.mode=nonstrict;
insert into p_test partition(p_day)
select '3' as id,'name3' as name,'20170621' as p_day;

0: jdbc:hive2://10.8.1.2:6611/> select * from p_test;
+------------+--------------+---------------+--+
| p_test.id  | p_test.name  | p_test.p_day  |
+------------+--------------+---------------+--+
| 1          | name1        | 20170619      |
| 1          | name1        | 20170620      |
| 3          | name3        | 20170621      |
+------------+--------------+---------------+--+

示例二:

已知表mitem表包含了p_ciy,p_hour字段，我们新建一张分区表mitem_partition（分区字段包括：p_city,p_hour），将mitem数据导入mitem_partition.

create table mitem_partition 
(
 id int,
 ueid string,
 time timestamp,
 srp double,
 srq double,
 sul double,
 nid int,
 nrp double
) partitioned by (p_city string,p_hour string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

set hive.exec.dynamic.partition.mode=nonstrict;
insert into mitem_partition partition(p_city,p_hour)
select (cast(split(s_cgi,'-')[2] as int)*256+cast(split(s_cgi,'-')[3] as int)) as id,ueid,time,srp-141 as srp,srq*0.5-20 as srq,
sul-11 as sul,(cast(split(n_cgi,'-')[2] as int)*256+cast(split(n_cgi,'-')[3] as int)) as nid,nrp-141 as nrp,
p_city,p_hour
from mitem
where p_city='wuhan' and p_hour>='2017101200' and p_hour<'2017101300'