Hive Dynamic Partitioning

奈何@

Category: Hive. Tags: hadoop, hdfs, hive, partitioned tables

Article link: https://blog.csdn.net/sinat_26594945/article/details/112650808

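1. Partition types

Broadly speaking, there are two kinds of partitioning: static and dynamic.
The main difference is that a static partition is specified by hand, while a dynamic partition is determined by the data itself.
When a large number of partitions is needed, static partitioning forces the user to write a large number of SQL statements. Dynamic partitioning instead infers the names of the partitions to create from the values returned by the query, which is far more convenient.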
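
To make the contrast concrete, here is a minimal HiveQL sketch. It reuses the dpartition table and the mytest_tmp2_p source table from section 2.1. The assumption that mytest_tmp2_p has the columns id, name and city comes from the queries in that section, and the city values are taken from its sample output.

```sql
-- Static partitioning: one statement per partition, with the value written by hand.
insert overwrite table dpartition partition(ct='china')
select id, name from mytest_tmp2_p where city = 'china';

insert overwrite table dpartition partition(ct='us')
select id, name from mytest_tmp2_p where city = 'us';

-- Dynamic partitioning: a single statement; the partitions come from the data.
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
insert overwrite table dpartition partition(ct)
select id, name, city from mytest_tmp2_p;
```

With static partitioning, the number of statements grows with the number of distinct city values. The dynamic form stays the same however many values appear.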

2. Dynamic partitioning scenarios

2.1 Table partitioned by a single column

Create a partitioned table

create table dpartition(id int,name string) partitioned by (ct string);

Insert data into the partitioned table

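-- Enable dynamic partitioning (off by default before Hive 0.9.0, on by default since)
set hive.exec.dynamic.partition=true;
-- Allow every partition column to be dynamic; in strict mode at least one must be static
set hive.exec.dynamic.partition.mode=nonstrict;
-- Import the data
Key point: dpartition has only two regular columns, so when the query selects three columns (the extra one being city), Hive uses the last column, city, as the partition value. Partition columns are also treated as columns of the table and always come after the regular columns, so the column that supplies the partition value must come last in the select list, and the order must not be mixed up. Selecting four columns raises an error, because the table has only three columns including the partition column. Note that Hive infers the partition from the position of the selected column, not from its name.
insert overwrite table dpartition partition(ct) select id,name,city from mytest_tmp2_p;
-- List the partitions
show partitions dpartition;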
+------------+--+
| partition  |
+------------+--+
| ct=china   |
| ct=riben   |
| ct=us      |
+------------+--+
-- View all data
select * from dpartition;
+----------------+------------------+----------------+--+
| dpartition.id  | dpartition.name  | dpartition.ct  |
+----------------+------------------+----------------+--+
| 3              | 可乐               | china          |
| 5              | 张三               | riben          |
| 1              | 李勇               | us             |
| 4              | 张立               | us             |
+----------------+------------------+----------------+--+

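Note: when loading data with insert…select, the number of selected columns must equal the number of columns in the target table (partition columns included). Selecting more or fewer raises an error. A type mismatch, on the other hand, does not usually raise an error: values that cannot be converted to the target type are written as NULL. load data performs no such check, because it only moves the files into place. When the table is read, extra fields are dropped, missing fields are filled with NULL, and values whose type does not match the column also come back as NULL.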
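
The positional rule and the column-count check described in this section can be sketched as follows. This is an illustration only: it assumes the dynamic partition settings above are still in effect and that name and city are both strings in mytest_tmp2_p. The alias anything is hypothetical. Since insert overwrite replaces the partitions it writes to, try it on a scratch copy of the table.

```sql
-- The alias is ignored: the third selected column still feeds partition column ct.
insert overwrite table dpartition partition(ct)
select id, name, city as anything from mytest_tmp2_p;

-- Columns swapped by mistake: name now supplies the partition value and
-- city lands in the name column. Both are strings, so Hive accepts it.
insert overwrite table dpartition partition(ct)
select id, city, name from mytest_tmp2_p;

-- Column-count mismatch: expected to fail at compile time, because the
-- target has three columns in total (id, name, ct).
-- insert overwrite table dpartition partition(ct)
-- select id, name, city, city from mytest_tmp2_p;
```

The second statement is the dangerous one, because it succeeds silently and creates one partition per distinct name.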

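2.2 Semi-automatic partitioning on a table with multiple partition columns

Note: when only some of the partition columns are static, the static columns must come before the dynamic ones.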

-- 1. Create a partitioned table with one regular column and two partition columns
create table ds_parttion(id int) partitioned by (state string, ct string);
-- 2. Insert data using semi-dynamic partitioning:
--    state is a static partition; ct is dynamic and takes its value from the selected city column
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
insert overwrite table ds_parttion
partition(state='china', ct)
select id, city from mytest_tmp2_p;

-- 3. Query results:
select *  from ds_parttion where state='china';
+-----------------+--------------------+-----------------+--+
| ds_parttion.id  | ds_parttion.state  | ds_parttion.ct  |
+-----------------+--------------------+-----------------+--+
| 3               | china              | china           |
| 5               | china              | riben           |
| 1               | china              | us              |
| 4               | china              | us              |
+-----------------+--------------------+-----------------+--+

select *  from ds_parttion where state='china' and ct='china';
+-----------------+--------------------+-----------------+--+
| ds_parttion.id  | ds_parttion.state  | ds_parttion.ct  |
+-----------------+--------------------+-----------------+--+
| 3               | china              | china           |
+-----------------+--------------------+-----------------+--+

select *  from ds_parttion where state='china' and ct='us';
+-----------------+--------------------+-----------------+--+
| ds_parttion.id  | ds_parttion.state  | ds_parttion.ct  |
+-----------------+--------------------+-----------------+--+
| 1               | china              | us              |
| 4               | china              | us              |
+-----------------+--------------------+-----------------+--+
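
The ordering rule from the note at the start of this section can be sketched as below. The second statement is left commented out because Hive is expected to reject it during semantic analysis. The exact error text varies by version and is not reproduced here.

```sql
-- Valid: the static column state comes first, the dynamic column ct follows.
-- Because one partition column is static, this form is also accepted when
-- hive.exec.dynamic.partition.mode is strict.
insert overwrite table ds_parttion partition(state='china', ct)
select id, city from mytest_tmp2_p;

-- Invalid: the dynamic column state precedes the static column ct.
-- insert overwrite table ds_parttion partition(state, ct='china')
-- select id, city from mytest_tmp2_p;
```

The restriction follows from the directory layout: each state value is a parent directory of the ct directories beneath it, so the parent level has to be known before the child level can be derived from the data.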

2.3 Fully automatic partitioning on a table with multiple partition columns

-- Sample data
1,小明1,12,man
2,小明2,13,boy
3,小明3,13,man
4,小明4,12,boy
5,小明5,13,man
6,小明6,13,boy
7,小明7,13,man
8,小明8,12,boy
9,小明9,12,man

-- Create the source table
create table psn(
    id int,
    name string,
    age int,
    gender string
)
row format delimited
fields terminated by ',';
-- Load the data
load data local inpath '/root/data/test1' into table psn;
-- Create the partitioned table
create table psn_1(id int,name string) partitioned by (age int,gender string) row format delimited fields terminated by ',';

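-- Insert the data (both partition columns are dynamic, so hive.exec.dynamic.partition.mode=nonstrict from 2.1 must be in effect)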
insert into psn_1 partition(age,gender) select id,name,age,gender from psn;

-- View the data
select * from psn_1;

+-----------+-------------+------------+---------------+--+
| psn_1.id  | psn_1.name  | psn_1.age  | psn_1.gender  |
+-----------+-------------+------------+---------------+--+
| 4         | 小明4         | 12         | boy           |
| 8         | 小明8         | 12         | boy           |
| 1         | 小明1         | 12         | man           |
| 9         | 小明9         | 12         | man           |
| 2         | 小明2         | 13         | boy           |
| 6         | 小明6         | 13         | boy           |
| 3         | 小明3         | 13         | man           |
| 5         | 小明5         | 13         | man           |
| 7         | 小明7         | 13         | man           |
+-----------+-------------+------------+---------------+--+
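
As a follow-up to the fully dynamic insert above, the statements below are one way to inspect the result. They are a sketch that assumes psn_1 was populated exactly as shown. With the sample data there are four distinct (age, gender) pairs, so four partitions are expected.

```sql
-- List the partitions created by the insert, one per distinct (age, gender) pair.
show partitions psn_1;

-- Filter on both partition columns so that only the matching partition is read.
-- With the sample data this should return ids 4 and 8.
select id, name from psn_1 where age = 12 and gender = 'boy';

-- Count the rows in each partition.
select age, gender, count(*) as cnt from psn_1 group by age, gender;
```

Filtering on partition columns is the main payoff of partitioning, because Hive can skip the directories of every partition that does not match.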