本文讨论 Hive explode 关键字使用, 并使用一个简单案例来验证结果.
Hive 支持 array 和 map 类型, 但是如何统计 array 或 map 里的值, 一直没有找到好的方法. Pig 有行转列关键字 flatten. 查阅了很多 Hive 资料, 找到了 explode 关键字. 谨以此例来验证 Hive explode 功能.
hive> create table if not exists explode_array
> (
> userId string,
> userName string,
> tags array<string>
> )
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY '\t'
> COLLECTION ITEMS TERMINATED BY ',';
OK
Time taken: 0.942 seconds
数据样例:
00001 zhzhenqin 80,90
00002 hello java, 女
00003 world java,python,90
查询:
hive> select * from explode_array limit 10;
OK
00001 zhzhenqin ["80","90"]
00002 hello ["java"," 女"]
00003 world ["java","python","90"]
Time taken: 0.041 seconds, Fetched: 3 row(s)
使用 explode 查询:
hive> select userId,userName,tagId from explode_array lateral view explode(tags) tags as tagId;
Total jobs = 1
... 省略部分日志
Total MapReduce CPU Time Spent: 0 msec
OK
00001 zhzhenqin 80
00001 zhzhenqin 90
00002 hello java
00002 hello 女
00003 world java
00003 world python
00003 world 90
Time taken: 20.104 seconds, Fetched: 7 row(s)
二次嵌套查询, 并统计:
hive> select user_tag.tagId, count(*) as count from (select userId,userName,tagId from explode_array lateral view explode(tags) tags as tagId) as user_tag group by user_tag.tagId order by count DESC;
Total jobs = 2
... 省略日志
Total MapReduce CPU Time Spent: 0 msec
OK
java 2
90 2
python 1
80 1
女 1
Time taken: 39.994 seconds, Fetched: 5 row(s)
该例子是使用 array 类型, 我们的用户和标签在 Hive 存储的是 map 类型, Map 的 key 为 tagid, value 为 weight. explode 也是支持 map 类型的.
explode 在转 array 时, 输出一列; 转 map 时, 是输出2列, key 和 value 当做 2列输出.
下面演示 Map 类型的 explode 用法:
建表语句, 以及导入的数据
create table if not exists explode_map
(
userId string,
userName string,
tags map<string, int>
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
COLLECTION ITEMS TERMINATED BY ','
MAP KEYS TERMINATED BY ':';
数据
00001 zhzhenqin 80:1,90:2
00002 hello java:10,女:2
00003 world java:1,python:3,90:1
查询验证:
hive> select * from explode_map;
OK
00001 zhzhenqin {"80":1,"90":2}
00002 hello {"java":10,"女":2}
00003 world {"java":1,"python":3,"90":1}
Time taken: 0.04 seconds, Fetched: 3 row(s)
使用 explode 关键字查询:
hive>select userId,userName,tagId,weight from explode_map lateral view explode(tags) tags as tagId, weight;
00001 zhzhenqin 80 1
00001 zhzhenqin 90 2
00002 hello java 10
00002 hello 女 2
00003 world java 1
00003 world python 3
00003 world 90 1
内嵌查询及统计:
hive> select user_tag.tagId, count(*) as count from (select userId,userName,tagId,weight from explode_map lateral view explode(tags) tags as tagId, weight) as user_tag group by user_tag.tagId order by count DESC;
java 2
90 2
女 1
python 1
80 1