Syntax
lateralView: LATERAL VIEW udtf(expression) tableAlias AS columnAlias (',' columnAlias)*
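Preparing the data
Suppose we have a table pageAds with two columns. The first, pageid, is a STRING; the second, adid_list, is an ARRAY<INT> of ad IDs, stored in the source file as a comma-separated list.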
mahao@ubuntu:~$ cat pageAds.txt
"front_page" 1,2,3
"contact_page" 3,4,5
hive> CREATE TABLE pageAds(pageid STRING,adid_list Array<INT>) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' COLLECTION ITEMS TERMINATED BY ',';
OK
Time taken: 3.458 seconds
hive> LOAD DATA LOCAL INPATH 'pageAds.txt' INTO TABLE pageAds;
Loading data to table default.pageads
OK
Time taken: 1.377 seconds
hive> SELECT * FROM pageAds;
OK
"front_page" [1,2,3]
"contact_page" [3,4,5]
Time taken: 2.127 seconds, Fetched: 2 row(s)
hive>
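Count how many times each ad ID appears.
First split the ad IDs apart: explode() names the array column to expand into one row per element, and the AS clause names the resulting column: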
hive> SELECT pageid,adid FROM pageAds LATERAL VIEW explode(adid_list) adTable AS adid;
OK
"front_page" 1
"front_page" 2
"front_page" 3
"contact_page" 3
"contact_page" 4
"contact_page" 5
Then group by adid and count:
hive> SELECT adid,count(1) FROM pageAds LATERAL VIEW explode(adid_list) adTable AS adid GROUP BY adid;
OK
1 1
2 1
3 2
4 1
5 1
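One caveat about the count above: explode() emits no rows for an empty or NULL array, so a page without any ads disappears from the result of a plain LATERAL VIEW. As a sketch, assuming a Hive version that supports the OUTER keyword (Hive 0.12.0 or later), the following keeps such pages and returns NULL for adid. The sample data above contains no empty lists, so for it the rows are the same as those of the plain query.

```sql
-- Keep pages whose adid_list is empty or NULL; adid is NULL for those rows.
SELECT pageid, adid
FROM pageAds
LATERAL VIEW OUTER explode(adid_list) adTable AS adid;
```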
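Multiple lateral views
A FROM clause can be followed by more than one LATERAL VIEW clause, and each later LATERAL VIEW can reference the tables and columns of all the ones before it. For example:
The table data: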
hive> select * from baseTable;
OK
[1,2] ["'a'","'b'","'c'"]
[3,4] ["'d'","'e'","'f'"]
Two lateral views:
hive> select mycol1,col2 from baseTable lateral view explode(col1) tb1 as mycol1
> lateral view explode(col2) tb2 as mycol2;
OK
1 ["'a'","'b'","'c'"]
1 ["'a'","'b'","'c'"]
1 ["'a'","'b'","'c'"]
2 ["'a'","'b'","'c'"]
2 ["'a'","'b'","'c'"]
2 ["'a'","'b'","'c'"]
3 ["'d'","'e'","'f'"]
3 ["'d'","'e'","'f'"]
3 ["'d'","'e'","'f'"]
4 ["'d'","'e'","'f'"]
4 ["'d'","'e'","'f'"]
4 ["'d'","'e'","'f'"]
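The query above selects the original array col2 rather than the exploded column mycol2, so the effect of the second lateral view shows up only as each row repeating three times. A variant on the same baseTable that selects both exploded columns makes the pairing visible: each element of col1 is combined with each element of col2 from the same source row, giving 2 x 3 = 6 rows per source row and 12 rows in total.

```sql
-- Select both exploded columns to see every (mycol1, mycol2) pair.
SELECT mycol1, mycol2
FROM baseTable
LATERAL VIEW explode(col1) tb1 AS mycol1
LATERAL VIEW explode(col2) tb2 AS mycol2;
```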
Note that the lateral views in a statement are applied in the order in which they appear.
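Because the order matters, a later lateral view can take a column produced by an earlier one as its input. A minimal sketch, assuming a hypothetical table nestedTable with a single column nested_arr of type ARRAY<ARRAY<INT>> (neither name comes from the examples above):

```sql
-- nestedTable and nested_arr are hypothetical names, for illustration only.
-- The first lateral view yields one inner array per output row;
-- the second explodes that inner array into its individual elements.
SELECT inner_arr, elem
FROM nestedTable
LATERAL VIEW explode(nested_arr) t1 AS inner_arr
LATERAL VIEW explode(inner_arr) t2 AS elem;
```

Swapping the two clauses would not work, because inner_arr is not defined until the first lateral view has been applied.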
Reposted from http://yugouai.iteye.com/blog/1849902