hive lateral view 与 explode详解

https://blog.csdn.net/bitcarmanlee/article/details/51926530

1.explode

hive wiki对于expolde的解释如下:

explode() takes in an array (or a map) as an input and outputs the elements of the array (map) as separate rows. UDTFs can be used in the SELECT expression list and as a part of LATERAL VIEW.

As an example of using explode() in the SELECT expression list, consider a table named myTable that has a single column (myCol) and two rows:

 

Then running the query:

SELECT explode(myCol) AS myNewCol FROM myTable;

  • 1

will produce: 

 

The usage with Maps is similar:

SELECT explode(myMap) AS (myMapKey, myMapValue) FROM myMapTable;

  • 1

总结起来一句话:explode就是将hive一行中复杂的array或者map结构拆分成多行。

使用实例: 

xxx表中有一个字段mvt为string类型,数据格式如下:

[{“eid”:”38”,”ex”:”affirm_time_Android”,”val”:”1”,”vid”:”31”,”vr”:”var1”},{“eid”:”42”,”ex”:”new_comment_Android”,”val”:”1”,”vid”:”34”,”vr”:”var1”},{“eid”:”40”,”ex”:”new_rpname_Android”,”val”:”1”,”vid”:”1”,”vr”:”var1”},{“eid”:”19”,”ex”:”hotellistlpage_Android”,”val”:”1”,”vid”:”1”,”vr”:”var01”},{“eid”:”29”,”ex”:”bookhotelpage_Android”,”val”:”0”,”vid”:”1”,”vr”:”var01”},{“eid”:”17”,”ex”:”trainMode_Android”,”val”:”1”,”vid”:”1”,”vr”:”mode_Android”},{“eid”:”44”,”ex”:”ihotelList_Android”,”val”:”1”,”vid”:”36”,”vr”:”var1”},{“eid”:”47”,”ex”:”ihotelDetail_Android”,”val”:”0”,”vid”:”38”,”vr”:”var1”}]

用explode小试牛刀一下:

select explode(split(regexp_replace(mvt,'\\[|\\]',''),'\\},\\{')) from ods_mvt_hourly where day=20160710 limit 10;

  • 1

最后出来的结果如下: 

{“eid”:”38”,”ex”:”affirm_time_Android”,”val”:”1”,”vid”:”31”,”vr”:”var1” 

“eid”:”42”,”ex”:”new_comment_Android”,”val”:”1”,”vid”:”34”,”vr”:”var1” 

“eid”:”40”,”ex”:”new_rpname_Android”,”val”:”1”,”vid”:”1”,”vr”:”var1” 

“eid”:”19”,”ex”:”hotellistlpage_Android”,”val”:”1”,”vid”:”1”,”vr”:”var01” 

“eid”:”29”,”ex”:”bookhotelpage_Android”,”val”:”0”,”vid”:”1”,”vr”:”var01” 

“eid”:”17”,”ex”:”trainMode_Android”,”val”:”1”,”vid”:”1”,”vr”:”mode_Android” 

“eid”:”44”,”ex”:”ihotelList_Android”,”val”:”1”,”vid”:”36”,”vr”:”var1” 

“eid”:”47”,”ex”:”ihotelDetail_Android”,”val”:”0”,”vid”:”38”,”vr”:”var1”} 

{“eid”:”38”,”ex”:”affirm_time_Android”,”val”:”1”,”vid”:”31”,”vr”:”var1” 

“eid”:”42”,”ex”:”new_comment_Android”,”val”:”1”,”vid”:”34”,”vr”:”var1”

2.lateral view

hive wiki 上的解释如下:

Lateral View Syntax

lateralView: LATERAL VIEW udtf(expression) tableAlias AS columnAlias (‘,’ columnAlias)* 

fromClause: FROM baseTable (lateralView)*

Description

Lateral view is used in conjunction with user-defined table generating functions such as explode(). As mentioned in Built-in Table-Generating Functions, a UDTF generates zero or more output rows for each input row. A lateral view first applies the UDTF to each row of base table and then joins resulting output rows to the input rows to form a virtual table having the supplied table alias.

Example

Consider the following base table named pageAds. It has two columns: pageid (name of the page) and adid_list (an array of ads appearing on the page) 

 

An example table with two rows: 

 

and the user would like to count the total number of times an ad appears across all pages. 

A lateral view with explode() can be used to convert adid_list into separate rows using the query:

SELECT pageid, adid

FROM pageAds LATERAL VIEW explode(adid_list) adTable AS adid;

  • 1
  • 2

The resulting output will be 

 

Then in order to count the number of times a particular ad appears, count/group by can be used:

SELECT adid, count(1)

FROM pageAds LATERAL VIEW explode(adid_list) adTable AS adid

GROUP BY adid;

  • 1
  • 2
  • 3

The resulting output will be 

 

由此可见,lateral view与explode等udtf就是天生好搭档,explode将复杂结构一行拆成多行,然后再用lateral view做各种聚合。

3.实例

还是第一部分的例子,上面我们explode出来以后的数据,不是标准的json格式,我们通过lateral view与explode组合解析出标准的json格式数据:

SELECT ecrd, CASE WHEN instr(mvtstr,'{')=0

    AND instr(mvtstr,'}')=0 THEN concat('{',mvtstr,'}') WHEN instr(mvtstr,'{')=0

    AND instr(mvtstr,'}')>0 THEN concat('{',mvtstr) WHEN instr(mvtstr,'}')=0

    AND instr(mvtstr,'{')>0 THEN concat(mvtstr,'}') ELSE mvtstr END AS mvt

      FROM ods.ods_mvt_hourly LATERAL VIEW explode(split(regexp_replace(mvt,'\\[|\\]',''),'\\},\\{')) addTable AS mvtstr

        WHERE DAY='20160710' and ecrd is not null limit 10

  • 1
  • 2
  • 3
  • 4
  • 5
  • 6

查询出来的结果: 

xxx 

{“eid”:”38”,”ex”:”affirm_time_Android”,”val”:”1”,”vid”:”31”,”vr”:”var1”} 

xxx 

{“eid”:”42”,”ex”:”new_comment_Android”,”val”:”1”,”vid”:”34”,”vr”:”var1”} 

xxx 

{“eid”:”40”,”ex”:”new_rpname_Android”,”val”:”1”,”vid”:”1”,”vr”:”var1”} 

xxx 

{“eid”:”19”,”ex”:”hotellistlpage_Android”,”val”:”1”,”vid”:”1”,”vr”:”var01”} 

xxx 

{“eid”:”29”,”ex”:”bookhotelpage_Android”,”val”:”0”,”vid”:”1”,”vr”:”var01” 

xxx 

{“eid”:”17”,”ex”:”trainMode_Android”,”val”:”1”,”vid”:”1”,”vr”:”mode_Android”} 

xxx 

{“eid”:”44”,”ex”:”ihotelList_Android”,”val”:”1”,”vid”:”36”,”vr”:”var1”} 

xxx 

{“eid”:”47”,”ex”:”ihotelDetail_Android”,”val”:”1”,”vid”:”38”,”vr”:”var1”} 

xxx 

{“eid”:”38”,”ex”:”affirm_time_Android”,”val”:”1”,”vid”:”31”,”vr”:”var1”} 

xxx 

{“eid”:”42”,”ex”:”new_comment_Android”,”val”:”1”,”vid”:”34”,”vr”:”var1”}

4.Ending

Lateral View通常和UDTF一起出现,为了解决UDTF不允许在select字段的问题。 

Multiple Lateral View可以实现类似笛卡尔乘积。 

Outer关键字可以把不输出的UDTF的空结果,输出成NULL,防止丢失数据。

参考内容: 

1.http://blog.csdn.net/oopsoom/article/details/26001307 lateral view的用法实例 

2.https://my.oschina.net/leejun2005/blog/120463 复合函数的用法,比较详细 

3.http://blog.csdn.net/zhaoli081223/article/details/46637517 udtf的介绍

 

hive的复合数据类型

Hive中的列支持使用三类复杂的集合数据类型,即:array,map及struct,这些类型的名称是保留字,具体用法可参见该篇博文,里面有关于三类基本集合数据类型的操作实例,注:map中可嵌套array类型。

例如,定义表:

  1. create table example ( 
  2.     device_id string, 
  3.     login_ip array, 
  4.     user_info map<string,array> 
  5.     address struct<street:string,city:string,state:string> 
  6. row format delimited 
  7. fields terminated by '\001' 
  8. collection items terminated by '\002' 
  9. map keys terminated by '\003' 
  10. lines terminated by '\n' 
  11. stored as RCFile; 

假设这样的数据类型以分区表存储,你要统计一段时间类no=1下的去重score,那么该怎么办了?这里可配合使用lateral view首先实现列转行的功能,如下所示:

select no,score from tablaa lateral view explode(score_set) xxx as score;

注:xxx代表虚表名称,不能缺少。

进一步深化上述代码解决统计一段时间的去重值,可写为:

select no,collect_set(score) from tablaa lateral view explode(score_set) xxx as score group by no;

这样,将两个函数结合实现了行转列或列转行的妙用。

  • 0
    点赞
  • 3
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值