一、时间区间拆分成单独行
1、天级别:根据指定日期(到天)和结束日期,列出这段时间内的所有明细时间
select tf.*,t.*, date_add(start_date,pos)
from (
select 'a' as a, '2018-11-01' as start_date, '2018-12-01' as end_date
) t
lateral view posexplode(split(space(datediff(end_date,start_date)),' ')) tf as pos,val limit 5000
2、秒级别:
如果是对精确到秒的时间段,展开这段时间内的所有的具体时间
select i, from_unixtime(unix_timestamp(time1)+ pe.i,'yyyyMMddHHmmss') as dynamic_date ,
'2020-11-01 13:01:10' as start_time, '2020-11-01 13:01:20' end_time
from ( select '2020-11-01 13:01:10' time1, '2020-11-01 13:01:20' time2 ) t
lateral view posexpoplode(split(space(cast(unix_timestamp(time2)-unix_timestamp(time1) as int)),' ')) pe as i, x
二、数组中的应用
A表(shows表示访问过的城市列表,是有序的)
B表(show表示访问过的城市)
假设现在需求是:判断B表中访问过的城市在A表shows中的位置。
1、由于A表的shows是数组,所以先用posexplode将A表按shows列展开
select * from A LATERAL VIEW posexplode(shows) t as pos, city
得到的结果是如下表,即可以看出按照shows这一列展开了,并且还给出了shows中元素的位置(第三列)
下面merge A表和B表,找到B表中访问过的城市在A表shows中的位置
select T2.city, T2,pos
(select * from B) T1
left outer join
(select * from A LATERAL VIEW posexplode(shows) t as pos , city_single) T2
on T1.id = T2.id and T1.city = T2.citys_single
merge后得到的结果是
Hive之 explode和 posexpolode更详细的教程