hiveSql 构建连续日期

hiveSql 构建连续日期

需求

有如下数据

iddateval
1012022-12-0110
1012022-12-0320
1012022-12-0640
1022022-12-0220
1022022-12-0430
1022022-12-0760

期望构建如下数据:

idfinal_datevalfinal_value
1012022-12-011010
1012022-12-02010
1012022-12-032030
1012022-12-04030
1012022-12-05030
1012022-12-064070
1012022-12-07070
1022022-12-0100
1022022-12-022020
1022022-12-03020
1022022-12-043050
1022022-12-05050
1022022-12-06050
1022022-12-0760110

期望数据是将已有数据补齐2022-12-01到2022-12-07号日期,没有的日期value为0,然后计算每组id内的累加value值。

分析

从期望数据角度看,咱们如果有每组id的2022-12-01到2022-12-07的数据,作为主表left join已有数据,再一个sum()over() 就很简单了,关键在于如何构造连续日期。
构造连续日期,且分成多行很容易想到利用repeat()函数,和posexplode()拆分重复的字符串,构建多行带索引的数据。再利用行索引与2022-12-01做日期计算,就能得到每组组内的01-07号多行数据

实现

1、先构建所有分组的2022-12-01、2022-12-07


with tmp as (
    select '101' as id, '2022-12-01' as idate, 10 as value
    union all 
    select '101' as id, '2022-12-03' as idate, 20 as value
    union all 
    select '101' as id, '2022-12-06' as idate, 40 as value
    union all  
    select '102' as id, '2022-12-02' as idate, 20 as value
    union all 
    select '102' as id, '2022-12-04' as idate, 30 as value
    union all 
    select '102' as id, '2022-12-07' as idate, 60 as value
)
select id,'2022-12-07' as end_time,'2022-12-01' as start_time from tmp group by id

在这里插入图片描述
2、再将步骤1中的2022-12-07与2022-12-01做差+1,作为repeat()函数的参数,重复多少次。得到重复字符串后,利用posexplode()列转行,将重复字符串拆分出有索引(ind)的多行,再将start_time与索引做日期计算,得到最终主表数据结果,见sql:


with tmp as (
    select '101' as id, '2022-12-01' as idate, 10 as value
    union all 
    select '101' as id, '2022-12-03' as idate, 20 as value
    union all 
    select '101' as id, '2022-12-06' as idate, 40 as value
    union all  
    select '102' as id, '2022-12-02' as idate, 20 as value
    union all 
    select '102' as id, '2022-12-04' as idate, 30 as value
    union all 
    select '102' as id, '2022-12-07' as idate, 60 as value
),
during_date as (
    select id,'2022-12-07' as end_time,'2022-12-01' as start_time from tmp group by id
)
select 
    a.id,a.start_time,b.ind
    ,substr(dateadd(to_date(a.start_time,'yyyy-mm-dd'),b.ind,'dd'),1,10) as final_date
from during_date a
lateral view posexplode( split(repeat('a,',datediff(end_time,start_time)+1),',') ) b as ind,val

结果如下:
在这里插入图片描述
3、步骤2中的数据,咱们只需要id和final_date列即可,作为主表数据 去left join基础数据,见sql:

with tmp as (
    select '101' as id, '2022-12-01' as idate, 10 as value
    union all 
    select '101' as id, '2022-12-03' as idate, 20 as value
    union all 
    select '101' as id, '2022-12-06' as idate, 40 as value
    union all  
    select '102' as id, '2022-12-02' as idate, 20 as value
    union all 
    select '102' as id, '2022-12-04' as idate, 30 as value
    union all 
    select '102' as id, '2022-12-07' as idate, 60 as value
),
during_date as (
    select id,'2022-12-07' as end_time,'2022-12-01' as start_time from tmp group by id
)
select 
    a.id,
    a.final_date,
    nvl(b.value,0) as value,
    sum(nvl(b.value,0)) over(partition by a.id order by a.final_date ) as final_value
from
    (select 
        a.id
        ,substr(dateadd(to_date(a.start_time,'yyyy-mm-dd'),b.ind,'dd'),1,10) as final_date
    from during_date a
    lateral view posexplode( split(repeat('a,',datediff(end_time,start_time)+1),',') ) b as ind,val
    ) a 
left join 
    tmp b 
on a.id = b.id and a.final_date = b.idate

结果如下:
在这里插入图片描述

最后

喜欢的点赞、关注、收藏吧~ 你的支持是最大的创作动力~~

评论 2
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包

打赏作者

@nanami

你的鼓励将是我创作的最大动力

¥1 ¥2 ¥4 ¥6 ¥10 ¥20
扫码支付:¥1
获取中
扫码支付

您的余额不足,请更换扫码支付或充值

打赏作者

实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值