hive sql练习3:连续日期问题-posexplod函数使用

文章提供了两种SQL查询方法来统计每个品牌的营销活动总天数,考虑了日期重叠的情况。在方法二中强调了不能仅通过日期相加减来计算,需处理日期间的重叠部分。
摘要由CSDN通过智能技术生成

营销活动表结构

             表名:marketing
brand 品牌                   string
startdate 营销活动开始日期     date
enddate 营销活动结束日期       date

营销活动表数据

+------------+----------------+--------------+--+
|    brand   |   startdate    |    enddate   |
+------------+----------------+--------------+--+
| 华为        | 2018-08-04     | 2018-08-05   |
| 华为        | 2018-08-04     | 2020-12-25   |
| 小米        | 2018-08-15     | 2018-08-20   |
| 小米        | 2020-01-01     | 2020-01-05   |
| 苹果        | 2018-09-01     | 2018-09-05   |
| 苹果        | 2018-09-03     | 2018-09-06   |
| 苹果        | 2018-09-09     | 2018-09-15   |
+------------+----------------+--------------+--+

题目描述

该表记录了每个品牌的营销活动开始日期以及结束日期,现需要统计出每个品牌的总营销天数。
注意

  1. 苹果第一行数据的营销结束日期比第二行数据的营销开始日期要晚,这部分有重叠的日期的要去重计算。
  2. 苹果第二行数据的营销结束日期和第三行的开始日期不连续,2019-09-07 以及 2019-09-08 不统计到营销天数中。

结果

+----------+-------------+--+
|   品牌   | 总营销天数  |
+----------+-------------+--+
| 华为     | 875         |
| 小米     | 11          |
| 苹果     | 13          |
+----------+-------------+--+

代码实现

方法一

with marketing as(
select '华为' as brand, '2018-08-04' as startdate, '2018-08-05' as enddate union all
select '华为' as brand, '2018-08-04' as startdate, '2020-12-25' as enddate union all
select '小米' as brand, '2018-08-15' as startdate, '2018-08-20' as enddate union all
select '小米' as brand, '2020-01-01' as startdate, '2020-01-05' as enddate union all
select '苹果' as brand, '2018-09-01' as startdate, '2018-09-05' as enddate union all
select '苹果' as brand, '2018-09-03' as startdate, '2018-09-06' as enddate union all
select '苹果' as brand, '2018-09-09' as startdate, '2018-09-15' as enddate 
)
select 
brand
,count(distinct real_date)
from (
select 
       brand
       ,startdate
       ,enddate
       ,pos
       ,date_add(startdate, pos) as real_date
from marketing
lateral view posexplode(split(space(datediff(enddate, startdate)), ' ')) t as pos, val
) t1
group by brand
;

方法二

with marketing as(
select '华为' as brand, '2018-08-04' as startdate, '2018-08-05' as enddate from dual union all
select '华为' as brand, '2018-08-04' as startdate, '2020-12-25' as enddate from dual union all
select '小米' as brand, '2018-08-15' as startdate, '2018-08-20' as enddate from dual union all
select '小米' as brand, '2020-01-01' as startdate, '2020-01-05' as enddate from dual union all
select '苹果' as brand, '2018-09-01' as startdate, '2018-09-05' as enddate from dual union all
select '苹果' as brand, '2018-09-03' as startdate, '2018-09-06' as enddate from dual union all
select '苹果' as brand, '2018-09-09' as startdate, '2018-09-15' as enddate from dual 
)
,tmp as(
select 
brand
,to_date(startdate, 'yyyy-mm-dd') as startdate
,to_date(enddate, 'yyyy-mm-dd') as enddate
--取上一个活动的结束日期
,lag(to_date(enddate, 'yyyy-mm-dd')) over(partition by brand order by startdate) as lag_end_date
from marketing
)
,tmp2 as(
select 
brand
,startdate
,enddate
,lag_end_date
--判断上个活动的结束日期与下个活动的开始日期差值,计算重叠天数
--大于0,则证明有重叠,取反,其余证明没有重叠,取0即可
,case when lag_end_date - startdate > 0 then -(lag_end_date - startdate) else 0 end as overlap_days
,enddate - startdate as days
from tmp
)
select 
brand
--单纯的日期相加减可不行,注意最后结果
--每次活动的天数+重叠天数+日期相减差的天数(正常两日期相减+1,才为活动天数)
,sum(days) + sum(overlap_days) + sum(case when overlap_days < 0 then 0 else 1 end) as total_days
from tmp2
group by brand
;

注意

方法二此处比较容易忽略的是不能只用单纯的日期相加减,从最后结果可以看出,比如 2019-09-07 以及 2019-09-08 举办了营销活动,实际是两天,而日期想加减只是一天。

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值