营销活动表结构
表名:marketing
brand 品牌 string
startdate 营销活动开始日期 date
enddate 营销活动结束日期 date
营销活动表数据
+------------+----------------+--------------+--+
| brand | startdate | enddate |
+------------+----------------+--------------+--+
| 华为 | 2018-08-04 | 2018-08-05 |
| 华为 | 2018-08-04 | 2020-12-25 |
| 小米 | 2018-08-15 | 2018-08-20 |
| 小米 | 2020-01-01 | 2020-01-05 |
| 苹果 | 2018-09-01 | 2018-09-05 |
| 苹果 | 2018-09-03 | 2018-09-06 |
| 苹果 | 2018-09-09 | 2018-09-15 |
+------------+----------------+--------------+--+
题目描述
该表记录了每个品牌的营销活动开始日期以及结束日期,现需要统计出每个品牌的总营销天数。
注意
- 苹果第一行数据的营销结束日期比第二行数据的营销开始日期要晚,这部分有
重叠
的日期的要去重计算。 - 苹果第二行数据的营销结束日期和第三行的开始日期不连续,2019-09-07 以及 2019-09-08 不统计到营销天数中。
结果
+----------+-------------+--+
| 品牌 | 总营销天数 |
+----------+-------------+--+
| 华为 | 875 |
| 小米 | 11 |
| 苹果 | 13 |
+----------+-------------+--+
代码实现
方法一
with marketing as(
select '华为' as brand, '2018-08-04' as startdate, '2018-08-05' as enddate union all
select '华为' as brand, '2018-08-04' as startdate, '2020-12-25' as enddate union all
select '小米' as brand, '2018-08-15' as startdate, '2018-08-20' as enddate union all
select '小米' as brand, '2020-01-01' as startdate, '2020-01-05' as enddate union all
select '苹果' as brand, '2018-09-01' as startdate, '2018-09-05' as enddate union all
select '苹果' as brand, '2018-09-03' as startdate, '2018-09-06' as enddate union all
select '苹果' as brand, '2018-09-09' as startdate, '2018-09-15' as enddate
)
select
brand
,count(distinct real_date)
from (
select
brand
,startdate
,enddate
,pos
,date_add(startdate, pos) as real_date
from marketing
lateral view posexplode(split(space(datediff(enddate, startdate)), ' ')) t as pos, val
) t1
group by brand
;
方法二
with marketing as(
select '华为' as brand, '2018-08-04' as startdate, '2018-08-05' as enddate from dual union all
select '华为' as brand, '2018-08-04' as startdate, '2020-12-25' as enddate from dual union all
select '小米' as brand, '2018-08-15' as startdate, '2018-08-20' as enddate from dual union all
select '小米' as brand, '2020-01-01' as startdate, '2020-01-05' as enddate from dual union all
select '苹果' as brand, '2018-09-01' as startdate, '2018-09-05' as enddate from dual union all
select '苹果' as brand, '2018-09-03' as startdate, '2018-09-06' as enddate from dual union all
select '苹果' as brand, '2018-09-09' as startdate, '2018-09-15' as enddate from dual
)
,tmp as(
select
brand
,to_date(startdate, 'yyyy-mm-dd') as startdate
,to_date(enddate, 'yyyy-mm-dd') as enddate
--取上一个活动的结束日期
,lag(to_date(enddate, 'yyyy-mm-dd')) over(partition by brand order by startdate) as lag_end_date
from marketing
)
,tmp2 as(
select
brand
,startdate
,enddate
,lag_end_date
--判断上个活动的结束日期与下个活动的开始日期差值,计算重叠天数
--大于0,则证明有重叠,取反,其余证明没有重叠,取0即可
,case when lag_end_date - startdate > 0 then -(lag_end_date - startdate) else 0 end as overlap_days
,enddate - startdate as days
from tmp
)
select
brand
--单纯的日期相加减可不行,注意最后结果
--每次活动的天数+重叠天数+日期相减差的天数(正常两日期相减+1,才为活动天数)
,sum(days) + sum(overlap_days) + sum(case when overlap_days < 0 then 0 else 1 end) as total_days
from tmp2
group by brand
;
注意
方法二此处比较容易忽略的是不能只用单纯的日期相加减,从最后结果可以看出,比如 2019-09-07 以及 2019-09-08 举办了营销活动,实际是两天,而日期想加减只是一天。