HQL:求出每个商品的打折天数

HQL:求出每个商品的打折天数

问题描述:

每个商品可能有多个打折活动,而打折活动的时间段可能会有重叠,求出商品实际参与打折的天数。

商品打折数据:

id	stt			edt
1	2021-06-01	2021-06-05
1	2021-06-07	2021-06-09
2	2021-06-01	2021-06-10
2	2021-06-05	2021-06-15
3	2021-06-01	2021-06-21
3	2021-06-05	2021-06-17
3	2021-06-15	2021-06-27
4	2021-06-01	2021-06-21
4	2021-06-01	2021-06-05
4	2021-06-07	2021-06-15
5	2021-06-01	2021-06-21
5	2021-06-05	2021-06-10
5	2021-06-08	2021-06-15

建表语句:

create table discount
(
    id  string,
    stt string,
    edt string
)
    row format delimited fields terminated by '\t';

分析:

1)这里因为有时间的存在,所以我将商品的打折开始时间和结束时间合并到一列中当作流式数据处理,并且给开始时间打上开始标记,给结束时间打上结束标记,最终按照id和开始时间(stt)排序。

查询结果:

1,2021-06-01,start
1,2021-06-05,stop
1,2021-06-07,start
1,2021-06-09,stop
2,2021-06-01,start
2,2021-06-05,start
2,2021-06-10,stop
2,2021-06-15,stop
3,2021-06-01,start
3,2021-06-05,start
3,2021-06-15,start
3,2021-06-17,stop
3,2021-06-21,stop
3,2021-06-27,stop
4,2021-06-01,start
4,2021-06-01,start
4,2021-06-05,stop
4,2021-06-07,start
4,2021-06-15,stop
4,2021-06-21,stop
5,2021-06-01,start
5,2021-06-05,start
5,2021-06-08,start
5,2021-06-10,stop
5,2021-06-15,stop
5,2021-06-21,stop

子查询:

select id,
       stt     event_time,
       'start' event_type
from discount
union all
select id,
       edt    event_time,
       'stop' event_type
from discount
order by id, event_time;

2)根据标记进行对某个时间段内的活动个数进行统计

查询结果:

1,2021-06-01,1
1,2021-06-05,0
1,2021-06-07,1
1,2021-06-09,0
3,2021-06-01,1
3,2021-06-05,2
3,2021-06-15,3
3,2021-06-17,2
3,2021-06-21,1
3,2021-06-27,0
2,2021-06-01,1
2,2021-06-05,2
2,2021-06-10,1
2,2021-06-15,0
4,2021-06-01,2
4,2021-06-01,2
4,2021-06-05,1
4,2021-06-07,2
4,2021-06-15,1
4,2021-06-21,0
5,2021-06-01,1
5,2021-06-05,2
5,2021-06-08,3
5,2021-06-10,2
5,2021-06-15,1
5,2021-06-21,0

子查询:

select id,
       event_time,
       sum(`if`(event_type = 'start', 1, -1)) over (partition by id order by event_time) event_ct
from (
         select id,
                stt     event_time,
                'start' event_type
         from discount
         union all
         select id,
                edt    event_time,
                'stop' event_type
         from discount
         order by id, event_time
     ) t1;

3)如果event_ct为0,则那条数据对应的event_time就是活动结束时间,统计出结束时间

查询结果:

1,2021-06-01,1,
1,2021-06-05,0,2021-06-05
1,2021-06-07,1,
1,2021-06-09,0,2021-06-09
3,2021-06-01,1,
3,2021-06-05,2,
3,2021-06-15,3,
3,2021-06-17,2,
3,2021-06-21,1,
3,2021-06-27,0,2021-06-27
2,2021-06-01,1,
2,2021-06-05,2,
2,2021-06-10,1,
2,2021-06-15,0,2021-06-15
4,2021-06-01,2,
4,2021-06-01,2,
4,2021-06-05,1,
4,2021-06-07,2,
4,2021-06-15,1,
4,2021-06-21,0,2021-06-21
5,2021-06-01,1,
5,2021-06-05,2,
5,2021-06-08,3,
5,2021-06-10,2,
5,2021-06-15,1,
5,2021-06-21,0,2021-06-21

子查询:

select id,
       event_time,
       event_ct,
       `if`(event_ct = 0, event_time, null) endtime
from (
         select id,
                event_time,
                sum(`if`(event_type = 'start', 1, -1)) over (partition by id order by event_time) event_ct
         from (
                  select id,
                         stt     event_time,
                         'start' event_type
                  from discount
                  union all
                  select id,
                         edt    event_time,
                         'stop' event_type
                  from discount
                  order by id, event_time
              ) t1
     ) t2;

4)给每一条数据打上一个标记,用来表示该条数据的上一个时间段有几条活动存在

查询结果:

1,2021-06-01,1,0,
1,2021-06-05,0,1,2021-06-05
1,2021-06-07,1,0,
1,2021-06-09,0,1,2021-06-09
3,2021-06-01,1,0,
3,2021-06-05,2,1,
3,2021-06-15,3,2,
3,2021-06-17,2,3,
3,2021-06-21,1,2,
3,2021-06-27,0,1,2021-06-27
2,2021-06-01,1,0,
2,2021-06-05,2,1,
2,2021-06-10,1,2,
2,2021-06-15,0,1,2021-06-15
4,2021-06-01,2,0,
4,2021-06-01,2,2,
4,2021-06-05,1,2,
4,2021-06-07,2,1,
4,2021-06-15,1,2,
4,2021-06-21,0,1,2021-06-21
5,2021-06-01,1,0,
5,2021-06-05,2,1,
5,2021-06-08,3,2,
5,2021-06-10,2,3,
5,2021-06-15,1,2,
5,2021-06-21,0,1,2021-06-21

子查询:

select id,
       event_time,
       event_ct,
       lag(event_ct,1,0) over (partition by id order by event_time) lag_event_ct,
       endtime
from (
         select id,
                event_time,
                event_ct,
                `if`(event_ct = 0, event_time, null) endtime
         from (
                  select id,
                         event_time,
                         sum(`if`(event_type = 'start', 1, -1)) over (partition by id order by event_time) event_ct
                  from (
                           select id,
                                  stt     event_time,
                                  'start' event_type
                           from discount
                           union all
                           select id,
                                  edt    event_time,
                                  'stop' event_type
                           from discount
                           order by id, event_time
                       ) t1
              ) t2
     ) t3;

5)如果字段lag_event_ct为0,证明该条数据之前所有的打折活动未开始或者已结束,即该条数据对应的event_time是打折活动开始时间,统计出每个商品打折活动开始时间和结束时间

查询结果:

(空白部分为null)

1,2021-06-01,
1,,2021-06-05
1,2021-06-07,
1,,2021-06-09
3,2021-06-01,
3,,
3,,
3,,
3,,
3,,2021-06-27
2,2021-06-01,
2,,
2,,
2,,2021-06-15
4,2021-06-01,
4,,
4,,
4,,
4,,
4,,2021-06-21
5,2021-06-01,
5,,
5,,
5,,
5,,
5,,2021-06-21

子查询:

select id,
       if(lag_event_ct = 0, event_time, null) start_time,
       endtime
from (
         select id,
                event_time,
                event_ct,
                lag(event_ct, 1, 0) over (partition by id order by event_time) lag_event_ct,
                endtime
         from (
                  select id,
                         event_time,
                         event_ct,
                         `if`(event_ct = 0, event_time, null) endtime
                  from (
                           select id,
                                  event_time,
                                  sum(`if`(event_type = 'start', 1, -1))
                                      over (partition by id order by event_time) event_ct
                           from (
                                    select id,
                                           stt     event_time,
                                           'start' event_type
                                    from discount
                                    union all
                                    select id,
                                           edt    event_time,
                                           'stop' event_type
                                    from discount
                                    order by id, event_time
                                ) t1
                       ) t2
              ) t3
     ) t4; 

6)将开始时间和结束时间合并到一行

查询结果:

3,2021-06-01,2021-06-27
3,,
5,2021-06-01,2021-06-21
5,,
1,2021-06-01,2021-06-05
1,,
1,2021-06-07,2021-06-09
1,,
2,2021-06-01,2021-06-15
2,,
4,2021-06-01,2021-06-21
4,,

子查询:

select id,
       start_time,
       lead(endtime) over (partition by id) end_time
from (
         select id,
                if(lag_event_ct = 0, event_time, null) start_time,
                endtime
         from (
                  select id,
                         event_time,
                         event_ct,
                         lag(event_ct, 1, 0) over (partition by id order by event_time) lag_event_ct,
                         endtime
                  from (
                           select id,
                                  event_time,
                                  event_ct,
                                  `if`(event_ct = 0, event_time, null) endtime
                           from (
                                    select id,
                                           event_time,
                                           sum(`if`(event_type = 'start', 1, -1))
                                               over (partition by id order by event_time) event_ct
                                    from (
                                             select id,
                                                    stt     event_time,
                                                    'start' event_type
                                             from discount
                                             union all
                                             select id,
                                                    edt    event_time,
                                                    'stop' event_type
                                             from discount
                                             order by id, event_time
                                         ) t1
                                ) t2
                       ) t3
              ) t4
     ) t5
where start_time is not null or endtime is not null;

7)过滤掉空数据

查询结果:

3,2021-06-01,2021-06-27
5,2021-06-01,2021-06-21
1,2021-06-01,2021-06-05
1,2021-06-07,2021-06-09
2,2021-06-01,2021-06-15
4,2021-06-01,2021-06-21

子查询:

select id,
       start_time,
       end_time
from (
         select id,
                start_time,
                lead(endtime) over (partition by id) end_time
         from (
                  select id,
                         if(lag_event_ct = 0, event_time, null) start_time,
                         endtime
                  from (
                           select id,
                                  event_time,
                                  event_ct,
                                  lag(event_ct, 1, 0) over (partition by id order by event_time) lag_event_ct,
                                  endtime
                           from (
                                    select id,
                                           event_time,
                                           event_ct,
                                           `if`(event_ct = 0, event_time, null) endtime
                                    from (
                                             select id,
                                                    event_time,
                                                    sum(`if`(event_type = 'start', 1, -1))
                                                        over (partition by id order by event_time) event_ct
                                             from (
                                                      select id,
                                                             stt     event_time,
                                                             'start' event_type
                                                      from discount
                                                      union all
                                                      select id,
                                                             edt    event_time,
                                                             'stop' event_type
                                                      from discount
                                                      order by id, event_time
                                                  ) t1
                                         ) t2
                                ) t3
                       ) t4
              ) t5
         where start_time is not null
            or endtime is not null
     ) t6
where start_time is not null;

8)按照id,开始时间,结束时间进行分组,求得每个id,每个打折活动持续时间

查询结果:

1,5
5,21
2,15
4,21
3,27
1,3

子查询:

select id,
       (datediff(end_time, start_time) + 1) days
from (
         select id,
                start_time,
                end_time
         from (
                  select id,
                         start_time,
                         lead(endtime) over (partition by id) end_time
                  from (
                           select id,
                                  if(lag_event_ct = 0, event_time, null) start_time,
                                  endtime
                           from (
                                    select id,
                                           event_time,
                                           event_ct,
                                           lag(event_ct, 1, 0) over (partition by id order by event_time) lag_event_ct,
                                           endtime
                                    from (
                                             select id,
                                                    event_time,
                                                    event_ct,
                                                    `if`(event_ct = 0, event_time, null) endtime
                                             from (
                                                      select id,
                                                             event_time,
                                                             sum(`if`(event_type = 'start', 1, -1))
                                                                 over (partition by id order by event_time) event_ct
                                                      from (
                                                               select id,
                                                                      stt     event_time,
                                                                      'start' event_type
                                                               from discount
                                                               union all
                                                               select id,
                                                                      edt    event_time,
                                                                      'stop' event_type
                                                               from discount
                                                               order by id, event_time
                                                           ) t1
                                                  ) t2
                                         ) t3
                                ) t4
                       ) t5
                  where start_time is not null
                     or endtime is not null
              ) t6
         where start_time is not null
     ) t7
group by id, start_time, end_time;

9)将每个id的每个打折活动的持续时间相加(到这里已经没有重叠的天数了)

查询结果:

1,8
2,15
3,27
4,21
5,21

最终SQL:

select id,
       sum(days) days
from (
         select id,
                (datediff(end_time, start_time) + 1) days
         from (
                  select id,
                         start_time,
                         end_time
                  from (
                           select id,
                                  start_time,
                                  lead(endtime) over (partition by id) end_time
                           from (
                                    select id,
                                           if(lag_event_ct = 0, event_time, null) start_time,
                                           endtime
                                    from (
                                             select id,
                                                    event_time,
                                                    event_ct,
                                                    lag(event_ct, 1, 0) over (partition by id order by event_time) lag_event_ct,
                                                    endtime
                                             from (
                                                      select id,
                                                             event_time,
                                                             event_ct,
                                                             `if`(event_ct = 0, event_time, null) endtime
                                                      from (
                                                               select id,
                                                                      event_time,
                                                                      sum(`if`(event_type = 'start', 1, -1))
                                                                          over (partition by id order by event_time) event_ct
                                                               from (
                                                                        select id,
                                                                               stt     event_time,
                                                                               'start' event_type
                                                                        from discount
                                                                        union all
                                                                        select id,
                                                                               edt    event_time,
                                                                               'stop' event_type
                                                                        from discount
                                                                        order by id, event_time
                                                                    ) t1
                                                           ) t2
                                                  ) t3
                                         ) t4
                                ) t5
                           where start_time is not null
                              or endtime is not null
                       ) t6
                  where start_time is not null
              ) t7
         group by id, start_time, end_time
     ) t8
group by id
order by id;
  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值