HiveSql面试题:连续签到领金币问题【百度-困难题-通用解法】

49 篇文章 222 订阅
48 篇文章 119 订阅

目录

0 问题描述

1 数据准备

2 问题分析

3 小结


0 问题描述

用户行为日志表tb_user_log

iduidartical_idin_timeout_timesign_in
110102021-07-07 10:00:002021-07-07 10:00:091
210102021-07-08 10:00:002021-07-08 10:00:091
310102021-07-09 10:00:002021-07-09 10:00:421
41010

2021-07-10 10:00:00

2021-07-10 10:00:091
510102021-07-11 23:59:552021-07-11 23:59:591
610102021-07-12 10:00:282021-07-12 10:00:501
710102021-07-13 10:00:28

2021-07-13 10:00:50

1
810202021-10-01 10:00:282021-10-01 10:00:501
910202021-10-02 10:00:012021-10-02 10:01:501
1010202021-10-03 10:00:552021-10-03 11:00:591
1110202021-10-04 10:00:452021-10-04 11:00:550
1210202021-10-05 10:00:532021-10-05 11:00:591
1310202021-10-06 10:00:452021-10-06 11:00:551

(uid-用户ID, artical_id-文章ID, in_time-进入时间, out_time-离开时间, sign_in-是否签到)

场景逻辑说明

  • artical_id-文章ID代表用户浏览的文章的ID,特殊情况artical_id-文章ID0表示用户在非文章内容页(比如App内的列表页、活动页等)。注意:只有artical_id为0时sign_in值才有效。
  • 从2021年7月7日0点开始,用户每天签到可以领1金币,并可以开始累积签到天数,连续签到的第3、7天分别可额外领2、6金币。
  • 每连续签到7天后重新累积签到天数(即重置签到天数:连续第8天签到时记为新的一轮签到的第一天,领1金币)

问题:计算每个用户2021年7月以来每月获得的金币数(该活动到10月底结束,11月1日开始的签到不再获得金币)。结果按月份、ID升序排序。

:如果签到记录的in_time-进入时间和out_time-离开时间跨天了,也只记作in_time对应的日期签到了。

输出示例:

示例数据的输出结果如下:

uidmonthcoin
10120210715
1022021107

解释:

101在活动期内连续签到了7天,因此获得1*7+2+6=15金币;

102在10.01~10.03连续签到3天获得5金币

10.04断签了,10.05~10.06连续签到2天获得2金币,共得到7金币。

1 数据准备

DROP TABLE IF EXISTS tb_user_log;
CREATE TABLE tb_user_log (
    uid INT COMMENT '用户ID',
    artical_id INT COMMENT '视频ID',
    in_time string COMMENT '进入时间',
    out_time string COMMENT '离开时间',
    sign_in int COMMENT '是否签到'
) ;

INSERT INTO tb_user_log(uid, artical_id, in_time, out_time, sign_in) VALUES
  (101, 0, '2021-07-07 10:00:00', '2021-07-07 10:00:09', 1),
  (101, 0, '2021-07-08 10:00:00', '2021-07-08 10:00:09', 1),
  (101, 0, '2021-07-09 10:00:00', '2021-07-09 10:00:42', 1),
  (101, 0, '2021-07-10 10:00:00', '2021-07-10 10:00:09', 1),
  (101, 0, '2021-07-11 23:59:55', '2021-07-11 23:59:59', 1),
  (101, 0, '2021-07-12 10:00:28', '2021-07-12 10:00:50', 1),
  (101, 0, '2021-07-13 10:00:28', '2021-07-13 10:00:50', 1),
  (101, 0, '2021-07-14 11:00:28', '2021-07-14 11:00:50', 1),
  (101, 0, '2021-07-15 11:59:28', '2021-07-16 00:01:20', 1),
  (102, 0, '2021-10-01 10:00:28', '2021-10-01 10:00:50', 1),
  (102, 0, '2021-10-02 10:00:01', '2021-10-02 10:01:50', 1),
  (102, 0, '2021-10-03 11:00:55', '2021-10-03 11:00:59', 1),
  (102, 0, '2021-10-04 11:00:45', '2021-10-04 11:00:55', 0),
  (102, 0, '2021-10-05 11:00:53', '2021-10-05 11:00:59', 1),
  (102, 0, '2021-10-06 11:00:45', '2021-10-06 11:00:55', 1);

2 问题分析

本题在牛客网上算比较难的一道题目,本地所要求的是每个用户每月获得的金币数:

领取金币规则:

  • (1)只要用户签到就能获取一枚金币
  • (2)连续签到第三天和第七天分别额外获取2枚和6金币
  • (3)连续签到7天后进行重置,按照规则(1)和(2)进行新一轮领币

 分析:本题的维度是用户和月份,也就是最终结果是按照这两个来分组

第一步:根据领金币的规则,我们容易想到按照签到的连续性先进行分组,将连续的进行标记分成一组

第二步:在每一个连续的组里进行row_number()标记(rn),目的是获取连续的第三天和第七个天特殊位置。

第三步:根据第二步连续的天数中是否有超过7天的,然后以此进行分割,再进行分组

整体计算规则如下草图:

图1
图1

步骤三种,首先需要对为7天的地方进行标记(打标记的目的就是为了辅助计算)

标记的方法:rn % 7==0.如果为0则记为1,不为0则记为0

分组方法:按照滑动窗口求和的方法。此类分俩种一种是窗口变得越来越大的方法,另一种是窗口越来越小的方法。具体如下图所示

 此两种算法对应的目标不一样:第一种窗口越来越大的方法是希望1标记位分组的时候能分到下面组中(标记位向下分组),第二种窗口越来越小的形式是希望标记为1能分到上一组数据中(或理解为进入到上一个数据桶中)(标记位向上)。具体整体分组如下图

图2
图1

步骤一:先判断连续性,做第一个分组标签

注意此处不要将是否签到值sign_in在where条件中过滤,此题连续真正的含义是:时间连续+签到的连续,如果sign_in在where条件中过滤则不能构成分组条件

select  uid
      ,in_time
      ,sign_in
      ,month
      -- 时间和签到同时的连续性
     , sum(if(diff < 1,1,0)) over(partition by uid,month,series_sign_in_flg order by in_time) as    series_flg

from(
select uid
      ,in_time
      ,sign_in
      ,month
    --求出当前值与上一个值的差值,如果为1具备连续的条件(时间的连续性)
      ,datediff(to_date(in_time),to_date(lag_in_time))  as diff
      --构造签到的连续性条件
      ,sum(if(sign_in!=lag_sign_in,1,0)) over(partition by uid,month order by in_time) as series_sign_in_flg
from
--求出上一行值
  (select uid
        ,in_time
        ,sign_in
        ,substr(in_time,1,7) as month
        ,lag(in_time,1,in_time) over(partition by uid,substr(in_time,1,7) order by in_time) as lag_in_time
        ,lag(sign_in,1,sign_in) over(partition by uid,substr(in_time,1,7) order by in_time) as lag_sign_in
  from tb_user_log
  where artical_id=0 and substr(in_time,1,7)>='2021-07' 
        and substr(in_time,1,7) <='2021-10'
  ) t
  ) t
where sign_in=1
+------+----------------------+----------+----------+-------------+
| uid  |       in_time        | sign_in  |  month   | series_flg  |
+------+----------------------+----------+----------+-------------+
| 101  | 2021-07-07 10:00:00  | 1        | 2021-07  | 1           |
| 101  | 2021-07-08 10:00:00  | 1        | 2021-07  | 1           |
| 101  | 2021-07-09 10:00:00  | 1        | 2021-07  | 1           |
| 101  | 2021-07-10 10:00:00  | 1        | 2021-07  | 1           |
| 101  | 2021-07-11 23:59:55  | 1        | 2021-07  | 1           |
| 101  | 2021-07-12 10:00:28  | 1        | 2021-07  | 1           |
| 101  | 2021-07-13 10:00:28  | 1        | 2021-07  | 1           |
| 101  | 2021-07-14 11:00:28  | 1        | 2021-07  | 1           |
| 101  | 2021-07-15 11:59:28  | 1        | 2021-07  | 1           |
| 102  | 2021-10-01 10:00:28  | 1        | 2021-10  | 1           |
| 102  | 2021-10-02 10:00:01  | 1        | 2021-10  | 1           |
| 102  | 2021-10-03 11:00:55  | 1        | 2021-10  | 1           |
| 102  | 2021-10-05 11:00:53  | 1        | 2021-10  | 0           |
| 102  | 2021-10-06 11:00:45  | 1        | 2021-10  | 0           |
+------+----------------------+----------+----------+-------------+

错误的分组标签生成:sign_in在where条件中过滤

select *
      -- 签到连续性标签
     , sum(if(datediff(to_date(in_time),to_date(lag_in_time)) < 1,1,0)) over(partition by uid,month order by in_time) as    series_flg
from
  (select uid
        ,in_time
        ,substr(in_time,1,7) as month
        ,lag(in_time,1,in_time) over(partition by uid,substr(in_time,1,7) order by in_time) as lag_in_time
  from tb_user_log
  where artical_id=0 and substr(in_time,1,7)>='2021-07' 
        and substr(in_time,1,7) <='2021-10' and sign_in=1
  ) t

 生成的结果如下:

步骤二:生成第二个标签,连续7天的分组标记

select *
      --生成连续7天的分组标记
      ,sum(case when rn % 7=0 then 1 else 0 end) over(partition by uid,month,series_flg order by in_time rows between current row and unbounded following) as series_7_flg
from 
(select *
--生成每组中的顺序标记
     , row_number() over(partition by uid,month,series_flg) as rn
from  
(select  uid
      ,in_time
      ,sign_in
      ,month
      -- 时间和签到同时的连续性
     , sum(if(diff < 1,1,0)) over(partition by uid,month,series_sign_in_flg order by in_time) as    series_flg

from(
select uid
      ,in_time
      ,sign_in
      ,month
    --求出当前值与上一个值的差值,如果为1具备连续的条件(时间的连续性)
      ,datediff(to_date(in_time),to_date(lag_in_time))  as diff
      --构造签到的连续性条件
      ,sum(if(sign_in!=lag_sign_in,1,0)) over(partition by uid,month order by in_time) as series_sign_in_flg
from
--求出上一行值
  (select uid
        ,in_time
        ,sign_in
        ,substr(in_time,1,7) as month
        ,lag(in_time,1,in_time) over(partition by uid,substr(in_time,1,7) order by in_time) as lag_in_time
        ,lag(sign_in,1,sign_in) over(partition by uid,substr(in_time,1,7) order by in_time) as lag_sign_in
  from tb_user_log
  where artical_id=0 and substr(in_time,1,7)>='2021-07' 
        and substr(in_time,1,7) <='2021-10'
  ) t
  ) t
where sign_in=1
) t
) t

对应结果如下:

 步骤三:根据领取金币规则,在每个条件分组中求金币数

select uid
      ,month
      ,series_flg
      ,series_7_flg
      --按照领取金币规则求金币数
      ,case when cnt>=1 and cnt<3 then cnt
            when cnt>=3 and cnt<7 then cnt + 2
       else cnt + 2 + 6 end coin        
from
(select uid
      ,month
      ,series_flg
      ,series_7_flg
      ,count(1) as cnt
from
(select *
      --生成连续7天的分组标记
      ,sum(case when rn % 7=0 then 1 else 0 end) over(partition by uid,month,series_flg order by in_time rows between current row and unbounded following) as series_7_flg
from 
(select *
--生成每组中的顺序标记
     , row_number() over(partition by uid,month,series_flg) as rn
from  
(select  uid
      ,in_time
      ,sign_in
      ,month
      -- 时间和签到同时的连续性
     , sum(if(diff < 1,1,0)) over(partition by uid,month,series_sign_in_flg order by in_time) as    series_flg

from(
select uid
      ,in_time
      ,sign_in
      ,month
    --求出当前值与上一个值的差值,如果为1具备连续的条件(时间的连续性)
      ,datediff(to_date(in_time),to_date(lag_in_time))  as diff
      --构造签到的连续性条件
      ,sum(if(sign_in!=lag_sign_in,1,0)) over(partition by uid,month order by in_time) as series_sign_in_flg
from
--求出上一行值
  (select uid
        ,in_time
        ,sign_in
        ,substr(in_time,1,7) as month
        ,lag(in_time,1,in_time) over(partition by uid,substr(in_time,1,7) order by in_time) as lag_in_time
        ,lag(sign_in,1,sign_in) over(partition by uid,substr(in_time,1,7) order by in_time) as lag_sign_in
  from tb_user_log
  where artical_id=0 and substr(in_time,1,7)>='2021-07' 
        and substr(in_time,1,7) <='2021-10'
  ) t
  ) t
where sign_in=1
) t
) t
) t
group by uid,month,series_flg,series_7_flg
) t
+------+----------+-------------+---------------+-------+
| uid  |  month   | series_flg  | series_7_flg  | coin  |
+------+----------+-------------+---------------+-------+
| 102  | 2021-10  | 1           | 0             | 5     |
| 102  | 2021-10  | 0           | 0             | 2     |
| 101  | 2021-07  | 1           | 1             | 15    |
| 101  | 2021-07  | 1           | 0             | 2     |
+------+----------+-------------+---------------+-------+

分组1:series_flg表示连续签到分组。分组2:series_7_flg表示连续领取金币天数中有超过7天的,并以7天为分割点重新分组。

此处领取金币的算法为:1-2天为等于连续的天数,3-6天为连续的天数加2,当为7天时为连续天数加8

步骤四:获取用户每月的金币数,并按用户及月份升序排序

 最终SQL如下

select uid,month,sum(coin) as coin
from(
select uid
      ,month
      ,series_flg
      ,series_7_flg
      --按照领取金币规则求金币数
      ,case when cnt>=1 and cnt<3 then cnt
            when cnt>=3 and cnt<7 then cnt + 2
       else cnt + 2 + 6 end coin        
from
(select uid
      ,month
      ,series_flg
      ,series_7_flg
      ,count(1) as cnt
from
(select *
      --生成连续7天的分组标记
      ,sum(case when rn % 7=0 then 1 else 0 end) over(partition by uid,month,series_flg order by in_time rows between current row and unbounded following) as series_7_flg
from 
(select *
--生成每组中的顺序标记
     , row_number() over(partition by uid,month,series_flg) as rn
from  
(select  uid
      ,in_time
      ,sign_in
      ,month
      -- 时间和签到同时的连续性
     , sum(if(diff < 1,1,0)) over(partition by uid,month,series_sign_in_flg order by in_time) as    series_flg

from(
select uid
      ,in_time
      ,sign_in
      ,month
    --求出当前值与上一个值的差值,如果为1具备连续的条件(时间的连续性)
      ,datediff(to_date(in_time),to_date(lag_in_time))  as diff
      --构造签到的连续性条件
      ,sum(if(sign_in!=lag_sign_in,1,0)) over(partition by uid,month order by in_time) as series_sign_in_flg
from
--求出上一行值
  (select uid
        ,in_time
        ,sign_in
        ,substr(in_time,1,7) as month
        ,lag(in_time,1,in_time) over(partition by uid,substr(in_time,1,7) order by in_time) as lag_in_time
        ,lag(sign_in,1,sign_in) over(partition by uid,substr(in_time,1,7) order by in_time) as lag_sign_in
  from tb_user_log
  where artical_id=0 and substr(in_time,1,7)>='2021-07' 
        and substr(in_time,1,7) <='2021-10'
  ) t
  ) t
where sign_in=1
) t
) t
) t
group by uid,month,series_flg,series_7_flg
) t
)t
group by uid,month 
order by uid,month

 最终结果如下:

+------+----------+-------+
| uid  |  month   | coin  |
+------+----------+-------+
| 101  | 2021-07  | 17    |
| 102  | 2021-10  | 7     |
+------+----------+-------+

注意此处与所给的答案有出入是因为造的数据中增加了2条数据,便于测试。

3 小结

本文给出了一种连续签到领取金币的一种通用解法,通过窗口函数生成标签值进行辅助计算。通过本题可以收获以下知识点:

  • (1)连续签到的判断方法(注意时间连续+签到的连续)
  • (2)断点(按照某种条件)重分组的方法(如本题的连续7天后为一个断点)
         1) 断点同上:order by in_time rows between current row and unbounded   

                     following(上边界滑动,下边界固定)

         2)断点同下: order by in_time(下边界滑动,上边界固定)
  • (3)周期的计算方法,如本题7天一个循环(采用求余的方法:rn%7==0)

欢迎关注石榴姐公众号"我的SQL呀",关注我不迷路

  • 5
    点赞
  • 11
    收藏
    觉得还不错? 一键收藏
  • 5
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论 5
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值