连续签到领金币数

 需求解析

今天写hive的SQL题目,遇到了一个有趣的题目,题目的需求如下:

用户每天签到可以领 1 金币,并可以累计签到天数,连续签到的第 3、7 天分别可以额外领 2 和 6 金币。
每连续签到 7 天重新累积签到天数。
计算用户从 2021 年 7 月以来每个月获得的金币数,结果按照月份、ID 升序排序。

现在来分析一下这个需求,通过转换可以得知:用户每登录一天就可以获得一个金币,连续登录的第三天可以获得三个金币,连续登录的第七天可以获得七个金币。

那么现在问题来到了怎么进行7天一统计呢?

有个思路,一共是七天,给每天增加一个标记,第三天是3,第七天是7,其余的全是1。

那么怎么给第三天和第七天打标记呢?

现在来找规律:

第三天登录:3,10,17,24 .......  

第七天登录:7,14,21,28......

date-3=0   or    (date-3) % 7=0

date % 7 = 0

那么好,现在上代码

DROP TABLE IF EXISTS user_sign;
CREATE TABLE user_sign
(
    `user_id`    varchar(32),
    `id`         varchar(32),
    `start_time` timestamp,
    `end_time`   timestamp,
    `sign`       int
)
    COMMENT '用户签到表'
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
        NULL DEFINED AS ''
    LOCATION '/warehouse/sdc/rds/user_sign';
INSERT
INTO user_sign
VALUES ('101', '0', CAST('2021-07-07 02:00:00' AS TIMESTAMP), CAST('2021-07-07 02:00:09' AS TIMESTAMP), 1),
       ('101', '0', CAST('2021-07-08 02:00:00' AS TIMESTAMP), CAST('2021-07-08 02:00:09' AS TIMESTAMP), 1),
       ('101', '0', CAST('2021-07-09 02:00:00' AS TIMESTAMP), CAST('2021-07-09 02:00:42' AS TIMESTAMP), 1),
       ('101', '0', CAST('2021-07-10 02:00:00' AS TIMESTAMP), CAST('2021-07-10 02:00:09' AS TIMESTAMP), 1),
       ('101', '0', CAST('2021-07-11 15:59:55' AS TIMESTAMP), CAST('2021-07-11 15:59:59' AS TIMESTAMP), 1),
       ('101', '0', CAST('2021-07-12 02:00:28' AS TIMESTAMP), CAST('2021-07-12 02:00:50' AS TIMESTAMP), 1),
       ('101', '0', CAST('2021-07-13 02:00:28' AS TIMESTAMP), CAST('2021-07-13 02:00:50' AS TIMESTAMP), 1),
       ('102', '0', CAST('2021-10-01 02:00:28' AS TIMESTAMP), CAST('2021-10-01 02:00:50' AS TIMESTAMP), 1),
       ('102', '0', CAST('2021-10-02 02:00:01' AS TIMESTAMP), CAST('2021-10-02 02:01:50' AS TIMESTAMP), 1),
       ('102', '0', CAST('2021-10-03 03:00:55' AS TIMESTAMP), CAST('2021-10-03 03:00:59' AS TIMESTAMP), 1),
       ('102', '0', CAST('2021-10-04 03:00:45' AS TIMESTAMP), CAST('2021-10-04 03:00:55' AS TIMESTAMP), 0),
       ('102', '0', CAST('2021-10-05 03:00:53' AS TIMESTAMP), CAST('2021-10-05 03:00:59' AS TIMESTAMP), 1),
       ('102', '0', CAST('2021-10-06 03:00:45' AS TIMESTAMP), CAST('2021-10-06 03:00:55' AS TIMESTAMP), 1);

 具体实现

第一步

插入的数据是时间戳类型,因为我们统计登录信息就可以了,把时间戳做个格式转换“yyyy-MM-dd”,并且去除不需要的字段

select user_id, date_format(start_time, "yyyy-MM-dd") ts
from user_sign
where sign = 1;

结果如下: 

这样看起来就省心多了,接下来开始第二步

第二步

统计用户的连续登录区间,经典老套路了。

先给时间开个窗,按照用户的ID分组

然后现在的时间和排名作减法【date_sub】,如果结果是相等的,就说明是连续登录的。上代码

select user_id,
       ts,
       row_number() over (partition by user_id order by ts)               rn,
       date_sub(ts, row_number() over (partition by user_id order by ts)) flag
from (select user_id, date_format(start_time, "yyyy-MM-dd") ts
      from user_sign
      where sign = 1) t;

ts表示登录的日期

rn是用了开窗排序,按照登录日期来排序

如果flag字段相等的话,就说明在这段时间是连续登录的

表中的102用户,他的登录区间是断开的,我们要怎么把1,2,3,4,5拆成1,2,3和1,2呢

只需要再次用row_number开个窗,不过这次是按照用户的id和flag进行分组,把每个用户的不同登录区间找出来

select user_id,
       ts,
       row_number() over (partition by t1.user_id,t1.flag order by t1.ts) cnt
from (select user_id,
             ts,
             row_number() over (partition by user_id order by ts)               rn,
             date_sub(ts, row_number() over (partition by user_id order by ts)) flag
      from (select user_id, date_format(start_time, "yyyy-MM-dd") ts
            from user_sign
            where sign = 1) t) t1;

现在思路是不是突然“柳暗花明又一村”了?

根据我们最初对登录领金币的需求分析,对第三天登录和第七天登录的金币作标记3和7,其余的天数登录都是1。

select user_id,
       case
           when (cnt - 3) = 3 or (cnt - 3) % 7 = 0 then 3
           when (cnt % 7) = 0 then 7
           else 1 end glods
from (select user_id,
             ts,
             row_number() over (partition by t1.user_id,t1.flag order by t1.ts) cnt
      from (select user_id,
                   ts,
                   row_number() over (partition by user_id order by ts)               rn,
                   date_sub(ts, row_number() over (partition by user_id order by ts)) flag
            from (select user_id, date_format(start_time, "yyyy-MM-dd") ts
                  from user_sign
                  where sign = 1) t) t1) t2;

然后现在进行简单的分id做sum聚合就可以了

select user_id, sum(glods) num
from (select user_id,
             case
                 when (cnt - 3) = 3 or (cnt - 3) % 7 = 0 then 3
                 when (cnt % 7) = 0 then 7
                 else 1 end glods
      from (select user_id,
                   ts,
                   row_number() over (partition by t1.user_id,t1.flag order by t1.ts) cnt
            from (select user_id,
                         ts,
                         row_number() over (partition by user_id order by ts)               rn,
                         date_sub(ts, row_number() over (partition by user_id order by ts)) flag
                  from (select user_id, date_format(start_time, "yyyy-MM-dd") ts
                        from user_sign
                        where sign = 1) t) t1) t2) t3
group by user_id;

至此,完成对id 的金币统计

  • 1
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值