目录
0 问题描述
用户行为日志表tb_user_log
id | uid | artical_id | in_time | out_time | sign_in |
1 | 101 | 0 | 2021-07-07 10:00:00 | 2021-07-07 10:00:09 | 1 |
2 | 101 | 0 | 2021-07-08 10:00:00 | 2021-07-08 10:00:09 | 1 |
3 | 101 | 0 | 2021-07-09 10:00:00 | 2021-07-09 10:00:42 | 1 |
4 | 101 | 0 | 2021-07-10 10:00:00 | 2021-07-10 10:00:09 | 1 |
5 | 101 | 0 | 2021-07-11 23:59:55 | 2021-07-11 23:59:59 | 1 |
6 | 101 | 0 | 2021-07-12 10:00:28 | 2021-07-12 10:00:50 | 1 |
7 | 101 | 0 | 2021-07-13 10:00:28 | 2021-07-13 10:00:50 | 1 |
8 | 102 | 0 | 2021-10-01 10:00:28 | 2021-10-01 10:00:50 | 1 |
9 | 102 | 0 | 2021-10-02 10:00:01 | 2021-10-02 10:01:50 | 1 |
10 | 102 | 0 | 2021-10-03 10:00:55 | 2021-10-03 11:00:59 | 1 |
11 | 102 | 0 | 2021-10-04 10:00:45 | 2021-10-04 11:00:55 | 0 |
12 | 102 | 0 | 2021-10-05 10:00:53 | 2021-10-05 11:00:59 | 1 |
13 | 102 | 0 | 2021-10-06 10:00:45 | 2021-10-06 11:00:55 | 1 |
(uid-用户ID, artical_id-文章ID, in_time-进入时间, out_time-离开时间, sign_in-是否签到)
场景逻辑说明:
- artical_id-文章ID代表用户浏览的文章的ID,特殊情况artical_id-文章ID为0表示用户在非文章内容页(比如App内的列表页、活动页等)。注意:只有artical_id为0时sign_in值才有效。
- 从2021年7月7日0点开始,用户每天签到可以领1金币,并可以开始累积签到天数,连续签到的第3、7天分别可额外领2、6金币。
- 每连续签到7天后重新累积签到天数(即重置签到天数:连续第8天签到时记为新的一轮签到的第一天,领1金币)
问题:计算每个用户2021年7月以来每月获得的金币数(该活动到10月底结束,11月1日开始的签到不再获得金币)。结果按月份、ID升序排序。
注:如果签到记录的in_time-进入时间和out_time-离开时间跨天了,也只记作in_time对应的日期签到了。
输出示例:
示例数据的输出结果如下:
uid | month | coin |
101 | 202107 | 15 |
102 | 202110 | 7 |
解释:
101在活动期内连续签到了7天,因此获得1*7+2+6=15金币;
102在10.01~10.03连续签到3天获得5金币
10.04断签了,10.05~10.06连续签到2天获得2金币,共得到7金币。
1 数据准备
DROP TABLE IF EXISTS tb_user_log;
CREATE TABLE tb_user_log (
uid INT COMMENT '用户ID',
artical_id INT COMMENT '视频ID',
in_time string COMMENT '进入时间',
out_time string COMMENT '离开时间',
sign_in int COMMENT '是否签到'
) ;
INSERT INTO tb_user_log(uid, artical_id, in_time, out_time, sign_in) VALUES
(101, 0, '2021-07-07 10:00:00', '2021-07-07 10:00:09', 1),
(101, 0, '2021-07-08 10:00:00', '2021-07-08 10:00:09', 1),
(101, 0, '2021-07-09 10:00:00', '2021-07-09 10:00:42', 1),
(101, 0, '2021-07-10 10:00:00', '2021-07-10 10:00:09', 1),
(101, 0, '2021-07-11 23:59:55', '2021-07-11 23:59:59', 1),
(101, 0, '2021-07-12 10:00:28', '2021-07-12 10:00:50', 1),
(101, 0, '2021-07-13 10:00:28', '2021-07-13 10:00:50', 1),
(101, 0, '2021-07-14 11:00:28', '2021-07-14 11:00:50', 1),
(101, 0, '2021-07-15 11:59:28', '2021-07-16 00:01:20', 1),
(102, 0, '2021-10-01 10:00:28', '2021-10-01 10:00:50', 1),
(102, 0, '2021-10-02 10:00:01', '2021-10-02 10:01:50', 1),
(102, 0, '2021-10-03 11:00:55', '2021-10-03 11:00:59', 1),
(102, 0, '2021-10-04 11:00:45', '2021-10-04 11:00:55', 0),
(102, 0, '2021-10-05 11:00:53', '2021-10-05 11:00:59', 1),
(102, 0, '2021-10-06 11:00:45', '2021-10-06 11:00:55', 1);
2 问题分析
本题在牛客网上算比较难的一道题目,本地所要求的是每个用户每月获得的金币数:
领取金币规则:
- (1)只要用户签到就能获取一枚金币
- (2)连续签到第三天和第七天分别额外获取2枚和6金币
- (3)连续签到7天后进行重置,按照规则(1)和(2)进行新一轮领币
分析:本题的维度是用户和月份,也就是最终结果是按照这两个来分组
第一步:根据领金币的规则,我们容易想到按照签到的连续性先进行分组,将连续的进行标记分成一组
第二步:在每一个连续的组里进行row_number()标记(rn),目的是获取连续的第三天和第七个天特殊位置。
第三步:根据第二步连续的天数中是否有超过7天的,然后以此进行分割,再进行分组
整体计算规则如下草图:
![图1](https://img-blog.csdnimg.cn/3fe2a9ad52964e3c8900419238e5ea16.png?x-oss-process=image/watermark,type_d3F5LXplbmhlaQ,shadow_50,text_Q1NETiBA6I6r5Y-r55-z5qa05aeQ,size_20,color_FFFFFF,t_70,g_se,x_16)
步骤三种,首先需要对为7天的地方进行标记(打标记的目的就是为了辅助计算)
标记的方法:rn % 7==0.如果为0则记为1,不为0则记为0
分组方法:按照滑动窗口求和的方法。此类分俩种一种是窗口变得越来越大的方法,另一种是窗口越来越小的方法。具体如下图所示
此两种算法对应的目标不一样:第一种窗口越来越大的方法是希望1标记位分组的时候能分到下面组中(标记位向下分组),第二种窗口越来越小的形式是希望标记为1能分到上一组数据中(或理解为进入到上一个数据桶中)(标记位向上)。具体整体分组如下图
![图2](https://img-blog.csdnimg.cn/d435d0c37ffb4d3bb22eeafdd524dd40.png?x-oss-process=image/watermark,type_d3F5LXplbmhlaQ,shadow_50,text_Q1NETiBA6I6r5Y-r55-z5qa05aeQ,size_20,color_FFFFFF,t_70,g_se,x_16)
步骤一:先判断连续性,做第一个分组标签
注意此处不要将是否签到值sign_in在where条件中过滤,此题连续真正的含义是:时间连续+签到的连续,如果sign_in在where条件中过滤则不能构成分组条件
select uid
,in_time
,sign_in
,month
-- 时间和签到同时的连续性
, sum(if(diff < 1,1,0)) over(partition by uid,month,series_sign_in_flg order by in_time) as series_flg
from(
select uid
,in_time
,sign_in
,month
--求出当前值与上一个值的差值,如果为1具备连续的条件(时间的连续性)
,datediff(to_date(in_time),to_date(lag_in_time)) as diff
--构造签到的连续性条件
,sum(if(sign_in!=lag_sign_in,1,0)) over(partition by uid,month order by in_time) as series_sign_in_flg
from
--求出上一行值
(select uid
,in_time
,sign_in
,substr(in_time,1,7) as month
,lag(in_time,1,in_time) over(partition by uid,substr(in_time,1,7) order by in_time) as lag_in_time
,lag(sign_in,1,sign_in) over(partition by uid,substr(in_time,1,7) order by in_time) as lag_sign_in
from tb_user_log
where artical_id=0 and substr(in_time,1,7)>='2021-07'
and substr(in_time,1,7) <='2021-10'
) t
) t
where sign_in=1
+------+----------------------+----------+----------+-------------+
| uid | in_time | sign_in | month | series_flg |
+------+----------------------+----------+----------+-------------+
| 101 | 2021-07-07 10:00:00 | 1 | 2021-07 | 1 |
| 101 | 2021-07-08 10:00:00 | 1 | 2021-07 | 1 |
| 101 | 2021-07-09 10:00:00 | 1 | 2021-07 | 1 |
| 101 | 2021-07-10 10:00:00 | 1 | 2021-07 | 1 |
| 101 | 2021-07-11 23:59:55 | 1 | 2021-07 | 1 |
| 101 | 2021-07-12 10:00:28 | 1 | 2021-07 | 1 |
| 101 | 2021-07-13 10:00:28 | 1 | 2021-07 | 1 |
| 101 | 2021-07-14 11:00:28 | 1 | 2021-07 | 1 |
| 101 | 2021-07-15 11:59:28 | 1 | 2021-07 | 1 |
| 102 | 2021-10-01 10:00:28 | 1 | 2021-10 | 1 |
| 102 | 2021-10-02 10:00:01 | 1 | 2021-10 | 1 |
| 102 | 2021-10-03 11:00:55 | 1 | 2021-10 | 1 |
| 102 | 2021-10-05 11:00:53 | 1 | 2021-10 | 0 |
| 102 | 2021-10-06 11:00:45 | 1 | 2021-10 | 0 |
+------+----------------------+----------+----------+-------------+
错误的分组标签生成:sign_in在where条件中过滤
select *
-- 签到连续性标签
, sum(if(datediff(to_date(in_time),to_date(lag_in_time)) < 1,1,0)) over(partition by uid,month order by in_time) as series_flg
from
(select uid
,in_time
,substr(in_time,1,7) as month
,lag(in_time,1,in_time) over(partition by uid,substr(in_time,1,7) order by in_time) as lag_in_time
from tb_user_log
where artical_id=0 and substr(in_time,1,7)>='2021-07'
and substr(in_time,1,7) <='2021-10' and sign_in=1
) t
生成的结果如下:
步骤二:生成第二个标签,连续7天的分组标记
select *
--生成连续7天的分组标记
,sum(case when rn % 7=0 then 1 else 0 end) over(partition by uid,month,series_flg order by in_time rows between current row and unbounded following) as series_7_flg
from
(select *
--生成每组中的顺序标记
, row_number() over(partition by uid,month,series_flg) as rn
from
(select uid
,in_time
,sign_in
,month
-- 时间和签到同时的连续性
, sum(if(diff < 1,1,0)) over(partition by uid,month,series_sign_in_flg order by in_time) as series_flg
from(
select uid
,in_time
,sign_in
,month
--求出当前值与上一个值的差值,如果为1具备连续的条件(时间的连续性)
,datediff(to_date(in_time),to_date(lag_in_time)) as diff
--构造签到的连续性条件
,sum(if(sign_in!=lag_sign_in,1,0)) over(partition by uid,month order by in_time) as series_sign_in_flg
from
--求出上一行值
(select uid
,in_time
,sign_in
,substr(in_time,1,7) as month
,lag(in_time,1,in_time) over(partition by uid,substr(in_time,1,7) order by in_time) as lag_in_time
,lag(sign_in,1,sign_in) over(partition by uid,substr(in_time,1,7) order by in_time) as lag_sign_in
from tb_user_log
where artical_id=0 and substr(in_time,1,7)>='2021-07'
and substr(in_time,1,7) <='2021-10'
) t
) t
where sign_in=1
) t
) t
对应结果如下:
步骤三:根据领取金币规则,在每个条件分组中求金币数
select uid
,month
,series_flg
,series_7_flg
--按照领取金币规则求金币数
,case when cnt>=1 and cnt<3 then cnt
when cnt>=3 and cnt<7 then cnt + 2
else cnt + 2 + 6 end coin
from
(select uid
,month
,series_flg
,series_7_flg
,count(1) as cnt
from
(select *
--生成连续7天的分组标记
,sum(case when rn % 7=0 then 1 else 0 end) over(partition by uid,month,series_flg order by in_time rows between current row and unbounded following) as series_7_flg
from
(select *
--生成每组中的顺序标记
, row_number() over(partition by uid,month,series_flg) as rn
from
(select uid
,in_time
,sign_in
,month
-- 时间和签到同时的连续性
, sum(if(diff < 1,1,0)) over(partition by uid,month,series_sign_in_flg order by in_time) as series_flg
from(
select uid
,in_time
,sign_in
,month
--求出当前值与上一个值的差值,如果为1具备连续的条件(时间的连续性)
,datediff(to_date(in_time),to_date(lag_in_time)) as diff
--构造签到的连续性条件
,sum(if(sign_in!=lag_sign_in,1,0)) over(partition by uid,month order by in_time) as series_sign_in_flg
from
--求出上一行值
(select uid
,in_time
,sign_in
,substr(in_time,1,7) as month
,lag(in_time,1,in_time) over(partition by uid,substr(in_time,1,7) order by in_time) as lag_in_time
,lag(sign_in,1,sign_in) over(partition by uid,substr(in_time,1,7) order by in_time) as lag_sign_in
from tb_user_log
where artical_id=0 and substr(in_time,1,7)>='2021-07'
and substr(in_time,1,7) <='2021-10'
) t
) t
where sign_in=1
) t
) t
) t
group by uid,month,series_flg,series_7_flg
) t
+------+----------+-------------+---------------+-------+
| uid | month | series_flg | series_7_flg | coin |
+------+----------+-------------+---------------+-------+
| 102 | 2021-10 | 1 | 0 | 5 |
| 102 | 2021-10 | 0 | 0 | 2 |
| 101 | 2021-07 | 1 | 1 | 15 |
| 101 | 2021-07 | 1 | 0 | 2 |
+------+----------+-------------+---------------+-------+
分组1:series_flg表示连续签到分组。分组2:series_7_flg表示连续领取金币天数中有超过7天的,并以7天为分割点重新分组。
此处领取金币的算法为:1-2天为等于连续的天数,3-6天为连续的天数加2,当为7天时为连续天数加8
步骤四:获取用户每月的金币数,并按用户及月份升序排序
最终SQL如下
select uid,month,sum(coin) as coin
from(
select uid
,month
,series_flg
,series_7_flg
--按照领取金币规则求金币数
,case when cnt>=1 and cnt<3 then cnt
when cnt>=3 and cnt<7 then cnt + 2
else cnt + 2 + 6 end coin
from
(select uid
,month
,series_flg
,series_7_flg
,count(1) as cnt
from
(select *
--生成连续7天的分组标记
,sum(case when rn % 7=0 then 1 else 0 end) over(partition by uid,month,series_flg order by in_time rows between current row and unbounded following) as series_7_flg
from
(select *
--生成每组中的顺序标记
, row_number() over(partition by uid,month,series_flg) as rn
from
(select uid
,in_time
,sign_in
,month
-- 时间和签到同时的连续性
, sum(if(diff < 1,1,0)) over(partition by uid,month,series_sign_in_flg order by in_time) as series_flg
from(
select uid
,in_time
,sign_in
,month
--求出当前值与上一个值的差值,如果为1具备连续的条件(时间的连续性)
,datediff(to_date(in_time),to_date(lag_in_time)) as diff
--构造签到的连续性条件
,sum(if(sign_in!=lag_sign_in,1,0)) over(partition by uid,month order by in_time) as series_sign_in_flg
from
--求出上一行值
(select uid
,in_time
,sign_in
,substr(in_time,1,7) as month
,lag(in_time,1,in_time) over(partition by uid,substr(in_time,1,7) order by in_time) as lag_in_time
,lag(sign_in,1,sign_in) over(partition by uid,substr(in_time,1,7) order by in_time) as lag_sign_in
from tb_user_log
where artical_id=0 and substr(in_time,1,7)>='2021-07'
and substr(in_time,1,7) <='2021-10'
) t
) t
where sign_in=1
) t
) t
) t
group by uid,month,series_flg,series_7_flg
) t
)t
group by uid,month
order by uid,month
最终结果如下:
+------+----------+-------+
| uid | month | coin |
+------+----------+-------+
| 101 | 2021-07 | 17 |
| 102 | 2021-10 | 7 |
+------+----------+-------+
注意此处与所给的答案有出入是因为造的数据中增加了2条数据,便于测试。
3 小结
本文给出了一种连续签到领取金币的一种通用解法,通过窗口函数生成标签值进行辅助计算。通过本题可以收获以下知识点:
- (1)连续签到的判断方法(注意时间连续+签到的连续)
- (2)断点(按照某种条件)重分组的方法(如本题的连续7天后为一个断点)
1) 断点同上:order by in_time rows between current row and unbounded
following(上边界滑动,下边界固定)
2)断点同下: order by in_time(下边界滑动,上边界固定)
- (3)周期的计算方法,如本题7天一个循环(采用求余的方法:rn%7==0)
欢迎关注石榴姐公众号"我的SQL呀",关注我不迷路