SQL求最大连续天数

斐硕人

已于 2024-04-29 11:03:21 修改

阅读量867

点赞数 22

文章标签： sql 数据库

于 2024-04-29 05:18:53 首次发布

本文链接：https://blog.csdn.net/feishuoren/article/details/138297688

版权

笔试面试专栏收录该内容

2 篇文章 0 订阅

订阅专栏

SQL 最大连续回答问题的天数大于等于n天的用户及其等级

题目描述

现有某乎问答创作者信息表author_tb如下(其中author_id表示创作者编号、author_level表示创作者级别，共1-6六个级别、sex表示创作者性别)：

author_id	author_level	sex
101	6	m
102	1	f
103	1	m
104	3	m
105	4	f
106	2	f
107	2	m
108	5	f
109	6	f
110	5	m

创作者回答情况表answer_tb如下（其中answer_date表示创作日期、author_id指创作者编号、issue_id指回答问题编号、char_len表示回答字数）：

answer_date	author_id	issue_id	char_len
2021-11-01	101	E001	150
2021-11-01	101	E002	200
2021-11-01	102	C003	50
2021-11-01	103	P001	35
2021-11-01	104	C003	120
2021-11-01	105	P001	125
2021-11-01	102	P002	105
2021-11-02	101	P001	201
2021-11-02	110	C002	200
2021-11-02	110	C001	225
2021-11-02	110	C002	220
2021-11-03	101	C002	180
2021-11-04	109	E003	130
2021-11-04	109	E001	123
2021-11-05	108	C001	160
2021-11-05	108	C002	120
2021-11-05	110	P001	180
2021-11-05	106	P002	45
2021-11-05	107	E003	56

请你统计最大连续回答问题的天数大于等于3天的用户及其等级（若有多条符合条件的数据，按author_id升序排序），以上例子的输出结果如下：

author_id	author_level	days_cnt
101	6	3

解决

对每个用户的回答日期去重、连续排序

# drop table if exists tmp_days;
# create temporary table tmp_days
with tmp_days as(
	select distinct a.author_id, a.answer_date, 
	dense_rank() over(partition by author_id order by answer_date) day_rank
	from answer_tb a
)

author_id	answer_date	day_rank
101	2021-11-01	1
101	2021-11-02	2
101	2021-11-03	3
102	2021-11-01	1
103	2021-11-01	1
104	2021-11-01	1
105	2021-11-01	1
106	2021-11-05	1
107	2021-11-05	1
108	2021-11-05	1
109	2021-11-04	1
110	2021-11-02	1
110	2021-11-05	2

注意：

由于表变量实际上使用了临时表，增加了额外的I/O开销，造成性能的损失。因此不太适合数据量大且频繁查询的情况。为此，在SQL Server 2005中提供了WITH公用表表达式（CTE）。不需要显式定义删除，效率更高。
CTEs are similar to tables created with CREATE [TEMPORARY] TABLE but need not be defined or dropped explicitly. For a CTE, you need no privileges to create tables.
上面用的排序函数为 dense_rank()，而不是row_number()。
这是因为使用 row_number() over (partition by * order by *) 时，order by 的字段最好是唯一的、不为空的，如果重复或为空，排序可能会失败，按照 sql 的默认排序去取值。

author_id	answer_date	day_rank
101	2021-11-01	1
101	2021-11-01	2
101	2021-11-02	3
101	2021-11-03	4
102	2021-11-01	1
102	2021-11-01	2
103	2021-11-01	1
104	2021-11-01	1
105	2021-11-01	1
106	2021-11-05	1
107	2021-11-05	1
108	2021-11-05	1
108	2021-11-05	2
109	2021-11-04	1
109	2021-11-04	2
110	2021-11-02	1
110	2021-11-02	2
110	2021-11-02	3
110	2021-11-05	4

如果要使用row_number()，应该对answer_date列去重后再使用，SQL如下。

select a.author_id, a.answer_date,
row_number() over(partition by author_id order by answer_date asc) day_rank
from (
	select distinct a.author_id, a.answer_date 
	from answer_tb a
	order by author_id asc, answer_date asc
) a

对回答日期与日期排序相减，如果日期连续差值应该相同（eg: [1,2,3]-[7,8,9]=[-6,-6,-6]）

# drop table if exists tmp
# create temporary table tmp 
with tmp as(
	select author_id,answer_date, day_rank, date_sub(answer_date, interval day_rank day) p_day 
	from tmp_days
)

author_id	answer_date	day_rank	p_day
101	2021-11-01	1	2021-10-31
101	2021-11-02	2	2021-10-31
101	2021-11-03	3	2021-10-31
102	2021-11-01	1	2021-10-31
103	2021-11-01	1	2021-10-31
104	2021-11-01	1	2021-10-31
105	2021-11-01	1	2021-10-31
106	2021-11-05	1	2021-11-04
107	2021-11-05	1	2021-11-04
108	2021-11-05	1	2021-11-04
109	2021-11-04	1	2021-11-03
110	2021-11-02	1	2021-11-01
110	2021-11-05	2	2021-11-03

统计连续的天数

# drop table if exists t
# create temporary table t
with t as(
	select author_id,p_day, count(p_day) days_cnt
	from tmp
	group by author_id,p_day
	# order by days_cnt desc;
)

author_id	p_day	days_cnt
101	2021-10-31	3
eg*101	2021-12-01	5
102	2021-10-31	1
103	2021-10-31	1
104	2021-10-31	1
105	2021-10-31	1
106	2021-11-04	1
107	2021-11-04	1
eg*107	2021-12-01	3
108	2021-11-04	1
109	2021-11-03	1
110	2021-11-01	1
110	2021-11-03	1

统计最大连续天数大于3的用户及其等级

select distinct t.author_id, l.author_level, 
max(t.days_cnt) over(partition by author_id) days_cnt
from t
left join author_tb l
on l.author_id = t.author_id
where days_cnt >= 3

多个连续天数举例

author_id	author_level	days_cnt
eg*101	6	5
eg*107	2	3

原答案

author_id	author_level	days_cnt
101	6	3

完整代码

select distinct t.author_id, l.author_level, 
max(t.days_cnt) over(partition by author_id) as days_cnt
from (
    select author_id,p_day, count(p_day) days_cnt
    from ( 
        select author_id,answer_date, day_rank, 
        date_sub(answer_date, interval day_rank day) p_day from (
            select distinct a.author_id, a.answer_date,
            dense_rank() over(partition by author_id order by answer_date) day_rank
            from answer_tb a
        ) tmp_days
    ) tmp
    group by author_id,p_day
) t
left join author_tb l
on l.author_id = t.author_id
where days_cnt >= 3

使用WITH，可读性更强

with tmp_days as (
    select distinct a.author_id, a.answer_date,
    dense_rank() over(partition by author_id order by answer_date) day_rank
    from answer_tb a
),tmp as(
    select author_id,answer_date, day_rank, 
    date_sub(answer_date, interval day_rank day) p_day 
    from tmp_days
),t as (
    select author_id,p_day, count(p_day) days_cnt
    from tmp 
    group by author_id,p_day
)


select distinct t.author_id, l.author_level, max(t.days_cnt) over(partition by author_id) days_cnt
from t
left join author_tb l
on l.author_id = t.author_id
where days_cnt >= 3