SQL求最大连续天数

SQL 最大连续回答问题的天数大于等于n天的用户及其等级

题目描述

现有某乎问答创作者信息表author_tb如下(其中author_id表示创作者编号、author_level表示创作者级别,共1-6六个级别、sex表示创作者性别):

author_idauthor_levelsex
1016m
1021f
1031m
1043m
1054f
1062f
1072m
1085f
1096f
1105m

创作者回答情况表answer_tb如下(其中answer_date表示创作日期、author_id指创作者编号、issue_id指回答问题编号、char_len表示回答字数):

answer_dateauthor_idissue_idchar_len
2021-11-01101E001150
2021-11-01101E002200
2021-11-01102C00350
2021-11-01103P00135
2021-11-01104C003120
2021-11-01105P001125
2021-11-01102P002105
2021-11-02101P001201
2021-11-02110C002200
2021-11-02110C001225
2021-11-02110C002220
2021-11-03101C002180
2021-11-04109E003130
2021-11-04109E001123
2021-11-05108C001160
2021-11-05108C002120
2021-11-05110P001180
2021-11-05106P00245
2021-11-05107E00356

请你统计最大连续回答问题的天数大于等于3天的用户及其等级(若有多条符合条件的数据,按author_id升序排序),以上例子的输出结果如下:

author_idauthor_leveldays_cnt
10163

解决

  1. 对每个用户的回答日期去重、连续排序
# drop table if exists tmp_days;
# create temporary table tmp_days
with tmp_days as(
	select distinct a.author_id, a.answer_date, 
	dense_rank() over(partition by author_id order by answer_date) day_rank
	from answer_tb a
)
author_idanswer_dateday_rank
1012021-11-011
1012021-11-022
1012021-11-033
1022021-11-011
1032021-11-011
1042021-11-011
1052021-11-011
1062021-11-051
1072021-11-051
1082021-11-051
1092021-11-041
1102021-11-021
1102021-11-052

注意:

  1. 由于表变量实际上使用了临时表,增加了额外的I/O开销,造成性能的损失。因此不太适合数据量大且频繁查询的情况。为此,在SQL Server 2005中提供了WITH公用表表达式(CTE)。不需要显式定义删除,效率更高。
    CTEs are similar to tables created with CREATE [TEMPORARY] TABLE but need not be defined or dropped explicitly. For a CTE, you need no privileges to create tables.
  2. 上面用的排序函数为 dense_rank(),而不是row_number()。
    这是因为使用 row_number() over (partition by * order by *) 时,order by 的字段最好是唯一的、不为空的,如果重复或为空,排序可能会失败,按照 sql 的默认排序去取值。
author_idanswer_dateday_rank
1012021-11-011
1012021-11-012
1012021-11-023
1012021-11-034
1022021-11-011
1022021-11-012
1032021-11-011
1042021-11-011
1052021-11-011
1062021-11-051
1072021-11-051
1082021-11-051
1082021-11-052
1092021-11-041
1092021-11-042
1102021-11-021
1102021-11-022
1102021-11-023
1102021-11-054

如果要使用row_number(),应该对answer_date列去重后再使用,SQL如下。

select a.author_id, a.answer_date,
row_number() over(partition by author_id order by answer_date asc) day_rank
from (
	select distinct a.author_id, a.answer_date 
	from answer_tb a
	order by author_id asc, answer_date asc
) a
  1. 对回答日期与日期排序相减,如果日期连续差值应该相同(eg: [1,2,3]-[7,8,9]=[-6,-6,-6])
# drop table if exists tmp
# create temporary table tmp 
with tmp as(
	select author_id,answer_date, day_rank, date_sub(answer_date, interval day_rank day) p_day 
	from tmp_days
)
author_idanswer_dateday_rankp_day
1012021-11-0112021-10-31
1012021-11-0222021-10-31
1012021-11-0332021-10-31
1022021-11-0112021-10-31
1032021-11-0112021-10-31
1042021-11-0112021-10-31
1052021-11-0112021-10-31
1062021-11-0512021-11-04
1072021-11-0512021-11-04
1082021-11-0512021-11-04
1092021-11-0412021-11-03
1102021-11-0212021-11-01
1102021-11-0522021-11-03
  1. 统计连续的天数
# drop table if exists t
# create temporary table t
with t as(
	select author_id,p_day, count(p_day) days_cnt
	from tmp
	group by author_id,p_day
	# order by days_cnt desc;
)
author_idp_daydays_cnt
1012021-10-313
eg*1012021-12-015
1022021-10-311
1032021-10-311
1042021-10-311
1052021-10-311
1062021-11-041
1072021-11-041
eg*1072021-12-013
1082021-11-041
1092021-11-031
1102021-11-011
1102021-11-031
  1. 统计最大连续天数大于3的用户及其等级
select distinct t.author_id, l.author_level, 
max(t.days_cnt) over(partition by author_id) days_cnt
from t
left join author_tb l
on l.author_id = t.author_id
where days_cnt >= 3

多个连续天数举例

author_idauthor_leveldays_cnt
eg*10165
eg*10723

原答案

author_idauthor_leveldays_cnt
10163

完整代码

select distinct t.author_id, l.author_level, 
max(t.days_cnt) over(partition by author_id) as days_cnt
from (
    select author_id,p_day, count(p_day) days_cnt
    from ( 
        select author_id,answer_date, day_rank, 
        date_sub(answer_date, interval day_rank day) p_day from (
            select distinct a.author_id, a.answer_date,
            dense_rank() over(partition by author_id order by answer_date) day_rank
            from answer_tb a
        ) tmp_days
    ) tmp
    group by author_id,p_day
) t
left join author_tb l
on l.author_id = t.author_id
where days_cnt >= 3

使用WITH,可读性更强

with tmp_days as (
    select distinct a.author_id, a.answer_date,
    dense_rank() over(partition by author_id order by answer_date) day_rank
    from answer_tb a
),tmp as(
    select author_id,answer_date, day_rank, 
    date_sub(answer_date, interval day_rank day) p_day 
    from tmp_days
),t as (
    select author_id,p_day, count(p_day) days_cnt
    from tmp 
    group by author_id,p_day
)


select distinct t.author_id, l.author_level, max(t.days_cnt) over(partition by author_id) days_cnt
from t
left join author_tb l
on l.author_id = t.author_id
where days_cnt >= 3

相关链接

  1. 牛客SQL194题
  2. sql 四大排名函数—(ROW_NUMBER、RANK、DENSE_RANK、NTILE)简介
  3. MySQL DATE_SUB()函数
  4. row_number() over (partition by * order by *) 排序失效
  5. SQL With As 用法
  6. SQL语句性能优化操作
  • 22
    点赞
  • 20
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值