一、组内排名(分组后进行排名)
1、排序开窗函数:
rank(),dense_rank(),row_number()
2、基本语法:
<排序函数> over (partition by <用于分组的列名> order by <用于排序的列名>),
如rank() over (partition by user_id order by score )
注意在写rank(),dense_rank()时,其括号为空
3、三种排序开窗函数的区别:
考虑并列问题
(图片取自知乎网络)
二、求取top n内的数据
求出组内排名后,就可以轻松解决top n的问题,子查询加上条件函数where,如下:
select * from
(select *, dense_rank() over(partition by 分组列 order by 排名列 desc)as 排名
from 各科成绩表) as a
where 排名<=N;
三、相邻间隔问题
如:SQL,表user_pv_time中字段是user_id , user_time(用户访问时间),求每个用户相邻两次浏览时间之差小于三分钟的次数
方法一:
select a.user_id, count(*) as pv_count
from (select user_id, user_time,
row_number() over (partition by user_id order by user_time) as rn
from user_pv_time) as a
left join (select user_id, user_time,
row_number() over (partition by user_id order by user_time) as rn
from user_pv_time) as b
on a.user_id = b.user_id
where cast(b.rn as signed) - cast(a.rn as signed) = =1
and timestampdiff(minute, a.user_time, b.user_time) < 3
group by user_id
这里用到了cast()函数和timestampdiff()函数,解释如下:
- cast函数主要是转换字段类型,语法规则是:cast(字段名 as 转换的类型 ),其中类型可以为:char[(n)]:字符型,date:日期型,signed:int型等。我理解正常排名基本都是int型,所以可以省略cast函数,只是加上更严谨。
- timestampdiff()语法规则是(unit,begin,end),返回begin-end的结果,其中begin和end是date或datetime表达式,unit参数是确定(end-begin)的结果的单位,表示为整数,取值可以是秒second,分钟minute,小时hour和天day等。
方法二:
select t.user_id, count(*) as pv_count
from (select user_id, user_time,
lead(user_time, 1) over (partition by user_id
order by user_time) as next_time
from user_pv_time) as t
wfere timestampdiff(minute, user_time, next_time) < 3
group by user_id
这里用到了lead()函数,分组中位于当前行后n行(lead为前函数,对应的还有后函数lag),该方法省去了多表链接的问题。