原文:https://www.cnblogs.com/wujin/p/6051768.html
row_number()、rank()、dense_rank()三个函数相当于group by 之后再来个order by
上实例
id name sal
1 a 10
2 a 12
3 b 13
4 b 12
5 a 14
6 a 15
7 a 13
8 b 11
9 a 16
10 b 17
11 a 14
三个函数
select id,
name,
sal,
rank()over(partition by name order by sal desc ) as rp,
dense_rank() over(partition by name order by sal desc ) as drp,
row_number()over(partition by name order by sal desc) as rmp
from 表
结果
ID NAME SAL rank dense_rank row_number
10 b 17 1 1 1
3 b 13 2 2 2
4 b 12 3 3 3
8 b 11 4 4 4
9 a 16 1 1 1
6 a 15 2 2 2
11 a 14 3 3 3
5 a 14 3 3 4
7 a 13 5 4 5
2 a 12 6 5 6
1 a 10 7 6 7
rank()在sal排序的时候,sal遇到重复的值时,rank值也重复,接下来的值还是按正常顺序,但总数不会变
dense_rank()在sal排序的时候,sal遇到重复的值时,rank值也重复,接下来的值按下一个顺序来,总数会变
row_number()会正常排序,没什么说法,一个一个递增
求sal前50%的人
select * from (
select id,
name,
sal,
rank()over(partition by name order by sal desc ) rp,
dense_rank() over(partition by name order by sal desc ) drp,
row_number()over(partition by name order by sal desc) rmp,
count(*)over(partition by name) *0.5 as count
from f_test
) t where t.rp <t.count;
感觉虽然可以实现,但是有点复杂,有没有更好的方法实现呢
NTILE
NTILE(n),用于将分组数据按照顺序切分成n片,返回当前切片值
NTILE不支持ROWS BETWEEN,比如 NTILE(2) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN 3 PRECEDING AND CURRENT ROW)
如果切片不均匀,默认增加第一个切片的分布
上面那个例子 正好可以用到这个
select * from (
select id,
name,
sal,
NTILE(2) over(partition by name order by sal desc ) rn
from f_test
) t where t.rn=1
查重语句:(group by 查重)
SELECT user_id,
count(*) as num
FROM 表名
WHERE 条件 GROUP BY user_id HAVING count(*)>1;
teradata中运用row_number()去重
select * from 表名 where 条件 QUALIFY ROW_NUMBER() OVER(PARTITION BY 字段 ORDER BY 字段 DESC) = 1
qualify前面不用接and,如果没有条件的话可以用1=1来填充