sql - MySQL中的Rank函数
我需要找出客户的排名。 在这里,我为我的要求添加了相应的ANSI标准SQL查询。 请帮我转换为MySQL。
SELECT RANK() OVER (PARTITION BY Gender ORDER BY Age) AS [Partition by Gender],
FirstName,
Age,
Gender
FROM Person
有没有找到MySQL排名的函数?
10个解决方案
242 votes
一种选择是使用排名变量,例如:
SELECT first_name,
age,
gender,
@curRank := @curRank + 1 AS rank
FROM person p, (SELECT @curRank := 0) r
ORDER BY age;
(SELECT @curRank := 0)部分允许变量初始化,而无需单独的SET命令。
测试用例:
CREATE TABLE person (id int, first_name varchar(20), age int, gender char(1));
INSERT INTO person VALUES (1, 'Bob', 25, 'M');
INSERT INTO person VALUES (2, 'Jane', 20, 'F');
INSERT INTO person VALUES (3, 'Jack', 30, 'M');
INSERT INTO person VALUES (4, 'Bill', 32, 'M');
INSERT INTO person VALUES (5, 'Nick', 22, 'M');
INSERT INTO person VALUES (6, 'Kathy', 18, 'F');
INSERT INTO person VALUES (7, 'Steve', 36, 'M');
INSERT INTO person VALUES (8, 'Anne', 25, 'F');
结果:
+------------+------+--------+------+
| first_name | age | gender | rank |
+------------+------+--------+------+
| Kathy | 18 | F | 1 |
| Jane | 20 | F | 2 |
| Nick | 22 | M | 3 |
| Bob | 25 | M | 4 |
| Anne | 25 | F | 5 |
| Jack | 30 | M | 6 |
| Bill | 32 | M | 7 |
| Steve | 36 | M | 8 |
+------------+------+--------+------+
8 rows in set (0.02 sec)
Daniel Vassallo answered 2019-05-10T17:06:23Z
46 votes
这是一个通用的解决方案,它根据列对表进行排序并分配排名; 带有关系的行被赋予相同的排名(为此目的使用额外的变量):
SET @prev_value = NULL;
SET @rank_count = 0;
SELECT id, rank_column, CASE
WHEN @prev_value = rank_column THEN @rank_count
WHEN @prev_value := rank_column THEN @rank_count := @rank_count + 1
END AS rank
FROM rank_table
ORDER BY rank_column
请注意,第二个WHEN子句中有两个赋值语句。 样本数据:
CREATE TABLE rank_table(id INT, rank_column INT);
INSERT INTO rank_table (id, rank_column) VALUES
(1, 10),
(2, 20),
(3, 30),
(4, 30),
(5, 30),
(6, 40),
(7, 50),
(8, 50),
(9, 50);
输出:
+------+-------------+------+
| id | rank_column | rank |
+------+-------------+------+
| 1 | 10 | 1 |
| 2 | 20 | 2 |
| 3 | 30 | 3 |
| 4 | 30 | 3 |
| 5 | 30 | 3 |
| 6 | 40 | 4 |
| 7 | 50 | 5 |
| 8 | 50 | 5 |
| 9 | 50 | 5 |
+------+-------------+------+
SQL小提琴
Salman A answered 2019-05-10T17:07:01Z
42 votes
虽然最受欢迎的答案排名,但它没有分区,你可以自己加入以获得整个分区的东西:
SELECT a.first_name,
a.age,
a.gender,
count(b.age)+1 as rank
FROM person a left join person b on a.age>b.age and a.gender=b.gender
group by a.first_name,
a.age,
a.gender
用例
CREATE TABLE person (id int, first_name varchar(20), age int, gender char(1));
INSERT INTO person VALUES (1, 'Bob', 25, 'M');
INSERT INTO person VALUES (2, 'Jane', 20, 'F');
INSERT INTO person VALUES (3, 'Jack', 30, 'M');
INSERT INTO person VALUES (4, 'Bill', 32, 'M');
INSERT INTO person VALUES (5, 'Nick', 22, 'M');
INSERT INTO person VALUES (6, 'Kathy', 18, 'F');
INSERT INTO person VALUES (7, 'Steve', 36, 'M');
INSERT INTO person VALUES (8, 'Anne', 25, 'F');
回答:
Bill 32 M 4
Bob 25 M 2
Jack 30 M 3
Nick 22 M 1
Steve 36 M 5
Anne 25 F 3
Jane 20 F 2
Kathy 18 F 1
Rahul Agarwal answered 2019-05-10T17:07:32Z
23 votes
丹尼尔版本的调整,以计算百分位数和排名。 另外两个具有相同标记的人将获得相同的排名。
set @totalStudents = 0;
select count(*) into @totalStudents from marksheets;
SELECT id, score, @curRank := IF(@prevVal=score, @curRank, @studentNumber) AS rank,
@percentile := IF(@prevVal=score, @percentile, (@totalStudents - @studentNumber + 1)/(@totalStudents)*100),
@studentNumber := @studentNumber + 1 as studentNumber,
@prevVal:=score
FROM marksheets, (
SELECT @curRank :=0, @prevVal:=null, @studentNumber:=1, @percentile:=100
) r
ORDER BY score DESC
查询样本数据的结果 -
+----+-------+------+---------------+---------------+-----------------+
| id | score | rank | percentile | studentNumber | @prevVal:=score |
+----+-------+------+---------------+---------------+-----------------+
| 10 | 98 | 1 | 100.000000000 | 2 | 98 |
| 5 | 95 | 2 | 90.000000000 | 3 | 95 |
| 6 | 91 | 3 | 80.000000000 | 4 | 91 |
| 2 | 91 | 3 | 80.000000000 | 5 | 91 |
| 8 | 90 | 5 | 60.000000000 | 6 | 90 |
| 1 | 90 | 5 | 60.000000000 | 7 | 90 |
| 9 | 84 | 7 | 40.000000000 | 8 | 84 |
| 3 | 83 | 8 | 30.000000000 | 9 | 83 |
| 4 | 72 | 9 | 20.000000000 | 10 | 72 |
| 7 | 60 | 10 | 10.000000000 | 11 | 60 |
+----+-------+------+---------------+---------------+-----------------+
Mukesh Soni answered 2019-05-10T17:08:04Z
15 votes
丹尼尔和萨尔曼的答案相结合。 然而,等级不会给出存在关系的连续序列。 相反,它会将排名跳到下一个。 所以最大值总是达到行数。
SELECT first_name,
age,
gender,
IF(age=@_last_age,@curRank:=@curRank,@curRank:=@_sequence) AS rank,
@_sequence:=@_sequence+1,@_last_age:=age
FROM person p, (SELECT @curRank := 1, @_sequence:=1, @_last_age:=0) r
ORDER BY age;
架构和测试用例:
CREATE TABLE person (id int, first_name varchar(20), age int, gender char(1));
INSERT INTO person VALUES (1, 'Bob', 25, 'M');
INSERT INTO person VALUES (2, 'Jane', 20, 'F');
INSERT INTO person VALUES (3, 'Jack', 30, 'M');
INSERT INTO person VALUES (4, 'Bill', 32, 'M');
INSERT INTO person VALUES (5, 'Nick', 22, 'M');
INSERT INTO person VALUES (6, 'Kathy', 18, 'F');
INSERT INTO person VALUES (7, 'Steve', 36, 'M');
INSERT INTO person VALUES (8, 'Anne', 25, 'F');
INSERT INTO person VALUES (9, 'Kamal', 25, 'M');
INSERT INTO person VALUES (10, 'Saman', 32, 'M');
输出:
+------------+------+--------+------+--------------------------+-----------------+
| first_name | age | gender | rank | @_sequence:=@_sequence+1 | @_last_age:=age |
+------------+------+--------+------+--------------------------+-----------------+
| Kathy | 18 | F | 1 | 2 | 18 |
| Jane | 20 | F | 2 | 3 | 20 |
| Nick | 22 | M | 3 | 4 | 22 |
| Kamal | 25 | M | 4 | 5 | 25 |
| Anne | 25 | F | 4 | 6 | 25 |
| Bob | 25 | M | 4 | 7 | 25 |
| Jack | 30 | M | 7 | 8 | 30 |
| Bill | 32 | M | 8 | 9 | 32 |
| Saman | 32 | M | 8 | 10 | 32 |
| Steve | 36 | M | 10 | 11 | 36 |
+------------+------+--------+------+--------------------------+-----------------+
erandac answered 2019-05-10T17:08:40Z
5 votes
@Sam,你的观点非常出色,但我认为你误解了MySQL文档在引用的页面上说的内容 - 或者我误解了:-) - 我只想添加这个,以便如果有人对@感到不舒服 丹尼尔回答他们会更放心,或者至少深入挖掘一下。
您在SELECT中看到“@curRank:= @curRank + 1 AS等级”不是“一个语句”,它是语句的一个“原子”部分,因此它应该是安全的。
您引用的文档继续显示在语句的2(原子)部分中相同的用户定义变量的示例,例如,“SELECT @curRank,@curRank:= @curRank + 1 AS rank”。
有人可能会说@curRank在@ Daniel的回答中使用了两次:(1)“@curRank:= @curRank + 1 AS等级”和(2)“(SELECT @curRank:= 0)r”但是从第二次开始 用法是FROM子句的一部分,我很确定它可以保证首先被评估; 基本上使它成为第二个和前面的声明。
事实上,在您引用的同一个MySQL文档页面上,您将在评论中看到相同的解决方案 - 它可能是@Daniel从中得到的; 是的,我知道这是评论,但它是官方文档页面上的评论,并确实带来了一些重量。
David Husnian answered 2019-05-10T17:09:37Z
4 votes
确定给定值的等级的最直接的解决方案是计算它之前的值的数量。 假设我们有以下值:
10 20 30 30 30 40
所有RANK() OVER (PARTITION BY Gender ORDER BY Age)值均被视为第3个
所有RANK() OVER (PARTITION BY Gender ORDER BY Age)值均被视为第6(排名)或第4(dense_rank)
现在回到原来的问题。 以下是一些示例数据,按OP中的描述排序(预期的排名在右侧添加):
+------+-----------+------+--------+ +------+------------+
| id | firstname | age | gender | | rank | dense_rank |
+------+-----------+------+--------+ +------+------------+
| 11 | Emily | 20 | F | | 1 | 1 |
| 3 | Grace | 25 | F | | 2 | 2 |
| 20 | Jill | 25 | F | | 2 | 2 |
| 10 | Megan | 26 | F | | 4 | 3 |
| 8 | Lucy | 27 | F | | 5 | 4 |
| 6 | Sarah | 30 | F | | 6 | 5 |
| 9 | Zoe | 30 | F | | 6 | 5 |
| 14 | Kate | 35 | F | | 8 | 6 |
| 4 | Harry | 20 | M | | 1 | 1 |
| 12 | Peter | 20 | M | | 1 | 1 |
| 13 | John | 21 | M | | 3 | 2 |
| 16 | Cole | 25 | M | | 4 | 3 |
| 17 | Dennis | 27 | M | | 5 | 4 |
| 5 | Scott | 30 | M | | 6 | 5 |
| 7 | Tony | 30 | M | | 6 | 5 |
| 2 | Matt | 31 | M | | 8 | 6 |
| 15 | James | 32 | M | | 9 | 7 |
| 1 | Adams | 33 | M | | 10 | 8 |
| 18 | Smith | 35 | M | | 11 | 9 |
| 19 | Zack | 35 | M | | 11 | 9 |
+------+-----------+------+--------+ +------+------------+
要为Sarah计算RANK() OVER (PARTITION BY Gender ORDER BY Age),您可以使用此查询:
SELECT COUNT(id) + 1 AS rank, COUNT(DISTINCT age) + 1 AS dense_rank
FROM testdata
WHERE gender = (SELECT gender FROM testdata WHERE id = 6)
AND age < (SELECT age FROM testdata WHERE id = 6)
+------+------------+
| rank | dense_rank |
+------+------------+
| 6 | 5 |
+------+------------+
要为所有行计算RANK() OVER (PARTITION BY Gender ORDER BY Age),您可以使用此查询:
SELECT testdata.id, COUNT(lesser.id) + 1 AS rank, COUNT(DISTINCT lesser.age) + 1 AS dense_rank
FROM testdata
LEFT JOIN testdata AS lesser ON lesser.age < testdata.age AND lesser.gender = testdata.gender
GROUP BY testdata.id
这是结果(连接值在右侧添加):
+------+------+------------+ +-----------+-----+--------+
| id | rank | dense_rank | | firstname | age | gender |
+------+------+------------+ +-----------+-----+--------+
| 11 | 1 | 1 | | Emily | 20 | F |
| 3 | 2 | 2 | | Grace | 25 | F |
| 20 | 2 | 2 | | Jill | 25 | F |
| 10 | 4 | 3 | | Megan | 26 | F |
| 8 | 5 | 4 | | Lucy | 27 | F |
| 6 | 6 | 5 | | Sarah | 30 | F |
| 9 | 6 | 5 | | Zoe | 30 | F |
| 14 | 8 | 6 | | Kate | 35 | F |
| 4 | 1 | 1 | | Harry | 20 | M |
| 12 | 1 | 1 | | Peter | 20 | M |
| 13 | 3 | 2 | | John | 21 | M |
| 16 | 4 | 3 | | Cole | 25 | M |
| 17 | 5 | 4 | | Dennis | 27 | M |
| 5 | 6 | 5 | | Scott | 30 | M |
| 7 | 6 | 5 | | Tony | 30 | M |
| 2 | 8 | 6 | | Matt | 31 | M |
| 15 | 9 | 7 | | James | 32 | M |
| 1 | 10 | 8 | | Adams | 33 | M |
| 18 | 11 | 9 | | Smith | 35 | M |
| 19 | 11 | 9 | | Zack | 35 | M |
+------+------+------------+ +-----------+-----+--------+
Salman A answered 2019-05-10T17:10:49Z
3 votes
如果您只想对一个人进行排名,您可以执行以下操作:
SELECT COUNT(Age) + 1
FROM PERSON
WHERE(Age < age_to_rank)
这个排名对应于oracle RANK函数(如果你有相同年龄的人,他们获得相同的排名,之后的排名是非连续的)。
它比在子查询中使用上述解决方案之一并从中进行选择以获得一个人的排名要快一点。
这可以用来对每个人进行排名,但它比上述解决方案慢。
SELECT
Age AS age_var,
(
SELECT COUNT(Age) + 1
FROM Person
WHERE (Age < age_var)
) AS rank
FROM Person
Sam Kidman answered 2019-05-10T17:11:35Z
3 votes
从MySQL 8开始,您最终也可以在MySQL中使用窗口函数:[https://dev.mysql.com/doc/refman/8.0/en/window-functions.html]
您的查询可以用完全相同的方式编写:
SELECT RANK() OVER (PARTITION BY Gender ORDER BY Age) AS `Partition by Gender`,
FirstName,
Age,
Gender
FROM Person
Lukas Eder answered 2019-05-10T17:12:08Z
0 votes
为了避免Erandac的答案结合Daniel和Salman的答案中的“然而”,可以使用以下“分区解决方法”之一
SELECT customerID, myDate
-- partition ranking works only with CTE / from MySQL 8.0 on
, RANK() OVER (PARTITION BY customerID ORDER BY dateFrom) AS rank,
-- Erandac's method in combination of Daniel's and Salman's
-- count all items in sequence, maximum reaches row count.
, IF(customerID=@_lastRank, @_curRank:=@_curRank, @_curRank:=@_sequence+1) AS sequenceRank
, @_sequence:=@_sequence+1 as sequenceOverAll
-- Dense partition ranking, works also with MySQL 5.7
-- remember to set offset values in from clause
, IF(customerID=@_lastRank, @_nxtRank:=@_nxtRank, @_nxtRank:=@_nxtRank+1 ) AS partitionRank
, IF(customerID=@_lastRank, @_overPart:=@_overPart+1, @_overPart:=1 ) AS partitionSequence
, @_lastRank:=customerID
FROM myCustomers,
(SELECT @_curRank:=0, @_sequence:=0, @_lastRank:=0, @_nxtRank:=0, @_overPart:=0 ) r
ORDER BY customerID, myDate
此代码段中第3个变体中的分区排名将返回连续排名数字。 这将导致类似于rank() over partition by结果的数据结构。 例如,见下文。 特别是,对于每个新的partitionRank,partitionSequence将始终以1开头,使用此方法:
customerID myDate sequenceRank (Erandac)
| sequenceOverAll
| | partitionRank
| | | partitionSequence
| | | | lastRank
... lines ommitted for clarity
40 09.11.2016 11:19 1 44 1 44 40
40 09.12.2016 12:08 1 45 1 45 40
40 09.12.2016 12:08 1 46 1 46 40
40 09.12.2016 12:11 1 47 1 47 40
40 09.12.2016 12:12 1 48 1 48 40
40 13.10.2017 16:31 1 49 1 49 40
40 15.10.2017 11:00 1 50 1 50 40
76 01.07.2015 00:24 51 51 2 1 76
77 04.08.2014 13:35 52 52 3 1 77
79 15.04.2015 20:25 53 53 4 1 79
79 24.04.2018 11:44 53 54 4 2 79
79 08.10.2018 17:37 53 55 4 3 79
117 09.07.2014 18:21 56 56 5 1 117
119 26.06.2014 13:55 57 57 6 1 119
119 02.03.2015 10:23 57 58 6 2 119
119 12.10.2015 10:16 57 59 6 3 119
119 08.04.2016 09:32 57 60 6 4 119
119 05.10.2016 12:41 57 61 6 5 119
119 05.10.2016 12:42 57 62 6 6 119
...
Max answered 2019-05-10T17:12:41Z