前言
- 本文主要用于记录TopK问题的一些简答的SQL解法。如果读者有更好的解法,或者觉得博主方法存在缺陷,欢迎在评论区交流。
测试用表
表结构:
CREATE TABLE employees(
`employee_id` int(6) NOT NULL auto_increment, -- '员工编号'
`salary` double(10,2) DEFAULT NULL,-- '月薪'
`department_id` int(6) DEFAULT NULL, -- '部门ID'
PRIMARY KEY(`employee_id`)
);
表数据:
100 24000 90
101 17000 90
102 17000 90
103 9000 60
104 6000 60
...
199 2600 50
200 4400 10
201 13000 20
202 6000 20
203 6500 40
204 10000 70
205 12000 110
206 8300 110
问题:求各个部门的工资前3名的员工信息
方法一:使用连接查询
1)查询每位员工的员工信息,以及同部门中员工工资大于此员工的员工信息,生成表t1。
即对两张原始表a和b进行左连接LEFT JOIN,连接条件为a.department_id=b.department_id AND a.salary < b.salary
。
SELECT
a.employee_id,
a.salary,
a.department_id,
b.employee_id as r_employee_id,
b.salary as r_salary
FROM
employees a
LEFT JOIN
employees b
ON a.department_id = b.department_id AND a.salary < b.salary;
2)基于表t1,过滤出在对应部门工资排名前3的员工记录
即基于表t1,按照department_id和employee_id进行分组,过滤出行组内记录数小于等于2的组
SELECT
department_id,
count(r_employee_id)+1 AS rank,
employee_id,
salary
FROM (
SELECT
a.employee_id,
a.salary,
a.department_id,
b.employee_id as r_employee_id,
b.salary as r_salary
FROM
employees a
LEFT JOIN
employees b
ON a.department_id = b.department_id AND a.salary < b.salary
) AS t1
GROUP BY t1.department_id,t1.employee_id
HAVING count(r_employee_id)<=2
ORDER BY department_id,rank; # 为了便于观察,对结果进行了排序
方法二:使用窗口函数
根据实际情况,选择不同的窗口函数,即选择不同的排名方式:
- row_number() over():按照分组、排序后的行号排名,不重复。例如:值[1,1,3]的排名为[1,2,3]
- rank() over():按照分组、排序后的字段值进行排名,可重复,会跳过前面相同值对应的行号名次。例如:值[1,1,3]的排名为[1,1,3]
- dense_rank() over():按照分组、排序后的字段值进行排名,可重复,不会跳过名次。例如:值[1,1,3]的排名为[1,1,2]
1)查询每位员工的员工信息,并计算其工资水平在部门中的排名,生成表t1
即使用窗口函数rank(),按照部门分组,工资大小计算排名
SELECT
department_id,
employee_id,
salary,
rank() over(partition by department_id order by salary DESC) as rank
FROM
employees;
2)基于表t1,过滤出在对应部门工资排名前3的员工记录
即使用WHERE子句进行过滤
SELECT
*
FROM (
SELECT
department_id,
employee_id,
salary,
rank() over((PARTITION BY department_id ORDER BY salary DESC) as rank
FROM
employees
) AS t1
WHERE t1.rank <= 3;
方法三:WHERE子句执行关联子查询
1)查询每个员工的员工信息,并使用关联子查询查找和当前员工在同部门,且工资大于当前员工的员工ID,子查询的SELECT子句中统计子查询的记录数num,在外部查询的WHERE子句中过滤num小于等于2的记录
SELECT
department_id,
employee_id,
salary
FROM
employees t1
WHERE (
SELECT
count(t2.employee_id)
FROM
employees t2
WHERE
t1.department_id = t2.department_id
AND
t1.salary < t2.salary
) <= 2
ORDER BY department_id,salary DESC; # 为了便于观察,对结果进行了排序