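Thanks to Datawhale for organizing its monthly group study sessions. Textbook reference: Chapter 6, Comprehensive Exercises
Exercise 1: Highest-paid employee in each department (difficulty: medium)
Create an Employee table that holds every employee. Each employee has an Id, a Salary and a DepartmentId.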
+----+-------+--------+--------------+
| Id | Name | Salary | DepartmentId |
+----+-------+--------+--------------+
| 1 | Joe | 70000 | 1 |
| 2 | Henry | 80000 | 2 |
| 3 | Sam | 60000 | 2 |
| 4 | Max | 90000 | 1 |
+----+-------+--------+--------------+
Creating the table is nothing special; just follow MySQL's syntax.
A quick review of table-related syntax:
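- CHAR: fixed-length string
- VARCHAR: variable-length string
- PRIMARY KEY: primary key constraint
- NOT NULL: not-null constraint
- Drop a table
DROP TABLE <table_name>;
- Add a column
ALTER TABLE <table_name> ADD COLUMN <column_definition>;
- Drop a column
ALTER TABLE <table_name> DROP COLUMN <column_name>;
- Empty a table (delete all rows but keep the table)
TRUNCATE TABLE <table_name>;
- Insert data
INSERT INTO <table_name> (col1, col2, col3, ...) VALUES (val1, val2, val3, ...);
- List all databases
SHOW DATABASES;
- Switch to a database
USE <database_name>;
- List all tables in a database
SHOW TABLES FROM <database_name>;
- Show a table's structure
DESC <table_name>;
Check the server version
- Option 1: while logged in to the MySQL server
SELECT version();
- Option 2: without logging in
mysql --version
(Strictly speaking, this reports the version of the mysql client program.)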
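You can try the statements above without a MySQL server. The small sketch below uses Python's built-in sqlite3 module as a stand-in. This is an assumption: SQLite is not MySQL, and its dialect differs. SQLite has no TRUNCATE, SHOW or DESC, so the sketch uses their nearest equivalents:

- DELETE FROM in place of TRUNCATE
- PRAGMA table_info in place of DESC
- sqlite_version() in place of version()

The table name demo is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
cur = conn.cursor()

# CREATE TABLE with CHAR / VARCHAR / NOT NULL / PRIMARY KEY (SQLite accepts these type names)
cur.execute("""
    CREATE TABLE demo (
        demo_id   CHAR(4)      NOT NULL,
        demo_name VARCHAR(100) NOT NULL,
        PRIMARY KEY (demo_id)
    )
""")

# ALTER TABLE ... ADD COLUMN
cur.execute("ALTER TABLE demo ADD COLUMN score INTEGER")

# INSERT INTO <table> (col1, col2, ...) VALUES (...), with ? placeholders
cur.execute("INSERT INTO demo (demo_id, demo_name, score) VALUES (?, ?, ?)", ("0001", "Joe", 70))

# Nearest analogue of DESC <table>: one row per column, the column name is at index 1
columns = [row[1] for row in cur.execute("PRAGMA table_info(demo)")]
print(columns)

# SQLite has no TRUNCATE TABLE; DELETE without a WHERE clause empties the table
cur.execute("DELETE FROM demo")
row_count = cur.execute("SELECT COUNT(*) FROM demo").fetchone()[0]

# Analogue of SELECT version();
version = cur.execute("SELECT sqlite_version()").fetchone()[0]

cur.execute("DROP TABLE demo")
conn.close()
```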
CREATE TABLE employee (
    employee_id   CHAR(4)      NOT NULL,
    emplyee_name  VARCHAR(100) NOT NULL,
    salary        INTEGER      NOT NULL,
    department_id CHAR(4)      NOT NULL,
    PRIMARY KEY (employee_id)
);
INSERT INTO employee VALUES ('0001', 'Joe',   70000, '0001');
INSERT INTO employee VALUES ('0002', 'Henry', 80000, '0002');
INSERT INTO employee VALUES ('0003', 'Sam',   60000, '0002');
INSERT INTO employee VALUES ('0004', 'Max',   90000, '0001');
SELECT * FROM employee;
Create a Department table that holds all of the company's departments.
+----+----------+
| Id | Name |
+----+----------+
| 1 | IT |
| 2 | Sales |
+----+----------+
CREATE TABLE department (
    department_id   CHAR(4)      NOT NULL,
    department_name VARCHAR(100) NOT NULL,
    PRIMARY KEY (department_id)
);
INSERT INTO department VALUES ('0001', 'IT');
INSERT INTO department VALUES ('0002', 'Sales');
SELECT * FROM department;
Write a SQL query that finds the highest-paid employee(s) in each department. For the tables above, Max has the highest salary in IT and Henry has the highest salary in Sales.
+------------+----------+--------+
| Department | Employee | Salary |
+------------+----------+--------+
| IT | Max | 90000 |
| Sales | Henry | 80000 |
+------------+----------+--------+
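Approach:
- Tables needed: Employee (to get each department's highest salary and who earns it) and Department (to get the department name).
- Every row in employee carries a department_id. So we can GROUP BY department_id directly on employee and use the aggregate function MAX() to find each department's highest salary (call the result temp).
- With GROUP BY department_id, the SELECT clause may contain only the grouping key and aggregate functions. That step therefore cannot also return the name of the top earner.
- Once we have each department's highest salary, join temp, employee and department on their common column department_id (a three-table join).
- Use WHERE to keep only the employees whose salary equals the maximum salary in temp.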
SELECT d.department_name       AS Department,
       employee.emplyee_name   AS Employee,
       temp.max_salary         AS Salary
FROM employee
JOIN department d ON employee.department_id = d.department_id
JOIN (SELECT department_id, MAX(salary) AS max_salary
      FROM employee
      GROUP BY department_id) temp
  ON temp.department_id = d.department_id
WHERE employee.salary = temp.max_salary
ORDER BY temp.max_salary DESC;
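To check the GROUP BY + MAX + join-back approach end to end, here is a sketch that runs the same query on the sample data. It uses Python's sqlite3 as a stand-in for MySQL (an assumption); the helper name top_earners is hypothetical. Because the filter is salary = max_salary, ties are kept: if two people share a department's top salary, both are returned.

```python
import sqlite3

def top_earners(conn):
    """Highest-paid employee(s) per department: GROUP BY + MAX, then join back."""
    return conn.execute("""
        SELECT d.department_name AS Department,
               e.emplyee_name    AS Employee,
               temp.max_salary   AS Salary
        FROM employee e
        JOIN department d ON e.department_id = d.department_id
        JOIN (SELECT department_id, MAX(salary) AS max_salary
              FROM employee
              GROUP BY department_id) temp
          ON temp.department_id = d.department_id
        WHERE e.salary = temp.max_salary   -- keep only rows that hit the maximum
        ORDER BY temp.max_salary DESC
    """).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (
        employee_id   CHAR(4)      NOT NULL PRIMARY KEY,
        emplyee_name  VARCHAR(100) NOT NULL,  -- column name kept as in the exercise
        salary        INTEGER      NOT NULL,
        department_id CHAR(4)      NOT NULL
    );
    INSERT INTO employee VALUES ('0001', 'Joe',   70000, '0001'),
                                ('0002', 'Henry', 80000, '0002'),
                                ('0003', 'Sam',   60000, '0002'),
                                ('0004', 'Max',   90000, '0001');
    CREATE TABLE department (
        department_id   CHAR(4)      NOT NULL PRIMARY KEY,
        department_name VARCHAR(100) NOT NULL
    );
    INSERT INTO department VALUES ('0001', 'IT'), ('0002', 'Sales');
""")
rows = top_earners(conn)
print(rows)
```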
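Exercise 2: Exchange seats (difficulty: medium)
Xiaomei is an information technology teacher at a middle school. She has a seat table that stores students' names and their corresponding seat ids.
The id column increases consecutively.
Xiaomei wants to swap the seats of every two adjacent students.
Can you write a SQL query that outputs the result she wants?
Create the seat table shown below:
Example: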
+---------+---------+
| id | student |
+---------+---------+
| 1 | Abbot |
| 2 | Doris |
| 3 | Emerson |
| 4 | Green |
| 5 | Jeames |
+---------+---------+
Given the table above as input, the output should be:
+---------+---------+
| id | student |
+---------+---------+
| 1 | Doris |
| 2 | Abbot |
| 3 | Green |
| 4 | Emerson |
| 5 | Jeames |
+---------+---------+
Note:
If the number of students is odd, the last student keeps their seat.
(
  -- odd seats take the next student's name, or keep their own if they are the last seat
  SELECT s1.seat_id,
         CASE WHEN s2.student_name IS NULL THEN s1.student_name
              ELSE s2.student_name END AS "student"
  FROM seat s1
  LEFT JOIN seat s2 ON s1.seat_id = s2.seat_id - 1
  WHERE s1.seat_id % 2 = 1
)
UNION
(
  -- even seats take the previous student's name
  SELECT s1.seat_id, s2.student_name
  FROM seat s1
  LEFT JOIN seat s2 ON s1.seat_id = s2.seat_id + 1
  WHERE s1.seat_id % 2 = 0
)
ORDER BY seat_id;
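A different technique from the self-join above also works: keep each student and renumber their seat id with CASE instead of swapping names. Odd ids move to id + 1, even ids move to id - 1, and an odd id that is also the maximum stays put. The sketch below uses Python's sqlite3 as a stand-in for MySQL (an assumption). It also uses the column names id/student from the sample table rather than the seat_id/student_name used in the query above. The helper swap_seats is hypothetical.

```python
import sqlite3

def swap_seats(conn):
    """Return (id, student) rows with every adjacent pair of seats swapped."""
    return conn.execute("""
        SELECT CASE
                 -- odd id that is also the last seat: nobody to swap with, keep it
                 WHEN id % 2 = 1 AND id = (SELECT MAX(id) FROM seat) THEN id
                 WHEN id % 2 = 1 THEN id + 1   -- other odd ids move one seat down
                 ELSE id - 1                   -- even ids move one seat up
               END AS new_id,
               student
        FROM seat
        ORDER BY new_id
    """).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE seat (id INTEGER PRIMARY KEY, student VARCHAR(100) NOT NULL)")
conn.executemany("INSERT INTO seat VALUES (?, ?)",
                 [(1, "Abbot"), (2, "Doris"), (3, "Emerson"), (4, "Green"), (5, "Jeames")])
rows = swap_seats(conn)
print(rows)
```

Like the original query, this relies on the ids being consecutive.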
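Exercise 3: Rank scores (difficulty: medium)
Write a SQL query to rank scores. If two scores are equal, they get the same rank. Note that the rank after a tie should be the next consecutive integer; in other words, there should be no "gaps" between ranks.
Create the following scores table: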
+----+-------+
| Id | Score |
+----+-------+
| 1 | 3.50 |
| 2 | 3.65 |
| 3 | 4.00 |
| 4 | 3.85 |
| 5 | 4.00 |
| 6 | 3.65 |
+----+-------+
For example, given the Scores table above, your query should return (ordered from highest to lowest score):
+-------+------+
| Score | Rank |
+-------+------+
| 4.00 | 1 |
| 4.00 | 1 |
| 3.85 | 2 |
| 3.65 | 3 |
| 3.65 | 3 |
| 3.50 | 4 |
+-------+------+
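DENSE_RANK() gives tied scores the same rank and leaves no gap after a tie. Like all window functions, it requires MySQL 8.0 or later.
SELECT FORMAT(score, 2) AS Score,
       DENSE_RANK() OVER (ORDER BY scores.score DESC) AS "Rank"
FROM scores
ORDER BY scores.score DESC;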
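On MySQL versions without window functions, a correlated subquery gives the same dense rank: a score's rank is the number of distinct scores greater than or equal to it. The sketch below runs that variant with Python's sqlite3 as a stand-in (an assumption). The helper dense_rank_scores and the REAL column type are illustrative choices. Note that the subquery is evaluated once per row, so it is quadratic in the table size.

```python
import sqlite3

def dense_rank_scores(conn):
    # rank = number of DISTINCT scores >= this score, which matches DENSE_RANK()
    return conn.execute("""
        SELECT s1.score,
               (SELECT COUNT(DISTINCT s2.score)
                FROM scores s2
                WHERE s2.score >= s1.score) AS rnk
        FROM scores s1
        ORDER BY s1.score DESC
    """).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (id INTEGER PRIMARY KEY, score REAL NOT NULL)")
conn.executemany("INSERT INTO scores VALUES (?, ?)",
                 [(1, 3.50), (2, 3.65), (3, 4.00), (4, 3.85), (5, 4.00), (6, 3.65)])
rows = dense_rank_scores(conn)
print(rows)
```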
Exercise 4: Consecutive numbers (difficulty: medium)
Write a SQL query to find all numbers that appear at least three times in a row.
+----+-----+
| Id | Num |
+----+-----+
| 1 | 1 |
| 2 | 1 |
| 3 | 1 |
| 4 | 2 |
| 5 | 1 |
| 6 | 2 |
| 7 | 2 |
+----+-----+
For example, given the Logs table above, 1 is the only number that appears at least three times in a row.
+-----------------+
| ConsecutiveNums |
+-----------------+
| 1 |
+-----------------+
SELECT DISTINCT l1.num AS ConsecutiveNums
FROM logs l1
JOIN logs l2 ON l1.id = l2.id - 1   -- l2 is the row right after l1
JOIN logs l3 ON l2.id = l3.id - 1   -- l3 is the row right after l2
WHERE l1.num = l2.num AND l2.num = l3.num;
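The three-way self-join can be checked on the sample Logs data. The sketch below uses Python's sqlite3 as a stand-in for MySQL (an assumption); consecutive_nums is a hypothetical helper. It assumes ids are gap-free, because "the next row" is found as id + 1. DISTINCT matters because a run longer than three would otherwise report the same number more than once.

```python
import sqlite3

def consecutive_nums(conn):
    # Pair each row with the next two rows by id and keep numbers equal across all three.
    return [row[0] for row in conn.execute("""
        SELECT DISTINCT l1.num AS ConsecutiveNums
        FROM logs l1
        JOIN logs l2 ON l2.id = l1.id + 1
        JOIN logs l3 ON l3.id = l2.id + 1
        WHERE l1.num = l2.num AND l2.num = l3.num
    """)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logs (id INTEGER PRIMARY KEY, num INTEGER NOT NULL)")
conn.executemany("INSERT INTO logs VALUES (?, ?)",
                 [(1, 1), (2, 1), (3, 1), (4, 2), (5, 1), (6, 2), (7, 2)])
result = consecutive_nums(conn)
print(result)
```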
Exercise 5: Tree node (difficulty: medium)
In the tree table, id identifies a node and p_id is the id of its parent node.
+----+------+
| id | p_id |
+----+------+
| 1 | null |
| 2 | 1 |
| 3 | 1 |
| 4 | 2 |
| 5 | 2 |
+----+------+
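Each node is one of the following three types:
- Root: the node is the root.
- Leaf: the node is a leaf.
- Inner: the node is neither the root nor a leaf.
Write a query that prints each node's id and its type, ordered by id. For the example above, the result is: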
+----+------+
| id | Type |
+----+------+
| 1 | Root |
| 2 | Inner|
| 3 | Leaf |
| 4 | Leaf |
| 5 | Leaf |
+----+------+
SELECT DISTINCT t1.id AS "id",
       CASE WHEN t1.p_id IS NULL THEN 'Root'
            WHEN t2.id IS NULL THEN 'Leaf'
            ELSE 'Inner'
       END AS "Type"
FROM tree t1
LEFT JOIN tree t2 ON t1.id = t2.p_id
ORDER BY id;
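At first I got the logic slightly wrong. The checks have to run in this order: a node whose p_id is NULL is Root; otherwise, a node that no other node lists as its parent (t2.id IS NULL after the LEFT JOIN) is Leaf; every remaining node is Inner.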
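The sketch below runs the same LEFT JOIN + CASE query on the sample tree, using Python's sqlite3 as a stand-in for MySQL (an assumption); node_types is a hypothetical helper. The second check in the test shows why the Root branch must come first. A tree with a single node has no children, yet it must still be reported as Root rather than Leaf. DISTINCT collapses the duplicate rows that the join produces for nodes with several children.

```python
import sqlite3

def node_types(conn):
    return conn.execute("""
        SELECT DISTINCT t1.id,
               CASE
                 WHEN t1.p_id IS NULL THEN 'Root'   -- no parent
                 WHEN t2.id   IS NULL THEN 'Leaf'   -- no node lists t1 as its parent
                 ELSE 'Inner'
               END AS Type
        FROM tree t1
        LEFT JOIN tree t2 ON t1.id = t2.p_id
        ORDER BY t1.id
    """).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tree (id INTEGER PRIMARY KEY, p_id INTEGER)")
conn.executemany("INSERT INTO tree VALUES (?, ?)",
                 [(1, None), (2, 1), (3, 1), (4, 2), (5, 2)])
result = node_types(conn)
print(result)
```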
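Exercise 6: Managers with at least five direct reports (difficulty: medium)
The Employee table contains every employee together with their manager. Each employee has an Id and the Id of their manager (ManagerId).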
+------+----------+-----------+----------+
|Id |Name |Department |ManagerId |
+------+----------+-----------+----------+
|101 |John |A |null |
|102 |Dan |A |101 |
|103 |James |A |101 |
|104 |Amy |A |101 |
|105 |Anne |A |101 |
|106 |Ron |B |101 |
+------+----------+-----------+----------+
For the Employee table, write a SQL query that finds the managers with at least 5 direct reports. For the table above, the result should be:
+-------+
| Name |
+-------+
| John |
+-------+
Note:
No one reports to themselves.
SELECT name
FROM employee
JOIN (SELECT managerID, COUNT(DISTINCT id) AS cnt
      FROM employee
      GROUP BY managerID
      HAVING cnt > 4) temp
  ON id = temp.managerID;
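The subquery-then-join approach can be exercised with a parameterised threshold. The sketch below uses Python's sqlite3 as a stand-in for MySQL (an assumption). The helper managers_with_reports and its min_reports parameter are hypothetical. The extra WHERE managerId IS NOT NULL only drops the NULL group (employees without a manager); that group could never join to a real id anyway.

```python
import sqlite3

def managers_with_reports(conn, min_reports=5):
    # Count direct reports per manager, then join back to get the manager's name.
    return [row[0] for row in conn.execute("""
        SELECT e.name
        FROM employee e
        JOIN (SELECT managerId, COUNT(DISTINCT id) AS cnt
              FROM employee
              WHERE managerId IS NOT NULL
              GROUP BY managerId
              HAVING COUNT(DISTINCT id) >= ?) temp
          ON e.id = temp.managerId
    """, (min_reports,))]

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE employee (
    id INTEGER PRIMARY KEY, name TEXT, department TEXT, managerId INTEGER)""")
conn.executemany("INSERT INTO employee VALUES (?, ?, ?, ?)", [
    (101, "John", "A", None), (102, "Dan", "A", 101), (103, "James", "A", 101),
    (104, "Amy", "A", 101), (105, "Anne", "A", 101), (106, "Ron", "B", 101),
])
print(managers_with_reports(conn))
```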
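Exercise 7: Rank scores, with gaps (difficulty: medium)
Using the scores table from Exercise 3, implement ranking again. This time the ranks should skip numbers after a tie, as follows: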
+-------+------+
| Score | Rank |
+-------+------+
| 4.00 | 1 |
| 4.00 | 1 |
| 3.85 | 3 |
| 3.65 | 4 |
| 3.65 | 4 |
| 3.50 | 6 |
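+-------+------+
RANK() gives tied scores the same rank and then skips ahead, so after two scores tied at rank 1 the next rank is 3.
SELECT FORMAT(score, 2) AS Score,
       RANK() OVER (ORDER BY scores.score DESC) AS "Rank"
FROM scores
ORDER BY scores.score DESC;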
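Without window functions, RANK() can be emulated as 1 plus the number of strictly higher scores. Compare this with the dense-rank variant, which counts distinct scores that are greater than or equal. The sketch below uses Python's sqlite3 as a stand-in for MySQL (an assumption); rank_scores is a hypothetical helper.

```python
import sqlite3

def rank_scores(conn):
    # rank = 1 + number of rows with a strictly higher score, which matches RANK()
    return conn.execute("""
        SELECT s1.score,
               1 + (SELECT COUNT(*) FROM scores s2 WHERE s2.score > s1.score) AS rnk
        FROM scores s1
        ORDER BY s1.score DESC
    """).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (id INTEGER PRIMARY KEY, score REAL NOT NULL)")
conn.executemany("INSERT INTO scores VALUES (?, ?)",
                 [(1, 3.50), (2, 3.65), (3, 4.00), (4, 3.85), (5, 4.00), (6, 3.65)])
rows = rank_scores(conn)
print(rows)
```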
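Exercise 8: Question with the highest answer rate (difficulty: medium)
Find the question with the highest answer rate in the survey_log table. Its columns are uid, action, question_id, answer_id, q_num and timestamp.
uid is the user id. action takes the values "show", "answer" and "skip". When action is "answer", answer_id is not null; when action is "show" or "skip", answer_id is null. q_num is the question's sequence number.
Write a SQL query to find the question with the highest answer rate.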
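Example:
Input:
+-----+--------+-------------+-----------+-------+-----------+
| uid | action | question_id | answer_id | q_num | timestamp |
+-----+--------+-------------+-----------+-------+-----------+
| 5   | show   | 285         | null      | 1     | 123       |
| 5   | answer | 285         | 124124    | 1     | 124       |
| 5   | show   | 369         | null      | 2     | 125       |
| 5   | skip   | 369         | null      | 2     | 126       |
+-----+--------+-------------+-----------+-------+-----------+
Output:
+------------+
| survey_log |
+------------+
| 285        |
+------------+
Explanation:
Question 285 has an answer rate of 1/1, while question 369 has an answer rate of 0/1, so the output is 285.
Note: the highest answer rate means the highest proportion of times a question was answered out of the times it was shown.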
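The post gives no query for this exercise. One possible sketch, following the example, assumes that answer rate = (number of 'answer' rows) / (number of 'show' rows) per question_id. It computes both counts with conditional SUMs, orders by their ratio, and keeps the top row. It uses Python's sqlite3 as a stand-in for MySQL (an assumption). The `* 1.0` is there because SQLite's `/` truncates when both operands are integers. Ties for the top rate are broken arbitrarily by LIMIT 1. The helper best_answered_question is hypothetical.

```python
import sqlite3

def best_answered_question(conn):
    row = conn.execute("""
        SELECT question_id AS survey_log
        FROM survey_log
        GROUP BY question_id
        -- answer rate = answers / shows for each question
        ORDER BY SUM(CASE WHEN action = 'answer' THEN 1 ELSE 0 END) * 1.0
                 / SUM(CASE WHEN action = 'show' THEN 1 ELSE 0 END) DESC
        LIMIT 1
    """).fetchone()
    return row[0] if row else None

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE survey_log (
    uid INTEGER, action TEXT, question_id INTEGER,
    answer_id INTEGER, q_num INTEGER, "timestamp" INTEGER)""")
conn.executemany("INSERT INTO survey_log VALUES (?, ?, ?, ?, ?, ?)", [
    (5, "show", 285, None, 1, 123),
    (5, "answer", 285, 124124, 1, 124),
    (5, "show", 369, None, 2, 125),
    (5, "skip", 369, None, 2, 126),
])
best = best_answered_question(conn)
print(best)
```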
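Exercise 9: Top three salaries in each department (difficulty: medium)
Empty the employee table (the one created in Exercise 1) and re-insert the following data (in effect, rows 5 and 6 are added):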
+----+-------+--------+--------------+
| Id | Name | Salary | DepartmentId |
+----+-------+--------+--------------+
| 1 | Joe | 70000 | 1 |
| 2 | Henry | 80000 | 2 |
| 3 | Sam | 60000 | 2 |
| 4 | Max | 90000 | 1 |
| 5 | Janet | 69000 | 1 |
| 6 | Randy | 85000 | 1 |
+----+-------+--------+--------------+
Write a SQL query that finds the employees with the top three salaries in each department. For the table above, the query should return:
+------------+----------+--------+
| Department | Employee | Salary |
+------------+----------+--------+
| IT | Max | 90000 |
| IT | Randy | 85000 |
| IT | Joe | 70000 |
| Sales | Henry | 80000 |
| Sales | Sam | 60000 |
+------------+----------+--------+
In addition, consider how to generalize this to the top N salaries in each department.
SELECT department_name AS Department,
       emplyee_name    AS Employee,
       salary          AS Salary
FROM department
JOIN (SELECT emplyee_name, department_id, salary,
             DENSE_RANK() OVER (PARTITION BY department_id ORDER BY salary DESC) AS rnk
      FROM employee) salary_rank
  ON department.department_id = salary_rank.department_id
WHERE salary_rank.rnk <= 3
ORDER BY department_name, salary DESC;
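For the top-N generalization, the threshold can be passed in as a query parameter. The sketch below avoids window functions, so it would also suit MySQL before 8.0. An employee is in their department's top N when fewer than N distinct salaries in that department are strictly higher, which matches DENSE_RANK() <= N. It uses Python's sqlite3 as a stand-in for MySQL (an assumption); top_n_per_department is a hypothetical helper.

```python
import sqlite3

def top_n_per_department(conn, n):
    # Keep e if fewer than n distinct salaries in e's department are higher (DENSE_RANK() <= n).
    return conn.execute("""
        SELECT d.department_name AS Department, e.emplyee_name AS Employee, e.salary AS Salary
        FROM employee e
        JOIN department d ON d.department_id = e.department_id
        WHERE (SELECT COUNT(DISTINCT e2.salary)
               FROM employee e2
               WHERE e2.department_id = e.department_id
                 AND e2.salary > e.salary) < ?
        ORDER BY d.department_name, e.salary DESC
    """, (n,)).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (employee_id CHAR(4) PRIMARY KEY, emplyee_name VARCHAR(100),
                           salary INTEGER, department_id CHAR(4));
    INSERT INTO employee VALUES ('0001', 'Joe',   70000, '0001'), ('0002', 'Henry', 80000, '0002'),
                                ('0003', 'Sam',   60000, '0002'), ('0004', 'Max',   90000, '0001'),
                                ('0005', 'Janet', 69000, '0001'), ('0006', 'Randy', 85000, '0001');
    CREATE TABLE department (department_id CHAR(4) PRIMARY KEY, department_name VARCHAR(100));
    INSERT INTO department VALUES ('0001', 'IT'), ('0002', 'Sales');
""")
rows3 = top_n_per_department(conn, 3)
print(rows3)
```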
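Exercise 10: Shortest distance in a plane (difficulty: hard)
The point_2d table holds the coordinates (x, y) of some points (more than two) in a plane.
Write a query that finds the shortest distance between any two of these points, rounded to 2 decimal places.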
|x | y |
|----|----|
| -1 | -1 |
| 0 | 0 |
| -1 | -2 |
The shortest distance is 1, from point (-1,-1) to point (-1,-2). So the output is:
+--------+
|shortest|
+--------+
|1.00 |
+--------+
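Note: the maximum distance between any two points is less than 10000.
SELECT MIN(ROUND(SQRT(POW(p1.x - p2.x, 2) + POW(p1.y - p2.y, 2)), 2)) AS shortest
FROM point_2d AS p1, point_2d AS p2
-- excludes each point paired with itself; every distinct pair still passes in at least one order
WHERE p1.x > p2.x OR p1.y > p2.y;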
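Because the square root is monotonic, it is enough to find the minimum squared distance and take the root once at the end. The sketch below uses that shortcut, with Python's sqlite3 as a stand-in for MySQL (an assumption). It takes the square root in Python because SQLite's built-in SQRT is only present when the math functions are compiled in. The join condition p1.x > p2.x OR (p1.x = p2.x AND p1.y > p2.y) is a slight variant of the one above: it visits each unordered pair exactly once. shortest_distance is a hypothetical helper.

```python
import math
import sqlite3

def shortest_distance(conn):
    # Minimise the squared distance in SQL; sqrt is monotonic, so apply it once afterwards.
    min_sq = conn.execute("""
        SELECT MIN((p1.x - p2.x) * (p1.x - p2.x) + (p1.y - p2.y) * (p1.y - p2.y))
        FROM point_2d p1
        JOIN point_2d p2
          ON p1.x > p2.x OR (p1.x = p2.x AND p1.y > p2.y)  -- each unordered pair once, no self-pairs
    """).fetchone()[0]
    return round(math.sqrt(min_sq), 2)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE point_2d (x INTEGER NOT NULL, y INTEGER NOT NULL)")
conn.executemany("INSERT INTO point_2d VALUES (?, ?)", [(-1, -1), (0, 0), (-1, -2)])
print(shortest_distance(conn))
```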
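Exercise 11: Trips and users (difficulty: hard)
The Trips table stores all taxi trips. Each trip has a unique key Id; Client_Id and Driver_Id are both foreign keys to Users_Id in the Users table. Status is an ENUM type whose members are ('completed', 'cancelled_by_driver', 'cancelled_by_client').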
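+----+-----------+-----------+---------+---------------------+------------+
| Id | Client_Id | Driver_Id | City_Id | Status              | Request_at |
+----+-----------+-----------+---------+---------------------+------------+
| 1  | 1         | 10        | 1       | completed           | 2013-10-01 |
| 2  | 2         | 11        | 1       | cancelled_by_driver | 2013-10-01 |
| 3  | 3         | 12        | 6       | completed           | 2013-10-01 |
| 4  | 4         | 13        | 6       | cancelled_by_client | 2013-10-01 |
| 5  | 1         | 10        | 1       | completed           | 2013-10-02 |
| 6  | 2         | 11        | 6       | completed           | 2013-10-02 |
| 7  | 3         | 12        | 6       | completed           | 2013-10-02 |
| 8  | 2         | 12        | 12      | completed           | 2013-10-03 |
| 9  | 3         | 10        | 12      | completed           | 2013-10-03 |
| 10 | 4         | 13        | 12      | cancelled_by_driver | 2013-10-03 |
+----+-----------+-----------+---------+---------------------+------------+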
The Users table stores all users. Each user has a unique key Users_Id. Banned indicates whether the user is banned, and Role is an ENUM type with members ('client', 'driver', 'partner').
+----------+--------+--------+
| Users_Id | Banned | Role |
+----------+--------+--------+
| 1 | No | client |
| 2 | Yes | client |
| 3 | No | client |
| 4 | No | client |
| 10 | No | driver |
| 11 | No | driver |
| 12 | No | driver |
| 13 | No | driver |
+----------+--------+--------+
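Write a SQL query that finds the daily cancellation rate of trips by non-banned users between 2013-10-01 and 2013-10-03. For the tables above, your query should return the following result, with the Cancellation Rate rounded to two decimal places.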
+------------+-------------------+
| Day | Cancellation Rate |
+------------+-------------------+
| 2013-10-01 | 0.33 |
| 2013-10-02 | 0.00 |
| 2013-10-03 | 0.50 |
+------------+-------------------+
WITH cte AS (
    -- keep only trips where both the client and the driver are not banned
    SELECT t.*
    FROM Trips t
    JOIN Users u1 ON t.client_id = u1.users_id
    JOIN Users u2 ON t.driver_id = u2.users_id
    WHERE u1.banned = 'No' AND u2.banned = 'No'
)
SELECT request_at AS "Day",
       ROUND(SUM(IF(status = 'completed', 0, 1)) / COUNT(status), 2) AS "Cancellation Rate"
FROM cte
WHERE request_at BETWEEN '2013-10-01' AND '2013-10-03'
GROUP BY request_at
ORDER BY request_at;
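The sketch below checks the CTE-based query against the sample data, using Python's sqlite3 as a stand-in for MySQL (an assumption). It differs from the MySQL query in two places:

- MySQL's IF() is replaced by CASE.
- `* 1.0` is added, because SQLite's `/` on two integers truncates, whereas MySQL's `/` returns a decimal.

Dates are stored as 'YYYY-MM-DD' text, so BETWEEN compares them correctly as strings. The helper cancellation_rates is hypothetical.

```python
import sqlite3

def cancellation_rates(conn):
    return conn.execute("""
        WITH cte AS (
            -- trips where neither the client nor the driver is banned
            SELECT t.*
            FROM trips t
            JOIN users u1 ON t.client_id = u1.users_id
            JOIN users u2 ON t.driver_id = u2.users_id
            WHERE u1.banned = 'No' AND u2.banned = 'No'
        )
        SELECT request_at AS "Day",
               ROUND(SUM(CASE WHEN status = 'completed' THEN 0 ELSE 1 END) * 1.0 / COUNT(*), 2)
                   AS "Cancellation Rate"
        FROM cte
        WHERE request_at BETWEEN '2013-10-01' AND '2013-10-03'
        GROUP BY request_at
        ORDER BY request_at
    """).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE trips (id INTEGER PRIMARY KEY, client_id INTEGER, driver_id INTEGER,
                                    city_id INTEGER, status TEXT, request_at TEXT)""")
conn.execute("CREATE TABLE users (users_id INTEGER PRIMARY KEY, banned TEXT, role TEXT)")
conn.executemany("INSERT INTO trips VALUES (?, ?, ?, ?, ?, ?)", [
    (1, 1, 10, 1, "completed", "2013-10-01"), (2, 2, 11, 1, "cancelled_by_driver", "2013-10-01"),
    (3, 3, 12, 6, "completed", "2013-10-01"), (4, 4, 13, 6, "cancelled_by_client", "2013-10-01"),
    (5, 1, 10, 1, "completed", "2013-10-02"), (6, 2, 11, 6, "completed", "2013-10-02"),
    (7, 3, 12, 6, "completed", "2013-10-02"), (8, 2, 12, 12, "completed", "2013-10-03"),
    (9, 3, 10, 12, "completed", "2013-10-03"), (10, 4, 13, 12, "cancelled_by_driver", "2013-10-03"),
])
conn.executemany("INSERT INTO users VALUES (?, ?, ?)", [
    (1, "No", "client"), (2, "Yes", "client"), (3, "No", "client"), (4, "No", "client"),
    (10, "No", "driver"), (11, "No", "driver"), (12, "No", "driver"), (13, "No", "driver"),
])
rates = cancellation_rates(conn)
print(rates)
```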