LeetCode-SQL（一）

最新推荐文章于 2024-05-31 19:53:24 发布

Fresh_anyu

最新推荐文章于 2024-05-31 19:53:24 发布

阅读量889

点赞数 1

分类专栏： SQL 文章标签： sql leetcode 数据库 HQL mysql

本文链接：https://blog.csdn.net/Sdhrs_nn/article/details/120708736

版权

SQL 专栏收录该内容

8 篇文章 5 订阅

订阅专栏

以下题目都来自力扣（LeetCode）

1、175.组合两个表

-- 表1：Person
+-------------+---------+
| 列名         | 类型     |
+-------------+---------+
| PersonId    | int     |
| FirstName   | varchar |
| LastName    | varchar |
+-------------+---------+
PersonId 是上表主键
-- 表2：Address
+-------------+---------+
| 列名         | 类型    |
+-------------+---------+
| AddressId   | int     |
| PersonId    | int     |
| City        | varchar |
| State       | varchar |
+-------------+---------+
AddressId 是上表主键

要求：编写一个 SQL 查询，满足条件：无论 person 是否有地址信息，都需要基于上述两表提供 person 的以下信息：

FirstName, LastName, City, State

分析：因为表 Address 中的 personId 是表 Person 的外关键字，所以我们可以连接这两个表来获取一个人的地址信息。考虑到可能不是每个人都有地址信息，我们应该使用 outer join 而不是默认的 inner join

-- 方法：使用outer join
select FirstName, LastName, City, State
from Person left join Address
on Person.PersonId = Address.PersonId
;

2、176.第二高薪水

编写一个 SQL 查询，获取 Employee 表中第二高的薪水（Salary）。

+----+--------+
| Id | Salary |
+----+--------+
| 1  | 100    |
| 2  | 200    |
| 3  | 300    |
+----+--------+

例如上述 Employee 表，SQL查询应该返回 200 作为第二高的薪水。如果不存在第二高的薪水，那么查询应返回 null。

方法一：使用子查询和LIMIT子句

-- 将不同的薪资按降序排序，然后使用limit子句获得第二高的薪资
select(select  distinct Salary as SecondHighestSalary from Employee order by Salary desc limit 1 offset 1) AS SecondHighestSalary
;

方法二：使用 ifnull和 limit

select ifnull((select  distinct Salary as SecondHighestSalary from Employee order by Salary desc limit 1 offset 1),null) as SecondHighestSalray
;

方法三：取小于最大值的最大值即第二大

-- 先查询最大值
select max(Salary) from Employee 
-- 再查询小于最大值的最大值
select max(Salary) from Employee where Salray<(select max(Salary) from Employee);

扩展：limit

limit 2,3：意思为跳过2条数据，取3条数据

limit 3 offset 1：意思为跳过1条，取3条数据

3、177.第N高的薪水

编写一个 SQL 查询，获取 Employee 表中第 n 高的薪水（Salary）

+----+--------+
| Id | Salary |
+----+--------+
| 1  | 100    |
| 2  | 200    |
| 3  | 300    |
+----+--------+

例如上述 Employee 表，n = 2 时，应返回第二高的薪水 200。如果不存在第 n 高的薪水，那么查询应返回 null。

说明：相同的值为同一排名

-- 分组去重，排序，使用limit n,1，n为排名-1
CREATE FUNCTION getNthHighestSalary(N INT) RETURNS INT
BEGIN
    SET N := N-1;
  RETURN (
      select salary from employee group by salary order by Salary desc limit N,1
  );
END

4、178.分数排名

编写一个 SQL 查询来实现分数排名。

如果两个分数相同，则两个分数排名（Rank）相同。请注意，平分后的下一个名次应该是下一个连续的整数值。换句话说，名次之间不应该有“间隔”。

+----+-------+
| Id | Score |
+----+-------+
| 1  | 3.50  |
| 2  | 3.65  |
| 3  | 4.00  |
| 4  | 3.85  |
| 5  | 4.00  |
| 6  | 3.65  |
+----+-------+

例如，根据上述给定的 Scores 表，你的查询应该返回（按分数从高到低排列）：

+-------+------+
| Score | Rank |
+-------+------+
| 4.00  | 1    |
| 4.00  | 1    |
| 3.85  | 2    |
| 3.65  | 3    |
| 3.65  | 3    |
| 3.50  | 4    |
+-------+------+

**重要提示：**对于 MySQL 解决方案，如果要转义用作列名的保留字，可以在关键字之前和之后使用撇号。例如 `Rank`

select Score,dense_rank() over(order by Score desc) `Rank` from Scores;

扩展：排序函数

Rank（）：排序相同时会重复，总数不会变

score		Rank()
84			1
84			1
76			3
68			4

Dense_Rank（）：排序相同时会重复，总数会减小

score		Dense_Rank()
84			1
84			1
76			2
68			3

Row_Number（）：会根据顺序计算

score		Dense_Rank()
84			1
84			2
76			3
68			4

5、180.连续出现的数字

-- 表logs
+-------------+---------+
| Column Name | Type    |
+-------------+---------+
| id          | int     |
| num         | varchar |
+-------------+---------+
id 是这个表的主键。

编写一个 SQL 查询，查找所有至少连续出现三次的数字。

返回的结果表中的数据可以按 任意顺序 排列。

查询结果格式如下面的例子所示：

Logs 表：
+----+-----+
| Id | Num |
+----+-----+
| 1  | 1   |
| 2  | 1   |
| 3  | 1   |
| 4  | 2   |
| 5  | 1   |
| 6  | 2   |
| 7  | 2   |
+----+-----+

Result 表：
+-----------------+
| ConsecutiveNums |
+-----------------+
| 1               |
+-----------------+
1 是唯一连续出现至少三次的数字。

解法一：

-- 使用 Logs 并检查是否有 3 个连续的相同数字
select * from
Logs l1,Logs l2,Logs l3 
where l1.id=l2.id-1 and l2.id=l3.id-1 and l1.num = l2.num and l2.num = l3.num;


Id	Num	Id	Num	Id	Num
1	1	2	1	3	1

-- 然后我们从上表中选择任意的 Num 获得想要的答案。同时我们需要添加关键字 DISTINCT ，因为如果一个数字连续出现超过 3 次，会返回重复元素。

select distinct l1.num as ConsecutiveNums from
Logs l1,Logs l2,Logs l3 
where l1.id=l2.id-1 and l2.id=l3.id-1 and l1.num = l2.num and l2.num = l3.num;

这种解法没有通用性，且效率不高

解法二：需要有主键id

-- 先用开窗函数进行分组排序（mysql版本8支持这些函数）,再开窗排序
select *,row_number() over(partition by num order by id) as `rank`
,row_number() over(order by id) as `id2` from log
+----+-----+----+-------+
| Id | Num |rank|id2	|
+----+-----+----+------+|
| 1  | 1   | 1	| 1	  	|
| 2  | 1   | 2	| 2		|
| 3  | 1   | 3	| 3		|
| 5  | 1   | 4	| 4		|
| 4  | 2   | 1	| 5		|
| 6  | 2   | 2	| 6		|
| 7  | 2   | 3	| 7		|
+----+-----+----+-------+

-- 分组，分组条件为id - id2 
select distinct t1.num
from(select *,row_number() over(partition by num order by id) as `rank`
,row_number() over(order by id) as `id2` from log) t1
group by (t1.id-t1.id2),t1.num
having count(*)>=3

6、181.超过经理收入的员工

Employee 表包含所有员工，他们的经理也属于员工。每个员工都有一个 Id，此外还有一列对应员工的经理的 Id。

+----+-------+--------+-----------+
| Id | Name  | Salary | ManagerId |
+----+-------+--------+-----------+
| 1  | Joe   | 70000  | 3         |
| 2  | Henry | 80000  | 4         |
| 3  | Sam   | 60000  | NULL      |
| 4  | Max   | 90000  | NULL      |
+----+-------+--------+-----------+

给定 Employee 表，编写一个 SQL 查询，该查询可以获取收入超过他们经理的员工的姓名。在上面的表格中，Joe 是唯一一个收入超过他的经理的员工。

+----------+
| Employee |
+----------+
| Joe      |
+----------+

-- 自连接加条件
select t1.Name as Employee from Employee t1 join Employee t2 on t1.ManagerId=t2.Id where t1.Salary>t2.Salary
；

7、182.查找重复的电子邮箱

编写一个 SQL 查询，查找 Person 表中所有重复的电子邮箱。

+----+---------+
| Id | Email   |
+----+---------+
| 1  | a@b.com |
| 2  | c@d.com |
| 3  | a@b.com |
+----+---------+

根据以上输入，你的查询应返回以下结果：

+---------+
| Email   |
+---------+
| a@b.com |
+---------+

**说明：**所有电子邮箱都是小写字母。

-- 直接根据邮箱分组，加条件数大于1的就是重复的
select Email from Person group by Email having count(Email)>1;

8、183.从不订购的客户

某网站包含两个表，Customers 表和 Orders 表。编写一个 SQL 查询，找出所有从不订购任何东西的客户。

-- Customers
+----+-------+
| Id | Name  |
+----+-------+
| 1  | Joe   |
| 2  | Henry |
| 3  | Sam   |
| 4  | Max   |
+----+-------+

-- Orders
+----+------------+
| Id | CustomerId |
+----+------------+
| 1  | 3          |
| 2  | 1          |
+----+------------+

例如给定上述表格，你的查询应返回：

+-----------+
| Customers |
+-----------+
| Henry     |
| Max       |
+-----------+

-- 使用子查询作为条件
select Name as Customers from `Customers` where `Id` not in
(select CustomerId from Orders);
-- 使用左连接加条件
select c.Name Customers from `Customers` c left join Orders o on c.id=o.CustomerId where o.CustomerId is null;

9、184.部门工资最高的员工

-- Employee 表包含所有员工信息，每个员工有其对应的 Id, salary 和 department Id
+----+-------+--------+--------------+
| Id | Name  | Salary | DepartmentId |
+----+-------+--------+--------------+
| 1  | Joe   | 70000  | 1            |
| 2  | Jim   | 90000  | 1            |
| 3  | Henry | 80000  | 2            |
| 4  | Sam   | 60000  | 2            |
| 5  | Max   | 90000  | 1            |
+----+-------+--------+--------------+
-- Department 表包含公司所有部门的信息
+----+----------+
| Id | Name     |
+----+----------+
| 1  | IT       |
| 2  | Sales    |
+----+----------+


-- 编写一个 SQL 查询，找出每个部门工资最高的员工。对于上述表，您的 SQL 查询应返回以下行（行的顺序无关紧要）。

+------------+----------+--------+
| Department | Employee | Salary |
+------------+----------+--------+
| IT         | Max      | 90000  |
| IT         | Jim      | 90000  |
| Sales      | Henry    | 80000  |
+------------+----------+--------+

-- 解释：

Max 和 Jim 在 IT 部门的工资都是最高的，Henry 在销售部的工资最高。

-- 方法一：使用多行子查询
-- 第一步，先根据分组求出最大工资
select DepartmentId,max(Salary) from Employee
-- 连接两个表，使用join，因为有的表中可能为空,where条件2个字段用in
select  
	Department.Name as Department,
	Employee.Name as Employee,
	Salary
from
	Employee
		join
	Department
		on
	Employee.DepartmentId=Department.Id
where
	(Employee.DepartmentId,Salary)
		in
	(select DepartmentId,max(Salary) from Employee)
；

-- 方法二：先开窗排序，然后连接另外一张表在家条件查询
-- 先开窗排序
select 
	Name,
	Salary,
	DepartmentId,
	dense_rank() over(partition by DepartmentId order by Salary desc) as rk 
from 
	Employee;
	
-- join连接表格,并加条件查询，条件为排名为1的即为最大值
select 
	B.Name Department,
	A.Name Employee,
	A.Salary Salary 
from (
    select 
    	Name,
    	Salary,
    	DepartmentId,
    	dense_rank() over(partition by DepartmentId order by Salary desc) as rk 
    from 
    	Employee
	) A 
join  
	Department B 
on 
	A.DepartmentId=B.Id
where
	A.rk=1
;

扩展：多行子查询

当是多行子查询时，主查询语句的条件语句中引用子查询结果时必须用多行比较符号（IN，ANY，ALL 等）来进行比较

10、185.部门工资前三高的所有员工

-- `Employee` 表包含所有员工信息，每个员工有其对应的工号 `Id`，姓名 `Name`，工资 `Salary` 和部门编号 `DepartmentId` 。
+----+-------+--------+--------------+
| Id | Name  | Salary | DepartmentId |
+----+-------+--------+--------------+
| 1  | Joe   | 85000  | 1            |
| 2  | Henry | 80000  | 2            |
| 3  | Sam   | 60000  | 2            |
| 4  | Max   | 90000  | 1            |
| 5  | Janet | 69000  | 1            |
| 6  | Randy | 85000  | 1            |
| 7  | Will  | 70000  | 1            |
+----+-------+--------+--------------+
-- Department 表包含公司所有部门的信息
+----+----------+
| Id | Name     |
+----+----------+
| 1  | IT       |
| 2  | Sales    |
+----+----------+
-- 编写一个 SQL 查询，找出每个部门获得前三高工资的所有员工。例如，根据上述给定的表，查询结果应返回：
+------------+----------+--------+
| Department | Employee | Salary |
+------------+----------+--------+
| IT         | Max      | 90000  |
| IT         | Randy    | 85000  |
| IT         | Joe      | 85000  |
| IT         | Will     | 70000  |
| Sales      | Henry    | 80000  |
| Sales      | Sam      | 60000  |
+------------+----------+--------+

-- 解释：

IT 部门中，Max 获得了最高的工资，Randy 和 Joe 都拿到了第二高的工资，Will 的工资排第三。销售部门（Sales）只有两名员工，Henry 的工资最高，Sam 的工资排第二。

-- 解法一：先开窗，在联合加条件
-- 开窗排序
select Name,Salary,DepartmentId，dense_rank() over(partition by DepartmentId order by Salary desc) from Employee

-- 联合查询加条件
select 
B.Name Department,
A.Name Employee,
A.Salary Salary
from (select Name,Salary,DepartmentId,dense_rank() over(partition by DepartmentId order by Salary desc) as rk from Employee) as A
join Department as B on A.DepartmentId=B.Id
where A.rk<=3


-- 解法二：先联合，再开窗，最后再查询
-- 先内连接
select Employee.Name,Employee.Salary,Employee.DepartmentId,Department.Name
from Employee
join Department
on Employee.DepartmentId=Department.Id

--  开窗排序
select A.eName eName,A.Salary Salary,A.DepartmentId DepartmentId,A.Name Name,
dense_rank() over(partition by A.DepartmentId order by A.Salary desc) as rk
from(
select Employee.Name eName,Employee.Salary Salary,Employee.DepartmentId DepartmentId,Department.Name dName
from Employee
join Department
on Employee.DepartmentId=Department.Id
) as A

-- 条件查询
select
B.dName Department,
B.eName Employee,
B.Salary Salary
from 
(select A.eName eName,A.Salary Salary,A.DepartmentId DepartmentId,A.dName dName,
dense_rank() over(partition by A.DepartmentId order by A.Salary desc) as rk
from(
select Employee.Name eName,Employee.Salary Salary,Employee.DepartmentId DepartmentId,Department.Name dName
from Employee
join Department
on Employee.DepartmentId=Department.Id
) as A) as B where B.rk<=3

11、196.删除重复的电子邮箱

编写一个 SQL 查询，来删除 Person 表中所有重复的电子邮箱，重复的邮箱里只保留 Id 最小的那个。

+----+------------------+
| Id | Email            |
+----+------------------+
| 1  | john@example.com |
| 2  | bob@example.com  |
| 3  | john@example.com |
+----+------------------+
Id 是这个表的主键。

例如，在运行你的查询语句之后，上面的 Person 表应返回以下几行:

+----+------------------+
| Id | Email            |
+----+------------------+
| 1  | john@example.com |
| 2  | bob@example.com  |
+----+------------------+

-- 自连接条件为 
select p1.*
from Person p1,Person p2 where p1.Email=p2.Email and p1.Id>p2.Id
;

-- 删除操作
delete p1 from Person p1,Person p2 where p1.Email=p2.Email and p1.Id>p2.Id

12、197.上升的温度

-- Weather
+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| id            | int     |
| recordDate    | date    |
| temperature   | int     |
+---------------+---------+
id 是这个表的主键
该表包含特定日期的温度信息

编写一个 SQL 查询，来查找与之前（昨天的）日期相比温度更高的所有日期的 id 。

返回结果 不要求顺序 。

查询结果格式如下例：

Weather
+----+------------+-------------+
| id | recordDate | Temperature |
+----+------------+-------------+
| 1  | 2015-01-01 | 10          |
| 2  | 2015-01-02 | 25          |
| 3  | 2015-01-03 | 20          |
| 4  | 2015-01-04 | 30          |
+----+------------+-------------+

Result table:
+----+
| id |
+----+
| 2  |
| 4  |
+----+
2015-01-02 的温度比前一天高（10 -> 25）
2015-01-04 的温度比前一天高（20 -> 30）

-- 直接自连接，条件为前一天温度大于第二天温度
select 
	w1.id id 
from
	Weather w1,Weather w2 
where 
	date_add(w1.recordDate,INTERVAL -1 DAY)=w2.recordDate 
and
	w1.Temperature>w2.Temperature
；

扩展：MySQL中日期函数

-- date_add
date_add(datetime,interval expr type)
参数类型：year、month、day、hour、minute、second、
举例：
select date_add('2021-09-24',interval -1 day)  === 2021-09-23  -- 可以是负数
select date_add('2021-09-24',interval '1_1' year_month) == 2022-10-24 -- 需要单引号

13、262.行程和用户

-- 表：Trips
+-------------+----------+
| Column Name | Type     |
+-------------+----------+
| Id          | int      |
| Client_Id   | int      |
| Driver_Id   | int      |
| City_Id     | int      |
| Status      | enum     |
| Request_at  | date     |     
+-------------+----------+
Id 是这张表的主键。
这张表中存所有出租车的行程信息。每段行程有唯一 Id ，其中 Client_Id 和 Driver_Id 是 Users 表中 Users_Id 的外键。
Status 是一个表示行程状态的枚举类型，枚举成员为(‘completed’, ‘cancelled_by_driver’, ‘cancelled_by_client’) 。

-- 表：Users
+-------------+----------+
| Column Name | Type     |
+-------------+----------+
| Users_Id    | int      |
| Banned      | enum     |
| Role        | enum     |
+-------------+----------+
Users_Id 是这张表的主键。
这张表中存所有用户，每个用户都有一个唯一的 Users_Id ，Role 是一个表示用户身份的枚举类型，枚举成员为 (‘client’, ‘driver’, ‘partner’) 。
Banned 是一个表示用户是否被禁止的枚举类型，枚举成员为 (‘Yes’, ‘No’) 。

写一段 SQL 语句查出 “2013-10-01” 至 “2013-10-03” 期间非禁止用户（乘客和司机都必须未被禁止）的取消率。非禁止用户即 Banned 为 No 的用户，禁止用户即 Banned 为 Yes 的用户。

取消率的计算方式如下：(被司机或乘客取消的非禁止用户生成的订单数量) / (非禁止用户生成的订单总数)。

返回结果表中的数据可以按任意顺序组织。其中取消率 Cancellation Rate 需要四舍五入保留两位小数。

查询结果格式如下例所示：

Trips 表：
+----+-----------+-----------+---------+---------------------+------------+
| Id | Client_Id | Driver_Id | City_Id | Status              | Request_at |
+----+-----------+-----------+---------+---------------------+------------+
| 1  | 1         | 10        | 1       | completed           | 2013-10-01 |
| 2  | 2         | 11        | 1       | cancelled_by_driver | 2013-10-01 |
| 3  | 3         | 12        | 6       | completed           | 2013-10-01 |
| 4  | 4         | 13        | 6       | cancelled_by_client | 2013-10-01 |
| 5  | 1         | 10        | 1       | completed           | 2013-10-02 |
| 6  | 2         | 11        | 6       | completed           | 2013-10-02 |
| 7  | 3         | 12        | 6       | completed           | 2013-10-02 |
| 8  | 2         | 12        | 12      | completed           | 2013-10-03 |
| 9  | 3         | 10        | 12      | completed           | 2013-10-03 |
| 10 | 4         | 13        | 12      | cancelled_by_driver | 2013-10-03 |
+----+-----------+-----------+---------+---------------------+------------+

Users 表：
+----------+--------+--------+
| Users_Id | Banned | Role   |
+----------+--------+--------+
| 1        | No     | client |
| 2        | Yes    | client |
| 3        | No     | client |
| 4        | No     | client |
| 10       | No     | driver |
| 11       | No     | driver |
| 12       | No     | driver |
| 13       | No     | driver |
+----------+--------+--------+

Result 表：
+------------+-------------------+
| Day        | Cancellation Rate |
+------------+-------------------+
| 2013-10-01 | 0.33              |
| 2013-10-02 | 0.00              |
| 2013-10-03 | 0.50              |
+------------+-------------------+

2013-10-01：
  - 共有 4 条请求，其中 2 条取消。
  - 然而，Id=2 的请求是由禁止用户（User_Id=2）发出的，所以计算时应当忽略它。
  - 因此，总共有 3 条非禁止请求参与计算，其中 1 条取消。
  - 取消率为 (1 / 3) = 0.33
2013-10-02：
  - 共有 3 条请求，其中 0 条取消。
  - 然而，Id=6 的请求是由禁止用户发出的，所以计算时应当忽略它。
  - 因此，总共有 2 条非禁止请求参与计算，其中 0 条取消。
  - 取消率为 (0 / 2) = 0.00
2013-10-03：
  - 共有 3 条请求，其中 1 条取消。
  - 然而，Id=8 的请求是由禁止用户发出的，所以计算时应当忽略它。
  - 因此，总共有 2 条非禁止请求参与计算，其中 1 条取消。
  - 取消率为 (1 / 2) = 0.50

-- 第一步，先找到User表中被禁止的id
select
	Users_id 
from 
	Users
where 
	Banned='yes'
;
-- 第二步，从Trips中排除掉被禁止的ID
-- 注意：因为不确定被禁止的id为client_Id还是Driver_Id
select * 
from Trips t
where  
	Client_Id 
not in 
	(select Users_id from Users where Banned='yes') 
and
	Driver_Id
not in 
	(select Users_id from Users where Banned='yes')
；
-- 第三部，根据提供的计算方法，计算取消率
select
	Request_at Day,
	round((sum(if(Status='cancelled_by_driver' or Status='cancelled_by_driver',1,0)) / count(Status)),2) as `Cancellation Rate `
from Trips t
where  
	Client_Id 
not in 
	(select Users_id from Users where Banned='yes') 
and
	Driver_Id
not in 
	(select Users_id from Users where Banned='yes')
and
	Request_at
between '2013-10-01' and '2013-10-03'
group by 
	Request_at
;

14、511.游戏玩法分析I

-- 活动表 Activity：
+--------------+---------+
| Column Name  | Type    |
+--------------+---------+
| player_id    | int     |
| device_id    | int     |
| event_date   | date    |
| games_played | int     |
+--------------+---------+
表的主键是 (player_id, event_date)。
这张表展示了一些游戏玩家在游戏平台上的行为活动。
每行数据记录了一名玩家在退出平台之前，当天使用同一台设备登录平台后打开的游戏的数目（可能是 0 个）。

写一条 SQL 查询语句获取每位玩家 第一次登陆平台的日期。

查询结果的格式如下所示：

Activity 表：
+-----------+-----------+------------+--------------+
| player_id | device_id | event_date | games_played |
+-----------+-----------+------------+--------------+
| 1         | 2         | 2016-03-01 | 5            |
| 1         | 2         | 2016-05-02 | 6            |
| 2         | 3         | 2017-06-25 | 1            |
| 3         | 1         | 2016-03-02 | 0            |
| 3         | 4         | 2018-07-03 | 5            |
+-----------+-----------+------------+--------------+

Result 表：
+-----------+-------------+
| player_id | first_login |
+-----------+-------------+
| 1         | 2016-03-01  |
| 2         | 2017-06-25  |
| 3         | 2016-03-02  |
+-----------+-------------+

-- 使用group by 分组，然后min求最小值
select 
	player_id,
	min(event_date) first_login
from 
	Activity 
group by 
	player_id
；

15、512.游戏玩法分析II

-- Table: Activity
+--------------+---------+
| Column Name  | Type    |
+--------------+---------+
| player_id    | int     |
| device_id    | int     |
| event_date   | date    |
| games_played | int     |
+--------------+---------+
(player_id, event_date) 是这个表的两个主键
这个表显示的是某些游戏玩家的游戏活动情况
每一行是在某天使用某个设备登出之前登录并玩多个游戏（可能为0）的玩家的记录

请编写一个 SQL 查询，描述每一个玩家首次登陆的设备名称

查询结果格式在以下示例中：

Activity table:
+-----------+-----------+------------+--------------+
| player_id | device_id | event_date | games_played |
+-----------+-----------+------------+--------------+
| 1         | 2         | 2016-03-01 | 5            |
| 1         | 2         | 2016-05-02 | 6            |
| 2         | 3         | 2017-06-25 | 1            |
| 3         | 1         | 2016-03-02 | 0            |
| 3         | 4         | 2018-07-03 | 5            |
+-----------+-----------+------------+--------------+

Result table:
+-----------+-----------+
| player_id | device_id |
+-----------+-----------+
| 1         | 2         |
| 2         | 3         |
| 3         | 1         |
+-----------+-----------+

-- 在511的基础上联合加条件查询一下就行了
select 
	A.player_id player_id,
	A.device_id device_id
from
     Activity A,
    (select 
        player_id,
        min(event_date) first_login
    from 
        Activity 
    group by 
        player_id) B
where 
    A.player_id=B.player_id
and 
    B.first_login=A.event_date
;

16、513.游戏玩法分析III

-- Table: Activity
+--------------+---------+
| Column Name  | Type    |
+--------------+---------+
| player_id    | int     |
| device_id    | int     |
| event_date   | date    |
| games_played | int     |
+--------------+---------+
（player_id，event_date）是此表的主键。
这张表显示了某些游戏的玩家的活动情况。
每一行是一个玩家的记录，他在某一天使用某个设备注销之前登录并玩了很多游戏（可能是 0 ）。

编写一个 SQL 查询，同时报告每组玩家和日期，以及玩家到目前为止玩了多少游戏。也就是说，在此日期之前玩家所玩的游戏总数。详细情况请查看示例。

查询结果格式如下所示：

Activity table:
+-----------+-----------+------------+--------------+
| player_id | device_id | event_date | games_played |
+-----------+-----------+------------+--------------+
| 1         | 2         | 2016-03-01 | 5            |
| 1         | 2         | 2016-05-02 | 6            |
| 1         | 3         | 2017-06-25 | 1            |
| 3         | 1         | 2016-03-02 | 0            |
| 3         | 4         | 2018-07-03 | 5            |
+-----------+-----------+------------+--------------+

Result table:
+-----------+------------+---------------------+
| player_id | event_date | games_played_so_far |
+-----------+------------+---------------------+
| 1         | 2016-03-01 | 5                   |
| 1         | 2016-05-02 | 11                  |
| 1         | 2017-06-25 | 12                  |
| 3         | 2016-03-02 | 0                   |
| 3         | 2018-07-03 | 5                   |
+-----------+------------+---------------------+
对于 ID 为 1 的玩家，2016-05-02 共玩了 5+6=11 个游戏，2017-06-25 共玩了 5+6+1=12 个游戏。
对于 ID 为 3 的玩家，2018-07-03 共玩了 0+5=5 个游戏。
请注意，对于每个玩家，我们只关心玩家的登录日期。

解法一：开窗

-- 求和开窗
select
	player_id,
	event_date,
	sum(games_played) over(partition by player_id order by event_date rows unbounded predecing and current row) as games_played_so_far
from Activity

解法二：内连接加条件

-- 内连接加条件
1	2	2016-03-01	5	1	2	2016-03-01	5
1	2	2016-03-01	5	1	2	2016-05-02	6
1	2	2016-05-02	6	1	2	2016-05-02	6
1	2	2016-03-01	5	1	3	2017-06-25	1
1	2	2016-05-02	6	1	3	2017-06-25	1
1	3	2017-06-25	1	1	3	2017-06-25	1
3	1	2016-03-02	0	3	1	2016-03-02	0
3	1	2016-03-02	0	3	4	2018-07-03	5
3	4	2018-07-03	5	3	4	2018-07-03	5

-- 通过b的id和日期进行分组，然后求和a的数量就是我们要求的数量
-- 内连接加条件
select	
	b.player_id player_id,
	b.event_date event_date,
	sum(a.games_played) as games_played_so_far
from 
	Activity a
inner join
	Activity b
on 
	a.player_id=b.player_id
and 
	a.event_date<=b.event_date
group by
	b.player_id,b.event_date
;

1	2016-03-01	5
1	2016-05-02	11
1	2017-06-25	12
3	2016-03-02	0
3	2018-07-03	5

扩展：over()具体用法

over()：指定分析函数工作的数据窗口大小，这个数据窗口大小可能会随着行的改变而改变

current row：当前行

n preceding：往前n行数据

n following：往后n行数据

unbounded：起点

unbounded preceding ：表示从前面的起点

unbounded following：表示到后面的终点

LAG(col,n,default_val)：往前第n行数据

LEAD(col,n, default_val)：往后第n行数据

NTILE(n)：把有序窗口的行分发到指定数据的组中，各个组有编号，编号从1开始，对于每一行，NTILE返回此行所属的组的编号。注意：n必须为int类型。

-- 用法
select name,orderdate,cost, 
sum(cost) over() as sample1,--所有行相加 
sum(cost) over(partition by name) as sample2,--按name分组，组内数据相加 
sum(cost) over(partition by name order by orderdate) as sample3,--按name分组，组内数据累加 
sum(cost) over(partition by name order by orderdate rows between UNBOUNDED PRECEDING and current row ) as sample4 ,--和sample3一样,由起点到当前行的聚合 
sum(cost) over(partition by name order by orderdate rows between 1 PRECEDING and current row) as sample5, --当前行和前面一行做聚合 
sum(cost) over(partition by name order by orderdate rows between 1 PRECEDING AND 1 FOLLOWING ) as sample6,--当前行和前边一行及后面一行 
sum(cost) over(partition by name order by orderdate rows between current row and UNBOUNDED FOLLOWING ) as sample7 --当前行及后面所有行 
from business
;

17、550.游戏玩法分析IV

-- Table: Activity
+--------------+---------+
| Column Name  | Type    |
+--------------+---------+
| player_id    | int     |
| device_id    | int     |
| event_date   | date    |
| games_played | int     |
+--------------+---------+
（player_id，event_date）是此表的主键。
这张表显示了某些游戏的玩家的活动情况。
每一行是一个玩家的记录，他在某一天使用某个设备注销之前登录并玩了很多游戏（可能是 0）。

编写一个 SQL 查询，报告在首次登录的第二天再次登录的玩家的比率，四舍五入到小数点后两位。换句话说，您需要计算从首次登录日期开始至少连续两天登录的玩家的数量，然后除以玩家总数。

查询结果格式如下所示：

Activity table:
+-----------+-----------+------------+--------------+
| player_id | device_id | event_date | games_played |
+-----------+-----------+------------+--------------+
| 1         | 2         | 2016-03-01 | 5            |
| 1         | 2         | 2016-03-02 | 6            |
| 2         | 3         | 2017-06-25 | 1            |
| 3         | 1         | 2016-03-02 | 0            |
| 3         | 4         | 2018-07-03 | 5            |
+-----------+-----------+------------+--------------+

Result table:
+-----------+
| fraction  |
+-----------+
| 0.33      |
+-----------+
只有 ID 为 1 的玩家在第一天登录后才重新登录，所以答案是 1/3 = 0.33

-- 1、先查询出每个玩家第一天登录的id和日期，这个日期做一下处理，推后一天
select 
	player_id,
	date_add(min(event_date),interval 1 day) event_date
from 
	Activity 
group by 
	player_id
;	--t1
-- 2、根据1的结果从Activity表中找到相关的数据（这里可以使用多列子查询，或者连接加条件）
-- 2.1、多列子查询
select 
	player_id,
	event_date
from 
	Activity
where
	(player_id,event_date) 
in 	(t1)
;
-- 2.2、连接加条件
select
	A.player_id player_id,
	A.event_date event_date
from 
	Activity A
join 
	(t1) B
on 
	A.player_id=B.player_id
;
-- 3、做计算用到一个子查询,分母的统计总人数为（select count(distinct player_id) from Activity)，分子为步骤2得到的player_id的数量为：count(player_id)，最后再在外层加一个四舍五入的函数round
select 
	round(count(player_id)/(select count(distinct player_id) from Activity),2) as fraction
from 
	Activity
where
	(player_id,event_date) 
in 	(
	select 
		player_id,
		date_add(min(event_date),interval 1 day) event_date
	from 
		Activity 
	group by 
		player_id
)
;

18、569.员工薪水中位数

-- Employee 表包含所有员工。Employee 表有三列：员工Id，公司名和薪水。
+-----+------------+--------+
|Id   | Company    | Salary |
+-----+------------+--------+
|1    | A          | 2341   |
|2    | A          | 341    |
|3    | A          | 15     |
|4    | A          | 15314  |
|5    | A          | 451    |
|6    | A          | 513    |
|7    | B          | 15     |
|8    | B          | 13     |
|9    | B          | 1154   |
|10   | B          | 1345   |
|11   | B          | 1221   |
|12   | B          | 234    |
|13   | C          | 2345   |
|14   | C          | 2645   |
|15   | C          | 2645   |
|16   | C          | 2652   |
|17   | C          | 65     |
+-----+------------+--------+

请编写SQL查询来查找每个公司的薪水中位数。挑战点：你是否可以在不使用任何内置的SQL函数的情况下解决此问题。

+-----+------------+--------+
|Id   | Company    | Salary |
+-----+------------+--------+
|5    | A          | 451    |
|6    | A          | 513    |
|12   | B          | 234    |
|9    | B          | 1154   |
|14   | C          | 2645   |
+-----+------------+--------+

-- 排序开窗，数量开窗
select Id,Company,Salary,row_number() over(partition by Company order by Salary) rk,count(Id) over(partition by Company) ct
from Employee
-- 上一步基础上加条件
select Id,Company,Salary
from
(
select Id,Company,Salary,row_number() over(partition by Company order by Salary) rk,count(Id) over(partition by Company) ct
from Employee
) t
where t.rk in (floor((t.ct+1)/2),floor((t.ct+2)/2))

19、至少有5名直接下属的经理

Employee 表包含所有员工和他们的经理。每个员工都有一个 Id，并且还有一列是经理的 Id。

+------+----------+-----------+----------+
|Id    |Name 	  |Department |ManagerId |
+------+----------+-----------+----------+
|101   |John 	  |A 	      |null      |
|102   |Dan 	  |A 	      |101       |
|103   |James 	  |A 	      |101       |
|104   |Amy 	  |A 	      |101       |
|105   |Anne 	  |A 	      |101       |
|106   |Ron 	  |B 	      |101       |
+------+----------+-----------+----------+

给定 Employee 表，请编写一个SQL查询来查找至少有5名直接下属的经理。对于上表，您的SQL查询应该返回：

+-------+
| Name  |
+-------+
| John  |
+-------+

-- 对ManagerId进行分组找出大于等于5的ManagerId
select ManagerId from Employee group by ManagerId having count(ManagerId)>=5;
-- 然后根据求出的作为条件进行查询
select
	Name
from
	Employee
where
	Id
in (
	select ManagerId from Employee group by ManagerId having count(ManagerId)>=5
)
;

20、571.给定数字的频率查询中位数

-- Numbers 表保存数字的值及其频率。
+----------+-------------+
|  Number  |  Frequency  |
+----------+-------------|
|  0       |  7          |
|  1       |  1          |
|  2       |  3          |
|  3       |  1          |
+----------+-------------+
在此表中，数字为 0, 0, 0, 0, 0, 0, 0, 1, 2, 2, 2, 3，所以中位数是 (0 + 0) / 2 = 0。
+--------+
| median |
+--------|
| 0.0000 |
+--------+

请编写一个查询来查找所有数字的中位数并将结果命名为 median 。

解答：

--  
select Number,
	sum(Frequency) over(order by number) as asc_accumu,
	sum(Frequency) over(order by number) as desc_accumu
	from Numbers
-- 数字的总数
select sum(Frequency) total from Numbers
-- 
select avg(Number) median
from 
	(
    select Number,
	sum(Frequency) over(order by number) as asc_accumu,
	sum(Frequency) over(order by number desc) as desc_accumu
	from Numbers
    ) t1,
    (select sum(Frequency) total from Numbers) t2
where asc_accumu>=total/2 and desc_accumu>=total/2