以下题目都来自力扣(LeetCode)
1、175.组合两个表
-- 表1:Person
+-------------+---------+
| 列名 | 类型 |
+-------------+---------+
| PersonId | int |
| FirstName | varchar |
| LastName | varchar |
+-------------+---------+
PersonId 是上表主键
-- 表2:Address
+-------------+---------+
| 列名 | 类型 |
+-------------+---------+
| AddressId | int |
| PersonId | int |
| City | varchar |
| State | varchar |
+-------------+---------+
AddressId 是上表主键
要求:编写一个 SQL 查询,满足条件:无论 person 是否有地址信息,都需要基于上述两表提供 person 的以下信息:
FirstName, LastName, City, State
分析:因为表 Address 中的 personId 是表 Person 的外关键字,所以我们可以连接这两个表来获取一个人的地址信息。考虑到可能不是每个人都有地址信息,我们应该使用 outer join 而不是默认的 inner join
-- 方法:使用outer join
select FirstName, LastName, City, State
from Person left join Address
on Person.PersonId = Address.PersonId
;
2、176.第二高薪水
编写一个 SQL 查询,获取 Employee
表中第二高的薪水(Salary) 。
+----+--------+
| Id | Salary |
+----+--------+
| 1 | 100 |
| 2 | 200 |
| 3 | 300 |
+----+--------+
例如上述 Employee
表,SQL查询应该返回 200
作为第二高的薪水。如果不存在第二高的薪水,那么查询应返回 null
。
方法一:使用子查询和LIMIT子句
-- 将不同的薪资按降序排序,然后使用limit子句获得第二高的薪资
select(select distinct Salary as SecondHighestSalary from Employee order by Salary desc limit 1 offset 1) AS SecondHighestSalary
;
方法二:使用 ifnull
和 limit
select ifnull((select distinct Salary as SecondHighestSalary from Employee order by Salary desc limit 1 offset 1),null) as SecondHighestSalray
;
方法三:取小于最大值的最大值即第二大
-- 先查询最大值
select max(Salary) from Employee
-- 再查询小于最大值的最大值
select max(Salary) from Employee where Salray<(select max(Salary) from Employee);
扩展:limit
limit 2,3:意思为跳过2条数据,取3条数据
limit 3 offset 1:意思为跳过1条,取3条数据
3、177.第N高的薪水
编写一个 SQL 查询,获取 Employee
表中第 n 高的薪水(Salary)
+----+--------+
| Id | Salary |
+----+--------+
| 1 | 100 |
| 2 | 200 |
| 3 | 300 |
+----+--------+
例如上述 Employee
表,n = 2 时,应返回第二高的薪水 200
。如果不存在第 n 高的薪水,那么查询应返回 null
。
说明:相同的值为同一排名
-- 分组去重,排序,使用limit n,1,n为排名-1
CREATE FUNCTION getNthHighestSalary(N INT) RETURNS INT
BEGIN
SET N := N-1;
RETURN (
select salary from employee group by salary order by Salary desc limit N,1
);
END
4、178.分数排名
编写一个 SQL 查询来实现分数排名。
如果两个分数相同,则两个分数排名(Rank)相同。请注意,平分后的下一个名次应该是下一个连续的整数值。换句话说,名次之间不应该有“间隔”。
+----+-------+
| Id | Score |
+----+-------+
| 1 | 3.50 |
| 2 | 3.65 |
| 3 | 4.00 |
| 4 | 3.85 |
| 5 | 4.00 |
| 6 | 3.65 |
+----+-------+
例如,根据上述给定的 Scores
表,你的查询应该返回(按分数从高到低排列):
+-------+------+
| Score | Rank |
+-------+------+
| 4.00 | 1 |
| 4.00 | 1 |
| 3.85 | 2 |
| 3.65 | 3 |
| 3.65 | 3 |
| 3.50 | 4 |
+-------+------+
**重要提示:**对于 MySQL 解决方案,如果要转义用作列名的保留字,可以在关键字之前和之后使用撇号。例如 `Rank`
select Score,dense_rank() over(order by Score desc) `Rank` from Scores;
扩展:排序函数
Rank():排序相同时会重复,总数不会变
score Rank()
84 1
84 1
76 3
68 4
Dense_Rank():排序相同时会重复,总数会减小
score Dense_Rank()
84 1
84 1
76 2
68 3
Row_Number():会根据顺序计算
score Dense_Rank()
84 1
84 2
76 3
68 4
5、180.连续出现的数字
-- 表logs
+-------------+---------+
| Column Name | Type |
+-------------+---------+
| id | int |
| num | varchar |
+-------------+---------+
id 是这个表的主键。
编写一个 SQL 查询,查找所有至少连续出现三次的数字。
返回的结果表中的数据可以按 任意顺序 排列。
查询结果格式如下面的例子所示:
Logs 表:
+----+-----+
| Id | Num |
+----+-----+
| 1 | 1 |
| 2 | 1 |
| 3 | 1 |
| 4 | 2 |
| 5 | 1 |
| 6 | 2 |
| 7 | 2 |
+----+-----+
Result 表:
+-----------------+
| ConsecutiveNums |
+-----------------+
| 1 |
+-----------------+
1 是唯一连续出现至少三次的数字。
解法一:
-- 使用 Logs 并检查是否有 3 个连续的相同数字
select * from
Logs l1,Logs l2,Logs l3
where l1.id=l2.id-1 and l2.id=l3.id-1 and l1.num = l2.num and l2.num = l3.num;
Id Num Id Num Id Num
1 1 2 1 3 1
-- 然后我们从上表中选择任意的 Num 获得想要的答案。同时我们需要添加关键字 DISTINCT ,因为如果一个数字连续出现超过 3 次,会返回重复元素。
select distinct l1.num as ConsecutiveNums from
Logs l1,Logs l2,Logs l3
where l1.id=l2.id-1 and l2.id=l3.id-1 and l1.num = l2.num and l2.num = l3.num;
这种解法没有通用性,且效率不高
解法二:需要有主键id
-- 先用开窗函数进行分组排序(mysql版本8支持这些函数),再开窗排序
select *,row_number() over(partition by num order by id) as `rank`
,row_number() over(order by id) as `id2` from log
+----+-----+----+-------+
| Id | Num |rank|id2 |
+----+-----+----+------+|
| 1 | 1 | 1 | 1 |
| 2 | 1 | 2 | 2 |
| 3 | 1 | 3 | 3 |
| 5 | 1 | 4 | 4 |
| 4 | 2 | 1 | 5 |
| 6 | 2 | 2 | 6 |
| 7 | 2 | 3 | 7 |
+----+-----+----+-------+
-- 分组,分组条件为id - id2
select distinct t1.num
from(select *,row_number() over(partition by num order by id) as `rank`
,row_number() over(order by id) as `id2` from log) t1
group by (t1.id-t1.id2),t1.num
having count(*)>=3
6、181.超过经理收入的员工
Employee
表包含所有员工,他们的经理也属于员工。每个员工都有一个 Id,此外还有一列对应员工的经理的 Id。
+----+-------+--------+-----------+
| Id | Name | Salary | ManagerId |
+----+-------+--------+-----------+
| 1 | Joe | 70000 | 3 |
| 2 | Henry | 80000 | 4 |
| 3 | Sam | 60000 | NULL |
| 4 | Max | 90000 | NULL |
+----+-------+--------+-----------+
给定 Employee
表,编写一个 SQL 查询,该查询可以获取收入超过他们经理的员工的姓名。在上面的表格中,Joe 是唯一一个收入超过他的经理的员工。
+----------+
| Employee |
+----------+
| Joe |
+----------+
-- 自连接加条件
select t1.Name as Employee from Employee t1 join Employee t2 on t1.ManagerId=t2.Id where t1.Salary>t2.Salary
;
7、182.查找重复的电子邮箱
编写一个 SQL 查询,查找 Person
表中所有重复的电子邮箱。
+----+---------+
| Id | Email |
+----+---------+
| 1 | a@b.com |
| 2 | c@d.com |
| 3 | a@b.com |
+----+---------+
根据以上输入,你的查询应返回以下结果:
+---------+
| Email |
+---------+
| a@b.com |
+---------+
**说明:**所有电子邮箱都是小写字母。
-- 直接根据邮箱分组,加条件数大于1的就是重复的
select Email from Person group by Email having count(Email)>1;
8、183.从不订购的客户
某网站包含两个表,Customers
表和 Orders
表。编写一个 SQL 查询,找出所有从不订购任何东西的客户。
-- Customers
+----+-------+
| Id | Name |
+----+-------+
| 1 | Joe |
| 2 | Henry |
| 3 | Sam |
| 4 | Max |
+----+-------+
-- Orders
+----+------------+
| Id | CustomerId |
+----+------------+
| 1 | 3 |
| 2 | 1 |
+----+------------+
例如给定上述表格,你的查询应返回:
+-----------+
| Customers |
+-----------+
| Henry |
| Max |
+-----------+
-- 使用子查询作为条件
select Name as Customers from `Customers` where `Id` not in
(select CustomerId from Orders);
-- 使用左连接加条件
select c.Name Customers from `Customers` c left join Orders o on c.id=o.CustomerId where o.CustomerId is null;
9、184.部门工资最高的员工
-- Employee 表包含所有员工信息,每个员工有其对应的 Id, salary 和 department Id
+----+-------+--------+--------------+
| Id | Name | Salary | DepartmentId |
+----+-------+--------+--------------+
| 1 | Joe | 70000 | 1 |
| 2 | Jim | 90000 | 1 |
| 3 | Henry | 80000 | 2 |
| 4 | Sam | 60000 | 2 |
| 5 | Max | 90000 | 1 |
+----+-------+--------+--------------+
-- Department 表包含公司所有部门的信息
+----+----------+
| Id | Name |
+----+----------+
| 1 | IT |
| 2 | Sales |
+----+----------+
-- 编写一个 SQL 查询,找出每个部门工资最高的员工。对于上述表,您的 SQL 查询应返回以下行(行的顺序无关紧要)。
+------------+----------+--------+
| Department | Employee | Salary |
+------------+----------+--------+
| IT | Max | 90000 |
| IT | Jim | 90000 |
| Sales | Henry | 80000 |
+------------+----------+--------+
-- 解释:
Max 和 Jim 在 IT 部门的工资都是最高的,Henry 在销售部的工资最高。
-- 方法一:使用多行子查询
-- 第一步,先根据分组求出最大工资
select DepartmentId,max(Salary) from Employee
-- 连接两个表,使用join,因为有的表中可能为空,where条件2个字段用in
select
Department.Name as Department,
Employee.Name as Employee,
Salary
from
Employee
join
Department
on
Employee.DepartmentId=Department.Id
where
(Employee.DepartmentId,Salary)
in
(select DepartmentId,max(Salary) from Employee)
;
-- 方法二:先开窗排序,然后连接另外一张表在家条件查询
-- 先开窗排序
select
Name,
Salary,
DepartmentId,
dense_rank() over(partition by DepartmentId order by Salary desc) as rk
from
Employee;
-- join连接表格,并加条件查询,条件为排名为1的即为最大值
select
B.Name Department,
A.Name Employee,
A.Salary Salary
from (
select
Name,
Salary,
DepartmentId,
dense_rank() over(partition by DepartmentId order by Salary desc) as rk
from
Employee
) A
join
Department B
on
A.DepartmentId=B.Id
where
A.rk=1
;
扩展:多行子查询
当是多行子查询时,主查询语句的条件语句中引用子查询结果时必须用多行比较符号(IN,ANY,ALL 等)来进行比较
10、185.部门工资前三高的所有员工
-- `Employee` 表包含所有员工信息,每个员工有其对应的工号 `Id`,姓名 `Name`,工资 `Salary` 和部门编号 `DepartmentId` 。
+----+-------+--------+--------------+
| Id | Name | Salary | DepartmentId |
+----+-------+--------+--------------+
| 1 | Joe | 85000 | 1 |
| 2 | Henry | 80000 | 2 |
| 3 | Sam | 60000 | 2 |
| 4 | Max | 90000 | 1 |
| 5 | Janet | 69000 | 1 |
| 6 | Randy | 85000 | 1 |
| 7 | Will | 70000 | 1 |
+----+-------+--------+--------------+
-- Department 表包含公司所有部门的信息
+----+----------+
| Id | Name |
+----+----------+
| 1 | IT |
| 2 | Sales |
+----+----------+
-- 编写一个 SQL 查询,找出每个部门获得前三高工资的所有员工。例如,根据上述给定的表,查询结果应返回:
+------------+----------+--------+
| Department | Employee | Salary |
+------------+----------+--------+
| IT | Max | 90000 |
| IT | Randy | 85000 |
| IT | Joe | 85000 |
| IT | Will | 70000 |
| Sales | Henry | 80000 |
| Sales | Sam | 60000 |
+------------+----------+--------+
-- 解释:
IT 部门中,Max 获得了最高的工资,Randy 和 Joe 都拿到了第二高的工资,Will 的工资排第三。销售部门(Sales)只有两名员工,Henry 的工资最高,Sam 的工资排第二。
-- 解法一:先开窗,在联合加条件
-- 开窗排序
select Name,Salary,DepartmentId,dense_rank() over(partition by DepartmentId order by Salary desc) from Employee
-- 联合查询加条件
select
B.Name Department,
A.Name Employee,
A.Salary Salary
from (select Name,Salary,DepartmentId,dense_rank() over(partition by DepartmentId order by Salary desc) as rk from Employee) as A
join Department as B on A.DepartmentId=B.Id
where A.rk<=3
-- 解法二:先联合,再开窗,最后再查询
-- 先内连接
select Employee.Name,Employee.Salary,Employee.DepartmentId,Department.Name
from Employee
join Department
on Employee.DepartmentId=Department.Id
-- 开窗排序
select A.eName eName,A.Salary Salary,A.DepartmentId DepartmentId,A.Name Name,
dense_rank() over(partition by A.DepartmentId order by A.Salary desc) as rk
from(
select Employee.Name eName,Employee.Salary Salary,Employee.DepartmentId DepartmentId,Department.Name dName
from Employee
join Department
on Employee.DepartmentId=Department.Id
) as A
-- 条件查询
select
B.dName Department,
B.eName Employee,
B.Salary Salary
from
(select A.eName eName,A.Salary Salary,A.DepartmentId DepartmentId,A.dName dName,
dense_rank() over(partition by A.DepartmentId order by A.Salary desc) as rk
from(
select Employee.Name eName,Employee.Salary Salary,Employee.DepartmentId DepartmentId,Department.Name dName
from Employee
join Department
on Employee.DepartmentId=Department.Id
) as A) as B where B.rk<=3
11、196.删除重复的电子邮箱
编写一个 SQL 查询,来删除 Person
表中所有重复的电子邮箱,重复的邮箱里只保留 Id 最小 的那个。
+----+------------------+
| Id | Email |
+----+------------------+
| 1 | john@example.com |
| 2 | bob@example.com |
| 3 | john@example.com |
+----+------------------+
Id 是这个表的主键。
例如,在运行你的查询语句之后,上面的 Person
表应返回以下几行:
+----+------------------+
| Id | Email |
+----+------------------+
| 1 | john@example.com |
| 2 | bob@example.com |
+----+------------------+
-- 自连接条件为
select p1.*
from Person p1,Person p2 where p1.Email=p2.Email and p1.Id>p2.Id
;
-- 删除操作
delete p1 from Person p1,Person p2 where p1.Email=p2.Email and p1.Id>p2.Id
12、197.上升的温度
-- Weather
+---------------+---------+
| Column Name | Type |
+---------------+---------+
| id | int |
| recordDate | date |
| temperature | int |
+---------------+---------+
id 是这个表的主键
该表包含特定日期的温度信息
编写一个 SQL 查询,来查找与之前(昨天的)日期相比温度更高的所有日期的 id
。
返回结果 不要求顺序 。
查询结果格式如下例:
Weather
+----+------------+-------------+
| id | recordDate | Temperature |
+----+------------+-------------+
| 1 | 2015-01-01 | 10 |
| 2 | 2015-01-02 | 25 |
| 3 | 2015-01-03 | 20 |
| 4 | 2015-01-04 | 30 |
+----+------------+-------------+
Result table:
+----+
| id |
+----+
| 2 |
| 4 |
+----+
2015-01-02 的温度比前一天高(10 -> 25)
2015-01-04 的温度比前一天高(20 -> 30)
-- 直接自连接,条件为前一天温度大于第二天温度
select
w1.id id
from
Weather w1,Weather w2
where
date_add(w1.recordDate,INTERVAL -1 DAY)=w2.recordDate
and
w1.Temperature>w2.Temperature
;
扩展:MySQL中日期函数
-- date_add
date_add(datetime,interval expr type)
参数类型:year、month、day、hour、minute、second、
举例:
select date_add('2021-09-24',interval -1 day) === 2021-09-23 -- 可以是负数
select date_add('2021-09-24',interval '1_1' year_month) == 2022-10-24 -- 需要单引号
13、262.行程和用户
-- 表:Trips
+-------------+----------+
| Column Name | Type |
+-------------+----------+
| Id | int |
| Client_Id | int |
| Driver_Id | int |
| City_Id | int |
| Status | enum |
| Request_at | date |
+-------------+----------+
Id 是这张表的主键。
这张表中存所有出租车的行程信息。每段行程有唯一 Id ,其中 Client_Id 和 Driver_Id 是 Users 表中 Users_Id 的外键。
Status 是一个表示行程状态的枚举类型,枚举成员为(‘completed’, ‘cancelled_by_driver’, ‘cancelled_by_client’) 。
-- 表:Users
+-------------+----------+
| Column Name | Type |
+-------------+----------+
| Users_Id | int |
| Banned | enum |
| Role | enum |
+-------------+----------+
Users_Id 是这张表的主键。
这张表中存所有用户,每个用户都有一个唯一的 Users_Id ,Role 是一个表示用户身份的枚举类型,枚举成员为 (‘client’, ‘driver’, ‘partner’) 。
Banned 是一个表示用户是否被禁止的枚举类型,枚举成员为 (‘Yes’, ‘No’) 。
写一段 SQL 语句查出 “2013-10-01” 至 “2013-10-03” 期间非禁止用户(乘客和司机都必须未被禁止)的取消率。非禁止用户即 Banned 为 No 的用户,禁止用户即 Banned 为 Yes 的用户。
取消率 的计算方式如下:(被司机或乘客取消的非禁止用户生成的订单数量) / (非禁止用户生成的订单总数)。
返回结果表中的数据可以按任意顺序组织。其中取消率 Cancellation Rate 需要四舍五入保留 两位小数 。
查询结果格式如下例所示:
Trips 表:
+----+-----------+-----------+---------+---------------------+------------+
| Id | Client_Id | Driver_Id | City_Id | Status | Request_at |
+----+-----------+-----------+---------+---------------------+------------+
| 1 | 1 | 10 | 1 | completed | 2013-10-01 |
| 2 | 2 | 11 | 1 | cancelled_by_driver | 2013-10-01 |
| 3 | 3 | 12 | 6 | completed | 2013-10-01 |
| 4 | 4 | 13 | 6 | cancelled_by_client | 2013-10-01 |
| 5 | 1 | 10 | 1 | completed | 2013-10-02 |
| 6 | 2 | 11 | 6 | completed | 2013-10-02 |
| 7 | 3 | 12 | 6 | completed | 2013-10-02 |
| 8 | 2 | 12 | 12 | completed | 2013-10-03 |
| 9 | 3 | 10 | 12 | completed | 2013-10-03 |
| 10 | 4 | 13 | 12 | cancelled_by_driver | 2013-10-03 |
+----+-----------+-----------+---------+---------------------+------------+
Users 表:
+----------+--------+--------+
| Users_Id | Banned | Role |
+----------+--------+--------+
| 1 | No | client |
| 2 | Yes | client |
| 3 | No | client |
| 4 | No | client |
| 10 | No | driver |
| 11 | No | driver |
| 12 | No | driver |
| 13 | No | driver |
+----------+--------+--------+
Result 表:
+------------+-------------------+
| Day | Cancellation Rate |
+------------+-------------------+
| 2013-10-01 | 0.33 |
| 2013-10-02 | 0.00 |
| 2013-10-03 | 0.50 |
+------------+-------------------+
2013-10-01:
- 共有 4 条请求,其中 2 条取消。
- 然而,Id=2 的请求是由禁止用户(User_Id=2)发出的,所以计算时应当忽略它。
- 因此,总共有 3 条非禁止请求参与计算,其中 1 条取消。
- 取消率为 (1 / 3) = 0.33
2013-10-02:
- 共有 3 条请求,其中 0 条取消。
- 然而,Id=6 的请求是由禁止用户发出的,所以计算时应当忽略它。
- 因此,总共有 2 条非禁止请求参与计算,其中 0 条取消。
- 取消率为 (0 / 2) = 0.00
2013-10-03:
- 共有 3 条请求,其中 1 条取消。
- 然而,Id=8 的请求是由禁止用户发出的,所以计算时应当忽略它。
- 因此,总共有 2 条非禁止请求参与计算,其中 1 条取消。
- 取消率为 (1 / 2) = 0.50
-- 第一步,先找到User表中被禁止的id
select
Users_id
from
Users
where
Banned='yes'
;
-- 第二步,从Trips中排除掉被禁止的ID
-- 注意:因为不确定被禁止的id为client_Id还是Driver_Id
select *
from Trips t
where
Client_Id
not in
(select Users_id from Users where Banned='yes')
and
Driver_Id
not in
(select Users_id from Users where Banned='yes')
;
-- 第三部,根据提供的计算方法,计算取消率
select
Request_at Day,
round((sum(if(Status='cancelled_by_driver' or Status='cancelled_by_driver',1,0)) / count(Status)),2) as `Cancellation Rate `
from Trips t
where
Client_Id
not in
(select Users_id from Users where Banned='yes')
and
Driver_Id
not in
(select Users_id from Users where Banned='yes')
and
Request_at
between '2013-10-01' and '2013-10-03'
group by
Request_at
;
14、511.游戏玩法分析I
-- 活动表 Activity:
+--------------+---------+
| Column Name | Type |
+--------------+---------+
| player_id | int |
| device_id | int |
| event_date | date |
| games_played | int |
+--------------+---------+
表的主键是 (player_id, event_date)。
这张表展示了一些游戏玩家在游戏平台上的行为活动。
每行数据记录了一名玩家在退出平台之前,当天使用同一台设备登录平台后打开的游戏的数目(可能是 0 个)。
写一条 SQL 查询语句获取每位玩家 第一次登陆平台的日期。
查询结果的格式如下所示:
Activity 表:
+-----------+-----------+------------+--------------+
| player_id | device_id | event_date | games_played |
+-----------+-----------+------------+--------------+
| 1 | 2 | 2016-03-01 | 5 |
| 1 | 2 | 2016-05-02 | 6 |
| 2 | 3 | 2017-06-25 | 1 |
| 3 | 1 | 2016-03-02 | 0 |
| 3 | 4 | 2018-07-03 | 5 |
+-----------+-----------+------------+--------------+
Result 表:
+-----------+-------------+
| player_id | first_login |
+-----------+-------------+
| 1 | 2016-03-01 |
| 2 | 2017-06-25 |
| 3 | 2016-03-02 |
+-----------+-------------+
-- 使用group by 分组,然后min求最小值
select
player_id,
min(event_date) first_login
from
Activity
group by
player_id
;
15、512.游戏玩法分析II
-- Table: Activity
+--------------+---------+
| Column Name | Type |
+--------------+---------+
| player_id | int |
| device_id | int |
| event_date | date |
| games_played | int |
+--------------+---------+
(player_id, event_date) 是这个表的两个主键
这个表显示的是某些游戏玩家的游戏活动情况
每一行是在某天使用某个设备登出之前登录并玩多个游戏(可能为0)的玩家的记录
请编写一个 SQL 查询,描述每一个玩家首次登陆的设备名称
查询结果格式在以下示例中:
Activity table:
+-----------+-----------+------------+--------------+
| player_id | device_id | event_date | games_played |
+-----------+-----------+------------+--------------+
| 1 | 2 | 2016-03-01 | 5 |
| 1 | 2 | 2016-05-02 | 6 |
| 2 | 3 | 2017-06-25 | 1 |
| 3 | 1 | 2016-03-02 | 0 |
| 3 | 4 | 2018-07-03 | 5 |
+-----------+-----------+------------+--------------+
Result table:
+-----------+-----------+
| player_id | device_id |
+-----------+-----------+
| 1 | 2 |
| 2 | 3 |
| 3 | 1 |
+-----------+-----------+
-- 在511的基础上联合加条件查询一下就行了
select
A.player_id player_id,
A.device_id device_id
from
Activity A,
(select
player_id,
min(event_date) first_login
from
Activity
group by
player_id) B
where
A.player_id=B.player_id
and
B.first_login=A.event_date
;
16、513.游戏玩法分析III
-- Table: Activity
+--------------+---------+
| Column Name | Type |
+--------------+---------+
| player_id | int |
| device_id | int |
| event_date | date |
| games_played | int |
+--------------+---------+
(player_id,event_date)是此表的主键。
这张表显示了某些游戏的玩家的活动情况。
每一行是一个玩家的记录,他在某一天使用某个设备注销之前登录并玩了很多游戏(可能是 0 )。
编写一个 SQL 查询,同时报告每组玩家和日期,以及玩家到目前为止玩了多少游戏。也就是说,在此日期之前玩家所玩的游戏总数。详细情况请查看示例。
查询结果格式如下所示:
Activity table:
+-----------+-----------+------------+--------------+
| player_id | device_id | event_date | games_played |
+-----------+-----------+------------+--------------+
| 1 | 2 | 2016-03-01 | 5 |
| 1 | 2 | 2016-05-02 | 6 |
| 1 | 3 | 2017-06-25 | 1 |
| 3 | 1 | 2016-03-02 | 0 |
| 3 | 4 | 2018-07-03 | 5 |
+-----------+-----------+------------+--------------+
Result table:
+-----------+------------+---------------------+
| player_id | event_date | games_played_so_far |
+-----------+------------+---------------------+
| 1 | 2016-03-01 | 5 |
| 1 | 2016-05-02 | 11 |
| 1 | 2017-06-25 | 12 |
| 3 | 2016-03-02 | 0 |
| 3 | 2018-07-03 | 5 |
+-----------+------------+---------------------+
对于 ID 为 1 的玩家,2016-05-02 共玩了 5+6=11 个游戏,2017-06-25 共玩了 5+6+1=12 个游戏。
对于 ID 为 3 的玩家,2018-07-03 共玩了 0+5=5 个游戏。
请注意,对于每个玩家,我们只关心玩家的登录日期。
解法一:开窗
-- 求和开窗
select
player_id,
event_date,
sum(games_played) over(partition by player_id order by event_date rows unbounded predecing and current row) as games_played_so_far
from Activity
解法二:内连接加条件
-- 内连接加条件
1 2 2016-03-01 5 1 2 2016-03-01 5
1 2 2016-03-01 5 1 2 2016-05-02 6
1 2 2016-05-02 6 1 2 2016-05-02 6
1 2 2016-03-01 5 1 3 2017-06-25 1
1 2 2016-05-02 6 1 3 2017-06-25 1
1 3 2017-06-25 1 1 3 2017-06-25 1
3 1 2016-03-02 0 3 1 2016-03-02 0
3 1 2016-03-02 0 3 4 2018-07-03 5
3 4 2018-07-03 5 3 4 2018-07-03 5
-- 通过b的id和日期进行分组,然后求和a的数量就是我们要求的数量
-- 内连接加条件
select
b.player_id player_id,
b.event_date event_date,
sum(a.games_played) as games_played_so_far
from
Activity a
inner join
Activity b
on
a.player_id=b.player_id
and
a.event_date<=b.event_date
group by
b.player_id,b.event_date
;
1 2016-03-01 5
1 2016-05-02 11
1 2017-06-25 12
3 2016-03-02 0
3 2018-07-03 5
扩展:over()具体用法
over()
:指定分析函数工作的数据窗口大小,这个数据窗口大小可能会随着行的改变而改变
current row
:当前行
n preceding
:往前n行数据
n following
:往后n行数据
unbounded
:起点
unbounded preceding
:表示从前面的起点
unbounded following
:表示到后面的终点
LAG(col,n,default_val)
:往前第n行数据
LEAD(col,n, default_val)
:往后第n行数据
NTILE(n)
:把有序窗口的行分发到指定数据的组中,各个组有编号,编号从1开始,对于每一行,NTILE返回 此行所属的组的编号。注意:n必须为int类型。
-- 用法
select name,orderdate,cost,
sum(cost) over() as sample1,--所有行相加
sum(cost) over(partition by name) as sample2,--按name分组,组内数据相加
sum(cost) over(partition by name order by orderdate) as sample3,--按name分组,组内数据累加
sum(cost) over(partition by name order by orderdate rows between UNBOUNDED PRECEDING and current row ) as sample4 ,--和sample3一样,由起点到当前行的聚合
sum(cost) over(partition by name order by orderdate rows between 1 PRECEDING and current row) as sample5, --当前行和前面一行做聚合
sum(cost) over(partition by name order by orderdate rows between 1 PRECEDING AND 1 FOLLOWING ) as sample6,--当前行和前边一行及后面一行
sum(cost) over(partition by name order by orderdate rows between current row and UNBOUNDED FOLLOWING ) as sample7 --当前行及后面所有行
from business
;
17、550.游戏玩法分析IV
-- Table: Activity
+--------------+---------+
| Column Name | Type |
+--------------+---------+
| player_id | int |
| device_id | int |
| event_date | date |
| games_played | int |
+--------------+---------+
(player_id,event_date)是此表的主键。
这张表显示了某些游戏的玩家的活动情况。
每一行是一个玩家的记录,他在某一天使用某个设备注销之前登录并玩了很多游戏(可能是 0)。
编写一个 SQL 查询,报告在首次登录的第二天再次登录的玩家的比率,四舍五入到小数点后两位。换句话说,您需要计算从首次登录日期开始至少连续两天登录的玩家的数量,然后除以玩家总数。
查询结果格式如下所示:
Activity table:
+-----------+-----------+------------+--------------+
| player_id | device_id | event_date | games_played |
+-----------+-----------+------------+--------------+
| 1 | 2 | 2016-03-01 | 5 |
| 1 | 2 | 2016-03-02 | 6 |
| 2 | 3 | 2017-06-25 | 1 |
| 3 | 1 | 2016-03-02 | 0 |
| 3 | 4 | 2018-07-03 | 5 |
+-----------+-----------+------------+--------------+
Result table:
+-----------+
| fraction |
+-----------+
| 0.33 |
+-----------+
只有 ID 为 1 的玩家在第一天登录后才重新登录,所以答案是 1/3 = 0.33
-- 1、先查询出每个玩家第一天登录的id和日期,这个日期做一下处理,推后一天
select
player_id,
date_add(min(event_date),interval 1 day) event_date
from
Activity
group by
player_id
; --t1
-- 2、根据1的结果从Activity表中找到相关的数据(这里可以使用多列子查询,或者连接加条件)
-- 2.1、多列子查询
select
player_id,
event_date
from
Activity
where
(player_id,event_date)
in (t1)
;
-- 2.2、连接加条件
select
A.player_id player_id,
A.event_date event_date
from
Activity A
join
(t1) B
on
A.player_id=B.player_id
;
-- 3、做计算用到一个子查询,分母的统计总人数为(select count(distinct player_id) from Activity),分子为步骤2得到的player_id的数量为:count(player_id),最后再在外层加一个四舍五入的函数round
select
round(count(player_id)/(select count(distinct player_id) from Activity),2) as fraction
from
Activity
where
(player_id,event_date)
in (
select
player_id,
date_add(min(event_date),interval 1 day) event_date
from
Activity
group by
player_id
)
;
18、569.员工薪水中位数
-- Employee 表包含所有员工。Employee 表有三列:员工Id,公司名和薪水。
+-----+------------+--------+
|Id | Company | Salary |
+-----+------------+--------+
|1 | A | 2341 |
|2 | A | 341 |
|3 | A | 15 |
|4 | A | 15314 |
|5 | A | 451 |
|6 | A | 513 |
|7 | B | 15 |
|8 | B | 13 |
|9 | B | 1154 |
|10 | B | 1345 |
|11 | B | 1221 |
|12 | B | 234 |
|13 | C | 2345 |
|14 | C | 2645 |
|15 | C | 2645 |
|16 | C | 2652 |
|17 | C | 65 |
+-----+------------+--------+
请编写SQL查询来查找每个公司的薪水中位数。挑战点:你是否可以在不使用任何内置的SQL函数的情况下解决此问题。
+-----+------------+--------+
|Id | Company | Salary |
+-----+------------+--------+
|5 | A | 451 |
|6 | A | 513 |
|12 | B | 234 |
|9 | B | 1154 |
|14 | C | 2645 |
+-----+------------+--------+
-- 排序开窗,数量开窗
select Id,Company,Salary,row_number() over(partition by Company order by Salary) rk,count(Id) over(partition by Company) ct
from Employee
-- 上一步基础上加条件
select Id,Company,Salary
from
(
select Id,Company,Salary,row_number() over(partition by Company order by Salary) rk,count(Id) over(partition by Company) ct
from Employee
) t
where t.rk in (floor((t.ct+1)/2),floor((t.ct+2)/2))
19、至少有5名直接下属的经理
Employee
表包含所有员工和他们的经理。每个员工都有一个 Id,并且还有一列是经理的 Id。
+------+----------+-----------+----------+
|Id |Name |Department |ManagerId |
+------+----------+-----------+----------+
|101 |John |A |null |
|102 |Dan |A |101 |
|103 |James |A |101 |
|104 |Amy |A |101 |
|105 |Anne |A |101 |
|106 |Ron |B |101 |
+------+----------+-----------+----------+
给定 Employee
表,请编写一个SQL查询来查找至少有5名直接下属的经理。对于上表,您的SQL查询应该返回:
+-------+
| Name |
+-------+
| John |
+-------+
-- 对ManagerId进行分组找出大于等于5的ManagerId
select ManagerId from Employee group by ManagerId having count(ManagerId)>=5;
-- 然后根据求出的作为条件进行查询
select
Name
from
Employee
where
Id
in (
select ManagerId from Employee group by ManagerId having count(ManagerId)>=5
)
;
20、571.给定数字的频率查询中位数
-- Numbers 表保存数字的值及其频率。
+----------+-------------+
| Number | Frequency |
+----------+-------------|
| 0 | 7 |
| 1 | 1 |
| 2 | 3 |
| 3 | 1 |
+----------+-------------+
在此表中,数字为 0, 0, 0, 0, 0, 0, 0, 1, 2, 2, 2, 3,所以中位数是 (0 + 0) / 2 = 0。
+--------+
| median |
+--------|
| 0.0000 |
+--------+
请编写一个查询来查找所有数字的中位数并将结果命名为 median
。
解答:
--
select Number,
sum(Frequency) over(order by number) as asc_accumu,
sum(Frequency) over(order by number) as desc_accumu
from Numbers
-- 数字的总数
select sum(Frequency) total from Numbers
--
select avg(Number) median
from
(
select Number,
sum(Frequency) over(order by number) as asc_accumu,
sum(Frequency) over(order by number desc) as desc_accumu
from Numbers
) t1,
(select sum(Frequency) total from Numbers) t2
where asc_accumu>=total/2 and desc_accumu>=total/2