文章目录
- 1757. 可回收且低脂的产品
- 584. 寻找用户推荐人
- 595. 大的国家
- 1148. 文章浏览 I
- 1683. 无效的推文
- 1378. 使用唯一标识码替换员工ID
- 1068. 产品销售分析 I
- 1581. 进店却未进行过交易的顾客
- 197. 上升的温度
- 1661. 每台机器的进程平均运行时间
- 577. 员工奖金
- 1280. 学生们参加各科测试的次数 - 步骤挺多,多看几遍
- 570. 至少有5名直接下属的经理
- 1934. 确认率 - 挺麻烦的,多看几遍
- 620. 有趣的电影
- 1251. 平均售价
- 1075. 项目员工 I
- 1633. 各赛事的用户注册率
- 1211. 查询结果的质量和占比
- 1193. 每月交易 I
- 1174. 即时食物配送 II
- 550. 游戏玩法分析 IV
- 2356. 每位教师所教授的科目种类的数量
- 1141. 查询近30天活跃用户数
- 1084. 销售分析III
- 596. 超过5名学生的课
- 1729. 求关注者的数量
- 619. 只出现一次的最大数字 - 空值部分有空看看
- 1045. 买下所有产品的客户
- 1731. 每位经理的下属员工数量
- 1789. 员工的直属部门
- 610. 判断三角形
- 180. 连续出现的数字
- 1164. 指定日期的产品价格
- 1204. 最后一个能进入巴士的人
- 1907. 按分类统计薪水
- 子查询
1757. 可回收且低脂的产品
我的初始写法:
SELECT product_id FROM Products WHERE low_fats == 'Y' AND recyclable == 'Y'
改成:
SELECT product_id FROM Products WHERE low_fats = 'Y' AND recyclable = 'Y'
和编程代码不一样,判断一致不需要双等号。
584. 寻找用户推荐人
我的初始写法:
SELECT DISTINCT name FROM Customer WHERE referee_id != 2
改成:
SELECT name FROM Customer WHERE referee_id != 2 OR referee_id IS NULL
DISTINCT 要写在选择的字段前面。这道题不需要DISTINCT。
还有一种写法,用到ifnull
:
要注意null是不可以进行大小比较的,所以要把null转化为0。
select name
from customer
where ifnull(referee_id,0) != 2
-- 不等于还可以这么写:<>
ifnull(x1,x2)
意思为:如果x1是null,那么则转换为x2,否则仍旧输出x1
595. 大的国家
一次写对了。
SELECT name, population, area FROM World WHERE area >= 3000000 OR population >= 25000000
1148. 文章浏览 I
我的初始写法:
SELECT DISTINCT author_id as 'id' ORDER BY id FROM Views WHERE author_id = viewer_id
运行报错。
题解:
SELECT
DISTINCT author_id AS id
FROM
Views
WHERE
author_id = viewer_id
ORDER BY
id
总结:
- AS后面的命名不需要单引号或者双引号
- SELECT选择之后再ORDER BY,这里的ORDER BY之后可以用刚刚命名的id字段。
SQL ORDER BY 关键字
数据库:order by排序语句的用法
二刷一次写对:
select distinct author_id id
from views
where author_id = viewer_id
order by id
1683. 无效的推文
我的初始写法:
SELECT
tweet_id
FROM
Tweets
WHERE
len(content) > 15
运行报错。FUNCTION test.len does not exist
题解:
SELECT
tweet_id
FROM
Tweets
WHERE
CHAR_LENGTH(content) > 15
对于SQL表,用于计算字符串中字符数的最佳函数是 CHAR_LENGTH(str)
,它返回字符串 str
的长度。
另一个常用的函数 LENGTH(str)
在这个问题中也适用,因为列 content
只包含英文字符,没有特殊字符。否则,LENGTH()
可能会返回不同的结果,因为该函数返回字符串 str 的字节数,某些字符包含多于 1 个字节。
以字符 ‘¥’ 为例:CHAR_LENGTH()
返回结果为 1,而 LENGTH()
返回结果为 2,因为该字符串包含 2 个字节。
二刷也没写对
1378. 使用唯一标识码替换员工ID
我的写法:(忘了join怎么搞的)
SELECT name, unique_id
FROM Employees, EmployeeUNI
SELECT name, unique_id
FROM Employees
INNER JOIN EmployeeUNI
ON Employees.id = EmployeeUNI.id
正确答案:
SELECT name, unique_id
FROM Employees
LEFT JOIN EmployeeUNI
ON Employees.id = EmployeeUNI.id
题解:
SELECT
EmployeeUNI.unique_id, Employees.name
FROM
Employees
LEFT JOIN
EmployeeUNI
ON
Employees.id = EmployeeUNI.id;
二刷:
# Write your MySQL query statement below
select unique_id, name
from employees
left join employeeUNI
on employees.id = employeeUNI.id
只要使用左外链接就可以了,用where的链接是不行的。
比如下面这种:
SELECT IFNULL(e2.unique_id, NULL) AS 'unique_id',
e1.name AS 'name'
FROM Employees AS e1, EmployeeUNI AS e2
WHERE e1.id = e2.id
左连接 和 左外连接 是同义词
实在想确定null
也可以这样:
SELECT IFNULL(e2.unique_id, NULL) AS 'unique_id',
e1.name AS 'name'
FROM Employees AS e1
LEFT OUTER JOIN EmployeeUNI AS e2
ON e1.id = e2.id
1068. 产品销售分析 I
我的写法:
一次通过
SELECT product_name, year, price
FROM Sales
LEFT JOIN Product
ON Sales.product_id = Product.product_id
1581. 进店却未进行过交易的顾客
我的写法:
运行结果是空的
SELECT customer_id, transaction_id
FROM Visits
LEFT JOIN Transactions
ON Visits.visit_id = Transactions.visit_id
WHERE transaction_id = null
题解:
-
先LEFT JOIN看看结果
SELECT * FROM Visits LEFT JOIN Transactions ON Visits.visit_id = Transactions.visit_id
-
加上过滤条件
WHERE transaction_id IS NULL
或者WHERE amount IS NULL
,而不是=NULL
。SELECT * FROM Visits LEFT JOIN Transactions ON Visits.visit_id = Transactions.visit_id WHERE transaction_id IS NULL
-
发现customer_id为54的顾客出现了两次,使用GROUP BY聚合一下,然后数没付钱的次数。
SELECT customer_id, count(customer_id) AS count_no_trans FROM Visits LEFT JOIN Transactions ON Visits.visit_id = Transactions.visit_id WHERE transaction_id IS NULL GROUP BY customer_id
不要Count(transaction_id),会变成全是0。
197. 上升的温度
我的解法:
用了子查询,没有报错,但输出结果为空。
SELECT *
FROM Weather
WHERE temperature > (
SELECT Temperature
FROM Weather
WHERE recordDate = recordDate - 1
)
题解:
猴子数据分析
“日期”这一列的数据类型是日期类型(date)。
-
交叉联结
使用交叉联结(corss join)会将两个表中所有的数据两两组合。
直接使用交叉联结的业务需求比较少见,往往需要结合具体条件,对数据进行有目的的提取,本题需要结合的条件就是“前一天”。
这个交叉联结的结果表,可以看作左边三列是表a,右边三列是表b。SELECT * FROM Weather AS a CROSS JOIN Weather AS b
这里必须要
AS a/b
,不然会报错不知道用哪个Weather
。 -
另一个需要着重去考虑的,就是如何找到 “昨天”(前一天),这里为大家介绍两个时间计算的函数:
datediff(日期1, 日期2):
得到的结果是日期1与日期2相差的天数。
如果日期1比日期2大,结果为正;如果日期1比日期2小,结果为负。
另一个关于时间计算的函数是:timestampdiff(时间类型, 日期1, 日期2)
这个函数和上面diffdate的正、负号规则刚好相反。
日期1大于日期2,结果为负,日期1小于日期2,结果为正。
在“时间类型”的参数位置,通过添加“day”, “hour”, “second”等关键词,来规定计算天数差、小时数差、还是分钟数差。SELECT a.id FROM Weather AS a CROSS JOIN Weather AS b WHERE datediff(a.recordDate, b.recordDate) = 1 AND a.temperature > b.temperature
二刷写出来了:
select b.id
from weather a
join weather b
where datediff(b.recordDate, a.recordDate) = 1 and b.temperature > a.temperature
方法二:窗口函数lag()+datediff()
lag()
和lead()
这两个函数可以查询我们得到的结果集上下偏移相应行数的相应的结果。
lag()
函数:
查询当前行向上偏移n行对应的结果
该函数有三个参数:第一个为待查询的参数列名,第二个为向上偏移的位数,第三个参数为超出最上面边界的默认值。lead()
函数:
查询当前行向下偏移n行对应的结果
该函数有三个参数:第一个为待查询的参数列名,第二个为向下偏移的位数,第三个参数为超出最下面边界的默认值。
LAG()
是 窗口函数,它只需对数据进行一次扫描和排序,效率较高。对于大数据量的表来说,窗口函数通常比自连接的执行速度更快。
-
首先计算出今天的记录之前一天的记录
select id, recordDate, temperature, lag(recordDate, 1) over(order by recordDate) as last_date, lag(temperature, 1) over(order by recordDate) as last_temperature from Weather
id recordDate temperature last_date last_temperature 1 2015-01-01 10 null null 2 2015-01-02 25 2015-01-01 10 3 2015-01-03 20 2015-01-02 25 4 2015-01-04 30 2015-01-03 20 last_date 是按 日期顺序 排列后获取的上一行日期。
last_temperature 是按 日期顺序 排列后获取的上一行温度。
⚠️注意:一定要last_date和last_temperature都按日期顺序排序,这样获得的才是上一天的温度。 -
筛选出今天比昨天温度高的
还需要再判断一下datediff(recordDate, last_date) = 1
,不然有的测试用例日期不是连续的会报错select id from a where temperature > last_temperature and datediff(recordDate, last_date) = 1
完整代码:
with a as (
select id, recordDate, temperature,
lag(recordDate, 1) over(order by recordDate) as last_date,
lag(temperature, 1) over(order by recordDate) as last_temperature
from Weather
)
select id
from a
where temperature > last_temperature and datediff(recordDate, last_date) = 1
1661. 每台机器的进程平均运行时间
我的初始答案:
SELECT a.machine_id, SUM(b.timestamp - a.timestamp)/COUNT(a.process_id) AS processing_time
FROM Activity AS a
CROSS JOIN Activity AS b
ON a.machine_id = b.machine_id AND a.process_id = b.process_id
WHERE a.activity_type = 'start' AND b.activity_type = 'end'
GROUP BY machine_id
少了四舍五入保留3位小数:
ROUND(x, 3)
最终答案:
SELECT a.machine_id, ROUND(SUM(b.timestamp - a.timestamp)/COUNT(a.process_id), 3) AS processing_time
FROM Activity AS a
CROSS JOIN Activity AS b
ON a.machine_id = b.machine_id AND a.process_id = b.process_id
WHERE a.activity_type = 'start' AND b.activity_type = 'end'
GROUP BY machine_id
看题解可以优化:
使用自连接而不是交叉连接
# Write your MySQL query statement below
select
a1.machine_id,
round(avg(a2.timestamp -a1.timestamp ),3) as processing_time
from Activity as a1 join Activity as a2 on
a1.machine_id=a2.machine_id and
a1.process_id=a2.process_id and
a1.activity_type ='start' and
a2.activity_type ='end'
group by machine_id;
二刷写出来了:
select a.machine_id, round(avg(b.timestamp - a.timestamp), 3) processing_time
from activity a
join activity b
where a.machine_id = b.machine_id and a.process_id = b.process_id and a.activity_type = 'start' and b.activity_type = 'end'
group by a.machine_id
577. 员工奖金
我的答案:
SELECT e.name, b.bonus
FROM Employee AS e
LEFT JOIN Bonus AS b
ON e.empId = b.empId
WHERE b.bonus < 1000 OR b.bonus IS NULL
测试用例通过了,但耗时太长。
题解:
SELECT name, bonus
FROM Employee
LEFT JOIN Bonus
ON Employee.empId = Bonus.empId
WHERE bonus < 1000 OR bonus IS NULL
结论:将表格重命名也会花费查询的时间。
二刷通过了:
select name, bonus
from employee e
left join bonus b
on e.empId = b.empId
where bonus < 1000 or bonus is null
1280. 学生们参加各科测试的次数 - 步骤挺多,多看几遍
我的答案:
SELECT Students.student_id, Students.student_name, Examinations.subject_name, COUNT(Examinations.subject_name) AS attended_exams
FROM Students
JOIN Examinations
ON Students.student_id = Examinations.student_id
# JOIN Subjects
# ON Examinations.subject_name = Subjects.subject_name
GROUP BY Students.student_id AND Examinations.subject_name
结果:
| student_id | student_name | subject_name | attended_exams |
| ---------- | ------------ | ------------ | -------------- |
| 1 | Alice | Math | 11 |
并没有成功运行出来结果。
题解:
JOIN和CROSS JOIN的区别
JOIN和CROSS JOIN的主要区别在于JOIN需要指定连接条件,而CROSS JOIN不需要。因此,JOIN通常比CROSS JOIN更实用,因为连接条件确保了只有相关的数据行会被连接在一起,而CROSS JOIN可能会产生巨大的结果集。
另一个区别在于CROSS JOIN的语句通常比JOIN的语句更慢,因为它需要计算表的笛卡尔积,而JOIN只需要关注两个表中的相关数据行。
以下是CROSS JOIN和INNER JOIN的示例,以说明它们之间的区别:
SELECT *
FROM table1
CROSS JOIN table2
WHERE table1.column1 = table2.column2;
该语句与以下INNER JOIN语句等效:
SELECT *
FROM table1
INNER JOIN table2
ON table1.column1 = table2.column2;
INNER JOIN在查询之前仅返回相关数据行,而CROSS JOIN将返回两个表的笛卡尔积,并需要额外的WHERE子句来指定连接条件。
-
通过一个子查询创建表
grouped
,它统计每个学生参加每个科目的考试次数。
GROUP BY的条件字段不能用AND连接,如果有两个或以上的查询条件就用","
连接SELECT student_id, subject_name, COUNT(*) AS attended_exams FROM Examinations GROUP BY student_id, subject_name
得到结果:
| student_id | subject_name | attended_exams | | ---------- | ------------ | -------------- | | 1 | Math | 3 | | 1 | Physics | 2 | | 1 | Programming | 1 | | 2 | Programming | 1 | | 13 | Math | 1 | | 13 | Programming | 1 | | 13 | Physics | 1 | | 2 | Math | 1 |
-
为了获得 (
student_id,subject_name
) 的所有组合,我们使用交叉联接将表 Student 中的每一行与表 Subject 中的每一行组合在一起,从而得到两个表中的student_id
和subject_name
的所有可能组合。SELECT * FROM Students CROSS JOIN Subjects
得到结果:
| student_id | student_name | subject_name | | ---------- | ------------ | ------------ | | 1 | Alice | Programming | | 1 | Alice | Physics | | 1 | Alice | Math | | 2 | Bob | Programming | | 2 | Bob | Physics | | 2 | Bob | Math | | 13 | John | Programming | | 13 | John | Physics | | 13 | John | Math | | 6 | Alex | Programming | | 6 | Alex | Physics | | 6 | Alex | Math |
-
将1,2步骤的表
Left join
LEFT JOIN
后面如果有两个及以上字段,需要用AND
连接,这里不能用","
连接。SELECT Students.student_id, Subjects.subject_name, attended_exams FROM Students CROSS JOIN Subjects LEFT JOIN( SELECT student_id, subject_name, COUNT(*) AS attended_exams FROM Examinations GROUP BY student_id, subject_name ) grouped ON Students.student_id=grouped.student_id AND Subjects.subject_name=grouped.subject_name
用
COUNT(*)
而不是COUNT(subject_name)
,不然如果该列出现的NULL值,统计结果会是0.
结果如下:| student_id | subject_name | attended_exams | | ---------- | ------------ | -------------- | | 1 | Programming | 1 | | 1 | Physics | 2 | | 1 | Math | 3 | | 2 | Programming | 1 | | 2 | Physics | null | | 2 | Math | 1 | | 13 | Programming | 1 | | 13 | Physics | 1 | | 13 | Math | 1 | | 6 | Programming | null | | 6 | Physics | null | | 6 | Math | null |
可以看到,结果不是按
student_id
和subject_name
排序的。attended_exams
有一些空值。 -
将3的结果按
student_id
和subject_name
排序,使用IFNULL()
函数将attended_exams
中的空值替换为0,加上之前忘记SELECT
的student_name
。SELECT Students.student_id, Students.student_name, Subjects.subject_name, IFNULL(attended_exams, 0) AS attended_exams FROM Students CROSS JOIN Subjects LEFT JOIN( SELECT student_id, subject_name, COUNT(*) AS attended_exams FROM Examinations GROUP BY student_id, subject_name ) grouped ON Students.student_id=grouped.student_id AND Subjects.subject_name=grouped.subject_name ORDER BY Students.student_id, Subjects.subject_name
结果如下:
| student_id | student_name | subject_name | attended_exams | | ---------- | ------------ | ------------ | -------------- | | 1 | Alice | Math | 3 | | 1 | Alice | Physics | 2 | | 1 | Alice | Programming | 1 | | 2 | Bob | Math | 1 | | 2 | Bob | Physics | 0 | | 2 | Bob | Programming | 1 | | 6 | Alex | Math | 0 | | 6 | Alex | Physics | 0 | | 6 | Alex | Programming | 0 | | 13 | John | Math | 1 | | 13 | John | Physics | 1 | | 13 | John | Programming | 1 |
AC!
这道题2刷要好好看看,多练几遍。
二刷也没写对,
一开始写成这样:
select st.student_id, st.student_name, su.subject_name, count(*) attended_exams
from students st, subjects su, examinations ex
where st.student_id = ex.student_id and ex.subject_name = su.subject_name
group by st.student_id, su.subject_name
order by st.student_id, su.subject_name
后来AC了:
with a as (
select student_id, subject_name, count(*) as attended_exams
from examinations
group by student_id, subject_name
)
select students.student_id, students.student_name, subjects.subject_name, ifnull(attended_exams, 0) attended_exams
from students
cross join subjects
left join a
on students.student_id = a.student_id and subjects.subject_name = a.subject_name
order by students.student_id, subjects.subject_name
注意:
- 可以把a表单独with出来,记得要加as. 完整表达是:“
with a as (...)
” - 本题不要把下面的
students cross join subjects
给with 成b表,不如不with选字段方便,会造成id都有很多空值;这样先students cross join subjects
再一起left join a
就很好 - a里面已经group by了,后面就不需要group by了
- 如果真的有需要with出来两个表的情况,可以这样写:“
with a as (...), b as (...)
”,语法才是正确的
570. 至少有5名直接下属的经理
我的解法:
考虑用子查询
- 先统计
managerId
出现的次数。SELECT managerId, COUNT(*) FROM Employee GROUP BY managerId
- 怎么找出
managerId
出现次数大于等于5的id呢?
我的尝试失败了:SELECT managerId, COUNT(*) AS number FROM Employee GROUP BY managerId WHERE number >= 5
结果:SELECT managerId, COUNT(*) FROM Employee GROUP BY managerId WHERE COUNT(*) >= 5
You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'WHERE COUNT(*) >= 5' at line 5
错误原因:
GROUP BY
之后的判断条件不能用WHERE
,要用HAVING
.
修正:
SELECT managerId
FROM Employee
GROUP BY managerId
HAVING COUNT(*) >= 5
- 想要在原表中找到id在子查询结果中的。
SELECT name FROM Employee WHERE id in ( SELECT managerId FROM Employee GROUP BY managerId HAVING COUNT(*) >= 5 ) bigManager
结果返回错误:
You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'Manager' at line 9
题解做法:
将此表命名为 Manager,随后与 Employee 表做连接操作,得到每个经理的名字。
SELECT name
FROM Employee
JOIN (
SELECT managerId
FROM Employee
GROUP BY managerId
HAVING COUNT(*) >= 5
) Manager
ON Employee.id = Manager.managerId
AC!
二刷AC,跟上面解法一样的
1934. 确认率 - 挺麻烦的,多看几遍
我的解法:
-
先选出每个用户的requst数量
SELECT user_id, COUNT(*) AS request_number FROM Confirmations GROUP BY user_id
结果:
| user_id | request_number | | ------- | -------------- | | 3 | 2 | | 7 | 3 | | 2 | 2 |
-
将
Signups
表LEFT JOIN
Confirmations
表,这样没有request过的用户的request值就是null.SELECT Signups.user_id, action FROM Signups LEFT JOIN Confirmations ON Signups.user_id=Confirmations.user_id
结果:
| user_id | action | | ------- | --------- | | 3 | timeout | | 3 | timeout | | 7 | confirmed | | 7 | confirmed | | 7 | confirmed | | 2 | timeout | | 2 | confirmed | | 6 | null |
-
在2表的基础上计算出每个用户confirmed的数量,得到confirmed表。
SELECT Signups.user_id, count(*) AS confirmed FROM Signups LEFT JOIN Confirmations ON Signups.user_id=Confirmations.user_id WHERE action="confirmed" GROUP BY user_id
结果:
| user_id | confirmed | | ------- | --------- | | 7 | 3 | | 2 | 1 |
-
在2表的基础上GROUP BY user_id + COUNT(action),得到request_number表。
注意是要COUNT(action)而不是COUNT(*)
,区别在于COUNT(*)
是action的数量为0也会计算为1. 也就是说COUNT(action)计算的是action的值相加,COUNT(*)计算的是action的行数。SELECT Signups.user_id, COUNT(action) FROM Signups LEFT JOIN Confirmations ON Signups.user_id=Confirmations.user_id GROUP BY user_id
结果:
| user_id | COUNT(action) | | ------- | ------------- | | 3 | 2 | | 7 | 3 | | 2 | 2 | | 6 | 0 |
-
将confirmed表LEFT JOIN到request_number表上,用0替换变为NULL的值。
SELECT * FROM ( SELECT Signups.user_id, COUNT(action) FROM Signups LEFT JOIN Confirmations ON Signups.user_id=Confirmations.user_id GROUP BY user_id ) request_number LEFT JOIN ( SELECT Signups.user_id, count(*) AS confirmed FROM Signups LEFT JOIN Confirmations ON Signups.user_id=Confirmations.user_id WHERE action="confirmed" GROUP BY user_id ) confirmed ON request_number.user_id=confirmed.user_id
得到结果:
| user_id | COUNT(action) | user_id | confirmed | | ------- | ------------- | ------- | --------- | | 3 | 2 | null | null | | 7 | 3 | 7 | 3 | | 2 | 2 | 2 | 1 | | 6 | 0 | null | null |
SELECT request_number.user_id, request_number, IFNULL(confirmed, 0) AS confirmed_number FROM ( SELECT Signups.user_id, COUNT(action) AS request_number FROM Signups LEFT JOIN Confirmations ON Signups.user_id=Confirmations.user_id GROUP BY user_id ) request_number LEFT JOIN ( SELECT Signups.user_id, count(*) AS confirmed FROM Signups LEFT JOIN Confirmations ON Signups.user_id=Confirmations.user_id WHERE action="confirmed" GROUP BY user_id ) confirmed ON request_number.user_id=confirmed.user_id
结果:
user_id request_number confirmed_number 3 2 0 7 3 3 2 2 1 6 0 0 -
用
confirmed_number/request_number
,并ROUND到小数点后2位。因为user_id
为6的用户的request数为0,因此分母为0,rate的结果是null,这时候再用一次IFNULL。
完整代码如下:SELECT request_number.user_id, IFNULL(ROUND(IFNULL(confirmed, 0)/request_number, 2), 0) AS confirmation_rate FROM ( SELECT Signups.user_id, COUNT(action) AS request_number FROM Signups LEFT JOIN Confirmations ON Signups.user_id=Confirmations.user_id GROUP BY user_id ) request_number LEFT JOIN ( SELECT Signups.user_id, count(*) AS confirmed FROM Signups LEFT JOIN Confirmations ON Signups.user_id=Confirmations.user_id WHERE action="confirmed" GROUP BY user_id ) confirmed ON request_number.user_id=confirmed.user_id
运行结果:
user_id confirmation_rate 3 0 7 1 2 0.5 6 0 AC!
二刷竟然不会写了,怕麻烦了!
再写一遍:
# Write your MySQL query statement below
with confirmed as (
select user_id, count(*) as confirmed
from confirmations
where action="confirmed"
group by user_id
),
requests as (
select user_id, count(*) as requests
from confirmations
group by user_id
)
select a.user_id, ifnull(round(confirmed/requests, 2), 0) as confirmation_rate
from (
select s.user_id, ifnull(requests, 0) as requests
from signups s
left join requests r
on s.user_id = r.user_id
) a
left join confirmed c
on a.user_id = c.user_id
620. 有趣的电影
- ORDER BY column1, column2 ASC|DESC
ASC升序,DESC降序
自己写的:
SELECT *
FROM cinema
WHERE id % 2 = 1 and description != 'boring'
ORDER BY rating DESC
一次通过。
- 总结一下SQL的哪种需要对应需要什么函数
1251. 平均售价
我自己的:
SELECT Prices.product_id, ROUND(SUM(price * units)/SUM(units), 2) AS average_price
FROM Prices
CROSS JOIN UnitsSold
ON Prices.product_id = UnitsSold.product_id
WHERE purchase_date BETWEEN start_date AND end_date
GROUP BY Prices.product_id
结果:
解答错误。
输入:
Prices =
| product_id | start_date | end_date | price |
| ---------- | ---------- | ---------- | ----- |
| 1 | 2019-02-17 | 2019-02-28 | 5 |
| 1 | 2019-03-01 | 2019-03-22 | 20 |
| 2 | 2019-02-01 | 2019-02-20 | 15 |
| 2 | 2019-02-21 | 2019-03-31 | 30 |
| 3 | 2019-02-21 | 2019-03-31 | 30 |
UnitsSold =
| product_id | purchase_date | units |
| ---------- | ------------- | ----- |
| 1 | 2019-02-25 | 100 |
| 1 | 2019-03-01 | 15 |
| 2 | 2019-02-10 | 200 |
| 2 | 2019-03-22 | 30 |
输出:
| product_id | average_price |
| ---------- | ------------- |
| 1 | 6.96 |
| 2 | 16.96 |
预期结果:
| product_id | average_price |
| ---------- | ------------- |
| 1 | 6.96 |
| 2 | 16.96 |
| 3 | 0 |
改成:
# Write your MySQL query statement below
SELECT Prices.product_id, IFNULL(ROUND(SUM(price * units)/SUM(units), 2), 0) AS average_price
FROM Prices
LEFT JOIN UnitsSold
ON Prices.product_id = UnitsSold.product_id
WHERE purchase_date BETWEEN start_date AND end_date
GROUP BY Prices.product_id
CROSS JOIN
改成LEFT JOIN
- 加上
IFNULL(x, 0)
【二刷】
select p.product_id, ifnull(round(sum(price * units)/sum(units), 2), 0) as average_price
from prices p
left join unitssold u
on u.purchase_date <= p.end_date and u.purchase_date >= start_date and p.product_id = u.product_id
group by p.product_id
1075. 项目员工 I
我的答案:
SELECT project_id, ROUND(SUM(experience_years)/COUNT(Project.employee_id), 2) AS average_years
FROM Project
LEFT JOIN Employee
ON Project.employee_id = Employee.employee_id
WHERE experience_years IS NOT NULL
GROUP BY project_id
通过。
要注意的点:如果experience_years是空的,就不计算这个员工——WHERE experience_years IS NOT NULL
二刷:
select project_id, round(avg(experience_years), 2) as average_years
from project p
left join employee e
on p.employee_id = e.employee_id
group by project_id
1633. 各赛事的用户注册率
我的答案:
SELECT contest_id, ROUND(COUNT(user_id) * 100 / (
SELECT COUNT(user_id)
FROM Users
), 2) AS percentage
FROM Register
GROUP BY contest_id
ORDER BY percentage DESC
返回的结果表按 percentage 的 降序 排序了,若相同则按 contest_id 的 升序 排序怎么排序?
——ORDER BY percentage DESC
后面加个contest_id
就可以了。
二刷直接通过,跟上面答案一样。
# Write your MySQL query statement below
SELECT contest_id, ROUND(count(distinct user_id) / (SELECT COUNT(user_id) FROM Users) * 100, 2) AS percentage
FROM Register
GROUP BY contest_id
ORDER BY percentage DESC, contest_id
1211. 查询结果的质量和占比
我的答案:
SELECT table_a.query_name, table_a.quality, IFNULL(ROUND(table_b.poor_query/table_a.count_by_query*100, 2), 0) AS poor_query_percentage
FROM (SELECT query_name, COUNT(result) AS count_by_query, ROUND(SUM(rating/position)/COUNT(result), 2) AS quality
FROM Queries
GROUP BY query_name) table_a
LEFT JOIN (
SELECT query_name, COUNT(rating) AS poor_query
FROM Queries
WHERE rating < 3
GROUP BY query_name
) table_b
ON table_a.query_name = table_b.query_name
WHERE table_a.query_name IS NOT NULL
用了两个子查询,写得很复杂。
官方题解:
SELECT
query_name,
ROUND(AVG(rating/position), 2) quality,
ROUND(SUM(IF(rating < 3, 1, 0)) * 100 / COUNT(*), 2) poor_query_percentage
FROM Queries
GROUP BY query_name
比我多用了AVG
和IF(rating < 3, 1, 0)
,显得很简洁。
为什么不用判断WHERE table_a.query_name IS NOT NULL
?——试了一下,不需要。
最终答案:
SELECT query_name,
ROUND(AVG(rating/position), 2) quality,
ROUND(SUM(IF(rating < 3, 1, 0))/COUNT(*)*100, 2) poor_query_percentage
FROM Queries
GROUP BY query_name
二刷用Pandas
鲸鲸说数据题解
import pandas as pd
def queries_stats(queries: pd.DataFrame) -> pd.DataFrame:
queries['quality'] = queries['rating'] / queries['position']
grouped = queries.groupby('query_name').agg(
quality = ('quality', lambda x: round(x.mean(), 2)),
poor_query_percentage=('rating', lambda x: round((x < 3).sum() / len(x) * 100, 2))
).reset_index()
return grouped
- Pandas.DataFrameGroupBy.aggregate()
reset_index()
:当我们对 queries 进行groupby('query_name').agg(...)
时,query_name 默认会成为 索引,而不是普通的一列。在 groupby().agg() 之后,reset_index() 的作用是 把 groupby 产生的索引转换回普通列,使得返回的 DataFrame 结构更加直观,方便后续处理。- 在grouped里面用round就可以,计算queries[‘quality’]的时候不要用,会导致计算结果不准确
这样的结果仍然会有的测试用例不通过,因为python的round函数保留小数位的问题。
import pandas as pd
from decimal import Decimal, ROUND_HALF_UP
def my_round(x):
return Decimal(x).quantize(Decimal('.00'), rounding=ROUND_HALF_UP)
def queries_stats(queries: pd.DataFrame) -> pd.DataFrame:
queries['quality'] = queries['rating'] / queries['position']
grouped = queries.groupby('query_name').agg(
quality=('quality', lambda x: my_round(x.mean())),
poor_query_percentage=('rating', lambda x: my_round((x < 3).sum() / len(x) * 100))
).reset_index()
return grouped
1193. 每月交易 I
如何提取date中的月份?
官方题解:
本题要求 查找每个月和每个国家/地区的事务数及其总金额、已批准的事务数及其总金额,我们可以将这句话拆分成几个子任务:
- 查找每个月和每个国家/地区。
数据表中的trans_date
是精确到日,我们可以使用DATE_FORMAT()
函数将日期按照年月%Y-%m
输出。比如将2019-01-02
转换成2019-01
。
获取到所有的月份后,使用DATE_FORMAT(trans_date, '%Y-%m')
GROUP BY
聚合每个月和每个国家的记录就完成了第一步。 - 查找总的事务数。
第一步已经将数据按月和国家聚合,只需要使用COUNT
函数就能获取到总的事务数。COUNT(*) AS trans_count
- 查找总金额。
使用SUM
函数计算总金额。SUM(amount) AS trans_total_amount
- 查找已批准的事物数。
已批准的事物的state
标记为approved
。首先使用IF
函数将state = 'approved'
的记录标记为 1,否则为NULL
。再使用COUNT
计算总量。
NULL 不能改成0,否则COUNT会把他计数COUNT(IF(state = 'approved', 1, NULL)) AS approved_count
- 查找已批准的事物的总金额。
和第四步一样,先使用IF
函数,再使用SUM
函数。SUM(IF(state = 'approved', amount, 0)) AS approved_total_amount
看了题解后我的答案:
SELECT DATE_FORMAT(trans_date, '%Y-%m') AS month, country,
COUNT(*) AS trans_count,
COUNT(IF(state='approved', 1, NULL)) AS approved_count,
SUM(amount) AS trans_total_amount,
SUM(IF(state='approved', amount, 0)) AS approved_total_amount
FROM Transactions
GROUP BY month, country
1174. 即时食物配送 II
我的思路:
我可以计算即时订单。
SELECT delivery_id, customer_id, IF(customer_pref_delivery_date = order_date,
'immediate', 'plan') AS status
FROM Delivery
结果:
| delivery_id | customer_id | status |
| ----------- | ----------- | --------- |
| 1 | 1 | plan |
| 2 | 2 | immediate |
| 3 | 1 | plan |
| 4 | 3 | immediate |
| 5 | 3 | plan |
| 6 | 2 | plan |
| 7 | 4 | immediate |...
首次订单怎么计算,需要看题解。
本题最重要的就是求每一个用户的首单数据:我们使用 group by
聚合每个用户的数据,再使用 min
函数求出首单的时间。将 (customer_id, order_date)
作为查询条件,使用 where in
便可查出具体的数据。
SELECT delivery_id, customer_id, min(order_date)
FROM Delivery
GROUP BY customer_id
结果:
| delivery_id | customer_id | min(order_date) |
| ----------- | ----------- | --------------- |
| 1 | 1 | 2019-08-01 |
| 2 | 2 | 2019-08-02 |
| 4 | 3 | 2019-08-21 |
| 7 | 4 | 2019-08-09 |...
与上一步我的题解结合一下:
注意WHERE 后面的条件要加括号
SELECT delivery_id, customer_id, IF(customer_pref_delivery_date = order_date,
'immediate', 'plan') AS status
FROM Delivery
WHERE (customer_id, order_date) IN (
SELECT customer_id, min(order_date)
FROM Delivery
GROUP BY customer_id
)
结果:
delivery_id | customer_id | status |
---|---|---|
1 | 1 | plan |
2 | 2 | immediate |
5 | 3 | plan |
7 | 4 | immediate |
如何计算即时订单在所有用户的首次订单中的比例呢?
题解又给了很聪明的做法,不需要新增一列status,计算customer_pref_delivery_date = order_date
的总数就行:
SELECT ROUND(SUM(customer_pref_delivery_date = order_date)/COUNT(*)*100, 2) AS immediate_percentage
FROM Delivery
WHERE (customer_id, order_date) IN (
SELECT customer_id, min(order_date)
FROM Delivery
GROUP BY customer_id
)
结果:
AC!
【解法二】:排名函数rank() over (partition by customer_id order by order_date asc)
select round(sum(if(d.order_date = d.customer_pref_delivery_date, 1, 0)) * 100
/ count(d.customer_id), 2) as immediate_percentage
from (select customer_id,
order_date,
customer_pref_delivery_date,
rank() over (partition by customer_id order by order_date asc) as rk
from Delivery) as d
where d.rk = 1;
550. 游戏玩法分析 IV
我的解法:
和上题求用户首单一样,这次是求用户首次登录的时间。同样可以先用group by
,再用min
求出。
SELECT ROUND(SUM(IF(event_date-start_date=1, 1, 0))/MAX(Activity.player_id), 2) AS fraction
FROM Activity
LEFT JOIN (
SELECT player_id, min(event_date) AS start_date
FROM Activity
GROUP BY player_id
) A
ON Activity.player_id = A.player_id
不知道怎么计算用户数量,暂且用MAX(Activity.player_id)
。
结果:
解答错误。当用户id只有1和3的时候就不对了。
查了sql语法,改成COUNT(DISTINCT Activity.player_id)
。
SELECT ROUND(SUM(IF(event_date-start_date=1, 1, 0))/COUNT(DISTINCT Activity.player_id), 2) AS fraction
FROM Activity
LEFT JOIN (
SELECT player_id, min(event_date) AS start_date
FROM Activity
GROUP BY player_id
) A
ON Activity.player_id = A.player_id
结果:
依然解答错误。
错误的用例:
| player_id | device_id | event_date | games_played |
| --------- | --------- | ---------- | ------------ |
| 1 | 2 | 2016-02-29 | 5 |
| 1 | 2 | 2016-03-01 | 6 |
| 2 | 3 | 2017-06-25 | 1 |
| 3 | 1 | 2016-03-02 | 0 |
| 3 | 4 | 2018-07-03 | 5 |
跨天减不了吗?
——看题解:
首先,求出所有用户首次登录的第二天的时间。方法是查询出 Activity 表中每个用户的第一天时间,并加上 1.
select player_id, DATE_ADD(MIN(event_date), INTERVAL 1 DAY) as second_date
from Activity
group by player_id
将此表命名为 Expected。随后我们要从 Activity 表中查询 event_date 与 Expected.sencond_date 重叠的部分,注意此判定要限定在用户相同的前提下。这部分用户即为在首次登录后第二天也登录了的用户:
select Activity.player_id as player_id
from (
select player_id, DATE_ADD(MIN(event_date), INTERVAL 1 DAY) as second_date
from Activity
group by player_id
) as Expected, Activity
where Activity.event_date = Expected.second_date and Activity.player_id = Expected.player_id
将此表命名为 Result。随后我们只需要得到 Result 表中用户的数量,以及 Activity 表中用户的数量,相除并保留两位小数即可。
select IFNULL(round(count(distinct(Result.player_id)) /
count(distinct(Activity.player_id)), 2), 0) as fraction
得到启发,可以用INTERVAL 1 DAY
代替减号进行的时间计算。
SELECT ROUND(SUM(IF(event_date=DATE_ADD(start_date, INTERVAL 1 DAY), 1, 0))/COUNT(DISTINCT Activity.player_id), 2) AS fraction
FROM Activity
LEFT JOIN (
SELECT player_id, min(event_date) AS start_date
FROM Activity
GROUP BY player_id
) A
ON Activity.player_id = A.player_id
结果:
AC!
2356. 每位教师所教授的科目种类的数量
太简单了,直接AC.
1141. 查询近30天活跃用户数
太久没写了,完全没思路,还是得一天练一道sql保持手感。
一开始写了这个代码:
SELECT *
FROM Activity
WHERE A.acticity_type != 'end_session'
完全不对,甚至语法错误。
直接看题解:
select activity_date day, count(distinct user_id) active_users
from activity
where datediff('2019-07-27', activity_date) >= 0 AND datediff('2019-07-27', activity_date) <30
group by activity_date
作者:MythicMyuu
链接:https://leetcode.cn/problems/user-activity-for-the-past-30-days-i/solutions/1484069/by-mythicmyuu-mlj9/
笔记:
- 函数和变量都可以用全小写
- select后面的重命名可以不加as,直接重命名
- 如何计算时间间隔:
datediff(时间1,时间2)
- datediff还要限制>=0不然会查询到07-27之后的数据。但< 30不应该有=,因为没有说包含30天当天的数据。
1084. 销售分析III
先自己写了一版:
select product_id, product_name
from product
where product_id in (
select distinct product_id
from sales
where datediff(sale_date, '2019-01-01') >= 0 and datediff(sale_date, '2019-03-31') <= 0
)
这样的结果计算出来多了product_id=2,因为,没有排除除了在春季售出,还是其他时候售出的商品。
改了一版:
select product_id, product_name
from product
where product_id in (
select distinct product_id
from sales
where datediff(sale_date, '2019-01-01') >= 0 and datediff(sale_date, '2019-03-31') <= 0
) and not in (
select distinct product_id
from sales
where datediff(sale_date, '2019-01-01') < 0 and datediff(sale_date, '2019-03-31') > 0
)
语法错误。
看题解:
select product_id, product_name
from product
where product_id not in (
select s.product_id
from sales s
where sale_date < '2019-01-01' or sale_date > '2019-03-31'
)
and product_id in (
select s.product_id
from sales s
)
作者:欢乐马
链接:https://leetcode.cn/problems/sales-analysis-iii/solutions/2254466/yong-bu-ji-de-si-xiang-lai-zuo-not-in-by-3qfs/
笔记:
- 虽然where后面两个条件都是针对product_id的,但第二个条件还是需要加上product_id,不然语法不对
- 这题只是判断时间是否在某段时间之前或之后,可以直接用
'>', '<'
- 刚才的第二段判断写错了,两个区间并不想交,要用
or
- 也可以用题解的思路:第三个条件没什么,直接in sales这个表就好了,限制这个product_id必须是有过出售记录的。
可能是我自己的:
# Write your MySQL query statement below
select product_id, product_name
from product
where product_id in (
select distinct product_id
from sales
where datediff(sale_date, '2019-01-01') >= 0 and datediff(sale_date, '2019-03-31') <= 0
) and product_id not in (
select distinct product_id
from sales
where datediff(sale_date, '2019-01-01') < 0 or datediff(sale_date, '2019-03-31') > 0
)
596. 超过5名学生的课
select class
from courses
group by class
having count(student) >= 5
直接AC了。
⚠️HAVING 后面的聚合函数 可以 出现在 SELECT 后面,但 不是必须的。严格来说,没有必须的情况。
1729. 求关注者的数量
select user_id, count(*) as followers_count
from followers
group by user_id
order by user_id
直接AC了。
619. 只出现一次的最大数字 - 空值部分有空看看
自己先写出来一版:
select num
from MyNumbers
group by num
having count(num) = 1
order by num asc
结果:返回的是一列,而不是最大值
原因:涉及到一个语法——select ... from .... limit a ,b
🌟一万五千字!你最常用的 SQL 这些核心知识点,我都帮你准备好了!
加上之后:
select num
from MyNumbers
group by num
having count(num) = 1
order by num desc
limit 0, 1
结果:
少了返回空值的情况。
试了试ifnull,结果老是不对。
正确写法:
select IFNULL((select num
from MyNumbers
group by num
having count(num) = 1
order by num desc
limit 0, 1)
, null) num
⚠️注意:
ifnull用法注意事项,结果集为空,select中不调用ifnull函数
因为SQL在执行时是按照 FROM -> ON -> OUTER(JOIN) -> WHERE -> GROUP BY -> HAVING -> SELECT -> DISTINCT -> ORDER BY -> limit 这个执行顺序执行的
1045. 买下所有产品的客户
没思路。
题解1:
因为product table的产品数目固定, 所以只要 groupby customer_id 后只要他们中
having count distinct product_key= product table的产品数目,
相当于购买了全部的产品
只需要总产品数量相等就可以了,不需要每个产品都相等去比较一个list
完整代码:
select customer_id
from customer
group by customer_id
having count(distinct product_key) in (
select count(distinct product_key)
from product
)
题解2:
首先会过滤掉Customer表中不在Product表中的product_key,然后再进行分组和计数,保证每个客户购买的产品种类数量等于Product表中产品的总数,从而得到符合条件的客户的customer_id。
完整代码:
select customer_id
from customer
where product_key in (select product_key from product)
group by customer_id
having count(distinct product_key) = (
select count(*)
from product
)
这版更严谨。二刷怎么感觉这版很多余呢。
1731. 每位经理的下属员工数量
select b.employee_id, b.name, count(a.name) as reports_count, round(avg(a.age), 0) as average_age
from employees a
left join employees b
on a.reports_to = b.employee_id
where b.employee_id is not null
group by b.employee_id
order by b.employee_id
AC!
1789. 员工的直属部门
没什么思路,直接看题解了。
题解:强烈建议不要使用UNION,窗口才是正道!官方BUG
【用Union的写法】—— 不严谨
select employee_id, department_id
from employee
where primary_flag = 'Y'
union
select employee_id, department_id
from employee
group by employee_id
having count(department_id) = 1
MySQL不允许SELECT没有在GROUP BY中使用字段。如果启用了ONLY_FULL_GROUP_BY模式,MySQL要求SELECT列表、HAVING条件和ORDER BY子句仅引用GROUP BY子句中的列或聚合函数。
(很迷,上面那么多解法都是这么操作的)
【窗口函数COUNT() OVER
写法】
with q as(
select
employee_id,
department_id,
primary_flag,
count(*) over(partition by employee_id) as count_over
from employee
)
select
employee_id,
department_id
from q
where primary_flag = 'Y' or count_over = 1
笔记:
with...as
:在SQL中被称为公用表表达式(CTE, Common Table Expression),用于定义一个临时的命名结果集,可以在后续的SELECT、INSERT、UPDATE或DELETE语句中引用。在MySQL中,CTE从MySQL 8.0开始被支持。相当于定义一个子查询。- SQL窗口函数详解
610. 判断三角形
判断三角形的条件:
可以使用三角不等式定理。根据这个定理,如果三条边长满足以下条件,那么它们就能构成一个三角形:
- 任意两边之和大于第三边。
- 任意两边之差小于第三边。
知道了判断三角形的条件也不知道怎么把条件的结果筛选出来。
看官方题解:使用 case...when...
select x, y, z, # 三个逗号一个都不能少,因为triangle是另一个筛选值
case
when x + y > z and x + z > y and y + z > x then 'Yes'
else 'No'
end as 'triangle'
from triangle
另一种普通解法:
select x, y, z, IF(x + y > z and x + z > y and y + z > x, 'Yes', 'No') as triangle
from Triangle
180. 连续出现的数字
知道怎么判断出现次数,但是怎么判断连续出现的次数呢?
看官方题解:用 DISTINCT 和 WHERE 语句
连续出现的意味着相同数字的 Id 是连着的,由于这题问的是至少连续出现 3 次,我们使用 Logs 并检查是否有 3 个连续的相同数字。
SELECT *
FROM
Logs l1,
Logs l2,
Logs l3
WHERE
l1.Id = l2.Id - 1
AND l2.Id = l3.Id - 1
AND l1.Num = l2.Num
AND l2.Num = l3.Num
;
id | num | id | num | id | num |
---|---|---|---|---|---|
1 | 1 | 2 | 1 | 3 | 1 |
然后我们从上表中选择任意的 Num 获得想要的答案。同时我们需要添加关键字 DISTINCT ,因为如果一个数字连续出现超过 3 次,会返回重复元素。
SELECT DISTINCT
l1.Num AS ConsecutiveNums
FROM
Logs l1,
Logs l2,
Logs l3
WHERE
l1.Id = l2.Id - 1
AND l2.Id = l3.Id - 1
AND l1.Num = l2.Num
AND l2.Num = l3.Num
;
1164. 指定日期的产品价格
写了一种自己的写法:
select *
from products
group by product_id
order by change_date
返回的结果总是最小的日期。
product_id | new_price | change_date |
---|---|---|
1 | 20 | 2019-08-14 |
2 | 50 | 2019-08-14 |
3 | 20 | 2019-08-18 |
看官方题解:
方法一:left join
和 ifnull
思路
本题的关键点在找到 2019-08-16 前所有有改动的产品及其最新价格和没有没有修改过价格的产品。
我们可以先找到所有的产品,再找到所有 2019-08-16 前有修改的产品和他们最新的价格,使用 left join 将两个查询联合。如果产品没有价格,说明没有修改过,设置为 10,如果有价格,设置为最新的价格。
算法
- 找出所有的产品:
select distinct product_id from products
product_id 1 2 3 - 找到 2019-08-16 前所有有改动的产品的最新价格。
- 使用 max 函数找到产品最新修改的时间。使用 where 查询限制时间小于等于 2019-08-16:
select distinct product_id, max(change_date) from products where change_date <= '2019-08-16' group by product_id
product_id max(change_date) 1 2019-08-16 2 2019-08-14 - 使用 where 子查询,根据 product_id 和 change_date 找到对应的价格:
💡注意:where 查询后面如果是有两个及以上的查询字段,要用括号括起来:(product_id, change_date)select product_id, new_price as price from products where (product_id, change_date) in ( select distinct product_id, max(change_date) from products where change_date <= '2019-08-16' group by product_id )
product_id price 2 50 1 35 - 上面两步已经找到了所有的产品和已经修改过价格的产品。使用 left join 得到所有产品的最新价格,如果没有设置为 10。
select p1.product_id, ifnull(p2.new_price, 10) as price from ( select distinct product_id from products ) as p1 -- 所有的产品 left join ( select product_id, new_price from products where (product_id, change_date) in ( select product_id, max(change_date) from products where change_date <= '2019-08-16' group by product_id ) ) as p2 -- 在 2019-08-16 之前有过修改的产品和最新的价格 on p1.product_id = p2.product_id
- 使用 max 函数找到产品最新修改的时间。使用 where 查询限制时间小于等于 2019-08-16:
方法二:窗口函数 dense_rank()
思路
本题的关键点在找到 2019-08-16 前所有有改动的产品及其最新价格和没有修改过价格的产品,所以 2019-08-16是个重要的分界线!
根据观察输出可知,我们必须找到所有的product_id,但是 2019-08-16这个日期把product_id隔开了,那么我们也必须拆开去分析。
(1)找到2019-08-16以及之前的每个product的new_price(改过的)
(2)用所有的product_id左连接上面的表(记作表1),其中没在表1中出现的product_id所对应的price则是没改动的price,也就是10。
算法
- 找出所有的产品:
select distinct product_id from Products
- 找到 2019-08-16 前所有有改动的产品的最新价格。
方法一:窗口函数 dense_rank()
方法二:不用窗口函数 where in 联合键匹配-
方法一:窗口函数 dense_rank()
找到价格修改的记录,并用dense_rank() 排序select product_id, new_price, change_date, dense_rank() over( partition by product_id order by change_date desc ) as rnk from products where change_date <= '2019-08-16'
💡注意:
- Dense_rank后面要加括号
- order by 后面要加desc
product_id new_price change_date rnk 1 35 2019-08-16 1 1 30 2019-08-15 2 1 20 2019-08-14 3 2 50 2019-08-14 1 子查询 找到价格修改的最新纪录。
select product_id, new_price from ( select product_id, new_price, change_date, dense_rank() over( partition by product_id order by change_date ) as rnk from products where change_date <= '2019-08-16' ) t where rnk = 1
-
这时我们已经找到2019-08-16以及之前的每个product的new_price
- left join补全2019-08-16之后的没改动的product_id和price
select distinct p1.product_id, ifnull(p2.new_price, 10) as price
from products p1
left join (
select product_id, new_price
from (
select product_id, new_price, change_date,
dense_rank() over(
partition by product_id
order by change_date desc
) as rnk
from products
where change_date <= '2019-08-16'
) t
where rnk = 1
) p2
on p1.product_id = p2.product_id
1204. 最后一个能进入巴士的人
先自己写了一下:
select *, sum(weight)
from queue
order by turn asc
where sum(weight) <= 1000
说语法不对,那应该怎么写呢?
题解提示了用窗口函数,自己试一下先:
select *,
sum(weight) over(
order by turn asc
) as weight_sum
from queue
where weight_sum <= 1000
依然语法错误。
看题解和官方题解:
方法一:窗口函数+子查询
思路
本题目的思路为如何找到累积重量达到1000的最后一个上车的乘客,那么我们可以根据turn来去对weight做累积和。然后找到cumu_weight<=1000的最后一个乘客(根据turn排倒序)。
⚠️ 不可能存在weight=0的情况,也就是cumu_weight必然递增,所以order by + limit 1可行。
算法
- 窗口函数求累积和
select
turn,
person_name,
sum(weight) over(order by turn) as cumu_weight
from Queue
- 子查询筛选cumu_weight <= 1000
select
person_name
from
(select
turn,
person_name,
sum(weight) over(order by turn) as cumu_weight
from Queue) t
where cumu_weight <= 1000
- 根据turn排倒序,找到最后一个上车的人
order by turn desc
limit 1
完整代码:
select person_name
from (
select *,
sum(weight) over(
order by turn
) as weight_sum
from queue
) t
where weight_sum <= 1000
order by weight_sum desc
limit 1
AC!
为什么之前那种写法不对:
在 SQL 中,窗口函数(如 SUM() OVER())的使用需要注意执行的顺序和范围。你的问题涉及到为什么不能在一个查询中直接使用 weight_sum 作为条件,而需要嵌套查询。
原因如下:
- SQL 执行顺序:
SQL 查询是按照特定的顺序执行的:
FROM 子句
WHERE 子句
GROUP BY 子句
HAVING 子句
SELECT 子句
ORDER BY 子句
窗口函数是在 SELECT 阶段执行的,但 WHERE 子句在 SELECT 之前执行。这意味着你不能在 WHERE 子句中直接使用窗口函数的结果,因为窗口函数的计算结果尚未生成。- 窗口函数的计算:
在窗口函数的计算中,SUM(weight) OVER (ORDER BY turn ASC)
会为每一行计算一个累积和,而这个累积和在 SELECT 子句计算之前是不可用的。- 嵌套查询的必要性:
为了使用窗口函数的计算结果作为过滤条件,我们需要先计算窗口函数的结果,并在外层查询中使用这些结果进行过滤。这就是为什么需要嵌套查询的原因。嵌套查询确保窗口函数的计算结果已经存在,然后外层查询可以使用这些结果进行过滤。
方法二:自连接
思路
将 b 表中的每一条数据和 a 表的每一条数据连接。
select *
from queue a, queue b
接下来对连接后的数据进行处理.
我们使用 a 表的 person_id 表示自身,b 表中的数据表示为包括自己在内的所有人。使用 GROUP BY a.person_id 处理每个人的数据。因为要计算每个人的 weight 加上之前所有人的 weight,使用查询条件 a.turn >= b.turn 找到所有在他之前以及他自己的重量。
select *
from queue a, queue b
where a.turn >= b.turn
group by a.person_id
再使用 SUM 计算总和并过滤掉大于 1000 的数据。
select a.person_name
from queue a, queue b
where a.turn >= b.turn
group by a.person_id
having sum(b.weight) <= 1000
拿到所有满足条件的数据后,只需要再对 a.turn 倒序取第一条即可。
select a.person_name
from queue a, queue b
where a.turn >= b.turn
group by a.person_id
having sum(b.weight) <= 1000
order by a.turn desc
limit 1
AC!
体会一下自连接的做法,不过还是没窗口函数清晰。
1907. 按分类统计薪水
自己的想法,做3个子查询:
select count(account_id) as low_count
from accounts
where income < 20000
select count(account_id) as high_count
from accounts
where income > 50000
select count(account_id) as ave_count
from accounts
where income between 20000 and 50000
但是怎么union到一起呢?
题解:Uinon
# Write your MySQL query statement below
SELECT 'High Salary' category,count(1) accounts_count FROM accounts WHERE income>50000
UNION
SELECT 'Average Salary' category,count(1) accounts_count FROM accounts WHERE income>=20000 AND income<=50000
UNION
SELECT 'Low Salary' category,count(1) accounts_count FROM accounts WHERE income<20000
官方题解:case...when...
+ union
SELECT
'Low Salary' AS category,
SUM(CASE WHEN income < 20000 THEN 1 ELSE 0 END) AS accounts_count
FROM
Accounts
UNION
SELECT
'Average Salary' category,
SUM(CASE WHEN income >= 20000 AND income <= 50000 THEN 1 ELSE 0 END)
AS accounts_count
FROM
Accounts
UNION
SELECT
'High Salary' category,
SUM(CASE WHEN income > 50000 THEN 1 ELSE 0 END) AS accounts_count
FROM
Accounts
子查询
1978. 上级经理已离职的公司员工
用子查询直接AC!
626. 换座位
自己按子查询写了一版,除了如果表格是奇数不知道怎么处理,其他都搞出来了:
# Write your MySQL query statement below
-- select *
-- from seat
-- where id % 2 = 1
-- select *
-- from seat
-- where id % 2 = 0
select *
from (
select a.id as id, b.student as student#, b.id, a.student
from (
select *
from seat
where id % 2 = 1
) a
join (
select *
from seat
where id % 2 = 0
) b
where b.id - a.id = 1
UNION
select b.id as id, a.student as student
from (
select *
from seat
where id % 2 = 1
) a
join (
select *
from seat
where id % 2 = 0
) b
where b.id - a.id = 1
) c
order by id
select if(id % 2 = 0, id - 1,
if(id = (select count(distinct id) from seat), id, id + 1)
) as id, student
from seat
order by id
同时还解决了多一个奇数行的问题。
1341. 电影评分
这一题特别体现了UNION和UNION ALL的区别:UNION ALL不去重,UNION要去重。
# Write your MySQL query statement below
(select name as results
from MovieRating
left join users
on MovieRating.user_id = users.user_id
group by MovieRating.user_id
order by count(rating) desc, name
limit 0, 1)
UNION ALL
(select title as results
from MovieRating
left join movies
on MovieRating.movie_id = movies.movie_id
where created_at between '2020-02-01' and '2020-02-28'
group by MovieRating.movie_id
order by avg(rating) desc, title
limit 1)
笔记:
- 日期条件:
date_format(created_at, '%Y-%m') = '2020-02'
- 其实order by里面不能出现有聚合函数,之前哪一道题解里面说了——复习的时候看看人家怎么解的。
1321.餐馆营业额变化增长
我自己写的用窗口函数的代码:
select visited_on,
sum(amount) over(
partition by visited_on
rows between 6 preceding and current row
) as amount,
avg(amount) over(
partition by visited_on
rows between 6 preceding and current row
) as average_amount
from customer
结果:
visited_on | amount | average_amount |
---|---|---|
2019-01-01 | 100 | 100 |
2019-01-02 | 110 | 110 |
2019-01-03 | 120 | 120 |
2019-01-04 | 130 | 130 |
2019-01-05 | 110 | 110 |
2019-01-06 | 140 | 140 |
2019-01-07 | 150 | 150 |
2019-01-08 | 80 | 80 |
2019-01-09 | 110 | 110 |
2019-01-10 | 130 | 130 |
2019-01-10 | 280 | 140 |
select visited_on,
sum(amount) over(
partition by visited_on
rows between 6 preceding and current row
) as amount,
round(avg(amount) over(
partition by visited_on
rows between 6 preceding and current row
), 2) as average_amount
from customer
where datediff(visited_on, (select min(visited_on) from customer)) >= 6
结果:
visited_on | amount | average_amount |
---|---|---|
2019-01-07 | 150 | 150 |
2019-01-08 | 80 | 80 |
2019-01-09 | 110 | 110 |
2019-01-10 | 130 | 130 |
2019-01-10 | 280 | 140 |
以前我的解法:
select visited_on,
sum(amount) over (
order by visited_on
rows between 6 preceding and current row
) as amount,
round(avg(amount) over (
order by visited_on
rows between 6 preceding and current row
), 2) as average_amount
from customer
where visited_on >= (select min(visited_on) from customer) + interval 6 day
order by visited_on;
⚠️注意窗口函数over里面不要加逗号。
结果:
visited_on | amount | average_amount |
---|---|---|
2019-01-07 | 150 | 150 |
2019-01-08 | 230 | 115 |
2019-01-09 | 340 | 113.33 |
2019-01-10 | 470 | 117.5 |
2019-01-10 | 620 | 124 |
两种结果都不正确。
跟我的结果比较相近的题解:窗口函数求移动平均和总和
SELECT visited_on,amount,average_amount
FROM (
SELECT visited_on,
SUM(amount) OVER (ORDER BY visited_on ROWS 6 PRECEDING) AS amount,
ROUND(AVG(amount)OVER(ORDER BY visited_on ROWS 6 PRECEDING),2) AS average_amount
FROM (
SELECT visited_on,SUM(amount) AS amount
FROM Customer
GROUP BY visited_on
) TABLE_1
) TABLE_2
WHERE DATEDIFF(visited_on,(SELECT MIN(visited_on) FROM Customer)) >=6
笔记:
- 就是说先用子查询把sum计算出来,再用窗口函数,然后再从刚才的表里面查询。
- 为什么不是partition by而是order by呢?
【chatgpt解答】
3. 为什么要像题解一样先用子查询把sum计算出来,再用窗口函数,然后再从刚才的表里面查询?
假设你使用了如下的窗口函数来计算累计和:
SELECT visited_on,
amount,
SUM(amount) OVER (ORDER BY visited_on) AS running_total
FROM customer;
这个查询会按照visited_on的顺序对amount进行累加。但如果在同一天有多个记录(比如这里的2019-01-10),查询结果中会分别显示每一条记录的累计和。
其结果可能是:
visited_on amount running_total
2019-01-01 100 100
2019-01-02 210 310
2019-01-03 330 640
2019-01-04 460 1100
2019-01-05 570 1670
2019-01-06 710 2380
2019-01-07 860 3240
2019-01-08 840 4080
2019-01-09 840 4920
2019-01-10 850 5770
2019-01-10 870 6640
为了避免这种情况,你可以对visited_on进行分组,然后计算每一天的总和,再对总和进行累计计算。例如:
WITH daily_totals AS (
SELECT visited_on,
SUM(amount) AS daily_amount
FROM customer
GROUP BY visited_on
)
SELECT visited_on,
daily_amount,
SUM(daily_amount) OVER (ORDER BY visited_on) AS running_total
FROM daily_totals;
- 为什么不是partition by而是order by呢?
在窗口函数中使用 PARTITION BY 是可以的,但它的作用和我们预期的不一样。PARTITION BY 的主要作用是将数据按某个字段分组,然后在每个组内进行窗口函数的计算。如果在你的查询中使用 PARTITION BY visited_on,每一组内的数据都会被单独处理,导致每组内的累加都是从零开始的,这和我们想要的结果——累加总和(running total)是不一致的。
只有上面依然是不够的,我写出了下面的代码:
with daily_total as (
select visited_on, sum(amount) as daily_amount
from customer
group by visited_on
)
select visited_on,
sum(daily_amount) over(
order by visited_on
rows between 6 preceding and current row
) as amount,
round(avg(daily_amount) over(
order by visited_on
rows between 6 preceding and current row
), 2) as average_amount
from daily_total
where datediff(visited_on, (select min(visited_on) from customer)) >= 6
依然不能正确计算结果。
原因如下:
- 题解代码正确的原因是它按照以下顺序执行:
- 首先对每个日期计算总金额。
- 使用窗口函数对总金额进行滚动计算,确保计算的是每个日期的前7天(包括当前天)的总和和平均值。
- 最后筛选出从第7天开始的数据,这样窗口函数的结果包含了准确的前7天数据。
- 我的代码可能在窗口函数的计算过程中因为数据的过滤而导致计算结果不准确。
题解代码段在数据过滤之后进行窗口函数的计算,确保了计算的准确性。with daily_total as ( select visited_on, sum(amount) as daily_amount from customer group by visited_on ) select visited_on, amount, average_amount from ( select visited_on, sum(daily_amount) over( order by visited_on rows between 6 preceding and current row ) as amount, round(avg(daily_amount) over( order by visited_on rows between 6 preceding and current row ), 2) as average_amount from daily_total ) daily_total_average where datediff(visited_on, (select min(visited_on) from customer)) >= 6
AC!
关于日期间隔的两种表示方法复习的时候再总结。
602. 好友申请 II :谁有最多的好友
不会写,直接看官方题解。
题解:官方代码写复杂了
统计某个id在requester_id和accepter_id中出现的次数和。
直接使用union会合并重复数据,因此这里使用union all。
统计后排序再取表中第一项即可。
with t1 as (
select requester_id id from RequestAccepted
union all
select accepter_id id from RequestAccepted
)
select id, count(*) num
from t1
group by id
order by num desc
limit 1
585. 2016年的投资
select *
from Insurance a
join Insurance b
where a.pid != b.pid and a.tiv_2015 = b.tiv_2015
– 自己的想法,剩下的不会了
优秀题解:
分析
这题的难点有两个:
- 一个人在 2015 年投保额至少和任何一个人在 2015年的投保额相同
- 筛选出不重复的地点
解决了这两个方法,这题就解决了。 方法一使用的是窗口函数,方法二使用的是子查询。
方法一
解析
使用窗口函数进行分组
- 筛选出一个人在 2015 年投保额至少和任何一个人在 2015 年的投保额相同,就是按照 2015 年的投保额进行分组,并计算个数。
count(*) over(partition by tiv_2015) count_tiv_2015
count_tiv_2015 3 3 3 1 - 筛选不同的地点,就是按照经纬度进行分组,计算每个经纬度的个数
count(*) over(partition by lat, lon) count_lat_lon
count_lat_lon 1 2 2 1 - 将 1 、 2 两步和 tiv_2016 字段作为临时表 temp
tiv_2016 count_tiv_2015 count_lat_lon 5 3 1 30 3 2 20 1 2 40 3 1 - 查询
temp
筛选出问题中的两步- count_tiv_2015 > 1 投保额 tiv_2015 至少有两个人是相等的
- count_lat_lon = 1 经纬度是唯一的
- 通过筛选条件后使用
sum()
计算出tiv_2016
并且使用round()
保留两个小数select round(sum(tiv_2016), 2) tiv_2016 from ( select tiv_2016, count(*) over(partition by tiv_2015) count_tiv_2015, count(*) over(partition by lat, lon) count_lat_lon from insurance ) as temp where count_lat_lon = 1 and count_tiv_2015 > 1
方法二
解析
使用子查询
-
筛选出一个人在 2015 年投保额至少和任何一个人在 2015 年的投保额相同,使用
group by
按照tiv_2015
分组,并且使用count()
计算,筛选出大于 1 的数据,因为大于 1 代表至少有两个人在 2015 年投保额相同select tiv_2015 from insurance group by tiv_2015 having count(*) > 1;
-
筛选不同的地点,就是按照
lat
和lon
进行分组,这里查询的字段是使用concat()
连接lat
,lon
后的值,并且使用count()
计算,筛选出等于 1 的数据,因为等于 1 代表地点唯一ps:使用 in 操作会消耗性能
select concat(lat, lon) from insurance group by lat, lon having count(*) = 1;
-
查询
insurance
,筛选条件是两个子查询,tiv_2015 in 1
和concat(lat, lon) in 2
-
通过筛选条件后使用
sum()
计算出tiv_2016
并且使用round()
保留两个小数select round(sum(tiv_2016), 2) tiv_2016 from insurance where tiv_2015 in ( select tiv_2015 from insurance group by tiv_2015 having count(*) > 1 ) and concat(lat, lon) in ( select concat(lat, lon) from insurance group by lat, lon having count(*) = 1 );
185. 部门工资前三高的所有员工
自己写了个,但是不完整,想到用窗口函数,但是没有合适的:
select d.name Department, e.name employee, salary
from employee e
left join department d
on e.departmentId = d.id
看官方题解,感觉乱七八糟的。
看高赞题解,感觉也论七八糟的。
猴子学数据分析题解-图解SQL面试题:经典TOPN问题:使用了我想法中的窗口函数,缺点是会报错,需要改一下,直接再和department表内联结一下获得部门名字就可以通过了
with a as (
select *,
dense_rank() over (
partition by departmentId
order by salary desc
) as r # 注意这里不要用rank做别名,rank是保留字
from employee
)
select b.name Department, a.name employee, salary
from a, Department b
where a.departmentId = b.id and r <= 3
笔记:
-
注意这里不要用rank做别名,rank是保留字
-
只能是dense_rank,row_number和rank都不行
这是dense_rank的排名结果:| Department | employee | salary | r | | ---------- | -------- | ------ | - | | IT | Max | 90000 | 1 | | IT | Joe | 85000 | 2 | | IT | Randy | 85000 | 2 | | IT | Will | 70000 | 3 | | Sales | Henry | 80000 | 1 | | Sales | Sam | 60000 | 2 |
这是row_number的排名结果:
| Department | employee | salary | r | | ---------- | -------- | ------ | - | | IT | Max | 90000 | 1 | | IT | Joe | 85000 | 2 | | IT | Randy | 85000 | 3 | | Sales | Henry | 80000 | 1 | | Sales | Sam | 60000 | 2 |
这是rank的排名结果:
| Department | employee | salary | r | | ---------- | -------- | ------ | - | | IT | Max | 90000 | 1 | | IT | Joe | 85000 | 2 | | IT | Randy | 85000 | 3 | | Sales | Henry | 80000 | 1 | | Sales | Sam | 60000 | 2 |
怎么看出来题目需要的是dense_rank?可以从输出答案看出来。
1667. 修复表中的名字
完全没见过,直接看题解。
考察的是SQL中相关字符串函数。
一、计算字段
其实本题主要考察的就是计算字段的使用。
二、知识点
2.1 CONCAT()
函数
CONCAT 可以将多个字符串拼接在一起。
2.2 LEFT(str, length)
函数
从左开始截取字符串,length 是截取的长度。
2.3 UPPER(str)
与 LOWER(str)
UPPER(str)
将字符串中所有字符转为大写
LOWER(str)
将字符串中所有字符转为小写
2.4 SUBSTRING(str, begin, end)
截取字符串,end 不写默认为空。
SUBSTRING(name, 2)
从第二个截取到末尾,注意并不是下标,就是第二个。
● CONCAT 用来拼接字符串 ● LEFT 从左边截取字符 ● RIGHT 从右边截取字符 ● UPPER 变为大写 ● LOWER 变为小写 ● LENGTH 获取字符串长度
select user_id, concat(upper(left(name, 1)), lower(right(name, length(name)-1))) as name
from users
order by user_id
1527. 患某种疾病的患者
MySQL 正则表达式|菜鸟教程
MySQL LIKE 子句|菜鸟教程
上面两种都可以。第一遍刷只看了正则表达式。
题解
select *
from patients
where conditions regexp "^DIAB1|\\sDIAB1"
笔记:正则表达式中的第二种情况,没有使用'^'
和‘$’
就可以表示在开头|在中间|在结尾都可以了,不需要还专门用字符匹配前后的占位。
196. 删除重复的电子邮箱
对「官方」题解中 “delete” 和 “>” 的解释,推荐!
delete p1
from Person p1, Person p2
where p1.email = p2.email and p1.id > p2.id
176. 第二高的薪水
用ifnull
不是这样用的:
select ifnull(salary, null) as SecondHighestSalary
from employee
order by salary
limit 1, 1
是这样用的:
select ifnull((
select distinct salary
from employee
order by salary desc
limit 1, 1
), null) as SecondHighestSalary
1484. 按日期分组销售产品
参考题解,重点掌握group_concat
函数用法。
select sell_date, count(distinct product) num_sold,
group_concat(distinct product order by product separator ',') products
from activities
group by sell_date
order by sell_date
1327. 列出指定时间段内所有的下单产品
select product_name, sum(unit) unit
from orders
left join products
on orders.product_id = products.product_id
where date_format(order_date, '%Y-%m') = '2020-02'
group by product_name
having unit >= 100
笔记:
- 日期条件:
date_format(created_at, '%Y-%m') = '2020-02'
- 注意%m要小写,%Y要大写
1517. 查找拥有有效邮箱的用户
select *
from users
where mail REGEXP "^[a-z|A-Z][a-z|A-Z|0-9|_|/.|/-]*@leetcode[/.]com$"
笔记:
- 下划线
_
不需要/
- 域名里面的
.
需要/