一刷用MySQL
1757. 可回收且低脂的产品
我的初始写法:
SELECT product_id FROM Products WHERE low_fats == 'Y' AND recyclable == 'Y'
改成:
SELECT product_id FROM Products WHERE low_fats = 'Y' AND recyclable = 'Y'
和编程代码不一样,判断一致不需要双等号。
584. 寻找用户推荐人
我的初始写法:
SELECT DISTINCT name FROM Customer WHERE referee_id != 2
改成:
SELECT name FROM Customer WHERE referee_id != 2 OR referee_id IS NULL
DISTINCT 要写在选择的字段前面。这道题不需要DISTINCT。
595. 大的国家
一次写对了。
SELECT name, population, area FROM World WHERE area >= 3000000 OR population >= 25000000
1148. 文章浏览 I
我的初始写法:
SELECT DISTINCT author_id as 'id' ORDER BY id FROM Views WHERE author_id = viewer_id
运行报错。
题解:
SELECT
DISTINCT author_id AS id
FROM
Views
WHERE
author_id = viewer_id
ORDER BY
id
总结:
- AS后面的命名不需要单引号或者双引号
- SELECT选择之后再ORDER BY,这里的ORDER BY之后可以用刚刚命名的id字段。
SQL ORDER BY 关键字
数据库:order by排序语句的用法
1683. 无效的推文
我的初始写法:
SELECT
tweet_id
FROM
Tweets
WHERE
len(content) > 15
运行报错。FUNCTION test.len does not exist
题解:
SELECT
tweet_id
FROM
Tweets
WHERE
CHAR_LENGTH(content) > 15
对于SQL表,用于计算字符串中字符数的最佳函数是 CHAR_LENGTH(str),它返回字符串 str
的长度。
另一个常用的函数 LENGTH(str) 在这个问题中也适用,因为列 content
只包含英文字符,没有特殊字符。否则,LENGTH() 可能会返回不同的结果,因为该函数返回字符串 str 的字节数,某些字符包含多于 1 个字节。
以字符 ‘¥’ 为例:CHAR_LENGTH() 返回结果为 1,而 LENGTH() 返回结果为 2,因为该字符串包含 2 个字节。
1378. 使用唯一标识码替换员工ID
我的写法:(忘了join怎么搞的)
SELECT name, unique_id
FROM Employees, EmployeeUNI
SELECT name, unique_id
FROM Employees
INNER JOIN EmployeeUNI
ON Employees.id = EmployeeUNI.id
正确答案:
SELECT name, unique_id
FROM Employees
LEFT JOIN EmployeeUNI
ON Employees.id = EmployeeUNI.id
题解:
SELECT
EmployeeUNI.unique_id, Employees.name
FROM
Employees
LEFT JOIN
EmployeeUNI
ON
Employees.id = EmployeeUNI.id;
1068. 产品销售分析 I
我的写法:
一次通过
SELECT product_name, year, price
FROM Sales
LEFT JOIN Product
ON Sales.product_id = Product.product_id
1581. 进店却未进行过交易的顾客
我的写法:
运行结果是空的
SELECT customer_id, transaction_id
FROM Visits
LEFT JOIN Transactions
ON Visits.visit_id = Transactions.visit_id
WHERE transaction_id = null
题解:
-
先LEFT JOIN看看结果
SELECT * FROM Visits LEFT JOIN Transactions ON Visits.visit_id = Transactions.visit_id
-
加上过滤条件
WHERE transaction_id IS NULL
或者WHERE amount IS NULL
,而不是=NULL
。SELECT * FROM Visits LEFT JOIN Transactions ON Visits.visit_id = Transactions.visit_id WHERE transaction_id IS NULL
-
发现customer_id为54的顾客出现了两次,使用GROUP BY聚合一下,然后数没付钱的次数。
SELECT customer_id, count(customer_id) AS count_no_trans FROM Visits LEFT JOIN Transactions ON Visits.visit_id = Transactions.visit_id WHERE transaction_id IS NULL GROUP BY customer_id
不要Count(transaction_id),会变成全是0。
197. 上升的温度
我的解法:
用了子查询,没有报错,但输出结果为空。
SELECT *
FROM Weather
WHERE temperature > (
SELECT Temperature
FROM Weather
WHERE recordDate = recordDate - 1
)
题解:
猴子数据分析
“日期”这一列的数据类型是日期类型(date)。
-
交叉联结
使用交叉联结(corss join)会将两个表中所有的数据两两组合。
直接使用交叉联结的业务需求比较少见,往往需要结合具体条件,对数据进行有目的的提取,本题需要结合的条件就是“前一天”。
这个交叉联结的结果表,可以看作左边三列是表a,右边三列是表b。SELECT * FROM Weather AS a CROSS JOIN Weather AS b
这里必须要
AS a/b
,不然会报错不知道用哪个Weather
。 -
另一个需要着重去考虑的,就是如何找到 “昨天”(前一天),这里为大家介绍两个时间计算的函数:
datediff(日期1, 日期2):
得到的结果是日期1与日期2相差的天数。
如果日期1比日期2大,结果为正;如果日期1比日期2小,结果为负。
另一个关于时间计算的函数是:
timestampdiff(时间类型, 日期1, 日期2)
这个函数和上面diffdate的正、负号规则刚好相反。
日期1大于日期2,结果为负,日期1小于日期2,结果为正。
在“时间类型”的参数位置,通过添加“day”, “hour”, “second”等关键词,来规定计算天数差、小时数差、还是分钟数差。
SELECT a.id FROM Weather AS a CROSS JOIN Weather AS b WHERE datediff(a.recordDate, b.recordDate) = 1 AND a.temperature > b.temperature
1661. 每台机器的进程平均运行时间
我的初始答案:
SELECT a.machine_id, SUM(b.timestamp - a.timestamp)/COUNT(a.process_id) AS processing_time
FROM Activity AS a
CROSS JOIN Activity AS b
ON a.machine_id = b.machine_id AND a.process_id = b.process_id
WHERE a.activity_type = 'start' AND b.activity_type = 'end'
GROUP BY machine_id
少了四舍五入保留3位小数:
ROUND(x, 3)
最终答案:
SELECT a.machine_id, ROUND(SUM(b.timestamp - a.timestamp)/COUNT(a.process_id), 3) AS processing_time
FROM Activity AS a
CROSS JOIN Activity AS b
ON a.machine_id = b.machine_id AND a.process_id = b.process_id
WHERE a.activity_type = 'start' AND b.activity_type = 'end'
GROUP BY machine_id
看题解可以优化:
使用自连接而不是交叉连接
# Write your MySQL query statement below
select
a1.machine_id,
round(avg(a2.timestamp -a1.timestamp ),3) as processing_time
from Activity as a1 join Activity as a2 on
a1.machine_id=a2.machine_id and
a1.process_id=a2.process_id and
a1.activity_type ='start' and
a2.activity_type ='end'
group by machine_id;
577. 员工奖金
我的答案:
SELECT e.name, b.bonus
FROM Employee AS e
LEFT JOIN Bonus AS b
ON e.empId = b.empId
WHERE b.bonus < 1000 OR b.bonus IS NULL
测试用例通过了,但耗时太长。
题解:
SELECT name, bonus
FROM Employee
LEFT JOIN Bonus
ON Employee.empId = Bonus.empId
WHERE bonus < 1000 OR bonus IS NULL
结论:将表格重命名也会花费查询的时间。
1280. 学生们参加各科测试的次数
我的答案:
SELECT Students.student_id, Students.student_name, Examinations.subject_name, COUNT(Examinations.subject_name) AS attended_exams
FROM Students
JOIN Examinations
ON Students.student_id = Examinations.student_id
# JOIN Subjects
# ON Examinations.subject_name = Subjects.subject_name
GROUP BY Students.student_id AND Examinations.subject_name
结果:
| student_id | student_name | subject_name | attended_exams |
| ---------- | ------------ | ------------ | -------------- |
| 1 | Alice | Math | 11 |
并没有成功运行出来结果。
题解:
JOIN和CROSS JOIN的区别
JOIN和CROSS JOIN的主要区别在于JOIN需要指定连接条件,而CROSS JOIN不需要。因此,JOIN通常比CROSS JOIN更实用,因为连接条件确保了只有相关的数据行会被连接在一起,而CROSS JOIN可能会产生巨大的结果集。
另一个区别在于CROSS JOIN的语句通常比JOIN的语句更慢,因为它需要计算表的笛卡尔积,而JOIN只需要关注两个表中的相关数据行。
以下是CROSS JOIN和INNER JOIN的示例,以说明它们之间的区别:
SELECT *
FROM table1
CROSS JOIN table2
WHERE table1.column1 = table2.column2;
该语句与以下INNER JOIN语句等效:
SELECT *
FROM table1
INNER JOIN table2
ON table1.column1 = table2.column2;
INNER JOIN在查询之前仅返回相关数据行,而CROSS JOIN将返回两个表的笛卡尔积,并需要额外的WHERE子句来指定连接条件。
-
通过一个子查询创建表
grouped
,它统计每个学生参加每个科目的考试次数。
GROUP BY的条件字段不能用AND连接,如果有两个或以上的查询条件就用","
连接SELECT student_id, subject_name, COUNT(*) AS attended_exams FROM Examinations GROUP BY student_id, subject_name
得到结果:
| student_id | subject_name | attended_exams | | ---------- | ------------ | -------------- | | 1 | Math | 3 | | 1 | Physics | 2 | | 1 | Programming | 1 | | 2 | Programming | 1 | | 13 | Math | 1 | | 13 | Programming | 1 | | 13 | Physics | 1 | | 2 | Math | 1 |
-
为了获得 (
student_id,subject_name
) 的所有组合,我们使用交叉联接将表 Student 中的每一行与表 Subject 中的每一行组合在一起,从而得到两个表中的student_id
和subject_name
的所有可能组合。SELECT * FROM Students CROSS JOIN Subjects
得到结果:
| student_id | student_name | subject_name | | ---------- | ------------ | ------------ | | 1 | Alice | Programming | | 1 | Alice | Physics | | 1 | Alice | Math | | 2 | Bob | Programming | | 2 | Bob | Physics | | 2 | Bob | Math | | 13 | John | Programming | | 13 | John | Physics | | 13 | John | Math | | 6 | Alex | Programming | | 6 | Alex | Physics | | 6 | Alex | Math |
-
将1,2步骤的表
Left join
LEFT JOIN
后面如果有两个及以上字段,需要用AND
连接,这里不能用","
连接。SELECT Students.student_id, Subjects.subject_name, attended_exams FROM Students CROSS JOIN Subjects LEFT JOIN( SELECT student_id, subject_name, COUNT(*) AS attended_exams FROM Examinations GROUP BY student_id, subject_name ) grouped ON Students.student_id=grouped.student_id AND Subjects.subject_name=grouped.subject_name
用
COUNT(*)
而不是COUNT(subject_name)
,不然如果该列出现的NULL值,统计结果会是0.
结果如下:| student_id | subject_name | attended_exams | | ---------- | ------------ | -------------- | | 1 | Programming | 1 | | 1 | Physics | 2 | | 1 | Math | 3 | | 2 | Programming | 1 | | 2 | Physics | null | | 2 | Math | 1 | | 13 | Programming | 1 | | 13 | Physics | 1 | | 13 | Math | 1 | | 6 | Programming | null | | 6 | Physics | null | | 6 | Math | null |
可以看到,结果不是按
student_id
和subject_name
排序的。attended_exams
有一些空值。 -
将3的结果按
student_id
和subject_name
排序,使用IFNULL()
函数将attended_exams
中的空值替换为0,加上之前忘记SELECT
的student_name
。SELECT Students.student_id, Students.student_name, Subjects.subject_name, IFNULL(attended_exams, 0) AS attended_exams FROM Students CROSS JOIN Subjects LEFT JOIN( SELECT student_id, subject_name, COUNT(*) AS attended_exams FROM Examinations GROUP BY student_id, subject_name ) grouped ON Students.student_id=grouped.student_id AND Subjects.subject_name=grouped.subject_name ORDER BY Students.student_id, Subjects.subject_name
结果如下:
| student_id | student_name | subject_name | attended_exams | | ---------- | ------------ | ------------ | -------------- | | 1 | Alice | Math | 3 | | 1 | Alice | Physics | 2 | | 1 | Alice | Programming | 1 | | 2 | Bob | Math | 1 | | 2 | Bob | Physics | 0 | | 2 | Bob | Programming | 1 | | 6 | Alex | Math | 0 | | 6 | Alex | Physics | 0 | | 6 | Alex | Programming | 0 | | 13 | John | Math | 1 | | 13 | John | Physics | 1 | | 13 | John | Programming | 1 |
AC!
这道题2刷要好好看看,多练几遍。
570. 至少有5名直接下属的经理
我的解法:
考虑用子查询
- 先统计
managerId
出现的次数。SELECT managerId, COUNT(*) FROM Employee GROUP BY managerId
- 怎么找出
managerId
出现次数大于等于5的id呢?
我的尝试失败了:SELECT managerId, COUNT(*) AS number FROM Employee GROUP BY managerId WHERE number >= 5
结果:SELECT managerId, COUNT(*) FROM Employee GROUP BY managerId WHERE COUNT(*) >= 5
You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'WHERE COUNT(*) >= 5' at line 5
错误原因:
GROUP BY
之后的判断条件不能用WHERE
,要用HAVING
.
修正:
SELECT managerId
FROM Employee
GROUP BY managerId
HAVING COUNT(*) >= 5
- 想要在原表中找到id在子查询结果中的。
SELECT name FROM Employee WHERE id in ( SELECT managerId FROM Employee GROUP BY managerId HAVING COUNT(*) >= 5 ) bigManager
结果返回错误:
You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'Manager' at line 9
题解做法:
将此表命名为 Manager,随后与 Employee 表做连接操作,得到每个经理的名字。
SELECT name
FROM Employee
JOIN (
SELECT managerId
FROM Employee
GROUP BY managerId
HAVING COUNT(*) >= 5
) Manager
ON Employee.id = Manager.managerId
AC!
1934. 确认率
我的解法:
-
先选出每个用户的requst数量
SELECT user_id, COUNT(*) AS request_number FROM Confirmations GROUP BY user_id
结果:
| user_id | request_number | | ------- | -------------- | | 3 | 2 | | 7 | 3 | | 2 | 2 |
-
将
Signups
表LEFT JOIN
Confirmations
表,这样没有request过的用户的request值就是null.SELECT Signups.user_id, action FROM Signups LEFT JOIN Confirmations ON Signups.user_id=Confirmations.user_id
结果:
| user_id | action | | ------- | --------- | | 3 | timeout | | 3 | timeout | | 7 | confirmed | | 7 | confirmed | | 7 | confirmed | | 2 | timeout | | 2 | confirmed | | 6 | null |
-
在2表的基础上计算出每个用户confirmed的数量,得到confirmed表。
SELECT Signups.user_id, count(*) AS confirmed FROM Signups LEFT JOIN Confirmations ON Signups.user_id=Confirmations.user_id WHERE action="confirmed" GROUP BY user_id
结果:
| user_id | confirmed | | ------- | --------- | | 7 | 3 | | 2 | 1 |
-
在2表的基础上GROUP BY user_id + COUNT(action),得到request_number表。
注意是要COUNT(action)而不是COUNT(*)
,区别在于COUNT(*)
是action的数量为0也会计算为1. 也就是说COUNT(action)计算的是action的值相加,COUNT(*)计算的是action的行数。SELECT Signups.user_id, COUNT(action) FROM Signups LEFT JOIN Confirmations ON Signups.user_id=Confirmations.user_id GROUP BY user_id
结果:
| user_id | COUNT(action) | | ------- | ------------- | | 3 | 2 | | 7 | 3 | | 2 | 2 | | 6 | 0 |
-
将confirmed表LEFT JOIN到request_number表上,用0替换变为NULL的值。
SELECT * FROM ( SELECT Signups.user_id, COUNT(action) FROM Signups LEFT JOIN Confirmations ON Signups.user_id=Confirmations.user_id GROUP BY user_id ) request_number LEFT JOIN ( SELECT Signups.user_id, count(*) AS confirmed FROM Signups LEFT JOIN Confirmations ON Signups.user_id=Confirmations.user_id WHERE action="confirmed" GROUP BY user_id ) confirmed ON request_number.user_id=confirmed.user_id
得到结果:
| user_id | COUNT(action) | user_id | confirmed | | ------- | ------------- | ------- | --------- | | 3 | 2 | null | null | | 7 | 3 | 7 | 3 | | 2 | 2 | 2 | 1 | | 6 | 0 | null | null |
SELECT request_number.user_id, request_number, IFNULL(confirmed, 0) AS confirmed_number FROM ( SELECT Signups.user_id, COUNT(action) AS request_number FROM Signups LEFT JOIN Confirmations ON Signups.user_id=Confirmations.user_id GROUP BY user_id ) request_number LEFT JOIN ( SELECT Signups.user_id, count(*) AS confirmed FROM Signups LEFT JOIN Confirmations ON Signups.user_id=Confirmations.user_id WHERE action="confirmed" GROUP BY user_id ) confirmed ON request_number.user_id=confirmed.user_id
结果:
user_id request_number confirmed_number 3 2 0 7 3 3 2 2 1 6 0 0 -
用
confirmed_number/request_number
,并ROUND到小数点后2位。因为user_id
为6的用户的request数为0,因此分母为0,rate的结果是null,这时候再用一次IFNULL。
完整代码如下:SELECT request_number.user_id, IFNULL(ROUND(IFNULL(confirmed, 0)/request_number, 2), 0) AS confirmation_rate FROM ( SELECT Signups.user_id, COUNT(action) AS request_number FROM Signups LEFT JOIN Confirmations ON Signups.user_id=Confirmations.user_id GROUP BY user_id ) request_number LEFT JOIN ( SELECT Signups.user_id, count(*) AS confirmed FROM Signups LEFT JOIN Confirmations ON Signups.user_id=Confirmations.user_id WHERE action="confirmed" GROUP BY user_id ) confirmed ON request_number.user_id=confirmed.user_id
运行结果:
user_id confirmation_rate 3 0 7 1 2 0.5 6 0 AC!
620. 有趣的电影
- ORDER BY column1, column2 ASC|DESC
ASC升序,DESC降序
自己写的:
SELECT *
FROM cinema
WHERE id % 2 = 1 and description != 'boring'
ORDER BY rating DESC
一次通过。
- 总结一下SQL的哪种需要对应需要什么函数
1251. 平均售价
我自己的:
SELECT Prices.product_id, ROUND(SUM(price * units)/SUM(units), 2) AS average_price
FROM Prices
CROSS JOIN UnitsSold
ON Prices.product_id = UnitsSold.product_id
WHERE purchase_date BETWEEN start_date AND end_date
GROUP BY Prices.product_id
结果:
解答错误。
输入:
Prices =
| product_id | start_date | end_date | price |
| ---------- | ---------- | ---------- | ----- |
| 1 | 2019-02-17 | 2019-02-28 | 5 |
| 1 | 2019-03-01 | 2019-03-22 | 20 |
| 2 | 2019-02-01 | 2019-02-20 | 15 |
| 2 | 2019-02-21 | 2019-03-31 | 30 |
| 3 | 2019-02-21 | 2019-03-31 | 30 |
UnitsSold =
| product_id | purchase_date | units |
| ---------- | ------------- | ----- |
| 1 | 2019-02-25 | 100 |
| 1 | 2019-03-01 | 15 |
| 2 | 2019-02-10 | 200 |
| 2 | 2019-03-22 | 30 |
输出:
| product_id | average_price |
| ---------- | ------------- |
| 1 | 6.96 |
| 2 | 16.96 |
预期结果:
| product_id | average_price |
| ---------- | ------------- |
| 1 | 6.96 |
| 2 | 16.96 |
| 3 | 0 |
改成:
# Write your MySQL query statement below
SELECT Prices.product_id, IFNULL(ROUND(SUM(price * units)/SUM(units), 2), 0) AS average_price
FROM Prices
LEFT JOIN UnitsSold
ON Prices.product_id = UnitsSold.product_id
WHERE purchase_date BETWEEN start_date AND end_date OR purchase_date is NUll
GROUP BY Prices.product_id
CROSS JOIN
改成LEFT JOIN
- 加上
IFNULL(x, 0)
1075. 项目员工 I
我的答案:
SELECT project_id, ROUND(SUM(experience_years)/COUNT(Project.employee_id), 2) AS average_years
FROM Project
LEFT JOIN Employee
ON Project.employee_id = Employee.employee_id
WHERE experience_years IS NOT NULL
GROUP BY project_id
通过。
要注意的点:如果experience_years是空的,就不计算这个员工——WHERE experience_years IS NOT NULL
1633. 各赛事的用户注册率
我的答案:
SELECT contest_id, ROUND(COUNT(user_id) * 100 / (
SELECT COUNT(user_id)
FROM Users
), 2) AS percentage
FROM Register
GROUP BY contest_id
ORDER BY percentage DESC
返回的结果表按 percentage 的 降序 排序了,若相同则按 contest_id 的 升序 排序怎么排序?
——ORDER BY percentage DESC
后面加个contest_id
就可以了。
1211. 查询结果的质量和占比
我的答案:
SELECT table_a.query_name, table_a.quality, IFNULL(ROUND(table_b.poor_query/table_a.count_by_query*100, 2), 0) AS poor_query_percentage
FROM (SELECT query_name, COUNT(result) AS count_by_query, ROUND(SUM(rating/position)/COUNT(result), 2) AS quality
FROM Queries
GROUP BY query_name) table_a
LEFT JOIN (
SELECT query_name, COUNT(rating) AS poor_query
FROM Queries
WHERE rating < 3
GROUP BY query_name
) table_b
ON table_a.query_name = table_b.query_name
WHERE table_a.query_name IS NOT NULL
用了两个子查询,写得很复杂。
官方题解:
SELECT
query_name,
ROUND(AVG(rating/position), 2) quality,
ROUND(SUM(IF(rating < 3, 1, 0)) * 100 / COUNT(*), 2) poor_query_percentage
FROM Queries
GROUP BY query_name
比我多用了AVG
和IF(rating < 3, 1, 0)
,显得很简洁。
为什么不用判断WHERE table_a.query_name IS NOT NULL
?——试了一下,也是需要的。
最终答案:
SELECT query_name,
ROUND(AVG(rating/position), 2) quality,
ROUND(SUM(IF(rating < 3, 1, 0))/COUNT(*)*100, 2) poor_query_percentage
FROM Queries
WHERE query_name IS NOT NULL
GROUP BY query_name
1193. 每月交易 I
如何提取date中的月份?
官方题解:
本题要求 查找每个月和每个国家/地区的事务数及其总金额、已批准的事务数及其总金额,我们可以将这句话拆分成几个子任务:
- 查找每个月和每个国家/地区。
数据表中的trans_date
是精确到日,我们可以使用DATE_FORMAT()
函数将日期按照年月%Y-%m
输出。比如将2019-01-02
转换成2019-01
。
获取到所有的月份后,使用DATE_FORMAT(trans_date, '%Y-%m')
GROUP BY
聚合每个月和每个国家的记录就完成了第一步。 - 查找总的事务数。
第一步已经将数据按月和国家聚合,只需要使用COUNT
函数就能获取到总的事务数。COUNT(*) AS trans_count
- 查找总金额。
使用SUM
函数计算总金额。SUM(amount) AS trans_total_amount
- 查找已批准的事物数。
已批准的事物的state
标记为approved
。首先使用IF
函数将state = 'approved'
的记录标记为 1,否则为NULL
。再使用COUNT
计算总量。COUNT(IF(state = 'approved', 1, NULL)) AS approved_count
- 查找已批准的事物的总金额。
和第四步一样,先使用IF
函数,再使用SUM
函数。SUM(IF(state = 'approved', amount, 0)) AS approved_total_amount
看了题解后我的答案:
SELECT DATE_FORMAT(trans_date, '%Y-%m') AS month, country,
COUNT(*) AS trans_count,
COUNT(IF(state='approved', 1, NULL)) AS approved_count,
SUM(amount) AS trans_total_amount,
SUM(IF(state='approved', amount, 0)) AS approved_total_amount
FROM Transactions
GROUP BY month, country
1174. 即时食物配送 II
我的思路:
我可以计算即时订单。
SELECT delivery_id, customer_id, IF(customer_pref_delivery_date = order_date,
'immediate', 'plan') AS status
FROM Delivery
结果:
| delivery_id | customer_id | status |
| ----------- | ----------- | --------- |
| 1 | 1 | plan |
| 2 | 2 | immediate |
| 3 | 1 | plan |
| 4 | 3 | immediate |
| 5 | 3 | plan |
| 6 | 2 | plan |
| 7 | 4 | immediate |...
首次订单怎么计算,需要看题解。
本题最重要的就是求每一个用户的首单数据:我们使用 group by
聚合每个用户的数据,再使用 min
函数求出首单的时间。将 (customer_id, order_date)
作为查询条件,使用 where in
便可查出具体的数据。
SELECT delivery_id, customer_id, min(order_date)
FROM Delivery
GROUP BY customer_id
结果:
| delivery_id | customer_id | min(order_date) |
| ----------- | ----------- | --------------- |
| 1 | 1 | 2019-08-01 |
| 2 | 2 | 2019-08-02 |
| 4 | 3 | 2019-08-21 |
| 7 | 4 | 2019-08-09 |...
与上一步我的题解结合一下:
注意WHERE 后面的条件要加括号
SELECT delivery_id, customer_id, IF(customer_pref_delivery_date = order_date,
'immediate', 'plan') AS status
FROM Delivery
WHERE (customer_id, order_date) IN (
SELECT customer_id, min(order_date)
FROM Delivery
GROUP BY customer_id
)
结果:
delivery_id | customer_id | status |
---|---|---|
1 | 1 | plan |
2 | 2 | immediate |
5 | 3 | plan |
7 | 4 | immediate |
如何计算即时订单在所有用户的首次订单中的比例呢?
题解又给了很聪明的做法,不需要新增一列status,计算customer_pref_delivery_date = order_date
的总数就行:
SELECT ROUND(SUM(customer_pref_delivery_date = order_date)/COUNT(*)*100, 2) AS immediate_percentage
FROM Delivery
WHERE (customer_id, order_date) IN (
SELECT customer_id, min(order_date)
FROM Delivery
GROUP BY customer_id
)
结果:
AC!
550. 游戏玩法分析 IV
我的解法:
和上题求用户首单一样,这次是求用户首次登录的时间。同样可以先用group by
,再用min
求出。
SELECT ROUND(SUM(IF(event_date-start_date=1, 1, 0))/MAX(Activity.player_id), 2) AS fraction
FROM Activity
LEFT JOIN (
SELECT player_id, min(event_date) AS start_date
FROM Activity
GROUP BY player_id
) A
ON Activity.player_id = A.player_id
不知道怎么计算用户数量,暂且用MAX(Activity.player_id)
。
结果:
解答错误。当用户id只有1和3的时候就不对了。
查了sql语法,改成COUNT(DISTINCT Activity.player_id)
。
SELECT ROUND(SUM(IF(event_date-start_date=1, 1, 0))/COUNT(DISTINCT Activity.player_id), 2) AS fraction
FROM Activity
LEFT JOIN (
SELECT player_id, min(event_date) AS start_date
FROM Activity
GROUP BY player_id
) A
ON Activity.player_id = A.player_id
结果:
依然解答错误。
错误的用例:
| player_id | device_id | event_date | games_played |
| --------- | --------- | ---------- | ------------ |
| 1 | 2 | 2016-02-29 | 5 |
| 1 | 2 | 2016-03-01 | 6 |
| 2 | 3 | 2017-06-25 | 1 |
| 3 | 1 | 2016-03-02 | 0 |
| 3 | 4 | 2018-07-03 | 5 |
跨天减不了吗?
——看题解:
首先,求出所有用户首次登录的第二天的时间。方法是查询出 Activity 表中每个用户的第一天时间,并加上 1.
select player_id, DATE_ADD(MIN(event_date), INTERVAL 1 DAY) as second_date
from Activity
group by player_id
将此表命名为 Expected。随后我们要从 Activity 表中查询 event_date 与 Expected.sencond_date 重叠的部分,注意此判定要限定在用户相同的前提下。这部分用户即为在首次登录后第二天也登录了的用户:
select Activity.player_id as player_id
from (
select player_id, DATE_ADD(MIN(event_date), INTERVAL 1 DAY) as second_date
from Activity
group by player_id
) as Expected, Activity
where Activity.event_date = Expected.second_date and Activity.player_id = Expected.player_id
将此表命名为 Result。随后我们只需要得到 Result 表中用户的数量,以及 Activity 表中用户的数量,相除并保留两位小数即可。
select IFNULL(round(count(distinct(Result.player_id)) /
count(distinct(Activity.player_id)), 2), 0) as fraction
得到启发,可以用INTERVAL 1 DAY
代替减号进行的时间计算。
SELECT ROUND(SUM(IF(event_date=DATE_ADD(start_date, INTERVAL 1 DAY), 1, 0))/COUNT(DISTINCT Activity.player_id), 2) AS fraction
FROM Activity
LEFT JOIN (
SELECT player_id, min(event_date) AS start_date
FROM Activity
GROUP BY player_id
) A
ON Activity.player_id = A.player_id
结果:
AC!
2356. 每位教师所教授的科目种类的数量
太简单了,直接AC.
1141. 查询近30天活跃用户数
太久没写了,完全没思路,还是得一天练一道sql保持手感。
一开始写了这个代码:
SELECT *
FROM Activity
WHERE A.acticity_type != 'end_session'
完全不对,甚至语法错误。
直接看题解:
select activity_date day, count(distinct user_id) active_users
from activity
where datediff('2019-07-27', activity_date) >= 0 AND datediff('2019-07-27', activity_date) <30
group by activity_date
作者:MythicMyuu
链接:https://leetcode.cn/problems/user-activity-for-the-past-30-days-i/solutions/1484069/by-mythicmyuu-mlj9/
笔记:
- 函数和变量都可以用全小写
- select后面的重命名可以不加as,直接重命名
- 如何计算时间间隔:
datediff(时间1,时间2)
- datediff还要限制>=0不然会查询到07-27之后的数据。
1084. 销售分析III
先自己写了一版:
select product_id, product_name
from product
where product_id in (
select distinct product_id
from sales
where datediff(sale_date, '2019-01-01') >= 0 and datediff(sale_date, '2019-03-31') <= 0
)
这样的结果计算出来多了product_id=2,因为,没有排除除了在春季售出,还是其他时候售出的商品。
改了一版:
select product_id, product_name
from product
where product_id in (
select distinct product_id
from sales
where datediff(sale_date, '2019-01-01') >= 0 and datediff(sale_date, '2019-03-31') <= 0
) and not in (
select distinct product_id
from sales
where datediff(sale_date, '2019-01-01') < 0 and datediff(sale_date, '2019-03-31') > 0
)
语法错误。
看题解:
select product_id, product_name
from product
where product_id not in (
select s.product_id
from sales s
where sale_date < '2019-01-01' or sale_date > '2019-03-31'
)
and product_id in (
select s.product_id
from sales s
)
作者:欢乐马
链接:https://leetcode.cn/problems/sales-analysis-iii/solutions/2254466/yong-bu-ji-de-si-xiang-lai-zuo-not-in-by-3qfs/
笔记:
- 虽然where后面两个条件都是针对product_id的,但第二个条件还是需要加上product_id,不然语法不对
- 这题只是判断时间是否在某段时间之前或之后,可以直接用
'>', '<'
- 刚才的第二段判断写错了,两个区间并不想交,要用
or
- 也可以用题解的思路:第三个条件没什么,直接in sales这个表就好了,限制这个product_id必须是有过出售记录的。
596. 超过5名学生的课
直接AC了。
1729. 求关注者的数量
直接AC了。
619. 只出现一次的最大数字
自己先写出来一版:
select num
from MyNumbers
group by num
having count(num) = 1
order by num asc
结果:返回的是一列,而不是最大值
原因:涉及到一个语法——select ... from .... limit a ,b
🌟一万五千字!你最常用的 SQL 这些核心知识点,我都帮你准备好了!
加上之后:
select num
from MyNumbers
group by num
having count(num) = 1
order by num desc
limit 0, 1
结果:
少了返回空值的情况。
试了试ifnull,结果老是不对。
正确写法:
select IFNULL((select num
from MyNumbers
group by num
having count(num) = 1
order by num desc
limit 0, 1)
, null) num
⚠️注意:
ifnull用法注意事项,结果集为空,select中不调用ifnull函数
因为SQL在执行时是按照 FROM -> ON -> OUTER(JOIN) -> WHERE -> GROUP BY -> HAVING -> SELECT -> DISTINCT -> ORDER BY -> limit 这个执行顺序执行的
1045. 买下所有产品的客户
没思路。
题解1:
因为product table的产品数目固定, 所以只要 groupby customer_id 后只要他们中
having count distinct product_key= product table的产品数目,
相当于购买了全部的产品
只需要总产品数量相等就可以了,不需要每个产品都相等去比较一个list
完整代码:
select customer_id
from customer
group by customer_id
having count(distinct product_key) in (
select count(distinct product_key)
from product
)
题解2:
首先会过滤掉Customer表中不在Product表中的product_key,然后再进行分组和计数,保证每个客户购买的产品种类数量等于Product表中产品的总数,从而得到符合条件的客户的customer_id。
完整代码:
select customer_id
from customer
where product_key in (select product_key from product)
group by customer_id
having count(distinct product_key) = (
select count(*)
from product
)
这版更严谨。
1731. 每位经理的下属员工数量
select b.employee_id, b.name, count(a.name) as reports_count, round(avg(a.age), 0) as average_age
from employees a
left join employees b
on a.reports_to = b.employee_id
where b.employee_id is not null
group by b.employee_id
order by b.employee_id
AC!
1789. 员工的直属部门
没什么思路,直接看题解了。
题解:强烈建议不要使用UNION,窗口才是正道!官方BUG
【用Union的写法】—— 不严谨
select employee_id, department_id
from employee
where primary_flag = 'Y'
union
select employee_id, department_id
from employee
group by employee_id
having count(department_id) = 1
【窗口函数COUNT() OVER
写法】
with q as(
select
employee_id,
department_id,
primary_flag,
count(*) over(partition by employee_id) as count_over
from employee
)
select
employee_id,
department_id
from q
where primary_flag = 'Y' or count_over = 1
笔记:
with...as
:在SQL中被称为公用表表达式(CTE, Common Table Expression),用于定义一个临时的命名结果集,可以在后续的SELECT、INSERT、UPDATE或DELETE语句中引用。在MySQL中,CTE从MySQL 8.0开始被支持。相当于定义一个子查询。- SQL窗口函数详解
610. 判断三角形
判断三角形的条件:
可以使用三角不等式定理。根据这个定理,如果三条边长满足以下条件,那么它们就能构成一个三角形:
- 任意两边之和大于第三边。
- 任意两边之差小于第三边。
知道了判断三角形的条件也不知道怎么把条件的结果筛选出来。
看官方题解:使用 case...when...
select x, y, z, # 三个逗号一个都不能少,因为triangle是另一个筛选值
case
when x + y > z and x + z > y and y + z > x then 'Yes'
else 'No'
end as 'triangle'
from triangle
180. 连续出现的数字
知道怎么判断出现次数,但是怎么判断连续出现的次数呢?
看官方题解:用 DISTINCT 和 WHERE 语句
连续出现的意味着相同数字的 Id 是连着的,由于这题问的是至少连续出现 3 次,我们使用 Logs 并检查是否有 3 个连续的相同数字。
SELECT *
FROM
Logs l1,
Logs l2,
Logs l3
WHERE
l1.Id = l2.Id - 1
AND l2.Id = l3.Id - 1
AND l1.Num = l2.Num
AND l2.Num = l3.Num
;
id | num | id | num | id | num |
---|---|---|---|---|---|
1 | 1 | 2 | 1 | 3 | 1 |
然后我们从上表中选择任意的 Num 获得想要的答案。同时我们需要添加关键字 DISTINCT ,因为如果一个数字连续出现超过 3 次,会返回重复元素。
SELECT DISTINCT
l1.Num AS ConsecutiveNums
FROM
Logs l1,
Logs l2,
Logs l3
WHERE
l1.Id = l2.Id - 1
AND l2.Id = l3.Id - 1
AND l1.Num = l2.Num
AND l2.Num = l3.Num
;
1164. 指定日期的产品价格
写了一种自己的写法:
select *
from products
group by product_id
order by change_date
返回的结果总是最小的日期。
product_id | new_price | change_date |
---|---|---|
1 | 20 | 2019-08-14 |
2 | 50 | 2019-08-14 |
3 | 20 | 2019-08-18 |
看官方题解:
方法一:left join
和 ifnull
思路
本题的关键点在找到 2019-08-16 前所有有改动的产品及其最新价格和没有没有修改过价格的产品。
我们可以先找到所有的产品,再找到所有 2019-08-16 前有修改的产品和他们最新的价格,使用 left join 将两个查询联合。如果产品没有价格,说明没有修改过,设置为 10,如果有价格,设置为最新的价格。
算法
- 找出所有的产品:
select distinct product_id from products
product_id 1 2 3 - 找到 2019-08-16 前所有有改动的产品的最新价格。
- 使用 max 函数找到产品最新修改的时间。使用 where 查询限制时间小于等于 2019-08-16:
select distinct product_id, max(change_date) from products where change_date <= '2019-08-16' group by product_id
product_id max(change_date) 1 2019-08-16 2 2019-08-14 - 使用 where 子查询,根据 product_id 和 change_date 找到对应的价格:
💡注意:where 查询后面如果是有两个及以上的查询字段,要用括号括起来:(product_id, change_date)select product_id, new_price as price from products where (product_id, change_date) in ( select distinct product_id, max(change_date) from products where change_date <= '2019-08-16' group by product_id )
product_id price 2 50 1 35 - 上面两步已经找到了所有的产品和已经修改过价格的产品。使用 left join 得到所有产品的最新价格,如果没有设置为 10。
select p1.product_id, ifnull(p2.new_price, 10) as price from ( select distinct product_id from products ) as p1 -- 所有的产品 left join ( select product_id, new_price from products where (product_id, change_date) in ( select product_id, max(change_date) from products where change_date <= '2019-08-16' group by product_id ) ) as p2 -- 在 2019-08-16 之前有过修改的产品和最新的价格 on p1.product_id = p2.product_id
- 使用 max 函数找到产品最新修改的时间。使用 where 查询限制时间小于等于 2019-08-16:
方法二:窗口函数 dense_rank()
思路
本题的关键点在找到 2019-08-16 前所有有改动的产品及其最新价格和没有修改过价格的产品,所以 2019-08-16是个重要的分界线!
根据观察输出可知,我们必须找到所有的product_id,但是 2019-08-16这个日期把product_id隔开了,那么我们也必须拆开去分析。
(1)找到2019-08-16以及之前的每个product的new_price(改过的)
(2)用所有的product_id左连接上面的表(记作表1),其中没在表1中出现的product_id所对应的price则是没改动的price,也就是10。
算法
- 找出所有的产品:
select distinct product_id from Products
- 找到 2019-08-16 前所有有改动的产品的最新价格。
方法一:窗口函数 dense_rank()
方法二:不用窗口函数 where in 联合键匹配-
方法一:窗口函数 dense_rank()
找到价格修改的记录,并用dense_rank() 排序select product_id, new_price, change_date, dense_rank() over( partition by product_id order by change_date desc ) as rnk from products where change_date <= '2019-08-16'
💡注意:
- Dense_rank后面要加括号
- order by 后面要加desc
product_id new_price change_date rnk 1 35 2019-08-16 1 1 30 2019-08-15 2 1 20 2019-08-14 3 2 50 2019-08-14 1 子查询 找到价格修改的最新纪录。
select product_id, new_price from ( select product_id, new_price, change_date, dense_rank() over( partition by product_id order by change_date ) as rnk from products where change_date <= '2019-08-16' ) t where rnk = 1
-
这时我们已经找到2019-08-16以及之前的每个product的new_price
- left join补全2019-08-16之后的没改动的product_id和price
select distinct p1.product_id, ifnull(p2.new_price, 10) as price
from products p1
left join (
select product_id, new_price
from (
select product_id, new_price, change_date,
dense_rank() over(
partition by product_id
order by change_date desc
) as rnk
from products
where change_date <= '2019-08-16'
) t
where rnk = 1
) p2
on p1.product_id = p2.product_id
1204. 最后一个能进入巴士的人
先自己写了一下:
select *, sum(weight)
from queue
order by turn asc
where sum(weight) <= 1000
说语法不对,那应该怎么写呢?
题解提示了用窗口函数,自己试一下先:
select *,
sum(weight) over(
order by turn asc
) as weight_sum
from queue
where weight_sum <= 1000
依然语法错误。
看题解和官方题解:
方法一:窗口函数+子查询
思路
本题目的思路为如何找到累积重量达到1000的最后一个上车的乘客,那么我们可以根据turn来去对weight做累积和。然后找到cumu_weight<=1000的最后一个乘客(根据turn排倒序)。
⚠️ 不可能存在weight=0的情况,也就是cumu_weight必然递增,所以order by + limit 1可行。
算法
- 窗口函数求累积和
select
turn,
person_name,
sum(weight) over(order by turn) as cumu_weight
from Queue
- 子查询筛选cumu_weight <= 1000
select
person_name
from
(select
turn,
person_name,
sum(weight) over(order by turn) as cumu_weight
from Queue) t
where cumu_weight <= 1000
- 根据turn排倒序,找到最后一个上车的人
order by turn desc
limit 1
完整代码:
select person_name
from (
select *,
sum(weight) over(
order by turn
) as weight_sum
from queue
) t
where weight_sum <= 1000
order by weight_sum desc
limit 1
AC!
为什么之前那种写法不对:
在 SQL 中,窗口函数(如 SUM() OVER())的使用需要注意执行的顺序和范围。你的问题涉及到为什么不能在一个查询中直接使用 weight_sum 作为条件,而需要嵌套查询。
原因如下:
- SQL 执行顺序:
SQL 查询是按照特定的顺序执行的:
FROM 子句
WHERE 子句
GROUP BY 子句
HAVING 子句
SELECT 子句
ORDER BY 子句
窗口函数是在 SELECT 阶段执行的,但 WHERE 子句在 SELECT 之前执行。这意味着你不能在 WHERE 子句中直接使用窗口函数的结果,因为窗口函数的计算结果尚未生成。- 窗口函数的计算:
在窗口函数的计算中,SUM(weight) OVER (ORDER BY turn ASC)
会为每一行计算一个累积和,而这个累积和在 SELECT 子句计算之前是不可用的。- 嵌套查询的必要性:
为了使用窗口函数的计算结果作为过滤条件,我们需要先计算窗口函数的结果,并在外层查询中使用这些结果进行过滤。这就是为什么需要嵌套查询的原因。嵌套查询确保窗口函数的计算结果已经存在,然后外层查询可以使用这些结果进行过滤。
方法二:自连接
思路
将 b 表中的每一条数据和 a 表的每一条数据连接。
select *
from queue a, queue b
接下来对连接后的数据进行处理.
我们使用 a 表的 person_id 表示自身,b 表中的数据表示为包括自己在内的所有人。使用 GROUP BY a.person_id 处理每个人的数据。因为要计算每个人的 weight 加上之前所有人的 weight,使用查询条件 a.turn >= b.turn 找到所有在他之前以及他自己的重量。
select *
from queue a, queue b
where a.turn >= b.turn
group by a.person_id
再使用 SUM 计算总和并过滤掉大于 1000 的数据。
select a.person_name
from queue a, queue b
where a.turn >= b.turn
group by a.person_id
having sum(b.weight) <= 1000
拿到所有满足条件的数据后,只需要再对 a.turn 倒序取第一条即可。
select a.person_name
from queue a, queue b
where a.turn >= b.turn
group by a.person_id
having sum(b.weight) <= 1000
order by a.turn desc
limit 1
AC!
体会一下自连接的做法,不过还是没窗口函数清晰。