SQL相关

xwj1992930

已于 2023-09-22 10:29:46 修改

阅读量1.3k

点赞数

分类专栏： MySQL学习文章标签： sql

于 2023-06-29 10:34:51 首次发布

本文链接：https://blog.csdn.net/xwj1992930/article/details/131450880

版权

MySQL学习专栏收录该内容

4 篇文章 0 订阅

订阅专栏

1.join on and 和join on where

关于join on后面使用and时的情况，例如a left join b on a.id=b.id，会将左边表的列全部显示，如果添加a left join b on a.id=b.id and b.bonus<1000，条件and b.bonus<1000会对右表进行过滤，符合and b.bonus<1000条件的会与左表连接，显示具体的值，如果不符合and b.bonus<1000的条件，这行数据仍然显示，但值显示null

on 条件是在生成临时表时使用的条件，它不管 on 中的条件是否为真，都会返回左边表中的记录。而where是对生成后的临时表进行过滤，真正的过滤后就不显示

选出所有 bonus < 1000 的员工的 name 及其 bonus。

Employee 表单

+-------+--------+-----------+--------+
| empId |  name  | supervisor| salary |
+-------+--------+-----------+--------+
|   1   | John   |  3        | 1000   |
|   2   | Dan    |  3        | 2000   |
|   3   | Brad   |  null     | 4000   |
|   4   | Thomas |  3        | 4000   |
+-------+--------+-----------+--------+

empId 是这张表单的主关键字
Bonus 表单

+-------+-------+
| empId | bonus |
+-------+-------+
| 2     | 500   |
| 4     | 2000  |
+-------+-------+

empId 是这张表单的主关键字
输出示例：

+-------+-------+
| name  | bonus |
+-------+-------+
| John  | null  |
| Dan   | 500   |
| Brad  | null  |
+-------+-------+

select e.name, b.bonus from Employee e left join Bonus b on e.empId = b.empId where b.bonus < 1000 or b.bonus is null

2.Cross join

cross join为笛卡尔积，将两张表拼接，记得和含有条件的左右链接区分
要求写一段 SQL 语句，查询出每个学生参加每一门科目测试的次数，结果按 student_id 和 subject_name 排序。

示例

查询结构格式如下所示：

Students table:

+------------+--------------+
| student_id | student_name |
+------------+--------------+
| 1          | Alice        |
| 2          | Bob          |
| 13         | John         |
| 6          | Alex         |
+------------+--------------+

Subjects table:

+--------------+
| subject_name |
+--------------+
| Math         |
| Physics      |
| Programming  |
+--------------+

Examinations table:

+------------+--------------+
| student_id | subject_name |
+------------+--------------+
| 1          | Math         |
| 1          | Physics      |
| 1          | Programming  |
| 2          | Programming  |
| 1          | Physics      |
| 1          | Math         |
| 13         | Math         |
| 13         | Programming  |
| 13         | Physics      |
| 2          | Math         |
| 1          | Math         |
+------------+--------------+

Result table:

+------------+--------------+--------------+----------------+
| student_id | student_name | subject_name | attended_exams |
+------------+--------------+--------------+----------------+
| 1          | Alice        | Math         | 3              |
| 1          | Alice        | Physics      | 2              |
| 1          | Alice        | Programming  | 1              |
| 2          | Bob          | Math         | 1              |
| 2          | Bob          | Physics      | 0              |
| 2          | Bob          | Programming  | 1              |
| 6          | Alex         | Math         | 0              |
| 6          | Alex         | Physics      | 0              |
| 6          | Alex         | Programming  | 0              |
| 13         | John         | Math         | 1              |
| 13         | John         | Physics      | 1              |
| 13         | John         | Programming  | 1              |
+------------+--------------+--------------+----------------+

结果表需包含所有学生和所有科目（即便测试次数为0）：
Alice 参加了 3 次数学测试, 2 次物理测试，以及 1 次编程测试；
Bob 参加了 1 次数学测试, 1 次编程测试，没有参加物理测试；
Alex 啥测试都没参加；
John 参加了数学、物理、编程测试各 1 次。

select s.student_id, s.student_name, su.subject_name, count(e.subject_name) attended_exams from Students s cross join Subjects su left join Examinations e on s.student_id = e.student_id and su.subject_name = e.subject_name group by s.student_id, su.subject_name order by s.student_id, su.subject_name

3.Case when

简单函数

CASE [col_name] WHEN [value1] THEN [result1]…ELSE [default] END

搜索函数

CASE WHEN [expr] THEN [result1]…ELSE [default] END

MYSQL 中使用case when then 判断 null的方法

错误方法:
CASE columnName WHEN null THEN 0 ELSE columnName END
正确方法:
CASE WHEN columnName is null THEN 0 ELSE columnName END

示例

用户的确认率是 ‘confirmed’ 消息的数量除以请求的确认消息的总数。没有请求任何确认消息的用户的确认率为 0 。确认率四舍五入到小数点后两位。

编写一个SQL查询来查找每个用户的确认率。

以任意顺序返回结果表。

查询结果格式如下所示。

输入：
Signups 表:

+---------+---------------------+
| user_id | time_stamp          |
+---------+---------------------+
| 3       | 2020-03-21 10:16:13 |
| 7       | 2020-01-04 13:57:59 |
| 2       | 2020-07-29 23:09:44 |
| 6       | 2020-12-09 10:39:37 |
+---------+---------------------+

Confirmations 表:

+---------+---------------------+-----------+
| user_id | time_stamp          | action    |
+---------+---------------------+-----------+
| 3       | 2021-01-06 03:30:46 | timeout   |
| 3       | 2021-07-14 14:00:00 | timeout   |
| 7       | 2021-06-12 11:57:29 | confirmed |
| 7       | 2021-06-13 12:58:28 | confirmed |
| 7       | 2021-06-14 13:59:27 | confirmed |
| 2       | 2021-01-22 00:00:00 | confirmed |
| 2       | 2021-02-28 23:59:59 | timeout   |
+---------+---------------------+-----------+

输出:

+---------+-------------------+
| user_id | confirmation_rate |
+---------+-------------------+
| 6       | 0.00              |
| 3       | 0.00              |
| 7       | 1.00              |
| 2       | 0.50              |
+---------+-------------------+

解释:
用户 6 没有请求任何确认消息。确认率为 0。
用户 3 进行了 2 次请求，都超时了。确认率为 0。
用户 7 提出了 3 个请求，所有请求都得到了确认。确认率为 1。
用户 2 做了 2 个请求，其中一个被确认，另一个超时。确认率为 1 / 2 = 0.5。

select a.user_id, round(sum(a.confirmation)/count(a.confirmation), 2) confirmation_rate from  
(select s.user_id, case c.action when 'confirmed' then 1 when c.action is null then 0  else 0 end confirmation
from Signups s left join Confirmations c on s.user_id = c.user_id) a group by a.user_id

4.IF和聚合函数

IF函数能通过判断条件来返回特定值（类似case when），它的语法如下：

IF(expr,result_true,result_false)

常用的聚合函数包括count、sum、avg、min、max
使用如下：
1、不加条件查询

sum(money)

2.加条件查询

sum(if(status='0',money,0))         
sum(if(status in ('1','6'),money,0))         
sum(if(status >1,money,0))

avg（条件）相当于sum（if（条件，1，0））/count(全体)

示例

将查询结果的质量 quality 定义为：

各查询结果的评分与其位置之间比率的平均值。

将劣质查询百分比 poor_query_percentage 为：

评分小于 3 的查询结果占全部查询结果的百分比。

编写一组 SQL 来查找每次查询的名称(query_name)、质量(quality) 和劣质查询百分比(poor_query_percentage)。

质量(quality) 和劣质查询百分比(poor_query_percentage) 都应四舍五入到小数点后两位。

查询结果格式如下所示：

Queries table:

+------------+-------------------+----------+--------+
| query_name | result            | position | rating |
+------------+-------------------+----------+--------+
| Dog        | Golden Retriever  | 1        | 5      |
| Dog        | German Shepherd   | 2        | 5      |
| Dog        | Mule              | 200      | 1      |
| Cat        | Shirazi           | 5        | 2      |
| Cat        | Siamese           | 3        | 3      |
| Cat        | Sphynx            | 7        | 4      |
+------------+-------------------+----------+--------+

Result table:

+------------+---------+-----------------------+
| query_name | quality | poor_query_percentage |
+------------+---------+-----------------------+
| Dog        | 2.50    | 33.33                 |
| Cat        | 0.66    | 33.33                 |
+------------+---------+-----------------------+

Dog 查询结果的质量为 ((5 / 1) + (5 / 2) + (1 / 200)) / 3 = 2.50
Dog 查询结果的劣质查询百分比为 (1 / 3) * 100 = 33.33

Cat 查询结果的质量为 ((2 / 5) + (3 / 3) + (4 / 7)) / 3 = 0.66
Cat 查询结果的劣质查询百分比为 (1 / 3) * 100 = 33.33

select query_name, round(avg(rating/position), 2) quality, round(avg(rating < 3)*100, 2) poor_query_percentage from Queries group by query_name

5.IFNULL（可用于聚合函数）

常用于将null变成0，写法如下：

IFNULL(SUM(price),0)

5.group by和order by的顺序问题

在一般的sql中，执行顺序是先group by再order by
如果想先order by再group by可以使用下面子查询的写法：

select * from (select * from Delivery having 1 order by order_date asc) d group by customer_id

tips：
1.其中有一个having 1 ，因为子查询中使用排序不会生效，所以需要having 1强制生效。
2.子查询（表连接）需要给一个别名，否则会报错；结果作为输入不需要别名。

6.使用min函数来排序

示例：
配送表: Delivery

+-----------------------------+---------+
| Column Name                 | Type    |
+-----------------------------+---------+
| delivery_id                 | int     |
| customer_id                 | int     |
| order_date                  | date    |
| customer_pref_delivery_date | date    |
+-----------------------------+---------+

delivery_id 是表的主键。
该表保存着顾客的食物配送信息，顾客在某个日期下了订单，并指定了一个期望的配送日期（和下单日期相同或者在那之后）。

如果顾客期望的配送日期和下单日期相同，则该订单称为「即时订单」，否则称为「计划订单」。

「首次订单」是顾客最早创建的订单。我们保证一个顾客只会有一个「首次订单」。

写一条 SQL 查询语句获取即时订单在所有用户的首次订单中的比例。保留两位小数。

查询结果如下所示：

Delivery 表：

+-------------+-------------+------------+-----------------------------+
| delivery_id | customer_id | order_date | customer_pref_delivery_date |
+-------------+-------------+------------+-----------------------------+
| 1           | 1           | 2019-08-01 | 2019-08-02                  |
| 2           | 2           | 2019-08-02 | 2019-08-02                  |
| 3           | 1           | 2019-08-11 | 2019-08-12                  |
| 4           | 3           | 2019-08-24 | 2019-08-24                  |
| 5           | 3           | 2019-08-21 | 2019-08-22                  |
| 6           | 2           | 2019-08-11 | 2019-08-13                  |
| 7           | 4           | 2019-08-09 | 2019-08-09                  |
+-------------+-------------+------------+-----------------------------+

Result 表：

+----------------------+
| immediate_percentage |
+----------------------+
| 50.00                |
+----------------------+

1 号顾客的 1 号订单是首次订单，并且是计划订单。
2 号顾客的 2 号订单是首次订单，并且是即时订单。
3 号顾客的 5 号订单是首次订单，并且是计划订单。
4 号顾客的 7 号订单是首次订单，并且是即时订单。
因此，一半顾客的首次订单是即时的。

select round(avg(if(order_date = customer_pref_delivery_date, 1, 0)) * 100, 2) immediate_percentage from (select delivery_id,customer_id,min(order_date) order_date,customer_pref_delivery_date  from  Delivery group by customer_id) t

7.inner join 和 in

在做子查询后过滤的时候（有时需要把表B结果作为表A条件，其中表A和表B一样），可以使用inner join 和 in，其中in的用法为：

(col1, col2) in select col1, col2......

示例：
编写一个 SQL 查询，报告在首次登录的第二天再次登录的玩家的比率，四舍五入到小数点后两位。换句话说，您需要计算从首次登录日期开始至少连续两天登录的玩家的数量，然后除以玩家总数。

查询结果格式如下所示：

Activity table:

+-----------+-----------+------------+--------------+
| player_id | device_id | event_date | games_played |
+-----------+-----------+------------+--------------+
| 1         | 2         | 2016-03-01 | 5            |
| 1         | 2         | 2016-03-02 | 6            |
| 2         | 3         | 2017-06-25 | 1            |
| 3         | 1         | 2016-03-02 | 0            |
| 3         | 4         | 2018-07-03 | 5            |
+-----------+-----------+------------+--------------+

Result table:

+-----------+
| fraction  |
+-----------+
| 0.33      |
+-----------+

只有 ID 为 1 的玩家在第一天登录后才重新登录，所以答案是 1/3 = 0.33

select round(count(distinct player_id)/(select count(distinct player_id) from Activity), 2) fraction from Activity 
where (player_id, event_date) in (select player_id, DATE_ADD(min(event_date), interval 1 day) event_date from Activity group by player_id )

8.having 和 where

where是在group by分组之前进行条件筛选，后面不可以跟聚合函数。

having是在group by分组之后进行条件筛选，后面可以直接跟聚合函数。
示例：
MyNumbers 表：

+-------------+------+
| Column Name | Type |
+-------------+------+
| num         | int  |
+-------------+------+

这张表没有主键。可能包含重复数字。
这张表的每一行都含有一个整数。

单一数字是在 MyNumbers 表中只出现一次的数字。

请你编写一个 SQL 查询来报告最大的单一数字。如果不存在单一数字，查询需报告 null 。

select max(t.num) num from (select * from MyNumbers group by num having count(num) = 1) t

tips：此时子查询的结果将作为一张表数据，所以需要别名t

9. join 多表连接的写法

... from table1 t1 inner join table2 t2 on t1.col = t2.col inner join table3 t3 on t2.col = t3.col ...

示例：
表：Logs

+-------------+---------+
| Column Name | Type    |
+-------------+---------+
| id          | int     |
| num         | varchar |
+-------------+---------+

id 是这个表的主键。

编写一个 SQL 查询，查找所有至少连续出现三次的数字。

返回的结果表中的数据可以按任意顺序排列。

查询结果格式如下面的例子所示：

输入：
Logs 表：

+----+-----+
| Id | Num |
+----+-----+
| 1  | 1   |
| 2  | 1   |
| 3  | 1   |
| 4  | 2   |
| 5  | 1   |
| 6  | 2   |
| 7  | 2   |
+----+-----+

输出：
Result 表：

+-----------------+
| ConsecutiveNums |
+-----------------+
| 1               |
+-----------------+

解释：1 是唯一连续出现至少三次的数字。

select distinct l1.num ConsecutiveNums from Logs l1 inner join Logs l2 on l1.num = l2.num inner join Logs l3 on l2.num = l3.num where l1.id = l2.id - 1 and l2.id = l3.id -1

10. 常用函数

大小写切换：

upper(string)
lower(string)

截取：

left(string, length) #从左向右截取length长度
right(string, length) #从右向左截取length长度
substr(string，start，length) #从start开始截取length长度

日期相关：

Mysql中可以直接用字符串进行比较时间大小，只要你的日期格式是合法的就行。例如

where t.date > '2020-01-01'

DATE_ADD()函数：日期时间增加减少:

select date_add(now(), interval -1 day)

DATEDIFF()函数：计算两个日期差:

select datediff(date1, date2)

DATE_FORMAT函数：日期格式化

常用：DATE_FORMAT(SYSDATE(),‘%Y-%m-%d %H:%i:%s’)

select date_format(now(), '%Y-%m-%d')

11. group_contact

group_contact函数是mysql中的函数，主要是将分组后相同的行组成一行
语法是：

group_concat([DISTINCT] 要连接的字段 [Order BY ASC/DESC 排序字段] [Separator '分隔符'])

示例：
编写一个 SQL 查询来查找每个日期、销售的不同产品的数量及其名称。
每个日期的销售产品名称应按词典序排列。
返回按 sell_date 排序的结果表。

输入：
Activities 表：

+------------+-------------+
| sell_date  | product     |
+------------+-------------+
| 2020-05-30 | Headphone   |
| 2020-06-01 | Pencil      |
| 2020-06-02 | Mask        |
| 2020-05-30 | Basketball  |
| 2020-06-01 | Bible       |
| 2020-06-02 | Mask        |
| 2020-05-30 | T-Shirt     |
+------------+-------------+

输出：

+------------+----------+------------------------------+
| sell_date  | num_sold | products                     |
+------------+----------+------------------------------+
| 2020-05-30 | 3        | Basketball,Headphone,T-shirt |
| 2020-06-01 | 2        | Bible,Pencil                 |
| 2020-06-02 | 1        | Mask                         |
+------------+----------+------------------------------+

解释：
对于2020-05-30，出售的物品是 (Headphone, Basketball, T-shirt)，按词典序排列，并用逗号 ‘,’ 分隔。
对于2020-06-01，出售的物品是 (Pencil, Bible)，按词典序排列，并用逗号分隔。
对于2020-06-02，出售的物品是 (Mask)，只需返回该物品名。

select sell_date, count(distinct product) num_sold, group_concat(distinct product order by product) products from Activities group by sell_date

12. 两张相同结构的表关联

当两张相同结构的表关联时，左表的数据来自于右表，有两种写法：

select a.col1, a.col2 from A a, B b where a.col1 = b.col1 and a.col2 = b.col2; #使用连接

select col1, col2 from A where (col1, col2) in (select col1, col2 from B) #使用() in

示例：
写一段 SQL来查找在 2019-08-16 时全部产品的价格，假设所有产品在修改前的价格都是 10 。
输入：
Products 表:

+------------+-----------+-------------+
| product_id | new_price | change_date |
+------------+-----------+-------------+
| 1          | 20        | 2019-08-14  |
| 2          | 50        | 2019-08-14  |
| 1          | 30        | 2019-08-15  |
| 1          | 35        | 2019-08-16  |
| 2          | 65        | 2019-08-17  |
| 3          | 20        | 2019-08-18  |
+------------+-----------+-------------+

输出：

+------------+-------+
| product_id | price |
+------------+-------+
| 2          | 50    |
| 1          | 35    |
| 3          | 10    |
+------------+-------+

方法一使用连接：

select product_id, 10 price from Products  group by product_id having min(change_date) > '2019-08-16'
union
select p.product_id, p.new_price price from Products p, (select product_id, max(change_date) change_date from Products where product_id in (select product_id from Products  group by product_id having min(change_date) <= '2019-08-16') and change_date <= '2019-08-16' group by product_id) t where p.product_id = t.product_id and p.change_date=t.change_date

方法二使用in：

select product_id, 10 price from Products  group by product_id having min(change_date) > '2019-08-16'
union
select product_id, new_price price from Products where (product_id,change_date) in (select product_id, max(change_date) change_date from Products where product_id in (select product_id from Products  group by product_id having min(change_date) <= '2019-08-16') and change_date <= '2019-08-16' group by product_id)

13. 窗口函数用于排序

在这里插入图片描述
rank() 根据分组和排序给出排名，可以并列，绝对排名，排序效果 1 1 3 4
dense_rank() 根据分组和排序给出排名，可以并列，相对排名，排序效果 1 1 2 3
写法：

rank() over(partition by col1 order by col2 desc) as rk # 排序会拼接在最右边

示例：
公司的主管们感兴趣的是公司每个部门中谁赚的钱最多。一个部门的高收入者是指一个员工的工资在该部门的不同工资中排名前三。

编写一个SQL查询，找出每个部门中收入高的员工。
输入:
Employee 表:

+----+-------+--------+--------------+
| id | name  | salary | departmentId |
+----+-------+--------+--------------+
| 1  | Joe   | 85000  | 1            |
| 2  | Henry | 80000  | 2            |
| 3  | Sam   | 60000  | 2            |
| 4  | Max   | 90000  | 1            |
| 5  | Janet | 69000  | 1            |
| 6  | Randy | 85000  | 1            |
| 7  | Will  | 70000  | 1            |
+----+-------+--------+--------------+

Department 表:

+----+-------+
| id | name  |
+----+-------+
| 1  | IT    |
| 2  | Sales |
+----+-------+

输出:

+------------+----------+--------+
| Department | Employee | Salary |
+------------+----------+--------+
| IT         | Max      | 90000  |
| IT         | Joe      | 85000  |
| IT         | Randy    | 85000  |
| IT         | Will     | 70000  |
| Sales      | Henry    | 80000  |
| Sales      | Sam      | 60000  |
+------------+----------+--------+

解释:
在IT部门:

Max的工资最高
兰迪和乔都赚取第二高的独特的薪水
威尔的薪水是第三高的

在销售部:

亨利的工资最高
山姆的薪水第二高
没有第三高的工资，因为只有两名员工

select Department, Employee, Salary from (
select d.name Department,e.name Employee,e.salary Salary,dense_rank() over(partition by DepartmentId order by Salary desc) rk from Employee e,Department d where e.departmentId = d.id
) t where t.rk <= 3

14. 特殊情况下union需要括号

子查询中包含了 ORDER BY 或者 LIMIT 子句，需要将子查询放置在括号内

示例：
请你编写一组 SQL 查询：

查找评论电影数量最多的用户名。如果出现平局，返回字典序较小的用户名。
查找在 February 2020 平均评分最高的电影名称。如果出现平局，返回字典序较小的电影名称。
字典序，即按字母在字典中出现顺序对字符串排序，字典序较小则意味着排序靠前。

查询结果格式如下例所示。

输入：
Movies 表：

+-------------+--------------+
| movie_id    |  title       |
+-------------+--------------+
| 1           | Avengers     |
| 2           | Frozen 2     |
| 3           | Joker        |
+-------------+--------------+

Users 表：

+-------------+--------------+
| user_id     |  name        |
+-------------+--------------+
| 1           | Daniel       |
| 2           | Monica       |
| 3           | Maria        |
| 4           | James        |
+-------------+--------------+

MovieRating 表：

+-------------+--------------+--------------+-------------+
| movie_id    | user_id      | rating       | created_at  |
+-------------+--------------+--------------+-------------+
| 1           | 1            | 3            | 2020-01-12  |
| 1           | 2            | 4            | 2020-02-11  |
| 1           | 3            | 2            | 2020-02-12  |
| 1           | 4            | 1            | 2020-01-01  |
| 2           | 1            | 5            | 2020-02-17  | 
| 2           | 2            | 2            | 2020-02-01  | 
| 2           | 3            | 2            | 2020-03-01  |
| 3           | 1            | 3            | 2020-02-22  | 
| 3           | 2            | 4            | 2020-02-25  | 
+-------------+--------------+--------------+-------------+

输出：
Result 表：

+--------------+
| results      |
+--------------+
| Daniel       |
| Frozen 2     |
+--------------+

解释：
Daniel 和 Monica 都点评了 3 部电影（“Avengers”, “Frozen 2” 和 “Joker”）但是 Daniel 字典序比较小。
Frozen 2 和 Joker 在 2 月的评分都是 3.5，但是 Frozen 2 的字典序比较小。

(select u.name results from MovieRating r, Users u where r.user_id=u.user_id group by r.user_id order by count(distinct movie_id) desc,u.name asc limit 1)
union
(select m.title results from Movies m, MovieRating r where m.movie_id = r.movie_id and r.created_at like "2020-02%" order by r.rating desc,m.title asc limit 1)