【MySQL】关键字及运算符官方文档及笔记

20 篇文章 0 订阅

前言

写SQL题经常用,遇到的在这里记录一下。包括MySQL8.0的特性。一如既往的解读官方的原文。

1. join on 和 where 的条件区别

MySQL5.7 join 相关文档

1.1 用 left join 取代 right join

RIGHT JOIN works analogously to LEFT JOIN. To keep code portable across databases, it is recommended that you use LEFT JOIN instead of RIGHT JOIN.

right joinleft join 工作原理类似,为了保证写出的sql语句具有可移植性,建议用left join 取代 right join

1.2 相同点

INNER JOIN and , (comma) are semantically equivalent in the absence of a join condition: both produce a Cartesian product between the specified tables (that is, each and every row in the first table is joined to each and every row in the second table).

在没有连接条件或过滤条件的情况下,以下两个语句的语义相同,都生成笛卡尔积

select * from a join on b; # 当且仅当是inner join 而不是其他连接
select * from a, b;

1.3 不同点

The search_condition used with ON is any conditional expression of the form that can be used in a WHERE clause. Generally, the ON clause serves for conditions that specify how to join tables, and the WHERE clause restricts which rows to include in the result set.

ON连用的条件也可以写在WHERE 子句中。通常,ON子句用于指定如何联接表的条件,WHERE子句限制要在结果集中包括哪些行。

出现区别的情况,当且仅当在左(右)连接中,Eg:

CREATE TABLE `teacher` (
  `id` int(11) NOT NULL,
  `teacher_name` varchar(255) DEFAULT NULL,
  `teacher_sex` int(2) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

CREATE TABLE `student` (
  `id` int(11) NOT NULL,
  `student_name` varchar(255) DEFAULT NULL,
  `student_sex` int(2) DEFAULT NULL,
  `teacher_id` int(11) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
  • 使用 join... —— 生成笛卡尔积
SELECT * FROM `teacher` a join `student` b;

在这里插入图片描述

  • 使用 join... on... 对笛卡尔积进行连接 —— 取符合连接条件的记录
SELECT * FROM `teacher` a join `student` b on a.id = b.teacher_id and b.student_sex =1;

在这里插入图片描述

  • 使用left join ... on ...and —— 出现了右表为空的记录
SELECT * FROM `teacher` a left join `student` b on a.id = b.teacher_id and b.student_sex =1;

在这里插入图片描述

  • 使用left join ... on ...where —— 未出现右表为空的记录
SELECT * FROM `teacher` a left join `student` b on a.id = b.teacher_id where b.student_sex =1;

在这里插入图片描述
总得来说, left join ... on (条件A) and (条件B) 先匹配(条件A) and (条件B) 成立的行数据,再补充左表未匹配任何数据的行记录,该行记录的右表全为空,换言之,左连接会把左表未匹配的值也加入结果集。如果left join时想过滤右边的null值数据,也万万不可在on条件后加入is null,需要在where条件忽略null值,最简洁的就是 b.xxx = ccc,究其原因是 where 的作用条件比on晚。

2. is null

MySQL5.7 IS NULL 相关文档

2.1 is null 可以走索引

MySQL can perform the same optimization on col_name IS NULL that it can use for col_name = constant_value. For example, MySQL can use indexes and ranges to search for NULL with IS NULL.

2.2 避免使用 where 字段 = null

以NULL做where条件过滤时应该写 IS NULL;

WHERE ... IS NULL;

3. where having 和聚合函数的筛选时机

3.1 未分组 where 与 having合并

MySQL 5.7 where 子句官方文档

HAVING is merged with WHERE if you do not use GROUP BY or aggregate functions (COUNT(), MIN(), and so on).

  • 没有使用group by分组的情况,having中的条件会和where中的条件合并
    在这里插入图片描述
    在这里插入图片描述

个人理解:没有group by分组,那么结果集可以视为一组,having当然就被合并了,但是没事还是不要这么用。

3.2 group by用法

3.2.1 select 语句和 group by的使用规范

MySQL 5.7 Group by 文档

  • If the ONLY_FULL_GROUP_BY SQL mode is enabled (which it is by default), MySQL rejects queries for which the select list, HAVING condition, or ORDER BY list refer to nonaggregated columns that are neither named in the GROUP BY clause nor are functionally dependent on them. (Before 5.7.5, MySQL does not detect functional dependency and ONLY_FULL_GROUP_BY is not enabled by default.
  • As of MySQL 5.7.5, the default SQL mode includes ONLY_FULL_GROUP_BY.
  • MySQL 5.7.5 and later also permits a nonaggregate column not named in a GROUP BY clause when ONLY_FULL_GROUP_BY SQL mode is enabled, provided that this column is limited to a single value, as shown in the following example:
  • 官网说了几段规范问题,总之,就是5.7.5的版本之前,group by中没有出现的字段,也可以出现在select 语句中。5.7.5的版本之后,这种使用情况变得严格了,需要保证被select 的表字段在分组中的值是唯一的。
mysql> CREATE TABLE mytable (
    ->    id INT UNSIGNED NOT NULL PRIMARY KEY,
    ->    a VARCHAR(10),
    ->    b INT
    -> );

mysql> INSERT INTO mytable
    -> VALUES (1, 'abc', 1000),
    ->        (2, 'abc', 2000),
    ->        (3, 'def', 4000);

mysql> SET SESSION sql_mode = sys.list_add(@@session.sql_mode, 'ONLY_FULL_GROUP_BY');

mysql> SELECT a, SUM(b) FROM mytable WHERE a = 'abc';
+------+--------+
| a    | SUM(b) |
+------+--------+
| abc  |   3000 |
+------+--------+
  • 值得一提的是,官网这个例子并没有用Group by,整个结果集视为一组,所以sum() 是可用的

SELECT name, address, MAX(age) FROM t GROUP BY name;

The query is valid if name is a primary key of t or is a unique NOT NULL column. In such cases, MySQL recognizes that the selected column is functionally dependent on a grouping column. For example, if name is a primary key, its value determines the value of address because each group has only one value of the primary key and thus only one row. As a result, there is no randomness in the choice of address value in a group and no need to reject the query.

  • 被分组的列是主键,那么形如address 的字段不存在组内的随机性,也能被select
3.2.1 where 和 group by 和 order by 的执行顺序

If ONLY_FULL_GROUP_BY is disabled, a MySQL extension to the standard SQL use of GROUP BY permits the select list, HAVING condition, or ORDER BY list to refer to nonaggregated columns even if the columns are not functionally dependent on GROUP BY columns. This causes MySQL to accept the preceding query. In this case, the server is free to choose any value from each group, so unless they are the same, the values chosen are nondeterministic, which is probably not what you want. Furthermore, the selection of values from each group cannot be influenced by adding an ORDER BY clause. Result set sorting occurs after values have been chosen, and ORDER BY does not affect which value within each group the server chooses. Disabling ONLY_FULL_GROUP_BY is useful primarily when you know that, due to some property of the data, all values in each nonaggregated column not named in the GROUP BY are the same for each group.

  • 大意是,ONLY_FULL_GROUP_BY 限制关闭后,可以选择select没有出现在group by语句中的字段,但是这样做法是不提倡的(因为获取的结果不能被order by 限定)

  • 结论是where 、group by 先执行, order by 最后

3.2.2 业务上允许 select 未被 group by的字段

If you know that, for a given data set, each name value in fact uniquely determines the address value, address is effectively functionally dependent on name. To tell MySQL to accept the query, you can use the ANY_VALUE() function:

  • 虽然字段没有ddl确定唯一性,但是筛选的结果业务上保证是可以拿到唯一的,这个时候可以用ANY_VALUE() 告诉 MySQL 不用操心,随便返回个值即可
SELECT name, ANY_VALUE(address), MAX(age) FROM t GROUP BY name;

3.3 having

查了下文档,没有单独记录having的用法,在group by章节提到了having

Another MySQL extension to standard SQL permits references in the HAVING clause to aliased expressions in the select list. For example, the following query returns name values that occur only once in table orders:

# 返回在列表只出现了一次的名字
SELECT name, COUNT(name) FROM orders
  GROUP BY name
  HAVING COUNT(name) = 1;
SELECT name, COUNT(name) AS c FROM orders
  GROUP BY name
  HAVING c = 1;
  • MySQL对SQL规范做了个拓展,允许在having中使用select的别名
    我认为这是没有必要的,而且having执行得比select要早,会造成理解上的负担。在5.7.5后的版本也是默认禁止这种写法的。

4. CASE WHEN

MySQL有两种CASE的用法

  1. CASE Statement 在存储过程中使用的语句,目前还没遇到题目,暂时不关心
  2. Flow Control Functions —— CASE 流程控制语句,作用类似于IF ELSE,这里的CASE属于操作符,目前重点关心。

4.1 返回空值的处理

If no comparison or condition is true, the result after ELSE is returned, or NULL if there is no ELSE part.

  • 不写ELSE的情况下,未匹配任意一个when,返回空
mysql> SELECT CASE BINARY 'B'
    ->     WHEN 'a' THEN 1 WHEN 'b' THEN 2 END;
        -> NULL

4.2 分组查询使用聚合函数

select
c.country_name,
case when avg(w.weather_state)>=25 then "Hot" 
     when avg(w.weather_state)<=15 then "Cold"
     else "Warm" 
end as weather_type              
from Countries c left join Weather w on c.country_id=w.country_id
where date_format(day,"%Y-%m") ="2019-11"
group by c.country_name

5. 窗口函数

MySQL 8.0 窗口函数官方文档

A window function performs an aggregate-like operation on a set of query rows. However, whereas an aggregate operation groups query rows into a single result row, a window function produces a result for each query row:

  • 使用窗口聚合获得结果集表现得跟聚合函数很像,区别在于聚合函数一个组返回一个结果,窗口函数一个组可以返回多个结果。

5.1 SUM() OVER()

5.1.1 跟普通SUM() 的区别
  • 普通的sum() 使用
mysql> SELECT SUM(profit) AS total_profit
       FROM sales;
+--------------+
| total_profit |
+--------------+
|         7535 |
+--------------+
mysql> SELECT country, SUM(profit) AS country_profit
       FROM sales
       GROUP BY country
       ORDER BY country;
+---------+----------------+
| country | country_profit |
+---------+----------------+
| Finland |           1610 |
| India   |           1350 |
| USA     |           4575 |
+---------+----------------+
  • 使用开窗函数后
mysql> SELECT
         year, country, product, profit,
         SUM(profit) OVER() AS total_profit
       FROM sales
       ORDER BY country, year, product, profit;
+------+---------+------------+--------+--------------+
| year | country | product    | profit | total_profit | 
+------+---------+------------+--------+--------------+
| 2000 | Finland | Computer   |   1500 |         7535 |           
| 2000 | Finland | Phone      |    100 |         7535 |           
| 2001 | Finland | Phone      |     10 |         7535 |           
| 2000 | India   | Calculator |     75 |         7535 |           
| 2000 | India   | Calculator |     75 |         7535 |           
| 2000 | India   | Computer   |   1200 |         7535 |           
| 2000 | USA     | Calculator |     75 |         7535 |           
| 2000 | USA     | Computer   |   1500 |         7535 |           
| 2001 | USA     | Calculator |     50 |         7535 |           
| 2001 | USA     | Computer   |   1200 |         7535 |           
| 2001 | USA     | Computer   |   1500 |         7535 |           
| 2001 | USA     | TV         |    100 |         7535 |          
| 2001 | USA     | TV         |    150 |         7535 |          
+------+---------+------------+--------+--------------+

验证了窗口函数不折叠行的结果,还可以关注分组前的内容。(这个例子未使用group by 则整个表为一组)

结论是SUM(profit) OVER() 会对分组前的每行profit字段进行SUM(),该SUM()的结果可以返回到每一行记录上。
引出一个问题:SUM(profit) OVER() 能不能取到分组前的部分内容 —— partition的出场。

5.2 SUM() OVER(PARTITION BY …)

  • 上述的例子再增强
mysql> SELECT
         year, country, product, profit,
         SUM(profit) OVER() AS total_profit,
         SUM(profit) OVER(PARTITION BY country) AS country_profit
       FROM sales
       ORDER BY country, year, product, profit;
+------+---------+------------+--------+--------------+----------------+
| year | country | product    | profit | total_profit | country_profit |
+------+---------+------------+--------+--------------+----------------+
| 2000 | Finland | Computer   |   1500 |         7535 |           1610 |
| 2000 | Finland | Phone      |    100 |         7535 |           1610 |
| 2001 | Finland | Phone      |     10 |         7535 |           1610 |
| 2000 | India   | Calculator |     75 |         7535 |           1350 |
| 2000 | India   | Calculator |     75 |         7535 |           1350 |
| 2000 | India   | Computer   |   1200 |         7535 |           1350 |
| 2000 | USA     | Calculator |     75 |         7535 |           4575 |
| 2000 | USA     | Computer   |   1500 |         7535 |           4575 |
| 2001 | USA     | Calculator |     50 |         7535 |           4575 |
| 2001 | USA     | Computer   |   1200 |         7535 |           4575 |
| 2001 | USA     | Computer   |   1500 |         7535 |           4575 |
| 2001 | USA     | TV         |    100 |         7535 |           4575 |
| 2001 | USA     | TV         |    150 |         7535 |           4575 |
+------+---------+------------+--------+--------------+----------------+

取到分组前的部分内容—— 用PARTITION BY country 声明:想要行记录中在分组前记录中的SUM(profit)

The first OVER clause is empty, which treats the entire set of query rows as a single partition. The window function thus produces a global sum, but does so for each row.
The second OVER clause partitions rows by country, producing a sum per partition (per country). The function produces this sum for each partition row.

  • OVER() 内没有表达式,表示整个表被分为一个组块,每一行都能统计这一个组块的所有行
  • OVER() 内有表达式PARTITION BY country,表示按照country分成组块,每一行都能统计这一个组块的所有行

5.3 ROW_NUMBER()

SUM() OVER PARTITION BY的组合使用下,已经有了较强的分析能力了。现在提供一种流水化分析的能力:SUM() OVER(PARTITION BY ...ORDER BY...)。在此之前需要介绍ROW_NUMBER()

Returns the number of the current row within its partition. Rows numbers range from 1 to the number of partition rows.
ORDER BY affects the order in which rows are numbered. Without ORDER BY, row numbering is nondeterministic.

  • 结合PARTITION的语义,注意到被PARTITION成的组块,组快内是无序的,用ROW_NUMBER() 能返回单条记录在组快中的顺序。序号从1开始。
  • ORDER BY 关键字【这里指配合PARTITION BY时使用】让组块内排序,ROW_NUMBER() 变得有意义了,接下来会使用SUM() OVER(PARTITION BY ...ORDER BY...)实现个业务功能。
mysql> SELECT
         year, country, product, profit,
         ROW_NUMBER() OVER(PARTITION BY country) AS row_num1,
         ROW_NUMBER() OVER(PARTITION BY country ORDER BY year, product) AS row_num2
       FROM sales;
+------+---------+------------+--------+----------+----------+
| year | country | product    | profit | row_num1 | row_num2 |
+------+---------+------------+--------+----------+----------+
| 2000 | Finland | Computer   |   1500 |        2 |        1 |
| 2000 | Finland | Phone      |    100 |        1 |        2 |
| 2001 | Finland | Phone      |     10 |        3 |        3 |
| 2000 | India   | Calculator |     75 |        2 |        1 |
| 2000 | India   | Calculator |     75 |        3 |        2 |
| 2000 | India   | Computer   |   1200 |        1 |        3 |
| 2000 | USA     | Calculator |     75 |        5 |        1 |
| 2000 | USA     | Computer   |   1500 |        4 |        2 |
| 2001 | USA     | Calculator |     50 |        2 |        3 |
| 2001 | USA     | Computer   |   1500 |        3 |        4 |
| 2001 | USA     | Computer   |   1200 |        7 |        5 |
| 2001 | USA     | TV         |    150 |        1 |        6 |
| 2001 | USA     | TV         |    100 |        6 |        7 |
+------+---------+------------+--------+----------+----------+

5.4 SUM() OVER(PARTITION BY …ORDER BY…)

题目

写一条SQL语句查询每种性别在每一天的总分,并按性别和日期对查询结果排序

Scores表:
+-------------+--------+------------+--------------+
| player_name | gender | day        | score_points |
+-------------+--------+------------+--------------+
| Aron        | F      | 2020-01-01 | 17           |
| Alice       | F      | 2020-01-07 | 23           |
| Bajrang     | M      | 2020-01-07 | 7            |
| Khali       | M      | 2019-12-25 | 11           |
| Slaman      | M      | 2019-12-30 | 13           |
| Joe         | M      | 2019-12-31 | 3            |
| Jose        | M      | 2019-12-18 | 2            |
| Priya       | F      | 2019-12-31 | 23           |
| Priyanka    | F      | 2019-12-30 | 17           |
+-------------+--------+------------+--------------+
结果表:
+--------+------------+-------+
| gender | day        | total |
+--------+------------+-------+
| F      | 2019-12-30 | 17    |
| F      | 2019-12-31 | 40    |
| F      | 2020-01-01 | 57    |
| F      | 2020-01-07 | 80    |
| M      | 2019-12-18 | 2     |
| M      | 2019-12-25 | 13    |
| M      | 2019-12-30 | 26    |
| M      | 2019-12-31 | 29    |
| M      | 2020-01-07 | 36    |
+--------+------------+-------+
女性队伍:
第一天是 2019-12-30,Priyanka 获得 17 分,队伍的总分是 17 分
第二天是 2019-12-31, Priya 获得 23 分,队伍的总分是 40 分
第三天是 2020-01-01, Aron 获得 17 分,队伍的总分是 57 分
第四天是 2020-01-07, Alice 获得 23 分,队伍的总分是 80 分
男性队伍:
第一天是 2019-12-18, Jose 获得 2 分,队伍的总分是 2 分
第二天是 2019-12-25, Khali 获得 11 分,队伍的总分是 13 分
第三天是 2019-12-30, Slaman 获得 13 分,队伍的总分是 26 分
第四天是 2019-12-31, Joe 获得 3 分,队伍的总分是 29 分
第五天是 2020-01-07, Bajrang 获得 7 分,队伍的总分是 36
答案
select gender, day, sum(score_points) over(partition by gender order by day) total
from scores
order by gender, day

partition by gender order by dayorder by 本质上是使分组快内的数据按天数排列,这样的好处是ROW_NUMBER() 按照日期从远至近排序,每一组快的每一行都获取 <= ROW_NUMBER() 的数据 ,即对历史日期进行sum(score_points)操作,是在性别分组中按当前日期,取当前日期前累计的得分。这么说可能有点绕,看一下等价的答案(性能稍差):

  • 表连接:
SELECT
    S1.gender,
    S1.day,
    SUM(S2.score_points) total
FROM
    Scores S1
LEFT JOIN
    Scores S2
ON S1.gender = S2.gender AND S1.day >= S2.day
GROUP BY
    S1.gender,
    S1.day
ORDER BY
    S1.gender,
    S1.day
  • 子查询
select gender, day, sum(score_points) over(partition by gender order by day) total
from scores
order by gender, day 

5.5 rank() 和 dense_rank()

当且仅当跟over(order by…) 一起使用才有实际用途

  • 深层次的原因是order by 让ROW_NUMBER()起作用,看官网的例子再区分rank() 和 dense_rank() 的区别
mysql> SELECT
         val,
         ROW_NUMBER() OVER w AS 'row_number',
         RANK()       OVER w AS 'rank',
         DENSE_RANK() OVER w AS 'dense_rank'
       FROM numbers
       WINDOW w AS (ORDER BY val);
+------+------------+------+------------+
| val  | row_number | rank | dense_rank |
+------+------------+------+------------+
|    1 |          1 |    1 |          1 |
|    1 |          2 |    1 |          1 |
|    2 |          3 |    3 |          2 |
|    3 |          4 |    4 |          3 |
|    3 |          5 |    4 |          3 |
|    3 |          6 |    4 |          3 |
|    4 |          7 |    7 |          4 |
|    4 |          8 |    7 |          4 |
|    5 |          9 |    9 |          5 |
+------+------------+------+------------+
  • 共同点是都允许并列排名,不同点是dense 排名是连续的

6. union

MySQL union 关键字官方文档

UNION combines the result from multiple SELECT statements into a single result set

  • UNION 是将多个select语句的结果集组合到一起

6.1 union 的列字段名取值

The column names for a UNION result set are taken from the column names of the first SELECT statement.

  • 列名取第一个select 。

6.2 union distinct 和 union all

union distinct 是 union的全称。简明的解释:union == union distinct,重复的行只会在结果集重出现一次。
union all, 没有去重逻辑。

  • 题外话,5.7版本的union语句强制select语句之间使用圆括号,如下:
(SELECT a FROM t1 WHERE a=10 AND B=1 ORDER BY a LIMIT 10)
UNION
(SELECT a FROM t2 WHERE a=11 AND B=2 ORDER BY a LIMIT 10);
  • 0
    点赞
  • 2
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值