MySQL 8 Anti-Join 几点总结

导读:

作者:郑松华,知数堂SQL优化班老师,网名:骑龟的兔子

今天给大家分享下,关于not in ,not exists相关的文章。其实这个可以归纳为exists to in的一类,mysql以前的版本中对in和exists的处理是完全不一样的,直到8.0.16版本。

16版本之前in可以优化成semi join那些关于semi join的几个优化,如:loosescan=on,firstmatch=on,duplicateweedout=on,materialization=on,都是对于in的 , exists只有一种运算就是DEPENDENT SUBQUERY,直到16版本开始exists可以和in等价了,也可以享受到semi join,这回最新版本中mysqlservertiam.com网站中新出现了关于anti join的文章。

Antijoin in MySQL 8                     mysqlserverteam.com

我们可以简单理解为not exists和not in在直白点就是子查询变成半连接了,下面我举个5.7和8.0.18版本同一种sql的两种不同的执行计划

首先5.7

--------------mysql  Ver 14.14 Distrib 5.7.14, for linux-glibc2.5 (x86_64) using  EditLine wrapper
Connection id:          2
Current database:       employees
Current user:           root@localhost
SSL:                    Not in use
Current pager:          stdout
Using outfile:          ''
Using delimiter:        ;
Server version:         5.7.14-log MySQL Community Server (GPL)
Protocol version:       10
Connection:             Localhost via UNIX socket
Server characterset:    utf8
Db     characterset:    utf8
Client characterset:    utf8
Conn.  characterset:    utf8
UNIX socket:            /tmp/mysql3306.sock
Uptime:                 5 min 4 sec
Threads: 1  Questions: 21  Slow queries: 0  Opens: 116  Flush tables: 1  Open tables: 109  Queries per second avg: 0.069
--------------
root@mysql3306.sock>[employees]>desc select * from t_group t where not exists (select 1 from employees e where e.emp_no = t.emp_no) ;
+----+--------------------+-------+------------+--------+---------------+---------+---------+--------------------+------+----------+-------------+
| id | select_type        | table | partitions | type   | possible_keys | key     | key_len | ref                | rows | filtered | Extra       |
+----+--------------------+-------+------------+--------+---------------+---------+---------+--------------------+------+----------+-------------+
|  1 | PRIMARY            | t     | NULL       | ALL    | NULL          | NULL    | NULL    | NULL               |   10 |   100.00 | Using where |
|  2 | DEPENDENT SUBQUERY | e     | NULL       | eq_ref | PRIMARY       | PRIMARY | 4       | employees.t.emp_no |    1 |   100.00 | Using index |
+----+--------------------+-------+------------+--------+---------------+---------+---------+--------------------+------+----------+-------------+

如上述5.7.14版本中not exists执行计划是DEPENDENT SUBQUERY

我们在8.0.18 版本中看下同一个sql:

root@mysql3308.sock>[employees]>\s
--------------
/usr/local/mysql8/bin/mysql  Ver 8.0.18 for linux-glibc2.12 on x86_64 (MySQL Community Server - GPL)
Connection id:          7
Current database:       employees
Current user:           root@localhost
SSL:                    Not in use
Current pager:          stdout
Using outfile:          ''
Using delimiter:        ;
Server version:         8.0.18 MySQL Community Server - GPL
Protocol version:       10
Connection:             Localhost via UNIX socket
Server characterset:    utf8
mb4Db     characterset:    utf8
mb4Client characterset:    utf8
mb4Conn.  characterset:    utf8
mb4UNIX socket:            /tmp/mysql3308.sock
Uptime:                 19 min 24 sec
Threads: 1  Questions: 46  Slow queries: 0  Opens: 175  Flush tables: 3  Open tables: 95  Queries per second avg: 0.039
root@mysql3308.sock>[employees]>desc select * from t_group t where not exists (select 1 from employees e where e.emp_no = t.emp_no) ;
+----+-------------+-------+------------+--------+---------------+---------+---------+--------------------+------+----------+--------------------------------------+
| id | select_type | table | partitions | type   | possible_keys | key     | key_len | ref                | rows | filtered | Extra                                |
+----+-------------+-------+------------+--------+---------------+---------+---------+--------------------+------+----------+--------------------------------------+
|  1 | SIMPLE      | t     | NULL       | ALL    | NULL          | NULL    | NULL    | NULL               |   10 |   100.00 | NULL                                 |
|  1 | SIMPLE      | e     | NULL       | eq_ref | PRIMARY       | PRIMARY | 4       | employees.t.emp_no |    1 |   100.00 | Using where; Not exists; Using index |
+----+-------------+-------+------------+--------+---------------+---------+---------+--------------------+------+----------+--------------------------------------+

我们可以看到变成了anti join

root@mysql3308.sock>[employees]>desc format=tree  select * from t_group t where not exists (select 1 from employees e where e.emp_no = t.emp_no) ;
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| EXPLAIN                                                                                                                                                                             |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| -> Nested loop anti-join  (cost=12.25 rows=10)
   -> Table scan on t  (cost=1.25 rows=10)
   -> Single-row index lookup on e using PRIMARY (emp_no=t.emp_no)  (cost=1.01 rows=1) |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set, 1 warning (0.00 sec)

因为5.7中没有8.0.18版本新增的explain format=tree查看执行计划的方法,我们比较show warnings\G

5.7

root@mysql3306.sock>[employees]>show warnings\G
*************************** 1. row ***************************  
Level: Note   
Code: 1276
Message: Field or reference 'employees.t.emp_no' of SELECT #2 was resolved in SELECT #1
*************************** 2. row ***************************  
Level: Note   
Code: 1003
Message: /* select#1 */ select `employees`.`t`.`emp_no` AS `emp_no`,`employees`.`t`.`dept_no` AS `dept_no`,`employees`.`t`.`from_date` AS `from_date`,`employees`.`t`.`to_date` AS `to_date` from `employees`.`t_group` `t` where (not(exists(/* select#2 */ select 1 from `employees`.`employees` `e` where (`employees`.`e`.`emp_no` = `employees`.`t`.`emp_no`))))
2 rows in set (0.00 sec)

8.0

root@mysql3308.sock>[employees]>show warnings\G
*************************** 1. row ***************************  
Level: Note   
Code: 1276
Message: Field or reference 'employees.t.emp_no' of SELECT #2 was resolved in SELECT #1
*************************** 2. row ***************************  
Level: Note   
Code: 1003
Message: /* select#1 */ select `employees`.`t`.`emp_no` AS `emp_no`,`employees`.`t`.`dept_no` AS `dept_no`,`employees`.`t`.`from_date` AS `from_date`,`employees`.`t`.`to_date` AS `to_date` from `employees`.`t_group` `t` anti join (`employees`.`employees` `e`) on((`employees`.`e`.`emp_no` = `employees`.`t`.`emp_no`)) where true
2 rows in set (0.00 sec)

可以看出5.7中是还是not exists而8.0.18版本中变成了anti join,那有人问了 DEPENDENT SUBQUERY也好,anti join也好到底有啥用?

简单来说DEPENDENT SUBQUERY类似于函数调用 ,需要1个重要的前提:必须接受参数 !也就是说只能是nested loop join方式。但是anti join就不一定了它也可以是nested loop join方式,也可以是hash join方式;而hash join就是 8.0.18版本的新特性!

关于hash join的文章 可以看如下:

hash join                          mysqlserverteam.com

而且DEPENDENT SUBQUERY随着外层表的结果集的数据量的增大而执行时间增大,以前如果碰到这种极端的问题,我们可以利用 left join来解决 ,具体的方法可以看我的公开课。

现在随着新版的anti join这些我们就可以省略了。

最后我想说的是,优化不能离开版本,离开版本谈优化,是不行的。

还有在这次的8.0.18 版本新引入的

explain format=tree / analyze 查看执行计划的方式中发现了一个很严重的问题

root@mysql3308.sock>[employees]>desc  select * from t_group t where not exists (select 1 from employees e   where e.emp_no = t.emp_no) ;
+----+-------------+-------+------------+--------+---------------+---------+---------+--------------------+------+----------+--------------------------------------+
| id | select_type | table | partitions | type   | possible_keys | key     | key_len | ref                | rows | filtered | Extra                                |
+----+-------------+-------+------------+--------+---------------+---------+---------+--------------------+------+----------+--------------------------------------+
|  1 | SIMPLE      | t     | NULL       | ALL    | NULL          | NULL    | NULL    | NULL               |   10 |   100.00 | NULL                                 |
|  1 | SIMPLE      | e     | NULL       | eq_ref | PRIMARY       | PRIMARY | 4       | employees.t.emp_no |    1 |   100.00 | Using where; Not exists; Using index |
+----+-------------+-------+------------+--------+---------------+---------+---------+--------------------+------+----------+--------------------------------------+
root@mysql3308.sock>[employees]>desc format=tree select * from t_group t where not exists (select 1 from employees e   where e.emp_no = t.emp_no) ;
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| EXPLAIN                                                                                                                                                                             |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| -> Nested loop anti-join  (cost=12.17 rows=10)
   -> Table scan on t  (cost=1.25 rows=10)
   -> Single-row index lookup on e using PRIMARY (emp_no=t.emp_no)  (cost=1.00 rows=1) 
|
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set, 1 warning (0.00 sec)

如上所述,可以看到发生了nested loop anti-join,现在,我要把这个改成如下:

root@mysql3308.sock>[employees]>desc  select * from t_group t where not exists (select 1 from employees e ignore index(pri)  where e.emp_no = t.emp_no) ;
+----+-------------+-------+------------+------+---------------+------+---------+------+--------+----------+----------------------------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref  | rows   | filtered | Extra                                                          |
+----+-------------+-------+------------+------+---------------+------+---------+------+--------+----------+----------------------------------------------------------------+
|  1 | SIMPLE      | t     | NULL       | ALL  | NULL          | NULL | NULL    | NULL |     10 |   100.00 | NULL                                                           |
|  1 | SIMPLE      | e     | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 299246 |   100.00 | Using where; Not exists; Using join buffer (Block Nested Loop) |
+----+-------------+-------+------------+------+---------------+------+---------+------+--------+----------+----------------------------------------------------------------+root@mysql3308.sock>[employees]>desc analyze  select * from t_group t where not exists (select 1 from employees e ignore index(pri)  where e.emp_no = t.emp_no) ;ERROR 1235 (42000): This version of MySQL doesn't yet support 'EXPLAIN ANALYZE on this query'

如上述所示,这回desc会出现执行计划,但是desc analyze就会发生错误!

我的推测这个有可能是一个bug ! 或者是还没来得及支持,毕竟新特性,希望以后版本中更改。


水平有限,如有错误,欢迎纠正。

谢谢大家~ 欢迎转发~

我是知数堂SQL 优化班老师~ ^^

如有关于SQL优化方面疑问和一起交流的请加:

高性能MySQL,SQL优化群

并且 @兔子@知数堂SQL优化

欢迎加入 知数堂大家庭。

  • 0
    点赞
  • 1
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值