mysql semi join_MySQL 通过semi join 优化子查询

本文介绍了MySQL中的半连接(semi join)优化,它在5.6.5版本引入,常用于子查询exists操作。通过示例展示了如何在查询存在学生的班级时,利用半连接提高性能,避免了distinct带来的影响。通过explain分析,解释了半连接的工作原理,包括Start temporary和End temporary的使用,以及MySQL如何根据成本选择优化策略。
摘要由CSDN通过智能技术生成

半连接是MySQL 5.6.5引入的,多在子查询exists中使用,对外部row source的每个键值,查找到内部row source匹配的第一个键值后就返回,如果找到就不用再查找内部row source其他的键值了。

测试环境

mysql> descclass;+------------+-------------+------+-----+---------+-------+

| Field | Type | Null | Key | Default | Extra |

+------------+-------------+------+-----+---------+-------+

| class_num | int(11) | NO | PRI | NULL | |

| class_name | varchar(20) | YES | | NULL | |

+------------+-------------+------+-----+---------+-------+

2 rows in set (0.00sec)

mysql> descroster;+-------------+---------+------+-----+---------+-------+

| Field | Type | Null | Key | Default | Extra |

+-------------+---------+------+-----+---------+-------+

| class_num | int(11) | YES | | NULL | |

| student_num | int(11) | YES | | NULL | |

+-------------+---------+------+-----+---------+-------+

2 rows in set (0.00 sec)

roster表中记录的是学生的学号以及对应的教室,多个学生可能在同一个教室,所以字段class_num有重复值

class表中记录的是教室及对应的班级名,字段class_num为唯一值

如果要查询存在学生的班级有哪些

mysql> SELECT class.class_num, class.class_name FROM class INNER JOIN roster WHERE class.class_num =roster.class_num;+-----------+------------+

| class_num | class_name |

+-----------+------------+

| 2 | class 2 |

| 3 | class 3 |

| 3 | class 3 |

+-----------+------------+

3 rows in set (0.00sec)

可以通过distinct去除重复值,但这样做影响性能,所以通过子查询来得出结果

mysql> SELECT class_num, class_name FROM class WHERE class_num IN (SELECT class_num FROMroster);+-----------+------------+

| class_num | class_name |

+-----------+------------+

| 2 | class 2 |

| 3 | class 3 |

+-----------+------------+

2 rows in set (0.00 sec)

优化器实际上是将子查询改写为了半连接

mysql> explain SELECT class_num, class_name FROM class WHERE class_num IN (SELECT class_num FROMroster);+----+-------------+--------+------------+------+---------------+------+---------+------+------+----------+-------------------------------------------------------------------+

| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |

+----+-------------+--------+------------+------+---------------+------+---------+------+------+----------+-------------------------------------------------------------------+

| 1 | SIMPLE | roster | NULL | ALL | NULL | NULL | NULL | NULL | 3 | 100.00 | Start temporary |

| 1 | SIMPLE | class | NULL | ALL | PRIMARY | NULL | NULL | NULL | 4 | 25.00 | Using where; End temporary; Using join buffer (Block Nested Loop) |

+----+-------------+--------+------------+------+---------------+------+---------+------+------+----------+-------------------------------------------------------------------+

2 rows in set, 1 warning (0.00sec)

mysql>show warnings;+-------+------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

| Level | Code | Message |

+-------+------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

| Note | 1003 | /*select#1*/ select `test`.`class`.`class_num` AS `class_num`,`test`.`class`.`class_name` AS `class_name` from `test`.`class` semi join (`test`.`roster`) where (`test`.`class`.`class_num` = `test`.`roster`.`class_num`) |

+-------+------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

1 row in set (0.00 sec)

Start temporary 和 End temporary的使用表明使用了临时表来去除重复值

如果 select_type 的值为 MATERIALIZED 并且 字段 rows的输出是 则表明临时表用于了物化表

select_type value of MATERIALIZED and rows with a table value of .

如果子查询符合准则(参考文献:http://dev.mysql.com/doc/refman/5.7/en/subquery-optimization.html#semi-joins),MySQL将其转化为semi-join并从以下策略中作出基于cost的选择

Convert the subquery to a join, or use table pullout and run the query as an inner join between subquery tables and outer tables. Table pullout pulls a table out from the subquery to the outer query.

Duplicate Weedout: Run the semi-join as if it was a join and remove duplicate records using a temporary table.

FirstMatch: When scanning the inner tables for row combinations and there are multiple instances of a given value group, choose one rather than returning them all. This "shortcuts" scanning and eliminates production of unnecessary rows.

LooseScan: Scan a subquery table using an index that enables a single value to be chosen from each subquery's value group.

Materialize the subquery into a temporary table with an index and use the temporary table to perform a join. The index is used to remove duplicates. The index might also be used later for lookups when joining the temporary table with the outer tables; if not, the table is scanned

不确定的内容不敢随意翻译,摘出来原汁原味的文献内容

系统变量optimizer_switch中的semi join 标签控制着半连接是否可用,5.6默认是开启的

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值