exists和in区别

最新推荐文章于 2023-07-18 22:56:10 发布

田螺小伙丶

最新推荐文章于 2023-07-18 22:56:10 发布

阅读量273

点赞数

分类专栏： mysql

本文链接：https://blog.csdn.net/Eye9093/article/details/103168686

版权

mysql 专栏收录该内容

11 篇文章 0 订阅

订阅专栏

mysql exists和in区别

废话

以员工表（user）和部门表（department）为例，建表语句自行了结，看过网上的很多博客，都说分表数据大小，通过本人阅览无数精英的见解和自己的实践，这里想发表下个人看法，本博客所言所写仅代表个人观点，还望各位大神路过多多指教，不胜感激！！！

真材实料一（关于MySQL exists和in检索数据的过程）

下面是两条sql

select  * from user  a  where name='tom' and   exists  (select stuid from  department  b where  depname='yunwei'  and  a.stuid =b.stuid )；

select  * from user where name='tom' and   stuid  in (select stuid from  department where  depname='yunwei')；

提前声明：注意其中A代表（user a where name=‘tom’）的结果集，B代表（department b where depname=‘yunwei’ ）的结果集：

exists检索过程

exists对外表A用loop逐条查询，每次查询都会去验证exists的条件语句（也就是exists后面括号里面的语句），当 exists里的条件语句能够返回记录行时(只要能返回结果即可，不管你查询的是什么内容！！！)，条件就为真，就会返回当前loop到的A的这条记录，反之如果exists里的条件语句不能返回记录行，条件为假，则当前loop到的A的这条记录被丢弃，注意：exists的条件就像一个bool条件，当能返回结果集则为true，不能返回结果集则为 false；
对于exists的检索过程可以用下面的脚本概括：

for ($i = 0; $i < count(A); $i++) {

　　$a = get_record(A, $i); #从A表逐条获取记录

　　if (B.id = $a[id]) #如果子条件成立，即返回true

　　　　$result[] = $a;

}

return $result;

举例说明：

例如：

select * from user where exists (select stuid  from  department where depname='yunwei' );

对user表的记录逐条取出，由于exists条件中的select stuid from department where depname=‘yunwei’ 永远能返回记录行，那么user表的所有记录都将被加入结果集，所以与 select * from user;是一样的

例如：

select * from user where exists (select stuid  from  department where depname='yunwei+' );

not exists与exists相反，也就是当exists条件有结果集返回时，loop到的记录将被丢弃，否则将loop到的记录加入结果集

总的来说，如果user表结果集有n条记录，那么exists查询就是将这n条记录逐条取出，然后判断n遍exists条件。

in检索过程

MySQL先将子查询结果存入临时表T（可能在内存中，也可能磁盘中），确保子查询只执行一次，该表不记录重复数据且采用哈希索引遍历数据，然后通过T表的数据去遍历外表，通过关联关系得到外表的需要的数据，in查询相当于多个or条件的叠加，这个比较好理解，比如下面的查询

select * from user where userId in (1, 2, 3);

等效于

select * from user where userId = 1 or userId = 2 or userId = 3;

not in与in相反，如下

select * from user where userId not in (1, 2, 3);

等效于

select * from user where userId != 1 and userId != 2 and userId != 3;

总的来说，in查询就是先将子查询条件的记录全都查出来，假设结果集为B，共有m条记录，

然后在将子查询条件的结果集分解成m个，再进行m次主查询，值得一提的是，in查询的子条件返回结果必须只有一个字段，例如

select * from user where userId in (select id from B);

而不能是

select * from user where userId in (select id, age from B);

而exists就没有这个限制

exists和in的性能

select  * from user  a  where name='tom' and   exists  (select stuid from  department  b where  depname='yunwei'  and  a.stuid =b.stuid )；

select  * from user where name='tom' and   stuid  in (select stuid from  department where  depname='yunwei')；

1）根据前面介绍的检索数据的过程，可以知道，针对上面的两条sql中exists这种方式，是需要遍历user表name='liuwenhe’的所有数据行N，并且判断exists条件N次；并且如果department表的stuid 有索引，exists子查询可以使用连接关系（也就是stuid）上的索引；所以exists方式适合 user表的结果集小，子查询的结果集大的情况；一句话总结：子查询可以使用关联关系列上的索引，所以效率高，故内表大的适合使用exists;
2）not exists类似于exists的遍历方式，也是loop外表，然后判断exists条件

3）in是把外表user结果集和内表department结果集做hash连接（应该说类似hash join，因为MySQL不支持hash join的方式），先查询内表department结果集，再把内表结果集与外表结果集匹配，对外表可以使用关系索引（也就是stuid列上的索引），而内表结果集多大都需要查询，也就是说department where depname='yunwei’的结果集D多大，都得遍历全部的D，不可避免，故外表大的使用in，可加快效率。一句话总结：主查询可以使用关联关系列上的索引，所以效率高，故外表结果集合大的适合使用in;

3）如果用not in ，和in一样，内表结果集需要全部扫描，由于not in ，所以外表的结果集也需要权标扫描，都无法使用关系列上的索引（这种!=的范围查询无法使用任何索引），效率低，可考虑使用not exists，也可使用A left join B on A.id=B.id where B.id is null 进行优化。

总结：

exists先对外表结果集loop循环再对内表结果集进行查询。一直大家都认为exists比in语句的效率要高，这种说法其实是不准确的。这个是要区分环境的。如果查询的两个表大小相当，那么用in和exists差别不大。如果两个表中一个较小，一个是大表，则子查询表结果集大的用exists，如果外表结果集大的则适合使用in，然后就是网络中说的外表的和内表大的说法也不准确，应该是外表结果集和内表结果集合的大小，至于结果集前面已经解释过了。