exists和not exists的嵌套使用及求关系的除法

最新推荐文章于 2023-06-14 14:45:38 发布

gerogeya

最新推荐文章于 2023-06-14 14:45:38 发布

阅读量4.1k

点赞数 5

文章标签： database mysql sql

本文链接：https://blog.csdn.net/gerogeya/article/details/8147191

版权

（用 exists 和 not exists可以求集合的交集(intersection)和差集(except)，请参考另一份如名文档）

（本文中所有SQL语句均在Mysql v5.1中验证）

初学SQL，遇到exist和not exist，看了一会弄明白了，结果发现还有他们的嵌套使用，终于又懵了，看了很长时间才摸出来个大概，下面用几个例子分析一下，
便于以后忘记时帮助理解。

题外：首先说一下not exists的用法。
exists是存在一个即可返回真，not exists是全都不存在才返回真
mysql> select * from t1;
+----+--------+
| id | value1 |
+----+--------+
| 1 | 10 |
| 2 | NULL |
| 3 | 10 |
| 4 | 3 |
+----+--------+
4 rows in set (0.00 sec)

mysql> select * from t2;
+----+--------+--------+
| id | value1 | value2 |
+----+--------+--------+
| 1 | 10 | 100 |
| 2 | NULL | 100 |
| 3 | 5 | 80 |
| 4 | 11 | 11 |
| 6 | 5 | 80 |
| 7 | 5 | 5 |
+----+--------+--------+
6 rows in set (0.00 sec)

mysql> select * from t1 where not exists (select * from t2 where value1=t1.value1);
+----+--------+
| id | value1 |
+----+--------+
| 2 | NULL |
| 4 | 3 |
+----+--------+
2 rows in set (0.00 sec)

进入正题：
example 1：（参考 http://blog.csdn.net/mtawaken/article/details/6573122，加以修改，变成适合自己理解的方式）

关系模式：
student(学号stuid，姓名stdname，年龄age)
course (课程号cid，课程名cname，学时hour)
choosed(学号stuid，课程号cid，成绩score)：代表学号为stuid的学生选了课程cid，成绩为score

mysql> select * from student;
+-------+---------+------+
| stuid | stdname | age |
+-------+---------+------+
| 1 | Bob | 23 |
| 2 | Jessy | 24 |
| 3 | Ivy | 24 |
| 4 | Grace | 25 |
| 5 | Candise | 26 |
| 6 | Angela | 25 |
| 7 | Tracy | 27 |
| 8 | Sun | 26 |
| 9 | Kevin | 26 |
| 10 | Richard | 26 |
+-------+---------+------+
10 rows in set (0.00 sec)

mysql> select * from course;
+-----+----------+------+
| cid | cname | hour |
+-----+----------+------+
| 101 | English | 108 |
| 102 | Math | 136 |
| 103 | Chinese | 110 |
| 104 | Music | 48 |
| 105 | Painting | 56 |
+-----+----------+------+
5 rows in set (0.00 sec)

mysql> select * from choosed;
+-------+------+-------+
| stuid | cid | score |
+-------+------+-------+
| 1 | 101 | 80 |
| 1 | 102 | 95 |
| 1 | 103 | 88 |
| 1 | 104 | 70 |
| 1 | 105 | 98 |
| 2 | 103 | 84 |
| 2 | 105 | 79 |
| 3 | 101 | 95 |
| 3 | 102 | 84 |
| 3 | 103 | 91 |
| 3 | 104 | 85 |
| 3 | 105 | 79 |
| 4 | 101 | 94 |
| 4 | 104 | 86 |
| 6 | 101 | 85 |
| 6 | 102 | 82 |
| 6 | 103 | 91 |
| 6 | 104 | 79 |
| 6 | 105 | 81 |
| 7 | 103 | 92 |
| 10 | 102 | 87 |
| 10 | 104 | 68 |
+-------+------+-------+
22 rows in set (0.00 sec)

1)所有选了课（至少一节）的人的姓名 exists -> exists
mysql> select stuid, stdname from student where exists (select * from course where exists (select * from choosed where stuid=student.stuid and cid=course.cid));
+-------+---------+
| stuid | stdname |
+-------+---------+
| 1 | Bob |
| 2 | Jessy |
| 3 | Ivy |
| 4 | Grace |
| 6 | Angela |
| 7 | Tracy |
| 10 | Richard |
+-------+---------+
7 rows in set (0.00 sec)

2)所有没选课（一节课都没选）的人的姓名 not exists -> exists
mysql> select stuid, stdname from student where not exists (select * from course where exists (select * from choosed where stuid=student.stuid and cid=course.cid));
+-------+---------+
| stuid | stdname |
+-------+---------+
| 5 | Candise |
| 8 | Sun |
| 9 | Kevin |
+-------+---------+
3 rows in set (0.00 sec)

3)所有没选全部课程的人的姓名 exists -> not exists
mysql> select stuid, stdname from student where exists (select * from course where not exists (select * from choosed where stuid=student.stuid and cid=course.cid));
+-------+---------+
| stuid | stdname |
+-------+---------+
| 2 | Jessy |
| 4 | Grace |
| 5 | Candise |
| 7 | Tracy |
| 8 | Sun |
| 9 | Kevin |
| 10 | Richard |
+-------+---------+
7 rows in set (0.00 sec)

4)所有选了全部课程的人的姓名 not exists -> not exists
mysql> select stuid, stdname from student where not exists (select * from course where not exists (select * from choosed where stuid=student.stuid and cid=course.cid));
+-------+---------+
| stuid | stdname |
+-------+---------+
| 1 | Bob |
| 3 | Ivy |
| 6 | Angela |
+-------+---------+
3 rows in set (0.00 sec)

1)首先分析一下简单的情况（双exists嵌套）：
mysql> select stuid, stdname from student where exists (select * from course where exists (select * from choosed where stuid=student.stuid and cid=course.cid));

i.所有被选过的课程的数据集：
select * from course where exists(select * from choosed where cid=course.cid);

ii.所有被某位学号为 @stuid 的学生选过的课程的记录集（@stuid学生的已选课程）：
select * from course where exists(select * from choosed where stuid=@stuid and cid=course.cid);
请注意，多出了学号的筛选即，stuid=@stuid

iii.遍历每一个主查询的学号，每一个学号都按第二筛选方法筛选出：已选课的学生的学号。
请注意：用主查询中的 student.stuid 代替了@stuid。

select stuid, stdname from student where exists( (1)
select * from course where exists( (2)
select * from choosed where stuid=student.stuid and cid=course.cid)) (3)

这已经是整个查询语句，可以看出子查询中用student.stuid替换了第2步中的 @stuid，（没忘了吧，第二步求的是选过了课程的某个学生）。

实际上此句的功能就是，针对每个student中的学生s，把s和每个course中的课程c组成元组对(s,c)，把(s,c)放到已选课程choosed中进行匹配，如果存在记录c'，则说明学生s选择了某门课程c'，
于是，对这个确定的学生s，结果集(select * from choosed...)会返回一条包含元组(s,c')的记录，从而(2)处的exists会返回真，
则对这个确定的学生s，结果集(select * from course...)会因为(2)处的where为真，而返回这个学生s所选择的所有课程，如果课程不为空的话，(1)处的exists返回真，否则为假。
当(1)处的exists为真时，最后的结果集(select student.stdname from student...)中就应该出现学生s的记录，代表这个学生选择了至少一门课。

于是，根据这个分析，可以把这个双exists嵌套的语句改写成如下方式：
mysql> select distinct stuid, stdname from student, course where (student.stuid, course.cid) in (select stuid, cid from choosed);
+-------+---------+
| stuid | stdname |
+-------+---------+
| 1 | Bob |
| 2 | Jessy |
| 3 | Ivy |
| 4 | Grace |
| 6 | Angela |
| 7 | Tracy |
| 10 | Richard |
+-------+---------+
7 rows in set (0.00 sec)

或者改成更为简单的方式：
mysql> select distinct student.stuid, stdname from student, choosed where student.stuid=choosed.stuid;
+-------+---------+
| stuid | stdname |
+-------+---------+
| 1 | Bob |
| 2 | Jessy |
| 3 | Ivy |
| 4 | Grace |
| 6 | Angela |
| 7 | Tracy |
| 10 | Richard |
+-------+---------+
7 rows in set (0.00 sec)

或者改成使用自然连接的方式：
mysql> select distinct stuid, stdname from student natural join choosed;
+-------+---------+
| stuid | stdname |
+-------+---------+
| 1 | Bob |
| 2 | Jessy |
| 3 | Ivy |
| 4 | Grace |
| 6 | Angela |
| 7 | Tracy |
| 10 | Richard |
+-------+---------+
7 rows in set (0.00 sec)

2)再分析一下比较复杂的情况（双 not exists 嵌套）：
select stuid, stdname from student where not exists (select * from course where not exists (select * from choosed where stuid=student.stuid and cid=course.cid));

i.所有没被选过的课程的数据集：
select * from course where not exists(select * from choosed where cid=course.cid);

ii.所有没被某位学号为 @stuid 的学生选过的课程的记录集（@stuid学生的未选课程）：
select * from course where not exists(select * from choosed where stuid=@stuid and cid=course.cid);
请注意，多出了学号的筛选即，stuid=@stuid

iii.遍历每一个主查询的学号，每一个学号都按第二筛选方法筛选出：已选全部课的学生的学号。
请注意：用主查询中的 student.stuid 代替了@stuid。

select stuid, stdname from student where not exists ( (1)
select * from course where not exists ( (2)
select * from choosed where stuid=student.stuid and cid=course.cid)); (3)

这已经是整个查询语句，可以看出子查询中用student.stuid替换了第2步中的 @stuid，（没忘了吧，第二步求的是没有选修全部课程的某个学生）。

实际上此句的功能就是，针对每个student中的学生s，把s和每个course中的课程c组成元组对(s,c)，把(s,c)放到已选课程choosed中进行匹配，如果不存在记录c'，则说明学生s没有选择某门课程c’。
于是，对这个确定的学生s，结果集(select * from choosed...)中不会存在一条包含元组(s,c’)的记录，即(s,c')使得(2)处的not exists会返回真，从而该课程会出现在(select * from course...)
的结果集中，表示该学生s没有选择该门课程c'；否则，若(2)处的not exists返回假，则说明学生s选择了该课程c'。
针对某个确定的学生s，对所有的课程c进行操作(2)，根据(select * from course...)的结果集以判断那些课程是学生s没选的。如果(select * from course...)的结果集为空，则说明该学生s没有任何
一门课程没选，即他选择了所有课程。同时(1)处的not exists会返回真，(select stuid, stdname from student)也会因为(1)处的not exists为真而把该学生s'放入其结果集中，代表该学生选择了所有课程。
于是对这个确定的学生s，结果集(select * from course...)会因为(2)处where的结果，而返回这个学生s没选择的所有课程，如果课程不为空的话，(1)处的not exists返回假，否则为真，“真”即代表学生s选择了所有课程。
最后再根据上述分析对所有学生进行如上操作，从而得出所有选择了所有课程的学生。

（注1：）针对此问题（双 not exists 嵌套），我还想出了下面的方式，但该方式不具有一般性，不过可以根据不同情况加以适当的修改，使其能用在别的地方。
该方法是先对choosed表中的数据按stuid分组，计算出每组所含有的数据数量Ci(i=1,2...)，然后再计算出course中数据的数量C，找出那些满足Ci=C的元组的stuid，然后根据这些stuid找出所有学生信息。
mysql> select distinct * from student where stuid in
(select distinct stuid
from (select stuid, count(*) as amount from choosed group by stuid) as result
where amount=(select count(*) from course));

实际上，还可以根据sql中exist和not exist的使用，运用逻辑的方式得出，上述四种情况分别代表的含义，
exists -> exists 所有选了课（至少一节）的人的姓名 vs not exists -> exists 所有没选课（一节课都没选）的人的姓名
exists -> not exists 所有没选全部课程的人的姓名 vs not exists -> not exists 所有选了全部课程的人的姓名

更进一步，根据2)中的分析，可以看出来实际上关系代数中的除法可以用此法(双not exists嵌套)解决，原因如下：
首先看一下双not exists嵌套的sql语句
select * from student where not exists ( (1)
select * from course where not exists ( (2)
select * from choosed where stuid=student.stuid and cid=course.cid)); (3)
实际上该sql语句做的就是用表choosed（第3个select语句中指定的表）中的某些属性集A（第3个select语句中where后指定的属性）做被除数，
用表courese（第2个select语句中指定的表）中的某些属性集B做除数，这个属性集应该是A的真子集。
需要得到的结果是表student（第1个select语句中指定的表）中的某些属性集C。即 A ÷ B => C

实际上就是要求出A中的某些实例，这些实例要满足条件：他们都所有干了B中的所有事情，而且已知对应的结果存在C中，反过来求A中满足条件的这些实例。

于是我们可以把上面的问题转化为一个除法问题，即已知：
a)选课信息表choosed，其中给出了所有已选课程(cid)和选择该课的学生的学号(stuid)的对应关系；
b)课程信息表course；
c)学生信息表student；
需要求出所有选了所有课程的学生的信息。
此时恰好可以用双not exists嵌套解决，sql就是上面的4)号语句。

说起来可能还是比较抽象，下面给出一个不是那么直观的但仍可以使用此法解决的例子2，
example 2：（参考：http://www.blogjava.net/decode360/archive/2009/06/12/282804.html，用不同于该文中的另一种方式说明解决方案）
（注2：）此问题就不能用example 1中的（注1）的方法解决。

已知关系模式：tt(person，event)
tt表中person代表人员编号，event代表事件编号。要求出tt表中所有"包含人员2所对应的所有事件"的人员编号。

mysql> select * from tt;
+--------+-------+
| person | event |
+--------+-------+
| 1 | 1 |
| 1 | 2 |
| 1 | 3 |
| 2 | 2 |
| 2 | 3 |
| 4 | 1 |
| 4 | 2 |
| 4 | 3 |
| 5 | 1 |
| 5 | 6 |
| 6 | 1 |
+--------+-------+
11 rows in set (0.00 sec)

实际上可以把该问题中涉及的表分成如下三个：
a)人员和事件的对应关系表tt（为了避免重名，将之改名为tt3）；
b)人员2所对应的所有事件表tt2 由（select event from tt where person='2'）而得，将之命名为tt2；
mysql> select event from tt where person='2';
+-------+
| event |
+-------+
| 2 |
| 3 |
+-------+
2 rows in set (0.00 sec)
c)包含所有人员的表tt1，由（select distinct person from tt）而得，将之命名为tt1；
mysql> select distinct person from tt;
+--------+
| person |
+--------+
| 1 |
| 2 |
| 4 |
| 5 |
| 6 |
+--------+
5 rows in set (0.00 sec)
于是，问题可以变成求 tt3 ÷ tt2 = tt1' (tt1'包含于tt1)，下面给出了3个解决此问题的sql语句，实际上都差不多，只是在后两个中加入了as命名，将问题修改成更容易直观看出来的除法问题了。
尤其在 method 3 中，实际上所执行的sql语句就是
select * from tt1 where not exists (
select * from tt2 where not exists (
select * from tt3 where tt3.event=tt2.event and tt3.person=tt1.person));
正是上面分析的除法的形式。

method 1:
mysql> select distinct person from tt tt1 where not exists (select * from tt tt2 where person='2' and not exists (select * from tt tt3 where tt3.person=tt1.person and tt3.event=tt2.event));
+--------+
| person |
+--------+
| 1 |
| 2 |
| 4 |
+--------+
3 rows in set (0.00 sec)

method 2:
mysql> select distinct person from tt tt1 where not exists (select * from (select event from tt where person='2') as tt2 where not exists (select * from tt tt3 where tt3.event=tt2.event and tt3.person=tt1.person));
+--------+
| person |
+--------+
| 1 |
| 2 |
| 4 |
+--------+
3 rows in set (0.00 sec)

method 3:
mysql> select * from (select distinct person from tt) as tt1 where not exists (select * from (select event from tt where person='2') as tt2 where not exists (select * from tt tt3 where tt3.event=tt2.event and tt3.person=tt1.person));
+--------+
| person |
+--------+
| 1 |
| 2 |
| 4 |
+--------+
3 rows in set (0.00 sec)

example 3:
关系模式：
tt1(姓名name，年龄age，性别gender)
tt2(账户号account_id，资产asset，支行branch)
tt3(姓名name，账户号account_id)：代表姓名为name的客户拥有账户account_id

mysql> select * from tt1;
+------+------+--------+
| name | age | gender |
+------+------+--------+
| Bob | 30 | M |
| Jack | 48 | M |
| Ivy | 27 | F |
| Sun | 19 | F |
| Alex | 52 | M |
+------+------+--------+
5 rows in set (0.00 sec)

mysql> select * from tt2;
+------------+-------+--------+
| account_id | asset | branch |
+------------+-------+--------+
| A1 | 520 | BJ |
| A2 | 180 | TJ |
| A3 | 34 | SH |
| A4 | 178 | GZ |
| A5 | 10 | NJ |
| A6 | 101 | TJ |
| A7 | 19 | TJ |
| A8 | 32 | BJ |
| A9 | 290 | GZ |
| A10 | 111 | NJ |
+------------+-------+--------+
10 rows in set (0.00 sec)

mysql> select * from tt3;
+------+------------+
| name | account_id |
+------+------------+
| Bob | A1 |
| Bob | A2 |
| Bob | A3 |
| Bob | A4 |
| Bob | A5 |
| Bob | A6 |
| Bob | A7 |
| Bob | A8 |
| Bob | A9 |
| Bob | A10 |
| Ivy | A7 |
+------+------------+
11 rows in set (0.00 sec)

mysql> select distinct tt1.* from tt1,tt2 where (tt1.name, tt2.account_id) in (select * from tt3);
+------+------+--------+
| name | age | gender |
+------+------+--------+
| Bob | 30 | M |
| Ivy | 27 | F |
+------+------+--------+
2 rows in set (0.00 sec)

所有开了账户的人的资料 exists -> exists
mysql> select distinct * from tt1 where exists (select * from tt2 where exists (select * from tt3 where tt3.name=tt1.name and tt3.account_id=tt2.account_id));
+------+------+--------+
| name | age | gender |
+------+------+--------+
| Bob | 30 | M |
| Ivy | 27 | F |
+------+------+--------+
2 rows in set (0.00 sec)

所有没开账户的人的资料 not exists -> exists
mysql> select distinct * from tt1 where not exists (select * from tt2 where exists (select * from tt3 where tt3.name=tt1.name and tt3.account_id=tt2.account_id));
+------+------+--------+
| name | age | gender |
+------+------+--------+
| Jack | 48 | M |
| Sun | 19 | F |
| Alex | 52 | M |
+------+------+--------+
3 rows in set (0.00 sec)

所有没开所有账户的人的资料 exists -> not exists
mysql> select distinct * from tt1 where exists (select * from tt2 where not exists (select * from tt3 where tt3.name=tt1.name and tt3.account_id=tt2.account_id));
+------+------+--------+
| name | age | gender |
+------+------+--------+
| Jack | 48 | M |
| Ivy | 27 | F |
| Sun | 19 | F |
| Alex | 52 | M |
+------+------+--------+
4 rows in set (0.00 sec)

所有开了所有账户的人的资料 not exists -> not exists
mysql> select distinct * from tt1 where not exists (select * from tt2 where not exists (select * from tt3 where tt3.name=tt1.name and tt3.account_id=tt2.account_id));
+------+------+--------+
| name | age | gender |
+------+------+--------+
| Bob | 30 | M |
+------+------+--------+
1 row in set (0.00 sec)

有两个问题：
1)什么时候采用这种嵌套方式，如何能看出某问题可以采用这种方式解决；
2)有没有其他的替代方法；

总结一下，关于(not) exist的四种嵌套方式：
exists -> exists：这种方式所解决的问题其实可以用其他很多方式完成，而且往往遇到此类问题时，更容易想到的并不是双exists嵌套，而是类似于join的方式，可参考example 1中1)的分析。
not exists -> exists：由于它与exists -> exists所得结果是互为补集，所以可以用全集合exists -> exists的结果做差即可得到结果。
not exists -> not exists：这个是求关系代数中的除法最常用的方法，分析过程可参考example 2，所以遇到需要做除法的问题时可以用此not exists嵌套的方式解决。
not exists -> exists：由于它与not exists -> not exists所得结果是互为补集，所以可以用全集合not exists -> not exists的结果做差即可得到结果。
其实最简单的就是记住这四种方式所能求得的结果的含义，即：
exists -> exists 所有干了某事（干了至少一件事）的某物 vs not exists -> exists 所有没干某事（一件事都没干）的某物
exists -> not exists 所有没干全部事情的某物 vs not exists -> not exists 所有干了全部事情的某物

一个比较简单的问题：将表t中数据按id分组，并求出每组中距离当前时间最近的所有数据。
（参考：http://www.cnblogs.com/mytechblog/articles/2105785.html 原文中给出的方法不太直观，个人选择了另一种方法，列举如下，比较简单，不加以说明了）

mysql> select * from t;
+------+------------+-------+---------------------+
| id | name | count | date |
+------+------------+-------+---------------------+
| 1 | apple | 10 | 2011-07-01 00:00:00 |
| 1 | orange | 20 | 2011-07-02 00:00:00 |
| 1 | banana | 15 | 2011-07-03 00:00:00 |
| 2 | white vege | 12 | 2011-07-01 00:00:00 |
| 2 | blue vege | 19 | 2011-07-02 00:00:00 |
| 2 | red vege | 18 | 2011-07-02 00:00:00 |
+------+------------+-------+---------------------+
6 rows in set (0.00 sec)

mehtod 1:
mysql> select * from t where (id, date) in (select id, max(date) from t group by id);
+------+-----------+-------+---------------------+
| id | name | count | date |
+------+-----------+-------+---------------------+
| 1 | banana | 15 | 2011-07-03 00:00:00 |
| 2 | blue vege | 19 | 2011-07-02 00:00:00 |
| 2 | red vege | 18 | 2011-07-02 00:00:00 |
+------+-----------+-------+---------------------+
3 rows in set (0.00 sec)

method 2:
mysql> select * from t a where (not exists (select * from t where id=a.id and date>a.date));
+------+-----------+-------+---------------------+
| id | name | count | date |
+------+-----------+-------+---------------------+
| 1 | banana | 15 | 2011-07-03 00:00:00 |
| 2 | blue vege | 19 | 2011-07-02 00:00:00 |
| 2 | red vege | 18 | 2011-07-02 00:00:00 |
+------+-----------+-------+---------------------+
3 rows in set (0.00 sec)