Deleting duplicate rows from a database table while keeping one copy

小酷

Category: Database. Tags: sql

Article link: https://blog.csdn.net/Love_5209/article/details/19397811

For example, suppose there is a table named user:
id   name    tel     city    time ....
1     aa     1234
2     cc     4653
3     aa     1234
4     bb     89752
5     aa     1234
6     asd    54656
7     aaa    1234
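I want to delete the rows whose name and tel both duplicate an earlier row, keeping only the row with the smallest id in each group.
The expected result is: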
id   name    tel     city    time ....
1     aa     1234
2     cc     4653
4     bb     89752
6     asd    54656
7     aaa    1234

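MySQL statement:

delete u from user u join user a on a.name = u.name and a.tel = u.tel and a.id < u.id;

MySQL does not allow a DELETE to select from its own target table in a subquery (error 1093), so the statement uses a self-join instead: a row is deleted when another row with the same name and tel and a smaller id exists.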
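The duplicate rule (same name and tel, larger id) can be tried out on the sample data. The following is a minimal sketch, assuming Python's built-in sqlite3 module as a stand-in for MySQL. SQLite accepts a correlated EXISTS on the table being deleted from, which expresses the same rule. The function name dedupe_users and the constant SAMPLE_ROWS are made up for this illustration.

```python
import sqlite3

# Sample data from the table above (only the id, name and tel columns).
SAMPLE_ROWS = [
    (1, "aa", "1234"), (2, "cc", "4653"), (3, "aa", "1234"), (4, "bb", "89752"),
    (5, "aa", "1234"), (6, "asd", "54656"), (7, "aaa", "1234"),
]


def dedupe_users(rows):
    """Load rows into an in-memory table, delete duplicates, return what is left."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE user (id INTEGER PRIMARY KEY, name TEXT, tel TEXT)")
    conn.executemany("INSERT INTO user (id, name, tel) VALUES (?, ?, ?)", rows)
    # A row is deleted when an earlier row (smaller id) has the same name and tel.
    conn.execute(
        "DELETE FROM user WHERE EXISTS ("
        " SELECT 1 FROM user a"
        " WHERE a.name = user.name AND a.tel = user.tel AND a.id < user.id)"
    )
    remaining = conn.execute("SELECT id, name, tel FROM user ORDER BY id").fetchall()
    conn.close()
    return remaining


if __name__ == "__main__":
    for row in dedupe_users(SAMPLE_ROWS):
        print(row)
```

Because the row with the smallest id in each group never has an earlier match, it always survives, whichever order the rows are visited in.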

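A brief look at EXISTS and IN in MySQL: http://www.cnblogs.com/zemliu/archive/2012/10/12/2722004.html

Conceptually, exists loops over the outer table row by row and evaluates the exists subquery for each row. If the subquery returns any row (no matter how many), the condition is true and the current outer row is kept; if it returns no rows, the current row is discarded. The exists condition behaves like a boolean: true when the subquery returns a result set, false when it does not.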

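For example:

select * from user where exists (select 1);

Each row of user is taken in turn, and because the subquery select 1 always returns a row, every row of user is added to the result set. The query is therefore equivalent to select * from user;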

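Another example:

select * from user where exists (select * from user where userId = 0);

While looping over user, the condition (select * from user where userId = 0) is checked for each row. Assuming userId is never 0, the subquery always returns an empty set, the condition is always false, and every row of user is discarded.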

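not exists is the opposite of exists: when the subquery returns rows, the current outer row is discarded; otherwise it is added to the result set.

In short, if table A has n rows, an exists query takes those n rows one at a time and evaluates the exists condition n times.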
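The exists and not exists behaviour described above can be checked on a tiny table. This is a sketch using Python's built-in sqlite3 module rather than MySQL, and the table contents and the function name exists_demo are assumptions made for the example.

```python
import sqlite3


def exists_demo():
    """Run the EXISTS / NOT EXISTS examples against a three-row user table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE user (userId INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO user VALUES (?, ?)", [(1, "aa"), (2, "bb"), (3, "cc")])

    def ids(sql):
        return [row[0] for row in conn.execute(sql)]

    results = {
        # Plain scan, for comparison.
        "all": ids("SELECT userId FROM user ORDER BY userId"),
        # The subquery always returns a row, so every outer row is kept.
        "exists_true": ids("SELECT userId FROM user WHERE EXISTS (SELECT 1) ORDER BY userId"),
        # No row has userId = 0, so the subquery is empty and every row is dropped.
        "exists_false": ids(
            "SELECT userId FROM user WHERE EXISTS (SELECT * FROM user WHERE userId = 0)"
        ),
        # NOT EXISTS inverts the test, so every row is kept again.
        "not_exists": ids(
            "SELECT userId FROM user"
            " WHERE NOT EXISTS (SELECT * FROM user WHERE userId = 0) ORDER BY userId"
        ),
    }
    conn.close()
    return results


if __name__ == "__main__":
    for name, rows in exists_demo().items():
        print(name, rows)
```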

An in query over a list of values is equivalent to a chain of or conditions, which is easy to see in the following query:

select * from user where userId in (1, 2, 3);

which is equivalent to

select * from user where userId = 1 or userId = 2 or userId = 3;

not in is the opposite of in, for example

select * from user where userId not in (1, 2, 3);

which is equivalent to

select * from user where userId != 1 and userId != 2 and userId != 3;
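The two equivalences can be confirmed by running each pair of queries and comparing the results. This is a small sketch with Python's built-in sqlite3 module standing in for MySQL; the five-row table and the function name in_vs_or are assumptions for the example.

```python
import sqlite3


def in_vs_or():
    """Compare IN with an OR chain, and NOT IN with an AND chain of != tests."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE user (userId INTEGER PRIMARY KEY)")
    conn.executemany("INSERT INTO user VALUES (?)", [(i,) for i in range(1, 6)])
    queries = {
        "in": "SELECT userId FROM user WHERE userId IN (1, 2, 3) ORDER BY userId",
        "or": "SELECT userId FROM user"
              " WHERE userId = 1 OR userId = 2 OR userId = 3 ORDER BY userId",
        "not_in": "SELECT userId FROM user WHERE userId NOT IN (1, 2, 3) ORDER BY userId",
        "and": "SELECT userId FROM user"
               " WHERE userId != 1 AND userId != 2 AND userId != 3 ORDER BY userId",
    }
    results = {name: [row[0] for row in conn.execute(sql)] for name, sql in queries.items()}
    conn.close()
    return results


if __name__ == "__main__":
    for name, rows in in_vs_or().items():
        print(name, rows)
```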

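In short, an in query first evaluates the subquery in full. Call its result set B, with m rows; the outer condition is then, conceptually, expanded into m comparisons, one per row of B.

Note that when a single column is on the left of in, the subquery must return exactly one column, for example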

select * from user where userId in (select id from B);

and not

select * from user where userId in (select id, age from B);

exists has no such restriction, because it only checks whether the subquery returns any row.
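The column-count rule can be seen by running the three forms side by side. The sketch below again uses Python's built-in sqlite3 module as a stand-in, so the exact error text will differ from MySQL's; the table contents and the function name column_count_demo are assumptions.

```python
import sqlite3


def column_count_demo():
    """Show that IN needs a one-column subquery while EXISTS accepts any column list."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE user (userId INTEGER PRIMARY KEY)")
    conn.execute("CREATE TABLE B (id INTEGER, age INTEGER)")
    conn.executemany("INSERT INTO user VALUES (?)", [(1,), (2,), (3,)])
    conn.executemany("INSERT INTO B VALUES (?, ?)", [(1, 20), (3, 30)])

    # One column in the subquery: accepted.
    one_col = [row[0] for row in conn.execute(
        "SELECT userId FROM user WHERE userId IN (SELECT id FROM B) ORDER BY userId")]

    # Two columns in the subquery: rejected by the database.
    try:
        conn.execute(
            "SELECT userId FROM user WHERE userId IN (SELECT id, age FROM B)").fetchall()
        two_col_error = None
    except sqlite3.Error as exc:
        two_col_error = str(exc)

    # EXISTS only asks whether any row comes back, so two columns are fine.
    exists_rows = [row[0] for row in conn.execute(
        "SELECT userId FROM user"
        " WHERE EXISTS (SELECT id, age FROM B WHERE B.id = user.userId) ORDER BY userId")]
    conn.close()
    return one_col, two_col_error, exists_rows


if __name__ == "__main__":
    print(column_count_demo())
```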

Now consider the performance of exists and in.

Take the following SQL statements:

1: select * from A where exists (select * from B where B.id = A.id);

2: select * from A where A.id in (select id from B);

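Query 1 can be rewritten as the following pseudocode to make it easier to follow:

for ($i = 0; $i < count(A); $i++) {

    $a = get_record(A, $i); # fetch the rows of A one at a time

    if (exists_in_B($a[id])) # true if some row of B has B.id == $a[id]

        $result[] = $a;

}

return $result;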

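That is the rough idea. Query 1 mainly relies on the index on B.id for the lookup, while A is scanned in full either way, so an index on A should make little difference to its cost.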
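The same loop can be written as runnable code. This is only a model of the idea, not how MySQL executes the query: a Python set stands in for the index on B.id, and the function name exists_query and the sample tables are made up for the illustration.

```python
def exists_query(table_a, table_b):
    """Model of: select * from A where exists (select * from B where B.id = A.id)."""
    # Stand-in for an index on B.id: a set answers "is this id in B?" quickly.
    b_index = {row["id"] for row in table_b}
    result = []
    for a in table_a:            # A is scanned in full, one row at a time
        if a["id"] in b_index:   # one probe into B's "index" per row of A
            result.append(a)
    return result


if __name__ == "__main__":
    A = [{"id": i, "name": f"a{i}"} for i in range(1, 5)]
    B = [{"id": 2}, {"id": 4}, {"id": 9}]
    print(exists_query(A, B))
```

The cost is one pass over A plus one lookup in B per row, which is why the lookup structure on B is the part that matters in this model.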

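Suppose the ids in table B are 1, 2 and 3. Query 2 can then be rewritten as

select * from A where A.id = 1 or A.id = 2 or A.id = 3; (this equivalence describes the result in most databases, but it is not how MySQL evaluates the list)

This is easy to follow: the query mainly relies on the index on A.id, and the size of B matters less.

For an IN list of N constant values, MySQL first sorts the list and then uses binary search to match each of the M table rows against it. Leaving indexes aside, each lookup costs O(log N), which is cheaper than evaluating

id=1 or id=2 or id=3 one comparison at a time.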
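The sort-then-binary-search idea can be sketched in a few lines. This illustrates the technique only and is not MySQL's actual code; the function names are hypothetical and the standard-library bisect module does the searching.

```python
from bisect import bisect_left


def in_sorted_list(sorted_values, x):
    """Binary search: O(log N) membership test on an already sorted list."""
    i = bisect_left(sorted_values, x)
    return i < len(sorted_values) and sorted_values[i] == x


def filter_rows(ids, in_values):
    """Model of: select ... where id in (in_values), applied to a list of ids."""
    lookup = sorted(in_values)  # the IN list is sorted once, up front
    # Each of the M ids is then matched with one binary search.
    return [x for x in ids if in_sorted_list(lookup, x)]


if __name__ == "__main__":
    print(filter_rows([5, 1, 7, 3, 2], [3, 1, 2]))
```

With M rows and N list values this costs roughly M log N comparisons, against up to M times N for a chain of or tests checked one by one.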

Now consider not exists and not in:

1. select * from A where not exists (select * from B where B.id = A.id);

2. select * from A where A.id not in (select id from B);

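Query 1 works as before and uses the index on B.id.

Query 2 can be rewritten as

select * from A where A.id != 1 and A.id != 2 and A.id != 3;

So not in behaves like a range condition, and a != condition of this kind generally cannot use an index to narrow the search. In effect, every row of A has to be checked against all the rows of B to see whether a match exists.

So, under this model, not exists is more efficient than not in. Actual plans depend on the MySQL version and its optimizer, so it is worth confirming with EXPLAIN.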
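Before comparing speed, it helps to check that the two queries return the same rows. They do when B.id contains no NULL, but under standard SQL semantics a NULL in the subquery makes not in match nothing. The sketch below shows this with Python's built-in sqlite3 module as a stand-in for MySQL; the function name anti_queries and the sample data are assumptions.

```python
import sqlite3


def anti_queries(b_ids):
    """Run the NOT EXISTS and NOT IN queries for A = {1, 2, 3, 4} and the given B ids."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE A (id INTEGER)")
    conn.execute("CREATE TABLE B (id INTEGER)")
    conn.executemany("INSERT INTO A VALUES (?)", [(1,), (2,), (3,), (4,)])
    conn.executemany("INSERT INTO B VALUES (?)", [(v,) for v in b_ids])
    not_exists = [row[0] for row in conn.execute(
        "SELECT id FROM A"
        " WHERE NOT EXISTS (SELECT * FROM B WHERE B.id = A.id) ORDER BY id")]
    not_in = [row[0] for row in conn.execute(
        "SELECT id FROM A WHERE A.id NOT IN (SELECT id FROM B) ORDER BY id")]
    conn.close()
    return not_exists, not_in


if __name__ == "__main__":
    print(anti_queries([1, 2, 3]))     # no NULL in B: both queries agree
    print(anti_queries([1, 2, None]))  # NULL in B: NOT IN returns no rows
```

If B.id can be NULL, not exists is usually the form that gives the intended result.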
