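Problem:
Write a SQL query that deletes all duplicate email entries from the Person table, keeping only the row with the smallest Id for each duplicated email.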
+----+------------------+
| Id | Email            |
+----+------------------+
| 1  | john@example.com |
| 2  | bob@example.com  |
| 3  | john@example.com |
+----+------------------+
Id is the primary key of this table.
For example, after your query runs, the Person table above should contain the following rows:
+----+------------------+
| Id | Email            |
+----+------------------+
| 1  | john@example.com |
| 2  | bob@example.com  |
+----+------------------+
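To try the statements in this note, a small test table can be created first. This is only a minimal sketch: the problem statement gives the column names and says that Id is the primary key, so the column types used here (INT and VARCHAR(255)) are assumptions.

```sql
-- Assumed schema: the column types are illustrative and not given
-- in the problem statement.
CREATE TABLE Person (
    Id    INT PRIMARY KEY,
    Email VARCHAR(255)
);

-- Sample rows copied from the problem statement.
INSERT INTO Person (Id, Email) VALUES
    (1, 'john@example.com'),
    (2, 'bob@example.com'),
    (3, 'john@example.com');
```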
Solutions:
Method 1: NOT IN + subquery
DELETE FROM
Person
WHERE
Id NOT IN ( SELECT p.minId FROM ( SELECT MIN( Id ) AS minId FROM Person GROUP BY Email ) AS p );
Method 2: INNER JOIN (slower)
DELETE
p1.*
FROM
Person p1
INNER JOIN Person p2 ON p1.Email = p2.Email
AND p1.Id > p2.Id;
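Since a DELETE is hard to undo, it may help to preview the rows Method 2 would remove by running the same self-join as a read-only SELECT. The query below is a sketch of that idea and is not part of the original solution.

```sql
-- Same join condition as the DELETE in Method 2, but read-only:
-- every returned row has a partner row with the same Email and a smaller Id.
SELECT DISTINCT p1.Id, p1.Email
FROM Person p1
INNER JOIN Person p2
    ON p1.Email = p2.Email
   AND p1.Id > p2.Id;
```

For the sample table from the problem statement, this should list only the row with Id 3.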
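Note:
The statement in Method 1 cannot be written without the outer derived table, as in the incorrect form below:
-- Incorrect: the subquery reads Person directly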
DELETE FROM
Person
WHERE
Id NOT IN ( SELECT MIN( Id ) AS minId FROM Person GROUP BY Email );
Error message:
ERROR 1093 (HY000): You can't specify target table 'Person' for update in FROM clause
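The MySQL reference manual covers this in:
See: 13.2.11.10 Subquery Errors
See: C.4 Restrictions on Subqueries
Explanation: a subquery can be used for assignment in an UPDATE statement, because subqueries are legal in UPDATE and DELETE statements as well as in SELECT statements. However, the same table cannot appear both in the subquery's FROM clause and as the target of the modification.
The statement can therefore be fixed by wrapping the subquery in a derived table. Because the inner query uses GROUP BY, MySQL materializes it as a temporary table, so Person is no longer read directly while rows are being deleted from it: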
DELETE FROM
Person
WHERE
Id NOT IN ( SELECT * FROM ( SELECT MIN( Id ) AS minId FROM Person GROUP BY Email ) AS p );
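After running either method, a quick check can confirm that no duplicates remain. This is a hedged sketch of such a check, not part of the original solution.

```sql
-- Lists every Email that still occurs more than once.
-- An empty result means each email now appears in exactly one row.
SELECT Email, COUNT(*) AS cnt
FROM Person
GROUP BY Email
HAVING COUNT(*) > 1;
```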