Problem:
Write a SQL query to delete all duplicate email entries in a table named Person, keeping only unique emails based on its smallest Id.
+----+------------------+
| Id | Email            |
+----+------------------+
| 1  | john@example.com |
| 2  | bob@example.com  |
| 3  | john@example.com |
+----+------------------+
Id is the primary key column for this table.
For example, after running your query, the above Person table should have the following rows:
+----+------------------+
| Id | Email            |
+----+------------------+
| 1  | john@example.com |
| 2  | bob@example.com  |
+----+------------------+

Analysis:
My first idea was to find the smallest Id of every duplicated Email address and then delete all the other Ids for that Email. The code is:
# Write your MySQL query statement below
DELETE FROM Person WHERE
Email IN (SELECT Email FROM Person GROUP BY Email HAVING COUNT(Email) >1)
AND Id NOT IN (SELECT MIN(Id) FROM Person GROUP BY Email HAVING COUNT(Email) >1);
However, MySQL rejected it with the following error:
You can't specify target table 'Person' for update in FROM clause
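After some searching, I found the cause: MySQL does not let a statement modify a table (with UPDATE or DELETE) while a subquery in that same statement selects from the same table. The workaround is to go through an intermediate table, i.e. wrap each subquery in a derived table. The revised code is: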
# Write your MySQL query statement below
DELETE FROM Person WHERE Email IN
(SELECT t.Email FROM (SELECT Email FROM Person GROUP BY Email HAVING COUNT(Email) >1) t)
AND Id NOT IN (SELECT s.Id FROM (SELECT MIN(Id) AS Id FROM Person GROUP BY Email HAVING COUNT(Email) >1) s);
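To sanity-check which rows this query keeps, here is a minimal sketch that runs it with Python's built-in sqlite3 module against an in-memory copy of the example table. This harness is an assumption and not part of the original solution. SQLite does not enforce MySQL's restriction on selecting from the table being modified, so it checks only the query's logic (which rows survive), not the MySQL error above.

```python
import sqlite3

# Hypothetical in-memory copy of the Person table from the problem statement.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Person (Id INTEGER PRIMARY KEY, Email TEXT)")
conn.executemany(
    "INSERT INTO Person (Id, Email) VALUES (?, ?)",
    [(1, "john@example.com"), (2, "bob@example.com"), (3, "john@example.com")],
)

# The derived-table version of the query; t and s are the wrapping aliases.
conn.execute("""
    DELETE FROM Person WHERE Email IN
    (SELECT t.Email FROM (SELECT Email FROM Person GROUP BY Email HAVING COUNT(Email) > 1) t)
    AND Id NOT IN (SELECT s.Id FROM (SELECT MIN(Id) AS Id FROM Person GROUP BY Email HAVING COUNT(Email) > 1) s)
""")

# Read back the surviving rows in Id order, then release the connection.
remaining = conn.execute("SELECT Id, Email FROM Person ORDER BY Id").fetchall()
conn.close()
print(remaining)  # [(1, 'john@example.com'), (2, 'bob@example.com')]
```

Only Id 3 is removed: its Email is duplicated and it is not the smallest Id for that Email.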
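This time the solution was accepted. Still, the code felt too verbose. On reflection, there is no need to single out the duplicated emails: we can fetch the smallest Id of every Email address, whether or not it is duplicated, and then delete every row whose Id is not in that set. An Email that appears only once keeps its single row, because that row's Id is its minimum. The revised code is: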
# Write your MySQL query statement below
DELETE FROM Person WHERE Id NOT IN (SELECT s.Id FROM (SELECT MIN(Id) AS Id FROM Person GROUP BY Email) s);
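The claim that the shorter query removes exactly the same rows can be checked the same way. The sketch below again uses Python's sqlite3 module. The sample rows and the helper name `survivors` are hypothetical. It runs the two-condition query and the simplified query against identical fresh tables and compares what is left. As before, SQLite only checks the logic, not MySQL's table-modification restriction.

```python
import sqlite3

# Hypothetical sample rows: one Email three times, one twice, one unique.
ROWS = [
    (1, "a@example.com"), (2, "b@example.com"), (3, "a@example.com"),
    (4, "c@example.com"), (5, "b@example.com"), (6, "a@example.com"),
]

# The accepted two-condition query from earlier in the post.
TWO_CONDITION = """
DELETE FROM Person WHERE Email IN
(SELECT t.Email FROM (SELECT Email FROM Person GROUP BY Email HAVING COUNT(Email) > 1) t)
AND Id NOT IN (SELECT s.Id FROM (SELECT MIN(Id) AS Id FROM Person GROUP BY Email HAVING COUNT(Email) > 1) s)
"""

# The simplified query: keep the smallest Id of every Email, delete the rest.
SIMPLIFIED = """
DELETE FROM Person WHERE Id NOT IN
(SELECT s.Id FROM (SELECT MIN(Id) AS Id FROM Person GROUP BY Email) s)
"""


def survivors(delete_sql):
    """Run delete_sql on a fresh in-memory Person table and return the rows left."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Person (Id INTEGER PRIMARY KEY, Email TEXT)")
    conn.executemany("INSERT INTO Person (Id, Email) VALUES (?, ?)", ROWS)
    conn.execute(delete_sql)
    rows = conn.execute("SELECT Id, Email FROM Person ORDER BY Id").fetchall()
    conn.close()
    return rows


long_version = survivors(TWO_CONDITION)
short_version = survivors(SIMPLIFIED)
print(short_version)  # [(1, 'a@example.com'), (2, 'b@example.com'), (4, 'c@example.com')]
print(long_version == short_version)  # True
```

Both queries delete Ids 3, 5 and 6. The simplified query also keeps Id 4 (the only row for c@example.com) without a separate HAVING check, which is why the duplicate filter can be dropped.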