Write a SQL query to delete all duplicate email entries in a table named Person, keeping only unique emails based on its smallest Id.

+----+------------------+
| Id | Email            |
+----+------------------+
| 1  | john@example.com |
| 2  | bob@example.com  |
| 3  | john@example.com |
+----+------------------+
Id is the primary key column for this table.

For example, after running your query, the above Person table should have the following rows:

+----+------------------+
| Id | Email            |
+----+------------------+
| 1  | john@example.com |
| 2  | bob@example.com  |
+----+------------------+
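Summary: delete the duplicate emails. (Note that you must actually delete rows from the original table; a SELECT query alone produces no result.)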
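To try the solutions locally, a small setup script helps. The sketch below is hypothetical: the column types (INT, VARCHAR(255)) are assumptions, since the problem only shows the rows. The final query lists every email still stored more than once, so it doubles as a check that a DELETE really changed the table: before any solution runs it returns one row (john@example.com, 2), and afterwards it should return nothing.

```sql
-- Hypothetical setup: recreate the example Person table (column types are assumed).
DROP TABLE IF EXISTS Person;
CREATE TABLE Person (
    Id    INT PRIMARY KEY,
    Email VARCHAR(255) NOT NULL
);
INSERT INTO Person (Id, Email) VALUES
    (1, 'john@example.com'),
    (2, 'bob@example.com'),
    (3, 'john@example.com');

-- Duplicate check: every email stored more than once, with its copy count.
SELECT Email, COUNT(*) AS copies
FROM Person
GROUP BY Email
HAVING COUNT(*) > 1;
```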
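My brain short-circuited today and I just couldn't come up with an answer to such a simple problem. While searching for answers, though, I ran into a few solutions I didn't understand, so I'll walk through them here to try to make sense of them: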
Let's start with the easiest one to understand (solution 1):
DELETE FROM Person WHERE Id NOT IN
(SELECT Id FROM (SELECT MIN(Id) AS Id FROM Person GROUP BY Email) AS p);
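This groups the table by Email, finds the smallest Id in each group, and then deletes every row whose Id falls outside that set (its complement). The second SELECT Id FROM (...) layer looks redundant, but MySQL requires it: MySQL does not let a DELETE or UPDATE read, in a subquery, from the same table it is modifying, and trying to do so fails with
You can't specify target table 'Person' for update in FROM clause
That is why the intermediate (derived) table p is introduced: MySQL materializes p first, so the subquery no longer reads Person directly while rows are being deleted.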
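Running solution 1 one step at a time makes the complement idea concrete. This is a sketch in MySQL syntax on the example rows; the table definition is an assumption, since the problem does not give one. Step 1 shows the Id each email keeps, and step 2 deletes everything else.

```sql
-- Hypothetical setup (column types assumed), same rows as the example.
DROP TABLE IF EXISTS Person;
CREATE TABLE Person (Id INT PRIMARY KEY, Email VARCHAR(255) NOT NULL);
INSERT INTO Person (Id, Email) VALUES
    (1, 'john@example.com'), (2, 'bob@example.com'), (3, 'john@example.com');

-- Step 1: the Id each email keeps (the smallest one in its group).
-- john@example.com keeps 1, bob@example.com keeps 2.
SELECT Email, MIN(Id) AS KeptId
FROM Person
GROUP BY Email
ORDER BY KeptId;

-- Step 2: delete every row outside that set. The derived table p is the
-- wrapper that avoids MySQL's "can't specify target table" error.
DELETE FROM Person
WHERE Id NOT IN (
    SELECT Id FROM (SELECT MIN(Id) AS Id FROM Person GROUP BY Email) AS p
);

-- Step 3: only Ids 1 and 2 remain.
SELECT Id, Email FROM Person ORDER BY Id;
```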
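Actually, the following (solution 2) also works:
DELETE FROM Person WHERE Id IN
(SELECT Id FROM
(SELECT p1.Id FROM Person p1, Person p2 WHERE p1.Id > p2.Id AND p1.Email = p2.Email) AS p);
Here the self-join pairs each row (p1) with every other row (p2) that has the same email and a smaller Id; any such p1 is a duplicate, so its Id is deleted. The derived table p is needed for the same MySQL reason as in solution 1.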
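One detail of solution 2: when an email occurs three or more times, the inner self-join returns the same Id more than once. The sketch below adds a hypothetical fourth row (4, 'john@example.com') to show this; the repeated Ids in the IN list are harmless, and the DELETE still keeps only the smallest Id per email. The table definition is again an assumption.

```sql
-- Hypothetical setup: the example rows plus an extra copy of john@example.com.
DROP TABLE IF EXISTS Person;
CREATE TABLE Person (Id INT PRIMARY KEY, Email VARCHAR(255) NOT NULL);
INSERT INTO Person (Id, Email) VALUES
    (1, 'john@example.com'), (2, 'bob@example.com'),
    (3, 'john@example.com'), (4, 'john@example.com');

-- Same-email pairs where p1 has the larger Id: (3,1), (4,1), (4,3).
-- Id 4 appears twice, which IN simply ignores.
SELECT p1.Id AS dup_id, p2.Id AS smaller_id
FROM Person p1 JOIN Person p2
  ON p1.Email = p2.Email AND p1.Id > p2.Id
ORDER BY p1.Id, p2.Id;

-- Solution 2 as written above.
DELETE FROM Person
WHERE Id IN (
    SELECT Id FROM (
        SELECT p1.Id FROM Person p1, Person p2
        WHERE p1.Id > p2.Id AND p1.Email = p2.Email
    ) AS p
);

SELECT Id, Email FROM Person ORDER BY Id;  -- rows 1 and 2 remain
```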
Solution 3 is a multi-table DELETE:
DELETE p2 FROM Person p1, Person p2 WHERE p1.Email = p2.Email AND p2.Id > p1.Id;
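This one puzzled me at first, because I had never seen DELETE followed by a table alias. It turns out to be MySQL's multi-table DELETE syntax, DELETE tbl_name FROM table_references WHERE ...: the FROM clause builds the join, and rows are deleted only from the table(s) named before FROM, here the alias p2. So treat p2 as the original table: every p2 row whose Id is greater than the Id of another row (p1) with the same email gets deleted. (Maybe this could serve as a general tool for deduplicating a column?)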
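Because the multi-table DELETE form is MySQL-specific (SQLite does not accept it, and PostgreSQL expresses a join-delete as DELETE ... USING), one dialect-neutral way to see what DELETE p2 will remove is to run the same FROM and WHERE as a SELECT of p2's columns. This preview is a sketch on the example rows with an assumed table definition; DISTINCT is there because, with three or more copies of an email, one p2 row can match several p1 rows.

```sql
-- Hypothetical setup (column types assumed), same rows as the example.
DROP TABLE IF EXISTS Person;
CREATE TABLE Person (Id INT PRIMARY KEY, Email VARCHAR(255) NOT NULL);
INSERT INTO Person (Id, Email) VALUES
    (1, 'john@example.com'), (2, 'bob@example.com'), (3, 'john@example.com');

-- Same FROM/WHERE as solution 3, but SELECT the p2 side instead of deleting it.
-- These are the rows solution 3 removes; here only (3, 'john@example.com').
SELECT DISTINCT p2.Id, p2.Email
FROM Person p1, Person p2
WHERE p1.Email = p2.Email AND p2.Id > p1.Id
ORDER BY p2.Id;
```

The preview leaves the table untouched; running solution 3 itself on MySQL afterwards leaves the same two rows as the other solutions.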
Runtime: solution 1 < solution 2 < solution 3.