1.题目
Write a SQL query to delete all duplicate email entries in a table named Person
, keeping only unique emails based on its smallest Id.
翻译:写一个SQL语句删掉Person表中,所有email重复的实体(相同的保留一个),只保留I的最小的唯一email.
+----+------------------+ | Id | Email | +----+------------------+ | 1 | john@example.com | | 2 | bob@example.com | | 3 | john@example.com | +----+------------------+ Id is the primary key column for this table.
For example, after running your query, the above Person
table should have the following rows:
+----+------------------+ | Id | Email | +----+------------------+ | 1 | john@example.com | | 2 | bob@example.com | +----+------------------+
2.思路
①将Person表按Email分组,筛选每组里面的最小Id值作为minId。
②删除所有Id不在minId中的实体。
这个sql语句的思路很简单,但是要注意的细节很多,遇到的错误我将在第4点中进行总结。
3.算法
delete from Person
where Person.Id not in (
select Person_derived.minId from(
select min(Id) as minId from Person group by Email
)Person_derived
)
4.总结
①遇到的错误1:
delete from Person
where Person.Id not in (
select min(Id) as minId from Person group by Email
)
You can't specify target table 'Person' for update in FROM clause
SQL特有的一个错误,不能筛选表中一部分元素,然后直接更新它,需要引入一个中间表。
具体原因我猜测可能是和sql的设计有关系。
②遇到的错误2:
delete from Person
where Person.Id not in (
select minId from(
select min(Id) as minId from Person group by Email
)
)
Your answer
Every derived table must have its own alias
派生出的中间表必须要有个名字。
PS:Leetcode的Runtime很有意思,同一个算法我提交两次,一次打败2%的用户,一次打败98%的用户。
看来不能完全依赖这个指标,还是得学会计算时间复杂度呀。毕竟好几次我都栽在时间复杂度上。