196. Delete Duplicate Emails

Problem: Write a SQL query to delete all duplicate email entries in a table named Person, keeping only unique emails based on its smallest Id.

+----+------------------+
| Id | Email            |
+----+------------------+
| 1  | john@example.com |
| 2  | bob@example.com  |
| 3  | john@example.com |
+----+------------------+

For example, after running your query, the above Person table should have the following rows:

+----+------------------+
| Id | Email            |
+----+------------------+
| 1  | john@example.com |
| 2  | bob@example.com  |
+----+------------------+

GROUP BY

mysql> select * from Person;
+------+------------------+
| Id   | Email            |
+------+------------------+
|    1 | john@example.com |
|    2 | bob@example.com  |
|    3 | john@example.com |
+------+------------------+
3 rows in set (0.00 sec)

mysql> select id from Person group by email;
+------+
| id   |
+------+
|    1 |
|    2 |
+------+
2 rows in set (0.00 sec)

mysql> delete from person where id not in(select id from Person group by email);
ERROR 1093 (HY000): You can't specify target table 'person' for update in FROM clause

?Subqueries: You cannot delete from a table and select from the same table in a subquery.

Order of Deletion

若 DELETE 语句中包含 ORDER BY 子句, rows are deleted in the order specified by the clause. This is useful primarily in conjunction with LIMIT. For example, the following statement finds rows matching the WHERE clause, sorts them by timestamp_column, and deletes the first (oldest) one:

DELETE FROM somelog WHERE user='jcole' ORDER BY timestamp_column LIMIT 1;

ORDER BY also helps to delete rows in an order required to avoid referential integrity violations.

Multi-Table Deletes

You can specify multiple tables in a DELETE statement to delete rows from one or more tables depending on the condition in the WHERE clause. You cannot use ORDER BY or LIMIT in a multiple-table DELETE. The table_references clause lists the tables involved in the join, as described in Section 13.2.9.2, “JOIN Clause”.

There are 2 types of multiple-table syntax!

For the first multiple-table syntax, only matching rows from the tables listed before the FROM clause are deleted. For the second multiple-table syntax, only matching rows from the tables listed in the FROM clause (before the USING clause) are deleted. The effect is that you can delete rows from many tables at the same time and have additional tables that are used only for searching:

DELETE t1,t2 FROM t1 INNER JOIN t2 INNER JOIN t3
WHERE t1.id=t2.di AND t2.id=t3.id;

DELETE FROM t1,t2 USING t1 INNER JOIN t2 INNER JOIN t3
WHERE t1.id=t2.di AND t2.id=t3.id;

 These statements use all three tables when searching for rows to delete, but delete matching rows only from tables t1 and t2.

The preceding examples use INNER JOIN, but multiple-table DELETE statements can use other types of join permitted in SELECT statements, such as LEFT JOIN. For example, to delete rows that exist in t1 that have no match in t2, use a LEFT JOIN:

DELETE t1 FROM t1 LEFT JOIN t2 ON t1.id=t2.id WHERE t2.id IS NULL;

 Note: If you declare an alias for a table, you must use the alias when referring to the table:

DELETE t1 FROM test AS t1, test2 WHERE ...

Table aliases in a multiple-table DELETE should be declared only in the table_references part of the statement. Elsewhere, alias references are permitted but not alias declarations.

DELETE a1, a2 FROM t1 AS a1 INNER JOIN t2 AS a2
WHERE a1.id=a2.id;

DELETE FROM a1, a2 USING t1 AS a1 INNER JOIN t2 AS a2
WHERE a1.id=a2.id;

Solution

SQL92: DELETE p1 FROM Person p1,Person p2 WHERE p1.Email=p2.Email AND p1.Id>p2.Id;

SQL99: DELETE p1 FROM Person p1 INNER JOIN Person p2 ON p1.Email=p2.Email AND p1.Id>p2.ID;

SQL92(1648ms)语法的执行速度优于SQL99(2194ms),不知为何?

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值