1.利用rowid
在oracle中,每一条记录都有一个rowid,rowid在整个数据库中是唯一的(rowid确定了每条记录是oracle中的哪一个数据文件、块、行上)。在重复的记录中,可能所有列的内容都相同,但rowid不会相同。SQL语句如下:
DELETE FROM EMPLOYEE
WHERE ROWID IN (SELECT A.ROWID
FROM EMPLOYEE A, EMPLOYEE B
WHERE A.ROWID < B.ROWID
AND A.NAME = B.NAME);
如果已经知道每条记录只有一条重复的,这个sql语句适用。但是如果每条记录的重复记录有N条,这个N是未知的,就要考虑适用下面这种方法了。
2.结合rowid利用max/min函数
DELETE FROM EMPLOYEE A
WHERE ROWID NOT IN
(SELECT MAX(B.ROWID) FROM EMPLOYEE B WHERE A.NAME = B.NAME);
这里也可以使用min或者用下面的语句
DELETE FROM EMPLOYEE A
WHERE ROWID < (SELECT MAX(B.ROWID) FROM EMPLOYEE B WHERE A.NAME = B.NAME);
这里如果把max换成min的话,前面的where子句中需要把<改为>
3.利用group by
跟上面的方法思路基本是一样的,不过使用group by减少显性的比较条件,提高效率。
DELETE FROM EMPLOYEE
WHERE ROWID NOT IN (SELECT MAX(ROWID) FROM EMPLOYEE T GROUP BY T.NAME);
或者
DELETE FROM EMPLOYEE
WHERE NAME IN
(SELECT NAME FROM EMPLOYEE GROUP BY NAME HAVING COUNT(*) > 1)
AND ROWID NOT IN
(SELECT MIN(ROWID) FROM EMPLOYEE GROUP BY NAME HAVING COUNT(*) > 1);
4.利用分析函数
SELECT *
FROM EMPLOYEE
WHERE ROWID IN (SELECT RID
FROM (SELECT ROWID RID,
ROW_NUMBER() OVER(PARTITION BY NAME ORDER BY ROWID) N
FROM EMPLOYEE)
WHERE N <> 1);
注:如果重复的是多个字段(利用group by)
DELETE FROM EMPLOYEE
WHERE (NAME, SEX) IN (SELECT NAME, SEX
FROM EMPLOYEE
GROUP BY NAME, SEX
HAVING COUNT(*) > 1)
AND ROWID NOT IN (SELECT MIN(ROWID)
FROM EMPLOYEE
GROUP BY NAME, SEX
HAVING COUNT(*) > 1);
随便说一下,以上语句的执行效率是很低的,可以考虑建立临时表,将需要判断重复的字段、rowid插入临时表中,然后删除的时候在进行比较。比如:
CREATE TABLE 临时表 AS
SELECT A.字段1, A.字段2, MAX(A.ROWID) DATAID
FROM 正式表 A
GROUP BY A.字段1, A.字段2;
DELETE FROM 表名 A
WHERE A.ROWID <> (SELECT B.DATAID
FROM 临时表 B
WHERE A.字段1 = B.字段1
AND A.字段2 = B.字段2);
最后一点,对于完全重复记录的删除,可以将查询的记录放到临时表中,然后再将原来的表记录删除,最后将临时表的数据导回原来的表中。如下:
CREATE TABLE 临时表 AS(SELECT DISTINCT * FROM 表名);
TRUNCATE TABLE 正式表;
INSERT INTO 正式表 (SELECT * FROM 临时表);
DROP TABLE 临时表;