DROP TABLE t purge ;
CREATE TABLE t AS SELECT * FROM dba_objects WHERE rownum<=10;
UPDATE t SET object_id=rownum;
UPDATE t SET object_id=3 WHERE object_id<=3;
UPDATE t SET object_id=4 WHERE object_id>=4 AND object_id<=6;
COMMIT;
--以下例子顺便 开启set autotrace on 来看看分析函数和非分析函数写法的性能差异
方法1:
SET autotrace ON
SET linesize 2000
SET pagesize 2000
delete from t
where rowid <
(select max(rowid) from t t2
where t.object_id = t2.object_id
);
已删除4行。
执行计划
----------------------------------------------------------
Plan hash value: 3804998351
---------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------
| 0 | DELETE STATEMENT | | 1 | 50 | 8 (25)| 00:00:01 |
| 1 | DELETE | T | | | | |
|* 2 | HASH JOIN | | 1 | 50 | 8 (25)| 00:00:01 |
| 3 | VIEW | VW_SQ_1 | 6 | 150 | 4 (25)| 00:00:01 |
| 4 | SORT GROUP BY | | 6 | 150 | 4 (25)| 00:00:01 |
| 5 | TABLE ACCESS FULL| T | 6 | 150 | 3 (0)| 00:00:01 |
| 6 | TABLE ACCESS FULL | T | 6 | 150 | 3 (0)| 00:00:01 |
---------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("T"."OBJECT_ID"="ITEM_1")
filter(ROWID<"MAX(ROWID)")
Note
-----
- dynamic sampling used for this statement (level=2)
统计信息
----------------------------------------------------------
50 recursive calls
6 db block gets
40 consistent gets
0 physical reads
0 redo size
684 bytes sent via SQL*Net to client
659 bytes received via SQL*Net from client
3 SQL*Net roundtrips to/from client
2 sorts (memory)
0 sorts (disk)
4 rows processed
ROLLBACK;
--方法2(注意,这里的写法和上面的并不完全等价,上面是随便删除,保留rowid最大一条,下面是保留时间最新的,上面既然可以随便删了,下面的写法肯定也是符合要求的。其中row_number必须要有order BY 关键字):
delete t
where rowid in (select rid
from (select rowid rid,
row_number() over(partition by object_id ORDER by created desc) rn
from t)
where rn > 1);
已删除4行。
执行计划
----------------------------------------------------------
Plan hash value: 2005107446
-----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------
| 0 | DELETE STATEMENT | | 1 | 24 | 6 (34)| 00:00:01 |
| 1 | DELETE | T | | | | |
| 2 | NESTED LOOPS | | 1 | 24 | 6 (34)| 00:00:01 |
| 3 | VIEW | VW_NSO_1 | 6 | 72 | 4 (25)| 00:00:01 |
| 4 | SORT UNIQUE | | 1 | 150 | | |
|* 5 | VIEW | | 6 | 150 | 4 (25)| 00:00:01 |
| 6 | WINDOW SORT | | 6 | 204 | 4 (25)| 00:00:01 |
| 7 | TABLE ACCESS FULL | T | 6 | 204 | 3 (0)| 00:00:01 |
| 8 | TABLE ACCESS BY USER ROWID| T | 1 | 12 | 1 (0)| 00:00:01 |
-----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
5 - filter("RN">1)
Note
-----
- dynamic sampling used for this statement (level=2)
统计信息
----------------------------------------------------------
48 recursive calls
6 db block gets
38 consistent gets
0 physical reads
0 redo size
684 bytes sent via SQL*Net to client
813 bytes received via SQL*Net from client
3 SQL*Net roundtrips to/from client
3 sorts (memory)
0 sorts (disk)
4 rows processed
虽然目前来看,第一种方法效率也还可以,不过如果是多个字段重复,第一种方法就有些麻烦了,而且如果需求真的是要根据日期保留重复记录中最新的一个,也非常的繁琐了。
不过这类要记住,要是表记录非常大,比如有几千万,而重复记录的数理又特别巨大,者时候可以考虑直接把不重复的数据单独建出来,也是一个灵活的好办法。
分解过程:
SELECT object_id,rowid rid,
row_number() over(partition by object_id ORDER by created desc) rn
from t;
OBJECT_ID RID RN
---------- ------------------ ----------
3 AAAbW0AALAABRADAAA 1
3 AAAbW0AALAABRADAAB 2
3 AAAbW0AALAABRADAAC 3
4 AAAbW0AALAABRADAAD 1
4 AAAbW0AALAABRADAAE 2
4 AAAbW0AALAABRADAAF 3
7 AAAbW0AALAABRADAAG 1
8 AAAbW0AALAABRADAAH 1
9 AAAbW0AALAABRADAAI 1
10 AAAbW0AALAABRADAAJ 1
SELECT object_id, rid,rn
FROM (select object_id,rowid rid,
row_number() over(partition by object_id ORDER by created desc) rn
from t)
where rn > 1;
OBJECT_ID RID RN
---------- ------------------ ----------
3 AAAbW0AALAABRADAAB 2
3 AAAbW0AALAABRADAAC 3
4 AAAbW0AALAABRADAAE 2
4 AAAbW0AALAABRADAAF 3
--然后就可以根据这些RID,删除了,就留下一条了。
delete t
where rowid in (select rid
from (select rowid rid,
row_number() over(partition by object_id ORDER by created desc) rn
from t)
where rn > 1);