一、服务器信息
1.内存
[oracle@xmldb ~]$ free -g
total used free shared buffers cached
Mem: 125 92 33 0 0 59
-/+ buffers/cache: 32 93
Swap: 80 0 80
2.CPU
[oracle@xmldb ~]$ cat /proc/cpuinfo| grep "cpu cores"| uniq
cpu cores : 8
[oracle@yundingora ~]$ cat /proc/cpuinfo| grep "processor"| wc -l
32
3.IO
服务器IO
[oracle@xmldb ~]$ dd if=/home/oracle/linuxx64_12201_database.zip of=/home/oracle/linuxx64_12201_database.zip.dd
6745501+1 records in
6745501+1 records out
3453696911 bytes (3.5 GB) copied, 25.2508 s, 137 MB/s
数据库IO
数据存在另外一个磁阵上,后续再补
二、数据库信息
1.数据库内存信息
SQL> show parameter ga;
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
allow_group_access_to_sga boolean FALSE
lock_sga boolean FALSE
pga_aggregate_target big integer 0
sga_max_size big integer 1088M
sga_target big integer 0
unified_audit_sga_queue_size integer 1048576
三、表信息
1.数据表信息
select a.table_name,a.partitioned,a.degree,b.num_cols,a.num_rows,round(a.blocks*8/1024,2) as size_m,a.logging,a.last_analyzed
from all_tables a,(select table_name, count(*) as num_cols from user_tab_columns group by table_name) b
where a.table_name='TB_DELETE_TEST'
and a.table_name=b.table_name;
TABLE_NAME PAR DEGREE NUM_COLS NUM_ROWS SIZE_M LOG LAST_ANAL
-------------- ----- ---------- ---------- ----------- -------- ----- -----
TB_DELETE_TEST NO 1 11 107946703 11011.56 YES 21-APR-20
Elapsed: 00:00:00.79
--该表无索引。
2.1少量重复记录时
FI_QRY@orcl>select count(*) as distinct_2_cols_cnts from (select /*+parallel(30)*/ distinct acc,med_no,med_op_date from TB_DELETE_TEST);
DISTINCT_2_COLS_CNTS
--------------------
107946694
Elapsed: 00:00:29.56
select 107946703-107946694 as repeat_cnts from dual;
REPEAT_CNTS
-----------
9
Elapsed: 00:00:00.00
2.2大量重复记录时
FI_QRY@orcl>select count(*) as distinct_2_cols_cnts from (select /*+parallel(30)*/ distinct ACC,PAPER_NO from TB_DELETE_TEST);
DISTINCT_2_COLS_CNTS
--------------------
94681760
Elapsed: 00:00:30.88
--重复记录条数
FI_QRY@orcl>select 107946703-94681760 as repeat_cnts from dual;
REPEAT_CNTS
-----------
13264943
Elapsed: 00:00:00.00
四、高效去重
3.1少量重复记录时,通过DML语句完成即可
(注意各处hint)
FI_QRY@orcl>delete /*+RULE parallel(8)*/ from TB_DELETE_TEST a
where exists (select /*+parallel(8)*/
from ( select /*+parallel(30)*/ rowid rid,row_number() over (partition by acc,med_no,med_op_date order by rowid) rn from TB_DELETE_TEST) b
where b.rn <> 1 and a.rowid=b.rid);
9 rows deleted.
Elapsed: 00:03:28.89
3.2大量重复记录时,建议通过DDL语句完成
(注意各处hint)
FI_QRY@orcl>create /*+parallel(30)*/ table TB_DELETE_TEST_NEW as select /*+parallel(30)*/ DISTINCT * from TB_DELETE_TEST;
Table created.
Elapsed: 00:01:26.46
FI_QRY@orcl>rename TB_DELETE_TEST to TB_DELETE_TEST_OLD;
Table renamed.
Elapsed: 00:00:00.99
FI_QRY@orcl>rename TB_DELETE_TEST_NEW to TB_DELETE_TEST;
Table renamed.
Elapsed: 00:00:00.02
FI_QRY@orcl>drop table TB_DELETE_TEST_OLD purge;
Table dropped.
Elapsed: 00:00:00.56
FI_QRY@orcl>
如果使用快速delete耗时相当长
FI_QRY@orcl>delete /*+RULE parallel(8)*/ from TB_DELETE_TEST1 a
where exists (select /*+parallel(8)*/
from ( select /*+parallel(30)*/ rowid rid,row_number() over (partition by acc,paper_no order by rowid) rn from TB_DELETE_TEST1) b
where b.rn <> 1 and a.rowid=b.rid);
13264943 rows deleted.
Elapsed: 03:01:20.54
其他:如,通过表设置分区,通过程序按分区依次delete。
综上,如果重复数据量较少可以使用上述delete方式快速删除并保留一条,如果重复数据量较大则建议使用DDL实现。
此外,如果表数据量在千万级以下,重复数据来在百万级,通过上述delete仍然可以在数分钟内完成去重,详见:
https://asktom.oracle.com/pls/apex/f?p=100:11:0::::P11_QUESTION_ID:15258974323143