Oracle中查找和删除重复记录方法
平时工作中可能会遇到当试图对库表中的某一列或几列创建唯一索引时,系统提示 ORA - 01452 :不能创建唯一索引,发现重复记录。
下面总结一下几种查找和删除重复记录的方法(以表CZ为例):
表CZ的结构如下:
SQL > desc cz
Name Null ? Type
-- ---------------------------------------
C1 NUMBER ( 10 )
C10 NUMBER ( 5 )
C20 VARCHAR2 ( 3 )
删除重复记录的方法原理:
( 1 ).在Oracle中,每一条记录都有一个rowid,rowid在整个数据库中是唯一的,rowid确定了每条记录是在Oracle中的哪一个数据文件、块、行上。
( 2 ).在重复的记录中,可能所有列的内容都相同,但rowid不会相同,所以只要确定出重复记录中那些具有最大rowid的就可以了,其余全部删除。
重复记录判断的标准是:
C1,C10和C20这三列的值都相同才算是重复记录。
经查看表CZ总共有16条记录:
SQL > set pagesize 100
SQL > select * from cz;
C1 C10 C20
-- -------- ---------- ---
1 2 dsf
1 2 dsf
1 2 dsf
1 2 dsf
2 3 che
1 2 dsf
1 2 dsf
1 2 dsf
1 2 dsf
2 3 che
2 3 che
2 3 che
2 3 che
3 4 dff
3 4 dff
3 4 dff
4 5 err
5 3 dar
6 1 wee
7 2 zxc
20 rows selected.
1 .查找重复记录的几种方法:
( 1 ).SQL > select * from cz group by c1,c10,c20 having count ( * ) > 1 ;
C1 C10 C20
-- -------- ---------- ---
1 2 dsf
2 3 che
3 4 dff
( 2 ).SQL > select distinct * from cz;
C1 C10 C20
-- -------- ---------- ---
1 2 dsf
2 3 che
3 4 dff
( 3 ).SQL > select * from cz a where rowid = ( select max (rowid)
from cz where c1 = a.c1 and c10 = a.c10 and c20 = a.c20);
C1 C10 C20
-- -------- ---------- ---
1 2 dsf
2 3 che
3 4 dff
2 .删除重复记录的几种方法:
( 1 ).适用于有大量重复记录的情况(在C1,C10和C20列上建有索引的时候,用以下语句效率会很高):
SQL > delete cz where (c1,c10,c20) in ( select c1,c10,
c20 from cz group by c1,c10,c20 having count ( * ) > 1 ) and rowid not in
( select min (rowid) from cz group by c1,c10,c20 having count ( * ) > 1 );
SQL > delete cz where rowid not in
( select min (rowid) from cz group by c1,c10,c20);
( 2 ).适用于有少量重复记录的情况(注意,对于有大量重复记录的情况,用以下语句效率会很低):
SQL > delete from cz a where a.rowid != ( select max (rowid)
from cz b where a.c1 = b.c1 and a.c10 = b.c10 and a.c20 = b.c20);
SQL > delete from cz a where a.rowid < ( select max (rowid)
from cz b where a.c1 = b.c1 and a.c10 = b.c10 and a.c20 = b.c20);
SQL > delete from cz a where rowid < ( select max (rowid)
from cz where c1 = a.c1 and c10 = a.c10 and c20 = a.c20);
( 3 ).适用于有少量重复记录的情况(临时表法):
SQL > create table test as select distinct * from cz;
(建一个临时表test用来存放重复的记录)
SQL > truncate table cz; (清空cz表的数据,但保留cz表的结构)
SQL > insert into cz select * from test;
(再将临时表test里的内容反插回来)
( 4 ).适用于有大量重复记录的情况(Exception into 子句法):
采用alter table 命令中的 Exception into 子句也可以确定出库表中重复的记录。这种方法稍微麻烦一些,为了使用“excepeion into ”子句,必须首先创建 EXCEPTIONS 表。创建该表的 SQL 脚本文件为 utlexcpt.sql 。对于win2000系统和 UNIX 系统, Oracle 存放该文件的位置稍有不同,在win2000系统下,该脚本文件存放在$ORACLE_HOMEOra90rdbmsadmin 目录下;而对于 UNIX 系统,该脚本文件存放在$ORACLE_HOME / rdbms / admin 目录下。
具体步骤如下:
SQL > @? / rdbms / admin / utlexcpt.sql
Table created.
SQL > desc exceptions
Name Null ? Type
-- ---------------------------------------
ROW_ID ROWID
OWNER VARCHAR2 ( 30 )
TABLE_NAME VARCHAR2 ( 30 )
CONSTRAINT VARCHAR2 ( 30 )
SQL > alter table cz add constraint
cz_unique unique (c1,c10,c20) exceptions into exceptions;
*
ERROR at line 1 :
ORA - 02299 : cannot validate (TEST.CZ_UNIQUE) - duplicate keys found
SQL > create table dups as select
* from cz where rowid in ( select row_id from exceptions);
Table created.
SQL > select * from dups;
C1 C10 C20
-- -------- ---------- ---
1 2 dsf
1 2 dsf
1 2 dsf
1 2 dsf
2 3 che
1 2 dsf
1 2 dsf
1 2 dsf
1 2 dsf
2 3 che
2 3 che
2 3 che
2 3 che
3 4 dff
3 4 dff
3 4 dff
16 rows selected.
SQL > select row_id from exceptions;
ROW_ID
-- ----------------
AAAHD / AAIAAAADSAAA
AAAHD / AAIAAAADSAAB
AAAHD / AAIAAAADSAAC
AAAHD / AAIAAAADSAAF
AAAHD / AAIAAAADSAAH
AAAHD / AAIAAAADSAAI
AAAHD / AAIAAAADSAAG
AAAHD / AAIAAAADSAAD
AAAHD / AAIAAAADSAAE
AAAHD / AAIAAAADSAAJ
AAAHD / AAIAAAADSAAK
AAAHD / AAIAAAADSAAL
AAAHD / AAIAAAADSAAM
AAAHD / AAIAAAADSAAN
AAAHD / AAIAAAADSAAO
AAAHD / AAIAAAADSAAP
16 rows selected.
SQL > delete from cz where rowid in
( select row_id from exceptions);
16 rows deleted.
SQL > insert into cz select distinct * from dups;
3 rows created.
SQL > select * from cz;
C1 C10 C20
-- -------- ---------- ---
1 2 dsf
2 3 che
3 4 dff
4 5 err
5 3 dar
6 1 wee
7 2 zxc
7 rows selected.
从结果里可以看到重复记录已经删除。
平时工作中可能会遇到当试图对库表中的某一列或几列创建唯一索引时,系统提示 ORA - 01452 :不能创建唯一索引,发现重复记录。
下面总结一下几种查找和删除重复记录的方法(以表CZ为例):
表CZ的结构如下:
SQL > desc cz
Name Null ? Type
-- ---------------------------------------
C1 NUMBER ( 10 )
C10 NUMBER ( 5 )
C20 VARCHAR2 ( 3 )
删除重复记录的方法原理:
( 1 ).在Oracle中,每一条记录都有一个rowid,rowid在整个数据库中是唯一的,rowid确定了每条记录是在Oracle中的哪一个数据文件、块、行上。
( 2 ).在重复的记录中,可能所有列的内容都相同,但rowid不会相同,所以只要确定出重复记录中那些具有最大rowid的就可以了,其余全部删除。
重复记录判断的标准是:
C1,C10和C20这三列的值都相同才算是重复记录。
经查看表CZ总共有16条记录:
SQL > set pagesize 100
SQL > select * from cz;
C1 C10 C20
-- -------- ---------- ---
1 2 dsf
1 2 dsf
1 2 dsf
1 2 dsf
2 3 che
1 2 dsf
1 2 dsf
1 2 dsf
1 2 dsf
2 3 che
2 3 che
2 3 che
2 3 che
3 4 dff
3 4 dff
3 4 dff
4 5 err
5 3 dar
6 1 wee
7 2 zxc
20 rows selected.
1 .查找重复记录的几种方法:
( 1 ).SQL > select * from cz group by c1,c10,c20 having count ( * ) > 1 ;
C1 C10 C20
-- -------- ---------- ---
1 2 dsf
2 3 che
3 4 dff
( 2 ).SQL > select distinct * from cz;
C1 C10 C20
-- -------- ---------- ---
1 2 dsf
2 3 che
3 4 dff
( 3 ).SQL > select * from cz a where rowid = ( select max (rowid)
from cz where c1 = a.c1 and c10 = a.c10 and c20 = a.c20);
C1 C10 C20
-- -------- ---------- ---
1 2 dsf
2 3 che
3 4 dff
2 .删除重复记录的几种方法:
( 1 ).适用于有大量重复记录的情况(在C1,C10和C20列上建有索引的时候,用以下语句效率会很高):
SQL > delete cz where (c1,c10,c20) in ( select c1,c10,
c20 from cz group by c1,c10,c20 having count ( * ) > 1 ) and rowid not in
( select min (rowid) from cz group by c1,c10,c20 having count ( * ) > 1 );
SQL > delete cz where rowid not in
( select min (rowid) from cz group by c1,c10,c20);
( 2 ).适用于有少量重复记录的情况(注意,对于有大量重复记录的情况,用以下语句效率会很低):
SQL > delete from cz a where a.rowid != ( select max (rowid)
from cz b where a.c1 = b.c1 and a.c10 = b.c10 and a.c20 = b.c20);
SQL > delete from cz a where a.rowid < ( select max (rowid)
from cz b where a.c1 = b.c1 and a.c10 = b.c10 and a.c20 = b.c20);
SQL > delete from cz a where rowid < ( select max (rowid)
from cz where c1 = a.c1 and c10 = a.c10 and c20 = a.c20);
( 3 ).适用于有少量重复记录的情况(临时表法):
SQL > create table test as select distinct * from cz;
(建一个临时表test用来存放重复的记录)
SQL > truncate table cz; (清空cz表的数据,但保留cz表的结构)
SQL > insert into cz select * from test;
(再将临时表test里的内容反插回来)
( 4 ).适用于有大量重复记录的情况(Exception into 子句法):
采用alter table 命令中的 Exception into 子句也可以确定出库表中重复的记录。这种方法稍微麻烦一些,为了使用“excepeion into ”子句,必须首先创建 EXCEPTIONS 表。创建该表的 SQL 脚本文件为 utlexcpt.sql 。对于win2000系统和 UNIX 系统, Oracle 存放该文件的位置稍有不同,在win2000系统下,该脚本文件存放在$ORACLE_HOMEOra90rdbmsadmin 目录下;而对于 UNIX 系统,该脚本文件存放在$ORACLE_HOME / rdbms / admin 目录下。
具体步骤如下:
SQL > @? / rdbms / admin / utlexcpt.sql
Table created.
SQL > desc exceptions
Name Null ? Type
-- ---------------------------------------
ROW_ID ROWID
OWNER VARCHAR2 ( 30 )
TABLE_NAME VARCHAR2 ( 30 )
CONSTRAINT VARCHAR2 ( 30 )
SQL > alter table cz add constraint
cz_unique unique (c1,c10,c20) exceptions into exceptions;
*
ERROR at line 1 :
ORA - 02299 : cannot validate (TEST.CZ_UNIQUE) - duplicate keys found
SQL > create table dups as select
* from cz where rowid in ( select row_id from exceptions);
Table created.
SQL > select * from dups;
C1 C10 C20
-- -------- ---------- ---
1 2 dsf
1 2 dsf
1 2 dsf
1 2 dsf
2 3 che
1 2 dsf
1 2 dsf
1 2 dsf
1 2 dsf
2 3 che
2 3 che
2 3 che
2 3 che
3 4 dff
3 4 dff
3 4 dff
16 rows selected.
SQL > select row_id from exceptions;
ROW_ID
-- ----------------
AAAHD / AAIAAAADSAAA
AAAHD / AAIAAAADSAAB
AAAHD / AAIAAAADSAAC
AAAHD / AAIAAAADSAAF
AAAHD / AAIAAAADSAAH
AAAHD / AAIAAAADSAAI
AAAHD / AAIAAAADSAAG
AAAHD / AAIAAAADSAAD
AAAHD / AAIAAAADSAAE
AAAHD / AAIAAAADSAAJ
AAAHD / AAIAAAADSAAK
AAAHD / AAIAAAADSAAL
AAAHD / AAIAAAADSAAM
AAAHD / AAIAAAADSAAN
AAAHD / AAIAAAADSAAO
AAAHD / AAIAAAADSAAP
16 rows selected.
SQL > delete from cz where rowid in
( select row_id from exceptions);
16 rows deleted.
SQL > insert into cz select distinct * from dups;
3 rows created.
SQL > select * from cz;
C1 C10 C20
-- -------- ---------- ---
1 2 dsf
2 3 che
3 4 dff
4 5 err
5 3 dar
6 1 wee
7 2 zxc
7 rows selected.
从结果里可以看到重复记录已经删除。