Oracle中查找和删除重复记录方法
平时工作中可能会遇到当试图对库表中的某一列或几列创建唯一索引时,系统提示 ORA
-
01452
:不能创建唯一索引,发现重复记录。
下面总结一下几种查找和删除重复记录的方法(以表CZ为例):
表CZ的结构如下:
SQL
>
desc
cz
Name
Null
? Type
--
---------------------------------------
C1
NUMBER
(
10
)
C10
NUMBER
(
5
)
C20
VARCHAR2
(
3
)
删除重复记录的方法原理:
(
1
).在Oracle中,每一条记录都有一个rowid,rowid在整个数据库中是唯一的,rowid确定了每条记录是在Oracle中的哪一个数据文件、块、行上。
(
2
).在重复的记录中,可能所有列的内容都相同,但rowid不会相同,所以只要确定出重复记录中那些具有最大rowid的就可以了,其余全部删除。
重复记录判断的标准是:
C1,C10和C20这三列的值都相同才算是重复记录。
经查看表CZ总共有16条记录:
SQL
>
set
pagesize
100
SQL
>
select
*
from
cz;
C1 C10 C20
--
-------- ---------- ---
1
2
dsf
1
2
dsf
1
2
dsf
1
2
dsf
2
3
che
1
2
dsf
1
2
dsf
1
2
dsf
1
2
dsf
2
3
che
2
3
che
2
3
che
2
3
che
3
4
dff
3
4
dff
3
4
dff
4
5
err
5
3
dar
6
1
wee
7
2
zxc
20
rows selected.
1
.查找重复记录的几种方法:
(
1
).SQL
>
select
*
from
cz
group
by
c1,c10,c20
having
count
(
*
)
>
1
;
C1 C10 C20
--
-------- ---------- ---
1
2
dsf
2
3
che
3
4
dff
(
2
).SQL
>
select
distinct
*
from
cz;
C1 C10 C20
--
-------- ---------- ---
1
2
dsf
2
3
che
3
4
dff
(
3
).SQL
>
select
*
from
cz a
where
rowid
=
(
select
max
(rowid)
from
cz
where
c1
=
a.c1
and
c10
=
a.c10
and
c20
=
a.c20);
C1 C10 C20
--
-------- ---------- ---
1
2
dsf
2
3
che
3
4
dff
2
.删除重复记录的几种方法:
(
1
).适用于有大量重复记录的情况(在C1,C10和C20列上建有索引的时候,用以下语句效率会很高):
SQL
>
delete
cz
where
(c1,c10,c20)
in
(
select
c1,c10,
c20
from
cz
group
by
c1,c10,c20
having
count
(
*
)
>
1
)
and
rowid
not
in
(
select
min
(rowid)
from
cz
group
by
c1,c10,c20
having
count
(
*
)
>
1
);
SQL
>
delete
cz
where
rowid
not
in
(
select
min
(rowid)
from
cz
group
by
c1,c10,c20);
(
2
).适用于有少量重复记录的情况(注意,对于有大量重复记录的情况,用以下语句效率会很低):
SQL
>
delete
from
cz a
where
a.rowid
!=
(
select
max
(rowid)
from
cz b
where
a.c1
=
b.c1
and
a.c10
=
b.c10
and
a.c20
=
b.c20);
SQL
>
delete
from
cz a
where
a.rowid
<
(
select
max
(rowid)
from
cz b
where
a.c1
=
b.c1
and
a.c10
=
b.c10
and
a.c20
=
b.c20);
SQL
>
delete
from
cz a
where
rowid
<
(
select
max
(rowid)
from
cz
where
c1
=
a.c1
and
c10
=
a.c10
and
c20
=
a.c20);
(
3
).适用于有少量重复记录的情况(临时表法):
SQL
>
create
table
test
as
select
distinct
*
from
cz;
(建一个临时表test用来存放重复的记录)
SQL
>
truncate
table
cz; (清空cz表的数据,但保留cz表的结构)
SQL
>
insert
into
cz
select
*
from
test;
(再将临时表test里的内容反插回来)
(
4
).适用于有大量重复记录的情况(Exception
into
子句法):
采用alter
table
命令中的 Exception
into
子句也可以确定出库表中重复的记录。这种方法稍微麻烦一些,为了使用“excepeion
into
”子句,必须首先创建 EXCEPTIONS 表。创建该表的 SQL 脚本文件为 utlexcpt.sql 。对于win2000系统和 UNIX 系统, Oracle 存放该文件的位置稍有不同,在win2000系统下,该脚本文件存放在$ORACLE_HOMEOra90rdbmsadmin 目录下;而对于 UNIX 系统,该脚本文件存放在$ORACLE_HOME
/
rdbms
/
admin 目录下。
具体步骤如下:
SQL
>
@?
/
rdbms
/
admin
/
utlexcpt.sql
Table
created.
SQL
>
desc
exceptions
Name
Null
? Type
--
---------------------------------------
ROW_ID ROWID
OWNER
VARCHAR2
(
30
)
TABLE_NAME
VARCHAR2
(
30
)
CONSTRAINT
VARCHAR2
(
30
)
SQL
>
alter
table
cz
add
constraint
cz_unique
unique
(c1,c10,c20) exceptions
into
exceptions;
*
ERROR at line
1
:
ORA
-
02299
: cannot validate (TEST.CZ_UNIQUE)
-
duplicate keys found
SQL
>
create
table
dups
as
select
*
from
cz
where
rowid
in
(
select
row_id
from
exceptions);
Table
created.
SQL
>
select
*
from
dups;
C1 C10 C20
--
-------- ---------- ---
1
2
dsf
1
2
dsf
1
2
dsf
1
2
dsf
2
3
che
1
2
dsf
1
2
dsf
1
2
dsf
1
2
dsf
2
3
che
2
3
che
2
3
che
2
3
che
3
4
dff
3
4
dff
3
4
dff
16
rows selected.
SQL
>
select
row_id
from
exceptions;
ROW_ID
--
----------------
AAAHD
/
AAIAAAADSAAA
AAAHD
/
AAIAAAADSAAB
AAAHD
/
AAIAAAADSAAC
AAAHD
/
AAIAAAADSAAF
AAAHD
/
AAIAAAADSAAH
AAAHD
/
AAIAAAADSAAI
AAAHD
/
AAIAAAADSAAG
AAAHD
/
AAIAAAADSAAD
AAAHD
/
AAIAAAADSAAE
AAAHD
/
AAIAAAADSAAJ
AAAHD
/
AAIAAAADSAAK
AAAHD
/
AAIAAAADSAAL
AAAHD
/
AAIAAAADSAAM
AAAHD
/
AAIAAAADSAAN
AAAHD
/
AAIAAAADSAAO
AAAHD
/
AAIAAAADSAAP
16
rows selected.
SQL
>
delete
from
cz
where
rowid
in
(
select
row_id
from
exceptions);
16
rows deleted.
SQL
>
insert
into
cz
select
distinct
*
from
dups;
3
rows created.
SQL
>
select
*
from
cz;
C1 C10 C20
--
-------- ---------- ---
1
2
dsf
2
3
che
3
4
dff
4
5
err
5
3
dar
6
1
wee
7
2
zxc
7
rows selected.
从结果里可以看到重复记录已经删除。
Oracle SQL中查找和删除重复记录方法
最新推荐文章于 2024-04-28 22:24:25 发布