Effective ways to delete duplicate data in Oracle, and efficient large-volume updates in Oracle (multi-table correlated updates)

suofeya

Article link: https://blog.csdn.net/osuofeya/article/details/82461876

Effective ways to delete duplicate data in Oracle:
1. Using having count(employee_id) > 1

delete from employees t2
where t2.employee_id in(select t1.employee_id from employees t1
group by t1.employee_id
having count(t1.employee_id)>1);
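Note: this method deletes every row whose employee_id is duplicated, so not even one copy is kept in the resulting table. In addition, having count(employee_id) > 1 runs very slowly on large tables.
This method is generally not recommended.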
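
Before running any delete, it may help to preview which keys are duplicated and how many rows are involved. The query below is a minimal sketch that assumes the same employees table and employee_id column used above; the column alias copies is an illustrative name, not something from the original.

```sql
-- List each duplicated employee_id together with the number of rows that share it.
select t1.employee_id,
       count(*) as copies
from employees t1
group by t1.employee_id
having count(*) > 1
order by copies desc;
```

Every row counted here is removed by the delete above, which is why no copy of a duplicated key survives.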

2. Using rowid
When the table is large, this method is strongly recommended for the delete.
(1) Delete by comparing rowid directly
delete from employees t2
where t2.rowid>
(select min(t1.rowid) from employees t1
where t1.employee_id=t2.employee_id);
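
A common variant of the same idea, shown here only as a sketch and not as the author's method, collects the smallest rowid per employee_id once with group by and deletes every other row. Which form runs faster depends on the data and the optimizer, so it is worth measuring both on the real table.

```sql
-- Keep the row with the smallest rowid in each employee_id group and delete the rest.
-- Assumption worth checking: rows with a null employee_id form a single group here,
-- so all but one of them are deleted, whereas the correlated form leaves them untouched.
delete from employees t2
where t2.rowid not in (
    select min(t1.rowid)
    from employees t1
    group by t1.employee_id
);
```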

(2) Use the analytic function row_number() over(partition by ... order by ...)
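-- The inner query aliases rowid as rid so that the outer query can select it,
-- and uses its own table alias (t2) to avoid shadowing the outer t1.
-- Within each employee_id, the row with rn = 1 is kept and the rest are deleted.
delete from employees t1
where t1.rowid in (
    select a.rid
    from (select t2.rowid rid,
                 row_number() over (partition by t2.employee_id order by t2.first_name) rn
          from employees t2) a
    where a.rn > 1);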
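
To see which rows the row_number() approach would remove before deleting anything, the ranking can be inspected on its own. This is a small sketch against the same employees table; the alias t is chosen here for illustration.

```sql
-- Rank the rows inside each employee_id group by first_name.
-- Rows with rn = 1 would be kept; rows with rn > 1 are the ones the delete targets.
select t.employee_id,
       t.first_name,
       row_number() over (partition by t.employee_id order by t.first_name) as rn
from employees t
order by t.employee_id, rn;
```

Changing the order by inside over(...) changes which copy gets rn = 1, and therefore which copy is kept.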

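Note: deleting by rowid is considerably more efficient than the other approaches. According to the author, when the table holds fewer than 100,000 rows, method (2) is faster than method (1), although in the measurement below the two are nearly identical.
For example, running method (1) and method (2) against the same table of a little over 70,000 rows:
Method (1) took: 0.016s
Method (2) took: 0.015s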
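
One way such timings could be collected is with the SQL*Plus timing setting, rolling back after each delete so that both methods see the same duplicates. This is a sketch under the assumption that the statements are run in SQL*Plus (or a compatible client) against a test copy of employees; it is not necessarily how the figures above were obtained.

```sql
set timing on

-- Method (1): compare rowid directly
delete from employees t2
where t2.rowid >
      (select min(t1.rowid) from employees t1
       where t1.employee_id = t2.employee_id);
-- Undo the delete so the next method starts from the same data.
rollback;

-- Method (2): row_number()
delete from employees t1
where t1.rowid in (
    select a.rid
    from (select t2.rowid as rid,
                 row_number() over (partition by t2.employee_id order by t2.first_name) as rn
          from employees t2) a
    where a.rn > 1);
rollback;

set timing off
```

The second statement may benefit from blocks already cached by the first, so repeating the runs in both orders gives a fairer comparison.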

Efficient large-volume updates in Oracle (multi-table correlated updates):

(1) Update the target table in bulk from a query result set, using a subquery and parallel execution:

update /*+ parallel(a,4) */ customers a -- use an alias
set (city_name,customer_type)=(select b.city_name,b.customer_type
from tmp_cust_city b
where b.customer_id=a.customer_id)
where exists (select 1
from tmp_cust_city b
where b.customer_id=a.customer_id
);

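Note: with this form of query, SQL execution is very slow on large data volumes. When more than ten thousand rows are to be updated, this method is not recommended.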
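
One factor that may contribute to the slowness: for the parallel hint to apply to the update itself, and not only to the subqueries, parallel DML generally has to be enabled in the session first. The sketch below shows that setup around the same statement; whether it actually helps depends on the database edition, configuration, and data, so treat it as something to test, not as a guaranteed fix.

```sql
-- Parallel DML is disabled by default in a session.
alter session enable parallel dml;

update /*+ parallel(a,4) */ customers a
set (city_name, customer_type) = (select b.city_name, b.customer_type
                                  from tmp_cust_city b
                                  where b.customer_id = a.customer_id)
where exists (select 1
              from tmp_cust_city b
              where b.customer_id = a.customer_id);

-- After parallel DML, the transaction has to be committed or rolled back
-- before the same session can query the modified table again.
commit;
```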

(2) Update by rowid (for large-volume updates, this form runs very efficiently)
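begin
  -- Join the two tables once and fetch the new values together with
  -- the rowid of each target row in customers.
  for aa in (
    select b.city_name,
           b.customer_type,
           a.rowid id
    from customers a, tmp_cust_city b
    where b.customer_id = a.customer_id
  ) loop
    -- Locate the target row directly by its rowid.
    update customers t1 set
      t1.city_name = aa.city_name,
      t1.customer_type = aa.customer_type
    where rowid = aa.id;
    -- The change is committed after every single row.
    commit;
  end loop;
end;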
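
The loop above commits once per row. A possible refinement, sketched here as an assumption and not as the author's method, is to commit in batches so that fewer commits are issued. The names v_count, c_batch, and row_id, and the batch size of 1000, are hypothetical and chosen only for illustration.

```sql
declare
  -- Rows updated since the last commit.
  v_count pls_integer := 0;
  -- Hypothetical batch size; tune it for the actual workload.
  c_batch constant pls_integer := 1000;
begin
  for aa in (
    select b.city_name,
           b.customer_type,
           a.rowid as row_id
    from customers a, tmp_cust_city b
    where b.customer_id = a.customer_id
  ) loop
    update customers t1
       set t1.city_name = aa.city_name,
           t1.customer_type = aa.customer_type
     where t1.rowid = aa.row_id;

    v_count := v_count + 1;
    if v_count >= c_batch then
      commit;
      v_count := 0;
    end if;
  end loop;

  -- Commit whatever remains from the final partial batch.
  commit;
end;
/
```

The trailing slash runs the block in SQL*Plus. As with the other variants, the gain is worth measuring on realistic data before relying on it.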