删除表中重复数据的sql

一方法:

查询表中重复数据 
select * from employee where employeeId in (select employeeId from employee group by employeeId having count(employeeId) > 1)

删除表中多余的重复记录,重复记录是根据单个字段(employeeId)来判断,只留有rowid最小的记录 
delete from employee where employeeId in (select employeeId from employee group by employeeId having count(employeeId) > 1) 
and rowid not in (select min(rowid) from employee group by employeeId having count(employeeId )>1)

查找表中多余的重复记录(多个字段) 
select * from employee e 
where (e.employeeId,e.phoneNo) in (select employeeId,phoneNo from employee group by employeeId,phoneNo having count(*) > 1)

删除表中多余的重复记录(多个字段),只留有rowid最小的记录 
delete from employee e 
where (e.employeeId, e.phoneNo) in (select employeeId, phoneNo from employee group by employeeId,phoneNo having count(*) > 1) 
and rowid not in (select min(rowid) from employee group by employeeId,phoneNo having count(*)>1)

查找表中多余的重复记录(多个字段),不包含rowid最小的记录 
select * from employee e 
where (e.employeeId,e.phoneNo) in (select employeeId, phoneNo from employee group by employeeId,phoneNo having count(*) > 1) 
and rowid not in (select min(rowid) from employee group by employeeId,phoneNo having count(*)>1)

通用:

delete from table t 
where (t.字段1, t.字段2, … , t.字段n) in (select 字段1, 字段2, … , 字段n from table group by 字段1, 字段2, … , 字段n having count(*) > 1) 
and rowid not in (select min(rowid) from table group by 字段1, 字段2, … , 字段n having count(*)>1)

此外:

如果只是查询的时候, 不显示重复, 只需select distinct 字段 from table….
--------------------- 

二删除方法:DELETE FROM hr.employees t1
                     WHERE t1.ROWID NOT IN (
                       SELECT MIN(t2.ROWID)
                       FROM hr.employees t2
                       GROUP BY t2.employee_id --按照想要唯一保留的字段进行分组
                      );

这个明显就比方法一好多了,子查询中我们先选除了rowid,然后按照我们想要保留的唯一字段进行分组,并取每组最小的rowid(注意是子查询表的rowid);然后在用not in删除除开最小的rowid以外的所有记录

  怎么样,这个方法是不是瞬间解决并且非常好理解?但是你以为这样就结束了?no no no

三删除方法:DELETE FROM hr.employees t1
              WHERE t1.rowid > (
                   SELECT MIN(t2.rowid)
                   FROM hr.employees t2
                   WHERE t1.employee_id = t2.employee_id --按照想要唯一保留的字段进行匹配
                 );

这个方式看起来和方法二差不多,但是想要说的是,他用的是连接,他用的是连接,不敢说连接一定比group by快,但是基本上不会输group by,而且在一般的情况下也是最快的了。而且外层的">"可以用到索引,就是各种快。

  方法也同样说一下,子查询中按照要保留的字段对t1和t2进行关联,然后选择出最小的rowid(注意是子查询表的rowid),然后在外层用">"只保留每个匹配结果最小的一条记录。然后就瞬间删除重复的记录

 

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值