删除一个表中的重复数据同时保留第一次插入那一条以及sql优化

       业务:一个表中有很多数据(id为自增主键),在这些数据中有个别数据出现了重复的数据。

       目标:需要把这些重复数据删除同时保留第一次插入的那一条数据,还要保持其它的数据不受影响。

       解题过程:

       第一步:查出所有要保留的下来的数据的id(save_id)

SELECT id as save_id
  FROM yujing.alarm_event_info_snapshot aeis
 where aeis.event_id in
       (SELECT ae.id
          FROM yujing.alarm_event ae
         where ae.event_uuid like 'yuanwtj_%')
 group by (aeis.event_id)

优化后:

SELECT aeis.id as save_id
  FROM yujing.alarm_event ae
 right join yujing.alarm_event_info_snapshot aeis
    on aeis.event_id = ae.id
 where ae.event_uuid like 'yuanwtj_%'
 group by (aeis.event_id)
  

       第二步:获取所有相关数据的id(all_id)

SELECT aeis.id as all_id
  FROM yujing.alarm_event_info_snapshot aeis
 where aeis.event_id in
       (SELECT ae.id
          FROM yujing.alarm_event ae
         where ae.event_uuid like 'yuanwtj_%')
 order by aeis.event_id

优化后:       

SELECT aeis.id as all_id
  FROM yujing.alarm_event ae
 right join yujing.alarm_event_info_snapshot aeis
    on aeis.event_id = ae.id
 where ae.event_uuid like 'yuanwtj_%'

         第三步:获取要删除的数据的 id(del_id)
select ad.all_id as del_id
  from (SELECT aeis.id as all_id
          FROM yujing.alarm_event_info_snapshot aeis
         where aeis.event_id in
               (SELECT ae.id
                  FROM yujing.alarm_event ae
                 where ae.event_uuid like 'yuanwtj_%')) as ad
 where ad.all_id not in (SELECT id as save_id
                           FROM yujing.alarm_event_info_snapshot aeis
                          where aeis.event_id in
                                (SELECT ae.id
                                   FROM yujing.alarm_event ae
                                  where ae.event_uuid like 'yuanwtj_%')
                          group by (aeis.event_id))
优化后:
select ad.all_id as del_id
  from (SELECT aeis.id as all_id
          FROM yujing.alarm_event ae
         right join yujing.alarm_event_info_snapshot aeis
            on aeis.event_id = ae.id
         where ae.event_uuid like 'yuanwtj_%') as ad
  left join (SELECT aeis.id as save_id
               FROM yujing.alarm_event ae
              right join yujing.alarm_event_info_snapshot aeis
                 on aeis.event_id = ae.id
              where ae.event_uuid like 'yuanwtj_%'
              group by (aeis.event_id)) as sd
    on ad.all_id = sd.save_id
 where sd.save_id is null
       

       第四步:根据id删除所有节点,注意mysql中如果有大量数据时需要批量删除,我最后使用了ETL工具进行的批量删除

       总结:在mysql数据库中,sql语句中最好不要在in或not in关键字的查询里动态获取匹配的值,数据量大的情况下使用它们效率很低,可以使用左右连接来代替in操作,这样效率会提高很多倍,大数据量下尤为明显。


  • 4
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 2
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论 2
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值