删除表中重复数据sql语句 优化!!!

有天这个需求需要实现

于是我去百度

得到如下结果:

 

SQL:删除重复数据,只保留一条用SQL语句,删除掉重复项只保留一条在几千条记录里,存在着些相同的记录,如何能用SQL语句,删除掉重复的呢

 1、查找表中多余的重复记录,重复记录是根据单个字段(peopleId)来判断 select * from people where peopleId in (select peopleId from people group by peopleId having count(peopleId) > 1)

2、删除表中多余的重复记录,重复记录是根据单个字段(peopleId)来判断,只留有rowid最小的记录 delete from people where   peopleName in (select peopleName    from people group by peopleName      having count(peopleName) > 1) and   peopleId not in (select min(peopleId) from people group by peopleName     having count(peopleName)>1)

3、查找表中多余的重复记录(多个字段) select * from vitae a where (a.peopleId,a.seq) in (select peopleId,seq from vitae group by peopleId,seq having count(*) > 1)

4、删除表中多余的重复记录(多个字段),只留有rowid最小的记录 delete from vitae a where (a.peopleId,a.seq) in (select peopleId,seq from vitae group by peopleId,seq having count(*) > 1) and rowid not in (select min(rowid) from vitae group by peopleId,seq having count(*)>1)

5、查找表中多余的重复记录(多个字段),不包含rowid最小的记录 select * from vitae a where (a.peopleId,a.seq) in (select peopleId,seq from vitae group by peopleId,seq having count(*) > 1) and rowid not in (select min(rowid) from vitae group by peopleId,seq having count(*)>1)  

6.消除一个字段的左边的第一位:

update tableName set [Title]=Right([Title],(len([Title])-1)) where Title like '村%'

7.消除一个字段的右边的第一位:

update tableName set [Title]=left([Title],(len([Title])-1)) where Title like '%村'

8.假删除表中多余的重复记录(多个字段),不包含rowid最小的记录 update vitae set ispass=-1 where peopleId in (select peopleId from vitae group by peopleId,seq having count(*) > 1) and seq in (select seq from vitae group by peopleId,seq having count(*) > 1) and rowid not in (select min(rowid) from vitae group by peopleId,seq having count(*)>1) 

 

便选取一条

删除表中多余的重复记录(多个字段),只留有rowid最小的记录 delete from vitae a where (a.peopleId,a.seq) in (select peopleId,seq from vitae group by peopleId,seq having count(*) > 1) and rowid not in (select min(rowid) from vitae group by peopleId,seq having count(*)>1)

 

由于需求实现需要五个字段便改如下:

DELETE
FROM
    TEMP_registration_information
WHERE
    (
        level1options,
        level2options,
        level3options,
        level4options,
        NAME
    ) IN (
        SELECT
            level1options,
            level2options,
            level3options,
            level4options,
            NAME
        FROM
            (
                SELECT
                    level1options,
                    level2options,
                    level3options,
                    level4options,
                    NAME
                FROM
                    TEMP_registration_information
                GROUP BY
                    level1options,
                    level2options,
                    level3options,
                    level4options,
                    NAME
                HAVING
                    count(*) > 1
            ) AS tmp
    )
AND id NOT IN (
    SELECT
        id
    FROM
        (
            SELECT
                min(id)
            FROM
                TEMP_registration_information
            GROUP BY
                level1options,
                level2options,
                level3options,
                level4options,
                NAME
            HAVING
                count(*) > 1
        ) AS temp
)

做了临时表

SELECT
        id
    FROM
        (
            SELECT
                min(id)
            FROM
                TEMP_registration_information
            GROUP BY
                level1options,
                level2options,
                level3options,
                level4options,
                NAME
            HAVING
                count(*) > 1
        ) AS temp

因为运行

delete from vitae a where (a.peopleId,a.seq) in (select peopleId,seq from vitae group by peopleId,seq having count(*) > 1) and rowid not in (select min(rowid) from vitae group by peopleId,seq having count(*)>1)

这种会报错

因为在mysql中,不能在一条Sql语句中,即查询这些数据,同时修改这些数据

所以增加临时表,

但运行后,耗时太长,5,6分钟毫无停止迹象,继续运行。

所以思考新方法,尝试对使用的5个字段增加索引,但耗时依旧5,6分钟毫无停止迹象,继续运行。

所以继续思考

最后想到

DELETE
FROM
  TEMP_registration_information
WHERE id  IN (
  SELECT
    id
  FROM
    (
      SELECT
        min(id) id
      FROM
        TEMP_registration_information
      GROUP BY
        level1options,
        level2options,
        level3options,
        level4options,
        NAME
      HAVING
        count(*) > 1
    ) AS temp
)

这个语句在8万条数据下,运行时长为约20s,

重复运行直到删除条数为0

就可以!

当然这种方式有些笨,本人不是专业写sql,

所以也欢迎有更好方法可以交流,留言。

 

 

  • 0
    点赞
  • 2
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值