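Deleting Duplicate Data in MySQL
1. Environment
- MySQL 5.6
- The table datatable_copy has 600,000 records. Of these, 500,000 have a duplicated Barcode value, and 12 are duplicated across Barcode/DeviceID/DatiumName. The goal is to delete those 12 records.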
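2. Handling records duplicated on one field
2.1 View the records duplicated on one field
-- Every row whose Barcode appears more than once
SELECT * FROM datatable_copy
WHERE Barcode IN
(SELECT Barcode FROM datatable_copy GROUP BY Barcode HAVING COUNT(1) > 1);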
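Before listing every duplicated row, it can help to see how the duplicates are distributed. The following is a minimal sketch against the same table; the alias cnt is only illustrative and does not come from the original queries.

```sql
-- Count how many rows share each duplicated Barcode value,
-- with the most heavily duplicated values first.
SELECT Barcode, COUNT(*) AS cnt
FROM datatable_copy
GROUP BY Barcode
HAVING COUNT(*) > 1
ORDER BY cnt DESC;
```

Note that GROUP BY places all rows with a NULL Barcode into a single group, so they would show up here as one "duplicated" value, whereas the IN query above never matches NULL.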
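2.2 View the redundant records for one field
- Method 1
This method took about 1 second. It keeps the row with the smallest ID for each Barcode and lists all the others.
SELECT * FROM datatable_copy
WHERE ID NOT IN
(SELECT DTC.MIN_ID FROM
(SELECT MIN(ID) AS MIN_ID FROM datatable_copy GROUP BY Barcode) DTC);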
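- Method 2
This method took about 5 seconds. Unlike methods 1 and 3, it treats the row with the largest ID in each group as the one to keep, not the smallest.
SELECT * FROM datatable_copy AS DTC
WHERE DTC.ID <> (SELECT MAX(ID) FROM datatable_copy AS DTC2 WHERE DTC.Barcode = DTC2.Barcode);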
- Method 3
This method took about 10 seconds.
SELECT * FROM datatable_copy
WHERE Barcode IN
(SELECT Barcode FROM datatable_copy GROUP BY Barcode HAVING COUNT(1) > 1)
AND ID NOT IN
(SELECT MIN(ID) FROM datatable_copy GROUP BY Barcode HAVING COUNT(1) > 1);
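The same set of rows as method 1 can also be expressed as an anti-join against the grouped IDs. This variant is not one of the methods timed above, so no timing is claimed for it; the aliases t and keep_ids are illustrative names only.

```sql
-- Rows whose ID is not the smallest ID of their Barcode group.
-- The derived table keep_ids holds one ID per Barcode (the row to keep).
SELECT t.*
FROM datatable_copy AS t
LEFT JOIN (SELECT MIN(ID) AS MIN_ID
           FROM datatable_copy
           GROUP BY Barcode) AS keep_ids
  ON t.ID = keep_ids.MIN_ID
WHERE keep_ids.MIN_ID IS NULL;  -- no match means the row is redundant
```

Whether this runs faster or slower than the NOT IN form depends on the data and on the optimizer, so it is worth checking with EXPLAIN on the actual table.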
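2.3 Delete the redundant records, keeping only one
This corresponds to query method 1: replacing SELECT * with DELETE removes the redundant records. It took a little over 4 seconds. The derived table DTC is required here, because MySQL does not allow a DELETE to select directly from its own target table in a subquery.
DELETE FROM datatable_copy
WHERE ID NOT IN
(SELECT DTC.MIN_ID FROM
(SELECT MIN(ID) AS MIN_ID FROM datatable_copy GROUP BY Barcode) DTC);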
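Because a DELETE of this size is hard to undo, one cautious approach is to run it inside a transaction and inspect the result before committing. The sketch below assumes the table uses a transactional storage engine such as InnoDB, which the original text does not state; on MyISAM the ROLLBACK would have no effect.

```sql
-- Assumption: datatable_copy is an InnoDB table, so the DELETE can be rolled back.
START TRANSACTION;

DELETE FROM datatable_copy
WHERE ID NOT IN
  (SELECT DTC.MIN_ID FROM
    (SELECT MIN(ID) AS MIN_ID FROM datatable_copy GROUP BY Barcode) DTC);

-- Number of rows removed by the DELETE above.
SELECT ROW_COUNT();

-- Should return no rows if every Barcode is now unique.
SELECT Barcode FROM datatable_copy GROUP BY Barcode HAVING COUNT(*) > 1;

-- Use COMMIT to keep the change; ROLLBACK here undoes the trial run.
ROLLBACK;
```

Calling ROW_COUNT() immediately after the DELETE is what makes the count meaningful, since it reports on the previous statement only.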
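3. Handling records duplicated on multiple fields
3.1 View the duplicated records
This took 14 seconds.
SELECT * FROM datatable_copy
WHERE (Barcode, DeviceID, DatiumName) IN
(SELECT Barcode, DeviceID, DatiumName FROM datatable_copy GROUP BY Barcode, DeviceID, DatiumName HAVING COUNT(1) > 1);
After creating an index on these three fields, it took a little over 2 seconds.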
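The index on the three fields is not shown in the original; a composite index along the following lines is one plausible way to create it. The index name idx_barcode_device_datium is hypothetical, and depending on the column types and lengths MySQL may require prefix lengths on some columns.

```sql
-- Composite index covering the three columns used for duplicate detection.
-- The index name is a hypothetical choice, not taken from the original.
CREATE INDEX idx_barcode_device_datium
  ON datatable_copy (Barcode, DeviceID, DatiumName);

-- Check whether the optimizer actually uses the new index for the grouping.
EXPLAIN
SELECT Barcode, DeviceID, DatiumName
FROM datatable_copy
GROUP BY Barcode, DeviceID, DatiumName
HAVING COUNT(1) > 1;
```

The column order matches the GROUP BY list, which lets the grouping read the index in order instead of sorting the rows.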
3.2 View the redundant records
This took 19 seconds.
SELECT * FROM datatable_copy
WHERE ID NOT IN
(SELECT DTC.MIN_ID FROM
(SELECT MIN(ID) AS MIN_ID FROM datatable_copy GROUP BY Barcode, DeviceID, DatiumName) DTC);
After creating an index on these three fields, it took a little over 6 seconds.
3.3 Delete the redundant records
With the new index in place, this took a little over 6 seconds.
DELETE FROM datatable_copy
WHERE ID NOT IN
(SELECT DTC.MIN_ID FROM
(SELECT MIN(ID) AS MIN_ID FROM datatable_copy GROUP BY Barcode, DeviceID, DatiumName) DTC);
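A backup taken before the DELETE and a check afterwards make the operation easier to trust. The sketch below is one way to do both; the table name datatable_copy_backup is hypothetical, and the backup statements are meant to be run before the DELETE above.

```sql
-- Run BEFORE the DELETE: copy the table structure and data.
-- datatable_copy_backup is a hypothetical name for the backup table.
CREATE TABLE datatable_copy_backup LIKE datatable_copy;
INSERT INTO datatable_copy_backup SELECT * FROM datatable_copy;

-- Run AFTER the DELETE: should return no rows once the
-- redundant records are gone.
SELECT Barcode, DeviceID, DatiumName, COUNT(*) AS cnt
FROM datatable_copy
GROUP BY Barcode, DeviceID, DatiumName
HAVING COUNT(*) > 1;

-- Difference in row counts equals the number of rows deleted.
SELECT
  (SELECT COUNT(*) FROM datatable_copy_backup) -
  (SELECT COUNT(*) FROM datatable_copy) AS deleted_rows;
```

CREATE TABLE ... LIKE copies the column definitions and indexes, so the backup can be queried the same way as the original.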
4. References
[mysql: delete duplicate records and keep only one] https://blog.csdn.net/n950814abc/article/details/82284838