有天这个需求需要实现
于是我去百度
得到如下结果:
SQL:删除重复数据,只保留一条用SQL语句,删除掉重复项只保留一条在几千条记录里,存在着些相同的记录,如何能用SQL语句,删除掉重复的呢
1、查找表中多余的重复记录,重复记录是根据单个字段(peopleId)来判断 select * from people where peopleId in (select peopleId from people group by peopleId having count(peopleId) > 1)
2、删除表中多余的重复记录,重复记录是根据单个字段(peopleId)来判断,只留有rowid最小的记录 delete from people where peopleName in (select peopleName from people group by peopleName having count(peopleName) > 1) and peopleId not in (select min(peopleId) from people group by peopleName having count(peopleName)>1)
3、查找表中多余的重复记录(多个字段) select * from vitae a where (a.peopleId,a.seq) in (select peopleId,seq from vitae group by peopleId,seq having count(*) > 1)
4、删除表中多余的重复记录(多个字段),只留有rowid最小的记录 delete from vitae a where (a.peopleId,a.seq) in (select peopleId,seq from vitae group by peopleId,seq having count(*) > 1) and rowid not in (select min(rowid) from vitae group by peopleId,seq having count(*)>1)
5、查找表中多余的重复记录(多个字段),不包含rowid最小的记录 select * from vitae a where (a.peopleId,a.seq) in (select peopleId,seq from vitae group by peopleId,seq having count(*) > 1) and rowid not in (select min(rowid) from vitae group by peopleId,seq having count(*)>1)
6.消除一个字段的左边的第一位:
update tableName set [Title]=Right([Title],(len([Title])-1)) where Title like '村%'
7.消除一个字段的右边的第一位:
update tableName set [Title]=left([Title],(len([Title])-1)) where Title like '%村'
8.假删除表中多余的重复记录(多个字段),不包含rowid最小的记录 update vitae set ispass=-1 where peopleId in (select peopleId from vitae group by peopleId,seq having count(*) > 1) and seq in (select seq from vitae group by peopleId,seq having count(*) > 1) and rowid not in (select min(rowid) from vitae group by peopleId,seq having count(*)>1)
便选取一条
删除表中多余的重复记录(多个字段),只留有rowid最小的记录 delete from vitae a where (a.peopleId,a.seq) in (select peopleId,seq from vitae group by peopleId,seq having count(*) > 1) and rowid not in (select min(rowid) from vitae group by peopleId,seq having count(*)>1)
由于需求实现需要五个字段便改如下:
DELETE
FROM
TEMP_registration_information
WHERE
(
level1options,
level2options,
level3options,
level4options,
NAME
) IN (
SELECT
level1options,
level2options,
level3options,
level4options,
NAME
FROM
(
SELECT
level1options,
level2options,
level3options,
level4options,
NAME
FROM
TEMP_registration_information
GROUP BY
level1options,
level2options,
level3options,
level4options,
NAME
HAVING
count(*) > 1
) AS tmp
)
AND id NOT IN (
SELECT
id
FROM
(
SELECT
min(id)
FROM
TEMP_registration_information
GROUP BY
level1options,
level2options,
level3options,
level4options,
NAME
HAVING
count(*) > 1
) AS temp
)
做了临时表
SELECT
id
FROM
(
SELECT
min(id)
FROM
TEMP_registration_information
GROUP BY
level1options,
level2options,
level3options,
level4options,
NAME
HAVING
count(*) > 1
) AS temp
因为运行
delete from vitae a where (a.peopleId,a.seq) in (select peopleId,seq from vitae group by peopleId,seq having count(*) > 1) and rowid not in (select min(rowid) from vitae group by peopleId,seq having count(*)>1)
这种会报错
因为在mysql中,不能在一条Sql语句中,即查询这些数据,同时修改这些数据
所以增加临时表,
但运行后,耗时太长,5,6分钟毫无停止迹象,继续运行。
所以思考新方法,尝试对使用的5个字段增加索引,但耗时依旧5,6分钟毫无停止迹象,继续运行。
所以继续思考
最后想到
DELETE
FROM
TEMP_registration_information
WHERE id IN (
SELECT
id
FROM
(
SELECT
min(id) id
FROM
TEMP_registration_information
GROUP BY
level1options,
level2options,
level3options,
level4options,
NAME
HAVING
count(*) > 1
) AS temp
)
这个语句在8万条数据下,运行时长为约20s,
重复运行直到删除条数为0
就可以!
当然这种方式有些笨,本人不是专业写sql,
所以也欢迎有更好方法可以交流,留言。