一、使用 ROW_NUMBER() OVER(PARTITION BY … ORDER BY …) 为每个分组生成一个内部序号
1.1 根据需求查询重复数据. 并为每个分组生成一个内部行号
SELECT ROW_NUMBER() OVER (PARTITION BY INDICATOR_CODE, NAME_ZH_CN ORDER BY T.INIT_TIME) SU,
T.*
FROM TABLE_NAME T
ORDER BY SU DESC;
1.2 筛选出行号大于1的. 即重复的多余的数据
SELECT *
FROM (SELECT ROW_NUMBER() OVER (PARTITION BY INDICATOR_CODE, NAME_ZH_CN ORDER BY T.INIT_TIME) SU,
T.*
FROM TABLE_NAME T)
WHERE SU > 1;
1.3 删除重复数据. 只保留1条
DELETE
FROM TABLE_NAME
WHERE SEQ IN (SELECT SEQ
FROM (SELECT ROW_NUMBER() OVER (PARTITION BY INDICATOR_CODE, NAME_ZH_CN ORDER BY T.INIT_TIME) SU,
T.*
FROM TABLE_NAME T)
WHERE SU > 1);
注:如果数据本身并没有每行的唯一标识ID. 删除重复数据还是非常困难的。
二、使用 ROWID
2.1 查找重复的记录
SELECT *
FROM TABLE_NAME T
WHERE (INDICATOR_CODE, NAME_ZH_CN) IN
(SELECT INDICATOR_CODE, NAME_ZH_CN
FROM TABLE_NAME
GROUP BY INDICATOR_CODE, NAME_ZH_CN
HAVING COUNT(*) > 1)
ORDER BY INIT_TIME DESC;
2.2 查找表中多余的重复记录. 不包含ROWID最小的记录
SELECT *
FROM TABLE_NAME T
WHERE (INDICATOR_CODE, NAME_ZH_CN) IN
(SELECT INDICATOR_CODE, NAME_ZH_CN
FROM TABLE_NAME
GROUP BY INDICATOR_CODE, NAME_ZH_CN
HAVING COUNT(*) > 1)
AND ROWID NOT IN (SELECT MIN(ROWID)
FROM TABLE_NAME
GROUP BY INDICATOR_CODE, NAME_ZH_CN
HAVING COUNT(*) > 1);
2.3 删除表中多余的重复记录. 只保留ROWID最小的记录
DELETE
FROM TABLE_NAME T
WHERE (INDICATOR_CODE, NAME_ZH_CN) IN
(SELECT INDICATOR_CODE, NAME_ZH_CN
FROM TABLE_NAME
GROUP BY INDICATOR_CODE, NAME_ZH_CN
HAVING COUNT(*) > 1)
AND ROWID NOT IN (SELECT MIN(ROWID)
FROM TABLE_NAME
GROUP BY INDICATOR_CODE, NAME_ZH_CN
HAVING COUNT(*) > 1);