从数据量很大的表中删除重复项的最佳的方式是什么?
MyTable
-----------
RowID int not null identity(1,1) primary key,
Col1 varchar(20) not null,
Col2 varchar(2048) not null,
Col3 tinyint not null
回复:
确定所选的列中不能为空,group by 那些唯一的列并且查找最小的项作为字表的结果集。然后,删除那些没有rowid的项
DELETE MyTable
FROM MyTable
LEFT OUTER JOIN (
SELECT MIN(RowId) as RowId, Col1, Col2, Col3
FROM MyTable
GROUP BY Col1, Col2, Col3
) as KeepRows ON
MyTable.RowId = KeepRows.RowId
WHERE
KeepRows.RowId IS NULL
备注:通过group by col1, col2, col3 来判断表中的项是否重复。
原文地址链接:http://stackoverflow.com/questions/18932/how-can-i-remove-duplicate-rows