去除数据库重复记录方法-CSDN博客

问题：

去除数据库表重复行中是非常常见的需求，一下是我根据一些资料总结的几种方法。

解决：

目标：表中 empname 与 orderdate 相同的记录只保留一行。

数据初始化：

Select empname,orderdate,identity(int,1,1) as keycol
Into #duptb
From (
 Select '张三' as empname,'2006-07-04'  as orderdate Union All
 Select '张三','2006-07-08' Union All
 Select '张三','2006-07-08' Union All
 Select '李四','2006-07-08' Union All
 Select '李四','2006-07-09' Union All
 Select '李四','2006-07-10' Union All
 Select '王五','2006-07-11' Union All 
 Select '王五','2006-07-11' Union All
 Select '狗二','2006-07-15' Union All
 Select '狗二','2006-07-16' 
 )as tt
Go

一、如果结果集中需要empname与orderdate两列时,直接用Distinct就可以了。
Select distinct empname,orderdate
From(
	Select empname,orderdate
	From #duptb)as tt
--(8 行受影响)
二、或者使用Group by
Select empname,orderdate
From #duptb
Group by empname,orderdate
--(8 行受影响)
三、如果要求结果集包含除分组列的其他属性（使用窗口函数 ）
----------表中 empname与orderdate重复的记录，只保留一条
--不使用keycol MS SqlServer2008
with tb as
	(Select empname,orderdate,keycol,
	 ROW_NUMBER() Over(Partition by empname,orderdate Order by empname,orderdate) as rn
 	 From #duptb
	)
Select *
From tb
Where rn <2	--(8 行受影响)
--(8 行受影响)
四、如果要求结果集包含除分组列的其他属性（使用子查询）
--method2 --使用keycol
Select t1.empname,t1.orderdate
From #duptb as t1
Where Not Exists(
	Select *
	From #duptb as t2
	Where t1.empname = t2.empname
	    And t1.orderdate = t2.orderdate
	    And t1.keycol < t2.keycol
)
--(8 行受影响)
五、如果要求结果集包含除分组列的其他属性（使用子查询）
--表中 empname与orderdate重复的记录，只保留一条,使用keycol
Select t1.empname,t1.orderdate,t1.keycol
From #duptb as t1
Where keycol In(Select MIN(keycol)
	       From #duptb as t2
	       Group by empname,orderdate
	       )
--(8 行受影响)

总结：

以上四种方法在数据量不同、索引不同等其他因素存在时有很大性能差异。
比如说，数据量少时，方法四一般会比方式三更具性能优势，尽管其看似要进行两次表扫描或查找。
方式五是所有方式中性能最差的。
如果结果集仅需要分组列，建议选择方式一、方式二、方式三。
如果结果集需要除分组列的其他列，可以考虑方式三与方式四。