MERGE INTO primary key violation: how to avoid inserting duplicate records with the T-SQL MERGE statement

I am attempting to insert many records using T-SQL's MERGE statement, but my query fails to INSERT when there are duplicate records in the source table. The failure is caused by:

The target table has a Primary Key based on two columns

The source table may contain duplicate records that violate the target table's Primary Key constraint ("Violation of PRIMARY KEY constraint" is thrown)

I'm looking for a way to change my MERGE statement so that it either ignores duplicate records within the source table or catches the exceptions raised by the INSERT (for example with TRY/CATCH), so that all other rows are still inserted regardless of the few bad ones. Or maybe there is a better way to approach this problem?

Here's a query example of what I'm trying to explain. The example below adds 100k records to a temp table and then attempts to insert those records into the target table.

EDIT

In my original post I only included two fields in the example tables, which led people on SO to suggest a DISTINCT solution to avoid duplicates in the MERGE statement. I should have mentioned that in my real-world problem the tables have 15 fields, and two of those 15 form the CLUSTERED PRIMARY KEY. So the DISTINCT keyword doesn't work, because I need to SELECT all 15 fields and ignore duplicates based on only two of them.

I have updated the query below to include one more field, col4. I need to include col4 in the MERGE, but only the combination of col2 and col3 has to be unique.

-- Create the source table
CREATE TABLE #tmp (
    col2 datetime NOT NULL,
    col3 int NOT NULL,
    col4 int
)
GO

-- Add a bunch of test data to the source table
-- For testing purposes, allow duplicate records to be added to this table
DECLARE @loopCount int = 100000
DECLARE @loopCounter int = 0
DECLARE @randDateOffset int
DECLARE @col2 datetime
DECLARE @col3 int
DECLARE @col4 int

WHILE (@loopCounter) < @loopCount
BEGIN
    SET @randDateOffset = RAND() * 100000
    SET @col2 = DATEADD(MI,@randDateOffset,GETDATE())
    SET @col3 = RAND() * 1000
    SET @col4 = RAND() * 10

    INSERT INTO #tmp
        (col2,col3,col4)
    VALUES
        (@col2,@col3,@col4);

    SET @loopCounter = @loopCounter + 1
END

-- Insert the source data into the target table
-- How do we make sure we don't attempt to INSERT a duplicate record? Or how can we
-- catch exceptions? Or?
MERGE INTO dbo.tbl1 AS tbl
USING (SELECT * FROM #tmp) AS src
    ON (tbl.col2 = src.col2 AND tbl.col3 = src.col3)
WHEN NOT MATCHED THEN
    INSERT (col2,col3,col4)
    VALUES (src.col2,src.col3,src.col4);
GO
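
To see which key pairs would trip the constraint, a quick check like the following may help before running the MERGE. It is only a sketch against the #tmp table above; the dup_count alias is an illustrative name, not part of the original query.

```sql
-- List every (col2, col3) pair that occurs more than once in the source table.
-- dup_count is a hypothetical alias used only for this check.
SELECT col2, col3, COUNT(*) AS dup_count
FROM #tmp
GROUP BY col2, col3
HAVING COUNT(*) > 1
ORDER BY dup_count DESC;
```

If this returns no rows, the source has no internal duplicates on the key columns.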

Solution

Updated for your new specification. This version uses GROUP BY on col2 and col3 so that each key pair appears only once in the source, and it inserts only the highest col4 value for each pair.

MERGE INTO dbo.tbl1 AS tbl
USING (SELECT col2, col3, MAX(col4) AS col4 FROM #tmp GROUP BY col2, col3) AS src
    ON (tbl.col2 = src.col2 AND tbl.col3 = src.col3)
WHEN NOT MATCHED THEN
    INSERT (col2,col3,col4)
    VALUES (src.col2,src.col3,src.col4);
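
With 15 columns, applying MAX to each non-key column separately could combine values from different source rows into one inserted row. A possible alternative, sketched here under the assumption that keeping the row with the highest col4 is acceptable, is to number the rows within each (col2, col3) group with ROW_NUMBER() and keep only the first. The ranked and rn names are illustrative.

```sql
-- Sketch: keep exactly one whole source row per (col2, col3) key.
-- "ranked" and "rn" are hypothetical names chosen for this example.
MERGE INTO dbo.tbl1 AS tbl
USING (
    SELECT col2, col3, col4
    FROM (
        SELECT col2, col3, col4,
               -- Number the rows inside each key group, highest col4 first.
               ROW_NUMBER() OVER (PARTITION BY col2, col3
                                  ORDER BY col4 DESC) AS rn
        FROM #tmp
    ) AS ranked
    WHERE rn = 1   -- only the first row of each group reaches the MERGE
) AS src
    ON (tbl.col2 = src.col2 AND tbl.col3 = src.col3)
WHEN NOT MATCHED THEN
    INSERT (col2, col3, col4)
    VALUES (src.col2, src.col3, src.col4);
```

For the real table, the remaining columns would be added to both SELECT lists and to the INSERT, and the ORDER BY can be extended with a tie-breaker if the choice of surviving row matters. Wrapping the MERGE in TRY...CATCH would not achieve the "skip the bad rows" behaviour, because a constraint violation fails the whole statement rather than individual rows.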
