如何在SQL Server中比较表

介绍 (Introduction)

If you’ve been developing in SQL Server for any length of time, you’ve no doubt hit this scenario: You have an existing, working query that produces results your customers or business owners say are correct. Now, you’re asked to change something, or perhaps you find out your existing code will have to work with new source data or maybe there’s a performance problem and you need to tune the query. Whatever the case, you want to be sure that whatever changes have been made (whether in your code or somewhere else), the changes in the output are as expected. In other words, you need to be sure that anything that was supposed to change, did, and that anything else remains the same. So, how can you easily do that in SQL Server?

如果您已经在SQL Server中进行了一段时间的开发,那么毫无疑问,您会遇到这种情况:您有一个现有的有效查询,可以产生客户或企业主认为正确的结果。 现在,系统要求您更改某些内容,或者您​​发现现有代码必须使用新的源数据,或者可能存在性能问题,您需要调整查询。 无论是哪种情况,您都希望确保已进行了任何更改(无论是在代码中还是在其他位置), 输出中的更改都是预期的。 换句话说,您需要确保应该进行更改,所做的任何事情以及其他任何事情都保持不变。 因此,如何在SQL Server中轻松地做到这一点?

In short, I’m going to look at an efficient way to just identify differences and produce some helpful statistics along with them. Along the way, I hope you learn a few useful techniques.

简而言之,我将寻找一种有效的方法来识别差异并与之一起产生一些有用的统计数据。 在此过程中,希望您能学到一些有用的技术。

设置测试环境 (Setting up a test environment)

We’ll need two tables to test with, so here is some simple code that will do the trick:

我们将需要两个表进行测试,因此这里有一些简单的代码可以解决这个问题:

 
USE [SQLShack]
GO
/****** Object:  Table [dbo].[Original]    Script Date: 9/14/2017 7:57:37 PM ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[Original](
	[CustId] [int] IDENTITY(1,1) NOT NULL,
	[CustName] [varchar](255) NOT NULL,
	[CustAddress] [varchar](255) NOT NULL,
	[CustPhone] [numeric](12, 0) NULL,
PRIMARY KEY CLUSTERED 
(
	[CustId] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, 
ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
/****** Object:  Table [dbo].[Revised]    Script Date: 9/14/2017 7:57:37 PM ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[Revised](
	[CustId] [int] IDENTITY(1,1) NOT NULL,
	[CustName] [varchar](255) NOT NULL,
	[CustAddress] [varchar](255) NOT NULL,
	[CustPhone] [numeric](12, 0) NULL,
PRIMARY KEY CLUSTERED 
(
	[CustId] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, 
ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
 
-- Populate Original Table
SET IDENTITY_INSERT [dbo].[Original] ON 
GO
INSERT [dbo].[Original] ([CustId], [CustName], [CustAddress], [CustPhone]) VALUES (1, 
N'Salvador', N'1 Main Street North', CAST(76197081653 AS Numeric(12, 0)))
INSERT [dbo].[Original] ([CustId], [CustName], [CustAddress], [CustPhone]) VALUES (2, 
N'Edward', N'142 Main Street West', CAST(80414444338 AS Numeric(12, 0)))
INSERT [dbo].[Original] ([CustId], [CustName], [CustAddress], [CustPhone]) VALUES (3, 
N'Gilbert', N'51 Main Street East', CAST(23416310745 AS Numeric(12, 0)))
INSERT [dbo].[Original] ([CustId], [CustName], [CustAddress], [CustPhone]) VALUES (4, 
N'Nicholas', N'7 Walnut Street', CAST(62051432934 AS Numeric(12, 0)))
INSERT [dbo].[Original] ([CustId], [CustName], [CustAddress], [CustPhone]) VALUES (5, 
N'Jorge', N'176 Washington Street', CAST(58796383002 AS Numeric(12, 0)))
INSERT [dbo].[Original] ([CustId], [CustName], [CustAddress], [CustPhone]) VALUES (6, 
N'Ernest', N'39 Main Street', CAST(461992109 AS Numeric(12, 0)))
INSERT [dbo].[Original] ([CustId], [CustName], [CustAddress], [CustPhone]) VALUES (7, 
N'Stella', N'191 Church Street', CAST(78584836879 AS Numeric(12, 0)))
INSERT [dbo].[Original] ([CustId], [CustName], [CustAddress], [CustPhone]) VALUES (8, 
N'Jerome', N'177 Elm Street', CAST(30235760533 AS Numeric(12, 0)))
INSERT [dbo].[Original] ([CustId], [CustName], [CustAddress], [CustPhone]) VALUES (9, 
N'Ray', N'214 High Street', CAST(57288772686 AS Numeric(12, 0)))
INSERT [dbo].[Original] ([CustId], [CustName], [CustAddress], [CustPhone]) VALUES (10, 
N'Lawrence', N'53 Main Street South', CAST(92544965861 AS Numeric(12, 0)))
GO
SET IDENTITY_INSERT [dbo].[Original] OFF
GO
 
-- Populate Revised Table
SET IDENTITY_INSERT [dbo].[Revised] ON 
GO
INSERT [dbo].[Revised] ([CustId], [CustName], [CustAddress], [CustPhone]) VALUES (1, 
N'Jerome', N'1 Main Street North', CAST(36096777923 AS Numeric(12, 0)))
INSERT [dbo].[Revised] ([CustId], [CustName], [CustAddress], [CustPhone]) VALUES (2, 
N'Lawrence', N'53 Main Street South', CAST(73368786216 AS Numeric(12, 0)))
INSERT [dbo].[Revised] ([CustId], [CustName], [CustAddress], [CustPhone]) VALUES (3, 
N'Ray', N'214 High Street', CAST(64765571087 AS Numeric(12, 0)))
INSERT [dbo].[Revised] ([CustId], [CustName], [CustAddress], [CustPhone]) VALUES (4, 
N'Gilbert', N'177 Elm Street', CAST(4979477778 AS Numeric(12, 0)))
INSERT [dbo].[Revised] ([CustId], [CustName], [CustAddress], [CustPhone]) VALUES (5, 
N'Jorge', N'7 Walnut Street', CAST(88842643373 AS Numeric(12, 0)))
INSERT [dbo].[Revised] ([CustId], [CustName], [CustAddress], [CustPhone]) VALUES (6, 
N'Ernest', N'176 Washington Street', CAST(17153094018 AS Numeric(12, 0)))
INSERT [dbo].[Revised] ([CustId], [CustName], [CustAddress], [CustPhone]) VALUES (7, 
N'Edward', N'142 Main Street West', CAST(66115434358 AS Numeric(12, 0)))
INSERT [dbo].[Revised] ([CustId], [CustName], [CustAddress], [CustPhone]) VALUES (8, 
N'Stella', N'51 Main Street East', CAST(94093532159 AS Numeric(12, 0)))
INSERT [dbo].[Revised] ([CustId], [CustName], [CustAddress], [CustPhone]) VALUES (9, 
N'Nicholas', N'191 Church Street', CAST(54482064421 AS Numeric(12, 0)))
INSERT [dbo].[Revised] ([CustId], [CustName], [CustAddress], [CustPhone]) VALUES (10, 
N'Salvador', N'39 Main Street', CAST(94689656558 AS Numeric(12, 0)))
GO
SET IDENTITY_INSERT [dbo].[Revised] OFF
GO
 

This code creates the tables Original and Revised that hold customer data. At the moment they are completely different, which you can see since they are small. But what if these tables had thousands or millions of rows? Eyeballing them wouldn’t be possible. You’d need a different approach. Enter set-based operations!

此代码创建包含客户数据的原始表和修订表。 目前它们完全不同,由于它们很小,您可以看到。 但是,如果这些表具有数千行或数百万行呢? 盯着他们是不可能的。 您需要使用其他方法。 输入基于集合的操作!

基于集合的操作 (Set-based operations)

If you remember your computer science classes, you’ll no doubt recall studying sets as mathematical objects. Relational databases combine set theory with relational calculus. Put them together and you get relational algebra, the foundation of all RDBMS’s (Thanks and hats-off to E.F. Codd). That means that we can use set theory. Remember these set operations?

如果您还记得计算机科学课程,那么您无疑会记得将学习集视为数学对象。 关系数据库将集合论与关系演算相结合。 将它们放在一起,您将获得关系代数,这是所有RDBMS的基础(感谢EF Codd表示感谢)。 这意味着我们可以使用集合论。 还记得这些设置操作吗?

A ∪ B   Set union: Combine two sets into one

A∪B集并集:将两个集合并为一个

A ∩ B   Set intersection: The members that A and B have in common

A∩B设置交集:A和B共有的成员

A − B   Set difference: The members of A that are not in B

A − B集合差异:不在B中的A的成员

These have direct counterparts in SQL:

这些在SQL中具有直接对应项:

A ∪ B : UNION or UNION ALL (UNION eliminates duplicates, UNION ALL keeps them)

A∪B:UNION或UNION ALL(UNION消除重复项,UNION ALL保留重复项)

A ∩ B : INTERSECT

A∩B:相交

A − B : EXCEPT

A − B:除

We can use these to find out some things about our tables:

我们可以使用这些来查找有关表的一些信息:

 
SELECT CustId, CustName, CustAddress, CustPhone FROM Original
INTERSECT
SELECT CustId, CustName, CustAddress, CustPhone FROM Revised
 

Will show us what rows these two tables have in common (none, at the moment)

将向我们展示这两个表共有哪些行(目前没有)

 
SELECT CustId, CustName, CustAddress, CustPhone FROM Original
EXCEPT
SELECT CustId, CustName, CustAddress, CustPhone FROM Revised
 

Will show us all the rows of the Original table that are not in the Revised table (at the moment, that’s all of them).

将向我们显示原始表中不在修订表中的所有行(目前,所有行)。

Using these two queries, we can see if the tables are identical or what their differences may be. If the number of rows in the first query (INERSECT) is the same as the number of rows in the Original and Revised tables, they are identical, at least for tables having keys (since there can be no duplicates). Similarly, if the results from the second query (EXCEPT) are empty and the results from a similar query reversing the order of the selects is empty, they are equal. Saying it another way, if both sets have the same number of members and all members of one set are the same as all the members of the other set, they are equal.

使用这两个查询,我们可以查看这些表是否相同或它们之间的区别是什么。 如果第一个查询(INERSECT)中的行数与原始表和修订表中的行数相同,则至少在具有键的表中它们是相同的(因为不能有重复项)。 类似地,如果第二个查询(EXCEPT)的结果为空, 并且反转选择顺序的类似查询的结果为空,则它们相等。 换句话说,如果两个集合具有相同数目的成员,并且一个集合的所有成员与另一集合的所有成员相同,则它们是相等的。

非键表的挑战 (Challenges with non-keyed tables)

The tables we are working with are keyed so we know that each row must be unique in each table, since duplicate keys are not allowed. What about non-keyed tables? Here’s a simple example:

我们正在使用的表是带键的,因此我们知道每个表中的每一行都必须是唯一的,因为不允许重复的键。 非键表呢? 这是一个简单的例子:

 
DECLARE @t1 TABLE (a INT, b INT);
 
INSERT INTO @t1 VALUES
    (1, 2),
    (1, 2),
    (2, 3),
    (3, 4);
 
DECLARE @t2 TABLE (a INT, b INT);
 
INSERT INTO @t2 VALUES
    (1, 2),
    (2, 3),
    (2, 3),
    (3, 4);
SELECT * FROM @t1
EXCEPT
SELECT * FROM @t2;
 

The last query, using EXCEPT, returns an empty result. But the tables are different! The reason is that EXCEPT, INTERSECT and UNION eliminate duplicate rows. Now this query:

使用EXCEPT的最后一个查询返回空结果。 但是桌子不一样! 原因是EXCEPT,INTERSECT和UNION消除了重复的行。 现在这个查询:

 
SELECT * FROM @t1
INTERSECT
SELECT * FROM @t2;
 

Returns 3 rows:

返回3行:

These are the three rows that the two tables have in common. However, since each table has 4 rows, you know they are not identical. Checking non-keyed tables for equality is a challenge I’ll leave for a future article.

这是两个表共有的三行。 但是,由于每个表都有4行,所以您知道它们并不相同。 检查非键表是否相等是一个挑战,我将在以后的文章中讨论。

给我们的桌子一些共同点 (Giving our tables something in common)

Let’s go back to the first example using keyed tables. To make our comparisons interesting, let’s give our tables something in common:

让我们回到使用键表的第一个示例。 为了使我们的比较有趣,让我们给表一些共同点:

 
SET IDENTITY_INSERT Revised ON;
INSERT INTO Revised(CustId, CustName, CustAddress, CustPhone)
SELECT TOP 50 PERCENT CustId, CustName, CustAddress, CustPhone
FROM Original
ORDER BY NEWID();
SET IDENTITY_INSERT Revised OFF;
 

Here, we take about half the rows of the Original table and insert them into the Revised table. Using ORDER BY NEWID() makes the selection pseudo-random.

在这里,我们约占原始表的一半行,并将它们插入到修订表中。 使用ORDER BY NEWID()会使选择成为伪随机。

 
SET IDENTITY_INSERT Original ON;
INSERT INTO Original(CustId, CustName, CustAddress, CustPhone)
SELECT TOP 50 PERCENT CustId, CustName, CustAddress, CustPhone
FROM Revised
WHERE CustID NOT IN (SELECT CustId FROM Original)
ORDER BY NEWID();
SET IDENTITY_INSERT Original OFF;
 

This query takes some of the rows from the Revised table and inserts them into the Original table using a similar technique, while avoiding duplicates.

该查询从Revised表中获取一些行,并使用类似的技术将它们插入到Original表中,同时避免重复。

Now, the EXCEPT query is more interesting. Whichever table I put first, I should get 5 rows output. For example:

现在,EXCEPT查询更加有趣。 无论我放哪个表,我都应该得到5行输出。 例如:

 
SELECT CustId, CustName, CustAddress, CustPhone FROM Revised
EXCEPT
SELECT CustId, CustName, CustAddress, CustPhone FROM Original
 

Returns:

返回值:

Now the two tables also have 10 rows in common:

现在,两个表也共有10行:

 
SELECT CustId, CustName, CustAddress, CustPhone FROM Original
INTERSECT
SELECT CustId, CustName, CustAddress, CustPhone FROM Revised
 

Returns:

返回值:

Depending on the change being implemented, these results may be either good or bad. But at least now you have something to show for your efforts!

根据所实施的更改,这些结果可能是好是坏。 但是至少现在您可以为您的努力展示些东西!

逐行更改 (Row-to-row changes)

So far, we’ve only considered changes in whole rows. What if only certain columns are changing? For example, what if in the Revised table, for some customer id, the name or phone number changed? It would be great to be able to report the rows that changed and also to provide a summary of the number of changes by column and also some way to see what changed between two rows, not just visually, but programmatically. These sample tables are small and narrow. Imagine a table with 40 columns, not 4 and 1 million rows, not 10. Computing such a summary would be very tedious. I’m thinking about something like this:

到目前为止,我们仅考虑了整行的更改。 如果仅某些列正在更改怎么办? 例如,如果在修订表中某些客户ID的名称或电话号码更改了怎么办? 能够报告已更改的行,并提供按列的更改数量的摘要,以及以某种方式查看两行之间的更改的方式,这不仅是视觉上的,而且是编程上的,这将是很棒的。 这些样本表又小又窄。 想象一个表有40列,而不是4列和1百万行,而不是10。计算这样的摘要将非常繁琐。 我正在考虑这样的事情:

This shows me that there are 8 rows with the same customer id but different contents and that four of them have different phone numbers, two have different names and two have different addresses.

这说明有八行具有相同的客户ID但内容不同,并且其中四行具有不同的电话号码,两行具有不同的名称,两行具有不同的地址。

I’ll also want to produce a table of these differences that can be joined back to the Original and Revised tables.

我还将要生成一个包含这些差异的表,这些表可以重新加入到原始表和修订表中。

使用sys.columns检查可比性 (Checking comparability with sys.columns)

Just because two tables look the same at first glance, that doesn’t mean that they are the same. To do the kind of comparison I’m talking about here, I need to be sure that really are the same. To that end I can use the system catalog view sys.columns. This view returns a row for each column of an object that has columns, such as views or tables. Each row contains the properties of a single column, e.g. the column name, datatype, and column_id. See the references section for a link to the official documentation where you can find all the details.

乍看之下两个表是相同的,并不表示它们相同的。 为了进行这种比较,我需要确保它们确实相同。 为此,我可以使用系统目录视图sys.columns。 此视图为具有列的对象(例如视图或表)的每一列返回一行。 每行都包含单个列的属性,例如列名,数据类型和column_id。 请参阅参考资料部分,以获得指向官方文档的链接,您可以在其中找到所有详细信息。

Now, there are at least two columns in sys.columns that will likely be different: object_id, which is the object id of the table or view to which the column belongs, and default_object_id, which is the id of a default object for that column, should one exist. There are other id columns that may be different as well. When using these techniques in your own work, check the ones that apply.

现在,sys.columns中至少有两列可能会有所不同:object_id是该列所属的表或视图的对象ID,而default_object_id是该列的默认对象的ID ,应该存在一个。 其他id列也可能不同。 在自己的工作中使用这些技术时,请检查适用的技术。

How can we use sys.columns to see if two tables have the same schema? Consider this query:

我们如何使用sys.columns来查看两个表是否具有相同的架构? 考虑以下查询:

 
SELECT * FROM sys.columns
WHERE object_id = OBJECT_ID(N'Original')
 
EXCEPT
 
SELECT * FROM sys.columns
WHERE object_id = OBJECT_ID(N'Revised');
 

If the two tables really are identical, the above query would return no results. However, we need to think about the columns that may differ because they refer to other objects. You can eliminate the problem this way:

如果两个表确实相同,则上面的查询将不返回任何结果。 但是,我们需要考虑可能会有所不同的列,因为它们引用了其他对象。 您可以通过以下方式消除问题:

 
IF object_id(N'tempdb..#Source', N'U') IS NOT NULL   
    DROP TABLE #Source;
SELECT TOP (0) * INTO #Source FROM sys.columns;
 

This script will create a temporary table from sys.columns for the Original table. Note that the “SELECT INTO” in this snippet just copies the schema of sys.columns to a new, temporary table. Now, we can populate it like this:

该脚本将从sys.columns中为原始表创建一个临时表。 请注意,此代码段中的“ SELECT INTO”仅将sys.columns的架构复制到新的临时表中。 现在,我们可以像这样填充它:

 
INSERT INTO #Source
SELECT * 
FROM sys.columns c
WHERE c.object_id = OBJECT_ID(N'Original')
  --AND c.is_identity = 0
 

I’ve commented out the check for an identity column. You might want to exclude identity columns since they are system generated and are likely to differ between otherwise-identical tables. In that case, you’ll want to match on the business keys, not the identity column. In the working example though, I explicitly set the customer ids so this does not apply, though it very well might in the next comparison. Now, so we don’t compare the columns that we know will be different:

我已注释掉身份列的检查。 您可能要排除标识列,因为它们是系统生成的,并且在其他相同的表之间可能有所不同。 在这种情况下,您需要匹配业务键,而不是标识列。 不过,在工作示例中,我明确设置了客户ID,因此这不适用,尽管在下一次比较中可能会很好。 现在,所以我们不比较我们知道会有所不同的列:

 
  ALTER TABLE #Source DROP COLUMN object_id, default_object_id, -- and possibly others
 

Repeat the above logic for the target table (The Revised table in the working example). Then you can run:

对目标表(工作示例中的修订表)重复上述逻辑。 然后,您可以运行:

 
SELECT * FROM #Source
EXCEPT 
SELECT * FROM #Target;
 

In the working example, this will return no results. (In case you are wondering, SELECT * is fine in this case because of the way the temporary tables are created – the schema and column order will be the same for any given release of SQL Server.) If the query does return results, you’ll have to take those differences into account. For the purpose of this article, however, I expect no results.

在工作示例中,这将不返回任何结果。 (如果您想知道,SELECT *在这种情况下会很好,因为创建临时表的方式–任何给定版本SQL Server的架构和列顺序都是相同的。)如果查询确实返回了结果,则您将不得不考虑这些差异。 但是,出于本文的目的,我预计不会有任何结果。

创造差异 (Creating differences)

Starting with the Original and Revised tables, I’ll create some differences and leave some things the same:

从“原始”和“修订”表开始,我将创建一些差异并保留一些相同的内容:

 
WITH 
 
MixUpCustName(CustId, CustName, CustAddress, CustPhone) AS (
    SELECT TOP 50 PERCENT 
	   CustId, 
	   SUBSTRING(CustName,6, len(CustName)) + LEFT(CustName,5), 
	   CustAddress, 
	   CustPhone
    FROM Original ORDER BY NEWID()
),
 
MixUpCustAddress(CustId, CustName, CustAddress, CustPhone) AS (
    SELECT TOP 50 PERCENT 
	   CustId, 
	   CustName, 
	   SUBSTRING(CustAddress,6, len(CustAddress)) + LEFT(CustAddress,5), 
	   CustPhone
    FROM Original ORDER BY NEWID()
),
 
MixUpCustPhone(CustId, CustName, CustAddress, CustPhone) AS (
    SELECT TOP 50 PERCENT 
	   CustId, 
	   CustName, 
	   CustAddress, 
	   CAST(CustPhone / 100 AS int) + 42
    FROM Original ORDER BY NEWID()
),
 
MixItUp(CustId, CustName, CustAddress, CustPhone) AS 
(
    SELECT CustId, CustName, CustAddress, CustPhone 
    FROM MixUpCustName
    UNION
    SELECT CustId, CustName, CustAddress, CustPhone 
    FROM MixUpCustAddress
    UNION
    SELECT CustId, CustName, CustAddress, CustPhone 
    FROM MixUpCustPhone
)
 
-- Main query to mix up the data
 
UPDATE Revised
SET Revised.CustName    = MixItUp.CustName,
    Revised.CustAddress = MixItUp.CustAddress,
    Revised.CustPhone   = MixItUp.CustPhone
 
FROM Revised
JOIN MixItUp
ON Revised.CustId = MixItUp.CustId
 

This query mixes up the data in the columns of the Revised table in randomly-selected rows by simple transpositions and arithmetic. I use ORDER BY NEWID() again to perform pseudo-random selection.

此查询通过简单的换位和算术将Revised表的列中随机选择的行中的数据混合在一起。 我再次使用ORDER BY NEWID()来执行伪随机选择。

计算基本统计 (Computing basic statistics)

Now that we know the tables are comparable, we can easily compare them to produce some basic difference statistics. For example:

现在我们知道这些表是可比较的,我们可以轻松地比较它们以产生一些基本差异统计信息。 例如:

 
WITH 
InOriginal AS (
    SELECT  * FROM Original EXCEPT SELECT * FROM Revised
),
 
InRevised AS (
    SELECT * FROM Revised EXCEPT SELECT * FROM Original
),
 
InBoth AS (
    SELECT * FROM Revised INTERSECT SELECT * from Original
)
    
SELECT 
    (SELECT COUNT(*) FROM Original) AS Original,
    (SELECT COUNT(*) FROM Revised)  AS Revised,
    (SELECT COUNT(*) FROM InOriginal) AS InOriginal,
    (SELECT COUNT(*) FROM InRevised)  AS InRevised,
    (SELECT COUNT(*) FROM InBoth)   AS InBoth;
 

Returns:

返回值:

for the working example. However, I want to go deeper!

作为工作示例。 但是,我想更深入!

使用SELECT…EXCEPT查找列差异 (Using SELECT … EXCEPT to find column differences)

Since SQL uses three-value logic (True, False and Null), you might have written something like this to compare two columns:

由于SQL使用三值逻辑(True,False和Null),因此您可能已经编写了如下内容来比较两列:

 
WHERE a.col <> b.col OR a.col IS NULL AND b.col IS NOT NULL OR a.col IS NOT NULL 
and b.col IS NULL
 

To check if columns from two tables are different. This works of course, but here is a simpler way!

检查两个表中的列是否不同。 当然可以,但是这是一种更简单的方法!

WHERE NOT EXISTS (SELECT a.col EXCEPT SELECT b.col)

This is much easier to write, is DRYer (DRY = Don’t Repeat Yourself) and takes care of the complicated logic in the original WHERE clause. On top of that, this does not cause a performance problem or make for a suddenly-complicated execution plan. The reason is that the sub query is comparing columns from two rows that are being matched, as in a JOIN for example. You can use this technique anywhere you need a simple comparison and columns (or variables) are nullable.

这是更容易编写的,是DRYer(DRY =不要重复自己),并且可以处理原始WHERE子句中的复杂逻辑。 最重要的是,这不会导致性能问题或执行突然复杂的执行计划。 原因是子查询正在比较匹配的两行中的列,例如在JOIN中。 您可以在需要简单比较的任何地方使用此技术,并且列(或变量)可以为空。

Now, applying this to the working example, consider this query:

现在,将其应用于工作示例,请考虑以下查询:

 
IF  OBJECT_ID(N'tempdb..#diffcols', N'U') IS NOT NULL
    DROP TABLE #diffcols;
 
SELECT 
    src.CustId,
    CONCAT(
    IIF(EXISTS(SELECT src.CustId       EXCEPT SELECT tgt.CustId      ), RTRIM(', CustId      
'), ''),
    IIF(EXISTS(SELECT src.CustName     EXCEPT SELECT tgt.CustName    ), RTRIM(', CustName    
'), ''),
    IIF(EXISTS(SELECT src.CustAddress  EXCEPT SELECT tgt.CustAddress ), RTRIM(', 
CustAddress '), ''),
    IIF(EXISTS(SELECT src.CustPhone    EXCEPT SELECT tgt.CustPhone   ), RTRIM(', 
CustPhone   '), '')) + ', '
AS cols
INTO #diffcols
FROM Original src
JOIN Revised tgt
  ON src.CustId = tgt.CustId
WHERE EXISTS (SELECT src.* EXCEPT SELECT tgt.*)
;
 

At its heart, this query joins the Original and Revised tables on customer id. For each pair of rows joined, the query returns a new column (called ‘cols’) that contains a concatenated list of column names if the columns differ. The query uses the technique just described to make things compact and easy to read. Since the query creates a new temporary table, let’s look at the contents:

从本质上讲,此查询将客户ID的原始表和修订表连接起来。 对于每对连接的行,查询将返回一个新列(称为“ cols”), 如果列不同,则该列将包含一个列名称的串联列表 该查询使用刚刚描述的技术使内容紧凑且易于阅读。 由于查询创建了一个新的临时表,让我们看一下内容:

The eight rows that differ, differ in specific columns. For each row that differs, we have a CSV list of column names. (The leading and trailing commas make it easy to pick out column names, both visually and programmatically). Although there is only one column listed for each customer id, there could be multiple columns listed and in a real-world scenario likely would be. Since the newly-created temp table has the customer id in it, you can easily do a three-way join between the Original and Revised tables to see the context and content of these changes, e.g.

八行不同,特定的列不同。 对于每一行,我们都有一个CSV列名列表。 (逗号前后都可以很容易地从视觉和程序上选择列名)。 尽管每个客户ID仅列出了一个列,但是可能会列出多个列,并且在实际情况中可能是这样。 由于新创建的临时表中包含客户ID,因此您可以轻松地在原始表和修订表之间进行三向联接,以查看这些更改的上下文和内容,例如

 
SELECT *
FROM Original o
JOIN Revised r   ON o.CustId = r.CustId
JOIN #diffcols d ON o.CustId = d.CustId
 

Returns:

返回值:

For a very-wide table, this makes it easy to zero in on the differences, since for each row, you have a neat list of the columns that differ and you can select them accordingly.

对于一个非常宽的表,可以很容易地将差异归零,因为对于每一行,您都有一个整齐的差异列列表,您可以相应地选择它们。

生成明细差异统计 (Generating detail difference statistics)

There’s one other thing I can do with the temp table we just created. I can produce the table of differences I wanted. Here, I’m using those commas I inserted in the CSV column list to make it easy to search using a LIKE operator.

我刚创建的临时表可以做另一件事。 我可以生成我想要的差异表。 在这里,我使用在CSV列列表中插入的逗号来简化使用LIKE运算符的搜索。

 
WITH src AS (
    SELECT 
    	SUM(IIF(d.cols LIKE '%, CustId, %'       , 1, 0)) AS CustId,
    	SUM(IIF(d.cols LIKE '%, CustName, %'     , 1, 0)) AS CustName,
    	SUM(IIF(d.cols LIKE '%, CustAddress, %'  , 1, 0)) AS CustAddress,
    	SUM(IIF(d.cols LIKE '%, CustPhone, %'    , 1, 0)) AS CustPhone
    FROM #diffcols d
)
SELECT ca.col AS ColumnName, ca.diff AS [Difference Count]
FROM src
CROSS APPLY ( VALUES
    ('CustId      ',CustId      ),
    ('CustName    ',CustName    ),
    ('CustAddress ',CustAddress ),
    ('CustPhone   ',CustPhone   )
) ca(col, diff)
WHERE diff > 0
ORDER BY diff desc
;
 

This is an interesting query because I am not returning anything from the temp #diffcols table. Instead I use that table to create the sums of the differences then arrange the finally result using CROSS APPLY. You could do the same thing with UNPIVOT, but the CROSS APPLY VALUES syntax is shorter to write and easier on the eyes. Readability is always important, regardless of the language. There’s an interesting web site dedicated to writing obfuscated C. It’s interesting to see how much you can get done with a write-only program (one that you can’t read and make sense of). Don’t write obfuscated SQL, though!

这是一个有趣的查询,因为我没有从temp #diffcols表中返回任何内容。 相反,我使用该表创建差异的总和,然后使用CROSS APPLY安排最终结果。 您可以使用UNPIVOT进行相同的操作,但是CROSS APPLY VALUES语法编写起来更短,并且更容易理解。 无论语言如何,可读性始终很重要。 有一个有趣的网站致力于编写混淆的 C。很有趣的是,看到一个只写程序可以完成多少工作(一个您无法阅读和理解的程序)。 但是,请不要编写混淆SQL!

This query returns:

该查询返回:

Just what I wanted!

正是我想要的!

创建一个存储过程来帮助 (Creating a stored procedure to help)

If you’ve been following along, you’ve probably already realized that this will be tedious to write for anything but a trivial example. Help is on the way! I’ve included a stored procedure that constructs the queries discussed in this article, using nothing more than the table names and a list of key columns to use for the join predicate, that you are free to use or modify to suit.

如果您一直在遵循,那么您可能已经意识到,除了一个琐碎的示例之外,编写任何内容都将很繁琐。 帮助正在路上! 我已经包含了一个存储过程,该存储过程用于构造本文中讨论的查询,仅使用表名和用于连接谓词的键列列表,您可以随意使用或修改它们以适应需要。

摘要 (Summary)

Comparing tables or query results is a necessary part of database development. Whether you are modifying an application that should change the result or making a change that should not change the result, you need tools to do this efficiently.

比较表或查询结果是数据库开发的必要部分。 无论您是要修改应该更改结果的应用程序,还是要进行不 应该更改结果的更改, 需要工具来有效地做到这一点。

资料下载 (Downloads)

参考资料 (References)

翻译自: https://www.sqlshack.com/compare-tables-sql-server/

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值