利用T-SQL动态定义重复SQL Server数据库表行

介绍 (Introduction)

Lever T-SQL to handle duplicate rows in SQL Server database tables article highlighted T-SQL features that detect and handle duplicate SQL Server table rows. The techniques work well, but they rely on fixed duplicate row definitions. This article extends those techniques, showing how to define duplicate rows in a dynamic way.

利用T-SQL处理SQL Server数据库表中的重复行的文章重点介绍了可检测和处理SQL Server表重复行的 T-SQL功能。 该技术很好用,但是它们依赖于固定的重复行定义。 本文扩展了这些技术,展示了如何以动态方式定义重复的行。

样本数据库 (The Sample Database)

For this article, we’ll slightly modify the OFFICE_EQUIPMENT_DATABASE database,first described in the earlier article. I built this new database version in SQL Server 2014 Standard Edition, on an updated Windows 10 PC. For this article, we’ll add some rows to the original OFFICE_EQUIPMENT table, and we’ll add a stored procedure. Run this script to build the complete database:

对于本文,我们将略微修改OFFICE_EQUIPMENT_DATABASE数据库,该数据库在上一篇文章中首次介绍。 我在更新的Windows 10 PC上SQL Server 2014 Standard Edition中构建了这个新的数据库版本。 对于本文,我们将向原始OFFICE_EQUIPMENT表添加一些行,并添加存储过程。 运行此脚本以构建完整的数据库:

USE [master]
GO
 
--	This script will place
--
--		OFFICE_EQUIPMENT_DATABASE
--
--	in the
--
--		C:\OFFICE_EQUIPMENT_DATABASE
--
--	directory. The
--
--		xp_create_subdir
--
--	will create this directory. However, because of its extended stored procedure status,
--	Microsoft might scrap it, but it still works as of April, 2020.
 
--	EXEC xp_create_subdir N'C:\OFFICE_EQUIPMENT_DATABASE'
 
/****** Object:  Database [OFFICE_EQUIPMENT_DATABASE]    Script Date: 4/27/2020 1:49:59 PM ******/
CREATE DATABASE [OFFICE_EQUIPMENT_DATABASE]
 CONTAINMENT = NONE
 ON  PRIMARY 
( NAME = N'OFFICE_EQUIPMENT_DATABASE', FILENAME = N'C:\OFFICE_EQUIPMENT_DATABASE\OFFICE_EQUIPMENT_DATABASE.mdf' , SIZE = 4096KB , MAXSIZE = UNLIMITED, FILEGROWTH = 1024KB )
 LOG ON 
( NAME = N'OFFICE_EQUIPMENT_DATABASE_log', FILENAME = N'C:\OFFICE_EQUIPMENT_DATABASE\OFFICE_EQUIPMENT_DATABASE_log.ldf' , SIZE = 1024KB , MAXSIZE = 2048GB , FILEGROWTH = 10%)
GO
 
USE [OFFICE_EQUIPMENT_DATABASE]
 
CREATE TABLE [dbo].[OFFICE_EQUIPMENT](
	[OFFICE_EQUIPMENT_ID] [int] NOT NULL,
	[OFFICE_EQUIPMENT_NAME] [nvarchar](750) NULL,
	[OFFICE_EQUIPMENT_DESCRIPTION] [nvarchar](max) NULL,
	[PURCHASE_PRICE] [money] NULL,
	[PURCHASE_QUANTITY] [int] NULL,
	[PURCHASE_DATE] [datetime] NULL,
 CONSTRAINT [PK_OFFICE_EQUIPMENT] PRIMARY KEY CLUSTERED 
(
	[OFFICE_EQUIPMENT_ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
 
GO
INSERT [dbo].[OFFICE_EQUIPMENT] ([OFFICE_EQUIPMENT_ID], [OFFICE_EQUIPMENT_NAME], [OFFICE_EQUIPMENT_DESCRIPTION], [PURCHASE_PRICE], [PURCHASE_QUANTITY], [PURCHASE_DATE]) VALUES (1, N'PRINTER PAPER', N'20 LB. PRINTER PAPER (ONE REAM)', 2.4900, 7, CAST(N'2019-03-17 00:00:00.000' AS DateTime))
GO
INSERT [dbo].[OFFICE_EQUIPMENT] ([OFFICE_EQUIPMENT_ID], [OFFICE_EQUIPMENT_NAME], [OFFICE_EQUIPMENT_DESCRIPTION], [PURCHASE_PRICE], [PURCHASE_QUANTITY], [PURCHASE_DATE]) VALUES (2, N'PEN', N'BIC BALLPOINT PEN MEDIUM (BLUE)', 0.7000, 42, CAST(N'2020-01-08 00:00:00.000' AS DateTime))
GO
INSERT [dbo].[OFFICE_EQUIPMENT] ([OFFICE_EQUIPMENT_ID], [OFFICE_EQUIPMENT_NAME], [OFFICE_EQUIPMENT_DESCRIPTION], [PURCHASE_PRICE], [PURCHASE_QUANTITY], [PURCHASE_DATE]) VALUES (3, N'PAPER CLIP', N'MEDIUM AND JUMBO PAPER CLIPS', 3.7900, 118, CAST(N'2019-11-12 00:00:00.000' AS DateTime))
GO
INSERT [dbo].[OFFICE_EQUIPMENT] ([OFFICE_EQUIPMENT_ID], [OFFICE_EQUIPMENT_NAME], [OFFICE_EQUIPMENT_DESCRIPTION], [PURCHASE_PRICE], [PURCHASE_QUANTITY], [PURCHASE_DATE]) VALUES (4, N'STAPLER', N'SWINGLINE STAPLER - 20 SHEET CAPACITY', 5.1100, 3, CAST(N'2018-10-01 00:00:00.000' AS DateTime))
GO
INSERT [dbo].[OFFICE_EQUIPMENT] ([OFFICE_EQUIPMENT_ID], [OFFICE_EQUIPMENT_NAME], [OFFICE_EQUIPMENT_DESCRIPTION], [PURCHASE_PRICE], [PURCHASE_QUANTITY], [PURCHASE_DATE]) VALUES (5, N'ENVELOPE', N'#10 BUSINESS SECURITY ENVELOPES - SINGLE WINDOW', 0.0400, 500, CAST(N'2019-08-22 00:00:00.000' AS DateTime))
GO
INSERT [dbo].[OFFICE_EQUIPMENT] ([OFFICE_EQUIPMENT_ID], [OFFICE_EQUIPMENT_NAME], [OFFICE_EQUIPMENT_DESCRIPTION], [PURCHASE_PRICE], [PURCHASE_QUANTITY], [PURCHASE_DATE]) VALUES (6, N'PENCIL', N'#2 PENCIL', 0.0800, 150, CAST(N'2020-02-17 00:00:00.000' AS DateTime))
GO
INSERT [dbo].[OFFICE_EQUIPMENT] ([OFFICE_EQUIPMENT_ID], [OFFICE_EQUIPMENT_NAME], [OFFICE_EQUIPMENT_DESCRIPTION], [PURCHASE_PRICE], [PURCHASE_QUANTITY], [PURCHASE_DATE]) VALUES (7, N'ERASER', N'TICONDEROGA PINK ERASER', 1.6700, 3, CAST(N'2020-01-22 00:00:00.000' AS DateTime))
GO
INSERT [dbo].[OFFICE_EQUIPMENT] ([OFFICE_EQUIPMENT_ID], [OFFICE_EQUIPMENT_NAME], [OFFICE_EQUIPMENT_DESCRIPTION], [PURCHASE_PRICE], [PURCHASE_QUANTITY], [PURCHASE_DATE]) VALUES (8, N'PEN', N'BIC BALLPOINT PEN MEDIUM (BLUE)', 0.7000, 42, CAST(N'2020-01-08 00:00:00.000' AS DateTime))
GO
INSERT [dbo].[OFFICE_EQUIPMENT] ([OFFICE_EQUIPMENT_ID], [OFFICE_EQUIPMENT_NAME], [OFFICE_EQUIPMENT_DESCRIPTION], [PURCHASE_PRICE], [PURCHASE_QUANTITY], [PURCHASE_DATE]) VALUES (9, N'PAPER CLIP', N'MEDIUM AND JUMBO PAPER CLIPS', 3.7900, 118, CAST(N'2019-11-12 00:00:00.000' AS DateTime))
GO
INSERT [dbo].[OFFICE_EQUIPMENT] ([OFFICE_EQUIPMENT_ID], [OFFICE_EQUIPMENT_NAME], [OFFICE_EQUIPMENT_DESCRIPTION], [PURCHASE_PRICE], [PURCHASE_QUANTITY], [PURCHASE_DATE]) VALUES (10, N'PAPER CLIP', N'MEDIUM AND JUMBO PAPER CLIPS', 3.7900, 118, CAST(N'2019-11-12 00:00:00.000' AS DateTime))
GO
INSERT [dbo].[OFFICE_EQUIPMENT] ([OFFICE_EQUIPMENT_ID], [OFFICE_EQUIPMENT_NAME], [OFFICE_EQUIPMENT_DESCRIPTION], [PURCHASE_PRICE], [PURCHASE_QUANTITY], [PURCHASE_DATE]) VALUES (11, N'PAPER CLIP', N'MEDIUM AND JUMBO PAPER CLIPS', 3.7900, 120, CAST(N'2019-11-12 00:00:00.000' AS DateTime))
GO
INSERT [dbo].[OFFICE_EQUIPMENT] ([OFFICE_EQUIPMENT_ID], [OFFICE_EQUIPMENT_NAME], [OFFICE_EQUIPMENT_DESCRIPTION], [PURCHASE_PRICE], [PURCHASE_QUANTITY], [PURCHASE_DATE]) VALUES (12, N'PRINTER PAPER', N'20 LB. PRINTER PAPER (ONE REAM)', 2.4900, 7, CAST(N'2019-03-17 00:00:00.000' AS DateTime))
GO
INSERT [dbo].[OFFICE_EQUIPMENT] ([OFFICE_EQUIPMENT_ID], [OFFICE_EQUIPMENT_NAME], [OFFICE_EQUIPMENT_DESCRIPTION], [PURCHASE_PRICE], [PURCHASE_QUANTITY], [PURCHASE_DATE]) VALUES (13, N'PRINTER PAPER', N'20 LB. PRINTER PAPER (ONE REAM)', 2.4900, 7, CAST(N'2019-03-17 00:00:00.000' AS DateTime))
GO
INSERT [dbo].[OFFICE_EQUIPMENT] ([OFFICE_EQUIPMENT_ID], [OFFICE_EQUIPMENT_NAME], [OFFICE_EQUIPMENT_DESCRIPTION], [PURCHASE_PRICE], [PURCHASE_QUANTITY], [PURCHASE_DATE]) VALUES (14, N'PRINTER PAPER', N'20 LB. PRINTER PAPER (ONE REAM)', 2.4900, 7, CAST(N'2018-03-17 00:00:00.000' AS DateTime))
GO
INSERT [dbo].[OFFICE_EQUIPMENT] ([OFFICE_EQUIPMENT_ID], [OFFICE_EQUIPMENT_NAME], [OFFICE_EQUIPMENT_DESCRIPTION], [PURCHASE_PRICE], [PURCHASE_QUANTITY], [PURCHASE_DATE]) VALUES (15, N'PAPER CLIP', N'MEDIUM AND JUMBO PAPER CLIPS', 3.8900, 118, CAST(N'2019-11-12 00:00:00.000' AS DateTime))
GO
INSERT [dbo].[OFFICE_EQUIPMENT] ([OFFICE_EQUIPMENT_ID], [OFFICE_EQUIPMENT_NAME], [OFFICE_EQUIPMENT_DESCRIPTION], [PURCHASE_PRICE], [PURCHASE_QUANTITY], [PURCHASE_DATE]) VALUES (16, N'PRINTER PAPER', N'35 LB. PRINTER PAPER (ONE REAM)', 2.4900, 7, CAST(N'2019-11-12 00:00:00.000' AS DateTime))
GO
INSERT [dbo].[OFFICE_EQUIPMENT] ([OFFICE_EQUIPMENT_ID], [OFFICE_EQUIPMENT_NAME], [OFFICE_EQUIPMENT_DESCRIPTION], [PURCHASE_PRICE], [PURCHASE_QUANTITY], [PURCHASE_DATE]) VALUES (17, N'PRINTER PAPER', N'20 LB. PRINTER PAPER (ONE REAM)', 3.4900, 22, CAST(N'2020-01-18 00:00:00.000' AS DateTime))
GO
INSERT [dbo].[OFFICE_EQUIPMENT] ([OFFICE_EQUIPMENT_ID], [OFFICE_EQUIPMENT_NAME], [OFFICE_EQUIPMENT_DESCRIPTION], [PURCHASE_PRICE], [PURCHASE_QUANTITY], [PURCHASE_DATE]) VALUES (18, N'PAPER CLIP', N'SMALL AND MEDIUM PAPER CLIPS', 3.1900, 8, CAST(N'2019-01-28 00:00:00.000' AS DateTime))
GO
INSERT [dbo].[OFFICE_EQUIPMENT] ([OFFICE_EQUIPMENT_ID], [OFFICE_EQUIPMENT_NAME], [OFFICE_EQUIPMENT_DESCRIPTION], [PURCHASE_PRICE], [PURCHASE_QUANTITY], [PURCHASE_DATE]) VALUES (19, N'ENVELOPE', N'PRE-ADDRESSED AND PRE-STAMPED', 0.0500, 750, CAST(N'2019-08-22 00:00:00.000' AS DateTime))
GO
INSERT [dbo].[OFFICE_EQUIPMENT] ([OFFICE_EQUIPMENT_ID], [OFFICE_EQUIPMENT_NAME], [OFFICE_EQUIPMENT_DESCRIPTION], [PURCHASE_PRICE], [PURCHASE_QUANTITY], [PURCHASE_DATE]) VALUES (20, N'PAPER CLIPS', N'MEDIUM AND JUMBO PAPER CLIPS', 3.7900, 8, CAST(N'2019-11-12 00:00:00.000' AS DateTime))
GO
INSERT [dbo].[OFFICE_EQUIPMENT] ([OFFICE_EQUIPMENT_ID], [OFFICE_EQUIPMENT_NAME], [OFFICE_EQUIPMENT_DESCRIPTION], [PURCHASE_PRICE], [PURCHASE_QUANTITY], [PURCHASE_DATE]) VALUES (21, N'PEN', N'PENTEL BALLPOINT PEN (RED)', 0.7000, 42, CAST(N'2019-01-08 00:00:00.000' AS DateTime))
GO
INSERT [dbo].[OFFICE_EQUIPMENT] ([OFFICE_EQUIPMENT_ID], [OFFICE_EQUIPMENT_NAME], [OFFICE_EQUIPMENT_DESCRIPTION], [PURCHASE_PRICE], [PURCHASE_QUANTITY], [PURCHASE_DATE]) VALUES (22, N'PENCIL', N'#2 PENCIL', 0.0800, 150, CAST(N'2020-02-17 00:00:00.000' AS DateTime))
GO
INSERT [dbo].[OFFICE_EQUIPMENT] ([OFFICE_EQUIPMENT_ID], [OFFICE_EQUIPMENT_NAME], [OFFICE_EQUIPMENT_DESCRIPTION], [PURCHASE_PRICE], [PURCHASE_QUANTITY], [PURCHASE_DATE]) VALUES (23, N'GRAY STAPLER', N'SWINGLINE STAPLER - 20 SHEET CAPACITY', 5.1100, 3, CAST(N'2018-10-01 00:00:00.000' AS DateTime))
GO
INSERT [dbo].[OFFICE_EQUIPMENT] ([OFFICE_EQUIPMENT_ID], [OFFICE_EQUIPMENT_NAME], [OFFICE_EQUIPMENT_DESCRIPTION], [PURCHASE_PRICE], [PURCHASE_QUANTITY], [PURCHASE_DATE]) VALUES (24, N'GRAY STAPLER', N'SWINGLINE STAPLER - 20 SHEET CAPACITY', 5.1100, 3, CAST(N'2018-10-01 00:00:00.000' AS DateTime))
GO
/****** Object:  StoredProcedure [dbo].[DYNAMIC_ROW_NUMBER_PARTITIONS]    Script Date: 4/27/2020 1:49:59 PM ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE PROCEDURE [dbo].[DYNAMIC_ROW_NUMBER_PARTITIONS]
 
	@PARTITION_COLUMN_LIST int
 
AS
 
/*
	To use: 
 
			EXEC DYNAMIC_ROW_NUMBER_PARTITIONS 000001
			EXEC DYNAMIC_ROW_NUMBER_PARTITIONS 001001
			EXEC DYNAMIC_ROW_NUMBER_PARTITIONS 001011
			EXEC DYNAMIC_ROW_NUMBER_PARTITIONS 010001
 */
 
SELECT		OFFICE_EQUIPMENT_ID, OFFICE_EQUIPMENT_NAME, OFFICE_EQUIPMENT_DESCRIPTION,
			PURCHASE_PRICE, PURCHASE_QUANTITY, PURCHASE_DATE, ROW_NUM
FROM
(
	SELECT	OFFICE_EQUIPMENT_ID, OFFICE_EQUIPMENT_NAME, OFFICE_EQUIPMENT_DESCRIPTION,
			PURCHASE_PRICE, PURCHASE_QUANTITY, PURCHASE_DATE,
			ROW_NUMBER() OVER	(
									PARTITION BY	IIF(RIGHT((@PARTITION_COLUMN_LIST / 100000), 1) = 1, OFFICE_EQUIPMENT_ID, NULL),
													IIF(RIGHT((@PARTITION_COLUMN_LIST / 10000), 1) = 1, OFFICE_EQUIPMENT_NAME, NULL),
													IIF(RIGHT((@PARTITION_COLUMN_LIST / 1000), 1) = 1, OFFICE_EQUIPMENT_DESCRIPTION, NULL),
													IIF(RIGHT((@PARTITION_COLUMN_LIST / 100), 1) = 1, PURCHASE_PRICE, NULL),
													IIF(RIGHT((@PARTITION_COLUMN_LIST / 10), 1) = 1, PURCHASE_QUANTITY, NULL),
													IIF(RIGHT(@PARTITION_COLUMN_LIST, 1) = 1, PURCHASE_DATE, NULL)
 
									ORDER BY		OFFICE_EQUIPMENT_DESCRIPTION	--	THIS ORDER BY DOES NOT SORT THE OUTER RESULT SET
								)	AS ROW_NUM
	FROM	OFFICE_EQUIPMENT
) TMP
WHERE		ROW_NUM > 1;

See the earlier article for more details about the script, the table, and the table structure. This article will focus on the T-SQL features in the stored procedure, and will describe how these features offer a flexible, dynamic way to define duplicate table rows.

有关脚本,表和表结构的更多详细信息,请参见前面的文章。 本文将重点介绍存储过程中的T-SQL功能,并将描述这些功能如何提供一种灵活,动态的方式来定义重复的表行。

运行中的存储过程 (The Stored Procedure in action)

The OFFICE_EQUIPMENT_DATABASE database has one stored procedure DYNAMIC_ROW_NUMBER_PARTITIONS with this code:

OFFICE_EQUIPMENT_DATABASE数据库具有一个存储过程DYNAMIC_ROW_NUMBER_PARTITIONS,其代码如下:

CREATE PROCEDURE [dbo].[DYNAMIC_ROW_NUMBER_PARTITIONS]
 
	@PARTITION_COLUMN_LIST int
 
AS
 
/*
	To use: 
 
			EXEC DYNAMIC_ROW_NUMBER_PARTITIONS 000001
			EXEC DYNAMIC_ROW_NUMBER_PARTITIONS 001001
			EXEC DYNAMIC_ROW_NUMBER_PARTITIONS 001011
			EXEC DYNAMIC_ROW_NUMBER_PARTITIONS 010001
 */
 
SELECT		OFFICE_EQUIPMENT_ID, OFFICE_EQUIPMENT_NAME, OFFICE_EQUIPMENT_DESCRIPTION,
			PURCHASE_PRICE, PURCHASE_QUANTITY, PURCHASE_DATE, ROW_NUM
FROM
(
	SELECT	OFFICE_EQUIPMENT_ID, OFFICE_EQUIPMENT_NAME, OFFICE_EQUIPMENT_DESCRIPTION,
			PURCHASE_PRICE, PURCHASE_QUANTITY, PURCHASE_DATE,
			ROW_NUMBER() OVER	(
									PARTITION BY	IIF(RIGHT((@PARTITION_COLUMN_LIST / 100000), 1) = 1, OFFICE_EQUIPMENT_ID, NULL),
													IIF(RIGHT((@PARTITION_COLUMN_LIST / 10000), 1) = 1, OFFICE_EQUIPMENT_NAME, NULL),
													IIF(RIGHT((@PARTITION_COLUMN_LIST / 1000), 1) = 1, OFFICE_EQUIPMENT_DESCRIPTION, NULL),
													IIF(RIGHT((@PARTITION_COLUMN_LIST / 100), 1) = 1, PURCHASE_PRICE, NULL),
													IIF(RIGHT((@PARTITION_COLUMN_LIST / 10), 1) = 1, PURCHASE_QUANTITY, NULL),
													IIF(RIGHT(@PARTITION_COLUMN_LIST, 1) = 1, PURCHASE_DATE, NULL)
 
									ORDER BY		OFFICE_EQUIPMENT_DESCRIPTION	--	THIS ORDER BY DOES NOT SORT THE OUTER RESULT SET
								)	AS ROW_NUM
	FROM	OFFICE_EQUIPMENT
)	TMP
WHERE		ROW_NUM > 1;

The DYNAMIC_ROW_NUMBER_PARTITIONS stored procedure, with a T-SQL PARTITION BY clause that levers the IIF() function.

As seen in the above screenshot, the stored procedure has one integer data type parameter @PARTITION_COLUMN_LIST at line 4. We can manually “paint” and run lines 11 through 14 to test the stored procedure. This screenshot runs line 11 EXEC DYNAMIC_ROW_NUMBER_PARTITIONS 000001 with a 000001 argument value:

如上面的屏幕快照所示,该存储过程在第4行有一个整数数据类型参数@PARTITION_COLUMN_LIST 。我们可以手动“绘制”并运行11至14行以测试存储过程。 此屏幕快照运行第11行EXEC DYNAMIC_ROW_NUMBER_PARTITIONS 000001 ,其参数为000001:

Testing the DYNAMIC_ROW_NUMBER_PARTITIONS stored procedure, with a T-SQL EXEC statement.

The result set defined duplicate rows based on the PURCHASE_DATE column values. Later on, we’ll see how different argument values drive these definitions. This screenshot runs line 13 EXEC DYNAMIC_ROW_NUMBER_PARTITIONS 001011 with a 001011 argument value:

结果集基于PURCHASE_DATE列值定义了重复行。 稍后,我们将看到不同的参数值如何驱动这些定义。 此屏幕快照运行第13行EXEC DYNAMIC_ROW_NUMBER_PARTITIONS 001011,其参数值为001011:

Testing the DYNAMIC_ROW_NUMBER_PARTITIONS stored procedure, with a T-SQL EXEC statement.

The result set defined duplicate rows based on the below column values:

结果集基于以下列值定义了重复行:

  • OFFICE_EQUIPMENT_DESCRIPTION

    OFFICE_EQUIPMENT_DESCRIPTION
  • PURCHASE_QUANTITY

    购买数量
  • PURCHASE_DATE

    购买日期

The stored procedure filters out rows with ROW_NUM values below 1, with the T-SQL WHERE clause at line 35. The stored procedure does not necessarily need this filter, but with it, we can more clearly see how the engineering works.

存储过程使用第35行的T-SQL WHERE子句过滤掉ROW_NUM值小于1的行。存储过程不一定需要此过滤器,但是有了它,我们可以更清楚地了解工程原理。

If we comment out line 35 and run line 12 EXEC DYNAMIC_ROW_NUMBER_PARTITIONS 001001, the result set defines duplicate rows based on the OFFICE_EQUIPMENT_DESCRIPTION PURCHASE_DATE column values. The stored procedure returns a result set with 24 rows, as shown in this screenshot:

如果我们注释掉第35行并运行第12行EXEC DYNAMIC_ROW_NUMBER_PARTITIONS 001001,则结果集将基于OFFICE_EQUIPMENT_DESCRIPTION PURCHASE_DATE列值来定义重复的行。 该存储过程返回一个包含24行的结果集,如以下屏幕截图所示:

Comment the DYNAMIC_ROW_NUMBER_PARTITIONS stored procedure, and then run it.

If we restore the line 35 T-SQL WHERE clause and again run line 12, the stored procedure returns the result set shown in this screenshot:

如果我们还原第35行的T-SQL WHERE子句并再次运行第12行,则存储过程将返回此屏幕快照中所示的结果集:

A DYNAMIC_ROW_NUMBER_PARTITIONS stored procedure result set.

If we comment out line 35 and run the stored procedure with line 14 EXEC DYNAMIC_ROW_NUMBER_PARTITIONS 010001 the result set defines duplicate rows based on the OFFICE_EQUIPMENT_NAME and PURCHASE_DATE column values. Compared to EXEC DYNAMIC_ROW_NUMBER_PARTITIONS 001001 this result set looks very different, but it still returns 24 rows, as shown in this screenshot:

如果我们注释掉第35行,并使用第14行EXEC DYNAMIC_ROW_NUMBER_PARTITIONS 010001运行存储过程,则结果集将基于OFFICE_EQUIPMENT_NAME和PURCHASE_DATE列值定义重复的行。 与EXEC DYNAMIC_ROW_NUMBER_PARTITIONS 001001相比,此结果集看起来非常不同,但仍返回24行,如以下屏幕快照所示:

Comment the DYNAMIC_ROW_NUMBER_PARTITIONS stored procedure, and then run it.

If we restore the line 35 T-SQL WHERE clause and again run line 14 EXEC DYNAMIC_ROW_NUMBER_PARTITIONS 010001, we will get a different result set, as seen in this screenshot:

如果我们还原第35行T-SQL WHERE子句并再次运行第14行EXEC DYNAMIC_ROW_NUMBER_PARTITIONS 010001,我们将获得不同的结果集,如以下屏幕快照所示:

A DYNAMIC_ROW_NUMBER_PARTITIONS stored procedure result set.

Clearly, the line 35 WHERE clause helps show how the stored procedure works, because when we disable it, the stored procedure returns 24 rows, for all input parameter values. In these 24-row result sets, the ROW_NUM values show that the stored procedure builds different “groups” based on the different @PARTITION_COLUMN_LIST values. This behavior makes it harder to see how the stored procedure operates.

显然,第35行WHERE子句有助于说明存储过程的工作方式,因为当我们禁用它时,对于所有输入参数值,存储过程都会返回24行。 在这些24行结果集中,ROW_NUM值表明存储过程基于不同的@PARTITION_COLUMN_LIST值构建不同的“组”。 此行为使得很难查看存储过程如何运行。

存储过程工程 (The Stored Procedure Engineering)

In this article, the first screenshot showed the DYNAMIC_ROW_NUMBER_PARTITIONS stored procedure, but as we study its engineering, it will help to see the screenshot again:

在本文中,第一个屏幕截图显示了DYNAMIC_ROW_NUMBER_PARTITIONS存储过程,但是,当我们研究其工程技术时,将有助于再次查看该屏幕截图:

The DYNAMIC_ROW_NUMBER_PARTITIONS stored procedure, with a T-SQL PARTITION BY clause that levers the IIF() function.

Lever T-SQL to handle duplicate rows in SQL Server database tables article explained that a T-SQL query with ROW_NUMBER() and PARTITION BY clauses can’t directly filter its result set with a WHERE clause. However, the article also showed that if we place that query in a FROM clause subquery, and then SELECT from it, we can add a WHERE clause to the outer query. With this approach, we can filter the result set. In the DYNAMIC_ROW_NUMBER_PARTITIONS stored procedure, the subquery, between lines 21 and 31, has a ROW_NUMBER() clause at line 23. It also has a PARTITION BY clause between lines 24 and 29. In the stored procedure, the line 35 WHERE clause filters the result set that the subquery returns. At lines 17 and 18, the outer query T‑SQL SELECT clause selects columns from the inner subquery. Line 34 aliases the subquery as ‘TMP’.

利用T-SQL处理SQL Server数据库表中的重复行的文章解释说,使用ROW_NUMBER()和PARTITION BY子句的T-SQL查询不能直接使用WHERE子句过滤其结果集。 但是,该文章还显示,如果将查询放在FROM子句子查询中,然后从中进行SELECT,则可以将WHERE子句添加到外部查询中。 使用这种方法,我们可以过滤结果集。 在DYNAMIC_ROW_NUMBER_PARTITIONS存储过程中,在第21行和第31行之间的子查询在第23行具有ROW_NUMBER()子句。在第24行和第29行之间还具有PARTITION BY子句。在存储过程中,第35行WHERE子句过滤了子查询返回的结果集。 在第17和18行,外部查询T-SQL SELECT子句从内部子查询中选择列。 第34行将子查询别名为“ TMP”。

The PARTITION BY clause, seen here between lines 24 and 29, builds the duplicate row definition. Each PARTITION BY clause line extracts a specific parameter digit, and indirectly maps to one column in the SELECT clause of lines 21 and 22. We’ll focus on line 25 as an example, and work from the inside out to see the technique.

在第24和29行之间可以看到PARTITION BY子句,它建立了重复的行定义。 每条PARTITION BY子句行提取一个特定的参数数字,并间接映射到第21行和第22行的SELECT子句中的一列。我们将以第25行为例,并从内而外地了解该技术。

The DYNAMIC_ROW_NUMBER_PARTITIONS stored procedure returns result sets with these six columns:

DYNAMIC_ROW_NUMBER_PARTITIONS存储过程返回具有以下六列的结果集:

  1. OFFICE_EQUIPMENT_ID

    OFFICE_EQUIPMENT_ID
  2. OFFICE_EQUIPMENT_NAME

    OFFICE_EQUIPMENT_NAME
  3. OFFICE_EQUIPMENT_DESCRIPTION

    OFFICE_EQUIPMENT_DESCRIPTION
  4. PURCHASE_PRICE

    购买价格
  5. PURCHASE_QUANTITY

    购买数量
  6. PURCHASE_DATE

    购买日期

The test code examples of lines 11 to 14 use six-digit integer arguments:

第11至14行的测试代码示例使用六位整数参数:

EXEC DYNAMIC_ROW_NUMBER_PARTITIONS 000001
EXEC DYNAMIC_ROW_NUMBER_PARTITIONS 001001
EXEC DYNAMIC_ROW_NUMBER_PARTITIONS 001011
EXEC DYNAMIC_ROW_NUMBER_PARTITIONS 010001

In the T-SQL PARTITION BY clause, we can map the individual digits in the arguments to specific columns. Then, the PARTITION BY clause can use those values to define duplicate rows, if we use only ones or zeros for each digit in the original argument values.

在T-SQL PARTITION BY子句中,我们可以将参数中的各个数字映射到特定的列。 然后,如果我们对原始参数值中的每个数字仅使用一或零,则PARTITION BY子句可以使用这些值来定义重复的行。

For example, EXEC DYNAMIC_ROW_NUMBER_PARTITIONS 001001 sets @PARTITION_COLUMN_LIST to 001001. With this value, line 25

例如,EXEC DYNAMIC_ROW_NUMBER_PARTITIONS 001001将@PARTITION_COLUMN_LIST设置为001001。使用此值,第25行

IIF(RIGHT((@PARTITION_COLUMN_LIST / 10000), 1) = 1, OFFICE_EQUIPMENT_NAME, NULL)

maps digit 2 to the OFFICE_EQUIPMENT_NAME column. We’ll unpack this line from the inside out.

将数字2映射到OFFICE_EQUIPMENT_NAME列。 我们将从内到外解压缩此行。

The @PARTITION_COLUMN_LIST / 10000 calculation divides @PARTITION_COLUMN_LIST by 10000, and truncates, or throws away, the remainder. Dividing a six-digit integer by 10000 in this way removes the four digits on the right, as seen in this screenshot:

@PARTITION_COLUMN_LIST / 10000计算将@PARTITION_COLUMN_LIST除以10000,然后截断或丢弃其余部分。 以这种方式将六位整数除以10000会删除右边的四位数字,如以下屏幕快照所示:

Example T-SQL statements: the / (DIVISION) operator.

The next function RIGHT((@PARTITION_COLUMN_LIST / 10000), 1) extracts the first right digit of 00100 – in this case, a zero (0). The third query in the previous screenshot shows that we need the T‑SQL RIGHT() function, to extract the specific right-most digit. Finally, the IIF() function on the “outside” looks at the value that the T-SQL RIGHT() function returns, as seen here:

下一个函数RIGHT((@ PARTITION_COLUMN_LIST / 10000),1)提取第一个右数字00100-在这种情况下为零(0)。 上一个屏幕快照中的第三个查询显示,我们需要T‑SQL RIGHT()函数来提取特定的最右边的数字。 最后,“外部”上的IIF()函数查看T-SQL RIGHT()函数返回的值,如下所示:

IIF(RIGHT((@PARTITION_COLUMN_LIST / 10000), 1) = 1, OFFICE_EQUIPMENT_NAME, NULL)

The IIF() function short-hands the SQL Server CASE expression. As used here, if the RIGHT() function returns 1, or TRUE, IIF() returns OFFICE_EQUIPMENT_NAME and places the OFFICE_EQUIPMENT_NAME column in the PARTITION BY column list. When the RIGHT function returns 0, or FALSE, IIF() returns NULL. The stored procedure ignores these NULL values as it dynamically builds the PARTITION BY clause, and when IIF() returns a NULL value, the stored procedure never sees the column name in the IIF() function. As a special case, line 29

IIF()函数是SQL Server CASE表达式的简称。 如此处所用,如果RIGHT()函数返回1或TRUE,则IIF()返回OFFICE_EQUIPMENT_NAME并将OFFICE_EQUIPMENT_NAME列放置在PARTITION BY列列表中。 当RIGHT函数返回0或FALSE时,IIF()返回NULL。 当存储过程动态生成PARTITION BY子句时,该存储过程将忽略这些NULL值,并且当IIF()返回NULL值时,该存储过程从不会在IIF()函数中看到列名。 作为特殊情况,第29行

IIF(RIGHT(@PARTITION_COLUMN_LIST, 1) = 1, PURCHASE_DATE, NULL)

avoids the division calculation, because the RIGHT() function can directly extract the far-right digit of the @PARTITION_COLUMN_LIST value. With this technique, a six-digit integer maps to all six columns. For example, 001001 maps to all six columns as seen in this diagram:

避免除法计算,因为RIGHT()函数可以直接提取@PARTITION_COLUMN_LIST值的最右边的数字。 使用此技术,六位数整数映射到所有六列。 例如,如下图所示, 001001映射到所有六列:

Mapping @PARTITION_COLUMN_LIST component digits to DYNAMIC_ROW_NUMBER_PARTITIONS subquery columns.

The T-SQL PARTITION BY clause requires at least one column in the line 31 ORDER BY clause, but this ORDER BY clause does not operate on the finished result set. It independently operates inside each row partition, sorting by the columns specified in its column list. As explained above, the line 35 T-SQL WHERE clause removes rows with ROW_NUM values of 1.

T-SQL PARTITION BY子句在第31行的ORDER BY子句中至少需要一列,但是此ORDER BY子句不能对完成的结果集进行操作。 它在每个行分区内独立运行,并按其列列表中指定的列进行排序。 如上所述,第35行T-SQL WHERE子句删除ROW_NUM值为1的行。

PARTITION BY列的顺序 (The Order of PARTITION BY Columns)

The technique behind this stored procedure, in part, involves the below:

此存储过程背后的技术部分涉及以下内容:

  1. The PARTITION BY column list, between lines 24 and 29

    第24和29行之间的PARTITION BY列列表
  2. The ORDER BY clause, at line 31

    第31行的ORDER BY子句
  3. The WHERE clause, at line 35

    WHERE子句,第35行

We can easily add an ORDER BY clause after line 35. The stored procedure, and the engineering behind it, all work in a consistent way. However, we should keep the listed aspects in mind, because stored procedures that use them can build result sets that might seem unexpected.

我们可以在第35行之后轻松添加ORDER BY子句。存储过程及其背后的工程都以一致的方式工作。 但是,我们应该牢记列出的方面,因为使用它们的存储过程可能会生成结果集,这看起来似乎是意外的。

CREATE PROCEDURE [dbo].[DYNAMIC_ROW_NUMBER_PARTITIONS]
 
@PARTITION_COLUMN_LIST int
 
AS
 
/*
To use: 
 
EXEC DYNAMIC_ROW_NUMBER_PARTITIONS 000001
			EXEC DYNAMIC_ROW_NUMBER_PARTITIONS 001001
			EXEC DYNAMIC_ROW_NUMBER_PARTITIONS 001011
			EXEC DYNAMIC_ROW_NUMBER_PARTITIONS 010001
 */
 
SELECT	OFFICE_EQUIPMENT_ID, OFFICE_EQUIPMENT_NAME, OFFICE_EQUIPMENT_DESCRIPTION,
			PURCHASE_PRICE, PURCHASE_QUANTITY, PURCHASE_DATE, ROW_NUM
FROM
(
SELECT	OFFICE_EQUIPMENT_ID, OFFICE_EQUIPMENT_NAME, OFFICE_EQUIPMENT_DESCRIPTION,
			PURCHASE_PRICE, PURCHASE_QUANTITY, PURCHASE_DATE,
			ROW_NUMBER() OVER	(
								PARTITION BY	IIF(RIGHT((@PARTITION_COLUMN_LIST / 100000), 1) = 1, OFFICE_EQUIPMENT_ID, NULL),
												IIF(RIGHT((@PARTITION_COLUMN_LIST / 10000), 1) = 1, OFFICE_EQUIPMENT_NAME, NULL),
												IIF(RIGHT((@PARTITION_COLUMN_LIST / 1000), 1) = 1, OFFICE_EQUIPMENT_DESCRIPTION, NULL),
												IIF(RIGHT((@PARTITION_COLUMN_LIST / 100), 1) = 1, PURCHASE_PRICE, NULL),
												IIF(RIGHT((@PARTITION_COLUMN_LIST / 10), 1) = 1, PURCHASE_QUANTITY, NULL),
												IIF(RIGHT(@PARTITION_COLUMN_LIST, 1) = 1, PURCHASE_DATE, NULL)
 
								ORDER BY		OFFICE_EQUIPMENT_DESCRIPTION	--	THIS ORDER BY DOES NOT SORT THE OUTER RESULT SET
							)	AS ROW_NUM
FROM	OFFICE_EQUIPMENT
)		TMP
WHERE		ROW_NUM > 1;

If we run the stored procedure we saw above with this statement EXEC DYNAMIC_ROW_NUMBER_PARTITIONS 001001, we will get this result set as seen in this screenshot:

如果运行上面用语句EXEC DYNAMIC_ROW_NUMBER_PARTITIONS 001001看到的存储过程,我们将获得此结果集,如以下屏幕快照所示:

Run the DYNAMIC_ROW_NUMBER_PARTITIONS stored procedure with a T-SQL EXEC statement, and see the result set.

We can exchange lines 26 and 29, as seen here:

我们可以交换26和29行,如下所示:

CREATE PROCEDURE [dbo].[DYNAMIC_ROW_NUMBER_PARTITIONS]
 
@PARTITION_COLUMN_LIST int
    
AS
    
/*

To use:

使用方法:

EXEC DYNAMIC_ROW_NUMBER_PARTITIONS 000001
            EXEC DYNAMIC_ROW_NUMBER_PARTITIONS 001001
            EXEC DYNAMIC_ROW_NUMBER_PARTITIONS 001011
            EXEC DYNAMIC_ROW_NUMBER_PARTITIONS 010001
 */
    
SELECT	OFFICE_EQUIPMENT_ID, OFFICE_EQUIPMENT_NAME, OFFICE_EQUIPMENT_DESCRIPTION,
            PURCHASE_PRICE, PURCHASE_QUANTITY, PURCHASE_DATE, ROW_NUM
FROM
(
SELECT	OFFICE_EQUIPMENT_ID, OFFICE_EQUIPMENT_NAME, OFFICE_EQUIPMENT_DESCRIPTION,
            PURCHASE_PRICE, PURCHASE_QUANTITY, PURCHASE_DATE,
            ROW_NUMBER() OVER	(
                                    PARTITION BY	IIF(RIGHT((@PARTITION_COLUMN_LIST / 100000), 1) = 1, OFFICE_EQUIPMENT_ID, NULL),
                                                    IIF(RIGHT((@PARTITION_COLUMN_LIST / 10000), 1) = 1, OFFICE_EQUIPMENT_NAME, NULL) ,
            /* LINE 26: SWITCHED LINE */		IIF(RIGHT(@PARTITION_COLUMN_LIST, 1) = 1, PURCHASE_DATE, NULL),
                                                    IIF(RIGHT((@PARTITION_COLUMN_LIST / 100), 1) = 1, PURCHASE_PRICE, NULL),
                                                    IIF(RIGHT((@PARTITION_COLUMN_LIST / 10), 1) = 1, PURCHASE_QUANTITY, NULL) ,
            /* LINE 29: SWITCHED LINE */		IIF(RIGHT((@PARTITION_COLUMN_LIST / 1000), 1) = 1, OFFICE_EQUIPMENT_DESCRIPTION, NULL)
 
                                    ORDER BY		OFFICE_EQUIPMENT_DESCRIPTION	--	THIS ORDER BY DOES NOT SORT THE OUTER RESULT SET
                                )	AS ROW_NUM
FROM	OFFICE_EQUIPMENT
)		TMP
WHERE		ROW_NUM > 1;

When we run the stored procedure with the SQL Server statement used above EXEC DYNAMIC_ROW_NUMBER_PARTITIONS 001001, we’ll get this result set:

当我们使用EXEC DYNAMIC_ROW_NUMBER_PARTITIONS 001001以上使用SQL Server语句运行存储过程时,我们将获得以下结果集:

Result set

Because of the ways, the factors listed above can interact, combined with a potential ORDER BY clause at the outer query, we need to test queries and stored procedures built with this technique, to verify that they operate as expected.

由于这些方式,上面列出的因素可以交互,并在外部查询上与潜在的ORDER BY子句结合,我们需要测试使用此技术构建的查询和存储过程,以验证它们是否按预期运行。

结论 (Conclusion)

The PARTITION BY clause gives us a powerful tool, right out of the box, to define duplicate rows in a database table. This article showed how to combine the PARTITION BY clause with other available SQL Server tools, to define duplicate rows in a dynamic, granular way.

PARTITION BY子句为我们提供了一个功能强大的工具,可以立即使用它来定义数据库表中的重复行。 本文介绍了如何将PARTITION BY子句与其他可用SQL Server工具结合在一起,以动态,精细的方式定义重复的行。

翻译自: https://www.sqlshack.com/lever-t-sql-to-dynamically-define-duplicate-sql-server-database-table-rows/

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值