如何在SQL Server计算列上创建索引

最新推荐文章于 2023-03-06 23:46:54 发布

culuo4781

最新推荐文章于 2023-03-06 23:46:54 发布

阅读量920

点赞数

文章标签：数据库 mysql java python sql

原文链接：https://www.sqlshack.com/how-to-create-indexes-on-sql-server-computed-columns/

版权

A SQL Server Computed Column is a virtual column that is not stored physically on the table, unless it is previously specified as PERSISTED. A computed Column value is calculated using a specific expression that can be constant, function, data from other columns on the same table or a combination of these types connected together by one or more operators.

SQL Server计算列是一个虚拟列，除非预先指定为PERSISTED ，否则它不会物理存储在表上。列的计算值是使用特定的表达式计算的，该表达式可以是常量，函数，来自同一表上其他列的数据或由一个或多个运算符连接在一起的这些类型的组合。

The Computed Column’s value will be calculated each time the column is referenced in a query. If the Computed Column is specified as PERSISTED while creating a table using a CREATE TABLE statement or after the table creation using the ALTER TABLE statements, the column’s value will be physically stored in the table, and will be updated automatically if any column involved in the Computed Column expression is changed.

每次在查询中引用该列时，都会计算“计算列”的值。如果在使用CREATE TABLE语句创建表时或使用ALTER TABLE语句创建表时将Computed Column指定为PERSISTED，则该列的值将物理存储在表中，并且如果涉及到的任何列将自动更新。计算列的表达式已更改。

The PERSISTED property is important for the Computed Columns as many additional features depend on that property. Setting the Computed Column as PERSISTED will reduce the expression value calculation overhead at runtime but it will consume more space on the underlying disk. The Computed Column should be deterministic in order to be set to PERSISTED. For example, you can’t use the GETDATE() value in the Computed Column expression if you want to set the column to PERSISTED, as the result will be different each time you retrieve it and will not be deterministic.

PERSISTED属性对于“计算列”很重要，因为许多其他功能都依赖于该属性。将“计算列”设置为“ PERSISTED”将减少运行时的表达式值计算开销，但会消耗基础磁盘上更多的空间。计算列应该确定，以便设置为“持久”。例如，如果要将列设置为PERSISTED，则不能在“计算列”表达式中使用GETDATE（）值，因为每次检索结果都会不同，并且不确定。

The Computed Column can be referenced by a SELECT statement columns list, WHERE or ORDER BY clauses, but it can’t be used in an INSERT or UPDATE statements as the value will be calculated automatically. A Computed Column cannot be used in a DEFAULT, FOREIGN KEY or NOT NULL constraints. If the expression that is used to define the Computed Column value is deterministic, the Computed Column can be involved in a PRIMARY KEY or UNIQUE constraint. The Nullability of the Computed Column is defined automatically by the SQL Server Database Engine itself, so that, you will not specify that column property in the column definition.

可以通过SELECT语句的列列表，WHERE或ORDER BY子句来引用Computed Column，但不能在INSERT或UPDATE语句中使用它，因为该值将自动计算。计算列不能在DEFAULT，FOREIGN KEY或NOT NULL约束中使用。如果用于定义“计算列”值的表达式是确定性的，则“计算列”可以包含在PRIMARY KEY或UNIQUE约束中。 SQL Server数据库引擎本身自动定义了计算列的可空性，因此，您将不会在列定义中指定该列属性。

Let us create two new tables in the SQLShackDemo testing database with a computed column in each table that calculate the age of the employee when he is employed to serve a certain survey required by the HR team. The first table CompanyEmployees will have a virtual Computed Column which value will be calculated at runtime, and the CompanyEmployees_Persisted table that have a PERSISTED Computed Column that will save the expression value physically in the table:

让我们在SQLShackDemo测试数据库中创建两个新表，每个表中都有一个计算列，用于计算员工受雇为人力资源团队进行某项调查时的年龄。第一个表CompanyEmployees将具有一个虚拟的计算列，该值将在运行时计算，而CompanyEmployees_Persisted表具有一个PERSISTED计算列，该列将在物理上将表达式值保存在表中：

 
USE [SQLShackDemo]
GO
CREATE TABLE [dbo].[CompanyEmployees](
	[EmpID] [int] IDENTITY (1,1) NOT NULL PRIMARY KEY,
	[EmpName] [varchar](50) NOT NULL,
	[EmpAddress] [varchar](50) NOT NULL,
	[EmpBirthDate] [datetime] NULL,
	[EmpEmploymentDate] [datetime] NULL,
	[EmpEmploymentAgeInYears] AS (DATEDIFF(YY,[EmpBirthDate],[EmpEmploymentDate])) 
) ON [PRIMARY]
 
GO
CREATE TABLE [dbo].[CompanyEmployees_Persisted](
	[EmpID] [int] IDENTITY (1,1) NOT NULL PRIMARY KEY,
	[EmpName] [varchar](50) NOT NULL,
	[EmpAddress] [varchar](50) NOT NULL,
	[EmpBirthDate] [datetime] NULL,
	[EmpEmploymentDate] [datetime] NULL,
	[EmpEmploymentAgeInYears] AS (DATEDIFF(YY,[EmpBirthDate],[EmpEmploymentDate])) PERSISTED
) ON [PRIMARY]
GO

The expression of the Computed Column can be also set and modified using the SQL Server Management Studio, by opening the table design window and checking the Computed Column Specification column property as follows:

还可以使用SQL Server Management Studio通过打开表设计窗口并按如下所示检查“计算列规范”列属性来设置和修改“计算列”的表达式：

After that, we will fill the first table CompanyEmployees with one hundred thousand records using ApexSQL Generate, taking into consideration that this tool will automatically know that the EmpEmploymentAgeInYears is a Computed Column and exclude it from the insertion process:

之后，我们将使用ApexSQL Generate填充第一个表CompanyEmployees十万条记录，并考虑到此工具将自动知道EmpEmploymentAgeInYears是Compute Column并将其从插入过程中排除：

We will fill the second table CompanyEmployees_Persisted with the same data from the first table in order to have a fair comparison during our test using the below T-SQL INSERT statement:

我们将使用来自第一个表的相同数据填充第二个表CompanyEmployees_Persisted，以便在测试期间使用下面的T-SQL INSERT语句进行公平的比较：

 
USE [SQLShackDemo]
GO
 
INSERT INTO [dbo].[CompanyEmployees_Persisted]
           ([EmpName]
           ,[EmpAddress]
           ,[EmpBirthDate]
           ,[EmpEmploymentDate])
     SELECT [EmpName]
           ,[EmpAddress]
           ,[EmpBirthDate]
           ,[EmpEmploymentDate]
     FROM [CompanyEmployees]
GO

As we mentioned previously, the Computed Column in the CompanyEmployees table is a virtual column that will be calculated at runtime, while the Computed Column in the CompanyEmployees_Persisted table is PERSISTED, with its value saved in the table itself, consuming more space from the table. If we try to query the sp_spaceused for both CompanyEmployees and CompanyEmployees_Persisted tables:

如前所述，CompanyEmployees表中的Computed Column是一个虚拟列，将在运行时计算，而CompanyEmployees_Persisted表中的Computed Column是PERSISTED，其值保存在表本身中，从而占用了表中的更多空间。如果我们尝试查询CompanyEmployees和CompanyEmployees_Persisted表的sp_spaceused：

 
sp_spaceused 'CompanyEmployees'
GO
sp_spaceused 'CompanyEmployees_Persisted'
GO

The result will show us that, the space consumed by storing the data in the CompanyEmployees_Persisted table (7056 KB) is larger than the space consumed by storing the data in the CompanyEmployees table (6656), with extra 400KB used to store the PERSISTED Computed Column values:

结果将向我们显示，将数据存储在CompanyEmployees_Persisted表中的空间消耗（7056 KB）大于通过将数据存储在CompanyEmployees表中的空间消耗（6656），另外还有400KB的空间用于存储PERSISTED计算列值：

Looking at the other side of the performance equation, calculating the values of the virtual Computed Column at the runtime will have an extra performance overhead. If we try to run the below SELECT statements on both tables:

从性能方程式的另一端来看，在运行时计算虚拟计算列的值将产生额外的性能开销。如果我们尝试在两个表上运行以下SELECT语句：

 
SELECT [EmpID], [Empname],[EmpEmploymentAgeInYears] 
FROM [dbo].[CompanyEmployees]
WHERE [EmpEmploymentAgeInYears]>=50
GO
SELECT [EmpID], [Empname],[EmpEmploymentAgeInYears] 
FROM [dbo].[CompanyEmployees_Persisted]
WHERE [EmpEmploymentAgeInYears]>=50

The execution plans comparison using the APEXSQL PLAN application will show us that, both SELECT statements will perform a Clustered Index Scan on the related tables with similar costs, with extra step performed on the table with the virtual Computed Column to calculate the Computed Column values as below:

使用APEXSQL PLAN应用程序进行的执行计划比较将向我们显示，两个SELECT语句都将以相似的成本对相关表执行聚簇索引扫描，并在具有虚拟计算列的表上执行额外的步骤以计算计算列值，如下所示：下面：

Also as expected, querying the CompanyEmployees table with the virtual Computed Columns will take more time (326 ms) than the querying the CompanyEmployees_Persisted table that took (303ms) with extra 23ms calculating the Computed Column values in the CompanyEmployees table:

同样符合预期，使用虚拟Compute Columns查询CompanyEmployees表将花费更多的时间（326 ms）比查询CompanyEmployees_Persisted表（303ms）花费更多的23ms来计算CompanyEmployees表中的Compute Column值：

To enhance the performance of the queries that reference to the Computed Colum, it is recommended to create an index on that Computed Column. There are number of requirements that should be met before adding the index.

为了提高引用“计算列”的查询的性能，建议在该“计算列”上创建索引。添加索引之前，应满足许多要求。

The first requirement is that; the Computed Column expression should be deterministic. This requires that all functions that are referenced by the expression are deterministic and precise, all columns that are referenced in the Computed Column expression must be from the same table that contains the Computed Column and the expression does not pull data from more than one row such as aggregating multiple rows.

第一个要求是：计算列表达式应该是确定性的 。这要求该表达式引用的所有函数都是确定性和精确的，在“计算列”表达式中引用的所有列必须来自包含“计算列”的同一表，并且该表达式不能从多于一行的数据中提取数据，例如汇总多行。

The second requirement for creating an index on a Computed Column is that the Computed Column expression should be precise. This requires that the expression data type is not Float or Real and the data type of the columns involved in the Computed Column expression is not Float or Real.

在计算列上创建索引的第二个要求是计算列表达式应精确。这要求表达式数据类型不是Float或Real，并且Computed Column表达式中涉及的列的数据类型不是Float或Real。

The third requirement is that the expression of the Computed Column cannot evaluate to the image, text or ntext data types. The Computed Column that is derived from these three datatypes can be involved in the non-clustered index as non-key columns in the include part of the index. Take into consideration that, if the Computed Column is PERSISTED, then you are able to create an index on the Computed Column if it is deterministic but not precise. In addition, if the Computed Column is PERSISTED and references a CLR function, you can create an index on that column without checking its deterministic, as the SQL Server Database Engine is not able to check if that function is truly deterministic and cannot prove with accuracy if a function that evaluates the computed column expressions is both deterministic and precise.

第三个要求是“计算列”的表达式不能求值为image，text或ntext 数据类型 。从这三种数据类型派生的计算列可以作为索引的包含部分中的非关键列包含在非聚集索引中。考虑到，如果“计算列”是“持久”的，则可以在“计算列”上创建索引（如果它是确定性的但不精确的）。另外，如果“计算列”是PERSISTED并引用了CLR函数，则您可以在该列上创建索引而无需检查其确定性，因为SQL Server数据库引擎无法检查该函数是否确实是确定性的并且无法准确证明如果计算计算的列表达式的函数既确定性又精确。

Let us check the IsComputed, IsDeterministic, IsPrecise and IsIndexable Computed Column properties from both tables created previously as follows:

让我们从先前创建的两个表中检查IsComputed，IsDeterministic，IsPrecise和IsIndexable Computed Column属性，如下所示：

 
SELECT 
     'CompanyEmployees_Persisted' AS TableName,
     COLUMNPROPERTY( OBJECT_ID('DBO.CompanyEmployees_Persisted'), 'EmpEmploymentAgeInYears','IsComputed') AS IsComputed,
     COLUMNPROPERTY( OBJECT_ID('DBO.CompanyEmployees_Persisted'), 'EmpEmploymentAgeInYears','IsDeterministic') AS IsDeterministic,
	 COLUMNPROPERTY( OBJECT_ID('DBO.CompanyEmployees_Persisted'), 'EmpEmploymentAgeInYears','IsPrecise') AS IsPrecise,
	 COLUMNPROPERTY( OBJECT_ID('DBO.CompanyEmployees_Persisted'), 'EmpEmploymentAgeInYears','IsIndexable') AS IsIndexable
GO
SELECT
     'CompanyEmployees' AS TableName,
     COLUMNPROPERTY( OBJECT_ID('DBO.CompanyEmployees'), 'EmpEmploymentAgeInYears','IsComputed') AS IsComputed,
	 COLUMNPROPERTY( OBJECT_ID('DBO.CompanyEmployees'), 'EmpEmploymentAgeInYears','IsDeterministic') AS IsDeterministic,
	 COLUMNPROPERTY( OBJECT_ID('DBO.CompanyEmployees'), 'EmpEmploymentAgeInYears','IsPrecise') AS IsPrecise,
	 COLUMNPROPERTY( OBJECT_ID('DBO.CompanyEmployees'), 'EmpEmploymentAgeInYears','IsIndexable') AS IsIndexable

The results from both tables show us that the EmpEmploymentAgeInYears column from these tables is Computed Column, and that the expression used to calculate its values is deterministic and precise, so that the result is that this column is indexable:

这两个表的结果表明，这些表中的EmpEmploymentAgeInYears列是Computed Column，并且用于计算其值的表达式是确定性和精确的，因此结果是此列可索引：

Now we will create a non-clustered index on the EmpEmploymentAgeInYears Computed Column in addition to the EmpName column as below:

现在，除了EmpName列外，我们还将在EmpEmploymentAgeInYears计算列上创建非聚集索引，如下所示：

 
CREATE NONCLUSTERED INDEX IX_CompanyEmployees_BirthMonth
ON dbo.CompanyEmployees (EmpEmploymentAgeInYears,Empname)
GO

Then trying to SELECT from that table without using the newly created non-clustered index in the first statement and using the newly created non-clustered index in the second one:

然后尝试从该表中进行SELECT，而不在第一条语句中使用新创建的非聚集索引，而在第二条语句中使用新创建的非聚集索引：

 
SELECT [EmpID], [Empname],[EmpEmploymentAgeInYears] 
FROM [dbo].[CompanyEmployees] WITH (INDEX(PK__CompanyEmp))
WHERE [EmpEmploymentAgeInYears]>=50
GO
SELECT [EmpID], [Empname],[EmpEmploymentAgeInYears] 
FROM [dbo].[CompanyEmployees]
WHERE [EmpEmploymentAgeInYears]>=50
GO

Wow. The execution plans comparison using the APEXSQL PLAN application show us a big enhancement when indexing the Computed Column, where the performance enhanced more than 5 times when querying the indexed Computed Column as follows:

哇。使用APEXSQL PLAN应用程序执行计划的比较向我们显示了对计算列进行索引时的重大改进，其中在查询索引的计算列时，性能提高了5倍以上，如下所示：

The enhancement is also made clear by checking the Time Statistics difference between the two executions. Where executing the query using the non-clustered index is better than executing it without that index by 1.2 times, as you can see from the following statistics:

通过检查两个执行之间的时间统计差异，也可以使增强功能更加清晰。从以下统计数据可以看出，使用非聚集索引执行查询比不使用非索引执行查询要好1.2倍。

结论 (Conclusion)

The SQL Server Computed Column is a special type of column that can be stored in a table if specified as PERSISTED or calculated at runtime when the column is used in the query. As with any other types of columns, you can index that column to enhance the search process. There are number of requirements that the Computed Column should meet in order to be indexed. During the demo of this article, we have checked these prerequisites before creating the index. After that, we have created an index on the Computed Column and saw clearly the performance enhancement when using that index by comparing execution plans before and after the index in addition to the enhancement in the execution time.

“ SQL Server计算列”是一种特殊类型的列，如果将其指定为PERSISTED或在查询中使用该列时在运行时计算，则可以存储在表中。与任何其他类型的列一样，您可以为该列编制索引以增强搜索过程。为了被索引，“计算列”应满足许多要求。在本文的演示过程中，我们在创建索引之前已经检查了这些先决条件。之后，我们在“计算列”上创建了一个索引，并通过比较索引前后的执行计划以及执行时间的增强，清楚地看到了使用该索引时的性能增强。

翻译自: https://www.sqlshack.com/how-to-create-indexes-on-sql-server-computed-columns/

culuo4781

关注

0
点赞
踩
0

收藏

觉得还不错? 一键收藏
0
评论
如何在SQL Server计算列上创建索引

A SQL Server Computed Column is a virtual column that is not stored physically on the table, unless it is previously specified as PERSISTED. A computed Column value is calculated using a specific...
复制链接

扫一扫