设计有效SQL Server非聚集索引

In the previous articles of this series (see below for the full index of articles), we went through the internal structure of SQL Server tables and indexes, listed a number of guidelines that help in designing a proper index, discussed the operations that can be performed on SQL Server indexes and finally showed how to design and create a SQL Server Clustered index to speed up data retrieval operations. In this article, we will see how to design an effective Non-clustered index that will improve the performance of frequently used queries that are not covered with a Clustered index and, in doing so, enhance the overall system performance.

在本系列的前几篇文章中(有关文章的完整索引,请参见下文),我们介绍了SQL Server表和索引的内部结构,列出了许多有助于设计适当索引的准则,并讨论了可以进行的操作。在SQL Server索引上执行,最后展示了如何设计和创建SQL Server群集索引以加快数据检索操作。 在本文中,我们将看到如何设计有效的非聚集索引,该索引将提高聚集索引未涵盖的常用查询的性能,并以此提高整体系统性能。

非聚集索引结构概述 (Non-clustered index structure overview)

A Non-clustered index is built using the same 8K-page B-tree structure that is used to build a Clustered index, except that the data and the Non-clustered index are stored separately. A Non-clustered index is different from a Clustered index in that, the underlying table rows will not be stored and sorted based on the Non-clustered key, and the leaf level nodes of the Non-clustered index are made of index pages instead of data pages. The index pages of the Non-clustered index contain Non-clustered index key values with pointers to the storage location of these rows in the underlying heap table or the Clustered index.

非聚集索引使用与用于建立聚集索引相同的8K页B树结构构建,除了数据和非聚集索引是分开存储的。 非聚集索引与聚集索引的不同之处在于,将不会基于非聚集键存储基础表行并对其进行排序,并且非聚集索引的叶级节点由索引页而不是索引页组成数据页。 非聚集索引的索引页包含非聚集索引键值,这些键值具有指向基础堆表或聚集索引中这些行的存储位置的指针。

If a Non-Clustered index is built over a heap table or view (read more about SQL Server indexed views, that have no Clustered index) the leaf level nodes of that index hold the index key values and Row ID (RID) pointers to the location of the rows in the heap table. The RID consists of the file identifier, the data page number, and the number of rows on that data page. On the other hand, if a Non-clustered index is created over a Clustered table, the leaf level nodes of that index contain Non-clustered index key values and clustering keys for the base table, that are the locations of the rows in the Clustered index data pages. If a Non-clustered index is built over a non-unique Clustered index, the leaf level nodes of the Non-clustered index will hold additional uniqueifier values of the data rows, that is added by the SQL Server Engine to ensure uniqueness of the Clustered index.

如果在堆表或视图(非SQL索引视图的更多信息,没有聚簇索引)上构建了非聚集索引,则该索引的叶级节点将保留索引键值和指向该索引的行ID(RID)指针。行在堆表中的位置。 RID由文件标识符,数据页号和该数据页上的行数组成。 另一方面,如果在聚簇表上创建了非聚簇索引,则该索引的叶级节点包含基表的非聚簇索引键值和聚簇键,它们是聚簇中行的位置索引数据页。 如果在非唯一聚集索引上构建了非聚集索引,则非聚集索引的叶级节点将保存数据行的其他唯一标识符值,该值由SQL Server引擎添加以确保簇的唯一性指数。

When submitting a query that searches for specific rows based on Non-clustered index key values, the SQL Server Query Optimizer will search for that key value in the Non-clustered index pages and use the row locator value to locate the requested row in the underlying table, then retrieve the requested records directly from the data storage location, speeding up the data retrieval process, as the Non-clustered index holds a full description for the data exact location in underlying table, based on the index key values.

提交基于非聚集索引键值搜索特定行的查询时,SQL Server查询优化器将在非聚集索引页中搜索该键值,并使用行定位符值在基础中找到请求的行表,然后直接从数据存储位置检索请求的记录,从而加快了数据检索过程,因为非聚集索引根据索引键值对基础表中的数据确切位置拥有完整的描述。

The below figure, from Microsoft Books Online, shows the structure of a Non-clustered index that is built over a Clustered or heap table as described previously. SQL Server allows us to create multiple Non-clustered indexes, up to 999 Non-clustered indexes, on each table, with index IDs values assigned to each index starting from 2 for each partition used by the index, as you can find in the sys.partitions table. Depending on the data type of the columns participating in Non-clustered index key, a SQL Server Non-clustered index will have one or more allocation units that are used to store and manage the index data. Minimally, each Non-clustered index will have the IN_ROW_DATA allocation unit to store the index data. Other special types of allocation units can be also used to store the Non-clustered index data, such as the LOB_DATA allocation unit that is used to store the large object data (LOB) and ROW_OVERFLOW_DATA allocation unit that is used to store the columns with variable length exceeds the 8,060-byte size limit of the row.

下图来自Microsoft联机丛书 ,显示了如前所述在群集表或堆表上构建的非群集索引的结构。 SQL Server允许我们在每个表上创建多个非聚集索引,最多可创建999个非聚集索引,并为索引所使用的每个分区的索引分配从2开始的索引ID值,如您在sys中所发现的那样.partitions表。 根据参与非聚集索引键的列的数据类型,SQL Server非聚集索引将具有一个或多个用于存储和管理索引数据的分配单元。 最少,每个非聚集索引都将具有IN_ROW_DATA分配单元来存储索引数据。 其他特殊类型的分配单元也可以用于存储非聚集索引数据,例如用于存储大对象数据(LOB)的LOB_DATA分配单元和用于存储具有可变列的列的ROW_OVERFLOW_DATA分配单元长度超过了该行的8,060字节大小限制。

非聚集索引设计注意事项 (Non-clustered index design considerations)

The main goal of creating a Non-clustered index is to improve query performance by speeding up the data retrieval process. Although SQL Server allows us to create multiple Non-clustered indexes, up to 999 Non-clustered on each table that can cover our queries, any index added to the table will negatively impact data modification performance on that table. This is due to the fact that, when you modify a key column in the underlying table, the Non-clustered indexes should be adjusted appropriately as well.

创建非聚集索引的主要目标是通过加快数据检索过程来提高查询性能。 尽管SQL Server允许我们创建多个非聚集索引,但每个表上最多可以包含999个非聚集索引,这些索引可以覆盖我们的查询,但是添加到表中的任何索引都会对该表的数据修改性能产生负面影响。 这是由于以下事实:在修改基础表中的键列时,也应适当调整非聚集索引。

When designing a Non-clustered index, you should consider the type of the workload performed on your database or table by compromising between the benefits taken from creating a new index and the data modification overhead that will be caused by this index creation. It is recommended to create a minumu of narrow indexes, with a minimum number of columns participating in the index key, on the heavily updated table. A table that has a large number of rows with a low data modification requirement can heavily benefit from more Non-clustered indexes with composite index keys, that contain more than one column in the index key, that cover all columns in the query to improve the data retrieval performance.

在设计非聚集索引时,应通过在创建新索引所带来的好处与该索引创建将导致的数据修改开销之间折中考虑,来考虑在数据库或表上执行的工作负载的类型。 建议在高度更新的表上创建最少的窄索引,并在索引键中包含最少的列。 具有大量行且数据修改要求低的表可以从具有复合索引键的更多非聚簇索引中受益匪浅,这些索引索引的索引键中包含多个列,这些索引覆盖了查询中的所有列,从而改善了数据检索性能。

When the index contains all columns required by the query, the SQL Server Query Optimizer will retrieve all column values from the index itself, without the need to perform lookup operations to retrieve the rest of columns in the underlying table or the Clustered index, reducing the costly disk I/O operations. In addition, if the Non-clustered index is built over a Clustered table, the columns that participate in the Clustered index will be appended automatically to the end of each Non-clustered index on that Clustered table, without the need to include these columns to the Non-clustered index key or non-key columns to cover the queries.

当索引包含查询所需的所有列时,SQL Server查询优化器将从索引本身检索所有列值,而无需执行查找操作来检索基础表或群集索引中的其余列,从而减少了昂贵的磁盘I / O操作。 另外,如果非聚簇索引建立在聚簇表之上,则参与聚簇索引的列将自动附加到该聚簇表上每个非聚簇索引的末尾,而无需将这些列包括在内。非聚集索引键或非键列来覆盖查询。

Rather than creating a Non-clustered index with a wide key, large columns that are used to cover the query can be included to the Non-clustered index as non-key columns, up to 1023 non-key columns, using the INCLUDE clause of the CREATE INDEX T-SQL statement, that is introduced in SQL Server 2005 version, with a minimum of one key column. The INCLUDE feature extends the functionality of the Non-clustered index, by allowing us to cover more queries by adding the columns as non-key columns to be stored and sorted only in the leaf level of the index, without considering that columns values in the root and intermediate levels of the Non-clustered index. In this case, the SQL Server Query Optimizer will locate all required columns from that index, without the need for any extra lookups. Using the included columns can help to avoid exceeding the Non-clustered size limit of 900 bytes and 16 columns in the index key, as the SQL Server Database Engine will not consider the columns in the Non-clustered index non-key when calculating the size and number of columns of the index key. In addition, SQL Server allows us to include the columns with data types that are not allowed in the index key, such as VARCHAR(MAX), NVARCHAR(MAX), text, ntext and image, as Non-clustered index non-key columns. For more information about including columns to the Non-clustered index key, review SQL Server non-clustered indexes with included columns.

除了使用宽键创建非聚集索引外,使用以下内容的INCLUDE子句,用于覆盖查询的大列可以作为非关键列包含在非聚集索引中,最多1023个非关键列。 SQL Server 2005版本中引入的CREATE INDEX T-SQL语句,至少包含一个键列。 INCLUDE功能扩展了非聚集索引的功能,它允许我们通过将列添加为仅在索引叶级存储和排序的非关键列来覆盖更多查询,而无需考虑索引中的列值。非聚集索引的根和中间级别。 在这种情况下,SQL Server查询优化器将找到该索引中所有必需的列,而无需任何额外的查找。 使用包含的列有助于避免超出900字节的非聚集大小限制和索引键中的16列 ,因为SQL Server数据库引擎在计算大小时将不考虑非聚集索引非键中的列和索引键的列数。 此外,SQL Server允许我们将具有索引键中不允许的数据类型的列(如VARCHAR(MAX),NVARCHAR(MAX),text,ntext和image)包括为非聚集索引非键列。 有关将列包括到非聚集索引键的更多信息,请查看带有包含的列SQL Server非聚集索引

With all these capabilities provided by SQL Server, it is highly recommended to avoid adding too many key or non-key columns to the Non-clustered index that are not required by the queries. This is due to the large disk space that is required to store that index and the large number of pages required to store the index data, because adding too many columns to the index will result with fewer number of rows that can be fit in each data page, increasing the I/O overhead and reducing the caching efficiency. You can also imagine the data modification overhead resulted from such large indexes.

利用SQL Server提供的所有这些功能,强烈建议避免向查询的非聚集索引中添加过多的键或非键列。 这是由于存储该索引所需的磁盘空间较大,以及存储索引数据所需的页面数较大,因为向索引中添加过多列将导致可容纳在每个数据中的行数减少页面,增加了I / O开销并降低了缓存效率。 您也可以想象如此大的索引导致数据修改开销。

The candidate columns for the Non-clustered index key are the ones that are frequently involved in the GROUP BY clause or in the JOIN or WHERE conditions, that will cover the submitted queries and return exact match values, rather than returning a large set of data. The semi-unique columns that have a large number of distinct values are good candidates also as Non-clustered index key columns. For the column that has few numbers of distinct values, such as the Gender column, you can take benefits from creating a filtered index, as we will see in the next article.

非聚集索引键的候选列是GROUP BY子句或JOINWHERE条件中经常涉及的那些列,这些列将覆盖提交的查询并返回精确匹配值,而不是返回大量数据。 有大量 不同半唯一列都不错的候选人也作为非聚集索引键列。 对于具有少量不同值的列(例如“性别”列),您可以从创建过滤索引中受益,这将在​​下一篇文章中看到。

非聚集索引实现 (Non-clustered index implementation)

At this point, we are familiar with Non-clustered index structure and the guidelines that should be followed when designing a Non-clustered index. Now we will learn how to implement a Non-clustered index.

在这一点上,我们熟悉非聚集索引的结构以及设计非聚集索引时应遵循的准则。 现在,我们将学习如何实现非聚集索引。

When you create a UNIQUE constraint, a unique Non-clustered index will be created automatically to enforce that constraint. Non-clustered indexes can be created independently of the constraints using the SQL Server Management Studio New Index dialog box or using the CREATE INDEX T-SQL command. To be able to create a Non-clustered index, you should be a member of the sysadmin fixed server role or the db_ddladmin and db_owner fixed database roles.

当您创建UNIQUE约束时,将自动创建一个唯一的非聚集索引以强制执行该约束。 可以使用SQL Server Management Studio的“新建索引”对话框或使用CREATE INDEX T-SQL命令独立于约束来创建非聚集索引。 为了能够创建非聚集索引,您应该是sysadmin固定服务器角色或db_ddladmin和db_owner固定数据库角色的成员。

Let us create a new heap table to be used in our demo, using the CREATE TABLE T-SQL statement below:

让我们使用下面的CREATE TABLE T-SQL语句创建一个新的堆表以在演示中使用:

USE SQLShackDemo
GO
CREATE TABLE NonClusteredIndexDemo
(
  ID INT IDENTITY (1,1) NOT NULL,
  StudentName NVARCHAR(50)  NULL,
  STDBirthDate DATETIME  NULL,
  STDPhone VARCHAR(20) NULL,
  STDAddress NVARCHAR(MAX) NULL
 )

Non-clustered indexes can be created using SSMS by expanding the Tables folder under your database. To accomplish this, expand the table on which you plan to create the Non-clustered index on, then right-click on the Indexes node under your table and choose to create the Non-Clustered Index type from the New Index option, as shown below:

可以使用SSMS通过展开数据库下的Tables文件夹来创建非聚集索引。 为此,请展开计划在其上创建非聚集索引的表,然后右键单击表下方的“ 索引”节点,然后从“ 新建索引”选项中选择创建“ 非聚集索引”类型,如下所示:

In the opened New Index window, you will see that both the name of the table, on which the index will be created, and the “Non-Clustered” type of the index is filled automatically, with no option to change it. What is required from your side is to provide a suggested name for that index, following a specific naming convention, if the index key values will be unique or not, the column or list of columns that will participate in that index key, and finally the list of columns that will be included to the Non-clustered index as non-key columns, by clicking on the Add button, as shown below:

在打开的“新建索引”窗口中,您将看到将自动创建要在其上创建索引的表的名称以及该索引的“非聚集”类型,没有选项可以对其进行更改。 从您的角度出发,需要按照特定的命名约定为该索引提供建议的名称,如果索引键值是否唯一,则将参与该索引键的列或列的列表,最后是单击“ 添加”按钮,将这些列作为非关键列包含在非聚集索引中的列的列表,如下所示:

The Options tab of the New Index dialog box allows you also to set the different index creation options that control the index creation performance, which is described deeply in the previous article, as shown below:

“新建索引”对话框的“选项”选项卡还允许您设置控制索引创建性能的不同索引创建选项,这在上一篇文章中已进行了详细介绍,如下所示:

Another way to create the Non-clustered index using SSMS is using the Table Designer. Browse the table on which the index will be created then right-clicking on that table and choose the Design option as shown below:

使用SSMS创建非聚集索引的另一种方法是使用表设计器。 浏览将在其上创建索引的表,然后右键单击该表并选择“设计”选项,如下所示:

From the opened Table Designer window, right-click anywhere and choose the Indexes/Keys… option to open the index creation dialog box, as shown below:

在打开的“表设计器”窗口中,右键单击任何地方,然后选择“ 索引/键…”选项以打开索引创建对话框,如下所示:

On the opened window, click on the Add bottom to add a new Non-clustered index. Set the Create As Clustered option to “No”, provide a name for the index, if the key values of the Non-clustered index will be unique or not, the column or columns that will participate in the index key, with the required order, as clear below:

在打开的窗口中,单击添加底部以添加新的非聚集索引。 将“作为集群创建的选项”设置为“否”,为索引提供名称,如果非集群索引的键值是唯一的,则将按要求的顺序参与索引键的一列或多列,如下所示:

From the same window, expand the Fill Specifications in order to specify the non-key columns that will be included in the Non-clustered index and the different index creation options, as shown below:

在同一窗口中,展开“ 填充规格” ,以指定将包含在“非聚集”索引中的非关键列以及不同的索引创建选项,如下所示:

A Non-clustered index can be also created using the CREATE NONCLUSTERED INDEX command, by providing the name of the index, the name of the table on which the index will be created, the index key uniqueness and the column or list of columns that will participate in the index key and the non-key columns, optionally, in the INCLUDE clause, as in the command below:

也可以使用CREATE NONCLUSTERED INDEX命令创建非聚集索引,方法是提供索引名称,将在其上创建索引的表的名称,索引键唯一性以及将要创建的列或列的列表参与索引键和非键列(可选),包括在INCLUDE子句中,如以下命令所示:

CREATE NONCLUSTERED INDEX IX_NonClusteredIndexDemo_StudentName 
ON NonClusteredIndexDemo (StudentName)
WITH (ONLINE = ON , FILLFACTOR=90)

By default, the CREATE INDEX T-SQL statement will create a Non-clustered index if the type of the index is not specified. Consider setting the specified index creation options in the previous CREATE INDEX statement that affect the performance of the index creation process. The ONLINE option allows concurrent users access to the underlying table or the Clustered index data during the Non-clustered index creation process, where the FILLFACTOR option is used to set the percentage of free space that will be left in the leaf level nodes of the Non-clustered index during the index creation, in order to minimize the page split and fragmentation performance issues.

默认情况下,如果未指定索引类型,则CREATE INDEX T-SQL语句将创建非聚集索引。 考虑在前面的CREATE INDEX语句中设置指定的索引创建选项,这些选项会影响索引创建过程的性能。 ONLINE选项允许并发用户在非聚集索引创建过程中访问基础表或聚集索引数据,其中FILLFACTOR选项用于设置将保留在Non的叶级节点中的可用空间的百分比在索引创建过程中使用-clustered索引,以最大程度地减少页面拆分和碎片性能问题。

性能比较 (Performance comparison)

Before starting the performance comparison examples, we will fill the previously created table with 200K rows, using ApexSQL Generate, as shown below:

在开始性能比较示例之前,我们将使用A pexSQL Generate ,用200K行填充先前创建的表,如下所示:

堆表上的非聚集索引 (Non-clustered index over heap table)

The NonClusteredIndexDemo heap table is ready now, filled with 200K rows and has no index created on that table, assuming that the previous CREATE INDEX statement is not executed, as shown below:

NonClusteredIndexDemo堆表现已准备就绪,已填充200K行,并且在该表上未创建索引,假定之前的CREATE INDEX语句未执行,如下所示:

If we try to execute the SELECT statement below, after enabling the IO and TIME statistics and the actual execution plan on the query, as shown below:

如果我们尝试在查询上启用IO和TIME统计信息以及实际执行计划后执行以下SELECT语句,如下所示:

SET STATISTICS TIME ON
SET STATISTICS IO ON
 
SELECT * FROM [dbo].[NonClusteredIndexDemo] 
WHERE StudentName ='Edward' AND STDAddress LIKE '%Route%'

The TIME and IO statistics show that 2109 logical read operations are performed, 47ms from the CPU time is consumed in 204ms of time to retrieve the requested data as shown in the snapshot below:

TIME和IO统计数据表明,执行了2109个逻辑读取操作,从CPU时间中消耗了47ms的时间,即204ms的时间用于检索请求的数据,如下面的快照所示:

Checking the execution plan generated after executing the query, a Full Scan will be performed on that heap table in order to retrieve the requested data, as below:

检查执行查询后生成的执行计划,将对该堆表执行完全扫描,以检索请求的数据,如下所示:

If we create a Non-Clustered index over that heap table on the StudentName and STDAddress columns used in the WHERE clause, using the CREATE INDEX T-SQL statement below:

如果我们在WHERE子句中使用的StudentName和STDAddress列上的那个堆表上创建非聚集索引,请使用以下CREATE INDEX T-SQL语句:

CREATE NONCLUSTERED INDEX [IX_NonClusteredIndexDemo_StudentName]
ON [dbo].[NonClusteredIndexDemo] ([StudentName],[STDAddress])
WITH ( ONLINE=ON, FILLFACTOR=90)
GO

The CREATE INDEX statement will fail, as the STDAddress column, with datatype NVARCHAR(MAX), cannot be added to the Non-clustered index as a key column, as shown in the error message below:

CREATE INDEX语句将失败,因为不能将数据类型为NVARCHAR(MAX)的STDAddress列作为键列添加到非聚集索引中,如以下错误消息所示:

If we try again to create a Non-Clustered index over that heap table on the StudentName only, using the CREATE INDEX T-SQL statement below:

如果我们再次尝试仅使用StudentName在该堆表上创建一个非聚集索引,请使用以下CREATE INDEX T-SQL语句:

CREATE NONCLUSTERED INDEX IX_NonClusteredIndexDemo_StudentName 
ON NonClusteredIndexDemo (StudentName)
WITH (ONLINE = ON , FILLFACTOR=90)

And execute the same SELECT statement, that searches based on the StudentName and STDAdderss columns:

并执行相同的SELECT语句,该语句基于StudentName和STDAdderss列进行搜索:

SELECT * FROM [dbo].[NonClusteredIndexDemo] 
WHERE StudentName ='Edward' AND STDAddress LIKE '%Route%'

Checking the TIME and IO statistics, you will see that, 2109 logical read operations are performed, 47ms from the CPU time are consumed in 110ms of time to retrieve the requested data, which is somehow similar to the values generated after retrieving the data without an index, but little bit faster, as shown in the snapshot below:

检查TIME和IO统计信息,您将看到执行了2109次逻辑读取操作,从CPU时间中消耗了47ms的时间(即110ms的时间)来检索请求的数据,这在某种程度上类似于在不经过任何操作的情况下检索数据后生成的值。索引,但速度更快,如下面的快照所示:

The execution plan generated after executing the query will show that a Full table scan will be also performed on that table, without considering the created Non-clustered index, with a message indicating that there is a missing index that can enhance the performance of the query with about 72%. The script to create that index can be shown by right-clicking on the execution plan and choose the Missing Index Details option, as shown below:

执行查询后生成的执行计划将显示,将对该表执行全表扫描,而无需考虑创建的非聚集索引,并显示一条消息,指出缺少索引可以增强查询的性能。约占72%。 右键单击执行计划并选择Missing Index Details选项,可以显示用于创建该索引的脚本,如下所示:

The previous missing index message includes all the columns requested by the query in the suggested index non-key columns. Let us move on gradually in the index creation process, by including only the ID and the STDAddress columns in the index non-key columns, using the CREATE INDEX T-SQL statement below:

先前的缺少索引消息包括建议的索引非关键字列中查询所请求的所有列。 让我们通过下面的CREATE INDEX T-SQL语句在索引非键列中仅包括ID和STDAddress列,从而逐步进行索引创建过程:

USE [SQLShackDemo]
GO
CREATE NONCLUSTERED INDEX [IX_NonClusteredIndexDemo_StudentName]
ON [dbo].[NonClusteredIndexDemo] ([StudentName])
INCLUDE ([ID],[STDAddress])
WITH (DROP_EXISTING=ON, ONLINE=ON, FILLFACTOR=90)
GO

Then execute the same SELECT statement below:

然后执行以下相同的SELECT语句:

SELECT * FROM [dbo].[NonClusteredIndexDemo] 
WHERE StudentName ='Edward' AND STDAddress LIKE '%Route%'

You will see from the TIME and IO statistics that, only 37 logical read operations are performed, 15ms from the CPU time is consumed in 76ms of time to retrieve the requested data, which is clearly better than the previous result generated by fully scanning the underlying table, as shown in the snapshot below:

您将从TIME和IO统计信息中看到,仅执行了37次逻辑读取操作,从CPU时间中花费了15ms的时间来获取所请求的数据需要76ms的时间,这明显好于通过全面扫描底层数据而产生的先前结果表,如下面的快照所示:

The execution plan, generated after executing the query, also shows that an Index Seek operation is performed to retrieve the data directly from the Non-clustered index, without the need to scan the underlying table. The only issue that can be derived from the plan is that RID Lookup heavy and costly operation is performed to retrieve the rest of columns that are not available in the index from the heap table based on the ID of each row, consuming most of the planned weight, as shown below:

执行查询后生成的执行计划还表明,执行了索引查找操作可以直接从非聚集索引中检索数据,而无需扫描基础表。 可以从计划中得出的唯一问题是,执行RID查找繁琐而昂贵的操作是根据每行的ID从堆表中检索索引中不可用的其余列,从而消耗了大部分计划重量,如下图:

If we create the covering index that is suggested by the SQL Server and include all the query columns, using the CREATE INDEX … WITH DROP_EXISTING option to drop the old one:

如果我们创建SQL Server建议的覆盖索引并包括所有查询列,请使用CREATE INDEX…WITH DROP_EXISTING选项删除旧的索引:

USE [SQLShackDemo]
GO
CREATE NONCLUSTERED INDEX [IX_NonClusteredIndexDemo_StudentName]
ON [dbo].[NonClusteredIndexDemo] ([StudentName])
INCLUDE ([ID],[STDBirthDate],[STDPhone],[STDAddress])
WITH (DROP_EXISTING=ON, ONLINE=ON, FILLFACTOR=90)
GO

And execute the same SELECT statement:

并执行相同的SELECT语句:

SELECT * FROM [dbo].[NonClusteredIndexDemo] 
WHERE StudentName ='Edward' AND STDAddress LIKE '%Route%'

The TIME and IO statistics will show that only 20 logical read operations are performed, 0ms from the CPU time is consumed in 65ms of time to retrieve the requested data, which is clearly better than using a Non-clustered index that does not cover all requested columns, as shown in the snapshot below:

TIME和IO统计信息将显示仅执行20个逻辑读取操作,从CPU时间中消耗0ms的时间需要65ms来检索请求的数据,这显然比使用不覆盖所有请求的非聚集索引要好列,如下面的快照所示:

And the execution plan generated after executing the query shows that an Index Seek operation will be performed to retrieve the data from the Non-clustered index directly, without the need to visit the underlying heap table, as shown below

执行查询后生成的执行计划表明,将执行索引查找操作以直接从非聚集索引中检索数据,而无需访问基础堆表,如下所示

聚簇表上的非聚簇索引 (Non-clustered index over clustered table)

In the previous scenario, we discussed building a Non-clustered index over a heap table. Now we will see how the query will behave when creating Non-clustered index over a Clustered table. Let us first drop the existing Non-clustered index before creating a Clustered index, to skip the Non-clustered indexes rebuild during the Clustered index creation process.

在前面的场景中,我们讨论了在堆表上构建非聚集索引。 现在,我们将看到在集群表上创建非集群索引时查询的行为。 让我们首先在创建聚簇索引之前删除现有的非聚簇索引,以跳过在聚簇索引创建过程中重建非聚簇索引。

DROP INDEX [IX_NonClusteredIndexDemo_StudentName] ON [dbo].[NonClusteredIndexDemo]

Now the table has no index. We will create a new Clustered index on the ID column to covert that table to a Clustered table that is sorted based on the ID column, using the CREATE INDEX statement below:

现在该表没有索引。 我们将使用下面的CREATE INDEX语句在ID列上创建一个新的聚簇索引,以将该表转换为基于ID列排序的聚簇表:

CREATE UNIQUE CLUSTERED INDEX IX_NonClusteredIndexDemo_ID ON NonClusteredIndexDemo(ID)
WITH (ONLINE=ON, FILLFACTOR=90)

If we execute the same SELECT statement that searches based on the StudentName and STDAddress columns values:

如果我们执行相同的SELECT语句,该语句基于StudentName和STDAddress列的值进行搜索:

SELECT * FROM [dbo].[NonClusteredIndexDemo] 
WHERE StudentName ='Edward' AND STDAddress LIKE '%Route%'

You will see from the Time and IO statistics that, the query will run faster than reading from the heap table, as the data pages are linked and sorted now. Where 2330 logical read operations are performed, 47ms from the CPU time is consumed in 106ms of time to retrieve the requested data, as shown below:

从时间和IO统计信息中可以看到,查询现在的运行速度比从堆表中读取数据要快,因为现在已链接和排序了数据页。 在执行2330个逻辑读取操作的情况下,从CPU时间开始的47ms消耗了106ms的时间来检索请求的数据,如下所示:

The execution plan generated after executing the query shows that the SQL Server scans the Clustered index, which is a sorted copy of the table, to retrieve the data. You can see that SQL Server scans the whole table records, 200K rows, to get the requested data, but faster than the table scan as it is sorted, as shown below:

执行查询后生成的执行计划表明,SQL Server扫描聚簇索引(该聚簇索引是表的排序副本)以检索数据。 您可以看到SQL Server扫描整个表记录(200K行)以获取请求的数据,但是比对表进行排序时要快,如下所示:

If we create a Non-clustered index on the StudentName column that partially covers the ID and STDAddress columns of the query, using the CREATE INDEX T-SQL statement below:

如果我们在StudentName列上创建一个非聚集索引,该索引部分覆盖了查询的ID和STDAddress列,请使用以下CREATE INDEX T-SQL语句:

USE [SQLShackDemo]
GO
CREATE NONCLUSTERED INDEX [IX_NonClusteredIndexDemo_StudentName]
ON [dbo].[NonClusteredIndexDemo] ([StudentName])
INCLUDE ([ID],[STDAddress])
WITH (ONLINE=ON, FILLFACTOR=90)
GO

Then execute the previous SELECT statement below:

然后执行下面的上一条SELECT语句:

SELECT * FROM [dbo].[NonClusteredIndexDemo] 
WHERE StudentName ='Edward' AND STDAddress LIKE '%Route%'

The Time and IO statistics will show that only 84 logical read operations are performed, 15ms from the CPU time is consumed in 68ms of time to retrieve the requested data, which is clearly faster than using the Clustered index only, as shown below:

时间和IO统计信息将显示仅执行了84次逻辑读取操作,从68ms的时间中消耗了CPU时间15ms的时间来检索请求的数据,这显然比仅使用聚簇索引要快,如下所示:

The generated execution plan also shows that an Index Seek operation will be performed to retrieve the data directly from the index, without the need to scan the Clustered index. As the Non-clustered index is not fully covering the query, the Key Lookup heavy and costly operation is performed to retrieve the rest of columns that are not available in the index from the Clustered index, based on the Clustered key of each row, consuming most of the planned weight, as shown below:

生成的执行计划还表明,将执行索引搜索操作以直接从索引中检索数据,而无需扫描聚簇索引。 由于非聚集索引没有完全覆盖查询,因此将执行键查找繁重且昂贵的操作,以基于每行的聚集键从聚集索引中检索索引中不可用的其余列,大部分计划重量,如下所示:

Creating a new Non-clustered index that covers all the query columns, using the CREATE INDEX … WITH DROP_EXISTING option to drop the old one:

使用CREATE INDEX…WITH DROP_EXISTING选项创建一个覆盖所有查询列的新非聚集索引,以删除旧的索引:

USE [SQLShackDemo]
GO
CREATE NONCLUSTERED INDEX [IX_NonClusteredIndexDemo_StudentName]
ON [dbo].[NonClusteredIndexDemo] ([StudentName])
INCLUDE ([ID],[STDBirthDate],[STDPhone],[STDAddress])
WITH (DROP_EXISTING=ON, ONLINE=ON, FILLFACTOR=90)
GO

Then execute the previous SELECT statement:

然后执行前面的SELECT语句:

SELECT * FROM [dbo].[NonClusteredIndexDemo] 
WHERE StudentName ='Edward' AND STDAddress LIKE '%Route%'

You can see from the TIME and IO statistics that, only 18 logical read operations are performed, 0ms from the CPU time is consumed in 59ms of time to retrieve the requested data, that is obviously faster than using a Non-clustered index that does not cover all requested columns, as shown in the snapshot below:

您可以从TIME和IO统计信息中看到,仅执行18个逻辑读操作,从CPU时间中花费0ms花费了59ms的时间来检索所请求的数据,这显然比使用非聚集索引的速度要快。涵盖所有请求的列,如下面的快照所示:

From the generated execution plan, you can see that an Index Seek operation will be performed to retrieve all the data from the created Non-clustered index, without visiting the Clustered index, as shown below:

从生成的执行计划中,您可以看到将执行索引查找操作,以从创建的非聚集索引中检索所有数据,而无需访问聚集索引,如下所示:

数据修改开销 (Data modification overhead)

The Non-clustered is very useful in speeding up the data retrieval process, as we saw from the previous examples. On the other hand, each created index will add extra overhead to data modification operations. Let us check the index overhead on the data modification operations. Assume that we drop our testing table and create an empty heap copy of that table, using the script below:

从前面的示例中可以看到,非集群对加快数据检索过程非常有用。 另一方面,每个创建的索引都会为数据修改操作增加额外的开销。 让我们检查数据修改操作的索引开销。 假设我们使用以下脚本删除测试表并创建该表的空堆副本:

USE SQLShackDemo
GO
DROP TABLE NonClusteredIndexDemo
GO
CREATE TABLE NonClusteredIndexDemo
(
  ID INT IDENTITY (1,1) NOT NULL,
  StudentName NVARCHAR(50)  NULL,
  STDBirthDate DATETIME  NULL,
  STDPhone VARCHAR(20) NULL,
  STDAddress NVARCHAR(MAX) NULL
 )

Then we will fill the heap table with 200K rows, using ApexSQL Generate. You will see that it will take 4.53 seconds to fill the table, as shown below:

然后,我们将使用ApexSQL Generate将20万行填充到堆表中。 您将看到填写表格需要4.53秒,如下所示:

If you truncate the table and create a new Non-clustered index on that table, using the T-SQL script below:

如果截断表并使用以下T-SQL脚本在该表上创建新的非聚集索引:

TRUNCATE TABLE [NonClusteredIndexDemo]
GO
CREATE NONCLUSTERED INDEX [IX_NonClusteredIndexDemo_StudentName]
ON [dbo].[NonClusteredIndexDemo] ([StudentName])
INCLUDE ([ID],[STDBirthDate],[STDPhone],[STDAddress])
WITH ( ONLINE=ON, FILLFACTOR=90)
GO

And try now to fill the table with 200K rows again, using ApexSQL Generate. You will see that it will take 5.01 seconds to fill the table, with about 0.48 seconds overhead due to writing to both the underlying table and the Non-clustered index, as shown below:

现在,使用ApexSQL Generate再次尝试用200K行填充表。 您将看到填充表将花费5.01秒,由于同时写入基础表和非聚集索引,因此开销约为0.48秒,如下所示:

You can imagine the extra data modification overhead that will be caused by adding other indexes to that table. Let us take another example. We will drop the previously created Non-clustered index and try to update the addresses of specific students, using the T-SQL script below:

您可以想象,由于向该表添加其他索引而导致的额外数据修改开销。 让我们再举一个例子。 我们将使用以下T-SQL脚本删除之前创建的非聚集索引,并尝试更新特定学生的地址:

DROP INDEX [IX_NonClusteredIndexDemo_StudentName]
ON [dbo].[NonClusteredIndexDemo]
 
UPDATE [dbo].[NonClusteredIndexDemo]  SET STDAddress='335 Delaware Avenue'
WHERE  STDAddress LIKE '%Delaware%'

You will see from the IO and Time statistics show that the SQL Server Engine will perform 2117 logical reads, consume 375ms of the CPU time within 374ms, to update the table that has no index, as shown below:

您将从IO和时间统计信息中看到,SQL Server Engine将执行2117个逻辑读取,在374ms内消耗375ms的CPU时间,以更新没有索引的表,如下所示:

If we create the covering Non-clustered index again using the CREATE INDEX T-SQL statement below:

如果我们使用下面的CREATE INDEX T-SQL语句再次创建覆盖的非聚集索引:

CREATE NONCLUSTERED INDEX [IX_NonClusteredIndexDemo_StudentName]
ON [dbo].[NonClusteredIndexDemo] ([StudentName])
INCLUDE ([ID],[STDBirthDate],[STDPhone],[STDAddress])
WITH ( ONLINE=ON, FILLFACTOR=90)
GO

Then execute a new UPDATE statement:

然后执行一个新的UPDATE语句:

UPDATE [dbo].[NonClusteredIndexDemo]  SET STDAddress='336 Delaware Avenue'
WHERE  STDAddress LIKE '%Delaware%'

The IO and Time statistics will show that, the SQL Server Engine will perform 7876 logical reads on the main table and the Non-clustered index pages “Worktable”, consume 391ms of the CPU time within 398ms, to update the table that has no index, with all counters larger than updating the table that has no index, as shown below:

IO和时间统计信息表明,SQL Server引擎将在主表和非聚集索引页“ Worktable”上执行7876逻辑读取,在398ms内消耗391ms的CPU时间,以更新没有索引的表,并且所有计数器都大于更新没有索引的表,如下所示:

The previous statistics show the overhead caused by performing an INSERT and an UPDATE operation one time. You can imagine the overhead caused on a table with heavy INSERT, UPDATE and DELETE requirements, with multiple numbers of Non-clustered indexes. But again, you need to compromise between data retrieval and the data modification operations that are performed on your table, before planning to create a new index on that table.

先前的统计信息显示了一次执行一次INSERT和UPDATE操作所导致的开销。 您可以想象在具有大量INSERT,UPDATE和DELETE要求且具有多个非聚集索引的表上造成的开销。 但是同样,在计划在该表上创建新索引之前,需要在数据检索和对表执行的数据修改操作之间做出折衷。

In this article, we tried to cover all aspects of the Non-Clustered index concept, theoretically and practically. In the next article, we will go through the other types of the SQL Server Indexes. Stay tuned!

在本文中,我们尝试在理论上和实践上涵盖非聚集索引概念的所有方面。 在下一篇文章中,我们将介绍其他类型SQL Server索引。 敬请关注!

目录 (Table of contents)

SQL Server indexes – series intro
SQL Server table structure overview
SQL Server index structure and concepts
SQL Server index design basics and guidelines
SQL Server index operations
Designing effective SQL Server clustered indexes
Designing effective SQL Server non-clustered indexes
Working with different SQL Server indexes types
Tracing and tuning queries using SQL Server indexes
Gathering SQL Server index statistics and usage information
Maintaining SQL Server Indexes
Top 25 interview questions and answers about SQL Server indexes
SQL Server索引–系列介绍
SQL Server表结构概述
SQL Server索引结构和概念
SQL Server索引设计基础和准则
SQL Server索引操作
设计有效SQL Server群集索引
设计有效SQL Server非聚集索引
使用不同SQL Server索引类型
使用SQL Server索引跟踪和调整查询
收集SQL Server索引统计信息和使用情况信息
维护SQL Server索引
有关SQL Server索引的25个最佳面试问答

翻译自: https://www.sqlshack.com/designing-effective-sql-server-non-clustered-indexes/

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值