sql 查询数据库索引重建_不良的数据库索引– SQL查询性能的杀手–建议

最新推荐文章于 2023-07-17 12:46:31 发布

culuo4781

最新推荐文章于 2023-07-17 12:46:31 发布

阅读量215

点赞数

文章标签：数据库 mysql java sql oracle

原文链接：https://www.sqlshack.com/poor-database-indexing-sql-query-performance-killer-recommendations/

版权

sql 查询数据库索引重建

previous article, we explained what clustered and nonclustered indexes were, and showed how to create them. In this article, we will explain what poor indexing is and give general recommendations for table indexing and selecting the right columns to create indexes on. 在上一篇文章中，我们解释了什么是聚集索引和非聚集索引，并说明了如何创建它们。在本文中，我们将解释什么是差的索引编制，并提供有关表索引编制和选择正确的列以创建索引的一般建议。

Indexing strategy is complex; it depends on many factors, including database structure, queries, and stored procedures used. One of general recommendations is to create a clustered index on tables where data is frequently queried. Although some DBAs and developers don’t prefer having clustered indexes on tables frequently inserted or updated, others consider that a clustered index on the right column can improve performance in these situations.

索引策略很复杂；它取决于许多因素，包括数据库结构，查询和使用的存储过程。一般建议之一是在经常查询数据的表上创建聚簇索引。尽管有些DBA和开发人员不希望在频繁插入或更新的表上使用聚集索引，但其他人则认为在这种情况下，右列上的聚集索引可以提高性能。

Creating a clustered index on every table is highly recommended, the challenge is to create the right index.

强烈建议在每个表上创建聚簇索引，挑战是创建正确的索引。

With a proper clustered index, less reads are required to retrieve the records requested by a query or stored procedure. Therefore, fewer disk I/O are preformed and the operation is completed faster.

使用适当的聚集索引，只需较少的读取即可检索查询或存储过程所请求的记录。因此，执行的磁盘I / O更少，并且操作完成得更快。

A clustered index provides more efficient search for values in a specific range. When you already have a table where the records are sorted ascending by e.g. AddressID, it’s easy to find the rows where AddressID is between 100 and 200, or lower than 500.

聚簇索引可以更有效地搜索特定范围内的值。如果您已经有一个表，其中的记录按例如AddressID进行升序排序，则很容易找到AddressID在100到200之间或小于500的行。

One of the rare scenarios where a heap table can be a good practice is when the row identifier is smaller than the clustered index.

堆表可能是一种好习惯的罕见情况之一是，当行标识符小于聚集索引时。

什么是差索引？ (What is poor indexing?)

Any SQL Server table configuration where performance suffers due to excessive, improper, or missing indexes is considered to be poor indexing.

由于索引过多，不正确或丢失而导致性能下降的任何SQL Server表配置都被认为是索引编制不佳。

If indexes are not properly created, SQL Server has to go through more records in order to retrieve the data requested by a query. Therefore, it uses more hardware resources (processor, memory, disk, and network) and obtaining the data lasts longer.

如果未正确创建索引，则SQL Server必须遍历更多记录才能检索查询请求的数据。因此，它使用更多的硬件资源（处理器，内存，磁盘和网络），并且获取数据的时间更长。

A wrong index can be an index created on a column that doesn’t provide easier data manipulation or an index created on multiple columns which instead of speeding up queries, slows them down.

错误的索引可以是在无法提供更轻松数据操作的列上创建的索引，也可以是在多个列上创建的索引，而不是加快查询速度，而是降低查询速度。

A table without a clustered index can also be considered as a poor indexing practice. Execution of a SELECT statement, inserting, updating, and deleting records is in most cases slower on a heap table than on a clustered one.

没有聚集索引的表也可以被认为是不良的索引实践。在大多数情况下，在堆表上执行SELECT语句，插入，更新和删除记录要比在群集表上慢。

使用哪些列来建立索引？ (Which columns to use to build an index? )

Both clustered and nonclustered indexes can be built from one or more table columns.

聚集索引和非聚集索引都可以从一个或多个表列中构建。

When you create a new table with a primary key in a SQL Server database, a unique clustered index is automatically created on the primary key column. Although this default action is acceptable in most cases, this might not be the optimal clustered index.

当您在SQL Server数据库中使用主键创建新表时，会在主键列上自动创建唯一的聚集索引。尽管此默认操作在大多数情况下是可以接受的，但这可能不是最佳的聚集索引。

The column used for a clustered index should be a unique, identity, or primary key, or any other column where the value is increased for each new entry. As clustered indexes sort the records based on the value, using a column already ordered ascending, such as an identity column, is a good solution.

用于聚集索引的列应该是唯一键，标识或主键，或任何其他为每个新条目增加值的列。当聚簇索引根据值对记录进行排序时，使用已排序升序的列（例如标识列）是一个很好的解决方案。

If a column where new values are not higher than previous is used for a clustered index, adding each new row would require re-ordering, i.e. moving the whole row and placing it to its proper location in accordance with clustered index ordering, thus splitting data pages and affecting SQL Server performance. If such clustered index is created on a table with frequent inserts and updates, it can cause performance degradation.

如果将新值不高于先前值的列用于聚簇索引，则添加每个新行将需要重新排序，即根据聚簇索引的顺序移动整行并将其放置在适当的位置，从而拆分数据页并影响SQL Server性能。如果在具有频繁插入和更新的表上创建此类聚集索引，则可能导致性能下降。

It’s not recommended to use the primary key as a clustered key without checking whether that is the optimal solution in you scenario first. Also, note the difference between a primary key and clustered index – a primary key can’t have duplicate or null values, while a clustered index can.

不建议先使用主键作为群集键，而不先检查这是否是您方案中的最佳解决方案。另外，请注意主键和聚集索引之间的区别–主键不能具有重复或空值，而聚集索引可以。

Using a unique column for a clustered index enables more efficient search for a specific value.

对聚集索引使用唯一列可以更有效地搜索特定值。

On the other hand, a column that frequently changes its value should not be used for a clustered index. Each change of the column used for the clustered index requires the records to be reordered. This re-ordering can easily be avoided by using a column that is not updated frequently, or not updated at all.

另一方面，频繁更改其值的列不应用于聚集索引。用于聚集索引的列的每次更改都要求对记录进行重新排序。通过使用不经常更新或根本不更新的列，可以轻松避免这种重新排序。

Using a column that stores large data, such as BLOB columns (text, nvarchar(max), image, etc.), and GUID columns is not recommended. Using large values to sort the data is not efficient, and in case of GUID and image columns doesn’t seem to make sense.

不建议使用存储大数据的列，例如BLOB列（文本，nvarchar（max），图像等）和GUID列。使用大值对数据进行排序效率不高，并且在GUID和图像列的情况下似乎没有意义。

A clustered index should not be built on a column already used in a unique index.

聚簇索引不应建立在已用于唯一索引的列上。

In the tests below, we used the Person.Address table in the AdventureWorks database.

在下面的测试中，我们使用了AdventureWorks数据库中的Person.Address表。

CREATE TABLE [Person].[Address] (
	    [AddressID] [int] IDENTITY(1, 1) NOT FOR REPLICATION NOT NULL
	    ,[AddressLine1] [nvarchar](60) NOT NULL
	    ,[AddressLine2] [nvarchar](60) NULL
	    ,[City] [nvarchar](30) NOT NULL
	    ,[StateProvinceID] [int] NOT NULL
	    ,[PostalCode] [nvarchar](15) NOT NULL
	    ,[SpatialLocation] [geography] NULL
	    ,[rowguid] [uniqueidentifier] ROWGUIDCOL NOT NULL
	    ,[ModifiedDate] [datetime] NOT NULL
	    ,CONSTRAINT [PK_Address_AddressID] PRIMARY KEY CLUSTERED ([AddressID] ASC) WITH (
		        PAD_INDEX = OFF
		        ,STATISTICS_NORECOMPUTE = OFF
		        ,IGNORE_DUP_KEY = OFF
		        ,ALLOW_ROW_LOCKS = ON
		        ,ALLOW_PAGE_LOCKS = ON
		        ) ON [PRIMARY]
	    ) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
GO

We created almost identical Person.Address1 table. The difference is that the AddressID column is a nonclustered index and the clustered index is created on the non-unique City column. The data is the same as in Person.Address

我们创建了几乎相同的Person.Address1表。区别在于AddressID列是非聚集索引，而聚集索引是在非唯一City列上创建的。数据与Person.Address中的数据相同

CREATE TABLE [Person].[Address1] (
	    [AddressID] [int] IDENTITY(1, 1) NOT FOR REPLICATION NOT NULL
	    ,[AddressLine1] [nvarchar](60) NOT NULL
	    ,[AddressLine2] [nvarchar](60) NULL
	    ,[City] [nvarchar](30) NOT NULL
	    ,[StateProvinceID] [int] NOT NULL
	    ,[PostalCode] [nvarchar](15) NOT NULL
	    ,[SpatialLocation] [geography] NULL
	    ,[rowguid] [uniqueidentifier] ROWGUIDCOL NOT NULL
	    ,[ModifiedDate] [datetime] NOT NULL
	    ,CONSTRAINT [PK_Address1] PRIMARY KEY NONCLUSTERED ([AddressID] ASC) 
WITH (
		        PAD_INDEX = OFF
		        ,STATISTICS_NORECOMPUTE = OFF
		        ,IGNORE_DUP_KEY = OFF
		        ,ALLOW_ROW_LOCKS = ON
		        ,ALLOW_PAGE_LOCKS = ON
		        ) ON [PRIMARY]
	    ) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
GO
 
CREATE CLUSTERED INDEX [Clustered_City_Address1] ON [Person].[Address1]
(
	    [City] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO

The actual query execution plan for a SELECT * statement looks the same for both tables. The operator that requires most resources is the Clustered Index Scan.

对于两个表，SELECT *语句的实际查询执行计划看起来相同。需要最多资源的操作员是“聚集索引扫描”。

The time needed to execute the Clustered Index Scan operator is significantly different.

执行“ 聚集索引扫描”运算符所需的时间明显不同。

As shown in the Clustered Index Scan properties, the Estimated I/O, Operator, and Subtree costs are almost twice as high for the Person.Address1 table.

如“ 聚集索引扫描”属性中所示，“ Person.Address1”表的“ 估计I / O”，“运算符 ”和“ 子树”成本几乎是其两倍。

使用索引的缺点 (Disadvantages of using indexes)

As noted above, wrong indexes can significantly slow down SQL Server performance. But even the indexes that provide better performance for some operations, can add overhead for others.

如上所述，错误的索引会大大降低SQL Server的性能。但是，即使为某些操作提供更好性能的索引也会增加其他操作的开销。

While executing a SELECT statement is faster on a clustered table, INSERTs, UPDATEs, and DELETEs require more time, as not only data is updated, but the indexes are updated also. For clustered indexes, the time increase is more significant, as the records have to maintain the correct order in data pages. Whether a new record is inserted, or an existing deleted or updated, this usually requires the records to be reordered.

虽然在群集表上执行SELECT语句的速度更快，但是INSERT，UPDATE和DELETE需要更多的时间，因为不仅要更新数据，而且还要更新索引。对于聚集索引，时间增加更为显着，因为记录必须保持数据页中的正确顺序。无论是插入新记录，还是现有的删除或更新记录，这通常都需要对记录进行重新排序。

Another cost of having indexes on tables is that more data pages and memory is used.

在表上建立索引的另一成本是使用更多的数据页和内存。

The operator costs from the Actual Query Execution Plan for the tables mentioned above are shown below. The table with the clustered index on the primary key column is [Person].[Address], the table with the clustered index on a non-primary key column is [Person].[Address1] . We created two more copies of the same table, one with a nonclustered index on the primary key column and the other without any indexes. The data in all four tables is identical.

下面显示了上述表的“实际查询执行计划”中的操作员成本。在主键列上具有聚集索引的表是[Person]。[Address]，在非主键列上具有聚集索引的表是[Person]。[Address1]。我们又创建了同一张表的两个副本，一个副本在主键列上具有非聚集索引，另一个副本没有任何索引。所有四个表中的数据都是相同的。

	Clustered index on primary key	Clustered index on any column	Nonclustered index	No indexes
SELECT *	0.28	0.43	0.28	0.28
SELECT <list>	0.18	0.43	0.28	0.28
INSERT	0.04	0.02	0.02	0.01
UPDATE	0.02	0.01	0.01	0.3
DELETE	0.05	0.02	0.02	0.3

	主键上的聚集索引	任何列上的聚集索引	非聚集索引	没有索引
选择 *	0.28	0.43	0.28	0.28
选择<列表>	0.18	0.43	0.28	0.28
插	0.04	0.02	0.02	0.01
更新	0.02	0.01	0.01	0.3
删除	0.05	0.02	0.02	0.3

When it comes to executing a SELECT statement, the cost is the lowest when the list of returned columns is specified and the statement is executed on the table where the clustered index is created on the primary key column. The cost can be higher for a table with a non-optimal clustered index (shown in column 2), then on tables with a nonclustered index or no indexes at all.

在执行SELECT语句时，指定返回列的列表并在主键列上创建了聚集索引的表上执行该语句时，成本最低。对于具有非最佳聚集索引的表（在第2列中显示），其成本可能更高，然后对于具有非聚集索引或根本没有索引的表，其成本可能更高。

While executing a SELECT statement is faster on a clustered table, executing DELETEs and UPDATEs requires more time. For the latter statements, the performance of a table with a nonclustered index is the same as for the table with a clustered index on a column other than the primary key.

虽然在群集表上执行SELECT语句的速度更快，但是执行DELETE和UPDATE则需要更多时间。对于后面的语句，具有非聚集索引的表的性能与具有非主键的列上具有聚集索引的表的性能相同。

Note that INSERTs on a table without indexes is the fastest of all – this is expected as neither re-ordering nor index updating is required. On the same table, executing UPDATEs and DELETEs is the most expensive. Again, this is expected, as SQL Server requires most time to find the specific records in such table.

请注意，没有索引的表上的INSERT是所有表中最快的-这是预期的，因为不需要重新排序或更新索引。在同一张表上，执行UPDATE和DELETE最昂贵。同样，这是预料之中的，因为SQL Server需要大多数时间才能在此表中找到特定的记录。

As shown, indexes can speed up some queries and slow down others. In this article, we provided some basic guidelines for clustered and nonclustered indexes, as well as which columns are preferred to build indexes on, and which should be avoided. Finding the right balance between the benefits and overhead indexes bring provides optimal performance to your queries and stored procedures.

如图所示，索引可以加快某些查询的速度，而降低其他查询的速度。在本文中，我们提供了一些有关聚簇索引和非聚簇索引的基本准则，以及在其上建立索引的首选列，以及应避免使用的列。在收益和开销索引之间找到合适的平衡，可以为查询和存储过程提供最佳性能。