索引sql server_SQL Server索引设计基础和准则

最新推荐文章于 2023-10-14 06:40:31 发布

culuo4781

最新推荐文章于 2023-10-14 06:40:31 发布

阅读量264

点赞数

文章标签：数据库大数据 python mysql java

原文链接：https://www.sqlshack.com/sql-server-index-design-basics-and-guidelines/

版权

本文探讨了SQLServer索引设计的基本原则和最佳实践，包括理解工作负载类型、考虑列特性、优化列顺序和排序、选择合适的索引类型以及考虑索引存储方式，旨在帮助读者设计出既能提高数据检索效率又不会过度增加数据修改开销的有效索引。

摘要生成于 C知道，由 DeepSeek-R1 满血版支持，前往体验 >

索引sql server

In the previous article of this series, SQL Server Index Structure and Concepts, we described, in detail, the anatomy of QL Server indexes, the B-Tree structure, the properties of the indexes, the main types of indexes and the advantages of using the indexes for performance tuning.

在本系列的上一篇文章《 SQL Server索引结构和概念》中，我们详细描述了QL Server索引的结构，B-Tree结构，索引的属性，索引的主要类型以及使用索引的优势。性能调整的索引。

You may recall how we described the index previously as a double-edged sword, where you can derive significant benefits from an index, that is well designed, in improving the performance of your data retrieval operations. On the other hand, a poorly designed index, or the lack of necessary indexes, will cause performance degradation in your system. In this article, we will go through the basics and best practices that help you in designing the most effective index to meet the requirements of your system with the possible performance enhancements.

您可能还记得我们之前将索引描述为一把双刃剑的情况，在这种情况下，可以从精心设计的索引中受益匪浅，以提高数据检索操作的性能。另一方面，索引设计不当或缺少必要的索引将导致系统性能下降。在本文中，我们将介绍一些基础知识和最佳实践，这些基础知识和最佳实践可帮助您设计最有效的索引，并通过可能的性能增强来满足系统要求。

The decision of choosing the right index that fits the system’s workload is not an easy task, as you need to compromise between the speed of data retrieval operations and the overhead of adding that index on the data modification operations. Another factor that affects index design decisions is the list of columns that will participate in the index key, where you should take into consideration that, more columns included in the index key to cover your application queries will require more disk space and extra maintenance overhead. You need also to make sure that, when you create the table index, the SQL Server Query Optimizer will choose the created index to retrieve data from the table in most cases. The SQL Server Query Optimizer decision whether to use the index or not depends on the performance gain that will be taken from using that index.

选择适合系统工作负载的正确索引不是一件容易的事，因为您需要在数据检索操作的速度和在数据修改操作上添加该索引的开销之间进行权衡。影响索引设计决策的另一个因素是将参与索引键的列的列表，您应考虑到，索引键中包含的更多列以覆盖您的应用程序查询将需要更多的磁盘空间和额外的维护开销。您还需要确保在创建表索引时，在大多数情况下，SQL Server查询优化器将选择创建的索引来从表中检索数据。 SQL Server查询优化器决定是否使用索引取决于使用该索引将获得的性能提升。

工作量类型 (Workload type)

Before creating an index, you should understand the workload type of the database. On Online Transaction Processing (OLTP) database, workloads are used for transactional systems, in which most of the submitted queries are data modification queries. In contrast, Online Analytical Processing (OLAP) database workloads are used for data warehousing systems, in which most of the submitted queries are data retrieval queries that filter, group, aggregate and join large data sets quickly. The difference between the OLTP and OLAP databases can be summarized in the figure below:

在创建索引之前，您应该了解数据库的工作负载类型 。在联机事务处理（OLTP）数据库上，工作负载用于事务系统，其中大多数提交的查询是数据修改查询。相反，在线分析处理（OLAP）数据库工作负载用于数据仓库系统，其中大多数提交的查询是数据检索查询，这些查询可以快速过滤，分组，聚合和联接大型数据集。下图概述了OLTP和OLAP数据库之间的区别：

Creating a large number of indexes on a database table affects data modification (e.g. Updates) operations performance. When you add or modify a row in the underlying table, the row will also be adjusted appropriately in all related table indexes. Because of that, you need to avoid creating a large number of indexes on the heavily modified tables and create the minimum possible number of indexes, with the least possible number of columns on each index. You can overcome this problem by writing queries that add or modify rows in batches, rather than writing a single query for each insert or modify operation. For Online Analytical Processing (OLAP) workloads, in which tables have low modification requirements, you can create a large number of indexes that improve the performance of the data retrieval operations.

在数据库表上创建大量索引会影响数据修改（例如，更新）操作的性能。在基础表中添加或修改行时，该行还将在所有相关表索引中进行适当调整。因此，您需要避免在经过大量修改的表上创建大量索引，并创建尽可能少的索引，而在每个索引上创建尽可能少的列。您可以通过编写查询来批量添加或修改行，而不是为每个插入或修改操作编写单个查询来解决此问题。对于表具有较低修改要求的联机分析处理（OLAP）工作负载，您可以创建大量索引来改善数据检索操作的性能。

桌子尺寸 (Table Size)

It is not recommended to create indexes on small tables, as it takes the SQL Server Engine less time scanning the underlying table than traversing the index when searching for a specific data. In this case, the index will not be used and still affect the data modification performance, as it will be always adjusted when modifying the underlying table’s data.

不建议在小型表上创建索引，因为在搜索特定数据时，与遍历索引相比，SQL Server Engine扫描基础表所花的时间更少。在这种情况下，将不使用索引，并且仍会影响数据修改性能，因为在修改基础表的数据时将始终对其进行调整。

表格栏 (Table Columns)

In addition to database workload characteristics, the characteristics of the table columns that are used in the submitted queries should be also considered when designing an index. For instance, the columns with exact numeric data types, such as INT and BIGINT data types and that are UNIQUE and NOT NULL are considered optimal columns to participate in the index key.

除了数据库工作负载特征外，在设计索引时还应考虑在提交的查询中使用的表列的特征。例如，具有精确数值数据类型（例如INT和BIGINT数据类型）且为UNIQUE和NOT NULL的列被视为参与索引键的最佳列。

In most cases, a long-running query is caused by indexing a column with few unique values. Although it is not possible to add the columns with ntext, text, image, varchar(max), nvarchar(max), and varbinary(max) data types to the index key columns, it is possible to add these data types to non-key index columns, but only in case of critical need. A column with XML data type can be added only to an XML index type.

在大多数情况下，长时间运行的查询是由索引唯一值很少的列引起的。尽管无法将具有ntext，text，image，varchar（max），nvarchar（max）和varbinary（max）数据类型的列添加到索引键列，但是可以将这些数据类型添加到非键索引列，但仅在有关键需求的情况下。具有XML数据类型的列只能添加到XML索引类型。

列顺序和排序 (Columns Order and Sorting)

It is recommended to create the indexes on columns that are used in the query predicates and join conditions in the correct order that is specified in the predicate. In this way, the goal is keeping the index key short, without including the rarely used columns, in order to minimize the index complexity, storage and maintenance overheads. You can also improve query performance by creating a covering index that contains all data required by the query, without the need to read from the underlying table.

建议在查询谓词中使用的列上创建索引，并以谓词中指定的正确顺序连接条件。通过这种方式，目标是使索引键短而不包含很少使用的列，以最大程度地减少索引复杂性，存储和维护开销。您还可以通过创建覆盖索引来包含查询所需的所有数据，而无需从基础表中读取数据，从而提高查询性能。

To check the importance of the column order in the index key, let us create a simple test table using the CREATE TABLE T-SQL statement below:

为了检查索引键中列顺序的重要性，让我们使用下面的CREATE TABLE T-SQL语句创建一个简单的测试表：

And create a non-clustered index on that table using the LastName and FirstName columns, as shown below:

然后使用LastName和FirstName列在该表上创建一个非聚集索引，如下所示：

Then we will fill the table with 10K records, using the ApexSQL Generate, test data generator tool, as shown below:

然后，我们将使用ApexSQL Generate测试数据生成器工具向该表填充10K条记录，如下所示：

The table is ready now for our testing scenario. We will execute two SELECT statements, the first one will search using the FirstName column, which is the second column in the index key columns, and the second query will search using the LastName column, which is the first column in the index key columns.

该表现已准备好用于我们的测试方案。我们将执行两个SELECT语句，第一个将使用FirstName列（索引键列中的第二列）进行搜索，第二个查询将使用LastName列（索引键列中的第一列）进行搜索。

From the result, it is clear that the SQL Server Engine will scan all the index data to search for the students with the FirstName value specified in the WHERE clause. The SQL Server Engine is not able to fully take benefits from the created index, due to the columns order in the created index. In the second query, the SQL Server Engine seeks directly for the requested value and takes benefits from the created index as we are using the LastName column in the WHERE clause, which is the first column in the index. You can also see that the weight of the query that is scanning the index is 92% of the overall weight, compared to the one that is scanning the index with weight equal to 8% of the overall weight, as shown clearly below:

从结果很明显，SQL Server引擎将扫描所有索引数据，以使用WHERE子句中指定的FirstName值搜索学生。由于创建索引中的列顺序，SQL Server Engine无法充分利用创建的索引。在第二个查询，SQL Server引擎直接要求赔偿请求的值，并为我们的WHERE子句，这是该指数的第一列中使用姓氏列从创建的索引需要的好处。您还可以看到，扫描索引的查询的权重为总权重的92％，而扫描索引的权重等于总权重的8％，如下所示：

It is also recommended when designing an index, to consider if the columns that participate in the index key will be sorted in ascending or descending order, with the ascending order as the default order, depending on the system requirements. In the previously created Students table, assume that we are interested in returning the students information sorted ascending by the FirstName and descending by the LastName. If the table has no index created on the FirstName and LastName, after dropping the old index as shown below:

还建议在设计索引时，考虑参与索引键的列是按升序还是降序排序，并根据系统要求将升序设置为默认顺序。在先前创建的“学生”表中，假设我们有兴趣返回按“姓”升序和“姓”降序的学生信息。如果在表上未在FirstName和LastName上创建索引，则在删除旧索引后，如下所示：

Then execute the below query, that will return the result sorted ascending by the FirstName and descending by the LastName. You will see that an extra operator will be used in the execution plan to Sort the result as specified in the ORDER BY clause. You can imagine the extra execution time consumed and performance overhead that is caused by the Sort operator, as shown below:

然后执行以下查询，该查询将返回按名字升序排列和按姓氏降序排列的结果。您将看到执行计划中将使用一个额外的运算符对ORDER BY子句中指定的结果进行排序。您可以想象一下由Sort运算符引起的额外执行时间消耗和性能开销，如下所示：

If we create an index on that table that includes both the FirstName sorted ascending and the LastName sorted descending, using the CREATE INDEX T-SQL statement below:

如果我们在该表上创建一个索引，该索引同时包含FirstName排序的升序和LastName排序的降序，请使用以下CREATE INDEX T-SQL语句：

Then trying to run the same previous query again, you will see that the SQL Server Engine will directly seek for the requested data, without an extra overhead to sort the data, as it already matches the order specified in the ORDER BY clause, as shown below:

然后尝试再次运行相同的先前查询，您将看到SQL Server Engine将直接查找请求的数据，而无需对数据进行排序，因为它已经与ORDER BY子句中指定的顺序相匹配，如图所示。下面：

索引类型 (Index Type)

Studying the available types of SQL Server indexes is also recommended, in order to decide which index type will enhance the performance of the current workload, such as Clustered indexes that can be used to sort huge tables, Columnstore Indexes that can be used to enhance the processing performance of the Online Analytical Processing (OLAP) read only workload of data warehouses databases or Filtered indexes for columns that have well-defined subsets of data, such as NULL values, distinct ranges or categorized values.

还建议研究可用SQL Server索引类型，以便确定哪种索引类型将增强当前工作负载的性能，例如可用于对大型表进行排序的聚簇索引 ，可用于增强大型表的列存储索引。联机分析处理（OLAP）的处理性能仅读取数据仓库数据库的工作负荷或具有明确定义的数据子集（例如NULL值，不同范围或分类值）的列的过滤索引 。

索引储存 (Index Storage)

The storage location of the index may also affect the performance of the queries reading from the index. By default, the index will be stored in the same filegroup as the underlying table on which the index is created. If you design a Non-Clustered index to be stored in a data file different from the underlying table data file and located in a separate disk drive, or horizontally partition the index to span multiple filegroups, the performance of the queries that are reading from the index will be improved, due to the I/O performance enhancement resulted from hitting on different data files and disk drives at the same time.

索引的存储位置也可能会影响从索引读取的查询的性能。默认情况下，索引将与在其上创建索引的基础表存储在同一文件组中。如果您将非聚集索引设计为存储在与基础表数据文件不同的数据文件中，并位于单独的磁盘驱动器中，或者将索引水平分区为跨多个文件组，则从数据库读取的查询的性能由于同时击中不同的数据文件和磁盘驱动器而导致的I / O性能增强，索引将得到改善。

The initial storage of the index can be also optimized by setting the FILLFACTOR option to a value different from the default value of 0 or 100. FILLFACTOR is the value that determines the percentage of space on each leaf-level page to be filled with data. Setting the FILLFACTOR to 90% when you create or rebuild an index, SQL Server will try to leave 10% of each leaf page empty, reserving the remainder on each page as free space for future growth to prevent the page splitting and index fragmentation performance problems.

还可以通过将FILLFACTOR选项设置为不同于默认值0或100的值来优化索引的初始存储。FILLFACTOR是确定每个叶子级页面上要填充数据的空间百分比的值。在创建或重建索引时，将FILLFACTOR设置为90％，SQL Server将尝试将每个叶子页的10％留空，将每个页面上的其余部分保留为可用空间以供将来增长，以防止页面拆分和索引碎片性能问题。

结论 (Conclusion)

After drawing your indexes design strategy, you can easily start applying it to create new useful indexes that the SQL Server Query Optimizer will choose and trust, in order to create the most optimal execution plan for the submitted queries, providing the best overall system performance.

绘制索引设计策略后，您可以轻松地开始将其应用到SQL Server查询优化器将选择并信任的新有用索引上，从而为提交的查询创建最优化的执行计划，从而提供最佳的整体系统性能。

Stay tuned for the next articles in this series, to be familiar with the operations that can be performed on the indexes and how to design, create, maintain and tune indexes.

请继续关注本系列的下一篇文章，以熟悉可以对索引执行的操作以及如何设计，创建，维护和调整索引。

目录 (Table of contents)

SQL Server indexes – series intro

SQL Server table structure overview

SQL Server index structure and concepts

SQL Server index design basics and guidelines

SQL Server index operations

Designing effective SQL Server clustered indexes

Designing effective SQL Server non-clustered indexes

Working with different SQL Server indexes types

Tracing and tuning queries using SQL Server indexes

Gathering SQL Server index statistics and usage information

Maintaining SQL Server Indexes

Top 25 interview questions and answers about SQL Server indexes

SQL Server索引设计基础和准则

使用SQL Server索引跟踪和调整查询

收集SQL Server索引统计信息和使用情况信息

维护SQL Server索引

有关SQL Server索引的25个最佳面试问答

翻译自: https://www.sqlshack.com/sql-server-index-design-basics-and-guidelines/

索引sql server