sql聚集索引和非聚集索引
This article gives an introduction of the non-clustered index in SQL Server using examples.
本文使用示例介绍SQL Server中的非聚集索引。
介绍 (Introduction)
In a previous article Overview of SQL Server Clustered indexes, we explored the requirement of an index and clustered indexes in SQL Server.
在上一篇文章SQL Server群集索引概述中 ,我们探讨了SQL Server中索引和群集索引的需求。
Before we proceed, let’s have a quick summary of the SQL Server clustered index:
在继续之前,让我们快速总结一下SQL Server聚集索引:
- It physically sorts data according to the clustered index key 它根据聚簇索引键对数据进行物理排序
- We can have only one clustered index per table 每个表只能有一个聚集索引
- A table without a clustered index is a heap, and it might lead to performance issues 没有聚集索引的表是堆,它可能会导致性能问题
- SQL Server automatically creates a clustered index for the primary key column SQL Server自动为主键列创建聚簇索引
- A clustered index is stored in b-tree format and contains the data pages in the leaf node, as shown below 聚簇索引以b树格式存储,并包含叶节点中的数据页,如下所示
Non-Clustered indexes are also useful for query performance and optimization depending upon query workload. In this article, let’s explore the non-clustered index and its internals.
非聚集索引对于查询性能和优化(取决于查询工作负载)也很有用。 在本文中,让我们探讨非聚集索引及其内部。
SQL Server中非聚集索引概述 (Overview of the non-clustered index in SQL Server)
In a non-clustered index, the leaf node does not contain the actual data. It consists of a pointer to the actual data.
在非聚集索引中,叶节点不包含实际数据。 它由指向实际数据的指针组成。
- If the table contains a clustered index, leaf node points to the clustered index data page that consists of actual data 如果表包含聚簇索引,则叶节点指向由实际数据组成的聚簇索引数据页面
- If the table is a heap (without a clustered index), leaf node points to the heap page 如果表是堆(没有聚簇索引),则叶节点指向堆页面
In the below image, we can look at the leaf level of non-clustered index pointing towards data page in the clustered index:
在下图中,我们可以查看非聚集索引的叶级别,该叶子级别指向聚集索引中的数据页:
We can have multiple non-clustered indexes in SQL tables because it is a logical index and does not sort data physically as compared to the clustered index.
我们可以在SQL表中有多个非聚集索引,因为它是一个逻辑索引,并且与聚集索引相比并不对数据进行物理排序。
Let’s understand the non-clustered index in SQL Server using an example.
让我们通过一个示例来了解SQL Server中的非聚集索引。
Create an Employee table without any index on it
创建一个没有任何索引的Employee表
CREATE TABLE dbo.Employee (EmpID INT, EMpName VARCHAR(50), EmpAge INT, EmpContactNumber VARCHAR(10) );
Insert few records in it
在其中插入几条记录
Insert into Employee values(1,'Raj',32,8474563217) Insert into Employee values(2,'kusum',30,9874563210) Insert into Employee values(3,'Akshita',28,9632547120)
Search for the EmpID 2 and look for the actual execution plan of it
搜索EmpID 2并查找其实际执行计划
Select * from Employee where EmpID=2
It does a table scan because we do not have any index on this table:
它进行表扫描,因为我们在此表上没有任何索引:
Create a unique clustered index on the EmpID column
在EmpID列上创建唯一的聚集索引
CREATE UNIQUE CLUSTERED INDEX IX_Clustered_Empployee ON dbo.Employee(EmpID);
Search for the EmpID 2 and look for the actual execution plan of it
搜索EmpID 2并查找其实际执行计划
In this execution plan, we can notice that the table scan changes to a clustered index seek:
在此执行计划中,我们可以注意到表扫描更改为聚集索引查找:
Let’s execute another SQL query for searching Employee having a specific contact number:
让我们执行另一个SQL查询来搜索具有特定联系人号码的Employee:
Select * from Employee where EmpContactNumber='9874563210'
We do not have an index on the EmpContactNumber column, therefore Query Optimizer uses the clustered index, but it scans the whole index for retrieving the record:
我们在EmpContactNumber列上没有索引,因此Query Optimizer使用聚簇索引,但是它会扫描整个索引以检索记录:
Right-click on the execution plan and select Show Execution Plan XML:
右键单击执行计划,然后选择“ 显示执行计划XML” :
It opens the XML execution plan in the new query window. Here, we notice that it uses the clustered index key and reads the individual rows for retrieving the result:
它将在新的查询窗口中打开XML执行计划。 在这里,我们注意到它使用聚簇索引键并读取各个行以检索结果:
Let’s insert a few more records in the Employee table using the following script:
让我们使用以下脚本在Employee表中插入更多记录:
Insert into Employee values(4,'Manoj',38,7892145637)
Insert into Employee values(5,'John',33,7900654123)
Insert into Employee values(6,'Priya',18,9603214569)
We have six employees’ records in this table. Now, execute the select statement again for retrieving employee records with a specific contact number:
该表中有六个员工记录。 现在,再次执行select语句以检索具有特定联系号码的员工记录:
It again scans all six rows for the result based on the specified condition. Imagine we have millions of records in the table. If SQL Server has to read all index key rows, it would be a resource and time-consuming task.
它将根据指定条件再次扫描所有六行以获取结果。 假设我们在表中有数百万条记录。 如果SQL Server必须读取所有索引键行,那将是一项资源和耗时的任务。
We can represent clustered index (not actual representation) in the B-tree format as per the following image:
我们可以按照以下图像以B树格式表示聚集索引(不是实际表示):
In the previous query, SQL Server reads the root node page and retrieves each leaf node page and row for data retrieval.
在上一个查询中,SQL Server读取根节点页面,并检索每个叶节点页面和行以进行数据检索。
Now Let’s create a unique non-clustered index in SQL Server on the Employee table on the EmpContactNumber column as the index key:
现在,让我们在EmpContactNumber列的Employee表上SQL Server中创建唯一的非聚集索引作为索引键:
CREATE UNIQUE NONCLUSTERED INDEX IX_NonClustered_Employee ON dbo.Employee(EmpContactNumber);
Before we explain this index, rerun the SELECT statement and view the actual execution plan:
在解释该索引之前,请重新运行SELECT语句并查看实际的执行计划:
In this execution plan, we can see two components:
在此执行计划中,我们可以看到两个组件:
- Index Seek (NonClustered) 索引查询(非聚集)
- Key Lookup (Clustered) 关键点查找(集群)
To understand these components, we need to look at a non-clustered index in SQL Server design. Here, you can see that the leaf node contains the non-clustered index key (EmpContactNumber) and clustered index key (EmpID):
若要了解这些组件,我们需要查看SQL Server设计中的非聚集索引。 在这里,您可以看到叶节点包含非聚集索引键(EmpContactNumber)和聚集索引键(EmpID):
Now, if rerun the SELECT statement, it traverses using the non-clustered index key and points to a page with clustered index key:
现在,如果重新运行SELECT语句,它将使用非聚集索引键遍历并指向具有聚集索引键的页面:
It shows that it retrieves the record with a combination of clustered index key and non-clustered index key. You can see complete logic for the SELECT statement as shown below:
它显示了它是使用聚集索引键和非聚集索引键的组合来检索记录的。 您可以看到SELECT语句的完整逻辑,如下所示:
- A user executes a select statement to find employee records matching with a specified contact number 用户执行选择语句以查找与指定联系人号码匹配的员工记录
- Query Optimizer uses a non-clustered index key and finds out the page number 1001 查询优化器使用非聚集索引键并找到页码1001
- This page consists of a clustered index key. You can see EmpID 1 in the above image 该页面包含一个聚集索引键。 您可以在上图中看到EmpID 1
- SQL Server finds out page no 101 that consists of EmpID 1 records using the clustered index key SQL Server使用聚簇索引键找出由EmpID 1记录组成的第101页
- It reads the matching row and returns the output to the user 它读取匹配的行并将输出返回给用户
Previously, we saw that it reads six rows to retrieve the matching row and returns one row in the output. Let’s look at an execution plan using the non-clustered index:
之前,我们看到它读取六行以检索匹配的行并在输出中返回一行。 让我们看一下使用非聚集索引的执行计划:
SQL Server中的非唯一非聚集索引 (Non-unique non-clustered index in SQL Server)
We can have multiple non-clustered indexes in a SQL table. Previously, we created a unique non-clustered index on the EmpContactNumber column.
一个SQL表中可以有多个非聚集索引。 以前,我们在EmpContactNumber列上创建了一个唯一的非聚集索引。
Before creating the index, execute the following query so that we have duplicate value in the EmpAge column:
在创建索引之前,请执行以下查询,以使EmpAge列中的值重复:
Update Employee set EmpAge=32 where EmpID=2
Update Employee set EmpAge=38 where EmpID=6
Update Employee set EmpAge=38 where EmpID=3
Let’s execute the following query for a non-unique non-clustered index. In the query syntax, we do not specify a unique keyword, and it tells SQL Server to create a non-unique index:
让我们对非唯一的非聚集索引执行以下查询。 在查询语法中,我们没有指定唯一关键字,它告诉SQL Server创建一个非唯一索引:
CREATE NONCLUSTERED INDEX NCIX_Employee_EmpAge ON dbo.Employee(EmpAge);
As we know, the key of an index should be unique. In this case, we want to add a non-unique key. The question arises: How will SQL Server make this key as unique?
众所周知,索引的键应该是唯一的。 在这种情况下,我们要添加一个非唯一键。 问题出现了: SQL Server将如何使该密钥具有唯一性 ?
SQL Server does the following things for it:
SQL Server为此执行以下操作:
- It adds the clustered index key in the leaf and non-leaf pages of the non-unique non-clustered index 它将聚集索引键添加到非唯一非聚集索引的叶子和非叶子页面中
- If the clustered index key is also non-unique, it adds a 4-byte uniquifier so that the index key is unique 如果聚簇索引键也不唯一,它将添加一个4字节的唯一化符,以便索引键是唯一的
在SQL Server的非聚集索引中包括非关键列 (Include non-key columns in non-clustered index in SQL Server)
Let’s look at the following actual execution plan again of the following query:
让我们再次查看以下查询的以下实际执行计划:
Select * from Employee
where EmpContactNumber='8474563217'
It includes index seek and key lookup operators, as shown in the above image:
它包括索引查找和关键字查找运算符,如上图所示:
- EmpID, EmpID , EmpContactNumber columns EmpContactNumber列
In this step, Query Optimizer uses key lookup on the clustered index and fetches values for EmpName and EmpAge columns
在此步骤中,Query Optimizer在聚簇索引上使用键查找,并为EmpName和EmpAge列获取值
In this step, Query Optimizer uses the nested loops for each row output from the non-clustered index for matching with the clustered index row
在此步骤中,Query Optimizer将嵌套循环用于非聚集索引输出的每一行,以与聚集索引行匹配
The nested loop might be a costly operator for large tables. We can reduce the cost using the non-clustered index non-key columns. We specify the non-key column in the non-clustered index using the index clause.
对于大型表,嵌套循环可能是一个昂贵的运算符。 我们可以使用非聚集索引非键列来降低成本。 我们使用index子句在非聚集索引中指定非关键字列。
Let’s drop and created the non-clustered index in SQL Server using the included columns:
让我们使用包含的列在SQL Server中删除并创建非聚集索引:
DROP INDEX [IX_NonClustered_Employee] ON [dbo].[Employee]
GO
CREATE UNIQUE NONCLUSTERED INDEX [IX_NonClustered_Employee] ON [dbo].[Employee]
(
[EmpContactNumber] ASC
)
INCLUDE(EmpName,EmpAge)
Included columns are part of the leaf node in an index tree. It helps to fetch the data from the index itself instead of traversing further for data retrieval.
包含的列是索引树中叶节点的一部分。 它有助于从索引本身获取数据,而不是进一步遍历以进行数据检索。
In the following image, we get both included columns EmpName and EmpAge as part of the leaf node:
在下图中,我们将包含的列EmpName和EmpAge都作为叶节点的一部分:
Re-execute the SELECT statement and view the actual execution plan now. We do not have key lookup and nested loop in this execution plan:
重新执行SELECT语句,现在查看实际的执行计划。 在此执行计划中,我们没有键查找和嵌套循环:
Let’s hover the cursor over the index seek and view the output columns list. SQL Server can find all the columns using this non-clustered index seek:
让我们将光标悬停在索引查找上方并查看输出列列表。 SQL Server可以使用此非聚集索引查找来查找所有列:
We can improve query performance using the covering index with the help of included non-key columns. However, it does not mean we should all non-key columns in the index definition. We should be careful in index design and should test the index behavior before deployment in the production environment.
我们可以在包含的非关键列的帮助下使用覆盖索引来提高查询性能。 但是,这并不意味着我们应该在索引定义中使用所有非关键列。 我们在索引设计上应该小心谨慎,并且应该在部署到生产环境中之前测试索引行为。
结论 (Conclusion)
In this article, we explored the non-clustered index in SQL Server and its usage in combination with the clustered index. We should carefully design the index as per the workload and query behavior.
在本文中,我们探讨了SQL Server中的非聚集索引及其与聚集索引的结合使用。 我们应该根据工作量和查询行为精心设计索引。
翻译自: https://www.sqlshack.com/overview-of-non-clustered-indexes-in-sql-server/
sql聚集索引和非聚集索引