索引sql server_优化SQL Server索引策略-CSDN博客

索引sql server

指数策略概述 (Index strategies overview)

This article is about techniques for optimizing the SQL Server indexes strategy. It is an appendix of the SQL index overview and strategy article in which I covered different areas like what indexes actually do, how to create them, and I briefly mentioned some index design guidelines. Furthermore, I also presented an example of how to design them by tuning and optimizing queries, so I’ve really tried to cover all but there is always more when it comes to SQL Server indexes.

本文介绍有关优化SQL Server索引策略的技术。它是SQL索引概述和策略文章的附录，其中涵盖了不同的领域，例如索引的实际作用，如何创建索引，并简要介绍了一些索引设计准则。此外，我还提供了一个示例，说明如何通过优化和优化查询来设计它们，因此我确实尝试涵盖所有内容，但在涉及SQL Server索引时总会有更多内容。

So, this is going to be another brief article to go over a lot of important areas and take a high-level look at what kind of strategy we should employ for each major kind of index.

因此，这将是另一篇简短的文章，涉及许多重要领域，并从高层次看待每种主要索引应采用哪种策略。

If you head over to the initial article that I put as a “plug” at the beginning and read it, you’ll notice that some of this stuff I already covered but we’re going to look at it again from a higher level and show some different examples, so we can get another perspective of how all this stuff works.

如果您翻阅我作为“插件”开头的第一篇文章并阅读，您会注意到我已经介绍了其中一些内容，但是我们将在更高层次上再次进行研究。显示一些不同的示例，因此我们可以对所有这些东西的工作方式有一个新的认识。

聚集索引 (Clustered indexes)

To kick off, let’s see what clustered SQL Server indexes are and how they should be used. A clustered index is one that physically stores the data sorted on disk by the key. Now, before we continue, you most likely have heard of clustered index and have some understanding of what it is, how it works, etc. but I bet not everyone has heard about heap. Heaps are tables that don’t contain a clustered SQL Server index. They are just a pile of data, neither sorted logically or physically in any order. Having said that, it brings up the question is there any situation where we should or would have use heap? It is obviously bad. In other words, this is roughly translated to is there any situation where we wouldn’t use an index. The answer to this question is rarely. The best-case scenario would be to put a clustered SQL Server index on every table but there are cases when a table is so small that it simply won’t be beneficial to have an index on it because SQL Server would choose to do a table scan anyway, so the cost of having and managing the index would be greater than not having one.

首先，让我们看看什么是群集SQL Server索引以及如何使用它们。聚集索引是一种物理存储按键排序在磁盘上的数据的索引。现在，在继续之前，您很可能已经听说过聚集索引，并且对聚集索引，它如何工作等有所了解。但是我敢打赌，并不是所有人都听说过堆。堆是不包含群集SQL Server索引的表。它们只是一堆数据，没有以逻辑或物理顺序进行排序。话虽如此，但它提出了一个问题，在任何情况下，我们应该或应该使用堆？显然是不好的。换句话说，在任何情况下我们都不会使用索引，这可以粗略地解释为。这个问题的答案很少。最好的情况是在每个表上放置一个群集SQL Server索引，但是在某些情况下，表太小以至于不能在上面建立索引，因为SQL Server会选择做一个表无论如何进行扫描，因此拥有和管理索引的成本将比没有索引的成本高。

The bottom line is the heaps are bad because they represent unorganized data. They cause a lot more I/O for SQL Server to handle and the best way to organize this pile of unorganized data is to simply put a clustered index on it.

最重要的是，堆是坏的，因为它们表示无组织的数据。它们导致SQL Server需要处理更多的I / O，而组织这堆无组织的数据的最佳方法是简单地在其上放置聚簇索引。

Having a clustered SQL Server index on a table means the data will be physically stored and organized in order on disk by the key. The key is simply a field that we choose when creating a clustered index and it’s also going to form the base for all other non-clustered indexes which will essentially store pointers to where that data sits inside the clustered indexes. Therefore, it important to ensure the following:

在表上具有群集SQL Server索引意味着，将通过密钥在磁盘上按顺序物理存储和组织数据。密钥只是我们在创建聚簇索引时选择的字段，它也将构成所有其他非聚簇索引的基础，这些非聚簇索引实际上将存储指向数据在聚簇索引内的位置的指针。因此，重要的是要确保以下几点：

Static – First of all, it’s really important to choose a static key or in other words one that doesn’t change. Otherwise, if we choose a field that is often modified then any time that clustered index is modified SQL Server has a lot of organizing to do at the physical level. In addition to this, all non-clustered indexes need to be changed too to reflect the changes 静态 –首先，选择一个静态密钥或换句话说，不要更改一个静态密钥非常重要。否则，如果我们选择一个经常修改的字段，那么只要修改聚簇索引，SQL Server就会在物理级别进行大量组织工作。除此之外，所有非聚集索引也需要更改以反映更改
Narrow – Having a narrow key for the clustered index is equity important because non-clustered indexes are going to store clustered indexes key in their index as a pointer. Meaning if we have a wide key, all the non-clustered indexes are going to store that wide key which will essentially require a lot more data pages which will ultimately require more I/O, memory, CPU, and more work for SQL Server to do 窄 –对聚集索引使用窄键非常重要，因为非聚集索引将把聚集索引键存储在其索引中作为指针。意味着如果我们有一个宽键，那么所有非聚集索引都将存储该宽键，这实际上将需要更多的数据页，最终将需要更多的I / O，内存，CPU和更多SQL Server工作。做
Unique – This is a good SQL Server index. No matter if we choose a non-unique index key, SQL Server will always make it unique by appending four bytes to the end of it. How unique the key is directly connected to how narrow it is. If we don’t have a unique column, then we don’t have a very narrow column either 唯一 –这是一个很好SQL Server索引。无论我们是否选择非唯一索引键，SQL Server都会始终通过在其末尾附加四个字节来使其唯一。密钥的独特程度与它的狭窄程度直接相关。如果我们没有唯一的列，那么我们也没有很窄的列
Sequential – Finally, we always want a sequential key. The one that will always auto-increase. This will guarantee that whenever there is new data coming in, it will be placed in order (at the end of the index). Otherwise, if it’s not sequential, SQL Server would have a lot of shifting to do which creates fragmentation, and potential problems along the way 顺序键 –最后，我们总是需要顺序键。总是会自动增加的那个。这将确保无论何时有新数据传入，都将按顺序放置它们（在索引的末尾）。否则，如果不是连续的，SQL Server将要做很多改变，这会产生碎片，并在此过程中可能出现问题

So, when choosing a key for a clustered SQL Server index, ask yourself four questions. Is it all from above? If it has all of the above characteristics, then you have a very good clustered index key.

因此，在为群集SQL Server索引选择键时，问自己四个问题。都是从上面来的吗？如果它具有上述所有特征，那么您将拥有一个非常好的聚簇索引键。

Also, worth mentioning is that almost every table should have a clustered key except for the very rare cases when the heap is okay.

另外，值得一提的是，除了非常少见的堆可以使用的情况以外，几乎每个表都应具有集群键。

Identity columns are good because they have all of the good characteristics for an excellent clustered SQL Server index. The only downside of the identity column is that it doesn’t really describe data in any way. A good practice here would be if you can find a natural key, that’s great but if you can’t, don’t force it. Why? Because if you force it, there is a big chance that it will not have the best characteristics mentioned above.

标识列之所以不错，是因为它们具有出色的群集SQL Server索引的所有优良特性。标识列的唯一缺点是它实际上并没有以任何方式描述数据。这里的一个好习惯是，如果您可以找到一个自然键，那很好，但如果找不到，就不要强行使用它。为什么？因为如果您强制使用它，则很有可能不会具有上述最佳特性。

GUID columns are also acceptable but the big problem with those is that they are not narrow – on the contrary, they are very wide. They will most likely solve the problem of generating unique IDs almost anywhere, but they are not a great clustered key and in addition to this, they are not sequential by default. The general rule of thumb here is to only use GUID columns if you have to because there is a bigger problem. Sure thing, you can use the new sequential ID but there’s a catch. For example, if you reboot a SQL Server, there’s a chance that it will start generating them sequentially from a point prior to the most recent one before the SQL Server is rebooted.

GUID列也是可以接受的，但是它们的最大问题是它们不窄-相反，它们很宽。它们很可能会解决几乎在任何地方生成唯一ID的问题，但是它们不是一个很好的群集密钥，而且除此之外，默认情况下它们也不是顺序的。一般的经验法则是仅在必要时才使用GUID列，因为存在更大的问题。当然，您可以使用新的顺序ID，但是有一个问题。例如，如果您重新启动SQL Server，则有可能从重新启动SQL Server之前的最新点开始按顺序开始生成它们。

Multiple columns are also generally bad for SQL Server indexes, same as GUID columns, they are not narrow so we bump into the same issue with characteristics.

与GUID列相同，多列通常也不利于SQL Server索引，它们不窄，因此我们遇到了具有特征的相同问题。

Bottom line is to try to stay away from composite keys. The only use case when it would be okay to use those is when using intermediate tables because usually when we create an intermediate table, it will have a few keys in there, so it can represent many-to-many relationships and a lot of time those are going to be integers. So, a couple of integers as a composite key is okay because it’s still going to be pretty narrow. But then again if you choose a composite key that is comprised of natural keys then you can get into trouble. So, the ultimate advice is just to pay attention to those four characteristics as this is definitely the best strategy.

底线是试图远离复合键。可以使用它们的唯一用例是在使用中间表时，因为通常当我们创建一个中间表时，它将在其中有一些键，因此它可以表示多对多关系和很多时间这些将是整数。因此，可以使用几个整数作为复合键，因为它仍然很窄。但是，如果再次选择包含自然键的复合键，则可能会遇到麻烦。因此，最终建议只是注意这四个特征，因为这绝对是最佳策略。

非聚集索引 (Non-clustered indexes)

Non-clustered SQL Server indexes are dependably performing hard work when it comes to the performance of our databases. They represent a separate structure that is logically sorted and they just point to the physical data. This means that a non-clustered index can point to either a heap or a clustered index. If it’s a clustered index, it’s going to use a key as its pointer, and if it’s a heap it’s going to use row ID as its pointer.

当涉及到数据库性能时，非群集SQL Server索引可以可靠地完成艰苦的工作。它们代表逻辑上已排序的单独结构，并且它们仅指向物理数据。这意味着非聚集索引可以指向堆索引或聚集索引。如果是聚簇索引，它将使用键作为其指针，如果是堆，则将使用行ID作为其指针。

Non-clustered SQL Server indexes are most commonly used for searching columns. A good practice would be to analyze our queries every time we define an index, look for the predicates (a logical condition being applied to rows in a table), also look at all the filters in the Where clause and try to create indexes that include anything that is inside the Where clause.

非群集SQL Server索引最常用于搜索列。一个好的做法是，每次定义索引时都要分析查询，查找谓词（将逻辑条件应用于表中的行），还要查看Where子句中的所有过滤器，并尝试创建包含以下内容的索引Where子句中的任何内容。

If we do this right, that will increase chances that it can pull all the data from the actual indexes rather than have to go into the data pages which will ultimately give us huge performance gain.

如果我们做对了，那将增加它可以从实际索引中提取所有数据的机会，而不必进入数据页面，这最终将给我们带来巨大的性能提升。

Filtered SQL Server indexes are also good because they allow us to create an index on a subset of data and this is accomplished by putting a Where clause inside an index. These are great for sparse columns (ordinary columns that have an optimized storage for null values) and popular subsets of data.

筛选后SQL Server索引也很好，因为它们允许我们在数据子集上创建索引，这是通过将Where子句放在索引内来实现的。这些对于稀疏列（具有优化的空值存储的普通列）和流行的数据子集非常有用。

Last but not least, one of the greatest strategies when it comes to non-clustered indexes is indexing foreign keys. This is pretty much the best practice and you should always do it.

最后但并非最不重要的一点是，对于非聚集索引而言，最大的策略之一就是索引外键。这几乎是最佳做法，您应该始终这样做。

列存储索引 (Columnstore indexes)

Columnstore SQL Server indexes are used to speed up access to a very large amount of data on large tables. Columnstore index is basically a vertical read-only index. To understand what this means, think of a picture where data is stored horizontally on data pages. This is how most of the relation engines stores data at the record level. This means field by field, column by column… the entire row is stored in a data page.

列存储SQL Server索引用于加快对大型表上大量数据的访问。列存储索引基本上是垂直只读索引。要了解这意味着什么，请考虑一张图片，其中的数据水平存储在数据页上。这就是大多数关系引擎在记录级别存储数据的方式。这意味着逐字段，逐列……整行存储在数据页中。

On the other hand, with the vertical index, it just stores a column. Performance wise, when SQL Server loads data pages into memory, it could be loading a lot of unnecessary data to extract what it needs. When only the column’s data is loaded we have efficient use of memory and SQL Server has a lot less to do because it’s loading only what it needs into memory.

另一方面，对于垂直索引，它仅存储一列。在性能方面，当SQL Server将数据页加载到内存中时，它可能正在加载很多不必要的数据以提取所需的内容。当仅加载列的数据时，我们可以有效地使用内存，而SQL Server只需要做的事情就很多，因为它仅将所需的内容加载到内存中。

战略方针 (Strategy guidelines)

How many SQL Server indexes should a table have is the most asked question. The rule of thumb is to keep it under 10 for OLTP environments. Depending on the type of table, the indexing can vary and be less or more than 10. For example, let’s say there is a table that is rarely hit with Inserts, Updates, and Deletes than we can go above 10. On the other side, if a table is extremely busy then it’s a good idea to go with far more less than 10. In general, we should stick to under 10 indexes for OLTP environments.

一个表应具有多少个SQL Server索引是最常见的问题。经验法则是在OLTP环境中将其保持在10以下。根据表的类型，索引可以变化，并且可以小于或大于10。例如，假设有一个表很少被插入，更新和删除命中，而不能超过10。，如果一个表非常忙，那么最好使用少于10个的表。通常，对于OLTP环境，我们应该坚持使用10个以下的索引。

In a world of data warehouses AKA OLAP, the above rule is the opposite. OLAP is characterized by a relatively low volume of transactions. So, what that means is there should be no activity or much less activity in our data warehouses. The only thing to be considered here is performance. Therefore, we should generously index OLAP environments.

在数据仓库（又称为OLAP）的世界中，上述规则是相反的。 OLAP的特点是交易量相对较低。因此，这意味着在我们的数据仓库中应该没有任何活动或更少的活动。这里唯一要考虑的是性能。因此，我们应该慷慨地索引OLAP环境。

Furthermore, don’t forget three important things to deploy to decrease fragmentation. Try to specify fill factor for a SQL Server index. If we have a table that is hit very often, and in addition to this there is high fragmentation, then this means a lot of page splitting is going under the hood. So, we can decrease the fill factor by leaving empty spaces on the pages which will ultimately lead to decrease fragmentation. Also, don’t forget to choose the right data types and specify default when you can because these will also lower the fragmentation.

此外，不要忘记部署三个重要的东西以减少碎片。尝试为SQL Server索引指定填充因子。如果我们有一个经常被击中的表，并且除此之外还有很高的碎片，那么这意味着很多页面拆分正在进行。因此，我们可以通过在页面上保留空白空间来减少填充因子，从而最终减少碎片。另外，不要忘记选择正确的数据类型并在可能的情况下指定默认数据类型，因为它们也会减少碎片。

结论 (Conclusion)

I would wrap things up with this and recommend a few other articles for reading that I wrote on indexing subject with the goal of boosting the performance of SQL Server. If you’re interested, go ahead and read the following articles:

我将对此进行总结，并推荐一些其他有关阅读的文章，这些文章是我为建立索引主题而写的，目的是提高SQL Server的性能。如果您有兴趣，请继续阅读以下文章：

I hope this article on optimizing SQL Server index strategies has been informative and I thank you for reading.

我希望这篇关于优化SQL Server索引策略的文章能为您提供有益的信息，并感谢您的阅读。