sql server锁_关于SQL Server自旋锁的全部

最新推荐文章于 2022-05-07 23:31:30 发布

culuo4781

最新推荐文章于 2022-05-07 23:31:30 发布

阅读量388

点赞数

文章标签：队列数据库 python 多线程 java

原文链接：https://www.sqlshack.com/sql-server-spinlocks/

版权

sql server锁

As discussed in the article All about latches in SQL Server, spinlocks are also a special form of locks that SQL Server uses to protect data. Spinlocks are lightweight objects used by the SQL Server Operating System (SQLOS) to protect data structure access. To understand, properly, the difference between latches and spinlocks and why the spinlock is an important SQL Server’ object type, it is vital to understand the cycle of query execution in SQL Server

如文章“关于SQL Server中的所有闩锁”所述，自旋锁也是SQL Server用于保护数据的一种特殊形式的锁。自旋锁是SQL Server操作系统（SQLOS）用来保护数据结构访问的轻量级对象。要正确理解闩锁和自旋锁之间的区别以及自旋锁为何是重要SQL Server对象类型，了解SQL Server中查询执行的周期至关重要

The SQL Server Operating System (SQLOS) is solely responsible for SQL Server’s thread scheduling and it is in no way dependent on the Windows operating system scheduler. In modern multi CPU environments, SQLOS will allocate a scheduler to each CPU core for the purpose of thread management. Therefore, all requests issued by users will be run via the SQLOS scheduler. This means, in practical terms, that when a 4 octa-core processors machine hosts SQL Server, the SQLOS will form 32 schedulers. Below is the graphical presentation of a query execution in SQL Server and how the scheduler is working

SQL Server操作系统（SQLOS）仅负责SQL Server的线程计划，并且绝不依赖于Windows操作系统计划程序。在现代的多CPU环境中，出于线程管理的目的，SQLOS将为每个CPU内核分配一个调度程序。因此，用户发出的所有请求都将通过SQLOS调度程序运行。实际上，这意味着当4个八核处理器的计算机托管SQL Server时，SQLOS将形成32个调度程序。以下是SQL Server中查询执行的图形表示以及调度程序的工作方式

The SQLOS scheduler (popularly called SOS Scheduler) is made of three components:

SQLOS调度程序（通常称为SOS调度程序）由三个组件组成：

Processor – the physical/logical CPU core responsible for processing the thread. It can process only one thread at a time

处理器 –负责处理线程的物理/逻辑CPU内核。一次只能处理一个线程

Runnable queue – it utilizes a stringent First-In-First-Out (FIFO) order of threads that have to be processed. A thread that is in transition from the processor/waiter list into a runnable queue will be positioned last in the queue, and it must wait for all threads previously moved into the runnable queue (when the resource governor is enabled the FIFO order can be overridden as with enabled governor, resource pool priorities can be assigned differently to a workload groups. Use of the resource governor is generally rare)

可运行队列 –它利用必须处理的严格的先进先出（FIFO）顺序的线程。从处理器/等待者列表转换为可运行队列的线程将位于队列的最后，并且它必须等待以前移入可运行队列的所有线程（启用资源调控器时，可以覆盖FIFO顺序）与启用的调控器一样，可以将资源池优先级不同地分配给工作负载组。通常很少使用资源调控器。

Waiter list – all the threads that are in a suspended state are stored in the waiter list. Suspended threads are the one that must wait for a resource to become available to them. There are no limitations imposed to a waiter list regarding the time that thread can be contained and there are no options to limit the time a thread can spend in the waiter list. However, when the timeout is specified for the query execution session, it takes precedence and the thread will be canceled as a result of the execution timeout

服务员列表 –所有处于挂起状态的线程都存储在服务员列表中。挂起线程是必须等待资源变为可用状态的线程。对于等待者列表，可以包含线程的时间没有任何限制，也没有任何选项可以限制线程可以在等待者列表中花费的时间。但是，当为查询执行会话指定超时时，它具有优先权，并且由于执行超时，线程将被取消。

Each thread run by the scheduler can be in one of the three states:

调度程序运行的每个线程可以处于以下三种状态之一：

RUNNING – a thread is executing on the processor physical or logical core. One thread at a time can be executed on a single core

正在运行 –线程正在处理器的物理或逻辑内核上执行。一次可以在一个内核上执行一个线程

SUSPENDED – a thread requests a resource that is not available, so it is pushed into the waiter list. A thread stays in the waiter list as long as the requested resource became available

已暂停 -线程请求的资源不可用，因此将其推送到服务员列表中。只要请求的资源可用，线程就会保留在服务员列表中

RUNNABLE – a thread that requests a resource that is available, but moved into the runnable queue to wait for the CPU to became available

RUNNABLE –一个线程，该线程请求可用资源，但已移入可运行队列以等待CPU变得可用

There are two scenarios how scheduler can work:

调度程序的工作方式有两种：

A thread runs through three states to complete the request issued by the user
线程通过三种状态运行以完成用户发出的请求
A thread runs through two states to complete the request issued by the user
线程通过两种状态运行以完成用户发出的请求

Since only the first scenario is within the context of this article, the second scenario will not be explained here

由于只有第一种情况在本文的上下文内，因此在此将不解释第二种情况

In the image above, the running thread, 5, tries to access a needed resource, which is not available at the moment of request. Therefore, the thread, 5, state changes to suspended and it is moved to the wait list. The thread, 5, will remain suspended until the needed resource become available and there is no time limitation for how long a thread stays in the waiter list (the number of threads in waiter list is not limited as well). The moment when the required resource becomes available, thread 5 will be pushed to a runnable queue to wait for the processor to become available. Once the processor becomes available, the thread 5 will complete the request (in case that it requires another unavailable resource that it has to wait for, the sequence will be repeated)

在上图中，正在运行的线程5尝试访问所需资源，该资源在请求时不可用。因此，线程5的状态变为暂停状态，并移至等待列表。线程5将保持挂起状态，直到所需的资源可用为止，并且线程在等待者列表中停留的时间没有时间限制（也没有限制等待者列表中的线程数）。当所需资源可用时，线程5将被推送到可运行队列，以等待处理器可用。一旦处理器可用，线程5将完成请求（如果它需要另一个必须等待的不可用资源，则将重复该顺序）

什么是SQL Server自旋锁 (What is the SQL Server spinlock)

So now that we have explained the query execution process, let’s see what the difference is between latches and spinlocks and why the spinlock was introduced in SQL Server, in the first place. As it can be seen in the article, All about Latches in SQL Server. it would be overkill for SQLOS to use latches for every shared data structure in order to synchronize the multiple threads access to a shared data structure. When a thread tries to acquire a latch over a shared data structure where another thread has already acquired a latch that is not compatible, the thread will have to wait. It enters the suspend state and is moved in the waiter list. The thread will remain in the suspended state until it is signaled that it can acquire the latch, and then it will be moved to the runnable queue to wait for the CPU resource to become available and it will remain there until the CPU became available for thread execution. As soon as the CPU is available, the thread goes in the running state again and it can now acquire the latch on the shared data structure

因此，既然我们已经解释了查询执行过程，那么让我们首先了解一下闩锁和自旋锁之间的区别以及为什么在SQL Server中引入了自旋锁。从文章“ SQL Server中的所有有关闩锁”可以看出。对于SQLOS而言，为每个共享数据结构使用闩锁以使多个线程对共享数据结构的访问同步将是过大的。当一个线程试图获取共享数据结构上的闩锁时，其中另一个线程已经获取了不兼容的闩锁，则该线程将不得不等待。它进入挂起状态，并在等待者列表中移动。线程将一直处于挂起状态，直到发信号通知它可以获取闩锁为止，然后将其移至可运行队列以等待CPU资源变得可用，并且它将一直保持在那里，直到CPU对线程可用为止执行。一旦CPU可用，线程将再次进入运行状态，并且它现在可以获取共享数据结构上的闩锁

As it can be seen, the latch could be a way for synchronizing highly active data structures. To reduce the overhead that latches could impose over the busy data structures, SQL Server has implemented a spinlock as a different form of the synchronization object. While similar to a latch, there is one important difference – the wait for the spinlock is “active” one as the thread will not leave the running state while waiting, which practically means that spinlocks always occur on the CPU.

可以看出，锁存器可能是一种用于同步高度活跃的数据结构的方法。为了减少闩锁可能会给繁忙的数据结构带来的开销，SQL Server已将自旋锁实现为另一种形式的同步对象。尽管类似于闩锁，但有一个重要的区别–等待自旋锁是“活动的”，因为线程在等待时不会离开运行状态，这实际上意味着自旋锁始终在CPU上发生。

Spinlocks are used by SQLOS when thread access time, to a commonly accessed data structure, is expected to be very short. If a thread that tries to acquire a spinlock cannot get the access to a target data structure, it executes periodically while in the running state, in the so-called “spin loops” to check for the resource availability, instead of yielding immediately into a suspend state.

当预期对通常访问的数据结构的线程访问时间非常短时，SQLOS将使用自旋锁。如果试图获取自旋锁的线程无法访问目标数据结构，则它将在运行状态下定期执行，即所谓的“自旋循环”，以检查资源可用性，而不是立即屈服于暂停状态。

A spinlocks’ advantage over latches is that a thread that acquired the SQL Server spinlock will be executed in a loop on a CPU in a running state, while the thread that unsuccessfully tries to acquire a latch will yield immediately in a suspend state. This is an expensive operation as it requires switching the context of a thread to go off the CPU to allow another thread to enter the running state, especially for data resources that are frequently seized for a very short period of time. Conversely, a spinlock waists CPU cycles while waiting, that could be used by the other thread

自旋锁相对于锁闩的优势在于，获取SQL Server自旋锁的线程将在运行状态下在CPU上的循环中执行，而未成功尝试获取锁存器的线程将立即处于挂起状态。这是一项昂贵的操作，因为它需要切换线程的上下文以离开CPU，以允许另一个线程进入运行状态，特别是对于那些在很短的时间内频繁占用的数据资源。相反，在等待时，自旋锁会使CPU周期紧张，这可能被其他线程使用

To explain the mechanism behind SQL Server spinlocks in a simple way, let’s assume that there is a request for the frequently used resource, but used for a very short time. Since this is the resource that is used often and for a very short period of time, when a thread tries to access that resource and cannot gain the access immediately it will be in the so-called “collision” state. In such a situation, the SQLOS will assume that the chances that resource will become available, if the thread waits just a tiny bit longer, are high. In such situations, the SQLOS assumes that the time needed for the resource to become available is much shorter than the time needed for the thread to voluntarily yield to a suspended state and go back to a running state. So, the thread will stay in the running state and it will keep repeating the question to SQLOS “Can I get CPU?… Can I get CPU?” and so on. In most cases, the CPU should become quickly available. A case where thread have to “spin” on CPU longer than expected, will be explained in the rest of the text as well

为了以一种简单的方式解释SQL Server自旋锁背后的机制，我们假设存在对频繁使用的资源的请求，但是使用时间很短。由于这是经常使用且在很短时间内使用的资源，因此当线程尝试访问该资源并且无法立即获得访问权限时，它将处于所谓的“冲突”状态。在这种情况下，如果线程等待的时间稍长一点，则SQLOS会假定资源变得可用的机会很高。在这种情况下，SQLOS假定资源变得可用所需的时间比线程自愿屈服到挂起状态并返回运行状态所需的时间短得多。因此，线程将保持运行状态，并将继续向SQLOS重复该问题：“我可以获得CPU吗？……我可以获得CPU吗？” 等等。在大多数情况下，CPU应该很快可用。在本文的其余部分还将说明线程必须在CPU上“旋转”的时间比预期长的情况。

SQL Server is using a lot of different spinlocks and the number of spinlocks increases with each new SQL Server version. To check the spinlock counters there are two commands that could be used:

SQL Server使用许多不同的自旋锁，并且随着每个新SQL Server版本的增加，自旋锁的数量也会增加。要检查自旋锁计数器，可以使用两个命令：

DBCC SQLPERF(SPINLOCKSTATS)

DBCC SQLPERF（SPINLOCKSTATS）

In the latest SQL Server 2017 RC2 version, the number of spinlocks that will be displayed with this command is 352 vs. 324 in SQL Server 2016

在最新SQL Server 2017 RC2版本中，此命令将显示的自旋锁数量是352对SQL Server 2016中为324
sys.dm_os_spinlock_stats

sys.dm_os_spinlock_stats

The important metrics related to spinlocks are collisions and spins_per_collision. The DMV provides both metrics in the columns with the same name in this DMV. Of course, the name column is equally important as it displays the spinlock type. So, the above query can be used most time in the following form:

与自旋锁相关的重要指标是冲撞和spins_per_collision 。 DMV在此DMV中以相同名称在列中提供两个指标。当然，名称列同样重要，因为它显示了自旋锁类型。因此，以上查询可以大多数时间以以下形式使用：
```
 
SELECT name, collisions, spins_per_collision
   FROM sys.dm_os_spinlock_stats
   ORDER BY spins_per_collision DESC;
 
```

The details about columns that this DMV contains:

此DMV包含的有关列的详细信息：

name: the spinlock name 名称：自旋锁名称
collisions: shows how many times a thread was blocked by a spinlock when attempting to access a protected resource 冲突：显示尝试访问受保护资源时线程被自旋锁阻止的次数
spins: shows how many times a thread loop spun while trying to acquire spinlock on resource spins ：显示在尝试获取资源上的自旋锁时线程循环旋转了多少次
spins_per_collision: shows the spins/collisions ratio spins_per_collision ：显示旋转/碰撞比率
sleep_time: shows the sleep time of a thread as a consequence of the backoff mechanism sleep_time ：显示由于退避机制而导致的线程的睡眠时间
backoffs: shows how many times the thread was backed off to grant CPU resource use to other threads
backoffs ：显示线程被退回多少次以将CPU资源分配给其他线程使用

Unfortunately, none of these commands are documented by Microsoft.

不幸的是，Microsoft没有记录这些命令。

退避机制 (Backoff mechanism)

To prevent a thread from waiting for too many CPU cycles while attempting to acquire a spinlock, after some time a thread waiting on a SQL Server spinlock will assume a “sleep” state while remaining in the running state, even it hasn’t managed to obtain the resource. In that way, it will allow to another thread to be executed on the same CPU. This “sleep” state is known as a backoff SQLOS mechanism. By default, SQLOS will spin for a constant period of time first before it switches the thread to the sleep state by performing a backoff. Trying to acquire the spinlock again considers that the cash concurrency state is maintained all the time and this is the CPU intensive operation. To avoid that, attempts to acquire a spinlock are performed in a restricted manner and therefore not performed on each thread spin

为了防止线程在尝试获取自旋锁时等待过多的CPU周期，一段时间后，等待SQL Server自旋锁的线程将处于“睡眠”状态，同时保持运行状态，即使它没有设法获取资源。这样，它将允许另一个线程在同一CPU上执行。这种“睡眠”状态称为回退 SQLOS机制。默认情况下，SQLOS将先执行恒定的自旋时间，然后再通过执行退避将线程切换到睡眠状态。再次尝试获取自旋锁认为现金并发状态始终保持不变，这是CPU密集型操作。为避免这种情况，尝试以受限方式执行自旋锁，因此不会在每次线程自旋时执行

Starting with SQL Server 2008R2, for some spinlock types, an improved backoff mechanism was implemented that allows an interval between two attempts, by a thread, to acquire the spinlock to be increased in an exponential manner, thus reducing in most cases, the influence on CPU performance. In such situations, SQLOS will intelligently presume that in situation where thread cannot acquire spinlock after he resumed from the sleep state, the next sleep state should be exponentially longer as there is a less chance now that the thread will manage to acquire spinlock if the sleep state time remains the same

从SQL Server 2008R2开始，对于某些自旋锁类型，实施了一种改进的退避机制，该机制允许线程两次尝试获取自旋锁的时间间隔以指数方式增加，从而在大多数情况下减少了对自旋锁的影响。 CPU性能。在这种情况下，SQLOS会智能地假定在线程从睡眠状态恢复后无法获取自旋锁的情况下，下一个睡眠状态应成倍增长，因为现在如果睡眠，线程将设法获取自旋锁的可能性较小状态时间保持不变

The sqlos.spinlock_backoff extended event can be also used for analysis as this event is raised each time when the backoff happens.

sqlos.spinlock_backoff扩展事件也可以用于分析，因为每次发生退避时都会引发此事件。

SQL Server自旋锁争用 (SQL Server spinlock contention)

The spinlock itself, as a little-documented SQL Server internal, is often overlooked as a potential or real cause of SQL Server performance issues. Not to mention that due to lack of official documentation, it is not easy to diagnose the spinlock as a root cause of SQL Server performance problems.

自旋锁本身作为SQL Server内部记录很少的文件，通常被视为SQL Server性能问题的潜在或真正原因。更不用说由于缺乏官方文档，将自旋锁诊断为SQL Server性能问题的根本原因并不容易。

So, let’s use a simplified version in order to explain the basics of spinlock contention

因此，让我们使用简化版本来解释自旋锁争用的基础

SQL Server uses scalability to maintain the system performance. Good SQL Server scalability means that the system will grow in relatively predictable and stable way with changes of its capacity in terms of resources that will be used. Since the SQL Server spinlock is CPU related, the following example will use CPU as an example of scalability.

SQL Server使用可伸缩性来维护系统性能。良好SQL Server可伸缩性意味着，随着使用资源的变化，系统将以相对可预测和稳定的方式增长。由于SQL Server自旋锁与CPU有关，因此以下示例将使用CPU作为可伸缩性示例。

Let’s take the situation where SQL Server has to process 2,000 threads per second, and in a system that consists of 12 CPUs, the CPU usage is at 10%. If the number of tasks that have to be processed doubles, to 4,000 threads per second, optimal scalability would mean that CPU usage goes up to 20-25%. In situations like this, it can be said that a system is “scalable”. As it can be seen, a “scalable” system means that resource usage change should be more or less proportional to the capacity imposed by the system. Doubling the capacity should ideally double the resource usage (in real-world scenario a bit more than expected use of resources is expected)

让我们来看一下SQL Server必须每秒处理2,000个线程的情况，并且在由12个CPU组成的系统中，CPU使用率为10％。如果必须处理的任务数量增加一倍，达到每秒4,000个线程，那么最佳的可伸缩性将意味着CPU使用率将提高20-25％。在这种情况下，可以说系统是“可扩展的”。可以看出，“可伸缩”系统意味着资源使用变化应与系统施加的容量或多或少成比例。理想情况下，将容量增加一倍应该使资源使用增加一倍（在实际情况下，比预期的资源使用要多一点）

But if there is a similar situation where CPU usage goes up to 70+% for example, then it is obvious that for some reason SQL Server scalability doesn’t work and it is an indication that system doesn’t have a proper scalability

但是，如果有类似的情况，例如CPU使用率上升到70％以上，那么很明显，由于某种原因，SQL Server可伸缩性无法正常工作，这表明系统没有适当的可伸缩性

A situation like this, where a proportional increase of transactions per second is followed by a disproportional increase of the CPU usage % and an excessive increase in spins and possibly backoff events number, is a typical scenario where spinlock contention could be the cause.

像这样的情况是典型的场景，其中自旋锁争用可能是这种情况，其中每秒事务按比例增加，然后CPU使用百分比成比例增加，旋转次数和可能的退避事件数量过度增加。

What happens, in such a situation, is that spinlock is “spinning” too long which takes up CPU and practically wastes the CPU time, as it will not much CPU left for other threads that are executing. It is obvious that SQLOS was not properly predicted the CPU usage by threads and the acquired spinlock spins too long instead of going into a sleep state.

在这种情况下，发生的旋转自旋锁“旋转”时间过长，这会占用CPU并实际上浪费了CPU时间，因为它不会为正在执行的其他线程留下太多的CPU。显然，SQLOS不能正确地预测线程的CPU使用率，并且获得的自旋锁旋转得太长，而不是进入睡眠状态。

As previously stated, this is the simplified scenario that serves to illustrate and explain the spinlock contention. The disproportional increase of the CPU usage with an increase of a number of threads per second doesn’t mean automatically that spinlock contention is in effect, and there is a lot of various causes that have to be evaluated and discarded before jumping into conclusion that spinlock contention is the root cause

如前所述，这是简化的场景，用于说明和解释自旋锁争用。 CPU使用率的不成比例的增加（每秒增加的线程数）并不意味着自旋锁争用有效，并且在得出自旋锁的结论之前，必须评估并丢弃许多原因。竞争是根本原因

LOCK_HASH自旋锁 (LOCK_HASH spinlock)

The most frequent situation where spinlock contention can occur is related to the LOCK_HASH spinlock type. When large LOCK_HASH spinlock values are followed by the large number of spins, collisions, and backoffs, then it is possible, but not certain, that spinlock contention is the root cause of the performance issues

自旋锁争用最频繁发生的情况与LOCK_HASH自旋锁类型有关。当较大的LOCK_HASH自旋锁值后跟大量的自旋，冲突和退避时，则可能（但不确定）自旋锁争用是性能问题的根本原因

The high number of LOCK_HASH spins could occur in situations where continuous and very frequent access to a lock manager hash table bucket is present. In such situations, it is highly possible that the excessive number of spins and backoffs will follow this spinlock. There are a couple of ways to mitigate LOCK_HASH spinlock contention:

在存在对锁管理器哈希表存储桶的连续且非常频繁的访问的情况下，可能发生大量LOCK_HASH旋转。在这种情况下，极有可能在此自旋锁之后出现过多的自旋和退避。有两种减轻LOCK_HASH自旋锁争用的方法：

Use the shortest possible transactions
使用最短的交易
Ensure that application is not using multiple sessions for updating the same data
确保应用程序没有使用多个会话来更新相同的数据
Turn off the page level locking
关闭页面级别锁定
Closely track lock escalation as the contention on the hash bucket will be higher, as the roughly gained is the SQL Server lock
密切跟踪锁升级，因为哈希桶上的争用会更高，因为大致获得的是SQL Server锁
Potentially use NOLOCK hint in queries in situations where more than one thread tries to read the same data at the same time
在多个线程尝试同时读取同一数据的情况下，可能在查询中使用NOLOCK提示

其他常见SQL Server自旋锁类型 (The other frequent SQL Server spinlock types)

LOGCACHE_ACCESS – this spinlock could be the bottleneck in situations when an OLTP based application is using SQL Server and it is not optimized to behave in accordance with whether the Hekaton in-memory engine is in use or not

LOGCACHE_ACCESS –当基于OLTP的应用程序正在使用SQL Server且未针对与是否使用Hekaton内存引擎进行优化而进行行为优化时，此自旋锁可能成为瓶颈

LOGFLUSHQ – this is the spinlock that a thread acquires when a commit operation takes place. The spinlock will be held all the way until the relevant thread logging info has been copied into the write queue

LOGFLUSHQ –这是发生提交操作时线程获得的自旋锁。自旋锁将一直保持到相关线程日志记录信息已复制到写入队列中为止

XDESMGR – this spinlock serves for protection of the database engine mechanism that is working with transaction ids. Using multiple DML statements per single transaction can decrease the pressure on this type of spinlock

XDESMGR –该自旋锁用于保护使用事务ID的数据库引擎机制。每笔交易使用多个DML语句可以减轻这种自旋锁的压力

硬件和SQL Server自旋锁 (Hardware and SQL Server spinlocks)

When designing the system, there are some tips that should be followed that can help minimize spinlock contention. When choosing the CPU for the system, if possible, select the CPU with the highest clock available as that will allow single threaded transactions to run faster. In systems with multiple CPUs, configuring the system out of the NUMA limits could cause increased latencies that will be caused by spinlock is being forced to move between the NUMA nodes using the quick path interconnect (QPI)

在设计系统时，应遵循一些提示，以帮助最小化自旋锁争用。在为系统选择CPU时，如果可能，请选择具有最高可用时钟的CPU，因为这将使单线程事务运行得更快。在具有多个CPU的系统中，将系统配置为超出NUMA限制可能会导致延迟增加，这是由于自旋锁被强制使用快速路径互连（QPI）在NUMA节点之间移动

翻译自: https://www.sqlshack.com/sql-server-spinlocks/

sql server锁

culuo4781

关注

0
点赞
踩
1

收藏

觉得还不错? 一键收藏
0
评论
sql server锁_关于SQL Server自旋锁的全部

sql server 锁 As discussed in the article All about latches in SQL Server, spinlocks are also a special form of locks that SQL Server uses to protect data. Spinlocks are lightweight objects used by ...
复制链接

扫一扫