sql 闩锁 原因_如何识别和解决SQL Server中的热闩锁

sql 闩锁 原因

描述 (Description)

In SQL Server, internal latch architecture protects memory during SQL operations. It ensures the consistency of memory structures with read and write operation on pages. Rudimentarily, it has two classes, buffer latches, and non-buffer latches which perform lightweight synchronization in the SQL Engine.

在SQL Server中,内部闩锁体系结构可在SQL操作期间保护内存。 通过页面上的读写操作,可以确保内存结构的一致性。 简单地说,它具有两个类,即缓冲区锁存器和非缓冲区锁存器,它们在SQL Engine中执行轻量级同步。

The latch ensures memory consistency while Locks ensures logical transaction consistency. When multiple users or applications access the same data at the same time, locking prevents them from making simultaneous changes to the data. Locks are managed internally by the Microsoft SQL Server Compact Database Engine. While a user performs DML operations, locks are automatically acquired and released on resources. The latch ensures memory consistency on memory structures including indexes and data pages. Generally, SQL server uses buffer pool and IO latches to deal with synchronizing primitive manner in-memory structures. When there is a multiple thread concurrency load on the server then a latch result from an attempt to acquire an incompatible memory structure and in doing so, a latch contention issue can arise.

闩锁确保内存一致性,而锁确保逻辑事务一致性。 当多个用户或应用程序同时访问同一数据时,锁定会阻止他们同时更改数据。 锁由Microsoft SQL Server Compact数据库引擎在内部进行管理。 用户执行DML操作时,将自动获取锁并在资源上释放锁。 锁存器可确保包括索引和数据页在内的存储器结构上的存储器一致性。 通常,SQL Server使用缓冲池和IO锁存器来处理同步原始方式的内存结构。 当服务器上有多线程并发负载时,由于尝试获取不兼容的内存结构而导致闩锁 ,并且这样做会导致闩锁争用问题。

There are many types of SQL Server latches including buffer, non-buffer, and IO latches. For more clarification about the latch, kindly check out the article All about Latches in SQL Server by Nikola Dimitrijevic. Moreover, I will discuss in detail more about Hot latches, how to identify and resolve them.

SQL Server闩锁的类型很多,包括缓冲区,非缓冲区和IO闩锁。 有关闩锁的更多说明,请查阅Nikola Dimitrijevic撰写的文章“ SQL Server中所有闩锁”。 此外,我将详细讨论热闩锁 ,以及如何识别和解决它们。

锁存模式和兼容性 (Latch modes and compatibility)

Basically, as we know, latches are acquired in 5 different modes KP (Keep latch), SH (Shared latch), UP (Update latch), EX (Exclusive latch), DT (Destroy latch). I have summarized the latch modes and their compatibility.

众所周知,基本上,锁存器是在5种不同的模式下获取的:KP(保留锁存器),SH(共享锁存器),UP(更新锁存器),EX(排他锁存器),DT(销毁锁存器)。 我总结了闩锁模式及其兼容性。

KP (Keep latch)
Keep latches ensure that the referenced structure cannot be destroyed.

KP(保持闩锁)
保持闩锁可确保引用的结构不会被破坏。

SH (Shared latch)
A Shared latch is required to read a page data structure. A Shared latch (SH) is compatible with an update (UP) or keep (KP) latch, but incompatible with a destroy latch (DT).

SH(共享闩锁)
需要共享锁存器才能读取页面数据结构。 共享锁存器(SH)与更新(UP)或保持(KP)锁存器兼容,但与销毁锁存器(DT)不兼容。

UP (Update latch)
Update latch is compatible with Keep latch and shared latch but no one can allow to write to its reference structure.

UP(更新闩锁)
更新锁存器与Keep锁存器和共享锁存器兼容,但是没有人可以写入其引用结构。

EX (Exclusive latch)
This latch blocks other threads from waiting or reading from a reference area.

EX(专用闩锁)
该锁存器阻止其他线程等待或从参考区域读取。

DT (Destroy latch)
This latch assigned to the content of the referenced structure before destroying the content.

DT销毁闩锁
在销毁内容之前,此锁存器已分配给引用结构的内容。

All of these latch modes are not compatible with each other. For example, when a thread attempts to acquire a possible latch and the mode is not compatible, then it is placed into the queue to wait for resource availability. For more clarification, kindly review latch mode compatibility chart below.

所有这些锁存模式都不相互兼容。 例如,当线程尝试获取可能的闩锁且模式不兼容时,则将其放入队列中以等待资源可用性。 欲了解更多信息,请查看下面的闩锁模式兼容性表。

闩锁等待类型 (Latch wait types)

With a concurrency load, due to latch mode incompatibilities, page contention can arise. We can figure out these contention issues with the help of wait types which is reported from different SQL Server DMVs; sys.dm_os_wait_stats, sys.dm_os_latch_stats, sys.dm_exec_query_stats, etc.

对于并发负载,由于锁存模式不兼容,可能会导致页面争用。 我们可以借助不同SQL Server DMV报告的等待类型来找出这些争用问题。 sys.dm_os_wait_stats,sys.dm_os_latch_stats,sys.dm_exec_query_stats等。

  • PAGELATCH_*. (For an example PAGELATCH_ *报告缓冲区锁存器(BUF)。 (例如PAGELATCH_EX, PAGELATCH_SH) PAGELATCH_EX,PAGELATCH_SH
  • LATCH_*. (For an example LATCH_ *报告 。 (例如LATCH_UP, LATCH_EX, LATCH_SH, LATCH_DT) LATCH_UP,LATCH_EX,LATCH_SH,LATCH_DT的示例
  • PAGEIOLATCH_*. (For an example PAGEIOLATCH_ *。 (例如PAGEIOLATCH_SH, PAGEIOLATCH_EX) PAGEIOLATCH_SH,PAGEIOLATCH_EX

热锁(PAGELATCH_EX) (Hot Latches (PAGELATCH_EX))

As per the above-mentioned overview, there are the different wait types that arise due to these latch contentions. Out of these waits, I have focused on the wait type PAGELATCH_EX.

根据上述概述,由于这些锁存器争用而产生了不同的等待类型。 在这些等待中,我将重点放在等待类型PAGELATCH_EX上。

This wait type means that when a thread is waiting for access to a data file page in memory because it might be page structure in exclusive mode due to another running process. The structure would be a primary page from table or index.

这个 等待类型是指线程正在等待访问内存中的数据文件页面时,因为它可能是由于另一个正在运行的进程而处于排他模式的页面结构。 该结构将是表或索引的主页面。

Usually, when the concurrency request frequency is made higher on the server with insert operations, those multiple requests will be waiting on the same resource with a PAGELATCH_EX wait type on the index page. This occurrence is also called a “hot latches” issue or a “hot spot”. This type of latch contention also possible with an update or delete operation while a concurrency load.

通常,当通过插入操作在服务器上提高并发请求频率时,这些多个请求将在索引页上使用PAGELATCH_EX等待类型在同一资源上等待。 这种情况也称为“热点问题”或“热点” 。 这种类型的闩锁争用在并发加载时通过更新或删除操作也是可能的。

For this contention, it can be possible, that the issue has been generated due to sequential leading index keys. Generally, Indexing is the backbone of the database engine but aftereffect the contention can arise. To be more specific, if a table has a clustered index, it organizes data in a sorted manner while inserting the data. Though it’s adding a record at the end of the clustered index it could be more intuitive that the issue is page split occurrences. But the insertion requests queue is generated on the last page, and in addition to this, we may also add an identity column to cluster index, then it can lead to additional performance problems while the concurrent insertion frequency is higher on the single object. On concurrency insertion, how do these requests pile up? How does this latch contention PAGELATCH_EX come up in the picture? For more clarification, kindly review the diagram below.

对于此争用,可能是由于顺序的前导索引键导致了问题的产生。 通常,建立索引是数据库引擎的基础,但可能会产生争执。 更具体地说,如果表具有聚簇索引,则在插入数据时会以排序的方式组织数据。 尽管它在聚簇索引的末尾添加了一条记录,但更直观地说问题出在页面拆分发生上。 但是插入请求队列是在最后一页上生成的,此外,我们还可以在集群索引中添加一个身份列,这会导致其他性能问题,同时单个对象的并发插入频率更高。 在并发插入时,这些请求如何堆积? 图片中如何显示此闩锁争用PAGELATCH_EX? 如需更多说明,请查看下图。

As per the diagram, there are multiple requests to insert data into a single table which has a clustered index hence those are waiting on the last page because this insert statement is performed serially performed due to physical order, and as a result of this latch contention arises and results in excessive occurrences of the wait type PAGELATCH_EX. However, this contention refers to the last page insert contention issue.

如图所示,存在多个将数据插入具有聚集索引的单个表中的请求,因此这些请求正在最后一页上等待,因为此插入语句是由于物理顺序而串行执行的,并且是这种闩锁争用的结果出现并导致过多出现等待类型PAGELATCH_EX。 但是,此争用是指最后一页插入争用问题。

确定热闩锁争用问题 (Identifying the Hot latches contention issue)

For more clarification, I will recreate this scenario on my local server to do so, I have prepared sample script. In the script, I have introduced one base table and procedure and a database, Hotspot.

为了进一步说明,我将在本地服务器上重新创建此方案,为此,我准备了示例脚本。 在脚本中,我介绍了一个基本表和过程以及一个数据库Hotspot。

Sample Script

样例脚本

 
CREATE DATABASE Hotspot
GO
Use Hotspot
GO
CREATE TABLE audit_Data
(
	audit_id int primary key identity (1,1),
	audit_action nvarchar(max),
	audit_desc nvarchar(max),
	ref_person_action int,
	actionlist xml,
	actionarea xml,
	dtauditDate datetime default getdate ()
)
GO
CREATE PROCEDURE audit_history
(
  @audit_action  nvarchar(MAX),
  @audit_desc nvarchar(max),
  @ref_person_action int,
  @actionlist xml,
  @actionarea xml
)
as
begin
 
insert into audit_Data
select @audit_action,@audit_desc,@ref_person_action,@actionlist,@actionarea,GETUTCDATE()
 
select SCOPE_IDENTITY ()
 
end	
 
GO
 

I have applied above-mentioned script in my test server, now I have the database “Hotspot” ready for execution.

我已经在测试服务器中应用了上述脚本,现在已经可以执行数据库“ Hotspot”了。

Apply load test in local server

在本地服务器上应用负载测试

There are multiple tools and utility available for the load test in the testing server; I have used Adam Machanic’s SQLQuerystress tool.

测试服务器中有多种工具和实用程序可用于负载测试。 我使用了亚当· 马汉尼( Adam Machanic)的SQLQuerystress工具。

During this testing, It is necessary to get query statistics for troubleshooting this issue. For the purpose of demonstration, I have prepared scripts for getting same which is I mentioned below, which you can review.

在此测试过程中,有必要获取查询统计信息以解决此问题。 为了演示的目的,我准备了以下脚本来获取相同的脚本,您可以查看这些脚本。

Query for getting wait resource vs T-SQL execution

查询获取等待资源与T-SQL执行

I have used multiple SQL DMVs and based on that I have prepared a query for getting the wait statistics. Alternatively, we can also use a tool like a Profiler, Activity monitor, etc.

我使用了多个SQL DMV,并在此基础上准备了用于获取等待统计信息的查询。 另外,我们也可以使用分析器,活动监视器等工具。

 
use Hotspot
go
SELECT
wt.wait_duration_ms,
wt.wait_type,
s_text.text,
DB_NAME(req.database_id) DatabaseName,
req.wait_resource,
session.login_name,
req.last_wait_type
FROM sys.dm_os_waiting_tasks wt
INNER JOIN sys.dm_exec_requests req ON wt.session_id = req.session_id
INNER JOIN sys.dm_exec_sessions session ON session.session_id = req.session_id CROSS JOIN sys.dm_exec_sql_text (req.sql_handle) s_text
CROSS APPLY sys.dm_exec_query_plan (req.plan_handle) qp
WHERE session.is_user_process = 1
GO
 

Query for measuring statistics of the transaction, latch wait, batch requests etc.

查询事务的统计度量,闩锁等待,批处理请求等。

In addition, I have prepared this query for getting latches, batch requests in detail with respect to transaction vs time. I have configured 30-second delay by default in this query. Alternatively, we can also use performance monitor.

另外,我已经准备好此查询以获取锁存器,有关事务与时间的详细批处理请求。 我已在此查询中默认配置30秒延迟。 或者,我们也可以使用性能监视器。

 
use Hotspot
go
select object_name,counter_name,cntr_value into #perfcounter
From sys.dm_os_performance_counters
where counter_name in
(    'Transactions/sec',
    'Latch Waits/sec',
    'Average Latch Wait Time (ms)',
    'Batch Requests/sec')
and ( ltrim(rtrim(instance_name)) = 'HotSpot'
OR instance_name = '')
 
WAITFOR DELAY '00:00:30';
 
select counters.object_name,counters.counter_name As ActionName,counters.cntr_value - perf.cntr_value ActionValue
From #perfcounter perf
inner join sys.dm_os_performance_counters counters on counters.counter_name = perf.counter_name
where counters.counter_name in
(    'Transactions/sec',
    'Latch Waits/sec',
    'Average Latch Wait Time (ms)',
    'Batch Requests/sec')
and ( ltrim(rtrim(instance_name)) = 'HotSpot'
OR instance_name = '')
 
DROP TABLE #perfcounter
GO
 

Now I have the availability for monitoring queries as well as SQLQueryStress tool. Now I will go for executing both things in parallel.

现在,我可以监视查询以及使用SQLQueryStress工具。 现在,我将并行执行这两件事。

  • SQLQueryStress then applying procedure load 100 Thread * 1000 Iterations. SQLQueryStress,然后应用过程负载100线程* 1000迭代。
  • In Parallel, I have run both monitoring queries in SSMS.

    在Parallel中,我在SSMS中都运行了两个监视查询。

SQLQueryStress Load test result with 100 thread * 1000 iterations

SQLQueryStress使用100个线程* 1000次迭代加载测试结果

Monitoring query result in SSMS

在SSMS中监视查询结果

As per mentioned above monitoring queries, I am going to execute in two different sessions.

按照上面提到的监视查询,我将在两个不同的会话中执行。

As can be seen, as a result of this, the total process has been completed in 40 seconds and multiple instances of the wait type PAGELATCH_EX have been found in the monitoring query result.

可以看出,作为结果,整个过程已在40秒内完成,并且在监视查询结果中找到了多个等待类型PAGELATCH_EX的实例。

如何解决热闩锁争用 (How to resolve hot latches contention)

It might be possible to reduce the contention if I remove the clustered index but this might not be ideal. We have multiple ways to distribute insertion across the index range; horizontal partition techniques, using a hash value in the leading unique key column, etc.

如果删除聚簇索引,可能可以减少争用,但这可能并不理想。 我们有多种方法可以在整个索引范围内分配插入; 水平分区技术,使用前导唯一键列中的哈希值等。

Using hash value in the leading unique key column

在前导唯一键列中使用哈希值

A hash value means, a dynamically generated key value. If I use hash value in the leading primary key column, the unique key value is distributed with audit_id across the B-Tree structure. Because of this, I have changed my table structure. For demonstration purpose, I will create a table and a procedure with a different name which is I attached the script below.

哈希值是指动态生成的键值。 如果我在前导主键列中使用哈希值,则唯一键值将与audit_id一起分布在B树结构中。 因此,我更改了表结构。 出于演示目的,我将创建一个表和一个具有不同名称的过程,以下是我附加的脚本。

 
Use Hotspot
GO
CREATE TABLE audit_Data_with_Hashing
(
	audit_id int identity (1,1) NOT NULL,
	audit_action nvarchar(max),
	audit_desc nvarchar(max),
	ref_person_action int,
	actionlist xml,
	actionarea xml,
	dtauditDate datetime default getdate (),
	HashValue as (CONVERT([INT], abs([audit_id])%(30))) PERSISTED NOT NULL 
)
GO
 
ALTER TABLE audit_Data_with_Hashing 
ADD CONSTRAINT pk_hashvalue 
PRIMARY KEY CLUSTERED (HashValue, audit_id) 
 
GO
CREATE PROCEDURE audit_history_with_Hashing
(
  @audit_action  nvarchar(MAX),
  @audit_desc nvarchar(max),
  @ref_person_action int,
  @actionlist xml,
  @actionarea xml
)
as
begin
 
insert into audit_Data_with_Hashing
select @audit_action,@audit_desc,@ref_person_action,@actionlist,@actionarea,GETUTCDATE()
 
select SCOPE_IDENTITY ()
 
end	
 
GO
 

I have applied above-mentioned script in my test server now I am going to apply the same load test as previous and capture statistics again.

我已经在测试服务器中应用了上述脚本,现在我将应用与以前相同的负载测试并再次捕获统计信息。

SQLQueryStress load test result with 100 thread * 1000 iterations

具有100个线程* 1000次迭代SQLQueryStress负载测试结果

I have re run SQLQueryStress tool and execute as follows.

我已经重新运行SQLQueryStress工具并执行如下。

Monitoring query result in SSMS

在SSMS中监视查询结果

As per the previously mentioned monitoring queries, I am going to execute in two different sessions.

根据前面提到的监视查询,我将在两个不同的会话中执行。

Now, I have got load test results in 30 seconds and got latch waits/sec of 484 instead of 196,746 and average latch wait time(MS) count of 206,170 instead of 1,421,675. As a result of query statistics, the Pagelatch_EX waits have been reduced. As per my product use case, I have configured this approach. It could be possible different effects for this approach. Later on, I will describe pros and cons of this.

现在,我在30秒内获得了负载测试结果,锁存等待/秒为484,而不是196,746 ,平均锁存等待时间(MS)数为206,170,而不是1,421,675 。 作为查询统计信息的结果, Pagelatch_EX等待已减少。 根据我的产品用例,我已经配置了这种方法。 这种方法可能会有不同的效果。 稍后,我将描述其优缺点。

Partition techniques

分区技术

When a table has millions of rows, then the table cost comes up in the picture vis a vis DML operations. In this behavior, structure level changes are needful like vertical and horizontal table level partitions. There are several trade-offs between these. Horizontal table partition could be integrated easily We can also apply horizontal partitions on a computed column, which is practically the same functionality with just minor differences as using leading unique indexes. When the insertion operation is performed, this is still going on the end of this logical range, but hash values produce dynamic values and its split across the B-tree structure. To do so, it might be possible, due to this frequency insertion contention issue can be solved using a computed column.

当一个表具有数百万行时,相对于DML操作,表成本将在图片中显示。 在这种情况下,需要进行结构级别更改,例如垂直和水平表级别分区。 这些之间需要权衡取舍。 水平表分区可以轻松集成。我们还可以在计算列上应用水平分区,该功能几乎与使用领先的唯一索引一样具有相同的功能。 执行插入操作时,此操作仍在此逻辑范围的末尾,但是哈希值会产生动态值,并且会在B树结构中进行拆分。 这样做,由于此频率插入争用问题可以使用计算列来解决,因此是可能的。

结论 (Conclusion)

I have concluded points and mentioned trade-off below.

我已经总结了要点,并在下面提到了权衡问题。

Pros

优点

  • Using it, Insertion will be performed in a non-sequential manner and provide a benefit against the frequency of latch contention issue.

    使用它,插入将以非顺序​​的方式执行,并且可以防止闩锁争用问题的发生。
  • The table partition feature is very useful for managing large volume data;

    表分区功能对于管理大量数据非常有用。

Cons

缺点

  • Index key length is bigger than normal. Due to this index size difference and page traversing cost, fragmentation can become an issue.

    索引键长度大于正常值。 由于此索引大小差异和页面遍历成本,碎片化可能成为一个问题。
  • Random insertion operations might generate page split operations.

    随机插入操作可能会生成分页操作。
  • Get data action like select queries might produce issues while retrieving the data from hash partitions because the query plan estimates might be inaccurate.

    诸如选择查询之类的获取数据操作可能会在从哈希分区中检索数据时产生问题,因为查询计划的估计可能不准确。
  • It’s difficult to maintain reference integrity while increasing key combination of indexes.

    增加索引的键组合时,很难保持引用完整性。

翻译自: https://www.sqlshack.com/how-to-identify-and-resolve-hot-latches-in-sql-server/

sql 闩锁 原因

  • 0
    点赞
  • 2
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值