接着上一章:I/O
TempDB:
TempDB是一个全局数据库,存储内部和用户对象还有零食表、对象、在SQLServer操作过程中产生的存储过程等数据。在一个SQLServer实例中,只有一个TempDB。所以容易成为性能和磁盘空间瓶颈。TempDB可能因为空间可用程度和过量的DDL和DML操作而变得超负荷。这可能导致不相关的应用运行缓慢甚至失败。
常见的TempDB问题如下:
l TempDB空间超支。
l 因为TempDB的I/O瓶颈而导致查询缓慢。这可以查看前面的I/O瓶颈章节。
l 过渡的DDL操作导致系统表产生瓶颈。
l 资源分配争用。
在开始诊断TempDB问题之前,先看看TempDB的空间是如何使用的。可以总结为4部分:
Category | Description |
User Objects(用户对象) | 由用户会话显式创建并且在系统目录中被跟踪的对象。包括: 表及索引; 全局临时表(##t1)及其索引; 局部临时表(#t1)及其索引; 会话(session)范围:包括会话范围及在存储过程中的范围; 表变量(@t1)范围:包括会话范围及在存储过程中的范围; |
Internal Objects(内部对象) | 这是语句范围的对象,存在和消失于SQLServer处理的查询中。包括: 工作文件(hash join); 运行排序; 工作表(游标、脱机(spool)和LOB(大对象数据类型)类型存储); 从优化角度,当工作表被删除时,一个IAM也和一个区将被保存用于新的工作表。 |
Version Store(版本存储) | 这部分用于存储行版本、MARS、联机索引、触发器、基于快照的隔离级别的行版本。 |
Free Space(空余空间) | TempDB的可用空间 |
TempDB的总使用空间等于用户对象(userobjects)加上内置对象(internal objects)加上版本存储(version store)加上可用空间。
可用空间等于性能计数器中tempdb 的可用空间值。
监控Tempdb空间(Monitoring tempdb Space):
提早发现问题总比出现了再解决要强。你可以使用性能计数器:Free Space in tempdb(KB)来监控TempDB的空间使用情况。这个计数器按KB来跟踪TempDB。DBA可以使用这个指针来判断tempdb是否运行在低空间环境。但是,标识不同类别,就像签名定义的一样,tempdb使用磁盘空间的情况是非常丰富的。下面的查询返回tempdb被用户和内置对象使用情况,注意,这仅仅适用于tempdb:
Select
SUM(user_object_reserved_page_count)*8 asuser_objects_kb,
SUM(internal_object_reserved_page_count)*8 asinternal_objects_kb,
SUM(version_store_reserved_page_count)*8 as version_store_kb,
SUM(unallocated_extent_page_count)*8 as freespace_kb
From sys.dm_db_file_space_usage
Where database_id = 2
本机结果:
user_objects_kb internal_objects_kb version_store_kb freespace_kb
-------------------- ---------------------------------------- --------------------
NULL NULL NULL NULL
注意这些数据是不包含混合区的计算,混合区被分配给用户和内置对象。
空间分配故障排查:
用户对象、内置对象和版本存储能引起tempdb的空间申请,下面我们看看如何检查每部分的故障问题。
用户对象(User objects):
因为用户对象不属于任何特定会话(specific sessions),你需要理解规范的应用程序应该根据特定的要求创建和调整用户对象。你可以通过运行exec sp_spaceused @objname=’<user-object>’来找到由个别用户对象使用的空间。比如,运行以下脚本来列举所有tempdb 对象:
DECLARE userobj_cursor CURSOR FOR
select
sys.schemas.name + '.' + sys.objects.name
from sys.objects, sys.schemas
where object_id > 100 and
type_desc = 'USER_TABLE'and
sys.objects.schema_id = sys.schemas.schema_id
go
open userobj_cursor
go
declare @name varchar(256)
fetch userobj_cursor into @name
while (@@FETCH_STATUS = 0)
begin
exec sp_spaceused@objname = @name
fetch userobj_cursor into @name
end
close userobj_cursor
版本存储(Version Store):
SQLServer2008 提供一个行版本框架,目前为止,以下特性被用于行版本框架:
l 触发器
l MARS
l 联机索引
l 基于行版本隔离级别:需要在数据库级别设置选项
更多信息请查看联机丛书:RowVersioning Resource Usage
行版本在会话过程中是共享的,创建者也没有权限去回收行版本。你需要找到并可能的情况下停止运行最久的事务来保证行版本的清除。下面的插叙是返回运行最久的基于行版本存储的两个事务:
select top 2
transaction_id,
transaction_sequence_num,
elapsed_time_seconds
from sys.dm_tran_active_snapshot_database_transactions
order by elapsed_time_seconds DESC
以下是示例结果:
transaction_id transaction_sequence_numelapsed_time_seconds
-------------------- --------------------------------------------
8609 3 6523
20156 25 783
因为第二个活动事务在一个短时期内被引用,所以你应该把时间花在第一个事务中。但是,没有办法预估多少版本空间将被停止的事务释放。所以建议停止多一点的事务来清空这部分的空间。
可以通过特定账号来固定tempdb中版本存储的空间或者靠清除,如果可能,应该清除运行最久的快照隔离事务或者使用已提交读快照隔离的运行最久的查询。可以使用以下公式大概预估行版本存储所需空间:
[Sizeof version store] = 2 * [version store data generated per minute] * [longestrunning time (minutes) of the transaction]
在所有允许使用行版本隔离级别的数据库中,版本存储每分钟产生的事务和日志数据产生的相同。但是也有例外:在更新时的日志记录;还有最近插入的数据是没有形成版本,但会被记录日志。如果是大容量日志操作,并且恢复模式不是完全恢复,你可以使用Version Generation Rate 和Version Cleanup Rate性能计数器来计算。如果VersionCleanup Rate为0,则运行久的事务会防止行版本存储被清空。附带说明,在发生tempdb空间不足的错误前,SQLServer2008会坚持到最后,防止行版本存储被收缩。在收缩过程中,运行最久的事务却没产生任何行版本的将被标记为“受害者”,并清空这部分的空间。消息3967就是在错误日志中显示每个受害事务的信息。一旦事务被标记为受害者,将不能创建或者访问行版本信息。消息3966记录受害事务尝试读取行版本时将被回滚的情况。当收缩行版本存储成功后,tempdb将有更多的可用空间,否则,tempdb将耗尽。
内置对象(Internal Objects):
内置对象是被每个语句创建或销毁的对象,处理在前面说道的部分之外,其他都会创建。如果你发现在tempdb中有一个很大的空间被分配,你就要检查哪个会话或任务消耗了这部分空间,然后尽可能校正。
SQLServer2008提供DMVs:
sys.dm_db_session_space_usage 和sys.dm_db_task_space_usage
来追踪tempdb空间被哪些会话或者任务分配了。虽然任务是在会话环境下运行,但是任务的空间使用在任务完成之后才被会话占用的。可以使用以下查询来找到排行前列的会话分配。注意这些结果只包含任务已经完成的会话:
select
session_id,
internal_objects_alloc_page_count,
internal_objects_dealloc_page_count
from sys.dm_db_session_space_usage
order byinternal_objects_alloc_page_count DESC
可以使用下面的查询前列会话中分配给内置对象,包含目前活动的任务:
SELECT
t1.session_id,
(t1.internal_objects_alloc_page_count + task_alloc) as allocated,
(t1.internal_objects_dealloc_page_count + task_dealloc) as
deallocated
from sys.dm_db_session_space_usage ast1,
(select session_id,
sum(internal_objects_alloc_page_count)
as task_alloc,
sum(internal_objects_dealloc_page_count) as
task_dealloc
from sys.dm_db_task_space_usagegroup bysession_id) ast2
where t1.session_id = t2.session_id and t1.session_id >50
order by allocated DESC
下面是示例输出:
session_id allocated deallocated
---------- -------------------------------------
52 5120 5136
51 16 0
当你有一个隔离的任务或者产生大量内置对象分配的任务时,可以使用下面语句来发现这些语句和他们的详细执行计划:
select
t1.session_id,
t1.request_id,
t1.task_alloc,
t1.task_dealloc,
t2.sql_handle,
t2.statement_start_offset,
t2.statement_end_offset,
t2.plan_handle
from (Select session_id,
request_id,
sum(internal_objects_alloc_page_count) as task_alloc,
sum(internal_objects_dealloc_page_count) as task_dealloc
from sys.dm_db_task_space_usage
group bysession_id, request_id)as t1,
sys.dm_exec_requests ast2
where t1.session_id = t2.session_id and
(t1.request_id =t2.request_id)
order by t1.task_alloc DESC
示例输出:
session_id request_id task_alloc task_dealloc
---------------------------------------------------------
52 0 1024 1024
sql_handle statement_start_offset
-----------------------------------------------------------------------
0x02000000D490961BDD2A8BE3B0FB81ED67655EFEEB360172 356
statement_end_offset plan_handle
---------------------------------
-1 0x06000500D490961BA8C19503000000000000000000000000
可以使用sql_handle和plan_handle列来得到语句的执行计划:
select text from sys.dm_exec_sql_text(@sql_handle)
select * fromsys.dm_exec_query_plan(@plan_handle)
注意,当你想访问这些执行计划的时候可能它们不再缓存中,为了保证执行计划的可用性,缓存的执行计划会频繁轮询和保存结果,以便更好地使用。所以它可能会在后来才查到。
当SQLServer重启时,tempdb会初始化并增长到配置大小。这可能导致tempdb出现碎片,和引起间接开销,包含数据库自动增长时申请新区而导致的阻塞,扩张tempdb空间。这可能导致你的工作负载增大而影响性能。建议预先设置tempdb到适合的大小。
过度的DDL和分配操作:
Tempdb争论的两个点为:
创建和删除大数据量的临时表或者表变量会引起源数据的争用。在SQLServer2008中,局部临时表和表变量只是缓存最小的源数据。但是,下面的条件必须满足。否则,这些临时对象将不会被缓存:
l 没有创建命名约束
l 作用在表上的DDL语句,在临时表创建后没有运行,比如CREATE INDEX或者CREATE STATISTICS语句。
l 没有使用动态SQL创建的临时对象,如sp_executesqlN’create table #t(a int)’。
l 在别的对象中创建的临时对象,比如存储过程、触发器或者用户自定义函数、或者在临时对象中返回用户自定义函数、表值函数。
具有代表性的是,几乎所有的在堆中的临时/工作表都有这种情况。所以,一个增、删、或者drop操作都会英气PFS(空页面空间)页面的严重资源争用。如果大部分这些表都小于64KB和使用混合区来分配空间,会给SGAM(共享全局分配映射)页也带来很重的负担。
SQLServer2008缓存一个数据页和一个IAM页给均不临时表作为最小分配资源。工作表的缓存改进了。当一个查询执行时,计划也会被缓存,工作表在多个执行中的计划里面被使用,但很少清空。此外,第一个工作表的9个页面会被保留。
因为SGAM和PFS页发生在数据文件中固定间隔发生。所以容易找到它们的资源描述。所以,比如2:1:1表示在tempdb中的第一个PFS页(databaseid=2,fileid=1,pageid=1),2:1:3表示第一个SGAM页。SGAM页在每511232个页面后产生一个。PFS页会在每8088个页面后产生一个。你可以通过这个特性去tempdb中超找所有PFS和SGAM页。任何时候一个任务都会等待得到这些页上的闩锁(latch),这些信息保存在sys.dm_os_waiting_tasks表中。由于闩锁等待是很短暂的,所以你可以经常查询这些表(大概10秒一次)。并且收集这些信息做后续分析。比如,你可以使用下面面查询去加载所有在tempdb页中等待的任务到Analysis数据库的waiting_tasks表中:
-- get the current timestamp
declare @now datetime
select @now = getdate()
-- insert data into a table forlater analysis
insert into analysis..waiting_tasks
select
session_id,
wait_duration_ms,
resource_description,
@now
from sys.dm_os_waiting_tasks
where wait_type like‘PAGE%LATCH_%’ and
resource_description like ‘2:%’
任何时候当你在表中发现tempdb页中的latch申请,你就能分析是否基于PFS/SGAM页。如果是,意味着在tempdb中存在分配争用。如果看到争用在tempdb的其他页,并且如果你能识别这些也属于系统表,意味着存在过多的DDL操作引起了资源争用。
在tempdb对象分配造成的不正常增长,也可以监控下面的性能计数器:
1. SQL Server:Access Methods\Workfiles Created /Sec
2. SQL Server:Access Methods\Worktables Created /Sec
3. SQL Server:Access Methods\Mixed Page Allocations /Sec
4. SQL Server:General Statistics\Temp Tables Created /Sec
5. SQL Server:General Statistics\Temp Tables for destruction
解决:
如果tempdb由于过度的DDL操作引起资源争用。你可以检查应用程序和看看是否最小化DDL操作。可以尝试以下建议:
l 从SQLServer2005开始,临时对象在前面所说的情况下被缓存。但是,如果你依然遇到重大的DDL争用。你就需要查找哪些临时对象没有被缓存和为什么会发生这种情况。如果这些对象发生在循环或者存储过程里面,考虑把它们移出存储过程或者循环中。
l 检查执行计划,是否有一些计划创建了大量的临时对象、假脱机、排序或者工作表。对此,你需要把一些临时对象清理掉。比如,在列中创建用于order by的索引可以考虑移除排序。
如果争用是由于SGAM/PFS页引起,可以通过以下方式减缓:
l 增加tempdb数据文件,来平衡磁盘和文件的负载。理想的情况下,应该和CPU个数持平。
使用TF-1118来移除混合区的分配。下一章:内存
原文
tempdb
tempdb globally stores both internal and user objects and the temporary tables, objects, and stored procedures that are created during SQL Server operation.
There is a single tempdb for each SQL Server instance. It can be a performance and disk space bottleneck. tempdb can become overloaded in terms of space available and excessive DDL and DML operations. This can cause unrelated applications running on the server to slow down or fail.
Some of the common issues with tempdb are as follows:
• Running out of storage space in tempdb.
• Queries that run slowly due to the I/O bottleneck in tempdb. This is covered under I/O Bottlenecks earlier in this paper.
• Excessive DDL operations leading to a bottleneck in the system tables.
• Allocation contention.
Before we start diagnosing problems with tempdb, let us first look at how the space in tempdb is used. It can be grouped into four main categories.
Category Description
User objects These are explicitly created by user sessions and are tracked in system catalog. They include the following:
Table and index.
Global temporary table (##t1) and index.
Local temporary table (#t1) and index.
Session scoped.
Stored procedure scoped in which it was created.
Table variable (@t1).
Session scoped.
Stored procedure scoped in which it was created.
Internal objects These are statement scoped objects that are created and destroyed by SQL Server to process queries. These are not tracked in the system catalog. They include the following:
Work file (hash join)
Sort run
Work table (cursor, spool and temporary large object data type (LOB) storage)
As an optimization, when a work table is dropped, one IAM page and an extent is saved to be used with a new work table.
There are two exceptions: The temporary LOB storage is batch scoped, and the cursor worktable is session scoped.
Version store This is used for storing row versions. MARS, online index, triggers, and snapshot-based isolation levels are based on row versioning.
Free space This represents the disk space that is available in tempdb.
The total space used by tempdb equal to the user objects plus the internal objects plus the version store plus the free space.
This free space is same as the performance counter free space in tempdb.
Monitoring tempdb Space
It is better to prevent a problem than it is to work to solve it later. You can use the Free Space in tempdb (KB) performance counter to monitor the amount of space tempdb is using. This counter tracks free space in tempdb in kilobytes. Administrators can use this counter to determine whether tempdb is running low on free space.
However, identifying how the different categories, as defined earlier, are using the disk space in tempdb is a more interesting, and productive, question.
The following query returns the tempdb space used by user and by internal objects. Currently, it provides information for tempdb only.
Select
SUM (user_object_reserved_page_count)*8 as user_objects_kb,
SUM (internal_object_reserved_page_count)*8 as internal_objects_kb,
SUM (version_store_reserved_page_count)*8 as version_store_kb,
SUM (unallocated_extent_page_count)*8 as freespace_kb
From sys.dm_db_file_space_usage
Where database_id = 2
Here is one sample output (with space in KBs).
user_objets_kb internal_objects_kb version_store_kb freespace_kb
---------------- -------------------- ------------------ ------------
8736 128 64 448
Note that these calculations don’t account for pages in mixed extents. The pages in mixed extents can be allocated to user and internal objects.
Troubleshooting Space Issues
User objects, internal objects, and version storage can all cause space issues in tempdb. In this section, we consider how you can troubleshoot each of these categories.
User Objects
Because user objects are not owned by any specific sessions, you need to understand the specifications of the application that created them and adjust the tempdb size requirements accordingly. You can find the space used by individual user objects by executing exec sp_spaceused @objname='<user-object>'. For example, you can run the following script to enumerate all the tempdb objects.
DECLARE userobj_cursor CURSOR FOR
select
sys.schemas.name + '.' + sys.objects.name
from sys.objects, sys.schemas
where object_id > 100 and
type_desc = 'USER_TABLE'and
sys.objects.schema_id = sys.schemas.schema_id
go
open userobj_cursor
go
declare @name varchar(256)
fetch userobj_cursor into @name
while (@@FETCH_STATUS = 0)
begin
exec sp_spaceused @objname = @name
fetch userobj_cursor into @name
end
close userobj_cursor
Version Store
SQL Server 2008 provides a row versioning framework. Currently, the following features use the row versioning framework:
• Triggers
• MARS
• Online index
• Row versioning-based isolation levels: requires setting an option at the database level
For more information about these features, see Row Versioning Resource Usage (http://msdn.microsoft.com/en-us/library/ms175492.aspx) in SQL Server 2008 Books Online.
Row versions are shared across sessions. The creator of the row version has no control over when the row version can be reclaimed. You will need to find and then possibly stop the longest-running transaction that is preventing the row version cleanup.
The following query returns the top two longest-running transactions that depend on the versions in the version store.
select top 2
transaction_id,
transaction_sequence_num,
elapsed_time_seconds
from sys.dm_tran_active_snapshot_database_transactions
order by elapsed_time_seconds DESC
Here is a sample output that shows that a transaction with XSN 3 and Transaction ID 8609 has been active for 6,523 seconds.
transaction_id transaction_sequence_num elapsed_time_seconds
-------------------- ------------------------ --------------------
8609 3 6523
20156 25 783
Because the second transaction has been active for a relatively short period, you might be able to free up a significant amount of version store by stopping the first transaction. However, there is no way to estimate how much version space will be freed up by stopping this transaction. You may need to stop few a more transactions to free up significant space.
You can mitigate this problem by either sizing your tempdb properly to account for the version store or by eliminating, where possible, long-running transactions with snapshot isolation or long-running queries with read-committed-snapshot isolation. You can roughly estimate the size of the version store that is needed by using the following formula. (A factor of two is needed to account for the worst-case scenario, which occurs when the two longest-running transactions overlap.)
[Size of version store] = 2 * [version store data generated per minute] *
[longest running time (minutes) of the transaction]
In all databases that are enabled for row versioning based isolation levels, the version store data generated per minute for a transaction is about the same as log data generated per minute. However, there are some exceptions: Only differences are logged for updates; and a newly inserted data row is not versioned, but it might be logged, if it is a bulk-logged operation and the recovery mode is not full recovery.
You can also use the Version Generation Rate and Version Cleanup Rate performance counters to fine-tune your computation. If your Version Cleanup Rate is 0, a long-running transaction could be preventing the version store cleanup.
Incidentally, before generating an out-of-tempdb-space error, SQL Server 2008 makes a last-ditch attempt by forcing the version stores to shrink. During the shrink process, the longest-running transactions that have not yet generated any row versions are marked as victims. This frees up the version space used by them. Message 3967 is generated in the error log for each such victim transaction. If a transaction is marked as a victim, it can no longer read the row versions in the version store or create new ones. Message 3966 is generated and the transaction is rolled back when the victim transaction attempts to read row versions. If the shrink of the version store succeeds, more space is available in tempdb. Otherwise, tempdb runs out of space.
Internal Objects
Internal objects are created and destroyed for each statement, with exceptions as outlined in the table in tempdb earlier in this paper. If you notice that a huge amount of tempdb space is allocated, you should determine which session or tasks are consuming the space and then possibly take corrective action.
SQL Server 2008 provides two DMVs, sys.dm_db_session_space_usage and sys.dm_db_task_space_usage, to track tempdb space that is allocated to sessions and tasks, respectively. Though tasks are run in the context of sessions, the space used by tasks is accounted for under sessions only after the tasks complete.
You can use the following query to find the top sessions that are allocating internal objects. Note that this query includes only the tasks that have been completed in the sessions.
select
session_id,
internal_objects_alloc_page_count,
internal_objects_dealloc_page_count
from sys.dm_db_session_space_usage
order by internal_objects_alloc_page_count DESC
You can use the following query to find the top user sessions that are allocating internal objects, including currently active tasks.
SELECT
t1.session_id,
(t1.internal_objects_alloc_page_count + task_alloc) as allocated,
(t1.internal_objects_dealloc_page_count + task_dealloc) as
deallocated
from sys.dm_db_session_space_usage as t1,
(select session_id,
sum(internal_objects_alloc_page_count)
as task_alloc,
sum (internal_objects_dealloc_page_count) as
task_dealloc
from sys.dm_db_task_space_usage group by session_id) as t2
where t1.session_id = t2.session_id and t1.session_id >50
order by allocated DESC
Here is sample output.
session_id allocated deallocated
---------- -------------------- --------------------
52 5120 5136
51 16 0
After you have isolated the task or tasks that are generating a lot of internal object allocations, you can find out which Transact-SQL statement it is and its query plan for a more detailed analysis.
select
t1.session_id,
t1.request_id,
t1.task_alloc,
t1.task_dealloc,
t2.sql_handle,
t2.statement_start_offset,
t2.statement_end_offset,
t2.plan_handle
from (Select session_id,
request_id,
sum(internal_objects_alloc_page_count) as task_alloc,
sum (internal_objects_dealloc_page_count) as task_dealloc
from sys.dm_db_task_space_usage
group by session_id, request_id) as t1,
sys.dm_exec_requests as t2
where t1.session_id = t2.session_id and
(t1.request_id = t2.request_id)
order by t1.task_alloc DESC
Here is sample output.
session_id request_id task_alloc task_dealloc
---------------------------------------------------------
52 0 1024 1024
sql_handle statement_start_offset
-----------------------------------------------------------------------
0x02000000D490961BDD2A8BE3B0FB81ED67655EFEEB360172 356
statement_end_offset plan_handle
---------------------------------
-1 0x06000500D490961BA8C19503000000000000000000000000
You can use the sql_handle and plan_handle columns to get the SQL statement and the query plan as follows.
select text from sys.dm_exec_sql_text(@sql_handle)
select * from sys.dm_exec_query_plan(@plan_handle)
Note that it is possible that a query plan may not be in the cache when you want to access it. To guarantee the availability of the query plans, poll the plan cache frequently and save the results, preferably in a table, so that it can be queried later.
When SQL Server is restarted, the tempdb size goes back to the initially configured size and it grows based on the requirements. This can lead to fragmentation of the tempdb and can incur overhead, including the blocking of the allocation of new extents during the database auto-grow, and expanding the size of the tempdb. This can impact the performance of your workload. We recommend that you preallocate tempdb to the appropriate size.
Excessive DDL and Allocation Operations
Two sources of contention in tempdb can result in the following situations.
Creating and dropping large numbers of temporary tables and table variables can cause contention on metadata. In SQL Server 2008, local temporary tables and table variables are cached to minimize metadata contention. However, the following conditions must be satisfied; otherwise, the temp objects are not cached:
• Named constraints are not created.
• DDL statements that affect the table are not run after the temp table has been created, such as the CREATE INDEX or CREATE STATISTICS statements.
• The temp object is not created by using dynamic SQL, such as: sp_executesql N'create table #t(a int)'.
• The temp object is created inside another object, such as a stored procedure, trigger, or user-defined function; or the temp object is the return table of a user-defined, table-valued function.
Typically, most temporary/work tables are heaps; therefore, an insert, delete, or drop operation can cause heavy contention on Page Free Space (PFS) pages. If most of these tables are smaller than 64 KB and use mixed extent for allocation or deal location, this can put heavy contention on Shared Global Allocation Map (SGAM) pages. SQL Server 2008 caches one data page and one IAM page for local temporary tables to minimize allocation contention. Worktable caching is improved. When a query execution plan is cached, the work tables needed by the plan are not dropped across multiple executions of the plan but merely truncated. In addition, the first nine pages for the work table are kept.
Because SGAM and PFS pages occur at fixed intervals in data files, it is easy to find their resource description. So, for example, 2:1:1 represents the first PFS page in the tempdb (database-id = 2, file-id =1, page-id = 1) and 2:1:3 represents the first SGAM page. SGAM pages occur after every 511,232 pages, and each PFS page occurs after every 8,088 pages. You can use this to find all other PFS and SGAM pages across all files in tempdb. Any time a task is waiting to acquire latch on these pages, it shows up in sys.dm_os_waiting_tasks. Because latch waits are transient, you should query this table frequently (about once every 10 seconds) and collect this data for analysis later. For example, you can use the following query to load all tasks waiting on tempdb pages into a waiting_tasks table in the analysis database.
-- get the current timestamp
declare @now datetime
select @now = getdate()
-- insert data into a table for later analysis
insert into analysis..waiting_tasks
select
session_id,
wait_duration_ms,
resource_description,
@now
from sys.dm_os_waiting_tasks
where wait_type like ‘PAGE%LATCH_%’ and
resource_description like ‘2:%’
Any time you see tasks waiting to acquire latches on tempdb pages, you can analyze to see whether it is due to PFS or SGAM pages. If it is, this implies allocation contention in tempdb. If you see contention on other pages in tempdb, and if you can identify that a page belongs to the system table, this implies contention due to excessive DDL operations.
You can also monitor the following Performance Monitor counters for any unusual increase in the temporary objects allocation/deal location activity:
• SQL Server:Access Methods\Workfiles Created /Sec
• SQL Server:Access Methods\Worktables Created /Sec
• SQL Server:Access Methods\Mixed Page Allocations /Sec
• SQL Server:General Statistics\Temp Tables Created /Sec
• SQL Server:General Statistics\Temp Tables for destruction
Resolution
If the contention in tempdb is due to excessive DDL operation, you should look at your application and see whether you can minimize the DDL operation. You can try the following suggestions:
• Starting with SQL Server 2005, the temporary objects are cached under conditions as described earlier. However, if you are still encountering significant DDL contention, you need to look at what temporary objects are not being cached and where do they occur. If such objects occur inside a loop or a stored procedure, consider moving them out of the loop or the stored procedure.
• Look at query plans to see if some plans create lot of temporary objects, spools, sorts, or worktables. You may need to eliminate some temporary objects. For example, creating an index on a column that is used in ORDER BY might eliminate the sort.
If the contention is due to the contention in SGAM and PFS pages, you can mitigate it by trying the following:
• Increase the tempdb data files by an equal amount to distribute the workload across all of the disks and files. Ideally, you want to have as many files as there are CPUs (taking into account the affinity).
• Use TF-1118 to eliminate mixed extent allocations.