Investigating perfomance bottlenecks in SQL Server 2008 by looking at wait events using the XE trace system
Introduction.
In my previous articles SQL Server Wait Events: Taking the Guesswork out of Performance Profiling, and Taking the Guesswork out of SQL Server Performance Profiling Part 2, I introduced the concepts of performance analysis based on wait events.
In short, the idea behind this concept is to take the response time of a SQL statement (or a batch or a user session) and split it up into CPU time and Wait time..
Earlier versions of SQL Server only maintain wait time (aggregated) on the server level, or you can see the wait times and the wait reasons, and when they actually occur in views like sysprocesses orsys.dm_os_waiting_tasks. Starting with SQL Server 2005, the wait events are finally documented in the SQL Server BOL, and more and more background is provided in blogs, KB Articles, forums etc.
A wait event ‘happens’ when the SQL Server scheduler (implemented in the SQLOS) decides to suspend a running task.
This can be because a ‘long’ operation starts, such as disk, network I/O, or lock. It can also happen when the allotted time quantum that a task can be active on the CPU ends. At this moment this seems to be a hard-coded 4 ms. This is the means by which that the SQL Server scheduler makes sure that every task in SQL Server gets its turn running on the CPU. By carefully looking at the way the response time is build up, one can make intelligent decisions on where to look for possible optimizations, or capacity planning: For instance, if on your SQL Server box 90% of the total used (response) time consists of I/O waits and only 10% of the time is spend on the CPU, then adding CPU capacity or upgrading to faster CPUs is unlikely to have a very large impact on response times. The same is true for individual queries. If one query spends 90% of its time waiting for I/O, then speeding up the CPU will only impact the other 10% of the response time picture.
SQL Server 2008 introduces Extended Events (XE) a new event handling system. XE is a flexible and lightweight event trace system.
The SQL Server engine is instrumented with tracing code in many locations. It is possible to trace ‘traditional’ events as existed in older SQL Trace versions, events like ‘RPC:Completed’. ‘sp_completed’, ‘lock timeout’ etc.
An exciting new tracing feature, is the means by which one can trace information about SQL Server OS (SQLOS) wait events
Each event contains a set of interesting columns that will be logged to a ‘target’. In this example, we have chosen a file. Next to the ‘default’ set of columns for an event, extra information can be logged as well. This is done by defining ‘actions’ on a traced event. Below is an example of standard columns that are logged for the event‘wait_info’ in the package ‘sqlos’, and for the event, ‘sql_statement_completed’ in the sqlserver package, which will also be used in the example:
name column_id object_name
-------------------- ----------- -----------------------
wait_type 0 wait_info
opcode 1 wait_info
duration 2 wait_info
max_duration 3 wait_info
total_duration 4 wait_info
signal_duration 5 wait_info
completed_count 6 wait_info
…
source_database_id 0 sql_statement_completed
object_id 1 sql_statement_completed
object_type 2 sql_statement_completed
cpu 3 sql_statement_completed
duration 4 sql_statement_completed
reads 5 sql_statement_completed
writes 6 sql_statement_completed
Next is an example on how to setup wait event tracing and collecting information from an ‘asynchronous file target’. Only specific events are collected for session_id=53.
First we create the ‘event session’, and start it.
After this, we can run an example query from a session with session_id=53, a count(*) query was run on a 1 million row (unindexed) table.
Example 1:
create event session test1
on server
add event sqlserver.sql_statement_starting
(action
(sqlserver.session_id, package0.collect_system_time,
package0.collect_cpu_cycle_time,sqlserver.sql_text,
sqlserver.plan_handle, sqlos.task_address, sqlos.worker_address)
where sqlserver.session_id = 53),
add event sqlserver.sql_statement_completed
(action
(sqlserver.session_id, package0.collect_system_time,package0.collect_cpu_cycle_time, sqlserver.sql_text,
sqlserver.plan_handle, sqlos.task_address, sqlos.worker_address)
where sqlserver.session_id = 53),
add event sqlos.wait_info
(action
(sqlserver.session_id, package0.collect_system_time,package0.collect_cpu_cycle_time, sqlos.task_address, sqlos.worker_address)
where sqlserver.session_id = 53)
--
-- async file, read with: sys.fn_xe_file_target_read_file
--
ADD TARGET package0.asynchronous_file_target
(SET filename = N'C:\temp\wait.etx', metadatafile = N'C:\temp\wait.mta',
max_file_size = 50, max_rollover_files = 10)
WITH (max_dispatch_latency = 2 seconds)
go
alter event session test1 on server state = start
go
Example 2:
The next example shows how to collect the trace information from the trace files for further processing. As you can see from the example, the files is read ‘through’ a virtual function, ‘sys.fn_xe_file_target_read_file‘, and the rows are returned in XML form.
The insert into the xTable counted 4938 rows. For the next summary I only collected the ‘wait_info’ ‘end’ events (1560 rows) and the ‘sql_statement_completed’ event (1 row):
drop table xTable
CREATE TABLE xTable
(
xTable_ID INT IDENTITY
PRIMARY KEY,
xCol XML
) ;
INSERT INTO xTable ( xCol )
select cast(event_data as xml) waitinfo from sys.fn_xe_file_target_read_file
('c:\temp\wait_*.etx',
'c:\temp\wait_*.mta',
null,null)
Now in order to extract the CPU and Wait information and match it with the total response time, we run the following queries:
SELECT
-- for some reason wait type name is not logged with synch target. bug?
(select map_value from sys.dm_xe_map_values
where name = 'wait_types'
and map_key = xCol.value('(event/data/value)[1]', 'int')
)AS wtype,
xCol.value('(event/data/value)[3]', 'int') --wait time
AS tottime,
xCol.value('(event/data/value)[6]', 'int') --sig wait time
AS sigtime
into #mywaits
FROM xTable
where xCol.value('(/event/@name)[1]', 'varchar(30)') = 'wait_info'
and xCol.value('(event/data/value)[2]', 'int') = 1 --opcode end
select wtype,
count(*) as wcount,
sum(tottime) as total_time,
sum(sigtime) as signal_time
from #mywaits group by wtype
go
-- stmt completed:
SELECT
xCol.value('(event/data/value)[4]', 'int')
AS cputime,
xCol.value('(event/data/value)[5]', 'int')
AS duration,
xCol.value('(event/action/value)[4]', 'varchar(MAX)')
AS sql_text
into #mysql
FROM [xTable]
where xCol.value('(/event/@name)[1]', 'varchar(30)') ='sql_statement_completed'
select sql_text,
count(*) as count,
SUM(cputime) as cputime,
SUM(duration) as duration from #mysql group by sql_text
go
drop table #mywaits
drop table #mysql
This is the result of the above queries:
-------------------------- ----------- ----------- -----------
PWAIT_SOS_SCHEDULER_YIELD 20 45 0
PWAIT_SLEEP_TASK 1 10 0
PWAIT_PAGEIOLATCH_SH 1539 59163 674
(1 row(s) affected)
sql_text count cputime duration
------------------------- -------- -------- -----------
select COUNT(*) from t1m 1 1892 61560364
So what we can see here is that the statement response time (duration) was 6156 ms (apparently, the sqlserver.sql_statement_completed.duration is measured in microseconds)
The CPU time used was 1892 ms and the wait time for the three different wait events was 45+10+59163 ms = 59218 ms. Including the measured signal_time (the time between the end of the actual wait and the resumption of work) this nicely adds up to the total duration of the query.
For more information on the specific wait events, search the SQL Server BOL for the sys.dm_os_wait_stats view. All wait events are documented here.
It is also possible to look at each individual wait event. This can be useful to detect skew in wait durations for example. If you detect that certain I/O operations take longer than others, it might be interesting to add other XEvents to trace. Events like ‘file_read_completed’ will show which file was read and which offset within the file. The event ‘physical_page_read’, can tell which page for which file. These events can help detecting if you might be reading from slow disks or doing random I/O while you were expecting to do sequential I/Os.
-- first: sql_statement_starting
SELECT xTable_ID,
xCol.value('(event/action/value)[1]', 'int')
AS session_id,
xCol.value('(/event/@timestamp)[1]', 'varchar(24)')
AS EventTime,
xCol.value('(/event/@name)[1]', 'varchar(30)')
AS EventType,
xCol.value('(event/action/value)[2]', 'varchar(30)')
AS system_time,
xCol.value('(event/action/value)[3]', 'varchar(30)')
AS cpu_cycle_time,
xCol.value('(event/data/value)[5]', 'varchar(30)')
AS data1,
xCol.value('(event/data/value)[6]', 'varchar(30)')
AS data2,
xCol.value('(event/action/value)[4]', 'varchar(MAX)')
AS sql_text
FROM [xTable]
where xCol.value('(/event/@name)[1]', 'varchar(30)') = 'sql_statement_starting'
union --now collect: statement_completed
SELECT xTable_ID,
xCol.value('(event/action/value)[1]', 'int')
AS session_id,
xCol.value('(/event/@timestamp)[1]', 'varchar(24)')
AS EventTime,
xCol.value('(/event/@name)[1]', 'varchar(30)')
AS EventType,
xCol.value('(event/action/value)[2]', 'varchar(30)')
AS system_time,
xCol.value('(event/action/value)[3]', 'varchar(30)')
AS cpu_cycle_time,
xCol.value('(event/data/value)[4]', 'varchar(30)')
AS data1,
xCol.value('(event/data/value)[5]', 'varchar(30)')
AS data2,
xCol.value('(event/action/value)[4]', 'varchar(MAX)')
AS sql_text
FROM [xTable]
where xCol.value('(/event/@name)[1]', 'varchar(30)') = 'sql_statement_completed'
union
SELECT xTable_ID,
xCol.value('(event/action/value)[1]', 'int')
AS session_id,
xCol.value('(/event/@timestamp)[1]', 'varchar(24)')
AS EventTime,
-- for some reason wait type name is not logged with synch target. bug?
(select map_value from sys.dm_xe_map_values
where name = 'wait_types'
and map_key = xCol.value('(event/data/value)[1]','int')
) as Event_name,
xCol.value('(event/action/value)[2]', 'varchar(30)')
AS system_time,
xCol.value('(event/action/value)[3]', 'varchar(30)')
AS cpu_cycle_time,
xCol.value('(event/data/value)[3]', 'varchar(30)') --wait time
AS data1,
xCol.value('(event/data/value)[6]', 'varchar(30)') --sig wait time
AS data2,
''
AS sql_text
FROM [xTable]
where xCol.value('(/event/@name)[1]', 'varchar(30)') = 'wait_info'
and xCol.value('(event/data/value)[2]', 'int') = 1 -- opcode end
This article, submitted in May, was originally scheduled for publication as part of Simple-Talk's celebration of the launch of SQL Server 2008 next month, but has been brought forward in view of the current interest in the subject