How to Troubleshoot High CPU in RDS for SQL Server


Sometimes we come with high CPU usage of RDS for SQL Server instance. Here's some common steps to troubleshoot this issue.

 

What will cause high CPU usage in SQL Server?

  1. MAXDOP
  2. T-SQL queries
  3. I/O issue caused high CPU and so on

 

How to troubleshoot this issue in RDS for SQL Server.

 

  1. First, check CPU,IOPS status in "监控与报警" at the issue time
  2. Check high wait types via "SQL诊断报告" in DMS and by the following queries

WITH [Waits] AS

    (SELECT

        [wait_type],

        [wait_time_ms] / 1000.0 AS [WaitS],

        ([wait_time_ms] - [signal_wait_time_ms]) / 1000.0 AS [ResourceS],

        [signal_wait_time_ms] / 1000.0 AS [SignalS],

        [waiting_tasks_count] AS [WaitCount],

       100.0 * [wait_time_ms] / SUM ([wait_time_ms]) OVER() AS [Percentage],

        ROW_NUMBER() OVER(ORDER BY [wait_time_ms] DESC) AS [RowNum]

    FROM sys.dm_os_wait_stats

    WHERE [wait_type] NOT IN (

        N'BROKER_EVENTHANDLER', N'BROKER_RECEIVE_WAITFOR',

        N'BROKER_TASK_STOP', N'BROKER_TO_FLUSH',

        N'BROKER_TRANSMITTER', N'CHECKPOINT_QUEUE',

        N'CHKPT', N'CLR_AUTO_EVENT',

        N'CLR_MANUAL_EVENT', N'CLR_SEMAPHORE',

 

        -- Maybe uncomment these four if you have mirroring issues

        N'DBMIRROR_DBM_EVENT', N'DBMIRROR_EVENTS_QUEUE',

        N'DBMIRROR_WORKER_QUEUE', N'DBMIRRORING_CMD',

 

        N'DIRTY_PAGE_POLL', N'DISPATCHER_QUEUE_SEMAPHORE',

        N'EXECSYNC', N'FSAGENT',

        N'FT_IFTS_SCHEDULER_IDLE_WAIT', N'FT_IFTSHC_MUTEX',

 

        -- Maybe uncomment these six if you have AG issues

        N'HADR_CLUSAPI_CALL', N'HADR_FILESTREAM_IOMGR_IOCOMPLETION',

        N'HADR_LOGCAPTURE_WAIT', N'HADR_NOTIFICATION_DEQUEUE',

        N'HADR_TIMER_TASK', N'HADR_WORK_QUEUE',

 

        N'KSOURCE_WAKEUP', N'LAZYWRITER_SLEEP',

        N'LOGMGR_QUEUE', N'MEMORY_ALLOCATION_EXT',

        N'ONDEMAND_TASK_QUEUE',

        N'PREEMPTIVE_XE_GETTARGETSTATE',

        N'PWAIT_ALL_COMPONENTS_INITIALIZED',

        N'PWAIT_DIRECTLOGCONSUMER_GETNEXT',

        N'QDS_PERSIST_TASK_MAIN_LOOP_SLEEP', N'QDS_ASYNC_QUEUE',

        N'QDS_CLEANUP_STALE_QUERIES_TASK_MAIN_LOOP_SLEEP',

        N'QDS_SHUTDOWN_QUEUE', N'REDO_THREAD_PENDING_WORK',

        N'REQUEST_FOR_DEADLOCK_SEARCH', N'RESOURCE_QUEUE',

        N'SERVER_IDLE_CHECK', N'SLEEP_BPOOL_FLUSH',

        N'SLEEP_DBSTARTUP', N'SLEEP_DCOMSTARTUP',

        N'SLEEP_MASTERDBREADY', N'SLEEP_MASTERMDREADY',

        N'SLEEP_MASTERUPGRADED', N'SLEEP_MSDBSTARTUP',

        N'SLEEP_SYSTEMTASK', N'SLEEP_TASK',

        N'SLEEP_TEMPDBSTARTUP', N'SNI_HTTP_ACCEPT',

        N'SP_SERVER_DIAGNOSTICS_SLEEP', N'SQLTRACE_BUFFER_FLUSH',

        N'SQLTRACE_INCREMENTAL_FLUSH_SLEEP',

        N'SQLTRACE_WAIT_ENTRIES', N'WAIT_FOR_RESULTS',

        N'WAITFOR', N'WAITFOR_TASKSHUTDOWN',

        N'WAIT_XTP_RECOVERY',

        N'WAIT_XTP_HOST_WAIT', N'WAIT_XTP_OFFLINE_CKPT_NEW_LOG',

        N'WAIT_XTP_CKPT_CLOSE', N'XE_DISPATCHER_JOIN',

        N'XE_DISPATCHER_WAIT', N'XE_TIMER_EVENT')

    AND [waiting_tasks_count] > 0

    )

SELECT

    MAX ([W1].[wait_type]) AS [WaitType],

    CAST (MAX ([W1].[WaitS]) AS DECIMAL (16,2)) AS [Wait_S],

    CAST (MAX ([W1].[ResourceS]) AS DECIMAL (16,2)) AS [Resource_S],

    CAST (MAX ([W1].[SignalS]) AS DECIMAL (16,2)) AS [Signal_S],

    MAX ([W1].[WaitCount]) AS [WaitCount],

    CAST (MAX ([W1].[Percentage]) AS DECIMAL (5,2)) AS [Percentage],

    CAST ((MAX ([W1].[WaitS]) / MAX ([W1].[WaitCount])) AS DECIMAL (16,4)) AS [AvgWait_S],

    CAST ((MAX ([W1].[ResourceS]) / MAX ([W1].[WaitCount])) AS DECIMAL (16,4)) AS [AvgRes_S],

    CAST ((MAX ([W1].[SignalS]) / MAX ([W1].[WaitCount])) AS DECIMAL (16,4)) AS [AvgSig_S],

    CAST ('https://www.sqlskills.com/help/waits/' + MAX ([W1].[wait_type]) as XML) AS [Help/Info URL]

FROM [Waits] AS [W1]

INNER JOIN [Waits] AS [W2]

    ON [W2].[RowNum] <= [W1].[RowNum]

GROUP BY [W1].[RowNum]

HAVING SUM ([W2].[Percentage]) - MAX( [W1].[Percentage] ) < 95; -- percentage threshold

GO

 

  1. Second check "慢日志统计",find the slow and high logical reads queries.
  2. Query high CPU statements in cache which can monitor high CPU queries.

SELECT TOP 50

[Avg. MultiCore/CPU time(sec)] = qs.total_worker_time / 1000000 / qs.execution_count,

[Total MultiCore/CPU time(sec)] = qs.total_worker_time / 1000000,

[Avg. Elapsed Time(sec)] = qs.total_elapsed_time / 1000000 / qs.execution_count,

[Total Elapsed Time(sec)] = qs.total_elapsed_time / 1000000,

qs.execution_count,

[Avg. I/O] = (total_logical_reads + total_logical_writes) / qs.execution_count,

[Total I/O] = total_logical_reads + total_logical_writes,

Query = SUBSTRING(qt.[text], (qs.statement_start_offset / 2) + 1,

(

(

CASE qs.statement_end_offset

WHEN -1 THEN DATALENGTH(qt.[text])

ELSE qs.statement_end_offset

END - qs.statement_start_offset

) / 2

) + 1

),

Batch = qt.[text],

[DB] = DB_NAME(qt.[dbid]),

qs.last_execution_time,

qp.query_plan

FROM sys.dm_exec_query_stats AS qs

CROSS APPLY sys.dm_exec_sql_text(qs.[sql_handle]) AS qt

CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) AS qp

where qs.execution_count > 5        --more than 5 occurences

ORDER BY [Total MultiCore/CPU time(sec)] DESC

 

Conclusions:

  1. If have too many MAXDOP wait types, in OLTP system, Customer need to set a lower value for MAXDOP。
  2. If it related to slow queries, please refer to "How to troubleshoot slow queries in RDS for SQL Server"
  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值