About CTFP
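CTFP (Cost Threshold for Parallelism) is a SQL Server configuration option that controls when a query is allowed to use a parallel execution plan. Specifically, it is the threshold that a plan's estimated "subtree cost" (StatementSubTreeCost) must exceed before SQL Server will use multiple CPU cores to run the query in parallel.
Checking the current CTFP setting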
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'cost threshold for parallelism';
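As an alternative that does not require enabling advanced options, the same setting can be read from the sys.configurations catalog view. This is a minimal sketch; it only reads the value and changes nothing.

```sql
-- value        = the configured value
-- value_in_use = the value the running instance is actually using
SELECT name, value, value_in_use, minimum, maximum
FROM sys.configurations
WHERE name = 'cost threshold for parallelism';
```

If value and value_in_use differ, a change has been configured but RECONFIGURE has not been run yet.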
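What the default CTFP value means
The default value is 5. What does that number refer to? Is it really the "subtree cost" (StatementSubTreeCost) described above? Let's verify.
First, create a table and insert 1,000,000 rows: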
CREATE TABLE dbo.t1(ID INT IDENTITY(1,1) PRIMARY KEY CLUSTERED, Stuffing VARCHAR(20));
INSERT INTO dbo.t1(Stuffing)
SELECT TOP 1000000 'Stuff'
FROM sys.all_columns ac1
CROSS JOIN sys.all_columns ac2;
GO
The count runs very quickly: 191 ms of CPU time and 4545 logical reads.
set statistics io,time on
SELECT COUNT(*) FROM dbo.t1
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 0 ms.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
(1 row affected)
Table 't1'. Scan count 17, logical reads 4545, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 191 ms, elapsed time = 61 ms.
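CPU time is greater than elapsed time, which indicates that the query ran with a parallel execution plan. SQL Server used several CPU cores at once. Each core spent some CPU time on the query and those per-core times are summed into CPU time, but because the cores work concurrently they do not add up in elapsed time.
The execution plan confirms this: it contains a Parallelism operator, and the plan cost is 3.55082.
This may look confusing. The default Cost Threshold for Parallelism is 5, which is higher than 3.55082, so why did this query go parallel?
The reason is that SQL Server decides whether to parallelize based on the cost of the serial plan.
To see the serial plan, we force MAXDOP down to 1: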
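The ratio of the two timings above gives a rough feel for how much work ran concurrently. The sketch below simply divides the CPU time by the elapsed time reported in the output; the result is an average number of busy cores over the run, not the actual degree of parallelism, and the column alias is made up for illustration.

```sql
-- 191 ms CPU / 61 ms elapsed, taken from the statistics output above.
-- A value above 1 can only happen when more than one core did work at once.
SELECT CAST(191.0 / 61 AS decimal(4, 2)) AS avg_busy_cores;  -- 3.13
```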
SELECT COUNT(*) FROM dbo.t1 option(maxdop 1);
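This also finishes quickly. This time elapsed time (138 ms) is higher than CPU time (125 ms): the statement took 138 ms in total, of which 125 ms was CPU time, and the remainder was probably spent waiting on I/O, locks, or other system load.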
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 0 ms.
(1 row affected)
Table 't1'. Scan count 1, logical reads 4483, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 125 ms, elapsed time = 138 ms.
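In the execution plan shown in the figure below, the Parallelism operator is gone, which means the query used a serial plan. Its cost is 5.00995, which is greater than the CTFP value (cost threshold for parallelism, 5 by default). That is why the earlier run was parallelized even though its parallel plan cost only 3.55082.
Choosing a reasonable CTFP value
In the past, Microsoft's developers tied query cost to the performance of the computers of that time, and those baselines were never updated as hardware improved. Query cost is still calculated against the old baseline, so on a modern high-performance server many simple queries still trigger parallel execution. The default parallelism settings and cost threshold (such as "cost threshold for parallelism") no longer fit today's hardware, and many performance problems come from never adjusting these settings to match it.
So choosing a reasonable CTFP value involves two steps:
1. Review the frequently used execution plans
Check whether these plans have room for optimization, for example whether a missing index is driving up the cost of the queries. If you can tune these frequently executed queries and lower their cost, that is a good outcome regardless of what happens to CTFP.
The following SQL lists the cached statements that have parallel plans, most expensive first, together with their use counts and execution plans: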
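The two costs can also be compared without opening the graphical plans. The sketch below reads them from the plan cache; it assumes both COUNT(*) statements are still cached and that the login has VIEW SERVER STATE permission. The alias HasParallelismOperator is introduced here for illustration only.

```sql
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
WITH XMLNAMESPACES
(DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
SELECT
    n.value('(@StatementText)[1]', 'VARCHAR(4000)') AS StatementText,
    CAST(n.value('(@StatementSubTreeCost)[1]', 'VARCHAR(128)') AS float) AS StatementSubTreeCost,
    -- 1 when the statement's plan contains a Parallelism operator
    n.exist('.//RelOp[@PhysicalOp="Parallelism"]') AS HasParallelismOperator
FROM sys.dm_exec_cached_plans AS ecp
CROSS APPLY sys.dm_exec_query_plan(ecp.plan_handle) AS eqp
CROSS APPLY eqp.query_plan.nodes('/ShowPlanXML/BatchSequence/Batch/Statements/StmtSimple') AS qn(n)
WHERE n.value('(@StatementText)[1]', 'VARCHAR(4000)') LIKE '%COUNT(*)%dbo.t1%'
  -- keep this diagnostic query itself out of the result
  AND n.value('(@StatementText)[1]', 'VARCHAR(4000)') NOT LIKE '%dm_exec_cached_plans%';
```

If both plans are still cached, the statement with the MAXDOP 1 hint should appear without a Parallelism operator and with the higher cost, and the unhinted statement with a Parallelism operator and the lower cost.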
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
WITH XMLNAMESPACES
(DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
select * from (
SELECT
query_plan AS CompleteQueryPlan,
n.value('(@StatementText)[1]', 'VARCHAR(4000)') AS StatementText,
n.value('(@StatementOptmLevel)[1]', 'VARCHAR(25)') AS StatementOptimizationLevel,
cast(n.value('(@StatementSubTreeCost)[1]', 'VARCHAR(128)') as float) AS StatementSubTreeCost,
n.query('.') AS ParallelSubTreeXML,
ecp.usecounts,
ecp.size_in_bytes
FROM sys.dm_exec_cached_plans AS ecp
CROSS APPLY sys.dm_exec_query_plan(plan_handle) AS eqp
CROSS APPLY query_plan.nodes('/ShowPlanXML/BatchSequence/Batch/Statements/StmtSimple') AS qn(n)
WHERE n.query('.').exist('//RelOp[@PhysicalOp="Parallelism"]') = 1
) a
order by StatementSubTreeCost desc
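2. Adjust the CTFP value
If, after the step above, some queries still have very high costs that cannot be brought below 5, consider raising the CTFP value. With the default, many simple queries also run in parallel; a higher threshold makes that less likely and reserves parallelism for the expensive statements that really need it.
The following SQL shows the approximate cost distribution of the statements with parallel plans currently in the plan cache, as a baseline for choosing the CTFP value: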
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
WITH XMLNAMESPACES
(DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
select ctfp_gp.cost_range_distribution,count(1) as subtotal_num_sqlstmt,sum(ctfp_gp.usecounts) subtotal_usecount,
cast(count(1)*1.0/max(ctfp_gp.total_num_sqlstmt) as decimal(10,2)) as ratio_sqlstmt,
cast(sum(ctfp_gp.usecounts)*1.0/max(ctfp_gp.total_useconts) as decimal(10,2)) as ratio_useconts,
max(ctfp_gp.total_num_sqlstmt) AS total_num_sqlstmt,max(ctfp_gp.total_useconts) AS total_usecounts
from (
SELECT
ctfp.CompleteQueryPlan,
ctfp.StatementText,
ctfp.StatementOptimizationLevel,
case when cast(ctfp.StatementSubTreeCost as float) < 10 then 'less than 10'
when cast(ctfp.StatementSubTreeCost as float) < 20 then '10'
when cast(ctfp.StatementSubTreeCost as float) < 30 then '20'
when cast(ctfp.StatementSubTreeCost as float) < 40 then '30'
when cast(ctfp.StatementSubTreeCost as float) < 50 then '40'
when cast(ctfp.StatementSubTreeCost as float) < 60 then '50'
when cast(ctfp.StatementSubTreeCost as float) < 70 then '60'
when cast(ctfp.StatementSubTreeCost as float) < 80 then '70'
when cast(ctfp.StatementSubTreeCost as float) < 90 then '80'
when cast(ctfp.StatementSubTreeCost as float) < 100 then '90'
else '100 or more'
end as cost_range_distribution,
ctfp.ParallelSubTreeXML,
ctfp.usecounts,
sum(ctfp.usecounts) over () as total_useconts,
count(1) over () as total_num_sqlstmt,
ctfp.size_in_bytes
from (
SELECT
query_plan AS CompleteQueryPlan,
n.value('(@StatementText)[1]', 'VARCHAR(4000)') AS StatementText,
n.value('(@StatementOptmLevel)[1]', 'VARCHAR(25)') AS StatementOptimizationLevel,
n.value('(@StatementSubTreeCost)[1]', 'VARCHAR(128)') AS StatementSubTreeCost,
n.query('.') AS ParallelSubTreeXML,
ecp.usecounts,
ecp.size_in_bytes
FROM sys.dm_exec_cached_plans AS ecp
CROSS APPLY sys.dm_exec_query_plan(plan_handle) AS eqp
CROSS APPLY query_plan.nodes('/ShowPlanXML/BatchSequence/Batch/Statements/StmtSimple') AS qn(n)
WHERE n.query('.').exist('//RelOp[@PhysicalOp="Parallelism"]') = 1
) as ctfp
) ctfp_gp group by cost_range_distribution
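From the output of this query, we can see:
Statements in the 10 cost range make up 17% of the statements and 27% of all executions
Statements in the 40 cost range make up 10% of the statements and 14% of all executions
Statements in the 50 cost range make up 20% of the statements and 21% of all executions
I therefore decided to set CTFP to 50, which I consider a reasonable choice for this workload.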
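Applying the chosen value could look like the sketch below. The value 50 comes from the analysis above and is specific to this workload; treat it as an example, not a general recommendation.

```sql
-- 'cost threshold for parallelism' is an advanced option,
-- so advanced options must be visible before it can be changed.
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;

-- Set the threshold to the value chosen from the cost distribution.
EXEC sp_configure 'cost threshold for parallelism', 50;
RECONFIGURE;
```

After changing the value, it is worth re-running the distribution query over a period of normal load to check whether the mix of parallel plans has moved as expected.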