SQL Server中的MTVF和CE模型变化-CSDN博客

This is a note about multi-statement table valued functions (MTVF) and how their cardinality is estimated in the new CE framework.

这是有关多语句表值函数（MTVF）以及如何在新的CE框架中估计其基数的注释。

In the old CE framework the MTVF had fixed estimate of one row, in the new one the estimate is still fixed, however, now it is 100 rows. That’s the whole story. =)

在旧的CE框架中，MTVF具有固定的一行估计，在新的CE框架中，估计仍然是固定的，但是现在是100行。这就是整个故事。 =）

For the optimizer, the MTVF (as long as the scalar UDF) is a black box, from the estimation perspective. Its often considered that inline-TVF is better, because it’s text is embedded into the query and optimized as a whole. However, I saw the examples where MTVF performed better than inline – it depends, as it used to say. In general and most of the cases inline functions are really a better choice.

对于优化器，从估计的角度来看，MTVF（只要标量UDF）是一个黑匣子。人们通常认为内联TVF更好，因为它的文本被嵌入到查询中并进行了整体优化。但是，我看到了MTVF的性能优于内联的示例–正如过去所说，这取决于。通常，在大多数情况下，内联函数确实是一个更好的选择。

If you turn on the diagnostic output TF 2363 you will see that the optimizer uses the term “black-box” literally. You may also notice, that a fixed join calculator is used to estimate the join selectivity with the fixed 100 row estimate.

如果打开诊断输出TF 2363，您将看到优化器按字面意义使用术语“黑匣子”。您可能还会注意到，使用固定的连接计算器以固定的100行估算值估算连接选择性。

Now let’s move to the short example, this is an artificial example, just to demonstrate the possible positive effect of estimating more than 1 row.

现在让我们转到简短的示例，这是一个人工示例，只是为了演示估算多于1行的可能的积极效果。

At first, let’s create a Numbers table and a simple MTVF.

首先，让我们创建一个Numbers表和一个简单的MTVF。

use AdventureWorks2012;
go
 
------------------------------------------------
if object_id ('dbo.mtvfGetNums') is not null drop function dbo.mtvfGetNums;
if object_id('dbo.Numbers') is not null drop table dbo.Numbers;
create table dbo.Numbers(n int primary key)
insert dbo.Numbers(n) select top(1000000) rn = row_number() over(order by (select null)) from sys.columns c1,sys.columns c2,sys.columns c3;
go
 
------------------------------------------------
create function dbo.mtvfGetNums(@max int)
returns @res table(id int identity primary key, n int)
with schemabinding
as
begin
	insert @res(n) select n from dbo.Numbers where n <= @max;
	return;
end;
go

Now, let’s run two identical synthetic queries, to demonstrate the difference (don’t look for the sense in them), the first one uses the old CE, the second one the new CE:

现在，让我们运行两个相同的综合查询，以证明它们之间的区别（不要在其中寻找意义），第一个使用旧的CE，第二个使用新的CE：

-- Old
set statistics time, xml on
declare @a int, @b int, @c int;
select 
	@a = p.BusinessEntityID,
	@b = be.BusinessEntityID,
	@c = f.n
from
	Person.Person p
	cross apply dbo.mtvfGetNums((BusinessEntityID+1)%1000) f
	left join Person.BusinessEntity be on f.id = be.BusinessEntityID
where
	p.BusinessEntityID <= 1000
option(querytraceon 9481)
set statistics time, xml off
go
--New:
set statistics time, xml on
declare @a int, @b int, @c int;
select 
	@a = p.BusinessEntityID,
	@b = be.BusinessEntityID,
	@c = f.n
from
	Person.Person p
	cross apply dbo.mtvfGetNums((BusinessEntityID+1)%1000) f
	left join Person.BusinessEntity be on f.id = be.BusinessEntityID
where
	p.BusinessEntityID <= 1000
set statistics time, xml off
go

On average the first query runs 50% slower (1950 ms old vs. 1250 ms new). Let’s look at the plans.

平均而言，第一个查询的运行速度降低了50％（旧查询为1950毫秒，新查询为1250毫秒）。让我们看看计划。

In the first case, the MTVF was estimated as one row, multiplied by the number of executions 646. The actual number of rows is much higher, about 270 000. That lower estimate, lead to selecting a merge join that demands sorted inputs, and so the sort is present. The Sort demands some memory amount, this amount is based on the cardinality also, that was underestimated and so the spill at the level two occurred.

在第一种情况下，MTVF估计为一行，乘以执行次数646。实际的行数高得多，大约为270000。该较低的估计值导致选择需要排序输入的合并联接，并且因此存在排序。排序需要一些内存量，该内存量也是基于基数的，因此被低估了，因此发生了第二级溢出。

The new CE estimated in a correct way and is closer to the reality of 270 000 rows, however, not very close, but this was enough, to choose another join type and avoid sorting and spilling.

新的CE以正确的方式估算，并且接近270 000行的实际值，但是距离不是很接近，但这足以选择另一种连接类型并避免排序和溢出。

You may invent the opposite situation when 1 row estimate wins, or you may invent the example where there is no difference. That is possible because 100 rows estimate is still a guess and you may vary the data to make the guess closer or farther to the reality.

当赢得1行估算值时，您可以发明相反的情况，也可以发明没有差异的示例。这是可能的，因为100行估算仍然是一个猜测，您可以更改数据以使猜测更接近或更远。

模型变化 (The Model Variation)

You may use the new CE, but turn off this particular estimate of 100 rows for MTVF using the model variation, that can be enabled by TF 9488 (it is checked in the function CCardFrameworkSQL12::CardEstimateTVF internally).

您可以使用新的CE，但是可以使用TF 9488启用的模型变体来关闭MTVF的100行特定估计（可以在函数CCardFrameworkSQL12 :: CardEstimateTVF内部进行检查）。

If you run the query:

如果运行查询：

set statistics time, xml on
declare @a int, @b int, @c int;
select 
	@a = p.BusinessEntityID,
	@b = be.BusinessEntityID,
	@c = f.n
from
	Person.Person p
	cross apply dbo.mtvfGetNums((BusinessEntityID+1)%1000) f
	left join Person.BusinessEntity be on f.id = be.BusinessEntityID
where
	p.BusinessEntityID <= 1000
option(querytraceon 9488)
set statistics time, xml off
go

You will see that the estimate now is 1 row per execution:

您将看到现在的估算值是每次执行1行：

Interesting, that even the MTVF is estimated as in the old CE framework, we have a different plan. That is because the Joins are present in our queries, and the Join estimation was also changed in many ways. However, that is a topic for another blog post.

有趣的是，即使MTVF也是按照旧的CE框架进行估算的，我们也有不同的计划。这是因为联接存在于我们的查询中，并且联接估计也以许多方式进行了更改。但是，这是另一篇博客文章的主题。

That’s all for that post, happy estimations! =)

这是该帖子的全部内容，估计很高兴！ =）