This post opens a series of blog posts dedicated to my observations of the new cardinality estimator in SQL Server 2014. But, before we jump to the new features, I’d like to provide some background, to make the next posts clearer.
这篇文章打开了一系列博客文章,这些文章专门针对我对SQL Server 2014中新基数估计器的观察。但是,在我们跳转至新功能之前,我想提供一些背景知识,以使下一篇文章更加清晰。
We’ll start by discussing the role of Cardinality Estimation in SQL Server, trying to answer – what is it and why it is needed.
我们将首先讨论基数估计在SQL Server中的作用,然后尝试回答–它是什么以及为什么需要它。
SQL Server has the Cost Based Optimizer. The optimizer chooses the cheapest plan, where cost represents an estimate of the resource consumption.
SQL Server具有基于成本的优化器。 优化器选择最便宜的计划,其中成本表示对资源消耗的估计。
Cardinality estimation is the crucial mechanism in SQL Server, because, it is a main argument in the costing function. Cardinality estimation influences a lot of things, here are some of them:
基数估计是SQL Server中的关键机制,因为它是成本核算函数中的主要论据。 基数估计会影响很多因素,其中包括:
- Access Method Strategy 访问方法策略
- Join Order Choise 加入订单选择
- Join Type Choise 加入类型选择
- Memory grants 内存授予
- Optimization Efforts Spent 优化工作
- Whole Plan Shape 整体平面图
A lot of things are not included in this general list, however, they are also influenced by cardinality.
此总列表中未包括很多内容,但是它们也受基数的影响。
Let’s create a simple synthetic database “opt”, it will be used in this and future blog posts for the simple demos. It contains three plain tables of 1000 rows each with primary keys, two of them have a foreign key relationship.
让我们创建一个简单的综合数据库“ opt”,它将在本博客以及以后的博客文章中进行简单演示。 它包含三个具有1000行的普通表,每个普通表都有主键,其中两个具有外键关系。
use master;
go
if db_id('opt') is not null drop database opt;
go
create database opt;
go
use opt;
go
create table t1(a int not null, b int not null, c int check (c between 1 and 50), constraint pk_a primary key(a));
create table t2(b int not null, c int, d char(10), constraint pk_b primary key(b));
create table t3(c int not null, constraint pk_c primary key(c));
go
insert into t1(a,b,c) select number, number%100+1, number%50+1 from master..spt_values where type = 'p' and number between 1 and 1000;
insert into t2(b,c) select number, number%100+1 from master..spt_values where type = 'p' and number between 1 and 1000;
insert into t3(c) select number from master..spt_values where type = 'p' and number between 1 and 1000;
go
alter table t1 add constraint fk_t2_b foreign key (b) references t2(b);
go
create statistics s_b on t1(b);
create statistics s_c on t1(c);
create statistics s_c on t2(c);
create statistics s_d on t2(c);
go
Now, imagine simple queries that joins two tables and filter rows.
现在,想象一下连接两个表和过滤器行的简单查询。
set showplan_xml on
go
select * from t1 join t2 on t1.a = t2.c where t1.a <= 10
select * from t1 join t2 on t1.a = t2.c where t1.a <= 100
go
set showplan_xml off
go
Depending on the estimated number of rows (estimated cardinality), we get different plans, even in such a simple case!
根据估计的行数(估计的基数),即使在这种简单情况下,我们也会获得不同的计划!
The only thing that is different, is the estimated number of rows. These different estimates lead to, at least, three item changes, according to the list mentioned above:
唯一不同的是估计的行数。 根据上述清单,这些不同的估计至少会导致三个项目更改:
- Access Method Strategy (t2 Seek vs. t2 Scan) 访问方法策略(t2搜寻与t2扫描)
- Join Order Choise (t1 Join t2 vs. t2 Join t1) 选择加入顺序(t1加入t2与t2加入t1)
- Join Type Choise (Nested Loops Join vs. Merge Join) 选择联接类型(嵌套循环联接与合并联接)
Now we may see, how sensitive is the Query Optimizer to the cardinality estimation. Wrong estimates may lead to a very inefficient plan, that’s why it is so important component of SQL Server, and that’s why I think it is worth to spend time learning how these estimates are made.
现在我们可以看到,查询优化器对基数估计有多敏感。 错误的估计值可能会导致计划效率非常低下,这就是为什么它是SQL Server如此重要的组成部分,这就是为什么我认为值得花时间学习如何进行这些估计。
Next we’ll discuss a place of Cardinality Estimation in the whole optimization process.
接下来,我们将讨论基数估计在整个优化过程中的位置。
目录 (Table of Contents)
参考资料 (References)
- Cardinality Estimation (SQL Server) 基数估计(SQL Server)
- Troubleshooting Poor Query Performance: Cardinality Estimation 解决查询性能差的问题:基数估计
- Query Tuning Fundamentals: Density, Predicates, Selectivity, and Cardinality 查询调优基础知识:密度,谓词,选择性和基数