sql判断基数_SQL Server中的基数估计概念

最新推荐文章于 2022-11-28 21:51:22 发布

culuo4781

最新推荐文章于 2022-11-28 21:51:22 发布

阅读量759

点赞数

文章标签： python java 机器学习人工智能算法

原文链接：https://www.sqlshack.com/cardinality-estimation-concepts/

版权

sql判断基数

In this blog post we are going to talk about the principles and the main concepts which are used by the optimizer to perform an estimation, and also, we will do a little bit math, so be prepared.

在此博客文章中，我们将讨论优化器用来执行估计的原理和主要概念，并且，我们将做一些数学运算，因此请做好准备。

基本统计 (Base Statistics)

A cardinality estimation mechanism, as a first step, usually uses base statistics to estimate the expected number of rows that should be returned from the base table. You may look at these statistics using DBCC command – DBCC SHOW_STATISTICS.

作为第一步，基数估计机制通常使用基本统计信息来估计应从基本表返回的预期行数。您可以使用DBCC命令– DBCC SHOW_STATISTICS查看这些统计信息。

The main components of the base statistics are:

基本统计信息的主要组成部分是：

Table metadata
表元数据
Column density
列密度
Column Distribution histogram
列分布直方图

Table metadata includes the information about the number of rows in the table, a number of modifications and so on – the data that describes a base table. Column density represents how many distinct values there are in particular column. Column distribution histogram tells the optimizer about how column values are distributed in the column.

表元数据包括有关表中的行数，修改数等信息-描述基本表的数据。列密度表示特定列中有多少个不同的值。列分布直方图告诉优化器列值如何在列中分布。

Let’s take a quick look at the base statistics in the database “opt”, table t1 and column b.

让我们快速看一下数据库“ opt”，表t1和列b中的基本统计信息。

dbcc show_statistics(t1, s_b)

The output is:

输出为：

If the optimizer needs to know how the specific value, let’s say 5 in the column b, is distributed, it should look into the histogram (the last dataset in the picture), find the histogram step, which the value 5 belongs to, it would be the third one, in our case. Then look if the value 5 equals to the upper border. In our case it does – we look for 5 and RANGE_HI_KEY also equals 5. If equals, then look into the EQ_ROWS (number of rows in the step that are equal to the upper border) and determine 10 rows.

如果优化器需要知道特定值（例如b列中的5）是如何分布的，则应查看直方图（图片中的最后一个数据集），找到值5所属的直方图步骤，在我们的情况下将是第三个。然后查看值5是否等于上边界。在我们的例子中，它确实是–我们寻找5，而RANGE_HI_KEY也等于5。如果相等，则调查EQ_ROWS（步骤中等于上边界的行数）并确定10行。

The density might be used to estimate the number of distinct values in the column b, that is useful to estimate the number of rows after a grouping operation.

密度可用于估计列b中不同值的数量，这对估计分组操作后的行数很有用。

There is a good description of DBCC SHOW_STATISTICS command in the documentation, with a lot of details, explanation, examples and even illustrations, so I will not delve too much into it’s usage here. If you are not familiar with it, I encourage you to visit the related link at the end of this post.

文档中对DBCC SHOW_STATISTICS命令有一个很好的描述，其中包含许多详细信息，说明，示例甚至插图，因此在此我不会过多地研究它的用法。如果您不熟悉它，建议您访问本文末尾的相关链接。

Basic statistic is a very interesting topic, there are many pitfalls and nuances. The whole book chapters or even books are dedicated exactly to that topic. It absolutely deserves it, because a lot of performance problems might be solved, if you know how to deal with statistics. But the focus of this blog post is not the statistics itself, but how it is used for the estimation, and so I’ll n

最低0.47元/天解锁文章

culuo4781

关注

0
点赞
踩
1

收藏

觉得还不错? 一键收藏
0
评论
sql判断基数_SQL Server中的基数估计概念

sql判断基数 In this blog post we are going to talk about the principles and the main concepts which are used by the optimizer to perform an estimation, and also, we will do a little bit math, so be pre...
复制链接

扫一扫