oracle ndv,oracle12c新类型的优化器统计信息-top frequency histogram

最新推荐文章于 2022-06-23 17:40:32 发布

彼岸枫桥

最新推荐文章于 2022-06-23 17:40:32 发布

阅读量128

点赞数

文章标签： oracle ndv

说明

官方文档：

Oracle creates histograms on columns that have a data skew to improve cardinality estimates. Two additional types of histograms have been introduced for columns which have more than 254 distinct values to improve the cardinality estimates generated using histograms.A top frequency histogram is created if a small number of distinct values occupy most of the data (greater than 99% of the data). The histogram is created using the small number of extremely popular distinct values.By ignoring the unpopular values, which are statistically insignificant, a better quality histogram for the highly popular values can be produced.Alternatively, a hybrid histogram can be created which combines a height-based histogram and frequency histogram. It is a height-based histogram where frequent values always become the endpoint values and a value never spans more than one bucket. By recording the frequency of each end value, we record the frequency of the frequent values.

top frequency histogram它忽略了那些“非流行数据”(即出现频率低的数值)。例如，1000枚硬币中只有一枚面值1分的硬币，那在创建柱状图分组时，它就就可以被忽略。Top－N频率柱状图能产生一个更利于“流行数据”(高频率数据)的柱状图。

A top frequency histogramprovidesmore accurate cardinality estimates for columnsthat have more than254 distinct valuesbut contain a small number of extremely popular distinct values (greater than 99% of the data has one of those values).

在12c中，优化器的一个新特性就是新增了新的柱状图数据-top frequency histogram(TOP N频率柱状图)和hybrid histogram(混合柱状图)。优化器利用这两个柱状图，可以更准确的计算列的基数估计，从而生成更好的执行计划。https://www.cndba.cn/Expect-le/article/2034

产生Top Frequency Histograms的条件

NDV(Distinct Value Number,不同值的总数) is greater than n, where n is the number of histogram buckets (default 254).

柱状图的分组数量一定要小于NDVhttps://www.cndba.cn/Expect-le/article/2034

The percentage of rows occupied by the top n frequent values is equal to or greater than threshold p, where p is (1-(1/n))*100.

出现频率最高的数值所占比例大于等于阈值P，p=(1-(1/n))*100。如果n=254，p=99.6

The estimate_percent parameter in the DBMS_STATS statistics gathering procedure is set to AUTO_SAMPLE_SIZE.

https://www.cndba.cn/Expect-le/article/2034

DBMS_STATS中estimate_percent必须设置为默认比例(AUTO_SAMPLE_SIZE)

对于上面几个参数，官网解释如下：

NDV

This represents the number of distinct values in a column. For example, if a column only contains the values 100, 200, and 300, then the NDV for this column is 3.

This variable represents the number of histogram buckets. The default is 254.

This variable represents an internal percentage threshold that is equal to (1–(1/n)) * 100. For example, if n = 254, then p is 99.6.

实验-如何产生Top Frequency Histogram

3.1

创建表并插入合适的数据

create table topfreq_t(id number);

--插入数据

insert into topfreq_t values(52792);

insert into topfreq_t values(52793);

insert into topfreq_t values(52794);

insert into topfreq_t values(52795);

insert into topfreq_t values(52796);

insert into topfreq_t values(52797);

insert into topfreq_t values(52798);

insert into topfreq_t values(52799);

commit;

表topfreq_t中有23条数据，与8种不同的值。如下：

SQL> select t.id, count(*) from TOPFREQ_T t group by t.id order by 1

2 ;

ID COUNT(*)

---------- ----------

52792 1

52793 5

52794 2

52795 1

52796 1

52797 2

52798 2

52799 9

8 rows selected.

3.2

生成 top frequency histogram

3.2.1

收集统计信息

BEGIN

DBMS_STATS.GATHER_TABLE_STATS (

ownname => 'LEI'

, tabname => 'TOPFREQ_T'

, method_opt => 'FOR COLUMNS id SIZE 7'

);

END;

PL/SQL procedure successfully completed.

注意：id为表的列名，7是分组数要小于不同值的总数(8)。

3.2.2

查看id列的统计信息

SQL> col table_name for a10

SQL> col COLUMN_NAME for a20

SQL> SELECT TABLE_NAME, COLUMN_NAME, NUM_DISTINCT, HISTOGRAM

FROM USER_TAB_COL_STATISTICS

WHERE TABLE_NAME='TOPFREQ_T'

AND COLUMN_NAME='ID'; 2 3 4

TABLE_NAME COLUMN_NAMENUM_DISTINCT HISTOGRAM

---------------------- ----------------- --------------

TOPFREQ_T ID 8 TOP-FREQUENCY

产生的原因：首先肯定是满足了前面的条件

分组数7小于了不同值的总数8

https://www.cndba.cn/Expect-le/article/2034

由于n=7，所以p=85.7，而7个出现最频繁的值所占的比例是95.65>85.7

3.2.3

查看Oracle所收集的7个频率最高的值

https://www.cndba.cn/Expect-le/article/2034

SQL> SELECT ENDPOINT_NUMBER, ENDPOINT_VALUE

FROM USER_HISTOGRAMS

WHERE TABLE_NAME='TOPFREQ_T'

AND COLUMN_NAME='ID'; 2 3 4

ENDPOINT_NUMBER ENDPOINT_VALUE

--------------- --------------

1 52792

6 52793

8 52794

9 52796

11 52797

13 52798

22 52799

7 rows selected.

可以看到，少了52795(只有一条记录)，ENDPOINT_VALUE表示值出现频率的总和。

下图表示Top Frequency Histogram

top frequency histogram

彼岸枫桥

关注

0
点赞
踩
0

收藏

觉得还不错? 一键收藏
0
评论
oracle ndv,oracle12c新类型的优化器统计信息-top frequency histogram

1说明官方文档：Oracle creates histograms on columns that have a data skew to improve cardinality estimates. Two additional types of histograms have been introduced for columns which have more than 254 distin...
复制链接

扫一扫