oracle ndv,oracle12c新类型的优化器统计信息-top frequency histogram

1

说明

官方文档:

Oracle creates histograms on columns that have a data skew to improve cardinality estimates. Two additional types of histograms have been introduced for columns which have more than 254 distinct values to improve the cardinality estimates generated using histograms.A top frequency histogram is created if a small number of distinct values occupy most of the data (greater than 99% of the data). The histogram is created using the small number of extremely popular distinct values.By ignoring the unpopular values, which are statistically insignificant, a better quality histogram for the highly popular values can be produced.Alternatively, a hybrid histogram can be created which combines a height-based histogram and frequency histogram. It is a height-based histogram where frequent values always become the endpoint values and a value never spans more than one bucket. By recording the frequency of each end value, we record the frequency of the frequent values.

top frequency histogram它忽略了那些“非流行数据”(即出现频率低的数值)。例如,1000枚硬币中只有一枚面值1分的硬币,那在创建柱状图分组时,它就就可以被忽略。Top-N频率柱状图能产生一个更利于“流行数据”(高频率数据)的柱状图。

A top frequency histogramprovidesmore accurate cardinality estimates for columnsthat have more than254 distinct valuesbut contain a small number of extremely popular distinct values (greater than 99% of the data has one of those values).

在12c中,优化器的一个新特性就是新增了新的柱状图数据-top frequency histogram(TOP N频率柱状图)和hybrid histogram(混合柱状图)。优化器利用这两个柱状图,可以更准确的计算列的基数估计,从而生成更好的执行计划。https://www.cndba.cn/Expect-le/article/2034

2

产生Top Frequency Histograms的条件

l

NDV(Distinct Value Number,不同值的总数) is greater than n, where n is the number of histogram buckets (default 254).

柱状图的分组数量一定要小于NDVhttps://www.cndba.cn/Expect-le/article/2034

l

The percentage of rows occupied by the top n frequent values is equal to or greater than threshold p, where p is (1-(1/n))*100.

出现频率最高的数值所占比例大于等于阈值P,p=(1-(1/n))*100。如果n=254,p=99.6

l

The estimate_percent parameter in the DBMS_STATS statistics gathering procedure is set to AUTO_SAMPLE_SIZE.

https://www.cndba.cn/Expect-le/article/2034

DBMS_STATS中estimate_percent必须设置为默认比例(AUTO_SAMPLE_SIZE)

对于上面几个参数,官网解释如下:

NDV

This represents the number of distinct values in a column. For example, if a column only contains the values 100, 200, and 300, then the NDV for this column is 3.

n

This variable represents the number of histogram buckets. The default is 254.

p

This variable represents an internal percentage threshold that is equal to (1–(1/n)) * 100. For example, if n = 254, then p is 99.6.

3

实验-如何产生Top Frequency Histogram

3.1

创建表并插入合适的数据

create table topfreq_t(id number);

--插入数据

insert into topfreq_t values(52792);

insert into topfreq_t values(52793);

insert into topfreq_t values(52793);

insert into topfreq_t values(52793);

insert into topfreq_t values(52793);

insert into topfreq_t values(52793);

insert into topfreq_t values(52794);

insert into topfreq_t values(52794);

insert into topfreq_t values(52795);

insert into topfreq_t values(52796);

insert into topfreq_t values(52797);

insert into topfreq_t values(52797);

insert into topfreq_t values(52798);

insert into topfreq_t values(52798);

insert into topfreq_t values(52799);

insert into topfreq_t values(52799);

insert into topfreq_t values(52799);

insert into topfreq_t values(52799);

insert into topfreq_t values(52799);

insert into topfreq_t values(52799);

insert into topfreq_t values(52799);

insert into topfreq_t values(52799);

insert into topfreq_t values(52799);

commit;

表topfreq_t中有23条数据,与8种不同的值。如下:

SQL> select t.id, count(*) from TOPFREQ_T t group by t.id order by 1

2 ;

ID COUNT(*)

---------- ----------

52792 1

52793 5

52794 2

52795 1

52796 1

52797 2

52798 2

52799 9

8 rows selected.

3.2

生成 top frequency histogram

3.2.1

收集统计信息

BEGIN

DBMS_STATS.GATHER_TABLE_STATS (

ownname => 'LEI'

, tabname => 'TOPFREQ_T'

, method_opt => 'FOR COLUMNS id SIZE 7'

);

END;

/

PL/SQL procedure successfully completed.

注意:id为表的列名,7是分组数要小于不同值的总数(8)。

3.2.2

查看id列的统计信息

SQL> col table_name for a10

SQL> col COLUMN_NAME for a20

SQL> SELECT TABLE_NAME, COLUMN_NAME, NUM_DISTINCT, HISTOGRAM

FROM USER_TAB_COL_STATISTICS

WHERE TABLE_NAME='TOPFREQ_T'

AND COLUMN_NAME='ID'; 2 3 4

TABLE_NAME COLUMN_NAMENUM_DISTINCT HISTOGRAM

---------------------- ----------------- --------------

TOPFREQ_T ID 8 TOP-FREQUENCY

产生的原因:首先肯定是满足了前面的条件

l

分组数7小于了不同值的总数8

https://www.cndba.cn/Expect-le/article/2034

l

由于n=7,所以p=85.7,而7个出现最频繁的值所占的比例是95.65>85.7

3.2.3

查看Oracle所收集的7个频率最高的值

https://www.cndba.cn/Expect-le/article/2034

SQL> SELECT ENDPOINT_NUMBER, ENDPOINT_VALUE

FROM USER_HISTOGRAMS

WHERE TABLE_NAME='TOPFREQ_T'

AND COLUMN_NAME='ID'; 2 3 4

ENDPOINT_NUMBER ENDPOINT_VALUE

--------------- --------------

1 52792

6 52793

8 52794

9 52796

11 52797

13 52798

22 52799

7 rows selected.

可以看到,少了52795(只有一条记录),ENDPOINT_VALUE表示值出现频率的总和。

下图表示Top Frequency Histogram

版权声明:本文为博主原创文章,未经博主允许不得转载。

top frequency histogram

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值