直方图(HISTOGRAM)实验

通常情况下当BUCTET < 表的NUM_DISTINCT值得到的是HEIGHT BALANCED(高度平衡)直方图,

而当BUCTET >= 表的NUM_DISTINCT值的时候得到的是FREQUENCY(频率)直方图。


一、准备数据

SQL> create table tab(a number,b number);

Table created.

SQL> begin
  2  for i in 1..10000 loop
  3  insert into tab values(i,i);
  4  end loop;
  5  commit;
  6  end;
  7  /

PL/SQL procedure successfully completed.

SQL> update tab set b=5 where b between 6 and 9995;

9990 rows updated.

SQL> commit;

Commit complete.

SQL> select b,count(*) from tab group by b;

         B   COUNT(*)
---------- ----------
         1          1
      9997          1
      9999          1
     10000          1
         2          1
         4          1
         5       9991
      9998          1
         3          1
      9996          1

10 rows selected.

SQL> 

这样表中有10000条数据,对于b列,有10个不同的值,等于5的值有9991个,
此时无论是选择b=1或者b=5的数据都必须要进行全表扫描,
因为表上没有索引,也即是没有别的访问路径。


创建索引:

SQL> 
SQL> create index i_tab_b on tab(b);

Index created.
SQL> col low_value for a10
SQL> col high_value for a10
SQL> col column_name for a10
SQL> select COLUMN_NAME,num_distinct, low_value, high_value, density, num_buckets, last_analyzed, SAMPLE_SIZE,HISTOGRAM from USER_TAB_COLS;




COLUMN_NAM NUM_DISTINCT LOW_VALUE  HIGH_VALUE    DENSITY NUM_BUCKETS LAST_ANALYZED       SAMPLE_SIZE HISTOGRAM
---------- ------------ ---------- ---------- ---------- ----------- ------------------- ----------- ---------------
A                 10000 C102       C302            .0001           1 2013-02-18 12:43:23       10000 NONE
B                    10 C102       C302               .1           1 2013-02-18 12:43:23       10000 NONE




Density的出现就是为了分析高频率出现的值的影响。
没有histograms信息的时候,DENSITY永远等于1/NUM_DISTINCT


SQL> 
SQL> select 1/10000 from dual;


   1/10000
----------
     .0001


SQL> 
SQL> select 1/10 from dual;


      1/10
----------
        .1


SQL> 


当我们统计了histograms之后,DENSITY就会发生改变

二、执行表分析,不收集histogram

SQL> exec DBMS_stats.gather_table_stats(cascade => TRUE,degree => 2,estimate_percent => 100,force => TRUE,ownname => USER,tabname => 'TAB',method_opt => 'for columns size 1 b');

PL/SQL procedure successfully completed.

SQL> select COLUMN_NAME,num_distinct, low_value, high_value, density, num_buckets, last_analyzed, SAMPLE_SIZE,HISTOGRAM from USER_TAB_COLS;

COLUMN_NAM NUM_DISTINCT LOW_VALUE  HIGH_VALUE    DENSITY NUM_BUCKETS LAST_ANALYZED       SAMPLE_SIZE HISTOGRAM
---------- ------------ ---------- ---------- ---------- ----------- ------------------- ----------- ---------------
A                 10000 C102       C302            .0001           1 2013-02-18 12:43:23       10000 NONE
B                    10 C102       C302               .1           1 2013-02-18 13:12:05       10000 NONE

SQL> select table_name, column_name, endpoint_number, endpoint_value from dba_tab_histograms where table_name='TAB';

TABLE_NAME                     COLUMN_NAM ENDPOINT_NUMBER ENDPOINT_VALUE
------------------------------ ---------- --------------- --------------
TAB                            B                        0              1
TAB                            A                        0              1
TAB                            B                        1          10000
TAB                            A                        1          10000

SQL> 
SQL> 
SQL> select * from tab where b=5;

9991 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 514792329

---------------------------------------------------------------------------------------
| Id  | Operation                   | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |         |  1000 |  6000 |     4   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| TAB     |  1000 |  6000 |     4   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | I_TAB_B |  1000 |       |     2   (0)| 00:00:01 |
---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("B"=5)


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
       1370  consistent gets
          0  physical reads
          0  redo size
     212167  bytes sent via SQL*Net to client
       7818  bytes received via SQL*Net from client
        668  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
       9991  rows processed

SQL> 

虽然b=5的数据量很大,但是由于没有收集histogram,这里执行计划还是走索引的方式。(如果这里不走表分析,会动态采用,执行计划还是会走全表扫描的方式,这里不在列出)


三、收集histogram信息


情况1:
当bucket = NUM_DISTINCT 时


SQL> exec DBMS_stats.gather_table_stats(cascade => TRUE,degree => 2,estimate_percent => 100,force => TRUE,ownname => USER,tabname => 'TAB',method_opt => 'for columns size 10 b');

PL/SQL procedure successfully completed.

SQL> 
SQL> select COLUMN_NAME,num_distinct, low_value, high_value, density, num_buckets, last_analyzed, SAMPLE_SIZE,HISTOGRAM from USER_TAB_COLS;

COLUMN_NAM NUM_DISTINCT LOW_VALUE  HIGH_VALUE    DENSITY NUM_BUCKETS LAST_ANALYZED       SAMPLE_SIZE HISTOGRAM
---------- ------------ ---------- ---------- ---------- ----------- ------------------- ----------- ---------------
A                 10000 C102       C302            .0001           1 2013-02-18 12:43:23       10000 NONE
B                    10 C102       C302           .00005          10 2013-02-18 13:06:11       10000 FREQUENCY

SQL> 

当bucket = NUM_DISTINCT 时,自动创建了一个基于FREQUENCY(频率)的histogram。
SQL> 
SQL> select table_name, column_name, endpoint_number, endpoint_value from dba_tab_histograms where table_name='TAB';

TABLE_NAME                     COLUMN_NAM ENDPOINT_NUMBER ENDPOINT_VALUE
------------------------------ ---------- --------------- --------------
TAB                            B                        1              1
TAB                            B                        2              2
TAB                            B                        3              3
TAB                            B                        4              4
TAB                            B                     9995              5
TAB                            B                     9996           9996
TAB                            B                     9997           9997
TAB                            B                     9998           9998
TAB                            B                     9999           9999
TAB                            B                    10000          10000
TAB                            A                        0              1

TABLE_NAME                     COLUMN_NAM ENDPOINT_NUMBER ENDPOINT_VALUE
------------------------------ ---------- --------------- --------------
TAB                            A                        1          10000

12 rows selected.

SQL> 
SQL> 

直方图中的ENDPOINT_VALUE表示列值,ENDPOINT_NUMBER表示累积的行数。
比如ENDPOINT_VALUE=2,ENDPOINT_NUMBER=2,因为ENDPOINT_NUMBER是个累积值,
实际上2的 ENDPOINT_NUMBER应该是2减去上一个值的ENDPOINT_NUMBER,也即是2-1=1。
同理,5的 ENDPOINT_NUMBER=9995-4=9991。

SQL> select * from tab where b=5;

9991 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 1995730731

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |  9991 | 59946 |     6   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| TAB  |  9991 | 59946 |     6   (0)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("B"=5)


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
       1370  consistent gets
          0  physical reads
          0  redo size
     212167  bytes sent via SQL*Net to client
       7818  bytes received via SQL*Net from client
        668  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
       9991  rows processed

收集了histogram信息后,b=5时,执行计划正确。


删除直方图:

SQL> exec DBMS_stats.gather_table_stats(cascade => TRUE,degree => 2,estimate_percent => 100,force => TRUE,ownname => USER,tabname => 'TAB',method_opt => 'for columns size 1 b');

PL/SQL procedure successfully completed.

SQL> select COLUMN_NAME,num_distinct, low_value, high_value, density, num_buckets, last_analyzed, SAMPLE_SIZE,HISTOGRAM from USER_TAB_COLS;

COLUMN_NAM NUM_DISTINCT LOW_VALUE  HIGH_VALUE    DENSITY NUM_BUCKETS LAST_ANALYZED       SAMPLE_SIZE HISTOGRAM
---------- ------------ ---------- ---------- ---------- ----------- ------------------- ----------- ---------------
A                 10000 C102       C302            .0001           1 2013-02-18 12:43:23       10000 NONE
B                    10 C102       C302               .1           1 2013-02-18 13:12:05       10000 NONE

SQL> select table_name, column_name, endpoint_number, endpoint_value from dba_tab_histograms where table_name='TAB';

TABLE_NAME                     COLUMN_NAM ENDPOINT_NUMBER ENDPOINT_VALUE
------------------------------ ---------- --------------- --------------
TAB                            B                        0              1
TAB                            A                        0              1
TAB                            B                        1          10000
TAB                            A                        1          10000

SQL> 
SQL> 
SQL> select * from tab where b=5;

9991 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 514792329

---------------------------------------------------------------------------------------
| Id  | Operation                   | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |         |  1000 |  6000 |     4   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| TAB     |  1000 |  6000 |     4   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | I_TAB_B |  1000 |       |     2   (0)| 00:00:01 |
---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("B"=5)


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
       1370  consistent gets
          0  physical reads
          0  redo size
     212167  bytes sent via SQL*Net to client
       7818  bytes received via SQL*Net from client
        668  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
       9991  rows processed

SQL> 

b = 5的执行计划又变为了走索引的方式。


情况2:
当bucket > NUM_DISTINCT 时

SQL> 
SQL> exec DBMS_stats.gather_table_stats(cascade => TRUE,degree => 2,estimate_percent => 100,force => TRUE,ownname => USER,tabname => 'TAB',method_opt => 'for columns size 30 b');

PL/SQL procedure successfully completed.

SQL> 
SQL> select COLUMN_NAME,num_distinct, low_value, high_value, density, num_buckets, last_analyzed, SAMPLE_SIZE,HISTOGRAM from USER_TAB_COLS;

COLUMN_NAM NUM_DISTINCT LOW_VALUE  HIGH_VALUE    DENSITY NUM_BUCKETS LAST_ANALYZED       SAMPLE_SIZE HISTOGRAM
---------- ------------ ---------- ---------- ---------- ----------- ------------------- ----------- ---------------
A                 10000 C102       C302            .0001           1 2013-02-18 12:43:23       10000 NONE
B                    10 C102       C302           .00005          10 2013-02-18 13:13:59       10000 FREQUENCY

SQL> 
SQL> select table_name, column_name, endpoint_number, endpoint_value from dba_tab_histograms where table_name='TAB';

TABLE_NAME                     COLUMN_NAM ENDPOINT_NUMBER ENDPOINT_VALUE
------------------------------ ---------- --------------- --------------
TAB                            B                        2              2
TAB                            B                        3              3
TAB                            B                        4              4
TAB                            B                     9995              5
TAB                            B                     9996           9996
TAB                            B                     9997           9997
TAB                            B                     9998           9998
TAB                            B                     9999           9999
TAB                            B                    10000          10000
TAB                            B                        1              1
TAB                            A                        0              1

TABLE_NAME                     COLUMN_NAM ENDPOINT_NUMBER ENDPOINT_VALUE
------------------------------ ---------- --------------- --------------
TAB                            A                        1          10000

12 rows selected.

SQL> 
SQL> 

当bucket > NUM_DISTINCT 时,依然自动产生的FREQUENCY类型的histogram,虽然指定了30个bucket,
但是由于只有10个NUM_DISTINCT,所以只生产了10个bucket


SQL> 
SQL> select * from tab where b=5;

9991 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 1995730731

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |  9991 | 59946 |     6   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| TAB  |  9991 | 59946 |     6   (0)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("B"=5)


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
       1370  consistent gets
          0  physical reads
          0  redo size
     212167  bytes sent via SQL*Net to client
       7818  bytes received via SQL*Net from client
        668  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
       9991  rows processed

SQL> 


情况3:
当bucket < NUM_DISTINCT 时

先删除直方图
SQL> exec DBMS_stats.gather_table_stats(cascade => TRUE,degree => 2,estimate_percent => 100,force => TRUE,ownname => USER,tabname => 'TAB',method_opt => 'for columns size 1 b');

PL/SQL procedure successfully completed.

SQL> select COLUMN_NAME,num_distinct, low_value, high_value, density, num_buckets, last_analyzed, SAMPLE_SIZE,HISTOGRAM from USER_TAB_COLS;

COLUMN_NAM NUM_DISTINCT LOW_VALUE  HIGH_VALUE    DENSITY NUM_BUCKETS LAST_ANALYZED       SAMPLE_SIZE HISTOGRAM
---------- ------------ ---------- ---------- ---------- ----------- ------------------- ----------- ---------------
A                 10000 C102       C302            .0001           1 2013-02-18 12:43:23       10000 NONE
B                    10 C102       C302               .1           1 2013-02-18 13:17:16       10000 NONE

SQL> select table_name, column_name, endpoint_number, endpoint_value from dba_tab_histograms where table_name='TAB';

TABLE_NAME                     COLUMN_NAM ENDPOINT_NUMBER ENDPOINT_VALUE
------------------------------ ---------- --------------- --------------
TAB                            B                        0              1
TAB                            A                        0              1
TAB                            B                        1          10000
TAB                            A                        1          10000

SQL> 

创建新的直方图

exec DBMS_stats.gather_table_stats(cascade => TRUE,degree => 2,estimate_percent => 100,force => TRUE,ownname => USER,tabname => 'TAB',method_opt => 'for columns size 8 b');

SQL> 
SQL> exec DBMS_stats.gather_table_stats(cascade => TRUE,degree => 2,estimate_percent => 100,force => TRUE,ownname => USER,tabname => 'TAB',method_opt => 'for columns size 8 b');

PL/SQL procedure successfully completed.

SQL> select COLUMN_NAME,num_distinct, low_value, high_value, density, num_buckets, last_analyzed, SAMPLE_SIZE,HISTOGRAM from USER_TAB_COLS;

COLUMN_NAM NUM_DISTINCT LOW_VALUE  HIGH_VALUE    DENSITY NUM_BUCKETS LAST_ANALYZED       SAMPLE_SIZE HISTOGRAM
---------- ------------ ---------- ---------- ---------- ----------- ------------------- ----------- ---------------
A                 10000 C102       C302            .0001           1 2013-02-18 12:43:23       10000 NONE
B                    10 C102       C302            .0001           8 2013-02-18 13:18:14       10000 HEIGHT BALANCED

SQL> select table_name, column_name, endpoint_number, endpoint_value from dba_tab_histograms where table_name='TAB';

TABLE_NAME                     COLUMN_NAM ENDPOINT_NUMBER ENDPOINT_VALUE
------------------------------ ---------- --------------- --------------
TAB                            B                        0              1
TAB                            B                        7              5
TAB                            B                        8          10000
TAB                            A                        0              1
TAB                            A                        1          10000

当bucket < NUM_DISTINCT 时,自动生成了一个基于HEIGHT BALANCED的histogram


ENDPOINT_NUMBER是实际的bucket编号,ENDPOINT_VALUE是根据列值决定的该bucket的endpoint值。


上面的输出中,bucket 0存放着b列的低值,为了节省空间没有显示出1-6号的bucket。
但是我们能够理解,bucket[1-7]里存放着的endpoint=5,而 bucket8里存放endpoint=10000。
因此,实际上bucket0里包含了1-5之间的所有值,而bucket8里包含了5-10000之间的所有值,在本例中也就是9996-10000这5个数值。
也就是说在这9个bucket里面,endpoint=5的占用了其中的7个bucket,说有了5这个值的数据最多。


实验参考了http://www.dbasky.com/oracle/oracle_histogram.html中的部分内容



  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值