通常情况下当BUCTET < 表的NUM_DISTINCT值得到的是HEIGHT BALANCED(高度平衡)直方图,
这样表中有10000条数据,对于b列,有10个不同的值,等于5的值有9991个,
此时无论是选择b=1或者b=5的数据都必须要进行全表扫描,
因为表上没有索引,也即是没有别的访问路径。
虽然b=5的数据量很大,但是由于没有收集histogram,这里执行计划还是走索引的方式。(如果这里不走表分析,会动态采用,执行计划还是会走全表扫描的方式,这里不在列出)
直方图中的ENDPOINT_VALUE表示列值,ENDPOINT_NUMBER表示累积的行数。
比如ENDPOINT_VALUE=2,ENDPOINT_NUMBER=2,因为ENDPOINT_NUMBER是个累积值,
实际上2的 ENDPOINT_NUMBER应该是2减去上一个值的ENDPOINT_NUMBER,也即是2-1=1。
同理,5的 ENDPOINT_NUMBER=9995-4=9991。
收集了histogram信息后,b=5时,执行计划正确。
b = 5的执行计划又变为了走索引的方式。
当bucket < NUM_DISTINCT 时,自动生成了一个基于HEIGHT BALANCED的histogram
ENDPOINT_NUMBER是实际的bucket编号,ENDPOINT_VALUE是根据列值决定的该bucket的endpoint值。
上面的输出中,bucket 0存放着b列的低值,为了节省空间没有显示出1-6号的bucket。
但是我们能够理解,bucket[1-7]里存放着的endpoint=5,而 bucket8里存放endpoint=10000。
因此,实际上bucket0里包含了1-5之间的所有值,而bucket8里包含了5-10000之间的所有值,在本例中也就是9996-10000这5个数值。
也就是说在这9个bucket里面,endpoint=5的占用了其中的7个bucket,说有了5这个值的数据最多。
而当BUCTET >= 表的NUM_DISTINCT值的时候得到的是FREQUENCY(频率)直方图。
一、准备数据
SQL> create table tab(a number,b number);
Table created.
SQL> begin
2 for i in 1..10000 loop
3 insert into tab values(i,i);
4 end loop;
5 commit;
6 end;
7 /
PL/SQL procedure successfully completed.
SQL> update tab set b=5 where b between 6 and 9995;
9990 rows updated.
SQL> commit;
Commit complete.
SQL> select b,count(*) from tab group by b;
B COUNT(*)
---------- ----------
1 1
9997 1
9999 1
10000 1
2 1
4 1
5 9991
9998 1
3 1
9996 1
10 rows selected.
SQL>
这样表中有10000条数据,对于b列,有10个不同的值,等于5的值有9991个,
此时无论是选择b=1或者b=5的数据都必须要进行全表扫描,
因为表上没有索引,也即是没有别的访问路径。
创建索引:
SQL>
SQL> create index i_tab_b on tab(b);
Index created.
SQL> col low_value for a10
SQL> col high_value for a10
SQL> col column_name for a10
SQL> select COLUMN_NAME,num_distinct, low_value, high_value, density, num_buckets, last_analyzed, SAMPLE_SIZE,HISTOGRAM from USER_TAB_COLS;
COLUMN_NAM NUM_DISTINCT LOW_VALUE HIGH_VALUE DENSITY NUM_BUCKETS LAST_ANALYZED SAMPLE_SIZE HISTOGRAM
---------- ------------ ---------- ---------- ---------- ----------- ------------------- ----------- ---------------
A 10000 C102 C302 .0001 1 2013-02-18 12:43:23 10000 NONE
B 10 C102 C302 .1 1 2013-02-18 12:43:23 10000 NONE
Density的出现就是为了分析高频率出现的值的影响。
没有histograms信息的时候,DENSITY永远等于1/NUM_DISTINCT
SQL>
SQL> select 1/10000 from dual;
1/10000
----------
.0001
SQL>
SQL> select 1/10 from dual;
1/10
----------
.1
SQL>
当我们统计了histograms之后,DENSITY就会发生改变
二、执行表分析,不收集histogram
SQL> exec DBMS_stats.gather_table_stats(cascade => TRUE,degree => 2,estimate_percent => 100,force => TRUE,ownname => USER,tabname => 'TAB',method_opt => 'for columns size 1 b');
PL/SQL procedure successfully completed.
SQL> select COLUMN_NAME,num_distinct, low_value, high_value, density, num_buckets, last_analyzed, SAMPLE_SIZE,HISTOGRAM from USER_TAB_COLS;
COLUMN_NAM NUM_DISTINCT LOW_VALUE HIGH_VALUE DENSITY NUM_BUCKETS LAST_ANALYZED SAMPLE_SIZE HISTOGRAM
---------- ------------ ---------- ---------- ---------- ----------- ------------------- ----------- ---------------
A 10000 C102 C302 .0001 1 2013-02-18 12:43:23 10000 NONE
B 10 C102 C302 .1 1 2013-02-18 13:12:05 10000 NONE
SQL> select table_name, column_name, endpoint_number, endpoint_value from dba_tab_histograms where table_name='TAB';
TABLE_NAME COLUMN_NAM ENDPOINT_NUMBER ENDPOINT_VALUE
------------------------------ ---------- --------------- --------------
TAB B 0 1
TAB A 0 1
TAB B 1 10000
TAB A 1 10000
SQL>
SQL>
SQL> select * from tab where b=5;
9991 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 514792329
---------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1000 | 6000 | 4 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID| TAB | 1000 | 6000 | 4 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | I_TAB_B | 1000 | | 2 (0)| 00:00:01 |
---------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("B"=5)
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
1370 consistent gets
0 physical reads
0 redo size
212167 bytes sent via SQL*Net to client
7818 bytes received via SQL*Net from client
668 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
9991 rows processed
SQL>
虽然b=5的数据量很大,但是由于没有收集histogram,这里执行计划还是走索引的方式。(如果这里不走表分析,会动态采用,执行计划还是会走全表扫描的方式,这里不在列出)
三、收集histogram信息
情况1:
当bucket = NUM_DISTINCT 时
SQL> exec DBMS_stats.gather_table_stats(cascade => TRUE,degree => 2,estimate_percent => 100,force => TRUE,ownname => USER,tabname => 'TAB',method_opt => 'for columns size 10 b');
PL/SQL procedure successfully completed.
SQL>
SQL> select COLUMN_NAME,num_distinct, low_value, high_value, density, num_buckets, last_analyzed, SAMPLE_SIZE,HISTOGRAM from USER_TAB_COLS;
COLUMN_NAM NUM_DISTINCT LOW_VALUE HIGH_VALUE DENSITY NUM_BUCKETS LAST_ANALYZED SAMPLE_SIZE HISTOGRAM
---------- ------------ ---------- ---------- ---------- ----------- ------------------- ----------- ---------------
A 10000 C102 C302 .0001 1 2013-02-18 12:43:23 10000 NONE
B 10 C102 C302 .00005 10 2013-02-18 13:06:11 10000 FREQUENCY
SQL>
当bucket = NUM_DISTINCT 时,自动创建了一个基于FREQUENCY(频率)的histogram。
SQL>
SQL> select table_name, column_name, endpoint_number, endpoint_value from dba_tab_histograms where table_name='TAB';
TABLE_NAME COLUMN_NAM ENDPOINT_NUMBER ENDPOINT_VALUE
------------------------------ ---------- --------------- --------------
TAB B 1 1
TAB B 2 2
TAB B 3 3
TAB B 4 4
TAB B 9995 5
TAB B 9996 9996
TAB B 9997 9997
TAB B 9998 9998
TAB B 9999 9999
TAB B 10000 10000
TAB A 0 1
TABLE_NAME COLUMN_NAM ENDPOINT_NUMBER ENDPOINT_VALUE
------------------------------ ---------- --------------- --------------
TAB A 1 10000
12 rows selected.
SQL>
SQL>
直方图中的ENDPOINT_VALUE表示列值,ENDPOINT_NUMBER表示累积的行数。
比如ENDPOINT_VALUE=2,ENDPOINT_NUMBER=2,因为ENDPOINT_NUMBER是个累积值,
实际上2的 ENDPOINT_NUMBER应该是2减去上一个值的ENDPOINT_NUMBER,也即是2-1=1。
同理,5的 ENDPOINT_NUMBER=9995-4=9991。
SQL> select * from tab where b=5;
9991 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 1995730731
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 9991 | 59946 | 6 (0)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| TAB | 9991 | 59946 | 6 (0)| 00:00:01 |
--------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("B"=5)
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
1370 consistent gets
0 physical reads
0 redo size
212167 bytes sent via SQL*Net to client
7818 bytes received via SQL*Net from client
668 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
9991 rows processed
收集了histogram信息后,b=5时,执行计划正确。
删除直方图:
SQL> exec DBMS_stats.gather_table_stats(cascade => TRUE,degree => 2,estimate_percent => 100,force => TRUE,ownname => USER,tabname => 'TAB',method_opt => 'for columns size 1 b');
PL/SQL procedure successfully completed.
SQL> select COLUMN_NAME,num_distinct, low_value, high_value, density, num_buckets, last_analyzed, SAMPLE_SIZE,HISTOGRAM from USER_TAB_COLS;
COLUMN_NAM NUM_DISTINCT LOW_VALUE HIGH_VALUE DENSITY NUM_BUCKETS LAST_ANALYZED SAMPLE_SIZE HISTOGRAM
---------- ------------ ---------- ---------- ---------- ----------- ------------------- ----------- ---------------
A 10000 C102 C302 .0001 1 2013-02-18 12:43:23 10000 NONE
B 10 C102 C302 .1 1 2013-02-18 13:12:05 10000 NONE
SQL> select table_name, column_name, endpoint_number, endpoint_value from dba_tab_histograms where table_name='TAB';
TABLE_NAME COLUMN_NAM ENDPOINT_NUMBER ENDPOINT_VALUE
------------------------------ ---------- --------------- --------------
TAB B 0 1
TAB A 0 1
TAB B 1 10000
TAB A 1 10000
SQL>
SQL>
SQL> select * from tab where b=5;
9991 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 514792329
---------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1000 | 6000 | 4 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID| TAB | 1000 | 6000 | 4 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | I_TAB_B | 1000 | | 2 (0)| 00:00:01 |
---------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("B"=5)
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
1370 consistent gets
0 physical reads
0 redo size
212167 bytes sent via SQL*Net to client
7818 bytes received via SQL*Net from client
668 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
9991 rows processed
SQL>
b = 5的执行计划又变为了走索引的方式。
情况2:
当bucket > NUM_DISTINCT 时
SQL>
SQL> exec DBMS_stats.gather_table_stats(cascade => TRUE,degree => 2,estimate_percent => 100,force => TRUE,ownname => USER,tabname => 'TAB',method_opt => 'for columns size 30 b');
PL/SQL procedure successfully completed.
SQL>
SQL> select COLUMN_NAME,num_distinct, low_value, high_value, density, num_buckets, last_analyzed, SAMPLE_SIZE,HISTOGRAM from USER_TAB_COLS;
COLUMN_NAM NUM_DISTINCT LOW_VALUE HIGH_VALUE DENSITY NUM_BUCKETS LAST_ANALYZED SAMPLE_SIZE HISTOGRAM
---------- ------------ ---------- ---------- ---------- ----------- ------------------- ----------- ---------------
A 10000 C102 C302 .0001 1 2013-02-18 12:43:23 10000 NONE
B 10 C102 C302 .00005 10 2013-02-18 13:13:59 10000 FREQUENCY
SQL>
SQL> select table_name, column_name, endpoint_number, endpoint_value from dba_tab_histograms where table_name='TAB';
TABLE_NAME COLUMN_NAM ENDPOINT_NUMBER ENDPOINT_VALUE
------------------------------ ---------- --------------- --------------
TAB B 2 2
TAB B 3 3
TAB B 4 4
TAB B 9995 5
TAB B 9996 9996
TAB B 9997 9997
TAB B 9998 9998
TAB B 9999 9999
TAB B 10000 10000
TAB B 1 1
TAB A 0 1
TABLE_NAME COLUMN_NAM ENDPOINT_NUMBER ENDPOINT_VALUE
------------------------------ ---------- --------------- --------------
TAB A 1 10000
12 rows selected.
SQL>
SQL>
当bucket > NUM_DISTINCT 时,依然自动产生的FREQUENCY类型的histogram,虽然指定了30个bucket,
但是由于只有10个NUM_DISTINCT,所以只生产了10个bucket
SQL>
SQL> select * from tab where b=5;
9991 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 1995730731
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 9991 | 59946 | 6 (0)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| TAB | 9991 | 59946 | 6 (0)| 00:00:01 |
--------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("B"=5)
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
1370 consistent gets
0 physical reads
0 redo size
212167 bytes sent via SQL*Net to client
7818 bytes received via SQL*Net from client
668 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
9991 rows processed
SQL>
情况3:
当bucket < NUM_DISTINCT 时
先删除直方图
SQL> exec DBMS_stats.gather_table_stats(cascade => TRUE,degree => 2,estimate_percent => 100,force => TRUE,ownname => USER,tabname => 'TAB',method_opt => 'for columns size 1 b');
PL/SQL procedure successfully completed.
SQL> select COLUMN_NAME,num_distinct, low_value, high_value, density, num_buckets, last_analyzed, SAMPLE_SIZE,HISTOGRAM from USER_TAB_COLS;
COLUMN_NAM NUM_DISTINCT LOW_VALUE HIGH_VALUE DENSITY NUM_BUCKETS LAST_ANALYZED SAMPLE_SIZE HISTOGRAM
---------- ------------ ---------- ---------- ---------- ----------- ------------------- ----------- ---------------
A 10000 C102 C302 .0001 1 2013-02-18 12:43:23 10000 NONE
B 10 C102 C302 .1 1 2013-02-18 13:17:16 10000 NONE
SQL> select table_name, column_name, endpoint_number, endpoint_value from dba_tab_histograms where table_name='TAB';
TABLE_NAME COLUMN_NAM ENDPOINT_NUMBER ENDPOINT_VALUE
------------------------------ ---------- --------------- --------------
TAB B 0 1
TAB A 0 1
TAB B 1 10000
TAB A 1 10000
SQL>
创建新的直方图
exec DBMS_stats.gather_table_stats(cascade => TRUE,degree => 2,estimate_percent => 100,force => TRUE,ownname => USER,tabname => 'TAB',method_opt => 'for columns size 8 b');
SQL>
SQL> exec DBMS_stats.gather_table_stats(cascade => TRUE,degree => 2,estimate_percent => 100,force => TRUE,ownname => USER,tabname => 'TAB',method_opt => 'for columns size 8 b');
PL/SQL procedure successfully completed.
SQL> select COLUMN_NAME,num_distinct, low_value, high_value, density, num_buckets, last_analyzed, SAMPLE_SIZE,HISTOGRAM from USER_TAB_COLS;
COLUMN_NAM NUM_DISTINCT LOW_VALUE HIGH_VALUE DENSITY NUM_BUCKETS LAST_ANALYZED SAMPLE_SIZE HISTOGRAM
---------- ------------ ---------- ---------- ---------- ----------- ------------------- ----------- ---------------
A 10000 C102 C302 .0001 1 2013-02-18 12:43:23 10000 NONE
B 10 C102 C302 .0001 8 2013-02-18 13:18:14 10000 HEIGHT BALANCED
SQL> select table_name, column_name, endpoint_number, endpoint_value from dba_tab_histograms where table_name='TAB';
TABLE_NAME COLUMN_NAM ENDPOINT_NUMBER ENDPOINT_VALUE
------------------------------ ---------- --------------- --------------
TAB B 0 1
TAB B 7 5
TAB B 8 10000
TAB A 0 1
TAB A 1 10000
当bucket < NUM_DISTINCT 时,自动生成了一个基于HEIGHT BALANCED的histogram
ENDPOINT_NUMBER是实际的bucket编号,ENDPOINT_VALUE是根据列值决定的该bucket的endpoint值。
上面的输出中,bucket 0存放着b列的低值,为了节省空间没有显示出1-6号的bucket。
但是我们能够理解,bucket[1-7]里存放着的endpoint=5,而 bucket8里存放endpoint=10000。
因此,实际上bucket0里包含了1-5之间的所有值,而bucket8里包含了5-10000之间的所有值,在本例中也就是9996-10000这5个数值。
也就是说在这9个bucket里面,endpoint=5的占用了其中的7个bucket,说有了5这个值的数据最多。
实验参考了http://www.dbasky.com/oracle/oracle_histogram.html中的部分内容