直方图概念:
直方图,当某个列数据分布不均衡时,为了让cbo生成的执行计划更准确,可以收集直方图。我们可能需要对表收集直方图。直方图最大的桶数(Bucket)是254,默认桶数是75,桶数可以取值范围是1-254,。收集时直方图的时候很耗费cpu,oracle在收集直方图的时候会对表进行全表扫描,对于所有的列值放入不同的桶中。一般情况下,要理性的收集,如果收集不当可能会造成绑定变量窥探。
直方图用在什么情况下?
列的值分布非常不均衡的时候,并且where条件中经常用到这个列。
绑定变量窥探概念:
使用绑定变量,可以使sql共享,从来不产生那么多的硬解析,可以避免4031错误。但有时候绑定变量也会产生一些不好的事情,由于绑定变量而导致oracle没有正常的走应该走的执行计划。绑定变量窥探就是oracle在第一次硬解析一个带有绑定变量的sql时,会窥探(查看)变量的赋值,将真实值带入并生成一个执行计划,而以后同类型的sql都使用改执行计划,如果此时的真实值恰好是选择性小的值,生成了走索引的执行计划,那万一下次的真实值的选择性很大,其实走全表扫描更好,但由于之前的绑定变量窥探导致走了索引,那性能会查很多。
直方图实验:
实验:
SQL> create table test as select * from dba_objects;
Table created.
SQL> BEGIN
DBMS_STATS.GATHER_TABLE_STATS(ownname => 'SYS',
tabname => 'TEST',
estimate_percent => 100,---采样比率
method_opt => 'for all columns size skewonly',
degree => 1,---指定并行度
cascade => TRUE);--指定也收集相关表的索引的统计信息
END;
/
PL/SQL procedure successfully completed.
--这里method_opt(与列的统计相关)中size后面的值(size后面就是和直方图有关)有几种:
auto:Oracle determines the columns to collect histograms based on data distribution and the workload of the columns.它基于数据的分布以及应用程序访问列的方式来创建直方图。
repeat:只会为现有的直方图重新分析索引,不再生成新的直方图。
1-254桶数:size后面如果跟着10,那就代表用10个桶
skewonly:Oracle determines the columns to collect histograms based on the data distribution of the columns。只会根据column的数据分布情况决定是否收集histogram。
这里auto和skewonly类似,但区别是skewonly一定会收集这个列的直方图,但auto是会根据列的数据和workload。
SQL> select a.column_name,
b.num_rows,
a.num_distinct Cardinality,
round(a.num_distinct / b.num_rows * 100, 2) selectivity,
a.histogram,
a.num_buckets
from dba_tab_col_statistics a, dba_tables b
where a.owner = b.owner
and a.table_name = b.table_name
and a.table_name = 'TEST';
COLUMN_NAME NUM_ROWS CARDINALITY SELECTIVITY HISTOGRAM NUM_BUCKETS
------------------------------ ---------- ----------- ----------- --------------- -----------
OWNER 13202 8 .06 FREQUENCY 8
OBJECT_NAME 13202 10219 77.4 HEIGHT BALANCED 254
SUBOBJECT_NAME 13202 83 .63 FREQUENCY 83
OBJECT_ID 13202 13202 100 NONE 1
DATA_OBJECT_ID 13202 2689 20.37 HEIGHT BALANCED 254
OBJECT_TYPE 13202 37 .28 FREQUENCY 37
CREATED 13202 389 2.95 HEIGHT BALANCED 254
LAST_DDL_TIME 13202 407 3.08 HEIGHT BALANCED 254
TIMESTAMP 13202 402 3.04 HEIGHT BALANCED 254
STATUS 13202 1 .01 FREQUENCY 1
TEMPORARY 13202 2 .02 FREQUENCY 2
GENERATED 13202 2 .02 FREQUENCY 2
SECONDARY 13202 1 .01 FREQUENCY 1
NAMESPACE 13202 15 .11 FREQUENCY 15
EDITION_NAME 13202 0 0 NONE 0
15 rows selected.
解释一下:
CARDINALITY是基数,
SELECTIVITY是选择性,
HISTOGRAM是直方图,
频率直方图(FREQUENCY HISTOGRAM):num_buckets<254时,oracle会收集频率直方图。也可以说频率直方图的话,num_buckets就等于distinct的数量,(如果size 后面跟着的值不是桶数的话,比如是auto的话,那么如果一个列的基数小于254,那么它的桶数就等于基数。
)
高度平衡直方图(HEIGHT BALANCED):num_buckets=254时,oracle会收集高度平衡直方图。
对于object_name,不应该收集直方图,可能是收集统计信息的时候,不小心收集的直方图(method_opt => 'for all columns size skewonly',),导致对表收集统计信息的时候对所有的列都收集了直方图。也可以说如果当一个列的值得选择性很高的话,或者说这个列的distinct值很大的话大到已经要接近与主键的话,那就没必要收集直方图了。
在owner有直方图的情况下,测试执行计划:
SQL> create index ind_mao on test(owner);
Index created.
SQL> set autotrace traceonly
SQL> select * from test where owner='SYS';
9280 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 1357081020
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 9280 | 806K| 48 (0)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| TEST | 9280 | 806K| 48 (0)| 00:00:01 |
--------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("OWNER"='SYS')
Statistics
----------------------------------------------------------
112 recursive calls
0 db block gets
1016 consistent gets
0 physical reads
0 redo size
404254 bytes sent via SQL*Net to client
7214 bytes received via SQL*Net from client
620 SQL*Net roundtrips to/from client
76 sorts (memory)
0 sorts (disk)
9280 rows processed
SQL> select * from test where owner='SH';
Execution Plan
----------------------------------------------------------
Plan hash value: 3105505699
---------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2 | 178 | 2 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID| TEST | 2 | 178 | 2 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | IND_MAO | 2 | | 1 (0)| 00:00:01 |
---------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("OWNER"='SH')
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
5 consistent gets
0 physical reads
0 redo size
1519 bytes sent via SQL*Net to client
416 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
2 rows processed
删除直方图信息:
SQL> BEGIN
DBMS_STATS.GATHER_TABLE_STATS(ownname => 'SYS',
tabname => 'TEST',
estimate_percent => 100,
method_opt => 'for all columns size 1',
degree => 1,
cascade => TRUE);
END;
/
SQL> select a.column_name,
2 b.num_rows,
3 a.num_distinct Cardinality,
4 round(a.num_distinct / b.num_rows * 100, 2) selectivity,
5 a.histogram,
6 a.num_buckets
7 from dba_tab_col_statistics a, dba_tables b
8 where a.owner = b.owner
9 and a.table_name = b.table_name
10 and a.table_name = 'TEST';
COLUMN_NAME NUM_ROWS CARDINALITY SELECTIVITY HISTOGRAM NUM_BUCKETS
------------------------------ ---------- ----------- ----------- --------------- -----------
OWNER 13202 8 .06 NONE 1
OBJECT_NAME 13202 10219 77.4 NONE 1
SUBOBJECT_NAME 13202 83 .63 NONE 1
OBJECT_ID 13202 13202 100 NONE 1
DATA_OBJECT_ID 13202 2689 20.37 NONE 1
OBJECT_TYPE 13202 37 .28 NONE 1
CREATED 13202 389 2.95 NONE 1
LAST_DDL_TIME 13202 407 3.08 NONE 1
TIMESTAMP 13202 402 3.04 NONE 1
STATUS 13202 1 .01 NONE 1
TEMPORARY 13202 2 .02 NONE 1
GENERATED 13202 2 .02 NONE 1
SECONDARY 13202 1 .01 NONE 1
NAMESPACE 13202 15 .11 NONE 1
EDITION_NAME 13202 0 0 NONE 0
15 rows selected.
SQL> select * from test where owner='SH';
Execution Plan
----------------------------------------------------------
Plan hash value: 3105505699
---------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1650 | 143K| 46 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID| TEST | 1650 | 143K| 46 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | IND_MAO | 1650 | | 4 (0)| 00:00:01 |
---------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("OWNER"='SH')
Statistics
----------------------------------------------------------
37 recursive calls
0 db block gets
64 consistent gets
0 physical reads
0 redo size
1519 bytes sent via SQL*Net to client
416 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
4 sorts (memory)
0 sorts (disk)
2 rows processed
这里虽然只返回2行结果,但oracle认为返回了1650行结果。这里的1650其实是13202/8。
我们没有对owner列收集直方图,那么oracle就会认为owner这个列是均衡的,就会认为行数(13202)/基数(8个值) 就是oracle认为的行数。得到了1650行,oracle觉得1650行对于1w多行中是应该走索引的。所以就走了索引。
select * from test where owner='SYS';
SQL> select * from test where owner='SYS';
9280 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 3105505699
---------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1650 | 143K| 46 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID| TEST | 1650 | 143K| 46 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | IND_MAO | 1650 | | 4 (0)| 00:00:01 |
---------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("OWNER"='SYS')
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
1421 consistent gets
0 physical reads
0 redo size
938251 bytes sent via SQL*Net to client
7214 bytes received via SQL*Net from client
620 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
9280 rows processed
如果是sys的话,同样计算,oracle也是因为行数为1650,也还是会走索引。
select /*+ full(test) */ * from test where owner='SYS';
SQL> select /*+ full(test) */ * from test where owner='SYS';
9280 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 1357081020
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1650 | 143K| 48 (0)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| TEST | 1650 | 143K| 48 (0)| 00:00:01 |
--------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("OWNER"='SYS')
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
789 consistent gets
0 physical reads
0 redo size
404254 bytes sent via SQL*Net to client
7214 bytes received via SQL*Net from client
620 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
9280 rows processed
逻辑读比走索引要低。
为啥全表扫描要比索引的逻辑读要高呢?
SQL> select blocks from dba_segments where segment_name='TEST'
2 ;
BLOCKS
----------
256
SQL> show parameter db_f
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
db_file_multiblock_read_count integer 120
多块读的个数是120。那消耗io的次数就是789/16次。所以49次io就可以全表扫描完毕了。
索引扫描的io次数:
索引扫描是单词io单块读(除了INDEX FAST FULL SCAN)。我们这里的索引是INDEX RANGE SCAN ,是单块读,假如索引的高度是2,那如果扫描索引块的话,至少要扫描2个块,那就需要2次io,刚才索引返回9280行数据,如果一个索引块能存100条记录,那需要90个索引块才可以得到这些数据。所以需要进行90次的io,然后得到rowid后需要回表再得到数据。得到9k多个rowid后,回到表里查这些查rowid所对应的表中数据,至少还需要至少1百个左右的io。
如果有一个1000w行的表,那么如果查询返回结果是500w,那索引的高度为3的话,那一个索引块存100行记录,那就需要扫描5w次,再回表的话,也需要几万个io。那差不多就是10w次io。但如是全表扫描的话,那就只有1w个io。索引是单块读,所以涉及的等待事件也是db file sequential read 。如果看到这个等待事件,可以判断出可能是需要全表的但走了索引。
一次io单块读和一次io多快读的时间差别并不太大,对于现在的存储,多快读其实大多是在多个存储里读的,因为存储有条带化,多块读的话是在多个磁盘读,单块读的话是在一个存储里读,所以差别并不大。所以速度基本一样的。其实也都是毫秒里计算的,只有io扫描了上千万次,那可能给我们的感觉才是慢,如果只是io几万次,那我们可能是感觉不出来慢的。
如果我们没计算直方图,那oracle就简单的相除,如果我们计算了直方图,那oracle就知道了,不能单纯的相除了,要根据桶里的值来计算。
来自 “ ITPUB博客 ” ,链接:http://blog.itpub.net/24500180/viewspace-1952637/,如需转载,请注明出处,否则将追究法律责任。
转载于:http://blog.itpub.net/24500180/viewspace-1952637/