初步理解直方图

[quote][size=large]当系统中的某些表存在高度不均匀的数据分布时,使用直方图能够产生更好的选择性评估,从而产生更加优化的执行计划。[/size][/quote]

[size=large]通过下面的例子来感受直方图的作用[/size]

基础数据
drop user sure cascade;
create user sure identified by oracle;
grant resource to sure;
create table sure.tab (a number, b number);


插入1万条数据
begin
for i in 1..10000 loop
insert into sure.tab values (i, i);
end loop;
commit;
end;
/


制造[color=red]不均匀[/color]的情况
update sure.tab set b=5 where b between 6 and 9995;
commit;


此时b列的数据分布为
select b, count(*) from sure.tab group by b order by b;
B COUNT(*)
---------- ----------
1 1
2 1
3 1
4 1
5 9991
9996 1
9997 1
9998 1
9999 1
10000 1


【1】在创建索引之前,无论是查询b=1或者是b=5,都只能走[color=red]全表扫描[/color]。
explain plan for select * from sure.tab where b=1;
select * from table(dbms_xplan.display);
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 26 | 6 (0)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| TAB | 1 | 26 | 6 (0)| 00:00:01 |
--------------------------------------------------------------------------


explain plan for select * from sure.tab where b=5;
select * from table(dbms_xplan.display);
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 9991 | 253K| 6 (0)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| TAB | 9991 | 253K| 6 (0)| 00:00:01 |
--------------------------------------------------------------------------


【2】在b列上创建一个索引
[size=large][color=blue]create index sure.ix_tab_b on sure.tab(b);[/color][/size]

explain plan for select * from sure.tab where b=1;
select * from table(dbms_xplan.display);
----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 26 | 2 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID| TAB | 1 | 26 | 2 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | IX_TAB_B | 1 | | 1 (0)| 00:00:01 |
----------------------------------------------------------------------------------------
- dynamic sampling used for this statement
18 rows selected.


explain plan for select * from sure.tab where b=5;
select * from table(dbms_xplan.display);
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 9991 | 253K| 6 (0)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| TAB | 9991 | 253K| 6 (0)| 00:00:01 |
--------------------------------------------------------------------------
- dynamic sampling used for this statement
17 rows selected


网上的例子说,在有了索引以后,应该都走INDEX RANGE SCAN,但是实际情况(10201)却是,b=5时,依然选择到了正确的路径--[color=red]全表扫描[/color]。这是因为没有统计信息,Oracle进行了[b][color=blue]动态采样[/color][/b],相当于临时收集了一小份统计信息,所以这时反而挺准的。

【3】收集统计信息,但不收集直方图
[size=large][color=blue]analyze table sure.tab compute statistics;[/color][/size]

与统计信息相关的视图
select num_rows, blocks, empty_blocks, avg_space, chain_cnt, avg_row_len
from dba_tables where table_name = 'TAB';
NUM_ROWS BLOCKS EMPTY_BLOCKS AVG_SPACE CHAIN_CNT AVG_ROW_LEN
---------- ---------- ------------ ---------- ---------- -----------
10000 20 4 2080 0 10

col low_value format a16
col high_value format a16
col column_name format a16
select column_name, num_distinct, low_value, high_value, density, num_buckets
from dba_tab_columns where table_name = 'TAB';
COLUMN_NAME NUM_DISTINCT LOW_VALUE HIGH_VALUE DENSITY NUM_BUCKETS
---------------- ------------ ---------------- ---------------- ---------- -----------
A 10000 C102 C302 .0001 1
B 10 C102 C302 .1 1

select table_name, column_name, endpoint_number, endpoint_value
from dba_tab_histograms where table_name = 'TAB';
TABLE_NA COLUMN_NAME ENDPOINT_NUMBER ENDPOINT_VALUE
-------- ---------------- --------------- --------------
TAB B 0 1
TAB A 0 1
TAB B 1 10000
TAB A 1 10000


观察执行计划
explain plan for select * from sure.tab where b=1;
select * from table(dbms_xplan.display);
----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1000 | 6000 | 4 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID| TAB | 1000 | 6000 | 4 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | IX_TAB_B | 1000 | | 2 (0)| 00:00:01 |
----------------------------------------------------------------------------------------


explain plan for select * from sure.tab where b=5;
select * from table(dbms_xplan.display);
----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1000 | 6000 | 4 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID| TAB | 1000 | 6000 | 4 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | IX_TAB_B | 1000 | | 2 (0)| 00:00:01 |
----------------------------------------------------------------------------------------


有了统计信息,[b]b=5的查询反而跑偏了[/b]。

【4】创建tab表b列的柱状图统计信息,使得优化器能够知道该列每个值的具体分布情况。
[size=large][color=blue]analyze table sure.tab compute statistics for columns b size 10;[/color][/size]

直方图中的ENDPOINT_VALUE表示列值,ENDPOINT_NUMBER表示累积的行数。
select table_name, column_name, endpoint_number, endpoint_value
from dba_histograms where table_name = 'TAB';
TABLE_NA COLUMN_NAME ENDPOINT_NUMBER ENDPOINT_VALUE
-------- ---------------- --------------- --------------
TAB B 1 1
TAB B 2 2
TAB B 3 3
TAB B 4 4
TAB B 9995 5
TAB B 9996 9996
TAB B 9997 9997
TAB B 9998 9998
TAB B 9999 9999
TAB B 10000 10000
TAB A 0 1
TAB A 1 10000


观察执行计划
explain plan for select * from sure.tab where b=1;
select * from table(dbms_xplan.display);
----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 6 | 2 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID| TAB | 1 | 6 | 2 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | IX_TAB_B | 1 | | 1 (0)| 00:00:01 |
----------------------------------------------------------------------------------------


explain plan for select * from sure.tab where b=5;
select * from table(dbms_xplan.display);
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 9991 | 59946 | 6 (0)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| TAB | 9991 | 59946 | 6 (0)| 00:00:01 |
--------------------------------------------------------------------------


[size=large]【总结】对于b=5的查询来说,全表扫描比之索引范围扫描更加合理,有直方图,优化器就可以做出正确的判断。(没有统计信息,因为动态采样而选对路径,属于歪打正着)[/size]
  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值