/*直方图*/
/*前面提到,当某个列基数很低,该列数据分布就会不均衡。数据分布不均衡会导致在查询该列的时候,要么走全表扫描要么走索引扫描,这个时候很容易走错执行计划*/。
/*
如果没有对基数低的列收集直方图统计信息,基于成本的优化器(CBO)会认为该列数据分布是均衡的。*/
/*首先我们对测试表test收集统计信息,在收集统计信息的时候,不收集列的直方图,语句for all columns size 1 表示对所有列都不收集统计信息。*/
BEGIN
DBMS_STATS.gather_table_stats(ownname => 'SCOTT',
tabname => 'TEST',
estimate_percent => 100,
method_opt => 'for all columns size 1',
no_invalidate => FALSE,
degree => 1,
cascade => TRUE);
END;
/
/*Histogram 为none表示没有收集直方图。*/
select a.column_name,
b.NUM_ROWS,
a.num_distinct Cardinality,
round(a.num_distinct / b.NUM_ROWS * 100,2) selectivity,
a.HISTOGRAM,
a.num_buckets
from dba_tab_col_statistics a, dba_tables b
where a.owner = b.OWNER
and a.table_name = b.TABLE_NAME
and a.owner = 'SCOTT'
and a.table_name = 'TEST';
owner列基数很低,现在我们对owner进行查询。
SQL> set autot trace
SQL> select * from test where owner = 'SCOTT';
7 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 1357081020
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2499 | 236K| 289 (1)| 00:00:04 |
|* 1 | TABLE ACCESS FULL| TEST | 2499 | 236K| 289 (1)| 00:00:04 |
--------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("OWNER"='SCOTT')
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
1038 consistent gets
0 physical reads
0 redo size
1907 bytes sent via SQL*Net to client
523 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
7 rows processed
SQL>
请注意看粗体部分,查询owner= 'SCOTT'返回了7条数据,但是CBO在计算Rows的时候认为owner = 'SCOTT' 返回2499条数据,Rows 估算得不是特别的准确。从72477条数据里面查询7条数据,应该走索引,所以现在我们对owner 列创建索引。
SQL> create index idx_owner on test(owner);
Index created.
SQL>
现在再来查询一下。
SQL> set pagesize 200
SQL> set line 300
SQL> select * from test where owner = 'SCOTT';
7 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 3932013684
-----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2499 | 236K| 73 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID| TEST | 2499 | 236K| 73 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | IDX_OWNER | 2499 | | 6 (0)| 00:00:01 |
-----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("OWNER"='SCOTT')
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
6 consistent gets
0 physical reads
0 redo size
2174 bytes sent via SQL*Net to client
523 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
7 rows processed
SQL>
现在我们查询owner = 'SYS'。
SQL> select * from test where owner = 'SYS';
30812 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 3932013684
-----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2499 | 236K| 73 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID| TEST | 2499 | 236K| 73 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | IDX_OWNER | 2499 | | 6 (0)| 00:00:01 |
-----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("OWNER"='SYS')
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
4927 consistent gets
69 physical reads
0 redo size
3502852 bytes sent via SQL*Net to client
23117 bytes received via SQL*Net from client
2056 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
30812 rows processed
SQL>
注意粗体字体部分,查询owner = 'SYS'返回30812条数据,从72477条数据里面返回30812条数据能走索引吗?很明显应该走全表扫描。也就是说该执行计划是错误的。
为什么查询owner= 'SYS'的执行计划是错误的呢?因为owner 这个列基数很低,只有29,而表的总行数是72477.前文着重强调过,当列没有收集直方图统计信息的时候,CBO会认为该数据分布是均衡的。正是因为CBO认为owner列数据分布是均衡的,不管owner等于任何值,CBO估算的Rows永远都是2499。而这2499是怎么来的呢?答案如下:
现在大家也知道了,执行计划里面的Rows是假的。执行计划中的Rows是根据统计信息以及一些数学公式计算出来的。很多DBA 到现在都还不知道执行计划中的Rows 是假的这个真相。真是令人遗憾。
在做SQL优化的时候,经常需要做的工作就是帮助CBO计算出比较准确的Rows。注意:我们说的是比较准确的Rows.CBO是无法得到精确的Rows的,因为对表收集统计信息的时候,统计信息一般都不会按照100%的标准采样收集,即使按照100%的标准采样收集了表的统计信息,表中数据也随时在发生变更。另外计算Rows的数学公式目前也是有缺陷的,CBO永远不可能计算得到精确的Rows.
如果CBO每次都能计算得到精确的Rows,那么相信我们这个时候只需要关心业务逻辑、表设计、SQL写法以及如何建立索引了,再也不用担心SQL会走错执行计划了。
为了让CBO选择正确的执行计划,我们需要对owner列收集直方图信息,从而告知CBO该列数据分布不均衡。让CBO在计算Rows的时候参考直方图统计。现在我们对owner列收集直方图。
SQL> BEGIN
2 DBMS_STATS.gather_table_stats(ownname => 'SCOTT',
3 tabname => 'TEST',
4 estimate_percent => 100,
5 method_opt => 'for all columns size 1',
6 no_invalidate => FALSE,
7 degree => 1,
8 cascade => TRUE);
9 END;
10 /
PL/SQL procedure successfully completed.
SQL>
查看一下owner 列的直方图信息。
select a.column_name,
b.NUM_ROWS,
a.num_distinct Cardinality,
round(a.num_distinct / b.NUM_ROWS * 100,2) selectivity,
a.HISTOGRAM,
a.num_buckets
from dba_tab_col_statistics a, dba_tables b
where a.owner = b.OWNER
and a.table_name = b.TABLE_NAME
and a.owner = 'SCOTT'
and a.table_name = 'TEST';
现在我们再来查询上面的SQL,看执行计划是否还会走错并且验证Rows是否还会算错。
SQL> select * from test where owner = 'SCOTT';
7 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 3932013684
-----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 7 | 679 | 2 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID| TEST | 7 | 679 | 2 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | IDX_OWNER | 7 | | 1 (0)| 00:00:01 |
-----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("OWNER"='SCOTT')
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
6 consistent gets
0 physical reads
0 redo size
2174 bytes sent via SQL*Net to client
523 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
7 rows processed
SQL> select * from test where owner = 'SYS';
30812 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 1357081020
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 30812 | 2918K| 289 (1)| 00:00:04 |
|* 1 | TABLE ACCESS FULL| TEST | 30812 | 2918K| 289 (1)| 00:00:04 |
--------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("OWNER"='SYS')
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
3069 consistent gets
0 physical reads
0 redo size
1536346 bytes sent via SQL*Net to client
23117 bytes received via SQL*Net from client
2056 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
30812 rows processed
SQL>
读者只需要知道直方图是用来帮助CBO在对基数很低、数据分布不均衡的列进行Rows估算的时候,可以得到更精确的Rows就够了。
什么样的列需要收集直方图呢?当列出现在where 条件中,列的选择性小于1%并且该列没有收集过直方图。这样的列就应该收集直方图。注意:千万不能对没有出现在where 条件中的列收集直方图。对没有出现在where 条件中的列收集直方图完全是做无用功,浪费数据库资源。
下面我们为大家分享第二个全自动化优化脚本。
抓出必须创建直方图的列(大家可以对该脚本适当修改,以便于用于生产环境)。
2,抓出需要收集直方图的列
此脚本依赖统计信息。
当一个表比较大,列选择性低于5%,而且列出现在where 条件中,
为了防止优化器估算Rows出现较大偏差,我们需要对这种列收集直方图。
以下脚本抓出Scott账户下,表总行数大于5万行、列选择性低于5%并且列出现在where 条件中的表以及列信息。
select a.owner,
a.table_name,
a.column_name,
a.num_distinct Cardinality,
round(a.num_distinct / b.num_rows * 100,2) selectivity
from dba_tab_col_statistics a, dba_tables b
where a.owner = b.OWNER
and a.table_name = b.TABLE_NAME
and a.owner = 'SCOTT'
and round(a.num_distinct / b.num_rows * 100,2) < 5
and num_rows > 50000
and (a.table_name,a.column_name) in
(select o.name,c.name
from sys.col_usage$ u,sys.obj$ o,sys.col$ c,sys.user$ r
where o.obj# = u.obj#
and c.obj# = u.obj#
and c.col# = u.intcol#
and r.name = 'SCOTT');