也谈DBA_TAB_COLS跟DBA_TAB_COLUMNS区别
对字段定义函数索引后,dab_tab_cols会增加对应条目(不知道正式的术语是什么,姑且称之为虚拟列);
dba_tab_columns则只包含真实字段
所以你会发现二者的记录数有差异
但是,只有执行ANALYZE TABLE,dab_tab_cols中隐藏列的相关信息才有值,而DBMS_STATS对此无效
有时某些不良SQL对WHERE条件中的字段做了函数转换
如where to_char(mydate,'yyyymm')='201108'
可能导致CBO低估CARDINALITY,选择错误的执行计划
对此,最好的办法当然是改写SQL;但如果条件不允许,也可以考虑创建函数索引,并通过ANALYZE提供额外的虚拟列统计信息,以帮助CBO修正执行计划
下面是一个测试用例:
drop table a;
create table a as select * from dba_objects where rownum<=1000 order by dbms_random.random;
update a set created=to_date('201108','yyyymm');
commit;
drop table b;
create table b as select * from dba_objects where rownum<=10000 ;
create index b$object_id on b(object_id);
exec dbms_stats.gather_table_stats('DSG','A',cascade=>true);
exec dbms_stats.gather_table_stats('DSG','B',cascade=>true);
select count(b.owner) from a,b
where a.object_id=b.object_id
and to_char(a.created,'yyyymm')='201108';
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 22 | 24 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 22 | | |
| 2 | TABLE ACCESS BY INDEX ROWID| B | 1 | 10 | 2 (0)| 00:00:01 |
| 3 | NESTED LOOPS | | 10 | 220 | 24 (0)| 00:00:01 |
|* 4 | TABLE ACCESS FULL | A | 10 | 120 | 4 (0)| 00:00:01 |
|* 5 | INDEX RANGE SCAN | B$OBJECT_ID | 1 | | 1 (0)| 00:00:01 |
--------------------------------------------------------------------------------------------
4 - filter(TO_CHAR(INTERNAL_FUNCTION("A"."CREATED"),'yyyymm')='201108')
5 - access("A"."OBJECT_ID"="B"."OBJECT_ID")
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
1936 consistent gets
0 physical reads
0 redo size
521 bytes sent via SQL*Net to client
492 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
注意,执行计划第4行,实际的返回行数应该是1000,但被低估为10
因此系统计算的COST值当然也“虚低”
create index a$created on a(to_char(created,'yyyymm'));
analyze table a compute statistics;
经过建函数索引,ANALYZE TALBE,下面再执行,优化器得到了正确的返回行数
select count(b.owner) from a,b
where a.object_id=b.object_id
and to_char(a.created,'yyyymm')='201108';
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 20 | 34 (3)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 20 | | |
|* 2 | HASH JOIN | | 1000 | 20000 | 34 (3)| 00:00:01 |
|* 3 | TABLE ACCESS FULL| A | 1000 | 10000 | 4 (0)| 00:00:01 |
| 4 | TABLE ACCESS FULL| B | 10000 | 97K| 29 (0)| 00:00:01 |
----------------------------------------------------------------------------
2 - access("A"."OBJECT_ID"="B"."OBJECT_ID")
3 - filter(TO_CHAR(INTERNAL_FUNCTION("A"."CREATED"),'yyyymm')='201108
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
144 consistent gets
0 physical reads
0 redo size
521 bytes sent via SQL*Net to client
492 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
因对返回行数的估算差异,导致两种不同的表关联方式,实际开销即consistent gets差异巨大
第二种执行计划COST值较高,但实际的执行开销远小于第一种