索引的聚集因子对查询性能的影响
视图USER_INDEXES有一列clustering_factor,我们称为聚集因子,该列反映了数据相对于已索引的列是否有序。如果clustering_factor列的值接近于索引中的树叶块(leaf block)的数目,表中的数据就越有序,如果列的值接近于表中的行数,则表中的数据则不是很有序。
如果我们对被索引的列进行范围查询,如果数据是按照顺序排列,则只需几次读取数据块,就可以检索出所需的数据,如果数据无序,每块只包含几条符合条件的数据,因此系统可能需要读取更多的数据块。
例如:
SQL> create table test_order
2 as
3 select VISIT_DATE, VISIT_NO
4 from outp_bill_items
5 order by VISIT_DATE ;
Table created.
SQL> create index ind_test_order on test_order(VISIT_DATE);
Index created.
SQL> exec dbms_stats.gather_table_stats(user , 'TEST_ORDER');
PL/SQL procedure successfully completed.
SQL> exec dbms_stats.gather_index_stats(user , 'IND_TEST_ORDER');
PL/SQL procedure successfully completed.
SQL> select LEAF_BLOCKS , CLUSTERING_FACTOR
2 from user_indexes where index_name='IND_TEST_ORDER';
LEAF_BLOCKS CLUSTERING_FACTOR
----------- -----------------
1494 1362
SQL> set autotrace trace
SQL> select * from test_order
2 where visit_date between to_date('20040101','YYYYMMDD')
3 and to_date('20040201','YYYYMMDD');
21656 rows selected.
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT ptimizer=CHOOSE (Cost=104 Card=19914 Bytes
=219054)
1 0 TABLE ACCESS (BY INDEX ROWID) OF 'TEST_ORDER' (Cost=104 Ca
rd=19914 Bytes=219054)
2 1 INDEX (RANGE SCAN) OF 'IND_TEST_ORDER' (NON-UNIQUE) (Cos
t=55 Card=19914)
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
2987 consistent gets
50 physical reads
0 redo size
1035456 bytes sent via SQL*Net to client
177880 bytes received via SQL*Net from client
1446 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
21656 rows processed
对于非排序的数据:
SQL> create table test_noorder
2 as
3 select VISIT_DATE, VISIT_NO
4 from outp_bill_items
5 order by dbms_random.value() ;
Table created.
SQL> create index ind_test_noorder on test_noorder(VISIT_DATE) nologging ;
Index created.
SQL> exec dbms_stats.gather_table_stats(user , 'TEST_NOORDER');
PL/SQL procedure successfully completed.
SQL> exec dbms_stats.gather_index_stats(user , 'IND_TEST_NOORDER');
PL/SQL procedure successfully completed.
SQL> select LEAF_BLOCKS , CLUSTERING_FACTOR
2 from user_indexes where index_name='IND_TEST_NOORDER';
LEAF_BLOCKS CLUSTERING_FACTOR
----------- -----------------
1494 563803
SQL> set autotrace trace
SQL> select * from test_noorder
2 where visit_date between to_date('20040101','YYYYMMDD')
3 and to_date('20040201','YYYYMMDD');
21656 rows selected.
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT ptimizer=CHOOSE (Cost=201 Card=19914 Bytes
=219054)
1 0 TABLE ACCESS (FULL) OF 'TEST_NOORDER' (Cost=201 Card=19914
Bytes=219054)
Statistics
----------------------------------------------------------
0 recursive calls
36 db block gets
2770 consistent gets
1318 physical reads
0 redo size
661515 bytes sent via SQL*Net to client
177642 bytes received via SQL*Net from client
1445 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
21656 rows processed
可以看出,两钟情况下,相同的SQL语句,物理的读之间的差距还是比较大。如果数据未排序,系统会选择全表扫描而不是使用索引。
聚集因子这个值对区间范围查询的影响比较大,对于那些静态又要求区间查询,最好能够整理一下数据,使得按顺序排列。
来自 “ ITPUB博客 ” ,链接:http://blog.itpub.net/15881763/viewspace-694520/,如需转载,请注明出处,否则将追究法律责任。
转载于:http://blog.itpub.net/15881763/viewspace-694520/