高效的SQL( clustering factor减少COST)
1、创建样表cluster_factor(x有序列,y随意列);实验表cluster01(low)、cluster02(high)
doudou@TEST> create table cluster_factor (x int, y int);
Table created.
doudou@TEST> begin
2 for i in 1..1000000 loop
3 insert into cluster_factor values (i,to_char(dbms_random.random,'9999999999999999'));
4 end loop;
5 commit;
6 end;
7 /
PL/SQL procedure successfully completed.
cluster01表
doudou@TEST> create table cluster01 as select * from cluster_factor;
Table created.
cluster02表
doudou@TEST> create table cluster02 as select * from cluster_factor order by y;
Table created.
index_cluster01索引(索引列有序)
doudou@TEST> create index index_cluster01 on cluster01(x);
Index created.
index_cluster02索引(索引列无序)
doudou@TEST> create index index_cluster02 on cluster02(x);
Index created.
利用dbms_stats收集表的索引信息 (cascade是否收集索引信息选项)
doudou@TEST> begin
2 dbms_stats.gather_table_stats(user,'cluster01',cascade=>true);
3 dbms_stats.gather_table_stats(user,'cluster02',cascade=>true);
4 end;
5 /
PL/SQL procedure successfully completed.
或
doudou@TEST> exec dbms_stats.gather_table_stats('DOUDOU','cluster01',cascade=>true);
PL/SQL procedure successfully completed.
doudou@TEST> exec dbms_stats.gather_table_stats('DOUDOU','cluster02',cascade=>true);
PL/SQL procedure successfully completed.
2、开启执行计划并查询SQL
doudou@TEST> set autot on
doudou@TEST> select avg(y/(x+1)) from cluster01 where x between 10000 and 30000;
AVG(Y/(X+1))
------------
-369.65884
Execution Plan
----------------------------------------------------------
Plan hash value: 3265002277
------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 12 | 143 (0)| 00:00:02 |
| 1 | SORT AGGREGATE | | 1 | 12 | | |
| 2 | TABLE ACCESS BY INDEX ROWID| CLUSTER01 | 20144 | 236K| 143 (0)| 00:00:02 |
|* 3 | INDEX RANGE SCAN | INDEX_CLUSTER01 | 20144 | | 47 (0)| 00:00:01 |
------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - access("X">=10000 AND "X"<=30000)
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
116 consistent gets
60 physical reads
0 redo size
434 bytes sent via SQL*Net to client
400 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
doudou@TEST> select avg(y/(x+1)) from cluster02 where x between 10000 and 30000;
AVG(Y/(X+1))
------------
-369.65884
Execution Plan
----------------------------------------------------------
Plan hash value: 2721670139
--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 12 | 542 (1)| 00:00:07 |
| 1 | SORT AGGREGATE | | 1 | 12 | | |
|* 2 | TABLE ACCESS FULL| CLUSTER02 | 20130 | 235K| 542 (1)| 00:00:07 |
--------------------------------------------------------------------------------
【问题1:2个表的数据、索引都是一样的。为什么cluster01走索引,而cluster02全表扫描呢?】
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("X"<=30000 AND "X">=10000)
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
2417 consistent gets
2411 physical reads
0 redo size
434 bytes sent via SQL*Net to client
400 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
doudou@TEST> SELECT /*+ INDEX(CLUSTER02 INDEX_CLUSTER02)*/AVG(Y/(X+1)) FROM CLUSTER02 WHERE X BETWEEN 10000 AND 30000;
AVG(Y/(X+1))
------------
-369.65884
Execution Plan
----------------------------------------------------------
Plan hash value: 924486639
------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 12 | 20171 (1)| 00:04:03 |
| 1 | SORT AGGREGATE | | 1 | 12 | | |
| 2 | TABLE ACCESS BY INDEX ROWID| CLUSTER02 | 20130 | 235K| 20171 (1)| 00:04:03 |
|* 3 | INDEX RANGE SCAN | INDEX_CLUSTER02 | 20130 | | 47 (0)| 00:00:01 |
------------------------------------------------------------------------------------------------
【问题2:强制cluster02走索引,但是最后cost消耗还是很大】
Predicate Information (identified by operation id):
---------------------------------------------------
3 - access("X">=10000 AND "X"<=30000)
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
20040 consistent gets
0 physical reads
0 redo size
434 bytes sent via SQL*Net to client
400 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
doudou@TEST> select
2 idx.index_name,
3 tab.table_name,
4 tab.num_rows,
5 tab.blocks,
6 idx.clustering_factor
7 from
8 user_indexes idx inner join user_tables tab
9 on idx.table_name = tab.table_name
10 order by table_name;
INDEX_NAME TABLE_NAME NUM_ROWS BLOCKS CLUSTERING_FACTOR
------------------------------ ------------------------------ ---------- ---------- -----------------
INDEX_CLUSTER01 CLUSTER01 1006949 2459 4726
INDEX_CLUSTER02 CLUSTER02 1006232 2459 999628
【cluster01与cluster02的clustering factor不同,发现了问题1的答案;
问题1的答案:物理分布的不同导致了索引的选择
问题2的答案:强制走索引不是适合所有的操作,有时也会造成更大的cost消耗
(clustering factor 高,相邻索引值指向更多不同的块,本来一个块可以返回的信息却需要数据库去读更多的块,而这里又强制使用rowid全表扫描所以造成了更多的cost)】
3、总结:
low的clustering factor减少了对相同块的重复读,从而减少cost的消耗,
high的clustering factor增加了对相同块的重复读,从而增加cost的消耗。
后语:建立表按照一定的顺序是有必要的,这样可以减低clustering factor(或者可以说索引建在有序的列上性能会好一些),从而优化sql
4、附表:脚本
1、查看clustering_factoer
idx.index_name,
tab.table_name,
tab.num_rows,
tab.blocks,
idx.clustering_factor
from
user_indexes idx inner join user_tables tab
on idx.table_name = tab.table_name
order by table_name;
2、收集信息
dbms_stats 与 analyze
dbms_stats注意:1、不可以收集集群信息,但可以收集单独表来代替收集整个集群。
2、收集优化器统计优先考虑
analyze注意:1、use the VALIDATE or LIST CHAINED ROWS clauses、collect information on free list blocks
2、收集优化器统计不优先考虑
总结:Oracle推荐收集优化器统计使用dbms_stats
Note:
Do not use the COMPUTE and ESTIMATE clauses of ANALYZE statement to collect optimizer statistics. These clauses are supported solely for backward compatibility and may be removed in a future release. The DBMS_STATS package collects a broader, more accurate set of statistics, and gathers statistics more efficiently.
You may continue to use ANALYZE statement to for other purposes not related to optimizer statistics collection:
To use the VALIDATE or LIST CHAINED ROWS clauses
To collect information on free list blocks
Statistics Gathering Procedures in the DBMS_STATS Package
Procedure | Collects |
Index statistics | |
Table, column, and index statistics | |
Statistics for all objects in a schema | |
Statistics for all dictionary objects | |
Statistics for all objects in a database |