高效的SQL( clustering factor减少COST)

高效的SQL( clustering factor减少COST)

 

1、创建样表cluster_factor(x有序列,y随意列);实验表cluster01(low)、cluster02(high)

doudou@TEST> create table cluster_factor (x int, y int);

Table created.

doudou@TEST> begin

  2  for i in 1..1000000 loop

  3  insert into cluster_factor values (i,to_char(dbms_random.random,'9999999999999999'));

  4  end loop;

  5  commit;

  6  end;

  7  /

PL/SQL procedure successfully completed.

cluster01表

doudou@TEST> create table cluster01 as select * from cluster_factor;

Table created.

cluster02表

doudou@TEST> create table cluster02 as select * from cluster_factor order by y;

Table created.

index_cluster01索引(索引列有序)

doudou@TEST> create index index_cluster01 on cluster01(x);

Index created.

index_cluster02索引(索引列无序)

doudou@TEST> create index index_cluster02 on cluster02(x);

Index created.

利用dbms_stats收集表的索引信息  (cascade是否收集索引信息选项)

doudou@TEST> begin

  2  dbms_stats.gather_table_stats(user,'cluster01',cascade=>true);

  3   dbms_stats.gather_table_stats(user,'cluster02',cascade=>true);

  4  end;

  5  /

PL/SQL procedure successfully completed.

doudou@TEST> exec dbms_stats.gather_table_stats('DOUDOU','cluster01',cascade=>true);

PL/SQL procedure successfully completed.

doudou@TEST> exec dbms_stats.gather_table_stats('DOUDOU','cluster02',cascade=>true);

PL/SQL procedure successfully completed.

 

2、开启执行计划并查询SQL

doudou@TEST> set autot on

doudou@TEST> select avg(y/(x+1)) from cluster01 where x between 10000 and 30000;

 

AVG(Y/(X+1))

------------

  -369.65884

Execution Plan

----------------------------------------------------------

Plan hash value: 3265002277

 

------------------------------------------------------------------------------------------------

| Id  | Operation                    | Name            | Rows  | Bytes | Cost (%CPU)| Time     |

------------------------------------------------------------------------------------------------

|   0 | SELECT STATEMENT             |                 |     1 |    12 |   143   (0)| 00:00:02 |

|   1 |  SORT AGGREGATE              |                 |     1 |    12 |            |          |

|   2 |   TABLE ACCESS BY INDEX ROWID| CLUSTER01       | 20144 |   236K|   143   (0)| 00:00:02 |

|*  3 |    INDEX RANGE SCAN          | INDEX_CLUSTER01 | 20144 |       |    47   (0)| 00:00:01 |

------------------------------------------------------------------------------------------------

 

Predicate Information (identified by operation id):

---------------------------------------------------

   3 - access("X">=10000 AND "X"<=30000)

Statistics

----------------------------------------------------------

          1  recursive calls

          0  db block gets

        116  consistent gets

         60  physical reads

          0  redo size

        434  bytes sent via SQL*Net to client

        400  bytes received via SQL*Net from client

          2  SQL*Net roundtrips to/from client

          0  sorts (memory)

          0  sorts (disk)

          1  rows processed

 

doudou@TEST> select avg(y/(x+1)) from cluster02 where x between 10000 and 30000;

AVG(Y/(X+1))

------------

  -369.65884

Execution Plan

----------------------------------------------------------

Plan hash value: 2721670139

 

--------------------------------------------------------------------------------

| Id  | Operation          | Name      | Rows  | Bytes | Cost (%CPU)| Time     |

--------------------------------------------------------------------------------

|   0 | SELECT STATEMENT   |           |     1 |    12 |   542   (1)| 00:00:07 |

|   1 |  SORT AGGREGATE    |           |     1 |    12 |            |          |

|*  2 |   TABLE ACCESS FULL| CLUSTER02 | 20130 |   235K|   542   (1)| 00:00:07 |

--------------------------------------------------------------------------------

问题12个表的数据、索引都是一样的。为什么cluster01走索引,而cluster02全表扫描呢?】

Predicate Information (identified by operation id):

---------------------------------------------------

   2 - filter("X"<=30000 AND "X">=10000)

Statistics

----------------------------------------------------------

          1  recursive calls

          0  db block gets

       2417  consistent gets

       2411  physical reads

          0  redo size

        434  bytes sent via SQL*Net to client

        400  bytes received via SQL*Net from client

          2  SQL*Net roundtrips to/from client

          0  sorts (memory)

          0  sorts (disk)

1    rows processed     

 

doudou@TEST> SELECT /*+ INDEX(CLUSTER02 INDEX_CLUSTER02)*/AVG(Y/(X+1)) FROM CLUSTER02 WHERE X BETWEEN 10000 AND 30000;

 

AVG(Y/(X+1))

------------

  -369.65884

Execution Plan

----------------------------------------------------------

Plan hash value: 924486639

 

------------------------------------------------------------------------------------------------

| Id  | Operation                    | Name            | Rows  | Bytes | Cost (%CPU)| Time     |

------------------------------------------------------------------------------------------------

|   0 | SELECT STATEMENT             |                 |     1 |    12 | 20171   (1)| 00:04:03 |

|   1 |  SORT AGGREGATE              |                 |     1 |    12 |            |          |

|   2 |   TABLE ACCESS BY INDEX ROWID| CLUSTER02       | 20130 |   235K| 20171   (1)| 00:04:03 |

|*  3 |    INDEX RANGE SCAN          | INDEX_CLUSTER02 | 20130 |       |    47   (0)| 00:00:01 |

------------------------------------------------------------------------------------------------

问题2强制cluster02走索引,但是最后cost消耗还是很大】

Predicate Information (identified by operation id):

---------------------------------------------------

   3 - access("X">=10000 AND "X"<=30000)

Statistics

----------------------------------------------------------

          1  recursive calls

          0  db block gets

      20040  consistent gets

          0  physical reads

          0  redo size

        434  bytes sent via SQL*Net to client

        400  bytes received via SQL*Net from client

          2  SQL*Net roundtrips to/from client

          0  sorts (memory)

          0  sorts (disk)

1    rows processed 

 

doudou@TEST> select

  2      idx.index_name,

  3      tab.table_name,

  4      tab.num_rows,

  5      tab.blocks,

  6      idx.clustering_factor

  7  from

  8      user_indexes idx inner join user_tables tab

  9  on idx.table_name = tab.table_name

 10  order by table_name;

INDEX_NAME                     TABLE_NAME                       NUM_ROWS     BLOCKS CLUSTERING_FACTOR

------------------------------ ------------------------------ ---------- ---------- -----------------

INDEX_CLUSTER01                CLUSTER01                         1006949       2459              4726

INDEX_CLUSTER02                CLUSTER02                         1006232       2459            999628

【cluster01与cluster02的clustering factor不同,发现了问题1的答案;

问题1的答案:物理分布的不同导致了索引的选择

问题2的答案:强制走索引不是适合所有的操作,有时也会造成更大的cost消耗

(clustering factor 高,相邻索引值指向更多不同的块,本来一个块可以返回的信息却需要数据库去读更多的块,而这里又强制使用rowid全表扫描所以造成了更多的cost)】  

 

3、总结:

low的clustering factor减少了对相同块的重复读,从而减少cost的消耗,

high的clustering factor增加了对相同块的重复读,从而增加cost的消耗。

后语:建立表按照一定的顺序是有必要的,这样可以减低clustering factor(或者可以说索引建在有序的列上性能会好一些),从而优化sql

 

4、附表:脚本

1、查看clustering_factoer

select

    idx.index_name,

    tab.table_name,

    tab.num_rows,

    tab.blocks,

    idx.clustering_factor

from

    user_indexes idx inner join user_tables tab

on idx.table_name = tab.table_name

order by table_name;

 

2、收集信息

 dbms_stats 与 analyze

 

dbms_stats注意:1、不可以收集集群信息,但可以收集单独表来代替收集整个集群。

              2、收集优化器统计优先考虑

analyze注意:1、use the VALIDATE or LIST CHAINED ROWS clauses、collect information on free list blocks

              2、收集优化器统计不优先考虑

总结:Oracle推荐收集优化器统计使用dbms_stats

 

Note:

Do not use the COMPUTE and ESTIMATE clauses of ANALYZE statement to collect optimizer statistics. These clauses are supported solely for backward compatibility and may be removed in a future release. The DBMS_STATS package collects a broader, more accurate set of statistics, and gathers statistics more efficiently.

You may continue to use ANALYZE statement to for other purposes not related to optimizer statistics collection:

To use the VALIDATE or LIST CHAINED ROWS clauses

To collect information on free list blocks

 

Statistics Gathering Procedures in the DBMS_STATS Package

Procedure

Collects

GATHER_INDEX_STATS

Index statistics

GATHER_TABLE_STATS

Table, column, and index statistics

GATHER_SCHEMA_STATS

Statistics for all objects in a schema

GATHER_DICTIONARY_STATS

Statistics for all dictionary objects

GATHER_DATABASE_STATS

Statistics for all objects in a database

 

 

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值