高效的SQL( clustering factor减少COST)

最新推荐文章于 2024-04-19 17:16:47 发布

lovehewenyu

最新推荐文章于 2024-04-19 17:16:47 发布

阅读量676

点赞数

分类专栏：高效SQL

本文链接：https://blog.csdn.net/dbadoudou/article/details/8282363

版权

高效SQL 专栏收录该内容

11 篇文章 0 订阅

订阅专栏

高效的SQL( clustering factor减少COST)

1、创建样表cluster_factor(x有序列，y随意列);实验表cluster01(low)、cluster02(high)

doudou@TEST> create table cluster_factor (x int, y int);

Table created.

doudou@TEST> begin

2 for i in 1..1000000 loop

3 insert into cluster_factor values (i,to_char(dbms_random.random,'9999999999999999'));

4 end loop;

5 commit;

6 end;

7 /

PL/SQL procedure successfully completed.

cluster01表

doudou@TEST> create table cluster01 as select * from cluster_factor;

Table created.

cluster02表

doudou@TEST> create table cluster02 as select * from cluster_factor order by y;

Table created.

index_cluster01索引（索引列有序）

doudou@TEST> create index index_cluster01 on cluster01(x);

Index created.

index_cluster02索引（索引列无序）

doudou@TEST> create index index_cluster02 on cluster02(x);

Index created.

利用dbms_stats收集表的索引信息 (cascade是否收集索引信息选项)

doudou@TEST> begin

2 dbms_stats.gather_table_stats(user,'cluster01',cascade=>true);

3 dbms_stats.gather_table_stats(user,'cluster02',cascade=>true);

4 end;

5 /

PL/SQL procedure successfully completed.

或

doudou@TEST> exec dbms_stats.gather_table_stats('DOUDOU','cluster01',cascade=>true);

PL/SQL procedure successfully completed.

doudou@TEST> exec dbms_stats.gather_table_stats('DOUDOU','cluster02',cascade=>true);

PL/SQL procedure successfully completed.

2、开启执行计划并查询SQL

doudou@TEST> set autot on

doudou@TEST> select avg(y/(x+1)) from cluster01 where x between 10000 and 30000;

AVG(Y/(X+1))

------------

-369.65884

Execution Plan

----------------------------------------------------------

Plan hash value: 3265002277

------------------------------------------------------------------------------------------------

------------------------------------------------------------------------------------------------

| 0 | SELECT STATEMENT | | 1 | 12 | 143 (0)| 00:00:02 |

| 1 | SORT AGGREGATE | | 1 | 12 | | |

| 2 | TABLE ACCESS BY INDEX ROWID| CLUSTER01 | 20144 | 236K| 143 (0)| 00:00:02 |

|* 3 | INDEX RANGE SCAN | INDEX_CLUSTER01 | 20144 | | 47 (0)| 00:00:01 |

------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

3 - access("X">=10000 AND "X"<=30000)

Statistics

----------------------------------------------------------

1 recursive calls

0 db block gets

116 consistent gets

60 physical reads

0 redo size

434 bytes sent via SQL*Net to client

400 bytes received via SQL*Net from client

2 SQL*Net roundtrips to/from client

0 sorts (memory)

0 sorts (disk)

1 rows processed

doudou@TEST> select avg(y/(x+1)) from cluster02 where x between 10000 and 30000;

AVG(Y/(X+1))

------------

-369.65884

Execution Plan

----------------------------------------------------------

Plan hash value: 2721670139

--------------------------------------------------------------------------------

--------------------------------------------------------------------------------

| 0 | SELECT STATEMENT | | 1 | 12 | 542 (1)| 00:00:07 |

| 1 | SORT AGGREGATE | | 1 | 12 | | |

|* 2 | TABLE ACCESS FULL| CLUSTER02 | 20130 | 235K| 542 (1)| 00:00:07 |

--------------------------------------------------------------------------------

【问题1：2个表的数据、索引都是一样的。为什么cluster01走索引，而cluster02全表扫描呢？】

Predicate Information (identified by operation id):

---------------------------------------------------

2 - filter("X"<=30000 AND "X">=10000)

Statistics

----------------------------------------------------------

1 recursive calls

0 db block gets

2417 consistent gets

2411 physical reads

0 redo size

434 bytes sent via SQL*Net to client

400 bytes received via SQL*Net from client

2 SQL*Net roundtrips to/from client

0 sorts (memory)

0 sorts (disk)

1 rows processed

doudou@TEST> SELECT /*+ INDEX(CLUSTER02 INDEX_CLUSTER02)*/AVG(Y/(X+1)) FROM CLUSTER02 WHERE X BETWEEN 10000 AND 30000;

AVG(Y/(X+1))

------------

-369.65884

Execution Plan

----------------------------------------------------------

Plan hash value: 924486639

------------------------------------------------------------------------------------------------

------------------------------------------------------------------------------------------------

| 0 | SELECT STATEMENT | | 1 | 12 | 20171 (1)| 00:04:03 |

| 1 | SORT AGGREGATE | | 1 | 12 | | |

| 2 | TABLE ACCESS BY INDEX ROWID| CLUSTER02 | 20130 | 235K| 20171 (1)| 00:04:03 |

|* 3 | INDEX RANGE SCAN | INDEX_CLUSTER02 | 20130 | | 47 (0)| 00:00:01 |

------------------------------------------------------------------------------------------------

【问题2：强制cluster02走索引，但是最后cost消耗还是很大】

Predicate Information (identified by operation id):

---------------------------------------------------

3 - access("X">=10000 AND "X"<=30000)

Statistics

----------------------------------------------------------

1 recursive calls

0 db block gets

20040 consistent gets

0 physical reads

0 redo size

434 bytes sent via SQL*Net to client

400 bytes received via SQL*Net from client

2 SQL*Net roundtrips to/from client

0 sorts (memory)

0 sorts (disk)

1 rows processed

doudou@TEST> select

2 idx.index_name,

3 tab.table_name,

4 tab.num_rows,

5 tab.blocks,

6 idx.clustering_factor

7 from

8 user_indexes idx inner join user_tables tab

9 on idx.table_name = tab.table_name

10 order by table_name;

INDEX_NAME TABLE_NAME NUM_ROWS BLOCKS CLUSTERING_FACTOR

------------------------------ ------------------------------ ---------- ---------- -----------------

INDEX_CLUSTER01 CLUSTER01 1006949 2459 4726

INDEX_CLUSTER02 CLUSTER02 1006232 2459 999628

【cluster01与cluster02的clustering factor不同，发现了问题1的答案；

问题1的答案：物理分布的不同导致了索引的选择

问题2的答案：强制走索引不是适合所有的操作，有时也会造成更大的cost消耗

（clustering factor 高，相邻索引值指向更多不同的块，本来一个块可以返回的信息却需要数据库去读更多的块，而这里又强制使用rowid全表扫描所以造成了更多的cost）】

3、总结：

low的clustering factor减少了对相同块的重复读，从而减少cost的消耗，

high的clustering factor增加了对相同块的重复读，从而增加cost的消耗。

后语:建立表按照一定的顺序是有必要的,这样可以减低clustering factor(或者可以说索引建在有序的列上性能会好一些),从而优化sql

4、附表：脚本

1、查看clustering_factoer

select

idx.index_name,

tab.table_name,

tab.num_rows,

tab.blocks,

idx.clustering_factor

from

user_indexes idx inner join user_tables tab

on idx.table_name = tab.table_name

order by table_name;

2、收集信息

dbms_stats 与 analyze

dbms_stats注意：1、不可以收集集群信息，但可以收集单独表来代替收集整个集群。

2、收集优化器统计优先考虑

analyze注意：1、use the VALIDATE or LIST CHAINED ROWS clauses、collect information on free list blocks

2、收集优化器统计不优先考虑

总结：Oracle推荐收集优化器统计使用dbms_stats

Note:

Do not use the COMPUTE and ESTIMATE clauses of ANALYZE statement to collect optimizer statistics. These clauses are supported solely for backward compatibility and may be removed in a future release. The DBMS_STATS package collects a broader, more accurate set of statistics, and gathers statistics more efficiently.

You may continue to use ANALYZE statement to for other purposes not related to optimizer statistics collection:

To use the VALIDATE or LIST CHAINED ROWS clauses

To collect information on free list blocks

Statistics Gathering Procedures in the DBMS_STATS Package

Procedure	Collects
GATHER_INDEX_STATS	Index statistics
GATHER_TABLE_STATS	Table, column, and index statistics
GATHER_SCHEMA_STATS	Statistics for all objects in a schema
GATHER_DICTIONARY_STATS	Statistics for all dictionary objects
GATHER_DATABASE_STATS	Statistics for all objects in a database