聚簇因子:数据表中的数据都是无序的存在库中,当我们对数据进行检索的时候,查找起来很是耗费资源,于是我们就需要为表创建索引,索引的作用就是把表中的数据按照一定的顺序排列保存起来,于是就出现了一个问题,有的表中的数据和索引排列的顺序很是相近,而另一些表中的数据和索引排列的顺序相距甚远,聚簇因子的作用就是用来标记这个的,聚簇因子越小,相似度越高,聚簇因子越大,相似度越低。
数据和索引相似度:ORACLE 在存储数据时,并不是按照数据块的顺序挨个进行存入数据,因为前面存入的数据经常会有DML或者DDL操作,删除数据后,原先存有数据的数据块就变成了空块,ORACLE为了节省存储空间,当数据库再次有新数据进行插入的话,就会优先使用那些空块,只有当空块不够使用的时候,才会去高水位上开辟新块,这种情况也就会导致,一张表中的数据,并不是存储在相邻的数据块中,于是聚簇因子变的很大,当这种情况进行逻辑读取的时候,就会增加IO的次数, 影响读取的速度。
测试:
建表:
CREATE TABLE t_1 AS SELECT ROWNUM rn,a.* FROM all_objects a ORDER BY object_name DESC;
--创建t_1表关于rownum索引
CREATE INDEX ind_t_1 ON t_1(rn);
--创建表表t_2
CREATE TABLE t_2 AS SELECT * FROM ( SELECT ROWNUM rn,a.* FROM all_objects a ) ORDER BY rn ASC;
--创建t_2表关于rownum索引
CREATE INDEX ind_t_2 ON t_2(rn);
--分析两张表及其索引
EXEC DBMS_STATS.gather_table_stats(USER, 'T_1');
EXEC DBMS_STATS.gather_table_stats(USER, 'T_2');
EXEC DBMS_STATS.gather_index_stats(USER, 'IND_T_1');
EXEC DBMS_STATS.gather_index_stats(USER, 'IND_T_2');
执行查询操作
SQL> set autotrace traceonly;
SQL> SELECT * FROM t_1 WHERE rn BETWEEN 100 AND 120;
21 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 923245147
--------------------------------------------------------------------------------
-------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Tim
e |
--------------------------------------------------------------------------------
-------
| 0 | SELECT STATEMENT | | 22 | 2244 | 14 (0)| 00:
00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID| T_1 | 22 | 2244 | 14 (0)| 00:
00:01 |
|* 2 | INDEX RANGE SCAN | IND_T_1 | 22 | | 2 (0)| 00:
00:01 |
--------------------------------------------------------------------------------
-------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("RN">=100 AND "RN"<=120)
Statistics
----------------------------------------------------------
445 recursive calls
0 db block gets
85 consistent gets
0 physical reads
0 redo size
3426 bytes sent via SQL*Net to client
427 bytes received via SQL*Net from client
3 SQL*Net roundtrips to/from client
4 sorts (memory)
0 sorts (disk)
21 rows processed
表二查询:
SELECT * FROM t_2 WHERE rn BETWEEN 100 AND 120;
21 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 2193767187
--------------------------------------------------------------------------------
-------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Tim
e |
--------------------------------------------------------------------------------
-------
| 0 | SELECT STATEMENT | | 22 | 2244 | 3 (0)| 00:
00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID| T_2 | 22 | 2244 | 3 (0)| 00:
00:01 |
|* 2 | INDEX RANGE SCAN | IND_T_2 | 22 | | 2 (0)| 00:
00:01 |
--------------------------------------------------------------------------------
-------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("RN">=100 AND "RN"<=120)
Statistics
----------------------------------------------------------
445 recursive calls
0 db block gets
69 consistent gets
0 physical reads
0 redo size
3426 bytes sent via SQL*Net to client
427 bytes received via SQL*Net from client
3 SQL*Net roundtrips to/from client
4 sorts (memory)
0 sorts (disk)
21 rows processed
观察结果:
通过执行统计信息观察,T_1表的查询一致读是85,而T_2表的一致读只有69,周样的表结构,同样的数据,为何有这么大的差距呢。
分析:
SQL> select b.table_name, a.index_name, b.num_rows, b.blocks, a.clustering_factor from user_indexes a, user_tables b where b.table_name in('T_1','T_2') and a.table_name = b.table_name;
TABLE_NAME INDEX_NAME NUM_ROWS BLOCKS CLUSTERING_FACTOR
------------------------------ -------------------- ---------- ---------- -----------------
T_1 IND_T_1 71761 1069 38034
T_2 IND_T_2 71763 1069 1069
通过查询聚簇因子发现,两个表的聚簇因子差别很大,基于RN的索引在RN是顺序排列的表中,CLUSTERING_FACTOR的值相差很大。
在表中数据有时候属于无序状态,这个时候的CLUSTERING_FACTOR比较接近NUM_ROWS,说明如果扫描整个表,每次都要根据INDEX来读取相应行的ROWID,这个时候的IO操作很多,自然检索时间会比较长。如果数据有序的话,CLUSTERING_FACTOR比较接近BLOCKS,说明相邻的数据在一个块中,减少了IO操作数量,自然检索时间会大大降低。
解决办法:
第一种办法:对表进行重构:alter table t_1 move;
SQL> alter table t_1 move;
(
alter table move;
可以用来降低HWM,也可以用来调整已使用的block的存储参数,
比如PCTFREE。
等于在当前tablespace中重组table。
注意这个操作后,需要rebuild index
)
Table altered.
再次查询聚簇因子:
SQL> select b.table_name, a.index_name, b.num_rows, b.blocks, a.clustering_factor from user_indexes a, user_tables b where b.table_name in('T_1','T_2') and a.table_name = b.table_name;
TABLE_NAME INDEX_NAME NUM_ROWS BLOCKS CLUSTERING_FACTOR
------------------------------ -------------------- ---------- ---------- -----------------
T_1 IND_T_1 71761 1069 38034
T_2 IND_T_2 71763 1069 1069
发现还是一样。
第二种方法: 按照索引顺序重建表:
SQL> create table t_1_bk as select * from t_1 order by rn;
Table created.
创建索引:
create index ind_t1bk on t_1_bk(rn);
Index created.
统计索引
EXEC DBMS_STATS.gather_index_stats(USER, 'IND_T1BK');
再次查看聚簇因子:
select b.table_name, a.index_name, b.num_rows, b.blocks, a.clustering_factor from user_indexes a, user_tables b where b.table_name in('T_1_BK','T_2') and a.table_name = b.table_name;
TABLE_NAME INDEX_NAME NUM_ROWS BLOCKS CLUSTERING_FACTOR
------------------------------ -------------------- ---------- ---------- -----------------
T_1_BK IND_T1BK 1069
T_2 IND_T_2 71763 1069 1069
可以看到聚簇因子有所降低,但没了NUM_ROWS与BLOCKS。
在这里没有这两个字段的值是因为user_indexes 和user_tables的信息来自于对表的统计信息。
SQL> exec dbms_stats.gather_table_stats(USER,'T_1_BK');
PL/SQL procedure successfully completed.
SQL> select b.table_name, a.index_name, b.num_rows, b.blocks, a.clustering_factor from user_indexes a, user_tables b where b.table_name in('T_1_BK','T_2') and a.table_name = b.table_name;
TABLE_NAME INDEX_NAME NUM_ROWS BLOCKS CLUSTERING_FACTOR
------------------------------ -------------------- ---------- ---------- -----------------
T_1_BK IND_T1BK 71761 1069 1069
T_2 IND_T_2 71763 1069 1069
以上这种办法只针对于小表, 但如果这个表的数据量非常大就不能够使用这种办法了。
第三种方法: 针对大表的:分区
在线将表改为分区表:
EXEC DBMS_REDEFINITION.CAN_REDEF_TABLE(USER, 'T_1',dbms_redefinition.cons_index);
EXEC DBMS_REDEFINITION.CAN_REDEF_TABLE(USER, 'T_1',dbms_redefinition.cons_index);
PL/SQL procedure successfully completed.
获得创建表的语句:select dbms_metadata.get_ddl('TABLE','T_1','SYS') from dual;
获得创建INDEX的语句:select dbms_metadata.get_ddl('INDEX','IND_T_1','SYS') from dual;
创建表:
create table t_1new(RN NUMBER,
OWNER VARCHAR2(30) NOT NULL ENABLE,
OBJECT_NAME VARCHAR2(30) NOT NULL ENABLE,
SUBOBJECT_NAME VARCHAR2(30),
OBJECT_ID NUMBER NOT NULL ENABLE,
DATA_OBJECT_ID NUMBER,
OBJECT_TYPE VARCHAR2(19),
CREATED DATE NOT NULL ENABLE,
LAST_DDL_TIME DATE NOT NULL ENABLE,
TIMESTAMP VARCHAR2(19),
STATUS VARCHAR2(7),
TEMPORARY VARCHAR2(1),
GENERATED VARCHAR2(1),
SECONDARY VARCHAR2(1),
NAMESPACE NUMBER NOT NULL ENABLE,
EDITION_NAME VARCHAR2(30)) partition by range(rn)
(partition p1 values less than(20000),
partition p2 values less than(40000),
partition p3 values less than(60000),
partition p4 values less than(maxvalue));
创建索引:create index ind_t_1new on t_1new(rn);
EXEC DBMS_REDEFINITION.START_REDEF_TABLE(USER, 'T_1', 'T_1NEW', 'RN RN, OWNER OWNER, OBJECT_NAME OBJECT_NAME, SUBOBJECT_NAME SUBOBJECT_NAME, OBJECT_ID OBJECT_ID, DATA_OBJECT_ID DATA_OBJECT_ID, OBJECT_TYPE OBJECT_TYPE, CREATED CREATED, LAST_DDL_TIME LAST_DDL_TIME, TIMESTAMP TIMESTAMP, STATUS STATUS, TEMPORARY TEMPORARY, GENERATED GENERATED, SECONDARY SECONDARY, NAMESPACE NAMESPACE, EDITION_NAME EDITION_NAME', DBMS_REDEFINITION.CONS_INDEX);
exec dbms_redefinition.finish_redef_table('HR','T_1','T_1NEW');
PL/SQL procedure successfully completed.
验证查询:
SQL> select count(*) from t_1 partition(p1);
COUNT(*)
----------
19999
SQL> select count(*) from t_1 partition(p2);
COUNT(*)
----------
20000
分区成功! 我们这时再来看看聚簇因子:
select b.table_name, a.index_name, b.num_rows, b.blocks, a.clustering_factor from user_indexes a, user_tables b where b.table_name in('T_1') and a.table_name = b.table_name;
SQL> select b.table_name, a.index_name, b.num_rows, b.blocks, a.clustering_factor from user_indexes a, user_tables b where b.table_name in('T_1') and a.table_name = b.table_name;
TABLE_NAME INDEX_NAME NUM_ROWS BLOCKS CLUSTERING_FACTOR
------------------------------ -------------------- ---------- ---------- -----------------
T_1 IND_T_1NEW 38104
! 无解! 咋个不行。 应该是理解问题了, 这个只能解决聚簇因子所带来的性能问题。
来自 “ ITPUB博客 ” ,链接:http://blog.itpub.net/196700/viewspace-751488/,如需转载,请注明出处,否则将追究法律责任。
转载于:http://blog.itpub.net/196700/viewspace-751488/