clustering_factor是使用B树索引进行区间扫描的成本的重要因素。
该索引在表中越排列有序,则该值越小;越无序,则越大。
--------
影响群集因子的几种情况:
减少表争用:(多个自由列表),为了保证插入的数据速度快。
减少叶块争用(反转键索引,reverse key index)
减少叶块争用(反转键索引,reverse key index)
--------------------------------------------------
clustering_factor是使用B树索引进行区间扫描的成本的重要因素,它很有可能就成为成本计算误差的主要原因
DBA用来提升性能和避免争用的策略,由此分析索引的副作用以使得优化器抛弃本来应该使用的索引
clustering_factor能够描述数据在表中分布的随机程度,创建这样一个参数是一种较好的思路
特别关注传统的基于堆组织的heap-organized的表,我们将发现有问题的索引大部分都是基于时间或者序列的
1.基本示例:
我们创建一个表,其中主键分为2部分,一部分是日期,一部分是序列码
接下来同时运行5个进程以执行一个模拟终端用户行为的过程.
a high pctfree原因在于只有如此才可以在不产生大量数据的前提下创建一个符合要求的大表
在脚本中提供的过程也使用包dbms_lock来同步并发启动时间
当5个并发执行进程结束后,需要创建相关的索引,然后产生并检查相关的统计信息
脚本内容:
SQL> @E:/oracle培训/性能优化/群集因子/ch_05_clustering/base_line
start setenv
alter session set "_optimizer_skip_scan_enabled"=false;
drop table t1;
begin
begin execute immediate 'purge recyclebin';
exception when others then null;
end;
begin execute immediate 'begin dbms_stats.delete_system_stats; end;';
exception when others then null;
end;
begin execute immediate 'alter session set "_optimizer_cost_model"=io';
exception when others then null;
end;
end;
/
create table t1(
date_ord date constraint t1_dto_nn not null,
seq_ord number(6) constraint t1_sqo_nn not null,
small_vc varchar2(10)
)
pctfree 90
pctused 10
;
drop sequence t1_seq;
create sequence t1_seq;
create or replace procedure t1_load(i_tag varchar2) as
m_date date;
begin
dbms_output.put_line(
dbms_lock.request(
1,dbms_lock.s_mode, release_on_commit => true
)
);
commit;
for i in 0..25 loop
m_date := trunc(sysdate) + i;
for j in 1..200 loop
insert into t1 values(
m_date,
t1_seq.nextval,
i_tag || j
);
commit;
dbms_lock.sleep(0.01);
end loop;
end loop;
end;
/
begin
dbms_output.put_line(
dbms_lock.request(
1,dbms_lock.x_mode,
release_on_commit=>true
)
);
end;
/
分析:
SQL> exec t1_load('a')
SQL> exec t1_load('b')
SQL> exec t1_load('c')
SQL> exec t1_load('d')
SQL> exec t1_load('e')
5个并发session同时执行过程,第6个session 察看等待事件
SQL> select sid,event,p1,p2,p3 from v$session_wait where event not like 'rdbms%'
and event not like 'SQL%';
SID EVENT P1 P2 P3
---------- ------------------------------ ---------- ---------- ----------
134 wait for unread message on bro 906842184 906810940 0
adcast channel
137 jobq slave wait 0 0 0
143 Streams AQ: waiting for messag 8839 912619724 5
es in the queue
150 Streams AQ: waiting for time m 0 0 0
anagement or cleanup tasks
151 Streams AQ: qmn slave idle wai 0 0 0
SID EVENT P1 P2 P3
---------- ------------------------------ ---------- ---------- ----------
t
154 Streams AQ: qmn coordinator id 0 0 0
le wait
164 smon timer 300 0 0
170 pmon timer 300 0 0
已选择8行。
create index t1_i1 on t1(date_ord, seq_ord);
begin
dbms_stats.gather_table_stats(
user,
't1',
cascade => true,
estimate_percent => null,
method_opt => 'for all columns size 1'
);
end;
/
SQL>select
blocks,
num_rows
from
user_tables
where
table_name = 'T1';
BLOCKS NUM_ROWS
---------- ----------
596 20800
SQL>select
index_name, blevel, leaf_blocks, clustering_factor
from
user_indexes
where
table_name = 'T1';
INDEX_NAME BLEVEL LEAF_BLOCKS CLUSTERING_FACTOR
------------------------------ ---------- ----------- -----------------
T1_I1 1 101 1899
------分析:
CLUSTERING_FACTOR和表中块的数量相近,但比表中行的数目要小得多。这个索引看起来还是不错的
接下来,查询获得指定日期的所有数据
SQL> set autotrace on
SQL> select
2 count(small_vc)
3 from
4 t1
5 where
6 date_ord = trunc(sysdate) + 7
7 ;
COUNT(SMALL_VC)
---------------
800
执行计划
----------------------------------------------------------
Plan hash value: 269862921
--------------------------------------------------------------------------------
------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time
|
--------------------------------------------------------------------------------
------
| 0 | SELECT STATEMENT | | 1 | 13 | 79 (0)| 00:0
0:01 |
| 1 | SORT AGGREGATE | | 1 | 13 | |
|
| 2 | TABLE ACCESS BY INDEX ROWID| T1 | 800 | 10400 | 79 (0)| 00:0
0:01 |
|* 3 | INDEX RANGE SCAN | T1_I1 | 800 | | 5 (0)| 00:0
0:01 |
--------------------------------------------------------------------------------
cost=
blevel +
ceil(effective index selectivity * leaf_blocks) +
ceil(effective table selectivity * clustering_factor)
示例中,我们取出26天中某一天数据,选择率为1/26 ,代入公式为
cost=
1 +
1/26 * 101 +
1/26 * 1899
=77
2.减少表争用:(多个自由列表)
上面的示例中存在一个问题,在高并发系统中,他们可能会遇到大量争用麻烦,简单看下前面得到数据的几行:
SQL> set autotr off
SQL> select /* full(t1) */ rowid,date_ord,seq_ord,small_vc
from t1
where rownum<3000;
ROWID DATE_ORD SEQ_ORD SMALL_VC
------------------ -------------- ---------- ----------
AAANDyAABAABoesAAY 20-1月 -09 2993 e174
AAANDyAABAABoesAAZ 20-1月 -09 2994 c70
AAANDyAABAABoesAAa 20-1月 -09 2995 c71
AAANDyAABAABoesAAb 18-1月 -09 2996 a31
AAANDyAABAABoesAAc 19-1月 -09 2997 d121
AAANDyAABAABoesAAd 20-1月 -09 2998 e175
AAANDyAABAABoesAAe 18-1月 -09 2999 a32
扩展rowid是由以下几个部分组成的:
object_id 前六个字母AAANDy
相对的field_id 接下来的3个字母AAB
文件中的块 接下来的6个字母AABoes
块中的行 最后的3个字母AAA,AAB....
由此可以看出,所有的数据行都在同一个块中AABoes,
同时SMALL_VC附加了一个标记,用于标识数据是由哪一个进程插入的,所有的5个进程同时都对一个数据库进行操作,在一个非常忙得系统中,
可以看到很多数据块处于的缓冲忙等待,这是因为所有的插入操作都是针对同一个数据块进行的
如何解决这个问题? 可以创建一个包含多个自由列表(freelist)的表,在这里可以创建一个包含5个自由列表的表:storage (freelists 5)
该子句可用时,oracle保持5个自由块的链接列表,他们挂起在表的段标题块中(segment header block),当一个进程需要插入一行数据时,它使用进行ID来确定应该访问
哪一个列表以获取一个自由的块,这意味着5个并发的进程永远都不会相互冲突,他们总是使用5个不同的表块来插入数据行
SQL> @E:/oracle培训/性能优化/群集因子/ch_05_clustering/free_lists
alter session set "_optimizer_skip_scan_enabled"=false;
drop table t1;
begin
begin execute immediate 'purge recyclebin';
exception when others then null;
end;
begin execute immediate 'begin dbms_stats.delete_system_stats; end;';
exception when others then null;
end;
begin execute immediate 'alter session set "_optimizer_cost_model"=io';
exception when others then null;
end;
end;
/
create table t1(
date_ord date constraint t1_dto_nn not null,
seq_ord number(6) constraint t1_sqo_nn not null,
small_vc varchar2(10)
)
pctfree 90
pctused 10
storage (freelists 5)
;
drop sequence t1_seq;
create sequence t1_seq;
create or replace procedure t1_load(i_tag varchar2) as
m_date date;
begin
dbms_output.put_line(
dbms_lock.request(
1,dbms_lock.s_mode, release_on_commit => true
)
);
commit;
for i in 0..25 loop
m_date := trunc(sysdate) + i;
for j in 1..200 loop
insert into t1 values(
m_date,
t1_seq.nextval,
i_tag || j
);
commit;
dbms_lock.sleep(0.01);
end loop;
end loop;
end;
/
begin
dbms_output.put_line(
dbms_lock.request(
1,dbms_lock.x_mode,
release_on_commit=>true
)
);
end;
/
分析:
SQL> exec t1_load('a')
SQL> exec t1_load('b')
SQL> exec t1_load('c')
SQL> exec t1_load('d')
SQL> exec t1_load('e')
5个并发session同时执行过程,第6个session 察看等待事件
SQL> select sid,event,p1,p2,p3 from v$session_wait where event not like 'rdbms%'
and event not like 'SQL%';
SID EVENT P1 P2 P3
---------- ------------------------------ ---------- ---------- ----------
132 PL/SQL lock timer 1 0 0
134 wait for unread message on bro 906842184 906810940 0
adcast channel
137 jobq slave wait 0 0 0
142 PL/SQL lock timer 1 0 0
143 Streams AQ: waiting for messag 8839 912619724 5
es in the queue
144 PL/SQL lock timer 1 0 0
147 PL/SQL lock timer 1 0 0
SID EVENT P1 P2 P3
---------- ------------------------------ ---------- ---------- ----------
150 Streams AQ: waiting for time m 0 0 0
anagement or cleanup tasks
151 Streams AQ: qmn slave idle wai 0 0 0
t
154 Streams AQ: qmn coordinator id 0 0 0
le wait
157 PL/SQL lock timer 1 0 0
164 smon timer 300 0 0
SID EVENT P1 P2 P3
---------- ------------------------------ ---------- ---------- ----------
170 pmon timer 300 0 0
create index t1_i1 on t1(date_ord, seq_ord);
begin
dbms_stats.gather_table_stats(
user,
't1',
cascade => true,
estimate_percent => null,
method_opt => 'for all columns size 1'
);
end;
/
SQL>select
blocks,
num_rows
from
user_tables
where
table_name = 'T1';
BLOCKS NUM_ROWS
---------- ----------
749 26000
SQL>select
index_name, blevel, leaf_blocks, clustering_factor
from
user_indexes
where
table_name = 'T1';
INDEX_NAME BLEVEL LEAF_BLOCKS CLUSTERING_FACTOR
------------------------------ ---------- ----------- -----------------
T1_I1 1 133 1121
SQL>set autotrace traceonly explain
SQL>select
count(small_vc)
from
t1
where
date_ord = trunc(sysdate) + 7
;
执行计划
----------------------------------------------------------
Plan hash value: 269862921
----------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost |
----------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 13 | 51 |
| 1 | SORT AGGREGATE | | 1 | 13 | |
| 2 | TABLE ACCESS BY INDEX ROWID| T1 | 1000 | 13000 | 51 |
|* 3 | INDEX RANGE SCAN | T1_I1 | 1000 | | 7 |
----------------------------------------------------------------------
2.减少叶块争用(反转键索引,reverse key index)
在对clustering_factor的值进行修改之前,我们先来看其他两种方法,他们的效果和修正clustering_factor的值是一样的,
第一种方法就是反转键索引,这是在基于序列的索引出现之前,oracle8将其作为一种能够减少争用的机制而引入的(尤其是在RAC系统中)
所谓反转键索引,其操作方式为在将输出的值插入到索引结构之前,首先将每个列的字节顺序反转
这样做的的作用是能够将顺序值打乱为随机散布的索引项
eg:
SQL> select * from t1
where rownum<3;
DATE_ORD SEQ_ORD SMALL_VC
-------------- ---------- ----------
16-1月 -09 1 a1
16-1月 -09 2 a2
SQL> select dump(DATE_ORD,16) date_dump,
2 dump(SEQ_ORD,16) sep_dump
3 from t1
4 where DATE_ORD=to_date('16-1月 -09')
5 and SEQ_ORD=2;
DATE_DUMP
------------------------------------------------------------------------------
SEP_DUMP
------------------------------------------------------------------------------
Typ=12 Len=7: 78,6d,1,10,1,1,1
Typ=2 Len=2: c1,3
将这个值反转过来后得到
select dump(reverse(DATE_ORD,16)) date_dump,
dump(reverse(SEQ_ORD,16)) sep_dump
from t1
where DATE_ORD=to_date('16-1月 -09')
and SEQ_ORD=2;
SQL> select dump(19) from dual;
DUMP(19)
-------------------
Typ=2 Len=2: 193,20
SQL> select dump(reverse(19)) from dual;
DUMP(REVERSE(19))
-------------------
Typ=2 Len=2: 20,193
这对
clustering_factor和执行计划将产生什么影响?
回到基本测试用到的表(freelists 采用1),索引重建为一个反转键索引
eg:
@E:/oracle培训/性能优化/群集因子/ch_05_clustering/reversed_ind
start setenv
alter session set "_optimizer_skip_scan_enabled"=false;
drop table t1;
begin
begin execute immediate 'purge recyclebin';
exception when others then null;
end;
begin execute immediate 'begin dbms_stats.delete_system_stats; end;';
exception when others then null;
end;
begin execute immediate 'alter session set "_optimizer_cost_model"=io';
exception when others then null;
end;
end;
/
create table t1(
date_ord date constraint t1_dto_nn not null,
seq_ord number(6) constraint t1_sqo_nn not null,
small_vc varchar2(10)
)
pctfree 90
pctused 10
;
drop sequence t1_seq;
create sequence t1_seq;
create or replace procedure t1_load(i_tag varchar2) as
m_date date;
begin
dbms_output.put_line(
dbms_lock.request(
1,dbms_lock.s_mode, release_on_commit => true
)
);
commit;
for i in 0..25 loop
m_date := trunc(sysdate) + i;
for j in 1..200 loop
insert into t1 values(
m_date,
t1_seq.nextval,
i_tag || j
);
commit;
dbms_lock.sleep(0.01);
end loop;
end loop;
end;
/
begin
dbms_output.put_line(
dbms_lock.request(
1,dbms_lock.x_mode,
release_on_commit=>true
)
);
end;
/
分析:
SQL> exec t1_load('a')
SQL> exec t1_load('b')
SQL> exec t1_load('c')
create index t1_i1 on t1(date_ord, seq_ord);
begin
dbms_stats.gather_table_stats(
user,
't1',
cascade => true,
estimate_percent => null,
method_opt => 'for all columns size 1'
);
end;
/
select
blocks,
num_rows
from
user_tables
where
table_name = 'T1';
select
index_name, blevel, leaf_blocks, clustering_factor
from
user_indexes
where
table_name = 'T1'
;
set autotrace traceonly explain
select
count(small_vc)
from
t1
where
date_ord = trunc(sysdate) + 7
;
set autotrace off
SQL> alter index t1_i1 rebuild reverse;
begin
dbms_stats.gather_table_stats(
user,
't1',
cascade => true,
estimate_percent => null,
method_opt => 'for all columns size 1'
);
end;
/
SQL> select
2 blocks,
3 num_rows
4 from
5 user_tables
6 where
7 table_name = 'T1';
BLOCKS NUM_ROWS
---------- ----------
443 15600
SQL>
SQL> select
2 index_name, blevel, leaf_blocks, clustering_factor
3 from
4 user_indexes
5 where
6 table_name = 'T1'
7 ;
INDEX_NAME BLEVEL LEAF_BLOCKS CLUSTERING_FACTOR
------------------------------ ---------- ----------- -----------------
T1_I1 1 51 15588
SQL> set autotrace traceonly explain
SQL>
SQL> select
2 count(small_vc)
3 from
4 t1
5 where
6 date_ord = trunc(sysdate) + 7
7 ;
执行计划
----------------------------------------------------------
Plan hash value: 3724264953
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 13 | 101 (2)| 00:00:02 |
| 1 | SORT AGGREGATE | | 1 | 13 | | |
|* 2 | TABLE ACCESS FULL| T1 | 600 | 7800 | 101 (2)| 00:00:02 |
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("DATE_ORD"=TRUNC(SYSDATE@!)+7)
将索引反转的目的在于当表项是顺序值得时候,将索引项打乱顺序重新散布-----但是,这样做的后果就是临近的索引项和
对应的表项是散布的;换句话说,clustering_factor得值变得非常大.这样一来,clustering_factor就和示例中表的行数比较近了,所以执行计划将从索引区间扫描切换到表扫描,数据分布是没有变化的,但是oracle的感知发生了变化,这是因为clustering_factor的机制并不知道反转建索引的影响
3.减少表的争用ASSM
还有一个新的功能会很明显地破坏索引的效率,它的目的在于通过增强数据分布的随机性来减少争用
本想解决性能问题而提出的功能,但是它的引入又带来了新的性能问题
测试:
在表空间中创建相关数据,表空间使用的是自动段空间管理ASSM的表空间
oracle针对段空间管理而引入这一新功能的目的在于避免插入操作时出现的对表块的争用问题,尤其是在RAC环境中
ASSM有两个特性:
1,结构化:ASSM表空间中的每个段都在每个盘区的开始部分使用几个块(在盘区中,每64个块对应着一到两个)来维持该盘区中所有其他块的映射,
并且能够在大体上指示每个块中尚余多少自由空间可用(精度大体相当于1/4块)
2,出现在运行时:当一个进程需要插入一行数据时,它根据进程ID来选择一个空间映射块,然后依据进程ID从空间映射中选择相关数据块
ASSM的影响在于并发的进程倾向于选择不同的块来插入数据行,并在没有DBA介入情况下使得争用程度最小化(这意味着clustering_factor参数可能发生较大的变化)
eg:
---创建表空间test_8k_assm
SQL> create tablespace test_8k_assm
2 blocksize 8k
3 datafile 'D:/oracle/product/10.2.0/oradata/orcl/test_8k_assm.dbf'
4 size 50m reuse
5 extent management local
6 uniform size 1m
7 segment space management auto
8 ;
---环境设置
set pause off
set serveroutput on size 1000000 format wrapped
rem exec dbms_java.set_output(1000000)
set doc off
doc
Sections of documentation end with a line starting with #
#
set linesize 120
set trimspool on
set pagesize 24
set arraysize 25
set long 20000
set autotrace off
clear breaks
ttitle off
btitle off
column owner format a15
column segment_name format a20
column table_name format a20
column index_name format a20
column object_name format a20
column partition_name format a20
column subpartition_name format a20
column column_name format a20
column constraint_name format a20
column low_value format a24
column high_value format a24
column parent_id_plus_exp format 999
column id_plus_exp format 990
column plan_plus_exp format a90
column object_node_plus_exp format a10
column other_plus_exp format a90
column other_tag_plus_exp format a29
column os_username format a30
column terminal format a24
column userhost format a24
column client_id format a24
column statistic_name format a35
column namespace format a20
column attribute format a20
column time_now noprint new_value m_timestamp
select to_char(sysdate,'hh24miss') time_now
from dual;
set feedback off
commit;
set feedback on
set verify off
set timing off
alter session set optimizer_mode = all_rows;
alter session set "_optimizer_skip_scan_enabled"=false;
drop table t1;
begin
begin execute immediate 'purge recyclebin';
exception when others then null;
end;
begin execute immediate 'begin dbms_stats.delete_system_stats; end;';
exception when others then null;
end;
begin execute immediate 'alter session set "_optimizer_cost_model"=io';
exception when others then null;
end;
end;
/
---创建表,序列,过程
create table t1(
date_ord date constraint t1_dto_nn not null,
seq_ord number(6) constraint t1_sqo_nn not null,
small_vc varchar2(10)
)
pctfree 90
pctused 10
tablespace test_8k_assm
;
drop sequence t1_seq;
create sequence t1_seq;
create or replace procedure t1_load(i_tag varchar2) as
m_date date;
begin
dbms_output.put_line(
dbms_lock.request(
1,dbms_lock.s_mode, release_on_commit => true
)
);
commit;
for i in 0..25 loop
m_date := trunc(sysdate) + i;
for j in 1..200 loop
insert into t1 values(
m_date,
t1_seq.nextval,
i_tag || j
);
commit;
dbms_lock.sleep(0.01);
end loop;
end loop;
end;
/
begin
dbms_output.put_line(
dbms_lock.request(
1,dbms_lock.x_mode,
release_on_commit=>true
)
);
end;
/
分析:
SQL> exec t1_load('a')
SQL> exec t1_load('b')
SQL> exec t1_load('c')
SQL> exec t1_load('d')
SQL> exec t1_load('e')
5个并发session同时执行过程,第6个session 察看等待事件
SQL> select sid,event,p1,p2,p3 from v$session_wait where event not like 'rdbms%
and event not like 'SQL%';
---察看有多少块在并发进程之间存在冲突
select ct, count(*)
from
(
select block, count(*) ct
from
(
select
distinct dbms_rowid.rowid_block_number(rowid) block,
substr(small_vc,1,1)
from t1
)
group by block
)
group by ct
;
CT COUNT(*)
---------- ----------
1 403
2 140
5 23
4 61
3 98
----在对过程的每次调用中都包括了一个标记,标记的值被复制到small_vc中,这里利用块数目和标记值来找出有多少个块被所有5个进程插入数据行(5个进程分别为A-E这5个)
有多少个块被4个进程插入数据行,以此类推.......
由此可见,ASSM测试,仍显示了有许多块在并发进程之间存在明显冲突,其中有140个块被2个数据加载进程所公用,采用ASSM容易导致插入操作的随机性和资源的争用
(后面可以跟freelists 设置为5做比较)
ASSM引入的数据分布的随机性使得我们很难获得较好地避免争用的措施,但是争用的程度会较小而分布在不同的时间段内-----这5个进程都是用块SOLO,但是这并不一定意味着
他们是在同一个时间段内使用的
---创建索引并分析表
create index t1_i1 on t1(date_ord, seq_ord);
begin
dbms_stats.gather_table_stats(
user,
't1',
cascade => true,
estimate_percent => null,
method_opt => 'for all columns size 1'
);
end;
/
SQL> select
2 blocks,
3 num_rows
4 from
5 user_tables
6 where
7 table_name = 'T1';
BLOCKS NUM_ROWS
---------- ----------
754 26000
已选择 1 行。
SQL>
SQL> select
2 index_name, blevel, leaf_blocks, clustering_factor
3 from
4 user_indexes
5 where
6 table_name = 'T1';
INDEX_NAME BLEVEL LEAF_BLOCKS CLUSTERING_FACTOR
-------------------- ---------- ----------- -----------------
T1_I1 1 86 9405
set autotrace traceonly explain
SQL> select
2 count(small_vc)
3 from
4 t1
5 where
6 date_ord = trunc(sysdate) + 7
7 ;
执行计划
----------------------------------------------------------
Plan hash value: 3724264953
-----------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost |
-----------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 13 | 74 |
| 1 | SORT AGGREGATE | | 1 | 13 | |
|* 2 | TABLE ACCESS FULL| T1 | 1000 | 13000 | 74 |
-----------------------------------------------------------
由此可见,在没有对数据插入代码,数据定义和终端用户的行为进行任何修改的前提下,我们只是在基础层次上引入了oracle的一个特定的功能就将执行计划从索引
访问路径修改为表扫描(当然这里运行结果可能存在着差异),这是ASSM的一个特性,插入数据的散布会受到执行插入操作进行的ID的影响