对表进行dbms_stats.gather_table_stats统计分析时,参数method_opt主要控制两件事情:
1、决定哪些columns统计被收集
2、决定哪些columns直方图被收集以及直方图包含多少个buckets
以下是参数method_opt对表统计影响实验:
1、创建测试表
SQL>exec dbms_random.initialize(1);
SQL>create table t1
as
with generator as (
select --+ materialize
rownum pk
from all_objects
where rownum<=4000
)
select
/*+ ordered use_nl(v2)*/
rownum pk,
round(dbms_random.value(1,2)) a,
round(dbms_random.value(1,5)) b,
round(dbms_random.value(1,10)) c,
round(dbms_random.value(1,100)) d,
round(dbms_random.value(1,1000000)) e
from
generator v1,
generator v2
where
rownum<=1000000
/
创建T1表的唯一索引
SQL>create unique index PK_T1 on T1(PK);
2、采用dbms_stats缺省统计分析或method_opt=>'for all columns size 1'
SQL>begin
dbms_stats.gather_table_stats(
wnname => user,
tabname => 'T1',
estimate_percent =>100,
cascade => true);
end;
/
查看统计分析表T1信息:
SQL>select
column_name,
num_distinct,
histogram, num_buckets,
to_char(LAST_ANALYZED, 'yyyy-dd-mm hh24:mi:ss') LAST_ANALYZED
from user_tab_col_statistics
where table_name='T1';
COLUMN_NAME NUM_DISTINCT HISTOGRAM NUM_BUCKETS LAST_ANALYZED
------------ ------------ ------------ ----------- -------------------
PK 1000000 NONE 1 2009-24-08 11:15:18
A 2 NONE 1 2009-24-08 11:15:18
B 5 NONE 1 2009-24-08 11:15:18
C 10 NONE 1 2009-24-08 11:15:18
D 100 NONE 1 2009-24-08 11:15:18
E 632322 NONE 1 2009-24-08 11:15:18
表中的六个列都有进行了统计分析。
3、method_opt=>'for all indexed columns'
SQL>begin
dbms_stats.delete_table_stats(user, 'T1');
end;
/
SQL>begin
dbms_stats.gather_table_stats(
wnname => user,
tabname=>'T1',
estimate_percent=>100,
method_opt=> 'for all indexed columns',
--method_opt=> 'for all indexed columns size 1',
--method_opt=> 'for all indexed columns size auto',
--method_opt=> 'for all indexed columns size 254',
cascade => true);
end;
/
SQL>select
column_name,
num_distinct,
histogram, num_buckets,
to_char(LAST_ANALYZED, 'yyyy-dd-mm hh24:mi:ss') LAST_ANALYZED
from user_tab_col_statistics
where table_name='T1';
method_opt=> 'for all indexed columns',输出结果:
COLUMN_NAME NUM_DISTINCT HISTOGRAM NUM_BUCKETS LAST_ANALYZED
------------ ------------ ------------ ----------- -------------------
PK 1000000 HEIGHT BALANCED 75 2009-24-08 15:34:23
method_opt=> 'for all indexed columns size 1|auto',输出结果:
COLUMN_NAME NUM_DISTINCT HISTOGRAM NUM_BUCKETS LAST_ANALYZED
------------ ------------ ------------ ----------- -------------------
PK 1000000 NONE 1 2009-24-08 15:37:40
method_opt=> 'for all indexed columns size 254',输出结果:
COLUMN_NAME NUM_DISTINCT HISTOGRAM NUM_BUCKETS LAST_ANALYZED
------------ ------------ ------------ ----------- -------------------
PK 1000000 HEIGHT BALANCED 254 2009-24-08 15:34:23
mthod_opt=>'for all indexed columns' 只对索引列进行统计分析。
结论:采用method_opt=>‘for all indexed columns'参数,因为只统计索引列的信息,造成其他列统计分析的缺失,会造成执行计划错误,具体实验如下:
实验1:
SQL>select /*+ gather_plan_statistics */
count(*)
from t1
where a=1;
SQL>select * from table(dbms_xplan.display_cursor(null, null, 'allstats last'));
-------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
-------------------------------------------------------------------------------------
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.19 | 3737 |
|* 2 | TABLE ACCESS FULL| T1 | 1 | 10000 | 500K|00:00:00.01 | 3737 |
-------------------------------------------------------------------------------------
评估行数值=10000
实际行数值=500K
两者的差异=500K/10K=50
因为列a没有进行统计分析,所以对该列的评估值计算rows*1%=10000
实验2:
SQL>select /*+ gather_plan_statistics */
count(*)
from t1
where a=1
and b=3;
SQL>select * from table(dbms_xplan.display_cursor(null, null, 'allstats last'));
-------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
-------------------------------------------------------------------------------------
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.19 | 3737 |
|* 2 | TABLE ACCESS FULL| T1 | 1 | 100 | 124K|00:00:00.12 | 3737 |
-------------------------------------------------------------------------------------
评估行数值=100
实际行数值=124K
两者的差异=124K/100=1240
因为列a和B没有进行统计分析,所以对该列的评估值计算rows*1%*1%=100
实验3:
SQL>select /*+ gather_plan_statistics */
count(*)
from t1
where a=1
and b=3
and d+e>50;
SQL>select * from table(dbms_xplan.display_cursor(null, null, 'allstats last'));
-------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
-------------------------------------------------------------------------------------
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.25 | 3737 |
|* 2 | TABLE ACCESS FULL| T1 | 1 | 5 | 124K|00:00:00.25 | 3737 |
-------------------------------------------------------------------------------------
评估行数值=5
实际行数值=124K
两者的差异=124K/5=25800
条件中包含未统计的列越多,执行计划偏离越大,所以我们在进行统计分析中要慎用method_opt=>'for all indexed columns'。
当我们采用method_opt=>'for all columns size 1',执行计划中的评估值(E-ROWS)
和实际值(A-ROWS)差异将大大缩小。通常建议该差异值不要超过100。
备注:
以上信息来源于Greg Rahn写的《DBMS_STATS, METHOD_OPT and FOR ALL INDEXED COLUMNS》,网址:http://structureddata.org/2008/10/14/dbms_stats-method_opt-and-for-all-indexed-columns/
1、决定哪些columns统计被收集
2、决定哪些columns直方图被收集以及直方图包含多少个buckets
以下是参数method_opt对表统计影响实验:
1、创建测试表
SQL>exec dbms_random.initialize(1);
SQL>create table t1
as
with generator as (
select --+ materialize
rownum pk
from all_objects
where rownum<=4000
)
select
/*+ ordered use_nl(v2)*/
rownum pk,
round(dbms_random.value(1,2)) a,
round(dbms_random.value(1,5)) b,
round(dbms_random.value(1,10)) c,
round(dbms_random.value(1,100)) d,
round(dbms_random.value(1,1000000)) e
from
generator v1,
generator v2
where
rownum<=1000000
/
创建T1表的唯一索引
SQL>create unique index PK_T1 on T1(PK);
2、采用dbms_stats缺省统计分析或method_opt=>'for all columns size 1'
SQL>begin
dbms_stats.gather_table_stats(
wnname => user,
tabname => 'T1',
estimate_percent =>100,
cascade => true);
end;
/
查看统计分析表T1信息:
SQL>select
column_name,
num_distinct,
histogram, num_buckets,
to_char(LAST_ANALYZED, 'yyyy-dd-mm hh24:mi:ss') LAST_ANALYZED
from user_tab_col_statistics
where table_name='T1';
COLUMN_NAME NUM_DISTINCT HISTOGRAM NUM_BUCKETS LAST_ANALYZED
------------ ------------ ------------ ----------- -------------------
PK 1000000 NONE 1 2009-24-08 11:15:18
A 2 NONE 1 2009-24-08 11:15:18
B 5 NONE 1 2009-24-08 11:15:18
C 10 NONE 1 2009-24-08 11:15:18
D 100 NONE 1 2009-24-08 11:15:18
E 632322 NONE 1 2009-24-08 11:15:18
表中的六个列都有进行了统计分析。
3、method_opt=>'for all indexed columns'
SQL>begin
dbms_stats.delete_table_stats(user, 'T1');
end;
/
SQL>begin
dbms_stats.gather_table_stats(
wnname => user,
tabname=>'T1',
estimate_percent=>100,
method_opt=> 'for all indexed columns',
--method_opt=> 'for all indexed columns size 1',
--method_opt=> 'for all indexed columns size auto',
--method_opt=> 'for all indexed columns size 254',
cascade => true);
end;
/
SQL>select
column_name,
num_distinct,
histogram, num_buckets,
to_char(LAST_ANALYZED, 'yyyy-dd-mm hh24:mi:ss') LAST_ANALYZED
from user_tab_col_statistics
where table_name='T1';
method_opt=> 'for all indexed columns',输出结果:
COLUMN_NAME NUM_DISTINCT HISTOGRAM NUM_BUCKETS LAST_ANALYZED
------------ ------------ ------------ ----------- -------------------
PK 1000000 HEIGHT BALANCED 75 2009-24-08 15:34:23
method_opt=> 'for all indexed columns size 1|auto',输出结果:
COLUMN_NAME NUM_DISTINCT HISTOGRAM NUM_BUCKETS LAST_ANALYZED
------------ ------------ ------------ ----------- -------------------
PK 1000000 NONE 1 2009-24-08 15:37:40
method_opt=> 'for all indexed columns size 254',输出结果:
COLUMN_NAME NUM_DISTINCT HISTOGRAM NUM_BUCKETS LAST_ANALYZED
------------ ------------ ------------ ----------- -------------------
PK 1000000 HEIGHT BALANCED 254 2009-24-08 15:34:23
mthod_opt=>'for all indexed columns' 只对索引列进行统计分析。
结论:采用method_opt=>‘for all indexed columns'参数,因为只统计索引列的信息,造成其他列统计分析的缺失,会造成执行计划错误,具体实验如下:
实验1:
SQL>select /*+ gather_plan_statistics */
count(*)
from t1
where a=1;
SQL>select * from table(dbms_xplan.display_cursor(null, null, 'allstats last'));
-------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
-------------------------------------------------------------------------------------
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.19 | 3737 |
|* 2 | TABLE ACCESS FULL| T1 | 1 | 10000 | 500K|00:00:00.01 | 3737 |
-------------------------------------------------------------------------------------
评估行数值=10000
实际行数值=500K
两者的差异=500K/10K=50
因为列a没有进行统计分析,所以对该列的评估值计算rows*1%=10000
实验2:
SQL>select /*+ gather_plan_statistics */
count(*)
from t1
where a=1
and b=3;
SQL>select * from table(dbms_xplan.display_cursor(null, null, 'allstats last'));
-------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
-------------------------------------------------------------------------------------
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.19 | 3737 |
|* 2 | TABLE ACCESS FULL| T1 | 1 | 100 | 124K|00:00:00.12 | 3737 |
-------------------------------------------------------------------------------------
评估行数值=100
实际行数值=124K
两者的差异=124K/100=1240
因为列a和B没有进行统计分析,所以对该列的评估值计算rows*1%*1%=100
实验3:
SQL>select /*+ gather_plan_statistics */
count(*)
from t1
where a=1
and b=3
and d+e>50;
SQL>select * from table(dbms_xplan.display_cursor(null, null, 'allstats last'));
-------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
-------------------------------------------------------------------------------------
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.25 | 3737 |
|* 2 | TABLE ACCESS FULL| T1 | 1 | 5 | 124K|00:00:00.25 | 3737 |
-------------------------------------------------------------------------------------
评估行数值=5
实际行数值=124K
两者的差异=124K/5=25800
条件中包含未统计的列越多,执行计划偏离越大,所以我们在进行统计分析中要慎用method_opt=>'for all indexed columns'。
当我们采用method_opt=>'for all columns size 1',执行计划中的评估值(E-ROWS)
和实际值(A-ROWS)差异将大大缩小。通常建议该差异值不要超过100。
备注:
以上信息来源于Greg Rahn写的《DBMS_STATS, METHOD_OPT and FOR ALL INDEXED COLUMNS》,网址:http://structureddata.org/2008/10/14/dbms_stats-method_opt-and-for-all-indexed-columns/
来自 “ ITPUB博客 ” ,链接:http://blog.itpub.net/354732/viewspace-613088/,如需转载,请注明出处,否则将追究法律责任。
转载于:http://blog.itpub.net/354732/viewspace-613088/