统计分析中不恰当method_opt使用可能造成执行计划问题

最新推荐文章于 2022-04-19 20:28:22 发布

comnp66042

最新推荐文章于 2022-04-19 20:28:22 发布

阅读量474

点赞数

对表进行dbms_stats.gather_table_stats统计分析时，参数method_opt主要控制两件事情：

1、决定哪些columns统计被收集
2、决定哪些columns直方图被收集以及直方图包含多少个buckets

以下是参数method_opt对表统计影响实验：

1、创建测试表

SQL>exec dbms_random.initialize(1);

SQL>create table t1
    as
    with generator as (
     select --+ materialize
       rownum pk
       from all_objects
       where rownum<=4000
       )
    select
       /*+ ordered use_nl(v2)*/
     rownum pk,
     round(dbms_random.value(1,2)) a,
     round(dbms_random.value(1,5)) b,
     round(dbms_random.value(1,10)) c,
     round(dbms_random.value(1,100)) d,
     round(dbms_random.value(1,1000000)) e
   from
     generator v1,
     generator v2
   where
     rownum<=1000000
     /

创建T1表的唯一索引
SQL>create unique index PK_T1 on T1(PK);

2、采用dbms_stats缺省统计分析或method_opt=>'for all columns size 1'

SQL>begin
      dbms_stats.gather_table_stats(
         wnname => user,
         tabname => 'T1',
         estimate_percent =>100,
         cascade => true);
     end;
   /

查看统计分析表T1信息：
SQL>select
                column_name,
                num_distinct,
                histogram, num_buckets,
         to_char(LAST_ANALYZED, 'yyyy-dd-mm hh24:mi:ss') LAST_ANALYZED
   from user_tab_col_statistics
      where table_name='T1';
COLUMN_NAME   NUM_DISTINCT HISTOGRAM    NUM_BUCKETS LAST_ANALYZED
------------ ------------ ------------ ----------- -------------------
PK            1000000      NONE           1       2009-24-08 11:15:18
A             2            NONE           1       2009-24-08 11:15:18
B             5            NONE           1       2009-24-08 11:15:18
C             10           NONE           1       2009-24-08 11:15:18
D             100          NONE           1       2009-24-08 11:15:18
E             632322       NONE           1       2009-24-08 11:15:18
表中的六个列都有进行了统计分析。

3、method_opt=>'for all indexed columns'

SQL>begin
      dbms_stats.delete_table_stats(user, 'T1');
    end;
    /
SQL>begin
      dbms_stats.gather_table_stats(
        wnname => user,
        tabname=>'T1',
        estimate_percent=>100,
      method_opt=> 'for all indexed columns',
      --method_opt=> 'for all indexed columns size 1',
      --method_opt=> 'for all indexed columns size auto',
      --method_opt=> 'for all indexed columns size 254',
        cascade => true);
    end;
    /

SQL>select
                column_name,
                num_distinct,
                histogram, num_buckets,
         to_char(LAST_ANALYZED, 'yyyy-dd-mm hh24:mi:ss') LAST_ANALYZED
   from user_tab_col_statistics
      where table_name='T1';

method_opt=> 'for all indexed columns',输出结果：

COLUMN_NAME   NUM_DISTINCT HISTOGRAM    NUM_BUCKETS LAST_ANALYZED
------------ ------------ ------------ ----------- -------------------
PK       1000000              HEIGHT BALANCED          75 2009-24-08 15:34:23

method_opt=> 'for all indexed columns size 1|auto',输出结果：

COLUMN_NAME   NUM_DISTINCT HISTOGRAM    NUM_BUCKETS LAST_ANALYZED
------------ ------------ ------------ ----------- -------------------
PK                                  1000000 NONE                      1 2009-24-08 15:37:40

method_opt=> 'for all indexed columns size 254',输出结果：

COLUMN_NAME   NUM_DISTINCT HISTOGRAM    NUM_BUCKETS LAST_ANALYZED
------------ ------------ ------------ ----------- -------------------
PK       1000000              HEIGHT BALANCED         254 2009-24-08 15:34:23

mthod_opt=>'for all indexed columns' 只对索引列进行统计分析。

结论：采用method_opt=>‘for all indexed columns'参数，因为只统计索引列的信息，造成其他列统计分析的缺失，会造成执行计划错误，具体实验如下：

实验1：
SQL>select /*+ gather_plan_statistics */
           count(*)
           from t1
           where a=1;
SQL>select * from table(dbms_xplan.display_cursor(null, null, 'allstats last'));

-------------------------------------------------------------------------------------
| Id | Operation          | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------
|   1 | SORT AGGREGATE    |      |      1 |      1 |      1 |00:00:00.19 |    3737 |
|* 2 |   TABLE ACCESS FULL| T1   |      1 | 10000 |    500K|00:00:00.01 |    3737 |
-------------------------------------------------------------------------------------

评估行数值=10000
实际行数值=500K
两者的差异=500K/10K=50
因为列a没有进行统计分析，所以对该列的评估值计算rows*1%=10000

实验2：
SQL>select /*+ gather_plan_statistics */
           count(*)
           from t1
           where a=1
           and b=3;
SQL>select * from table(dbms_xplan.display_cursor(null, null, 'allstats last'));
-------------------------------------------------------------------------------------
| Id | Operation          | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------
|   1 | SORT AGGREGATE    |      |      1 |      1 |      1 |00:00:00.19 |    3737 |
|* 2 |   TABLE ACCESS FULL| T1   |      1 |    100 |    124K|00:00:00.12 |    3737 |
-------------------------------------------------------------------------------------

评估行数值=100
实际行数值=124K
两者的差异=124K/100=1240
因为列a和B没有进行统计分析，所以对该列的评估值计算rows*1%*1%=100

实验3：
SQL>select /*+ gather_plan_statistics */
           count(*)
           from t1
           where a=1
           and b=3
           and d+e>50;
SQL>select * from table(dbms_xplan.display_cursor(null, null, 'allstats last'));
-------------------------------------------------------------------------------------
| Id | Operation          | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------
|   1 | SORT AGGREGATE    |      |      1 |      1 |      1 |00:00:00.25 |    3737 |
|* 2 |   TABLE ACCESS FULL| T1   |      1 |      5 |    124K|00:00:00.25 |    3737 |
-------------------------------------------------------------------------------------

评估行数值=5
实际行数值=124K
两者的差异=124K/5=25800

条件中包含未统计的列越多，执行计划偏离越大，所以我们在进行统计分析中要慎用method_opt=>'for all indexed columns'。

当我们采用method_opt=>'for all columns size 1'，执行计划中的评估值(E-ROWS)
和实际值(A-ROWS)差异将大大缩小。通常建议该差异值不要超过100。

备注:
以上信息来源于Greg Rahn写的《DBMS_STATS, METHOD_OPT and FOR ALL INDEXED COLUMNS》，网址：http://structureddata.org/2008/10/14/dbms_stats-method_opt-and-for-all-indexed-columns/

来自 “ ITPUB博客 ” ，链接：http://blog.itpub.net/354732/viewspace-613088/，如需转载，请注明出处，否则将追究法律责任。

转载于:http://blog.itpub.net/354732/viewspace-613088/

comnp66042

关注

0
点赞
踩
0

收藏

觉得还不错? 一键收藏
0
评论
统计分析中不恰当method_opt使用可能造成执行计划问题

对表进行dbms_stats.gather_table_stats统计分析时，参数method_opt主要控制两件事情：1、决定哪些columns统计被收集2、决定哪些columns直方图被收集以及...
复制链接

扫一扫