How to Gather Optimizer Statistics on 10g (Doc ID 605439.1)

In this Document

 Goal
 Solution
 References

APPLIES TO:

Oracle Database - Personal Edition - Version 10.1.0.2 to 10.2.0.5 [Release 10.1 to 10.2]
Oracle Database - Standard Edition - Version 10.1.0.2 to 10.2.0.5 [Release 10.1 to 10.2]
Oracle Database - Enterprise Edition - Version 10.1.0.2 to 10.2.0.5 [Release 10.1 to 10.2]
Information in this document applies to any platform.
***Checked for relevance on 29-Jul-2013***

GOAL

This Document outline the recommended method to gather a standard set of optimizer statistics for use by the Cost Based Optimizer under Oracle 10g.  For other versions see:

Document 1226841.1 How To: Gather Statistics for the Cost Based Optimizer

 

SOLUTION

Quick Recreate Recommendation

To achieve a quick delete and recreate of the statistics on an individual table and it's indexes (adding column statistics for any skewed columns) and following the recommendations in this article use:

exec dbms_stats.delete_table_stats(ownname=>'user_name',-
  tabname=>'table_name',cascade_indexes=>true);

exec dbms_stats.gather_table_stats(ownname=>'user_name',-
   tabname=>'table_name',-
   estimate_percent => 100,-
   cascade=>true,-
   method_opt=>'for all columns size skewonly');


For explanation of these recommendations, see below. For more usage examples see the end of this article. 
Note that, from 10gR2 statistics can be restored using:

Document 452011.1 * Restoring table statistics in 10G onwards


IMPORTANT: PLEASE NOTE:

  • These recommendations apply to the majority of databases.
  • The recommendations aim to generate statistics with as much statistical accuracy as possible. To this  end 100% sample sizes are suggested since any reduction in sample size is always a concession to accuracy. It is acknowledged that such 100% samples are potentially time consuming and consideration needs to be made to fit the statistics gathering activities within the existing maintenance window.
  • Gathering new optimizer statistics should maintain or improve existing execution plans, but it is possible that some queries performance may degrade. Note that from 10gR1 previous copies of statistics are maintained by default for the last 30 days and can be restored in the case of problems. See: 
    Document 452011.1 Restoring table statistics in 10G onwards
  • Gathering new optimizer statistics may invalidate cursors in the shared pool so it is prudent to restrict all gathering operations execution to periods of low activity in the database, such as the scheduled maintenance windows.
  • Apart from object statistics, we also recommend that statistics should be gathered on dictionary objects. See:

    Document 457926.1 How to Gather Statistics on SYS Objects and 'Fixed' Objects?
  • For very large systems, the gathering of statistics can be a very time consuming and resource intensive activity. In this environment sample sizes need to be carefully controlled to ensure that gathering completes within acceptable timescale and resource constraints and within the maintenance window. For guidance on this topic See:
Document 44961.1 Statistics Gathering: Frequency and Strategy Guidelines

In these environments, it is also recommended to utilise change based statistics gathering to avoid re-gathering information unnecessarily.Please see:
Document 237901.1 Gathering Schema or Database Statistics Automatically - Examples
Document 377152.1  Best Practices for automatic statistics collection on Oracle 10g

Gathering Object statistics 

The Cost Based Optimizer (CBO) uses statistics to determine the execution plan for a particular query.  Potentially, with reduced sample sizes, sampling could produce different statistics due to chance groupings of data that may be the result of differing loading methods etc. 

On 10g  it is recommended to:

  • Gather statistics using scheduled statistics gathering scripts. In most cases the default scripts provide an adequate level of sampling taking into account the following recommendations:
  • Use sample size that is large enough. On 10g support suggests an estimate sample size of 100% (if it is possible for this to fit within the maintenance window), even if that means that statistics are gathered on a reduced frequency. If 100% is not feasible, try using at least an estimate of 30%. Generally, the accuracy of the statistics overall outweighs the day to day changes in most applications. This setting is because the default AUTO_SAMPLE_SIZE uses a  very small estimate percentage which can result in poor estimates.
  • Ensure all objects (tables and indexes) have stats gathered. An easy way to achieve this is to use the CASCADE parameter.
  • Ensuring that any columns with skewed data distribution have histograms collected, and at sufficient resolution using the METHOD_OPT parameter. Support recommends a conservative and more plan-stable approach of "adding a histogram only if it is known to be needed" rather than collecting column statistics on all columns. This can be achieved manually or by using the SKEWONLY option to automatically add column statistics to columns that contain data with a non-uniform distribution. Using the default column statistics setting of AUTO which means that DBMS_STATS will decide which columns to add histogram to where it believes that they may help to produce a better plan. If statistics are not completely up to date then the presence of Histograms can cause trouble when parsing values are out of range, or between values for "frequency" histograms. In these circumstances the optimizer has to make guesses which may be inaccurate and, on occasion, cause poor plans.  

    Note that in earlier versions the default setting for the METHOD_OPT parameter was "FOR ALL COLUMNS SIZE 1" which would collect only a high and a low value and effectively meant that there were no detailed column statistics. It is known that in some cases, the effect of a histogram is adverse to the generation of a better plan so users moving between versions may initially wish to set this parameter to its pre-upgrade release value, and later adjust to the post-upgrade release default value. See:
    Document 465787.1 Managing CBO Stats during an upgrade to 10g or 11g

    As ever, testing different values with the application will yield the best results.

    Note also that it only makes sense to collect column statistics (histograms) if your application has the facility to use them. Specifically, if you use bind variables in the application but do not peek the values (for example with _OPTIM_PEEK_USER_BINDS = FALSE ), then the optimizer will not have any value information to lookup column data from and will be unable to utilize column statistic information to improve cardinality estimates.

  • If partitions are in use, gather global statistics if possible due to time constraints. Global stats are very important but gathering is often avoided due to the sizes involved and length of time to required. If 100% samples are not possible then support would recommend going for a minimum of 1%. Gathering with small sample sizes (e.g. 0.001, 0.0001, 0.00001 etc. ) can be very effective but equally, a large proportion of the data will not be examined which could prove decisive to the optimizer's plan choices. Note that the available range for the ESTIMATE_PERCENT parameter is a very flexible [0.000001 -> 100] which can use very small sample sizes suitable for huge partitioned tables. Testing will reveal the most suitable settings for each system.
    See: 
    Document 236935.1 Global statistics - An Explanation
  • Gather system statistics to reflect the CPU loading of the system and to improve the accuracy of the CBO's estimates by providing the CBO with CPU cost estimates in addition to the normal I/O cost estimates. See:
    Document 470316.1 Using Actual System Statistics (Collected CPU and IO information
    Document 149560.1 Collect and Display System Statistics (CPU and IO) for CBO usage
    Document 153761.1 Scaling the System to Improve CBO optimizer

Note that the defaults for statistics gathering on different versions of Oracle are not necessarily the same, for example:

  • ESTIMATE_PERCENT: defaults:
    •  9i : 100%
    • 10g : DBMS_STATS.AUTO_SAMPLE_SIZE (using very small estimate percentage)
    • 11g : DBMS_STATS.AUTO_SAMPLE_SIZE (using larger estimate percentage - 100%)
  • METHOD_OPT: defaults:
    • 9i : "FOR ALL COLUMNS SIZE 1" effectively no detailed column statistics.
    • 10g and 11g : "FOR ALL COLUMNS SIZE AUTO" - This setting means that DBMS_STATS decides which columns to add histogram to where it believes that they may help to produce a better plan

Note that on 11g, although using auto size for ESTIMATE_PERCENT tends to default to 100% ,because this is an auto sample, the engine may still decide to use a different sample size for tables and columns.  This means that Column statistics could still be gathered with a small sample size and create a histogram that is missing key values. When ESTIMATE_PERCENT is set to a specific numeric value, that value will be used for both the table and columns.

Additionally, even though a 100% sample is collected, the gathering process is really fast since a new hashing algorithm is used to compute the statistics rather than sorting (in 9i and 10g the "slow" part was typically the sorting). In 10g, support experience has shown that the default ESTIMATE_PERCENT sample size was extremely small which often resulted in poor statistics and is therefore not recommended. <<<< 看来100%收集也不是全表扫描,而是使用hashing algorithm

SAMPLE STATISTIC GATHERING COMMANDS

Gathering statistics an individual table

exec dbms_stats.gather_table_stats(  -
       ownname => '  Schema_name ', -
       tabname => '  Table_name  ', -
       estimate_percent => 100,  -
       cascade => TRUE,  -
       method_opt => 'FOR ALL COLUMNS SIZE 1' );

N.B. replace '  Schema_name ' and ' Table_name ' with the name of the schema 
and table  to gather statistics for respectively.

Gathering statistics for all objects in a schema
exec dbms_stats.gather_schema_stats( -
 ownname => '  Schema_name ', -
cascade => TRUE, -
method_opt => 'FOR ALL COLUMNS SIZE 1' );
N.B. replace '  Schema_name ' with the name of the desired schema.    
Gathering statistics for all objects in the database:
exec dbms_stats.gather_database_stats( -
cascade => TRUE, -
method_opt => 'FOR ALL COLUMNS SIZE 1' );

NOTE:

For cases where column data is known to be skewed and column statistics are known to be beneficial, Replace:

method_opt => 'FOR ALL COLUMNS SIZE 1'

with

method_opt => 'FOR ALL COLUMNS SIZE AUTO'

or with    

method_opt => 'FOR ALL COLUMNS SIZE SKEWONLY'

to automatically add column statistics to columns that contain data with a non-uniform distribution

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值