转：关于10g中的表分析方面的建议-CSDN博客

关于10g中的表分析方面的建议
很多客户的数据库从9i升到10g 后都出现了性能问题，有些甚至比9i时下降超过50%。后来经过分析绝大多数都是和大量SQL使用了错误的执行计划有关。为什么10g 会出现这么严重的执行计划偏差呢？在9i的时代里，我们都自己编写分析脚本来对表和索引进行分析，在使用采样比例上，大家一般都比较保守，基本上都在20%以上。而10G提供了一个令人心动的技术就是自动分析，10g 数据库由于取消了RBO优化模式，缺省的优化器模式变为了ALL_ROWS，因此对分析数据的依赖程度更高，因此Oracle自带了一个标准的分析脚本，并且定期进行分析。不过随着这种新技术的使用，问题就出现了，oracle 的自动分析脚本往往只做很少量的采样，导致大量的表的评估信息出现严重偏差。最终很多客户选择了停止自动采集作业，改为手工分析。

以前在和一些Oracle的工程师讨论这个问题的时候，大家都觉得Oracle的AUTO SAMPLE是很不错的，是Oracle官方建议的，应该不会有问题，只有少量的系统需要调整，自定义采集作业脚本。今天我在整理以前的资料的时候，发现了一份METALINK的文档，发现这里的观点和我一直坚持的观点很近似。现在拿出来和大家共享。

Recommendations for Gathering Optimizer Statistics on 10g
Doc ID:  605439.1  Type:  HOWTO
Modified Date :  28-JUL-2009  Status:  PUBLISHED

从标题上看，是针对10g统计数据的Recommendations ，我刷新到了最新的版本，是7月28号的，新鲜出炉。

Applies to:
Oracle Server - Enterprise Edition - Version: 10.1.0.2 to 10.2.0.4
Oracle Server - Standard Edition - Version: 10.1.0.2 to 10.2.0.4
Information in this document applies to any platform.
Oracle Server Enterprise Edition - Version: 10.1.0.2 to 10.2.0.4

Goal
This Document outline the recommended method to gather a standard set of optimizer statistics for use by the Cost Based Optimizer under Oracle 10g.

很明确这个是针对10g 数据库CBO采集数据的建议性文档

IMPORTANT: PLEASE NOTE:

These recommendations apply to the majority of databases.------针对所有主要数据库都适用这些建议
The recommendations aim to generate statistics with as much statistical accuracy as possible. To this  end 100% sample sizes are suggested since any reduction in sample size is always a concession to accuracy. It is acknowledged that such 100% samples are potentially time consuming and consideration needs to be made to fit the statistics gathering activities within the existing maintenance window.
      对于采样比例的建议是尽可能多，在有可能的情况下建议使用100%采样。这和以往官方文档建议AUTO SAMPLE有了本质的区别，这也是我们做日常维护中的常识。

Gathering new optimizer statistics should maintain or improve existing execution plans, but it is possible that some queries performance may degrade. Note that from 10gR1 previous copies of statistics are maintained by default for the last 30 days and can be restored in the case of problems
      新的采样可能提高大多数SQL的执行计划，不过也可能导致某些SQL执行计划变坏，10g提供了一个新的功能，可以随时恢复30天内的统计数据

Gathering new optimizer statistics may invalidate cursors in the shared pool so it is prudent to restrict all gathering operations execution to periods of low activity in the database, such as the scheduled maintenance windows.
      采集数据最好在系统负载较低的时候做，这也是我们的常识

For very large systems, the gathering of statistics can be a very time consuming and resource intensive activity.In this environment sample sizes need to be carefully controlled to ensure that gathering completes within acceptable timescale and resource constraints and within the maintenance window.In these environments, it is also recommended to utilise change based statistics gathering to avoid re-gathering information unnecessarily
      对于大型系统的采样开销很大，因此我们需要控制好采样比例，以便于在维护窗口内完成采样作业。对于大型系统我们需要对分析数据进行分析，不要每次都全部重新分析，而只分析变化的部分。

On 10g  it is recommended to:

Gather statistics using scheduled statistics gathering scripts. In most cases the default scripts provide an adequate level of sampling taking into account the following recommendations:

Use sample size that is large enough. On 10g support suggests an estimate sample size of 100% (if it is possible for this to fit within the maintenance window), even if that means that statistics are gathered on a reduced frequency. If 100% is not feasible, try using at least an estimate of 30%. Generally, the accuracy of the statistics overall outweighs the day to day changes in most applications. This setting is because the default AUTO_SAMPLE_SIZE uses a  very small estimate percentage which can result in poor estimates.
      这段话里再次强调了SAMPLE SIZE的问题，尽可能的100%，如果不可能，尽可能不低于30%，而在这里承认了AUTO_SAMPLE_SIZE使用了十分低的采样比例，可能导致问题。大家可能还记得在DBA日记我们出现过的一次使用AUTO_SAMPLE_SIZE导致第二天系统出现严重性能问题的故事吧

Ensure all objects (tables and indexes) have stats gathered. An easy way to achieve this is to use the CASCADE parameter.
Ensuring that any columns with skewed data distribution have histograms collected, and at sufficient resolution using the METHOD_OPT parameter. Support recommends a conservative and more plan-stable approach of "adding a histogram only if it is known to be needed" rather than collecting column statistics on all columns. This can be achieved manually or by using the SKEWONLY option to automatically add column statistics to columns that contain data with a non-uniform. distribution. Using the default column statistics setting of AUTO which means that DBMS_STATS will decide which columns to add histogram to where it believes that they may help to produce a better plan. If statistics are not completely up to date then the presence of Histograms can cause trouble when parsing values are out of range, or between values for "frequency" histograms. In these circumstances the optimizer has to make guesses which may be inaccurate and, on occasion, cause poor plans.

Note that in earlier versions the default setting for the METHOD_OPT parameter was "FOR ALL COLUMNS SIZE 1" which would collect only a high and a low value and effectively meant that there were no detailed column statistics. It is known that in some cases, the effect of a histogram is adverse to the generation of a better plan so users moving between versions may initially wish to set this parameter to its pre-upgrade release value, and later adjust to the post-upgrade release default value.
f partitions are in use, gather global statistics if possible due to time constraints. Global stats are very important but gathering is often avoided due to the sizes involved and length of time to required. If 100% samples are not possible then support would recommend going for a minimum of 1%. Gathering with small sample sizes (e.g. 0.001, 0.0001, 0.00001 etc. ) can be very effective but equally, a large proportion of the data will not be examined which could prove decisive to the optimizer's plan choices. Note that the available range for the ESTIMATE_PERCENT parameter is a very flexible [0.000001 -> 100] which can use very small sample sizes suitable for huge partitioned tables. Testing will reveal the most suitable settings for each system.
See:
Note 236935.1 Global statistics - An Explanation
Gather system statistics to reflect the CPU loading of the system and to improve the accuracy of the CBO's estimates by providing the CBO with CPU cost estimates in addition to the normal I/O cost estimates. See:
  Note 470316.1 Using Actual System Statistics (Collected CPU and IO information  Note 149560.1 Collect and Display System Statistics (CPU and IO) for CBO usage  Note 153761.1 Scaling the System to Improve CBO optimizer
Note that the defaults for statistics gathering on different versions of Oracle are not necessarily the same, for example:

ESTIMATE_PERCENT: defaults:
9i : 100%
10g : DBMS_STATS.AUTO_SAMPLE_SIZE (using very small estimate percentage)
11g : DBMS_STATS.AUTO_SAMPLE_SIZE (using larger estimate percentage - 100%)
METHOD_OPT: defaults:
9i : "FOR ALL COLUMNS SIZE 1" effectively no detailed column statistics.
10g and 11g : "FOR ALL COLUMNS SIZE AUTO" - This setting means that DBMS_STATS decides which columns to add histogram to where it believes that they may help to produce a better plan.
In 11g, using auto size for ESTIMATE_PERCENT defaults to 100% and therefore is as accurate as possible. Additionally, even though a 100% sample is collected, the gathering process is really fast since a new hashing algorithm is used to compute the statistics rather than sorting (in 9i and 10g the "slow" part was typically the sorting). In 10g the default ESTIMATE_PERCENT sample size was extremely small which often resulted in poor statistics and is therefore not recommended.

来自 “ ITPUB博客 ” ，链接：http://blog.itpub.net/665930/viewspace-694434/，如需转载，请注明出处，否则将追究法律责任。

转载于:http://blog.itpub.net/665930/viewspace-694434/