Gathering Optimizer Statistics

cqcj2023

This chapter explains why statistics are important for the cost-based optimizer and how to gather and use statistics.

Understanding Statistics

As a database administrator, you can generate statistics that quantify the data distribution and storage characteristics of tables, columns, indexes, and partitions. The cost-based optimization approach uses these statistics to calculate the selectivity of predicates and to estimate the cost of each execution plan.

Selectivity is the fraction of rows in a table that the SQL statement's predicate selects. The optimizer uses the selectivity of a predicate to estimate the cost of a particular access method and to determine the optimal join order and join method.
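
As a rough illustration of this definition, the observed selectivity of a predicate can be measured by dividing the number of matching rows by the total number of rows. The sketch below assumes the OE sample schema's ORDERS table and its ORDER_STATUS column, neither of which is named in this chapter; substitute a table and predicate of your own.

```sql
-- Assumption: OE.ORDERS and its ORDER_STATUS column come from the Oracle
-- sample schemas and are used here only for illustration.
-- NULLIF avoids a divide-by-zero error when the table is empty.
SELECT COUNT(CASE WHEN order_status = 1 THEN 1 END)
       / NULLIF(COUNT(*), 0) AS selectivity
FROM   oe.orders;
```

A result of 0.05 would mean that the predicate keeps 5 percent of the rows. The optimizer does not run a query like this; it estimates the fraction from the stored statistics, such as the number of distinct values in the column.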

The statistics are stored in the data dictionary and can be exported from one database and imported into another. For example, you might want to transfer your statistics to a test system to simulate your production environment.
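
The transfer described above might look like the following sketch. The schema OE, the statistics table name STATS_XFER, and the identifier PROD_SNAP are illustrative assumptions, not names taken from this chapter.

```sql
-- Assumptions: OE, STATS_XFER, and 'PROD_SNAP' are hypothetical names.

-- On the source (production) database: create a statistics table and
-- copy the schema's statistics from the data dictionary into it.
BEGIN
  DBMS_STATS.CREATE_STAT_TABLE(
    ownname => 'OE',
    stattab => 'STATS_XFER');

  DBMS_STATS.EXPORT_SCHEMA_STATS(
    ownname => 'OE',
    stattab => 'STATS_XFER',
    statid  => 'PROD_SNAP');
END;
/

-- Move OE.STATS_XFER to the test database (for example, with the export
-- and import utilities), then load it into the test data dictionary.
BEGIN
  DBMS_STATS.IMPORT_SCHEMA_STATS(
    ownname => 'OE',
    stattab => 'STATS_XFER',
    statid  => 'PROD_SNAP');
END;
/
```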

Note:

The statistics mentioned in this section are CBO statistics, not instance performance statistics visible through V$ views.

You should gather statistics periodically for objects whose statistics become stale over time because of changing data volumes or changes in column values. Gather new statistics after a schema object's data or structure is modified in ways that make the previous statistics inaccurate. For example, after loading a significant number of rows into a table, collect new statistics on the number of rows. After updating data in a table, you do not need to collect new statistics on the number of rows, but you might need new statistics on the average row length.

Use the DBMS_STATS package to generate statistics.

Statistics generated include the following:

Table statistics
- Number of rows
- Number of blocks
- Average row length
Column statistics
- Number of distinct values (NDV) in column
- Number of nulls in column
- Data distribution (histogram)
Index statistics
- Number of leaf blocks
- Levels
- Clustering factor
System statistics
- I/O performance and utilization
- CPU performance and utilization
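
Once gathered, the table, column, and index statistics listed above can be inspected through the standard data dictionary views. The queries below are a minimal sketch; the views are not named in this section, and the USER_ variants show only objects owned by the current user.

```sql
-- Table statistics: row count, block count, average row length
SELECT table_name, num_rows, blocks, avg_row_len, last_analyzed
FROM   user_tables;

-- Column statistics: distinct values, nulls, histogram bucket count
SELECT table_name, column_name, num_distinct, num_nulls, num_buckets
FROM   user_tab_columns;

-- Index statistics: leaf blocks, B-tree level, clustering factor
SELECT index_name, leaf_blocks, blevel, clustering_factor
FROM   user_indexes;
```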

Generating Statistics

Because the cost-based approach relies on statistics, you should generate statistics for all tables and clusters and all indexes accessed by your SQL statements before using the cost-based approach. If the size and data distribution of the tables change frequently, then regenerate these statistics regularly to ensure the statistics accurately represent the data in the tables.

Oracle generates statistics using the following techniques:

Estimation based on random data sampling
Exact computation
User-defined statistics collection methods

To perform an exact computation, Oracle requires enough space to perform a scan and sort of the table. If there is not enough space in memory, then temporary space might be required. For estimations, Oracle requires enough space to perform a scan and sort of only the rows in the requested sample of the table. For indexes, computation does not take up as much time or space.

Some statistics are computed exactly, such as the number of data blocks currently containing data in a table or the depth of an index from its root block to its leaf blocks.

Oracle Corporation recommends setting the ESTIMATE_PERCENT parameter of the DBMS_STATS gathering procedures to DBMS_STATS.AUTO_SAMPLE_SIZE to maximize performance gains while achieving necessary statistical accuracy. AUTO_SAMPLE_SIZE lets Oracle determine the best sample size for good statistics. For example, to collect table and column statistics for all tables in the OE schema with auto-sampling:

EXECUTE DBMS_STATS.GATHER_SCHEMA_STATS('OE',DBMS_STATS.AUTO_SAMPLE_SIZE);

To estimate statistics, Oracle selects a random sample of data. You can specify the sampling percentage and whether sampling should be based on rows or blocks. Oracle Corporation recommends using DBMS_STATS.AUTO_SAMPLE_SIZE for the sampling percentage. When in doubt, choose row sampling.

Row sampling reads rows without regard to their physical placement on disk. This provides the most random data for estimates, but it can result in reading more data than necessary. For example, in the worst case a row sample might select one row from each block, requiring a full scan of the table or index.
Block sampling reads a random sample of blocks and uses all of the rows in those blocks for estimates. This reduces the amount of I/O activity for a given sample size, but it can reduce the randomness of the sample if rows are not randomly distributed on disk. Block sampling is not available for index statistics.
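
The two sampling methods correspond to the block_sample parameter of the gathering procedures. The following sketch assumes a table OE.ORDERS and a 10 percent sample, both chosen only for illustration.

```sql
-- Assumptions: OE.ORDERS and the 10 percent sample size are illustrative.
BEGIN
  -- Row sampling (the default): rows are chosen without regard to
  -- their physical placement on disk.
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname          => 'OE',
    tabname          => 'ORDERS',
    estimate_percent => 10,
    block_sample     => FALSE);

  -- Block sampling: random blocks are chosen and every row in those
  -- blocks is used for the estimate.
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname          => 'OE',
    tabname          => 'ORDERS',
    estimate_percent => 10,
    block_sample     => TRUE);
END;
/
```

Both calls appear together only for comparison. The second call overwrites the statistics stored by the first, so in practice you would run one or the other.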

When you generate statistics for a table, column, or index, if the data dictionary already contains statistics for the object, then Oracle updates the existing statistics. Oracle also invalidates any currently parsed SQL statements that access the object.

The next time such a statement executes, the optimizer automatically chooses a new execution plan based on the new statistics. Distributed statements issued on remote databases that access the analyzed objects use the new statistics the next time Oracle parses them.

When you associate a statistics type with a column or domain index, Oracle calls the statistics collection method in that statistics type whenever you analyze the column or domain index.

Getting Statistics for Partitioned Schema Objects

Partitioned schema objects can contain multiple sets of statistics. They can have statistics that refer to any of the following:

The entire schema object as a whole (global statistics)
An individual partition
An individual subpartition of a composite partitioned object

Unless the query predicate narrows the query to a single partition, the optimizer uses the global statistics. Because most queries are not likely to be this restrictive, it is most important to have accurate global statistics. Intuitively, it can seem that generating global statistics from partition-level statistics is straightforward; however, this is true only for some of the statistics. For example, it is very difficult to figure out the number of distinct values for a column from the number of distinct values found in each partition, because of the possible overlap in values. Therefore, actually gathering global statistics with the DBMS_STATS package is highly recommended, rather than calculating them with the ANALYZE statement.
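
A minimal sketch of gathering global statistics directly, rather than deriving them from partition statistics, is shown below. It assumes a partitioned table SH.SALES from the Oracle sample schemas, which this chapter does not mention.

```sql
-- Assumption: SH.SALES is a partitioned table from the sample schemas.
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname          => 'SH',
    tabname          => 'SALES',
    estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE,
    granularity      => 'ALL');  -- global, partition, and subpartition statistics
END;
/
```

Setting granularity to 'GLOBAL' instead gathers only the global statistics, which can be enough when partition-level statistics are already current.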

Note:

Oracle Corporation strongly recommends that you use the DBMS_STATS package rather than ANALYZE to collect optimizer statistics. That package lets you collect statistics in parallel, collect global statistics for partitioned objects, and fine tune your statistics collection in other ways. Further, the cost-based optimizer will eventually use only statistics that have been collected by DBMS_STATS. See Oracle9i Supplied PL/SQL Packages and Types Reference for more information on this package.

However, you must use the ANALYZE statement rather than DBMS_STATS for statistics collection not related to the cost-based optimizer, such as:

To use the VALIDATE or LIST CHAINED ROWS clauses
To collect information on freelist blocks
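
The ANALYZE clauses mentioned above can be used as in this sketch. The table OE.ORDERS is an illustrative assumption, and CHAINED_ROWS is assumed to have been created beforehand with the utlchain.sql script shipped with the database.

```sql
-- Assumption: OE.ORDERS is a hypothetical target table.

-- Verify the integrity of the table's structure
ANALYZE TABLE oe.orders VALIDATE STRUCTURE;

-- Verify the table together with its indexes
ANALYZE TABLE oe.orders VALIDATE STRUCTURE CASCADE;

-- Record migrated and chained rows in the CHAINED_ROWS table
-- (assumed to exist already; utlchain.sql creates it)
ANALYZE TABLE oe.orders LIST CHAINED ROWS INTO chained_rows;
```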

Using the DBMS_STATS Package

The PL/SQL package DBMS_STATS lets you generate and manage statistics for cost-based optimization. You can use this package to gather, modify, view, export, import, and delete statistics. You can also use this package to identify or name statistics gathered.

The DBMS_STATS package can gather statistics on indexes, tables, columns, and partitions, as well as statistics on all schema objects in a schema or database. It does not gather cluster statistics--you can use DBMS_STATS to gather statistics on the individual tables instead of the whole cluster.

The statistics-gathering operations can run either serially or in parallel. Index statistics are not gathered in parallel.

For partitioned tables and indexes, DBMS_STATS can gather separate statistics for each partition, as well as global statistics for the entire table or index. Similarly, for composite partitioning, DBMS_STATS can gather separate statistics for subpartitions, partitions, and the entire table or index. Depending on the SQL statement being optimized, the optimizer can choose to use either the partition (or subpartition) statistics or the global statistics.

DBMS_STATS gathers only statistics needed for cost-based optimization; it does not gather other statistics. For example, the table statistics gathered by DBMS_STATS include the number of rows, number of blocks currently containing data, and average row length, but not the number of chained rows, average free space, or number of unused data blocks.

See Also:

Oracle9i Supplied PL/SQL Packages and Types Reference for more information about the DBMS_STATS package
Oracle9i Data Cartridge Developer's Guide for more information about user-defined statistics

Gathering Statistics with the DBMS_STATS Package

Table 3-1 lists the procedures in the DBMS_STATS package for gathering statistics:

Table 3-1 Statistics Gathering Procedures in the DBMS_STATS Package

Procedure                Collects
GATHER_INDEX_STATS       Index statistics
GATHER_TABLE_STATS       Table, column, and index statistics
GATHER_SCHEMA_STATS      Statistics for all objects in a schema
GATHER_DATABASE_STATS    Statistics for all objects in a database
GATHER_SYSTEM_STATS      CPU and I/O statistics for the system
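
As a brief sketch of the first two procedures, the block below gathers statistics for one table and its indexes, then for a single index on its own. The table OE.ORDERS and the index name ORDER_PK are hypothetical and serve only as placeholders.

```sql
-- Assumptions: OE.ORDERS and the index ORDER_PK are hypothetical names.
BEGIN
  -- Table and column statistics; cascade => TRUE also gathers
  -- statistics for the indexes on the table.
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname => 'OE',
    tabname => 'ORDERS',
    cascade => TRUE);

  -- Statistics for one index only
  DBMS_STATS.GATHER_INDEX_STATS(
    ownname => 'OE',
    indname => 'ORDER_PK');
END;
/
```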

See Also:

Oracle9i Supplied PL/SQL Packages and Types Reference for syntax and examples of all DBMS_STATS procedures

Gathering System Statistics

System statistics enable the optimizer to consider a system's I/O and CPU performance and utilization. For each plan candidate, the optimizer computes estimates for I/O and CPU costs. It is important to know the system characteristics to pick the most efficient plan with optimal proportion between I/O and CPU cost.

System I/O characteristics depend on many factors and do not stay constant all the time. Using system statistics management routines, database administrators can capture statistics in the interval of time when the system has the most common workload. For example, database applications can process OLTP transactions during the day and run OLAP reports at night. Administrators can gather statistics for both states and activate appropriate OLTP or OLAP statistics when needed. This enables the optimizer to generate relevant costs with respect to available system resource plans.

When Oracle generates system statistics, it analyzes system activity in a specified period of time. Unlike table, index, or column statistics, Oracle does not invalidate already parsed SQL statements when system statistics get updated. All new SQL statements are parsed using new statistics. Oracle Corporation highly recommends that you gather system statistics.

The DBMS_STATS.GATHER_SYSTEM_STATS routine collects system statistics in a user-defined timeframe. You can also set system statistics values explicitly using DBMS_STATS.SET_SYSTEM_STATS. Use DBMS_STATS.GET_SYSTEM_STATS to verify system statistics.
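
One way to verify a gathered value with GET_SYSTEM_STATS is sketched below. It reads the sreadtim parameter (average single-block read time) from the data dictionary; the variable names are illustrative, and the block assumes system statistics have already been gathered.

```sql
-- A minimal sketch; variable names are illustrative.
SET SERVEROUTPUT ON

DECLARE
  l_status VARCHAR2(30);  -- status of the gathering run
  l_start  DATE;          -- when gathering started
  l_stop   DATE;          -- when gathering stopped
  l_value  NUMBER;        -- value of the requested parameter
BEGIN
  DBMS_STATS.GET_SYSTEM_STATS(
    status => l_status,
    dstart => l_start,
    dstop  => l_stop,
    pname  => 'sreadtim',
    pvalue => l_value);

  DBMS_OUTPUT.PUT_LINE('status=' || l_status || ' sreadtim=' || l_value);
END;
/
```

To read values from a user statistics table rather than the dictionary, pass the stattab and statid parameters as well.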

Note:

You must have DBA privileges to update dictionary system statistics.

Example 3-1 shows database applications processing OLTP transactions during the day and running reports at night. First, system statistics must be collected. The values in this example are user-defined; in other words, you must determine an appropriate time interval and name for your environment.

Example 3-1 Generating System Statistics

Gather statistics during the day. Gathering ends after 720 minutes, and the statistics are stored in the mystats table:

BEGIN
  -- Collect for 720 minutes and store the result in MYSTATS
  -- under the identifier 'OLTP'
  DBMS_STATS.GATHER_SYSTEM_STATS(
    gathering_mode => 'interval',
    interval       => 720,
    stattab        => 'mystats',
    statid         => 'OLTP');
END;
/

Gather statistics during the night. Gathering ends after 720 minutes, and the statistics are stored in the mystats table:

BEGIN

Source: ITPUB blog, http://blog.itpub.net/9599/viewspace-472897/. If you repost this article, credit the source; otherwise legal action will be pursued.