dbms_stats.gather_table_stats与analyze table 的区别

最新推荐文章于 2022-01-08 15:10:08 发布

tswisdom

最新推荐文章于 2022-01-08 15:10:08 发布

阅读量4k

点赞数

分类专栏： Oracle Database

Oracle Database 专栏收录该内容

82 篇文章

订阅专栏

本文探讨了Oracle中DBMS_STATS与Analyze在收集表统计数据方面的区别与优劣。DBMS_STATS支持并行处理、自动分析及更精确的全局统计信息收集，尤其适用于分区表。同时，Analyze在验证结构和收集链式行方面仍有其独特价值。

dbms_stats.gather_table_stats与analyze table 的区别

Analyze Statement
The ANALYZE statement can be used to gather statistics for a specific table, index or cluster. The statistics can be computed exactly, or estimated based on a specific number of rows, or a percentage of rows:

ANALYZE TABLE employees COMPUTE STATISTICS;
ANALYZE INDEX employees_pk COMPUTE STATISTICS;

ANALYZE TABLE employees ESTIMATE STATISTICS SAMPLE 100 ROWS;
ANALYZE TABLE employees ESTIMATE STATISTICS SAMPLE 15 PERCENT;
DBMS_UTILITY
The DBMS_UTILITY package can be used to gather statistics for a whole schema or database. Both methods follow the same format as the analyze statement:

EXEC DBMS_UTILITY.analyze_schema('SCOTT','COMPUTE');
EXEC DBMS_UTILITY.analyze_schema('SCOTT','ESTIMATE', estimate_rows => 100);
EXEC DBMS_UTILITY.analyze_schema('SCOTT','ESTIMATE', estimate_percent => 15);

EXEC DBMS_UTILITY.analyze_database('COMPUTE');
EXEC DBMS_UTILITY.analyze_database('ESTIMATE', estimate_rows => 100);
EXEC DBMS_UTILITY.analyze_database('ESTIMATE', estimate_percent => 15);
DBMS_STATS
The DBMS_STATS package was introduced in Oracle 8i and is Oracles preferred method of gathering object statistics. Oracle list a number of benefits to using it including parallel execution, long term storage of statistics and transfer of statistics between servers. Once again, it follows a similar format to the other methods:

EXEC DBMS_STATS.gather_database_stats;
EXEC DBMS_STATS.gather_database_stats(estimate_percent => 15);

EXEC DBMS_STATS.gather_schema_stats('SCOTT');
EXEC DBMS_STATS.gather_schema_stats('SCOTT', estimate_percent => 15);

EXEC DBMS_STATS.gather_table_stats('SCOTT', 'EMPLOYEES');
EXEC DBMS_STATS.gather_table_stats('SCOTT', 'EMPLOYEES', estimate_percent => 15);

EXEC DBMS_STATS.gather_index_stats('SCOTT', 'EMPLOYEES_PK');
EXEC DBMS_STATS.gather_index_stats('SCOTT', 'EMPLOYEES_PK', estimate_percent => 15);
This package also gives you the ability to delete statistics:

EXEC DBMS_STATS.delete_database_stats;
EXEC DBMS_STATS.delete_schema_stats('SCOTT');
EXEC DBMS_STATS.delete_table_stats('SCOTT', 'EMPLOYEES');
EXEC DBMS_STATS.delete_index_stats('SCOTT', 'EMPLOYEES_PK');

--------------------------------------------------------------------------------------

自从Oracle8.1.5引入dbms_stats包，Experts们便推荐使用dbms_stats取代analyze。理由如下

dbms_stats可以并行分析
dbms_stats有自动分析的功能(alter table monitor )
analyze 分析统计信息的不准确some times

1,2好理解，且第2点实际上在VLDB中是最吸引人的；3以前比较模糊，看了metalink236935.1 解释，analyze在分析Partition表的时候，有时候会计算出不准确的Global statistics .

原因是，dbms_stats会实在的去分析表全局统计信息（当指定参数）；而analyze是将表分区（局部）的statistics 汇总计算成表全局statistics ,可能导致误差。

如果想分析整个用户或数据库，还可以采用工具包，可以并行分析
Dbms_utility(8i以前的工具包)
Dbms_stats(8i以后提供的工具包)
如
dbms_stats.gather_schema_stats(User,estimate_percent=>100,cascade=> TRUE);
dbms_stats.gather_table_stats(User,TableName,degree => 4,cascade => true);

这是对命令与工具包的一些总结

1、对于分区表，建议使用DBMS_STATS，而不是使用Analyze语句。
a) 可以并行进行，对多个用户，多个Table
b) 可以得到整个分区表的数据和单个分区的数据。
c) 可以在不同级别上Compute Statistics：单个分区，子分区，全表，所有分区
d) 可以倒出统计信息
e) 可以用户自动收集统计信息

2、DBMS_STATS的缺点
a) 不能Validate Structure
b) 不能收集CHAINED ROWS, 不能收集CLUSTER TABLE的信息，这两个仍旧需要使用Analyze语句。
c) DBMS_STATS 默认不对索引进行Analyze，因为默认Cascade是False，需要手工指定为True

3、对于oracle 9里面的External Table，Analyze不能使用，只能使用DBMS_STATS来收集信息。

-----------------------------------------------------------------
10G的文档是这么说的：
Do not use the COMPUTE and ESTIMATE clauses of ANALYZE to collect optimizer statistics. These clauses are supported for backward compatibility. Instead, use the DBMS_STATS package, which lets you collect statistics in parallel, collect global statistics for partitioned objects, and fine tune your statistics collection in other ways. The cost-based optimizer, which depends upon statistics, will eventually use only statistics that have been collected by DBMS_STATS

analyze的功能已经明确：
Use the ANALYZE statement (rather than DBMS_STATS) for statistics collection not related to the cost-based optimizer :

To use the VALIDATE or LIST CHAINED ROWS clauses

To collect information on freelist blocks

在收集与CBO优化器不相关的统计信息的时候ANALYZE语句要优于DBMS_STATS包

-----------------------------------
EX:
begin
        for owner in (select username from dba_users where username not in ('SYS','SYSTEM'))
        loop
         dbms_output.disable;
         dbms_output.enable(1000000);
         dbms_output.put_line('Schema: '||owner.username);
         select sysdate into start_time from dual;
         dbms_output.put_line('Analyze start from : '||start_time);
         dbms_stats.gather_schema_stats(ownname => owner.username, estimate_percent => 20, block_sample=> true, cascade=>true);
         select sysdate into end_time from dual;
         dbms_output.put_line('Analyze complete at : '||end_time);
         dbms_output.put_line('---------------------------');
        end loop;

dbms_stats.gather_table_stats(ownname          =>,
                                tabname          =>,
                                partname         =>,
                                estimate_percent =>,
                                block_sample     =>,
                                method_opt       =>,
                                degree           =>,parallel degree(并行收集维度) 看CPU个数
                                granularity      =>,
                                cascade          =>,true is also gather columns and index’s statistics;
                                no_invalidate    =>);
---------------------------------------------------------------------------------------------------------------------------------------------------------
Oracle DBMS_STATS 用法例子
2008-01-16 13:56
GeneralNote: The drawback to collecting REDUNDANT column stats is that they have to be loaded into the dictionary cache, and they have to be considered when the query is optimised. This waste shared pool space and CPU.

On the other hand, if the statistics add value about the data - which does mean EVEN on unindexed columns - queries can run faster.

Most systems should have only a few column histograms in place, but there is no obvious requirement that they should only be on indexed columns.Source{ORACLE_HOME}/rdbms/admin/dbmsstat.sqlFirst Available8.1.5Character SetsFor DBMS_STATS to run properly may require the US numeric separators. If problems exist try the following:

alter session set NLS_NUMERIC_CHARACTERS='.,';
ConstantsConstantData TypeUsageAUTO_CASCADEBOOLEANWhether to collect statistics for indexes or notAUTO_DEGREENUMBERSelect the degree of parallelismAUTO_INVALIDATEBOOLEANDecide when to invalidate dependent cursorsAUTO_SAMPLE_SIZENUMBERIndicate that auto-sample size algorithms should be usedDEFAULT_DEGREENUMBERUsed to determine the system default degree of parallelism
Default ConstantsConstantData TypeValueDEFAULT_CASCADEBOOLEANNULLDEFAULT_DEGREE_VALUENUMBER32766DEFAULT_ESTIMATE_PERCENTNUMBER101DEFAULT_METHOD_OPTVARCHAR2(1)'Z'DEFAULT_NO_INVALIDATEBOOLEANNULLDEFAULT_GRANULARITYVARCHAR2(1)'Z'DEFAULT_PUBLISHBOOLEANTRUEDEFAULT_INCREMENTALBOOLEANFALSEDEFAULT_STALE_PERCENTNUMBER10DEFAULT_AUTOSTATS_TARGETVARCHAR2(1)'Z'
Method_opt ConstantsAccepts:

* FOR ALL [INDEXED | HIDDEN] COLUMNS [size_clause]
* FOR COLUMNS [size clause] column|attribute [size_clause] [,column|attribute [size_clause]...]

size_clause is defined as size_clause := SIZE {integer | REPEAT | AUTO | SKEWONLY}

- integer : Number of histogram buckets: Range [1,254].
- REPEAT : Collects histograms only on columns that already have
histograms.
- AUTO : Determines the columns to collect histograms based on
data distribution and the workload of the columns.
- SKEWONLY : Determines the columns to collect histograms based on
the data distribution of the columns.

The default is FOR ALL COLUMNS SIZE AUTO.The default value can be changed using the SET_PARAM Procedure.Degree ConstantsDegree of parallelism. The default for degree is NULL. The default value can be changed using the SET_PARAM Procedure. NULL means use the table default value specified by the DEGREE clause in the CREATE TABLE or ALTER TABLE statement. Use the constant DBMS_STATS.DEFAULT_DEGREE to specify the default value based on the initialization parameters. The AUTO_DEGREE value determines the degree of parallelism automatically. This is either 1 (serial execution) or DEFAULT_DEGREE (the system default value based on number of CPUs and initialization parameters) according to size of the object.
Granularity ConstantsGranularity of statistics to collect (only pertinent if the table is partitioned).
ConstantDescriptionALLGathers all (subpartition, partition, and global) statisticsAUTODetermines the granularity based on the partitioning type. This is the default valueDEFAULTGathers global and partition-level statistics. This option is obsolete, and while currently supported, it is included in the documentation for legacy reasons only. You should use the 'GLOBAL AND PARTITION' for this functionalityGLOBALGathers global statisticsGLOBAL AND PARTITIONgathers the global and partition level statistics. No subpartition