1. 自动收集Statistic
- Oracle 的Automatic Statistics Gathering 是通过Scheduler 来实现收集和维护的。
- 自动收集统计信息的Job GATHER_STATS_JOB 收集数据库所有对象的2种统计信息:
(1)Missing statistics(统计信息缺失)
(2)Stale statistics(统计信息陈旧)
默认情况下,job 会在每天晚上10到早上6点和周末全天开启。该过程首先检测统计信息缺失和陈旧的对象。然后确定优先级,再开始进行统计信息。
Scheduler Job的 stop_on_window_close 属性控制GATHER_STATS_JOB 是否继续。该属性默认值为True. 如果该值设置为False,那么GATHER_STATS_JOB 会中断, 而没有收集完的对象将在下次启动时继续收集。 - Gather_stats_job调用dbms_stats.gather_database_stats_job_proc过程来收集statistics的信息。
该过程收集对象statistics的条件如下:
(1)对象的统计信息之前没有收集过。
(2)当对象有超过10%的rows 被修改,此时对象的统计信息也称为stale statistics。 - 查询统计信息的schedule job
SQL> select job_name, program_name,enabled,stop_on_window_close from dba_scheduler_jobs where job_name = ‘gather_stats_job’;
JOB_NAME | PROGRAM_NAME | ENABL | STOP_ON_WINDOW_CLOSE |
---|---|---|---|
GATHER_STATS_JOB | GATHER_STATS_PROG | TRUE | TRUE |
- 停用自动收集统计信息Job
DBMS_SCHEDULER.DISABLE(‘GATHER_STATS_JOB’);
2. 统计信息应用的包及过程
DBMS_STATS.GATHER_TABLE_STATS
3. 统计信息收集的内容
- Table statistics
number of rows
number of blocks
average row length
SQL> SELECT OWNER, TABLE_NAME, NUM_ROWS, BLOCKS, EMPTY_BLOCKS, AVG_SPACE, CHAIN_CNT, AVG_ROW_LEN FROM USER_TABLES;
- Column statistics
number of distinct values (NDV) in column
number of nulls in column
data distribution (histogram)
SQL> SELECT OWNER, TABLE_NAME, COLUMN_NAME, NUM_DISTINCT, LOW_VALUE, HIGH_VALUE, DENSITY, NUM_NULLS, NUM_BUCKETS, HISTOGRAM FROM ALL_TAB_COLUMNS;
- Index statistics
number of leaf blocks
levels
clustering factor
SQL> SELECT OWNER, INDEX_NAME, TABLE_NAME, TABLE_OWNER, BLEVEL, LEAF_BLOCKS, DISTINCT_KEYS, AVG_LEAF_BLOCKS_PER_KEY, AVG_DATA_BLOCKS_PER_KEY, CLUSTERING_FACTOR FROM ALL_INDEXES;
- System statistics
I/O performance and utilization
CPU performance and utilization
4. 统计信息DBMS_STATS.GATHER_TABLE_STATS的语法
SQL> DBMS_STATS.GATHER_TABLE_STATS (ownname VARCHAR2, tabname VARCHAR2, partname VARCHAR2,estimate_percent NUMBER,block_sample BOOLEAN,method_opt VARCHAR2,degree NUMBER, granularity VARCHAR2, cascade BOOLEAN,stattab VARCHAR2,statid VARCHAR2,statown VARCHAR2,no_invalidateBOOLEAN, force BOOLEAN);
参数说明:
- ownname: 要分析表的拥有者
- tabname: 要分析的表名.
- partname: 分区的名字,只对分区表或分区索引有用.
- estimate_percent: 采样行的百分比,取值范围[0.000001,100],null为全部分析,不采样. 常量:DBMS_STATS.AUTO_SAMPLE_SIZE是默认值,由oracle决定最佳取采样值.
- block_sample: 是否用块采样代替行采样.
- method_opt: 决定histograms信息是怎样被统计的.method_opt的取值如下(默认值为FOR ALL COLUMNS SIZE AUTO):
- for all columns: 统计所有列的histograms.
- for all indexed columns: 统计所有indexed列的histograms.
- for all hidden columns: 统计你看不到列的histograms
- for columns SIZE | REPEAT | AUTO | SKEWONLY:统计指定列的histograms.N的取值范围[1,254]; REPEAT上次统计过的histograms;AUTO由oracle决定N的大小;SKEWONLY 选项会耗费大量处理时间,因为它要检查每个索引中的每个列的值的分布情况。
- degree: 决定并行度.默认值为null.
- granularity: Granularity of statistics to collect ,only pertinent if the table is partitioned.
- cascade: 收集索引的信息.默认为FALSE.
- stattab: 指定要存储统计信息的表,statid如果多个表的统计信息存储在同一个stattab中用于进行区分.statown存储统计信息表的拥有者.以上三个参数若不指定,统计信息会直接更新到数据字典.
- no_invalidate: Does not invalidate the dependent cursors if set to TRUE. The procedure invalidates the dependent cursors immediately if set to FALSE.
- force: 即使表锁住了也收集统计信息.
5. 统计信息的存储位置
- 统计信息的收集内容
(1)表自身的分析: 包括表中的行数,数据块数,行长等信息。
(2)列的分析:包括列值的重复数,列上的空值,数据在列上的分布情况。
(3)索引的分析: 包括索引叶块的数量,索引的深度,索引的聚合因子等。 - 统计信息存放的数据字典
DBA_TABLES
DBA_OBJECT_TABLES
DBA_TAB_STATISTICS
DBA_TAB_COL_STATISTICS
DBA_TAB_HISTOGRAMS
DBA_INDEXES
DBA_IND_STATISTICS
DBA_CLUSTERS
DBA_TAB_PARTITIONS
DBA_TAB_SUBPARTITIONS
DBA_IND_PARTITIONS
DBA_IND_SUBPARTITIONS
DBA_PART_COL_STATISTICS
DBA_PART_HISTOGRAMS
DBA_SUBPART_COL_STATISTICS
DBA_SUBPART_HISTOGRAMS
6. 统计直方图
SQL> CREATE TABLE HISTOGRAM_TEST(ID NUMBER);
SQL> DECLARE
2 I NUMBER;
3 BEGIN
4 FOR I IN 1…100 LOOP
5 INSERT INTO HISTOGRAM_TEST VALUES(i);
6 END LOOP;
7 END;
8 /
SQL> COMMIT;
SQL> EXEC DBMS_STATS.GATHER_table_STATS (OWNNAME => ‘SCOTT’, TABNAME => ‘HISTOGRAM_TEST’, METHOD_OPT => ‘FOR COLUMNS SIZE 10 ID’);
–10是buckets的数量,ID 是收集列名。
SQL> SELECT COLUMN_NAME , NUM_DISTINCT, NUM_BUCKETS, HISTOGRAM FROM DBA_TAB_COL_STATISTICS WHERE TABLE_NAME = ‘HISTOGRAM_TEST’ AND COLUMN_NAME = ‘ID’;
COLUMN_NAME | NUM_DISTINCT | NUM_BUCKETS | HISTOGRAM |
---|---|---|---|
ID | 100 | 10 | HEIGHT BALANCED |
SQL> SELECT ENDPOINT_NUMBER, ENDPOINT_VALUE FROM USER_HISTOGRAMS
WHERE TABLE_NAME = ‘HISTOGRAM_TEST’ and COLUMN_NAME = ‘ID’ ORDER BY ENDPOINT_NUMBER;
7. 其他统计信息的收集及删除设置
- 其他统计信息的收集
Procedure | Collects |
---|---|
GATHER_DICTIONARY_STATS | SYSTEM SCHEMAS(including SYS,SYSTEM,CTXSYS,DRSYS) |
GATHER_INDEX_STATS | Index statistics |
GATHER_TABLE_STATS | Table, column, and index statistics |
GATHER_SCHEMA_STATS | Statistics for all objects in a schema |
GATHER_DICTIONARY_STATS | Statistics for all dictionary objects |
GATHER_DATABASE_STATS | Statistics for all objects in a database |
- 删除统计信息
DELETE_TABLE_STATS