曾经有个非常牛的公司面试DBA,你们数据库如何收集统计信息?自动收集?那说一下自动收集的算法?不知道怎么敢用自动收集? 有点小尴尬….
关于自动统计信息收集,以下是11gR2的官方解释
http://docs.oracle.com/cd/E11882_01/server.112/e16638/stats.htm#PFGRF003
Oracle recommends that you enable automatic optimizer statistics collection. In this case, the database automatically collects optimizer statistics for tables with absent or stale statistics. If fresh statistics are required for a table, then the database collects them both for the table and associated indexes.
Automatic collection eliminates many manual tasks associated with managing the optimizer. It also significantly reduces the risks of generating poor execution plans because of missing or stale statistics.
Automatic optimizer statistics collection calls the DBMS_STATS.GATHER_DATABASE_STATS_JOB_PROC procedure. This internal procedure operates similarly to the DBMS_STATS.GATHER_DATABASE_STATS procedure using the GATHER AUTO option. The main difference is that GATHER_DATABASE_STATS_JOB_PROC prioritizes database objects that require statistics, so that objects that most need updated statistics are processed first, before the maintenance window closes.
Oracle推荐开启此功能, 相较于gather_database_stats,gather_database_stats_job_proc会优先收集急需统计信息的对象,以确保在维护窗口关闭前完成;
该job默认自动运行,也可通过调用系统包手工干预
BEGIN
DBMS_AUTO_TASK_ADMIN.DISABLE(--手工禁止该job运行
client_name => 'auto optimizer stats collection',
operation => NULL,
window_name => NULL);
END;
/
BEGIN
DBMS_AUTO_TASK_ADMIN.ENABLE(--手工激活
client_name => 'auto optimizer stats collection',
operation => NULL,
window_name => NULL);
END;
/
在10g中,该job名为gather_stats_job,背后也是调用DBMS_STATS.GATHER_DATABASE_STATS_JOB_PROC,可通过dba_scheduler_jobs查看其详细信息,也可手工将其禁止
BEGIN
DBMS_SCHEDULER.DISABLE('GATHER_STATS_JOB');
END;
/
维护窗口默认为周一至周五的10:00PM—6:00AM,周六周日则全天进行, stop_on_window_close控制维护窗口关闭时gather_stats_job是否还可继续进行;
至于收集算法,官方文档描述如下
The GATHER_DATABASE_STATS_JOB_PROC procedure collects statistics on database objects when the object has no previously gathered statistics or the existing statistics are stale because the underlying object has been modified significantly (more than 10% of the rows)
主要针对当前没有统计信息或者数据修改幅度超过10%的object,深究一下,oracle怎么知道某个object的数据修改超过10%?
原来oracle专门提供了一个table monitoring facility,专门监控记录表的DML和truncate操作,以及上次信息收集的时间,但须statistics_level设置为typical/all时才能使用;
由user_tab_modifications负责记录相应信息,一次数据操作后可能要等几分钟才能在该视图上显现,也可调用dbms_stats.flush_database_monitoring_info立即刷新内存中的数据;
--10205
SQL> create table tmp_object as select object_id,object_name from dba_objects where 1=0;
Table created.
SQL> insert into tmp_object select object_id,object_name from dba_objects where rownum <=1000;
1000 rows created.
SQL> commit;
Commit complete.
SQL> exec dbms_stats.flush_database_monitoring_info;
PL/SQL procedure successfully completed.
SQL> select INSERTS,UPDATES,DELETES,TIMESTAMP,TRUNCATED,DROP_SEGMENTS from dba_tab_modifications where TABLE_NAME='TMP_OBJECT';
INSERTS UPDATES DELETES TIMESTAMP TRU DROP_SEGMENTS
---------- ---------- ---------- ------------------- --- -------------
1000 0 0 2012-07-23 03:45:23 NO 0
SQL> truncate table tmp_object;
Table truncated.
SQL> select INSERTS,UPDATES,DELETES,TIMESTAMP,TRUNCATED,DROP_SEGMENTS from dba_tab_modifications where TABLE_NAME='TMP_OBJECT';
INSERTS UPDATES DELETES TIMESTAMP TRU DROP_SEGMENTS
---------- ---------- ---------- ------------------- --- -------------
1000 0 0 2012-07-23 03:49:55 YES 0
待到下次gather_stat_job运行时,若计算出表数据修改超过10%,则会对其重新收集统计信息,具体案例可参照maclean的文章
gather_stats_job很好很强大,可以应付绝大多数的情形,但是也有特例,因为其只运行于特定的维护窗口,对于那些数据经常发生大波动的表不太适用,比如
在业务高峰期(相对晚上的维护窗口而言)进行delete/truncate的表,或者有大规模bulk load的表(更新比极易超过10%),不能每次都等到维护窗口再收集,对此可用以下方法应对:
1
使用dynamic sampling,将该表统计信息设定为null并锁定,同时须将optimizer_dynamic_sampling>=2,每次sql运行时都会动态收集该表统计信息;
2
将该表统计信息设置为某个标准值然后锁定,因为gather_stats_job运行时表的数据可能与业务高峰期差异很大进而会收集出错误的统计信息;
3
将统计信息收集脚本嵌套在bulk load程序中,每次load结束后立即进行信息收集
Note:
若statistics_level=basic,则table monitoring facility被禁止,automatic statistics gathering将无法探知stale statistics,此时需要人工收集;
同时由于table monitoring facility的存在,不管过去多久,只要某个表数据不改变gather_stats_job就一直不会对其进行信息收集,从而节省了系统资源;
--10g查看gather_stats_job详细信息, 11g时改job被废除,改为auto task,但是仍旧调用gather_database_stats_job_proc
SQL> select JOB_NAME,PROGRAM_NAME,SCHEDULE_NAME,JOB_CLASS from dba_scheduler_jobs where job_name='GATHER_STATS_JOB';
JOB_NAME PROGRAM_NAME SCHEDULE_NAME JOB_CLASS
------------------------------ ------------------------------ ------------------------------ ------------------------------
GATHER_STATS_JOB GATHER_STATS_PROG MAINTENANCE_WINDOW_GROUP AUTO_TASKS_JOB_CLASS
SQL> select PROGRAM_ACTION from dba_scheduler_programs where PROGRAM_NAME='GATHER_STATS_PROG';
PROGRAM_ACTION
--------------------------------------------------
dbms_stats.gather_database_stats_job_proc
SQL> select * from dba_scheduler_wingroup_members where WINDOW_GROUP_NAME='MAINTENANCE_WINDOW_GROUP';
WINDOW_GROUP_NAME WINDOW_NAME
------------------------------ ------------------------------
MAINTENANCE_WINDOW_GROUP WEEKNIGHT_WINDOW
MAINTENANCE_WINDOW_GROUP WEEKEND_WINDOW
SQL> select WINDOW_NAME,REPEAT_INTERVAL,DURATION from dba_scheduler_windows where WINDOW_NAME in ('WEEKNIGHT_WINDOW','WEEKEND_WINDOW');
WINDOW_NAME REPEAT_INTERVAL DURATION
---------------- ---------------------------------------------------------------------- --------------------
WEEKNIGHT_WINDOW freq=daily;byday=MON,TUE,WED,THU,FRI;byhour=22;byminute=0; bysecond=0 +000 08:00:00
WEEKEND_WINDOW freq=daily;byday=SAT;byhour=0;byminute=0;bysecond=0 +002 00:00:00
此外可通过DBA_SCHEDULER_JOB_RUN_DETAILS查看job运行日志
来自 “ ITPUB博客 ” ,链接:http://blog.itpub.net/15480802/viewspace-736421/,如需转载,请注明出处,否则将追究法律责任。
转载于:http://blog.itpub.net/15480802/viewspace-736421/