aix+oracle11g
今天项目上报应用客户端无法登陆。根据以往经验肯定是由与原始性能表过大,导致对这些大表
做delete操作引起的。
登陆主机,执行topas查看
CPU User% Kern% Wait% Idle%
ALL 11.4 2.3 31.1 55.2
Network KBPS I-Pack O-Pack KB-In KB-OUT
Total 2.1 10.5 9.5 1.1 1.0
Disk Busy% KBPS TPS KB-Read KB-Writ
Total 100.0 5593.0 618.0 4849.7 743.4 #隔一段时间查询后,KB-Writ有达到几千的时候,说明在busy=100的时候,读写都很高
查看select * from dba_jobs_runnings;发现清理数据库的job在运行。经查找该job在删除最大的原始性能表。
当时怀疑disk busy100%肯定是此job引起的,从而导致应用客户端在登陆时无法获取数据库数据。
立即停止该job的运行,disk下降了几个百分点,但应用可以登录,且速度也不慢。
应用问题已经解决,但disk busy仍然很高,当时为解决此问题,重启了数据库。从日志中发现重要信息
Tue Nov 16 13:59:59 2010
SMON: disabling tx recovery ##############
SMON: disabling cache recovery ##############
Shutting down archive processes
Archiving is disabled
Archive process shutdown avoided: 0 active
Thread 1 closed at log sequence 1425
Successful close of redo thread 1
Tue Nov 16 14:04:01 2010
Completed: ALTER DATABASE CLOSE NORMAL
ALTER DATABASE DISMOUNT
Completed: ALTER DATABASE DISMOUNT
ARCH: Archival disabled due to shutdown: 1089
Shutting down archive processes
Archiving is disabled
Archive process shutdown avoided: 0 active
ARCH: Archival disabled due to shutdown: 1089
Shutting down archive processes
Archiving is disabled
Archive process shutdown avoided: 0 active
Tue Nov 16 14:04:03 2010
Stopping background process VKTM:
Tue Nov 16 14:04:09 2010
Instance shutdown complete
Tue Nov 16 14:05:23 2010
ALTER DATABASE OPEN
Thread 1 opened at log sequence 1425
Current log# 3 seq# 1425 mem# 0: /oracle/oradata/HQNMS/redo03.log
Successful open of redo thread 1
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
SMON: enabling cache recovery ###############
Successfully onlined Undo Tablespace 2.
Verifying file header compatibility for 11g tablespace encryption..
Verifying 11g file header compatibility for tablespace encryption completed
SMON: enabling tx recovery ##################
Database Characterset is ZHS16GBK
从日志中看到数据库在执行shutdown immediate时暂停了事物恢复,在数据库启动后又开启了事物恢复。这也正应对了当前现象,
停止job后,数据库要对大数据量的回滚,导致disk busy超高。数据库关闭后,disk busy降为0,数据库启动busy升到90多,同时ora_p进程有10多个
,在/oracle/diag/oracle_sid/rdbms/trace,生成大量trc文件,文件内容:
*** 2010-11-15 10:40:44.035
*** SESSION ID:(311.14307) 2010-11-15 10:40:44.043
*** SERVICE NAME:(SYS$BACKGROUND) 2010-11-15 10:40:44.043
Parallel Transaction recovery server caught exception 10388
Parallel Transaction recovery server caught exception 10388
*** 2010-11-15 10:56:37.872
*** SESSION ID:(311.14318) 2010-11-15 10:56:37.879
** SERVICE NAME:(SYS$BACKGROUND) 2010-11-15 10:56:37.879
经过30分钟后,事物恢复完毕,disk busy下降到正常水平20%左右,ora_p进程恢复到3个。
此现象在后来又手动运行清理数据的存储过程,并手动kill掉,再现了上述现象。清理开始-->disk busy升高-->kill清理--->disk busy持续高40分钟,同时trc文件
3分钟生成一个--->40分钟后disk busy恢复到正常水平,trc停止生成,ora_p恢复到正常个数。
最后找研发确认清理的存储过程。
由以上发现oracle11g数据库在执行shutdown immediate时暂停了事物恢复,在数据库启动后又开启了事物恢复,此机制不知是否只针对大数据量的事物恢复有效?
在10g不知道是否也有此机制?
来自 “ ITPUB博客 ” ,链接:http://blog.itpub.net/16396821/viewspace-678283/,如需转载,请注明出处,否则将追究法律责任。
转载于:http://blog.itpub.net/16396821/viewspace-678283/