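At 10 a.m. on January 3, instance 1 of a customer's database restarted. When the workload failed over to instance 2, instance 2 restarted as well.
Fault Analysis
Log analysis:
The following is excerpted from the LMON trace: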
kjfmrcvrchk:receiver LMS[4] has no heartbeat for 251 sec (1357178400.1357178651.0).
kjfmrcvrchk:receiver LMS[4] not in running mode
kjfmrcvrchk:Dumping callstack of lms4
Submitting asynchronized dump request [20]
kjfmrcvrchk:receivers are not healthy. kill instance.
ksuitm: waiting up to [5] seconds before killing DIAG(13789)
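The heartbeat line in this excerpt can be cross-checked with a little arithmetic. The sketch below assumes that the two large numbers in parentheses are Unix epoch seconds (the last heartbeat, then the time of the check) and that the host clock runs at UTC+8. Neither is stated in the trace, although the difference between the two numbers does equal the reported 251 seconds. The name `parse_heartbeat` and the regular expression are illustrative only, not part of any Oracle tooling.

```python
import re
from datetime import datetime, timedelta, timezone

# Assumption: the host clock is UTC+8 (inferred, not stated in the trace).
LOCAL_TZ = timezone(timedelta(hours=8))

# Matches e.g. "receiver LMS[4] has no heartbeat for 251 sec (a.b.c)".
HEARTBEAT_RE = re.compile(
    r"receiver\s+(\w+\[\d+\])\s+has no heartbeat for\s+(\d+)\s+sec"
    r"\s+\((\d+)\.(\d+)\.\d+\)"
)


def parse_heartbeat(line):
    """Extract the receiver name and both timestamps from a heartbeat line.

    Assumption: the first two numbers in parentheses are epoch seconds.
    Returns None when the line does not match.
    """
    m = HEARTBEAT_RE.search(line)
    if m is None:
        return None
    last, now = int(m.group(3)), int(m.group(4))
    return {
        "receiver": m.group(1),
        "reported_gap": int(m.group(2)),
        "computed_gap": now - last,
        "last_heartbeat": datetime.fromtimestamp(last, LOCAL_TZ),
        "detected_at": datetime.fromtimestamp(now, LOCAL_TZ),
    }


line = ("kjfmrcvrchk:receiver LMS[4] has no heartbeat for 251 sec "
        "(1357178400.1357178651.0).")
info = parse_heartbeat(line)
print(info["receiver"], info["computed_gap"])        # LMS[4] 251
print(info["last_heartbeat"].strftime("%H:%M:%S"),   # 10:00:00
      info["detected_at"].strftime("%H:%M:%S"))      # 10:04:11
```

Under those assumptions the last heartbeat falls at 10:00:00 and the check at 10:04:11, which is within a second of the detection time discussed in the analysis.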
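The LMON trace above shows that at 10:04:12 LMON detected that the LMS process had had no heartbeat for 251 seconds, and that the instance would be killed 5 seconds later. Accordingly, the alert log of instance 1 shows that at 10:04:17 the database reported "LMON detects unhealthy receivers" and was terminated by the LMON process. The details are as follows: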
Thu Jan 03 09:55:11 EAT 2013
Thread 1 advanced to log sequence 9754 (LGWR switch)
Current log# 1 seq# 9754 mem# 0:/vghn03/oradata/esshn/vghn03_1_rd12.log
Current log# 1 seq# 9754 mem# 1:/vghn02/oradata/esshn/vghn02_1_rd11.log
Thu Jan 03 10:04:17 EAT 2013
LMON detects unhealthy receivers.
Please check LMON and DIAG trace files for detail.
Thu Jan 03 10:04:17 EAT 2013
LMON (ospid:13793) is terminating the instance.
LMON:terminating instance due to error 481
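When the alert log is long, the termination entries can be located with a short script. The following is a minimal sketch, assuming the timestamp-line layout seen in this excerpt (weekday, month, day, time, time zone, year); the function name `find_terminations` and the matching rules are hypothetical, not part of any Oracle utility.

```python
import re
from datetime import datetime

# Timestamp lines such as "Thu Jan 03 10:04:17 EAT 2013".
# \s* between day and time also tolerates the fused form "0310:04:17".
TS_RE = re.compile(
    r"^\w{3} (\w{3}) (\d{1,2})\s*(\d{2}:\d{2}:\d{2}) \w+ (\d{4})$"
)
MONTHS = {name: i for i, name in enumerate(
    "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec".split(), start=1)}


def find_terminations(alert_text):
    """Return (timestamp, message) pairs for instance termination lines."""
    events, current = [], None
    for raw in alert_text.splitlines():
        line = raw.strip()
        m = TS_RE.match(line)
        if m:
            mon, day, hms, year = m.groups()
            hour, minute, second = map(int, hms.split(":"))
            # Remember the most recent timestamp; it applies to later lines.
            current = datetime(int(year), MONTHS[mon], int(day),
                               hour, minute, second)
        elif "terminating" in line and "instance" in line:
            events.append((current, line))
    return events


sample = """Thu Jan 03 09:55:11 EAT 2013
Thread 1 advanced to log sequence 9754 (LGWR switch)
Thu Jan 03 10:04:17 EAT 2013
LMON detects unhealthy receivers.
Thu Jan 03 10:04:17 EAT 2013
LMON (ospid:13793) is terminating the instance.
LMON:terminating instance due to error 481
"""

for ts, msg in find_terminations(sample):
    print(ts, msg)
```

Each reported message carries the most recent timestamp line that precedes it, so both termination lines in the sample are attributed to 10:04:17.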