RAC数据库实例启动异常排查分析

重启RAC一个节点的数据库实例时,发现启动速度非常慢。同时连接RAC存活节点的业务也受影响。

通过对日志的分析,在启动数据库时,Reconfiguration速度慢,Reconfiguration后报错IPC Send timeout detected. Sender: ospid 53884 [oracle@test2 (LMD0)],从而出现了数据库实例组的节点驱逐;Wed Apr 13 19:28:02 2022
Instance termination initiated by instance 2 with reason 1.
Instance 2 received a reconfig event from its cluster manager indicating that this instance is supposed to be down
Please check instance 2's alert log and LMON trace file for more details.
Please also examine the CSS log files.
LMON (ospid: 47523): terminating the instance due to error 481

因此,需要排查进程出现IPC Send timeout的原因;这方面有BUG也可能有RAC节点负载原因,可以参考MOS文档中的排查步骤一项项的排查系统的信息:

Instance Evicted After LMON to LMON IPC Send timeout Due to Storage Issue (Doc ID 2080029.1)
    "ipc send timeout" Precedes Database Instance Crash or Eviction (Doc ID 1951216.1)
    While Evicting One of the Instance, the Remaining instances Terminated by LMON with "LMON is running too slowly and in the middle of reconfiguration" (Doc ID 1949505.1)

相关日志如下:

 
1.
2022-04-13 18:57:29.215 节点1集群软件人工重启成功,
数据库实例也启动成功,
Wed Apr 13 18:58:32 2022
QMNC started with pid=100, OS id=52025 
Completed: ALTER DATABASE OPEN /* db agent *//* {1:49652:2} */
 
 
2.节点2 RECONFIG过程中节点1异常
--节点2
Wed Apr 13 19:22:26 2022
Starting ORACLE instance (normal)
 
--节点1:
Wed Apr 13 19:28:00 2022
IPC Send timeout detected. Receiver ospid 47526 [
Wed Apr 13 19:28:00 2022
Errors in file /oracle/app/diag/rdbms/testnew/test1/trace/test1_lmd0_47526.trc:
Wed Apr 13 19:28:02 2022
Instance termination initiated by instance 2 with reason 1.
Instance 2 received a reconfig event from its cluster manager indicating that this instance is supposed to be down
Please check instance 2's alert log and LMON trace file for more details.
Please also examine the CSS log files.
LMON (ospid: 47523): terminating the instance due to error 481
System state dump requested by (instance=1, osid=47523 (LMON)), summary=[abnormal instance termination].
System State dumped to trace file /oracle/app/diag/rdbms/testnew/test1/trace/test1_diag_47507_20220413192802.trc
Wed Apr 13 19:28:03 2022
ORA-1092 : opitsk aborting process
Instance terminated by LMON, pid = 47523
--节点2:
Wed Apr 13 19:28:00 2022
IPC Send timeout detected. Sender: ospid 53884 [oracle@test2 (LMD0)]
Receiver: inst 1 binc 429458022 ospid 47526
IPC Send timeout to 1.0 inc 4 for msg type 65521 from opid 11
Wed Apr 13 19:28:02 2022
Communications reconfiguration: instance_number 1
Wed Apr 13 19:28:02 2022
Dumping diagnostic data in directory=[cdmp_20220413192802], requested by (instance=1, osid=47523 (LMON)), summary=[abnormal instance termination].
Reconfiguration started (old inc 4, new inc 8)
#############################
3.要查节点2启动,Reconfiguration过程中,IPC Send timeout 的原因--这也是节点2人工启动时感觉很慢的原因 ;同时节点1在19:34启动时报了ORA-00240错误 ,要综合检查一下当时的网络及存储情况以及节点的负载等,参考MOS上文档。
Wed Apr 13 19:34:49 2022
Errors in file /oracle/app/diag/rdbms/testnew/test1/trace/test1_dbw0_247773.trc  (incident=168173):
ORA-00240: control file enqueue held for more than 120 seconds
Instance Evicted After LMON to LMON IPC Send timeout Due to Storage Issue (Doc ID 2080029.1)
    "ipc send timeout" Precedes Database Instance Crash or Eviction (Doc ID 1951216.1)
    While Evicting One of the Instance, the Remaining instances Terminated by LMON with "LMON is running too slowly and in the middle of reconfiguration" (Doc ID 1949505.1)

  • 3
    点赞
  • 3
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值