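Editor's introduction: In a two-node RAC ADG environment on 11g R2 (11.2.0.3), node 1 (the redo apply node) crashed because of a hardware fault, yet the application sessions on instance 2 were disconnected at the same time (both instances had been open).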
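Today, in a two-node RAC ADG environment running 11g R2 (11.2.0.3), node 1, the redo apply node, crashed because of a hardware fault. At the same time, the application sessions on instance 2 were also disconnected, even though both instances had been open and the ADG standby was serving some read-only workloads. The database alert log on node 2 shows no obvious sign of anyone closing the instance manually; the instance dropped to the mount state on its own. Isn't RAC supposed to provide high availability? Why should the death of one node affect the other?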
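Checking instance 2 at this point shows that it is actually in the "mount" state. Not everyone knows that the database does have an alter database close command. However, once the database has been closed manually within the lifetime of an instance, it cannot be opened again without restarting the instance (PDBs excepted). And as noted above, the instance 2 alert log shows no trace of a close.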
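As a minimal sketch of the status check mentioned above, assuming a SQL*Plus session connected as SYSDBA on the surviving standby node, the standard views gv$instance and v$database could be queried as follows. The column selection is an illustrative choice and is not taken from the original troubleshooting session.

```sql
-- Instance-level state for every running RAC instance.
-- STATUS reads MOUNTED when the database is mounted but not open.
SELECT inst_id, instance_name, status
FROM   gv$instance
ORDER  BY inst_id;

-- Database-level state as seen from the current instance.
-- On an open ADG standby, OPEN_MODE would normally be READ ONLY or
-- READ ONLY WITH APPLY; in the situation described here it would be MOUNTED.
SELECT name, database_role, open_mode
FROM   v$database;
```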
Below is an excerpt from the DG instance 2 database alert log:
2020-09-03 00:10:39.601000 +08:00
ORA-01555 caused by SQL statement below (SQL ID: 4snkhx5vxrmv2, Query Duration=7340 sec, SCN: 0x0f46.c04d2402):
select....
2020-09-04 11:35:29.504000 +08:00
Archived Log entry 105312 added for thread 2 sequence 200520 ID 0x1fcb56a7 dest 1:
2020-09-08 14:16:17.954000 +08:00
Reconfiguration started (old inc 22, new inc 24)
List of instances:
2 (myinst: 2)
Global Resource Directory frozen
* dead instance detected - domain 0 invalid = TRUE
Communication channels reestablished
Master broadcasted resource hash value bitmaps
Non-local Process blocks cleaned out
LMS 0: 23 GCS shadows cancelled, 0 closed, 0 Xw survived
LMS 2: 34 GCS shadows cancelled, 0 closed, 0 Xw survived
LMS 1: 19 GCS shadows cancelled, 0 closed, 0 Xw survived
Set master node info
Submitted all remote-enqueue requests
2020-09-08 14:16:19.005000 +08:00
Dwn-cvts replayed, VALBLKs dubious
All grantable enqueues granted
Post SMON to start 1st pass IR
2020-09-08 14:16:22.014000 +08:00
ARC1: Becoming the active heartbeat ARCH
ARC1: Becoming the active heartbeat ARCH
2020-09-08 14:16:23.328000 +08:00
Submitted all GCS remote-cache requests
Post SMON to start 1st pass IR
Fix write in gcs resources
20