RAC环境出现Waiting for instances to leave信息(一)

RAC环境的其中一个节点上的alert文件中,出现了这个信息:Waiting for instances to leave

这一篇描述问题的发现过程。

 

 

首先描述一下问题的发现过程。程序员报告RAC环境影响比较缓慢,而这时同事正好在这个RAC环境的一个节点上检查RMAN的日志,发现RMAN连接数据库很慢,半天都没有响应。而这时,另一个节点几乎无法访问了,一个普通的ssh登陆都需要1分钟以上,更不要说执行命令了。

检查当前还可以访问的节点的alert文件,发现了上面的Waiting for instances to leave信息:

Mon Jul 27 15:52:14 2009
Waiting for instances to leave:
1

由于这个问题产生的原因比较复杂,为了将问题描述清楚,还需要重头说起。

首先是硬件维护任意发现RAC服务器的其中一个节点DOWN机,这个服务器无法通过远程登陆。

在数据库中检查,节点1关闭,节点2正常运行。从节点2上的alert文件中,可以看到,节点2对数据库进行了一些恢复的操作,并成功接管了全部数据库的管理操作:

Fri Jul 24 16:25:09 2009
Reconfiguration started (old inc 4, new inc 6)
List of nodes:
 1
 Global Resource Directory frozen
 * dead instance detected - domain 0 invalid = TRUE
 Communication channels reestablished
 Master broadcasted resource hash value bitmaps
 Non-local Process blocks cleaned out
Fri Jul 24 16:25:09 2009
 LMS 0: 6 GCS shadows cancelled, 0 closed
Fri Jul 24 16:25:09 2009
 LMS 1: 14 GCS shadows cancelled, 1 closed
 Set master node info
 Submitted all remote-enqueue requests
 Dwn-cvts replayed, VALBLKs dubious
 All grantable enqueues granted
 Post SMON to start 1st pass IR
Fri Jul 24 16:25:11 2009
Instance recovery: looking for dead threads
Fri Jul 24 16:25:11 2009
Beginning instance recovery of 1 threads
Fri Jul 24 16:25:12 2009
 LMS 0: 378032 GCS shadows traversed, 0 replayed
Fri Jul 24 16:25:12 2009
 LMS 1: 381728 GCS shadows traversed, 0 replayed
Fri Jul 24 16:25:12 2009
 Submitted all GCS remote-cache requests
 Fix write in gcs resources
Reconfiguration complete
Fri Jul 24 16:25:14 2009
 parallel recovery started with 7 processes
Fri Jul 24 16:25:15 2009
Started redo scan
Fri Jul 24 16:25:15 2009
Completed redo scan
 3990 redo blocks read, 986 data blocks need recovery
Fri Jul 24 16:25:15 2009
Started redo application at
 Thread 1: logseq 1678, block 572689
Fri Jul 24 16:25:15 2009
Recovery of Online Redo Log: Thread 1 Group 1 Seq 1678 Reading mem 0
  Mem# 0: /dev/vx/rdsk/datavg/newtrade_redo1_1_1_1g
  Mem# 1: /dev/vx/rdsk/datavg/newtrade_redo1_1_2_1g
Fri Jul 24 16:25:15 2009
Completed redo application
Fri Jul 24 16:25:16 2009
Completed instance recovery at
 Thread 1: logseq 1678, block 576679, scn 1112404710247
 862 data blocks read, 1014 data blocks written, 3990 redo blocks read
Switch log for thread 1 to sequence 1679

随后,硬件维护人员在现场启动了节点1的服务器,在启动的时候似乎报了一个和内存有关的错误(我不在现场,听说的),随后检查没有发现系统的错误信息。

检查Oraclealert信息,没有发现任何的错误。怀疑节点突然崩溃和硬件异常有关。

检查系统的dmest信息,也没有找到系统异常DOWN机的原因。

由于没有找到错误,在节点1启动后,将节点1上数据库的实例1也启动了。

但是没过多长时间,就碰到了前面描述的错误。节点2alert文件中详细的错误信息如下:

Mon Jul 27 15:20:42 2009
MMNL absent for 1202 secs; Foregrounds taking over
Mon Jul 27 15:37:40 2009
MMNL absent for 1599 secs; Foregrounds taking over
Mon Jul 27 15:48:19 2009
IPC Send timeout detected.Sender: ospid 2284
Receiver: inst 1 binc 27438 ospid 3739
Mon Jul 27 15:49:24 2009
IPC Send timeout detected.Sender: ospid 5165
Receiver: inst 1 binc 27438 ospid 3739
Mon Jul 27 15:49:31 2009
IPC Send timeout detected.Sender: ospid 5179
Receiver: inst 1 binc 27438 ospid 3739
.
.
.
Mon Jul 27 15:49:58 2009
IPC Send timeout detected.Sender: ospid 23757
Receiver: inst 1 binc 27438 ospid 3739
Mon Jul 27 15:49:59 2009
IPC Send timeout to 0.2 inc 12 for msg type 32 from opid 46
Mon Jul 27 15:49:59 2009
Communications reconfiguration: instance_number 1
Mon Jul 27 15:50:15 2009
IPC Send timeout to 0.2 inc 12 for msg type 32 from opid 43
Mon Jul 27 15:50:32 2009
Trace dumping is performing id=[cdmp_20090727154959]
Mon Jul 27 15:51:11 2009
Trace dumping is performing id=[cdmp_20090727154959]
Mon Jul 27 15:51:49 2009
Evicting instance 1 from cluster
Mon Jul 27 15:52:14 2009
Waiting for instances to leave:
1
Mon Jul 27 15:52:34 2009
Waiting for instances to leave:
1
Mon Jul 27 15:52:54 2009
Waiting for instances to leave:
1
.
.
.
Mon Jul 27 16:07:35 2009
Waiting for instances to leave:
1
Mon Jul 27 16:07:55 2009
Waiting for instances to leave:
1
Mon Jul 27 16:08:15 2009
Waiting for instances to leave:
1

开始是IPC Send timeout detected错误,随后是Waiting for instances to leave信息。根据错误信息,以及节点1的响应时间判断,问题多半就出在节点1上:多半是由于节点1太忙,导致响应时间太长,而节点2要与节点1直接进行交互,在交互时出现了超时错误。

为了尽快的解决这个问题,在节点2上执行svrctl尝试关闭节点1上的实例:

bash-3.00$ srvctl stop inst -d newtrade -i newtrade1

等待了一段时间,节点1上的实例1终于被关闭。而这时节点1上的响应终于恢复正常了。检查节点1上的alert文件,发现了大量的错误信息:

Mon Jul 27 14:42:45 2009
WARNING: inbound connection timed out (ORA-3136)
Mon Jul 27 14:43:21 2009
WARNING: inbound connection timed out (ORA-3136)
Mon Jul 27 14:43:21 2009
WARNING: inbound connection timed out (ORA-3136)
.
.
.
Mon Jul 27 14:51:35 2009
WARNING: inbound connection timed out (ORA-3136)
Mon Jul 27 14:52:17 2009
WARNING: inbound connection timed out (ORA-3136)
Mon Jul 27 14:52:35 2009
Process PZ99 died, see its trace file
Mon Jul 27 14:52:35 2009
WARNING: inbound connection timed out (ORA-3136)
Mon Jul 27 14:52:35 2009
WARNING: inbound connection timed out (ORA-3136)
Mon Jul 27 14:53:29 2009
WARNING: inbound connection timed out (ORA-3136)
Mon Jul 27 14:54:18 2009
WARNING: inbound connection timed out (ORA-3136)
Mon Jul 27 14:54:42 2009
ksvcreate: Process(q002) creation failed
Mon Jul 27 14:55:27 2009
WARNING: inbound connection timed out (ORA-3136)
Mon Jul 27 14:55:44 2009
WARNING: inbound connection timed out (ORA-3136)
.
.
.
Mon Jul 27 15:02:27 2009
WARNING: inbound connection timed out (ORA-3136)
Mon Jul 27 15:02:48 2009
WARNING: inbound connection timed out (ORA-3136)
Mon Jul 27 15:05:00 2009
GES: Potential blocker (pid=3815) on resource CI-0000000A-00000002;
 enqueue info in file /data/oracle/admin/newtrade/bdump/newtrade1_lmd0_3704.trc and DIAG trace file
Mon Jul 27 15:05:41 2009
WARNING: inbound connection timed out (ORA-3136)
Mon Jul 27 15:05:53 2009
WARNING: inbound connection timed out (ORA-3136)
Mon Jul 27 15:08:05 2009
WARNING: inbound connection timed out (ORA-3136)
Mon Jul 27 15:08:11 2009
WARNING: inbound connection timed out (ORA-3136)
Mon Jul 27 15:08:16 2009
Process PZ98 died, see its trace file
Mon Jul 27 15:08:40 2009
WARNING: inbound connection timed out (ORA-3136)
Mon Jul 27 15:08:41 2009
WARNING: inbound connection timed out (ORA-3136)
.
.
.
Mon Jul 27 15:10:36 2009
WARNING: inbound connection timed out (ORA-3136)
Mon Jul 27 15:10:55 2009
ksvcreate: Process(q001) creation failed
Mon Jul 27 15:11:50 2009
WARNING: inbound connection timed out (ORA-3136)
Mon Jul 27 15:12:26 2009
WARNING: inbound connection timed out (ORA-3136)
Mon Jul 27 15:14:05 2009
WARNING: inbound connection timed out (ORA-3136)
Mon Jul 27 15:14:59 2009
kkjcre1p: unable to spawn jobq slave process
Mon Jul 27 15:14:59 2009
Errors in file /data/oracle/admin/newtrade/bdump/newtrade1_cjq0_3879.trc:

Mon Jul 27 15:15:33 2009
WARNING: inbound connection timed out (ORA-3136)
Mon Jul 27 15:16:12 2009
WARNING: inbound connection timed out (ORA-3136)
Mon Jul 27 15:24:04 2009
WARNING: inbound connection timed out (ORA-3136)
Mon Jul 27 15:24:44 2009
GES: Potential blocker (pid=3815) on resource CI-0000000A-00000002;
 enqueue info in file /data/oracle/admin/newtrade/bdump/newtrade1_lmd0_3704.trc and DIAG trace file
Mon Jul 27 15:25:10 2009
GES: Potential blocker (pid=3815) on resource CI-00000001-00000002;
 enqueue info in file /data/oracle/admin/newtrade/bdump/newtrade1_lmd0_3704.trc and DIAG trace file
Mon Jul 27 15:25:13 2009
WARNING: inbound connection timed out (ORA-3136)
Mon Jul 27 15:27:07 2009
WARNING: inbound connection timed out (ORA-3136)
Mon Jul 27 15:27:13 2009
WARNING: inbound connection timed out (ORA-3136)
Mon Jul 27 15:27:40 2009
ksvcreate: Process(q001) creation failed
Mon Jul 27 15:27:55 2009
WARNING: inbound connection timed out (ORA-3136)
Mon Jul 27 15:28:16 2009
WARNING: inbound connection timed out (ORA-3136)
Mon Jul 27 15:28:41 2009
WARNING: inbound connection timed out (ORA-3136)
Mon Jul 27 15:33:07 2009
Process startup failed, error stack:
Mon Jul 27 15:33:07 2009
Errors in file /data/oracle/admin/newtrade/bdump/newtrade1_psp0_3671.trc:
ORA-27300: OS system dependent operation:fork failed with status: 11
ORA-27301: OS failure message: Resource temporarily unavailable
ORA-27302: failure occurred at: skgpspawn3
Mon Jul 27 15:33:08 2009
Process q001 died, see its trace file
Mon Jul 27 15:33:46 2009
ksvcreate: Process(q001) creation failed
Mon Jul 27 15:37:54 2009
ksvcreate: Process(q001) creation failed
Mon Jul 27 15:39:41 2009
WARNING: inbound connection timed out (ORA-3136)
Mon Jul 27 15:39:50 2009
WARNING: inbound connection timed out (ORA-3136)
.
.
.
Mon Jul 27 15:43:30 2009
WARNING: inbound connection timed out (ORA-3136)
Mon Jul 27 15:45:30 2009
ksvcreate: Process(q001) creation failed
Mon Jul 27 15:45:40 2009
WARNING: inbound connection timed out (ORA-3136)
Mon Jul 27 15:45:40 2009
WARNING: inbound connection timed out (ORA-3136)
Mon Jul 27 15:46:28 2009
IPC Send timeout detected. Receiver ospid 3739
Mon Jul 27 15:46:48 2009
Errors in file /data/oracle/admin/newtrade/bdump/newtrade1_lms1_3739.trc:
Mon Jul 27 15:47:32 2009
IPC Send timeout detected. Receiver ospid 3739
Mon Jul 27 15:47:39 2009
Errors in file /data/oracle/admin/newtrade/bdump/newtrade1_lms1_3739.trc:
Mon Jul 27 15:48:16 2009
WARNING: inbound connection timed out (ORA-3136)
Mon Jul 27 15:49:27 2009
WARNING: inbound connection timed out (ORA-3136)
.
.
.
Mon Jul 27 15:50:00 2009
WARNING: inbound connection timed out (ORA-3136)
Mon Jul 27 15:50:34 2009
Trace dumping is performing id=[cdmp_20090727154959]
Mon Jul 27 15:52:29 2009
WARNING: inbound connection timed out (ORA-3136)
Mon Jul 27 15:52:30 2009
WARNING: inbound connection timed out (ORA-3136)
.
.
.
Mon Jul 27 15:57:04 2009
WARNING: inbound connection timed out (ORA-3136)
Mon Jul 27 15:57:32 2009
kkjcre1p: unable to spawn jobq slave process
Mon Jul 27 15:57:32 2009
Errors in file /data/oracle/admin/newtrade/bdump/newtrade1_cjq0_3879.trc:

Mon Jul 27 15:57:45 2009
WARNING: inbound connection timed out (ORA-3136)
Mon Jul 27 15:58:33 2009
WARNING: inbound connection timed out (ORA-3136)
Mon Jul 27 15:59:08 2009
WARNING: inbound connection timed out (ORA-3136)
Mon Jul 27 16:00:00 2009
ksvcreate: Process(q001) creation failed
Mon Jul 27 16:01:23 2009
WARNING: inbound connection timed out (ORA-3136)
Mon Jul 27 16:01:23 2009
WARNING: inbound connection timed out (ORA-3136)
.
.
.
Mon Jul 27 16:05:55 2009
WARNING: inbound connection timed out (ORA-3136)
Mon Jul 27 16:06:22 2009
Errors in file /data/oracle/admin/newtrade/bdump/newtrade1_arc0_4240.trc:
ORA-00600: internal error code, arguments: [2103], [1], [0], [1], [900], [], [], []
Mon Jul 27 16:06:24 2009
Errors in file /data/oracle/admin/newtrade/bdump/newtrade1_arc1_4267.trc:
ORA-00600: internal error code, arguments: [2103], [1], [0], [1], [900], [], [], []
Mon Jul 27 16:06:26 2009
Trace dumping is performing id=[cdmp_20090727160625]
Mon Jul 27 16:06:30 2009
Errors in file /data/oracle/admin/newtrade/bdump/newtrade1_arc0_4240.trc:
ORA-00600: internal error code, arguments: [2103], [1], [0], [1], [900], [], [], []
.
.
.
Mon Jul 27 16:06:30 2009
Errors in file /data/oracle/admin/newtrade/bdump/newtrade1_arc1_4267.trc:
ORA-00600: internal error code, arguments: [2103], [1], [0], [1], [900], [], [], []
Mon Jul 27 16:06:54 2009
USER: terminating instance due to error 481
Mon Jul 27 16:06:54 2009
Errors in file /data/oracle/admin/newtrade/bdump/newtrade1_lms1_3739.trc:
ORA-00481: LMON process terminated with error
Mon Jul 27 16:06:54 2009
Errors in file /data/oracle/admin/newtrade/bdump/newtrade1_lms0_3714.trc:
ORA-00481: LMON process terminated with error
Mon Jul 27 16:06:54 2009
Errors in file /data/oracle/admin/newtrade/bdump/newtrade1_ckpt_3815.trc:
ORA-00481: LMON process terminated with error
Mon Jul 27 16:06:54 2009
Errors in file /data/oracle/admin/newtrade/udump/newtrade1_ora_16645.trc:
ORA-00481: LMON process terminated with error
Mon Jul 27 16:06:54 2009
Errors in file /data/oracle/admin/newtrade/udump/newtrade1_ora_16926.trc:
ORA-00481: LMON
进程因错误而终止
Mon Jul 27 16:06:54 2009
System state dump is made for local instance
Mon Jul 27 16:06:54 2009
Errors in file /data/oracle/admin/newtrade/udump/newtrade1_ora_16834.trc:
ORA-00481: LMON
进程因错误而终止
Mon Jul 27 16:06:55 2009
Errors in file /data/oracle/admin/newtrade/bdump/newtrade1_dbw0_3779.trc:
ORA-00481: LMON process terminated with error
Mon Jul 27 16:06:55 2009
Errors in file /data/oracle/admin/newtrade/bdump/newtrade1_lmd0_3704.trc:
ORA-00481: LMON process terminated with error
Mon Jul 27 16:06:55 2009
Errors in file /data/oracle/admin/newtrade/bdump/newtrade1_pmon_3633.trc:
ORA-00481: LMON process terminated with error
Mon Jul 27 16:06:55 2009
Errors in file /data/oracle/admin/newtrade/bdump/newtrade1_cjq0_3879.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-00481: LMON process terminated with error
Mon Jul 27 16:06:55 2009
Errors in file /data/oracle/admin/newtrade/bdump/newtrade1_psp0_3671.trc:
ORA-00481: LMON process terminated with error
Mon Jul 27 16:06:55 2009
Errors in file /data/oracle/admin/newtrade/bdump/newtrade1_smon_3842.trc:
ORA-00481: LMON process terminated with error
Mon Jul 27 16:06:55 2009
Errors in file /data/oracle/admin/newtrade/bdump/newtrade1_lck0_4026.trc:
ORA-00481: LMON process terminated with error
Mon Jul 27 16:06:55 2009
Errors in file /data/oracle/admin/newtrade/bdump/newtrade1_lgwr_3805.trc:
ORA-00481: LMON process terminated with error
Mon Jul 27 16:06:55 2009
Errors in file /data/oracle/admin/newtrade/bdump/newtrade1_lmon_3682.trc:
ORA-29740: evicted by member 1, group incarnation 14
Mon Jul 27 16:06:56 2009
Errors in file /data/oracle/admin/newtrade/bdump/newtrade1_smon_3842.trc:
ORA-00481: LMON process terminated with error
Mon Jul 27 16:06:56 2009
Errors in file /data/oracle/admin/newtrade/bdump/newtrade1_mman_3757.trc:
ORA-00481: LMON process terminated with error
System State dumped to trace file /data/oracle/admin/newtrade/bdump/newtrade1_diag_3644.trc
Mon Jul 27 16:07:05 2009
Dump system state for local instance only
Mon Jul 27 16:07:05 2009
Trace dumping is performing id=[cdmp_20090727160705]
Mon Jul 27 16:07:08 2009
Shutting down instance (abort)
License high water mark = 104
Mon Jul 27 16:07:09 2009
Instance terminated by USER, pid = 26184
Mon Jul 27 16:07:13 2009
Instance terminated by USER, pid = 17258

不但有常规的错误,还有ORA-600错误,看来要解决的问题还真是不少。

 

来自 “ ITPUB博客 ” ,链接:http://blog.itpub.net/4227/viewspace-618350/,如需转载,请注明出处,否则将追究法律责任。

转载于:http://blog.itpub.net/4227/viewspace-618350/

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值