RAC监听自动断掉

环境:
OS:SUSE 10 SP2
kernel:2.6.16.60-0.21-bigsmp
node1:orcl1 ip:192.168.10.23
node2:orcl2 ip:192.168.10.24
db:oracle 10g RAC
ORACLE_SID=ORCL
hostname:rac1 & rac2
SERVER:IBM X3850M2
SD:IBM  DS4700

故障描述:
         从5月25号以来数据库每天监听都会自动断两次。有时候是节点一,有时候是节点二。每次重启过后过一段时间就会自动断掉。今天早上9点左右来检查数据库的状态一切正常,下午两点过再次检查的时候,节点二的监听断掉(如下)
oracle@rac1:~> crs_stat -t
Name           Type           Target    State     Host       
------------------------------------------------------------
ora.orcl.db    application    ONLINE    ONLINE    rac1       
ora....l1.inst application    ONLINE    ONLINE    rac1       
ora....l2.inst application    ONLINE    ONLINE    rac2       
ora....SM1.asm application    ONLINE    ONLINE    rac1       
ora....C1.lsnr application    ONLINE    ONLINE    rac1       
ora.rac1.gsd   application    ONLINE    ONLINE    rac1       
ora.rac1.ons   application    ONLINE    ONLINE    rac1       
ora.rac1.vip   application    ONLINE    ONLINE    rac1       
ora....SM2.asm application    ONLINE    ONLINE    rac2
       
ora....C2.lsnr application    ONLINE    OFFLINE              
ora.rac2.gsd   application    ONLINE    ONLINE    rac2       
ora.rac2.ons   application    ONLINE    ONLINE    rac2       
ora.rac2.vip   application    ONLINE    ONLINE    rac1

oracle@rac2:~> crs_stat -t
Name           Type           Target    State     Host       
------------------------------------------------------------
ora.orcl.db    application    ONLINE    ONLINE    rac1       
ora....l1.inst application    ONLINE    ONLINE    rac1       
ora....l2.inst application    ONLINE    ONLINE    rac2       
ora....SM1.asm application    ONLINE    ONLINE    rac1       
ora....C1.lsnr application    ONLINE    ONLINE    rac1       
ora.rac1.gsd   application    ONLINE    ONLINE    rac1       
ora.rac1.ons   application    ONLINE    ONLINE    rac1       
ora.rac1.vip   application    ONLINE    ONLINE    rac1       
ora....SM2.asm application    ONLINE    ONLINE    rac2
       
ora....C2.lsnr application    ONLINE    OFFLINE              
ora.rac2.gsd   application    ONLINE    ONLINE    rac2       
ora.rac2.ons   application    ONLINE    ONLINE    rac2       
ora.rac2.vip   application    ONLINE    ONLINE    rac1

检查alert日志:
在节点一及节点二中并没有发现有六月一号的日志存在。都只到了5月31号就没有日志内容。

oracle@rac2:~/product/10gR2/crs/log/rac2> date
Tue Jun  1 15:07:43 CST 2010

oracle@rac1:~/product/10gR2/crs/log/rac1>  tail -50 alertrac1.log
2010-04-27 14:51:21.772
[evmd(18987)]CRS-1401:EVMD started on node rac1.
2010-04-27 14:51:21.892
[crsd(19064)]CRS-1012:The OCR service started on node rac1.
2010-04-27 14:51:25.843
[crsd(19064)]CRS-1201:CRSD started on node rac1.
2010-04-27 14:51:41.560
[cssd(19614)]CRS-1605:CSSD voting file is online: /dev/raw/raw5. Details in /opt/oracle/product/10gR2/crs/log/rac1/cssd/ocssd.log.
2010-05-04 09:43:20.631
[cssd(19614)]CRS-1612:node rac2 (2) at 50% heartbeat fatal, eviction in 29.094 seconds
2010-05-04 09:43:35.629
[cssd(19614)]CRS-1611:node rac2 (2) at 75% heartbeat fatal, eviction in 14.104 seconds
2010-05-28 16:39:21.378
[cssd(6886)]CRS-1605:CSSD voting file is online: /dev/raw/raw3. Details in /opt/oracle/product/10gR2/crs/log/rac1/cssd/ocssd.log.
2010-05-28 16:39:21.409
[cssd(6886)]CRS-1605:CSSD voting file is online: /dev/raw/raw4. Details in /opt/oracle/product/10gR2/crs/log/rac1/cssd/ocssd.log.
2010-05-28 16:39:21.409
[cssd(6886)]CRS-1605:CSSD voting file is online: /dev/raw/raw5. Details in /opt/oracle/product/10gR2/crs/log/rac1/cssd/ocssd.log.
2010-05-28 16:39:21.409
[cssd(6886)]CRS-1604:CSSD voting file is offline: /dev/raw/raw5. Details in /opt/oracle/product/10gR2/crs/log/rac1/cssd/ocssd.log.
2010-05-28 16:39:41.914
[cssd(6886)]CRS-1605:CSSD voting file is online: /dev/raw/raw5. Details in /opt/oracle/product/10gR2/crs/log/rac1/cssd/ocssd.log.
[cssd(6886)]CRS-1601:CSSD Reconfiguration complete. Active nodes are rac1 rac2 .
2010-05-28 16:39:42.983
[crsd(6390)]CRS-1012:The OCR service started on node rac1.
2010-05-28 16:39:42.986
[evmd(6309)]CRS-1401:EVMD started on node rac1.
2010-05-28 16:39:47.065
[crsd(6390)]CRS-1201:CRSD started on node rac1.
2010-05-31 20:27:41.384
[cssd(7144)]CRS-1605:CSSD voting file is online: /dev/raw/raw3. Details in /opt/oracle/product/10gR2/crs/log/rac1/cssd/ocssd.log.
2010-05-31 20:27:41.386
[cssd(7144)]CRS-1605:CSSD voting file is online: /dev/raw/raw4. Details in /opt/oracle/product/10gR2/crs/log/rac1/cssd/ocssd.log.
2010-05-31 20:27:41.391
[cssd(7144)]CRS-1605:CSSD voting file is online: /dev/raw/raw5. Details in /opt/oracle/product/10gR2/crs/log/rac1/cssd/ocssd.log.
[cssd(7144)]CRS-1601:CSSD Reconfiguration complete. Active nodes are rac1 rac2 .
2010-05-31 20:27:51.189
[crsd(6463)]CRS-1012:The OCR service started on node rac1.
2010-05-31 20:27:51.192
[evmd(6476)]CRS-1401:EVMD started on node rac1.
2010-05-31 20:27:54.869
[crsd(6463)]CRS-1201:CRSD started on node rac1.

oracle@rac2:~/product/10gR2/crs/log/rac2> tail -50 alertrac2.log
2010-05-28 16:39:14.626
[cssd(6935)]CRS-1604:CSSD voting file is offline: /dev/raw/raw5. Details in /opt/oracle/product/10gR2/crs/log/rac2/cssd/ocssd.log.
[cssd(6935)]CRS-1601:CSSD Reconfiguration complete. Active nodes are rac2 .
2010-05-28 16:39:22.294
[crsd(6361)]CRS-1005:The OCR upgrade was completed. Version has changed from 169870336 to 169870336. Details in /opt/oracle/product/10gR2/crs/log/rac2/crsd/crsd.log.
2010-05-28 16:39:22.294
[crsd(6361)]CRS-1012:The OCR service started on node rac2.
2010-05-28 16:39:22.302
[evmd(6368)]CRS-1401:EVMD started on node rac2.
2010-05-28 16:39:24.563
[crsd(6361)]CRS-1201:CRSD started on node rac2.
2010-05-28 16:39:25.087
[crsd(6361)]CRS-1205:Auto-start failed for the CRS resource . Details in rac2.
2010-05-28 16:39:25.235
[crsd(6361)]CRS-1205:Auto-start failed for the CRS resource . Details in rac2.
2010-05-28 16:39:32.562
[crsd(6361)]CRS-1205:Auto-start failed for the CRS resource . Details in rac2.
2010-05-28 16:39:35.459
[crsd(6361)]CRS-1205:Auto-start failed for the CRS resource . Details in rac2.
2010-05-28 16:39:35.625
[cssd(6935)]CRS-1605:CSSD voting file is online: /dev/raw/raw5. Details in /opt/oracle/product/10gR2/crs/log/rac2/cssd/ocssd.log.
2010-05-28 16:39:38.436
[crsd(6361)]CRS-1205:Auto-start failed for the CRS resource . Details in rac2.
2010-05-28 16:39:38.464
[crsd(6361)]CRS-1205:Auto-start failed for the CRS resource . Details in rac2.
[cssd(6935)]CRS-1601:CSSD Reconfiguration complete. Active nodes are rac1 rac2 .
2010-05-31 20:27:25.441
[cssd(7264)]CRS-1605:CSSD voting file is online: /dev/raw/raw3. Details in /opt/oracle/product/10gR2/crs/log/rac2/cssd/ocssd.log.
2010-05-31 20:27:25.546
[cssd(7264)]CRS-1605:CSSD voting file is online: /dev/raw/raw4. Details in /opt/oracle/product/10gR2/crs/log/rac2/cssd/ocssd.log.
2010-05-31 20:27:25.566
[cssd(7264)]CRS-1605:CSSD voting file is online: /dev/raw/raw5. Details in /opt/oracle/product/10gR2/crs/log/rac2/cssd/ocssd.log.
[cssd(7264)]CRS-1601:CSSD Reconfiguration complete. Active nodes are rac2 .
2010-05-31 20:27:40.260
[crsd(6611)]CRS-1005:The OCR upgrade was completed. Version has changed from 169870336 to 169870336. Details in /opt/oracle/product/10gR2/crs/log/rac2/crsd/crsd.log.
2010-05-31 20:27:40.261
[crsd(6611)]CRS-1012:The OCR service started on node rac2.
2010-05-31 20:27:40.268
[evmd(6474)]CRS-1401:EVMD started on node rac2.
2010-05-31 20:27:42.244
[crsd(6611)]CRS-1201:CRSD started on node rac2.
2010-05-31 20:27:44.114
[crsd(6611)]CRS-1205:Auto-start failed for the CRS resource . Details in rac2.
2010-05-31 20:27:44.277

查看crs日志
oracle@rac1:~/product/10gR2/crs/log/rac1/crsd> tail  crsd.log
2010-05-31 20:28:19.669: [  CRSRES][2682809248]0startRunnable: setting CLI values
2010-05-31 20:28:19.718: [  CRSRES][2691201952]0startRunnable: setting CLI values
2010-05-31 20:28:35.457: [  CRSRES][2707987360]0startRunnable: setting CLI values
2010-05-31 20:28:38.307: [  CRSRES][3065169584]0 startup = 1
2010-05-31 20:28:38.512: [  CRSRES][3065169584]0 startup = 1
2010-05-31 20:28:38.668: [  CRSRES][2699594656]0CRS-1002: Resource 'ora.rac1.LISTENER_RAC1.lsnr' is already running on member 'rac1'

2010-05-31 20:28:38.746: [  CRSRES][2707987360]0startRunnable: setting CLI values
2010-05-31 20:28:38.885: [  CRSRES][2707987360]0Attempting to start `ora.rac1.ASM1.asm` on member `rac1`
2010-05-31 20:28:46.292: [  CRSRES][2707987360]0Start of `ora.rac1.ASM1.asm` on member `rac1` succeeded.
2010-05-31 20:28:46.488: [  CRSRES][2707987360]0startRunnable: setting CLI values
2010-05-31 20:28:46.627: [  CRSRES][2707987360]0Attempting to start `ora.orcl.orcl1.inst` on member `rac1`
2010-05-31 20:29:22.143: [  CRSRES][2707987360]0Start of `ora.orcl.orcl1.inst` on member `rac1` succeeded.
2010-05-31 20:29:27.132: [  OCRUTL][2919832480]u_freem: mem passed is null
2010-06-01 11:10:37.736: [  CRSRES][2699594656]0startRunnable: setting CLI values
oracle@rac2:~/product/10gR2/crs/log/rac2/crsd> tail  crsd.log
2010-05-31 20:28:35.745: [  CRSRES][2674412448]0Attempting to start `ora.rac1.LISTENER_RAC1.lsnr` on member `rac1`
2010-05-31 20:28:37.980: [  CRSRES][2674412448]0Start of `ora.rac1.LISTENER_RAC1.lsnr` on member `rac1` succeeded.
2010-06-01 11:10:36.615: [  CRSAPP][2656766880]0CheckResource error for ora.rac2.vip error code = 1
2010-06-01 11:10:36.616: [  CRSRES][2656766880]0In stateChanged, ora.rac2.vip target is ONLINE
2010-06-01 11:10:36.617: [  CRSRES][2656766880]0ora.rac2.vip on rac2 went OFFLINE unexpectedly
2010-06-01 11:10:36.617: [  CRSRES][2656766880]0StopResource: setting CLI values
2010-06-01 11:10:36.643: [  CRSRES][2656766880]0Attempting to stop `ora.rac2.vip` on member `rac2`
2010-06-01 11:10:36.915: [  CRSRES][2656766880]0Stop of `ora.rac2.vip` on member `rac2` succeeded.
2010-06-01 11:10:36.915: [  CRSRES][2656766880]0ora.rac2.vip RESTART_COUNT=0 RESTART_ATTEMPTS=0
2010-06-01 11:10:36.943: [  CRSRES][2656766880]0ora.rac2.vip failed on rac2 relocating.
2010-06-01 11:10:37.052: [  CRSRES][2656766880]0StopResource: setting CLI values
2010-06-01 11:10:37.067: [  CRSRES][2656766880]0Attempting to stop `ora.rac2.LISTENER_RAC2.lsnr` on member `rac2`
2010-06-01 11:10:37.360: [  CRSRES][2656766880]0Stop of `ora.rac2.LISTENER_RAC2.lsnr` on member `rac2` succeeded.
2010-06-01 11:10:37.507: [  CRSRES][2656766880]0Attempting to start `ora.rac2.vip` on member `rac1`
2010-06-01 11:10:47.477: [  CRSRES][2656766880]0Start of `ora.rac2.vip` on member `rac1` succeeded.
2010-06-01 11:10:47.760: [  CRSRES][2699590560]0CRS-1002: Resource 'ora.rac2.vip' is already running on member 'rac1'

查看racg里面日志
ora.rac1.vip.log里面并没有6月1号的内容。ora.rac2.vip.log里面有ping to 192.168.10.254 via eth0 failed, rc = 1 (host=rac1) ping to 192.168.10.254 via eth0 failed, rc = 1 (host=rac1)失败。

oracle@rac1:~/product/10gR2/crs/log/rac1/racg> tail -50 ora.rac1.vip.log

2010-05-30 06:23:29.834: [    RACG][3066017456] [7155][3066017456][ora.rac1.vip]: clsrcexecut: env ORACLE_CONFIG_HOME=/opt/oracle/product/10gR2/crs

2010-05-30 06:23:29.834: [    RACG][3066017456] [7155][3066017456][ora.rac1.vip]: clsrcexecut: cmd = /opt/oracle/product/10gR2/crs/bin/racgeut -e _USR_ORA_DEBUG=0 54 /opt/oracle/product/10gR2/crs/bin/racgvip check rac1

2010-05-30 06:23:29.834: [    RACG][3066017456] [7155][3066017456][ora.rac1.vip]: clsrcexecut: rc = 1, time = 6.220s

2010-05-30 06:23:29.834: [    RACG][3066017456] [7155][3066017456][ora.rac1.vip]: end for resource = ora.rac1.vip, action = check, status = 1, time = 6.250s

2010-05-31 09:22:06.114: [    RACG][3065804464] [3333][3065804464][ora.rac1.vip]: IP:192.168.10.25 is already up in the network (host=rac1)
IP:192.168.10.25 is already up in the network (host=rac1)
ping to 192.168.10.254 via eth0 failed, rc = 1 (host=rac1)
ping to 192.168.10.254 via eth0 failed, rc = 1 (host=rac1)

2010-05-31 09:22:06.114: [    RACG][3065804464] [3333][3065804464][ora.rac1.vip]: Interface eth0 checked failed (host=rac1)
Invalid parameters, or failed to bring up VIP (host=rac1)

2010-05-31 09:22:06.114: [    RACG][3065804464] [3333][3065804464][ora.rac1.vip]: clsrcexecut: env ORACLE_CONFIG_HOME=/opt/oracle/product/10gR2/crs

2010-05-31 09:22:06.114: [    RACG][3065804464] [3333][3065804464][ora.rac1.vip]: clsrcexecut: cmd = /opt/oracle/product/10gR2/crs/bin/racgeut -e _USR_ORA_DEBUG=0 54 /opt/oracle/product/10gR2/crs/bin/racgvip start rac1

2010-05-31 09:22:06.114: [    RACG][3065804464] [3333][3065804464][ora.rac1.vip]: clsrcexecut: rc = 1, time = 13.020s

2010-05-31 09:22:09.326: [    RACG][3065804464] [3333][3065804464][ora.rac1.vip]: clsrcexecut: env ORACLE_CONFIG_HOME=/opt/oracle/product/10gR2/crs

2010-05-31 09:22:09.326: [    RACG][3065804464] [3333][3065804464][ora.rac1.vip]: clsrcexecut: cmd = /opt/oracle/product/10gR2/crs/bin/racgeut -e _USR_ORA_DEBUG=0 54 /opt/oracle/product/10gR2/crs/bin/racgvip check rac1

2010-05-31 09:22:09.326: [    RACG][3065804464] [3333][3065804464][ora.rac1.vip]: clsrcexecut: rc = 1, time = 3.210s

2010-05-31 09:22:09.326: [    RACG][3065804464] [3333][3065804464][ora.rac1.vip]: end for resource = ora.rac1.vip, action = start, status = 1, time = 16.240s

2010-05-31 09:24:06.451: [    RACG][3065546416] [4295][3065546416][ora.rac1.vip]: ping to 192.168.10.254 via eth0 failed, rc = 1 (host=rac1)
ping to 192.168.10.254 via eth0 failed, rc = 1 (host=rac1)

2010-05-31 09:24:08.552: [    RACG][3065816752] [4410][3065816752][ora.rac1.vip]: clsrcstartorp: Error with malloc
2010-05-31 20:28:35.167: [    RACG][3065271984] [7652][3065271984][ora.rac1.vip]: ping to 192.168.10.254 via eth0 failed, rc = 1 (host=rac1)
ping to 192.168.10.254 via eth0 failed, rc = 1 (host=rac1)


oracle@rac1:~/product/10gR2/crs/log/rac1/racg> tail -30 ora.rac2.vip.log

2010-05-31 15:40:11.835: [    RACG][3065345712] [9437][3065345712][ora.rac2.vip]: clsrcexecut: env ORACLE_CONFIG_HOME=/opt/oracle/product/10gR2/crs

2010-05-31 15:40:11.835: [    RACG][3065345712] [9437][3065345712][ora.rac2.vip]: clsrcexecut: cmd = /opt/oracle/product/10gR2/crs/bin/racgeut -e _USR_ORA_DEBUG=0 54 /opt/oracle/product/10gR2/crs/bin/racgvip check rac2

2010-05-31 15:40:11.835: [    RACG][3065345712] [9437][3065345712][ora.rac2.vip]: clsrcexecut: rc = 1, time = 3.200s

2010-05-31 15:40:11.835: [    RACG][3065345712] [9437][3065345712][ora.rac2.vip]: end for resource = ora.rac2.vip, action = start, status = 1, time = 12.450s

2010-05-31 15:40:24.489: [    RACG][3066042032] [9611][3066042032][ora.rac2.vip]: ping to 192.168.10.254 via eth0 failed, rc = 1 (host=rac1)
ping to 192.168.10.254 via eth0 failed, rc = 1 (host=rac1)

2010-05-31 15:42:37.084: [    RACG][3065276080] [10567][3065276080][ora.rac2.vip]: ping to 192.168.10.254 via eth0 failed, rc = 1 (host=rac1)
ping to 192.168.10.254 via eth0 failed, rc = 1 (host=rac1)
Interface eth0 checked failed (host=rac1)
Invalid parameters, or failed to bring up VIP (host=rac1)

2010-05-31 15:42:37.085: [    RACG][3065276080] [10567][3065276080][ora.rac2.vip]: clsrcexecut: env ORACLE_CONFIG_HOME=/opt/oracle/product/10gR2/crs

2010-05-31 15:42:37.085: [    RACG][3065276080] [10567][3065276080][ora.rac2.vip]: clsrcexecut: cmd = /opt/oracle/product/10gR2/crs/bin/racgeut -e _USR_ORA_DEBUG=0 54 /opt/oracle/product/10gR2/crs/bin/racgvip check rac2

2010-05-31 15:42:37.085: [    RACG][3065276080] [10567][3065276080][ora.rac2.vip]: clsrcexecut: rc = 1, time = 6.200s

2010-05-31 15:42:37.085: [    RACG][3065276080] [10567][3065276080][ora.rac2.vip]: end for resource = ora.rac2.vip, action = check, status = 1, time = 6.230s

2010-05-31 15:42:37.360: [    RACG][3065542320] [10656][3065542320][ora.rac2.vip]: bounce eth0 (host=rac1)

2010-06-01 11:10:47.790: [    RACG][3065239216] [31482][3065239216][ora.rac2.vip]: ping to 192.168.10.254 via eth0 failed, rc = 1 (host=rac1)
ping to 192.168.10.254 via eth0 failed, rc = 1 (host=rac1)

通过上面日志显示节点不能ping通默认网关,分别查看两台服务器网关,节点一与节点二的默认网关都为192.168.10.254。但是不能ping通,查看其他服务器也不能ping通网关,后来确认是移动在防火墙上面做了限制不能ping。关闭节点,并修改网关为192.168.10.11,在启动服务。
oracle@rac1:~/product/10gR2/crs/log/rac1> crs_stat -t
Name           Type           Target    State     Host       
------------------------------------------------------------
ora.orcl.db    application    ONLINE    ONLINE    rac1       
ora....l1.inst application    ONLINE    ONLINE    rac1       
ora....l2.inst application    ONLINE    ONLINE    rac2       
ora....SM1.asm application    ONLINE    ONLINE    rac1       
ora....C1.lsnr application    ONLINE    ONLINE    rac1       
ora.rac1.gsd   application    ONLINE    ONLINE    rac1       
ora.rac1.ons   application    ONLINE    ONLINE    rac1       
ora.rac1.vip   application    ONLINE    ONLINE    rac1       
ora....SM2.asm application    ONLINE    ONLINE    rac2       
ora....C2.lsnr application    ONLINE    ONLINE    rac2       
ora.rac2.gsd   application    ONLINE    ONLINE    rac2       
ora.rac2.ons   application    ONLINE    ONLINE    rac2       
ora.rac2.vip   application    ONLINE    ONLINE    rac2  

能够正常启动。但还需要通过检查确认。继续观察。。。。
参考文档:http://software.digitalchina.com/fw/jsyk-show.asp?column_id=316&column_cat_id=41

疑问:1。既然不能ping通网关,那为什么两节点都能够正常启动起来。因为在做吉林RAC的时候,设置了网关但是不能ping通,节点不能启动。
      2。为什么每次都只是一个节点的监听断掉,节点都不确定。
      3。为什么在alertrac1.log,alertrac2.log中并没有存在记录。

后续观察情况:未出现前几天出现监听自动断掉情况(6月2日)

来自 “ ITPUB博客 ” ,链接:http://blog.itpub.net/22779291/viewspace-664180/,如需转载,请注明出处,否则将追究法律责任。

转载于:http://blog.itpub.net/22779291/viewspace-664180/

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值