VIPs Often Go Offline Unexpectedly and Relocate to Another Node (Doc ID 1297867.1)

Article link: https://blog.csdn.net/feixiangtianshi/article/details/78109476

In this Document

Symptoms

Cause

Solution

APPLIES TO:

Oracle Database - Enterprise Edition - Version 10.2.0.5 and later
Information in this document applies to any platform.

SYMPTOMS

VIPs often go offline unexpectedly, with the following message in crsd.log:

  2011-02-17 15:11:16.437: [ CRSAPP][11321]32CheckResource error for ora.node02.vip error code = 1
  2011-02-17 15:11:16.441: [ CRSRES][11321]32In stateChanged, ora.node02.vip target is ONLINE
  2011-02-17 15:11:16.441: [ CRSRES][11321]32
  ora.node02.vip on node02 went OFFLINE unexpectedly
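To see how often each VIP is affected, the "went OFFLINE unexpectedly" messages can be counted per resource. The following is a minimal sketch; the function name count_unexpected_offline is hypothetical, and it assumes the log lines look like the excerpt above.

```shell
# count_unexpected_offline FILE
# Print "<count> <resource>" for every resource that a crsd.log-style
# file reports as "went OFFLINE unexpectedly".
count_unexpected_offline() {
  grep 'went OFFLINE unexpectedly' "$1" |
    # Keep only the resource name (for example ora.node02.vip).
    sed 's/.*\(ora\.[^ ]*\) on .* went OFFLINE unexpectedly.*/\1/' |
    sort | uniq -c | awk '{ print $1, $2 }'
}
```

It takes the path to crsd.log as its only argument. On many installations that file is under CRS_HOME/log/<hostname>/crsd/, but this location is an assumption and should be checked on your system.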

VIP tracing is set by using the following commands:

  #crsctl debug log res "ora.node01.vip:5" 
 
 #crsctl debug log res "ora.node02.vip:5" 

The following error messages (note the lines reporting that the check of interface en1 failed) can be seen in the generated VIP trace "CRS_HOME/log/node02:

  2011-02-18 15:32:39.481: [ RACG][1] [4587556][1][ora.node02.vip]: Fri Feb 18 15:32:37 GMT+08:00 2011 [ 8257768 ] About to execute command: /usr/sbin/ping -S 192.168.220.36 -c 1 -w 1 192.168.220.33
  Fri Feb 18 15:32:39 GMT+08:00 2011 [ 8257768 ] IsIfAlive: RX packets checked if=en1 failed
  2011-02-18 15:32:39.481: [ RACG][1] [4587556][1][ora.node02.vip]: Fri Feb 18 15:32:39 GMT+08:00 2011 [ 8257768 ] Interface en1 checked failed (host=node02)
  Fri Feb 18 15:32:39 GMT+08:00 2011 [ 8257768 ] IsIfAlive: end for if=en1
  Fri Feb 18 15:32:39 GMT+08:00 2011 [ 8257768 ] checkIf: end for if=en1
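When the trace file is large, it can help to pull out only the ping commands that were run and the interface checks that failed. The sketch below assumes the trace lines have the same shape as the excerpt above. Both function names are hypothetical.

```shell
# vip_ping_commands FILE
# Print each ping command recorded in a VIP trace file.
vip_ping_commands() {
  sed -n -e 's/[[:space:]]*$//' \
         -e 's/.*About to execute command: *//p' "$1"
}

# vip_failed_checks FILE
# Print "<count> <interface> <host>" for each failed interface check.
vip_failed_checks() {
  grep 'checked failed (host=' "$1" |
    sed 's/.*Interface \([^ ]*\) checked failed (host=\([^)]*\)).*/\1 \2/' |
    sort | uniq -c | awk '{ print $1, $2, $3 }'
}
```

The ping command printed by vip_ping_commands is the one to rerun by hand when testing the network path to the gateway.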

You can reset the VIP tracing to the default level by using the following commands:

#crsctl debug log res "ora.node01.vip:0"
#crsctl debug log res "ora.node02.vip:0"
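On clusters with more than two nodes, the same crsctl command has to be repeated for every VIP. The sketch below only prints the commands so that they can be reviewed first. The function name vip_trace_cmds is hypothetical, and it assumes every VIP resource is named ora.<node>.vip as in this note.

```shell
# vip_trace_cmds LEVEL NODE...
# Print one "crsctl debug log res" command per node for the given level.
vip_trace_cmds() {
  level=$1
  shift
  for node in "$@"; do
    printf 'crsctl debug log res "ora.%s.vip:%s"\n' "$node" "$level"
  done
}

# Example: print the commands that enable level 5 tracing on two nodes.
#   vip_trace_cmds 5 node01 node02
# Example: print the commands that reset tracing to the default level.
#   vip_trace_cmds 0 node01 node02
```

Printing rather than executing is a deliberate choice: the output can be checked and then run as root.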

CAUSE

The issue can be caused by poor network performance: the VIP check pings the gateway from the public IP, and the reply does not arrive in time.

See "man ping" on AIX:

  -S hostname/IP addr
       Uses the IP address as the source address in outgoing ping packets.

  -c Count
       Specifies the number of echo requests, as indicated by the Count
       variable, to be sent (and received).

  -w timeout
       This option works only with the -c option. It causes ping to wait
       for a maximum of 'timeout' seconds for a reply (after sending the
       last packet).

So the following command checks whether one packet sent from 192.168.220.36 to 192.168.220.33 receives a reply within 1 second.

  ping -S 192.168.220.36 -c 1 -w 1 192.168.220.33 
 
 ==>192.168.220.36 is the public IP, 192.168.220.33 is the gateway. 

If the network is slow or dropping packets, the reply to the above "ping" command does not arrive within 1 second and the check fails. This leads to VIPs going offline unexpectedly and relocating to another node.
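A single manual ping may succeed even when the check fails intermittently, so it can be useful to repeat the same command many times and count the failures. The following is a minimal sketch. The function name probe_gateway is hypothetical, and the ping options in the usage comment are the AIX ones quoted in this note; other platforms use different options.

```shell
# probe_gateway COUNT CMD [ARG...]
# Run CMD COUNT times and print how many runs returned a non-zero status.
probe_gateway() {
  count=$1
  shift
  failures=0
  i=0
  while [ "$i" -lt "$count" ]; do
    # A ping that gets no reply within its timeout exits non-zero.
    if ! "$@" >/dev/null 2>&1; then
      failures=$((failures + 1))
    fi
    i=$((i + 1))
  done
  echo "$failures"
}

# Example with the command from this note (AIX syntax):
#   probe_gateway 60 /usr/sbin/ping -S 192.168.220.36 -c 1 -w 1 192.168.220.33
```

A result above zero during the period when VIPs relocate suggests that the 1 second limit is being exceeded.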

SOLUTION

To resolve the issue, please contact your network administrator to tune your network and ensure that the reply to the ping command arrives within 1 second.

If you can't improve the network performance, please use the following temporary workaround (which is not recommended):

  1. Stop all node applications.

     % srvctl stop nodeapps -n <hostname>

  2. Back up, then modify the racgvip script.

     Change:

       # timeout of ping in number of loops (1 sec)
       PING_TIMEOUT=" -c 1 -w 1"

     To:

       # timeout of ping in number of loops (3 sec)
       PING_TIMEOUT=" -c 1 -w 3"

  3. Start the node applications and other necessary resources.

     % srvctl start nodeapps -n <hostname>
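The backup and edit in step 2 can be scripted so that the change is identical on every node. The following is a minimal sketch. The function name patch_ping_timeout is hypothetical, and it assumes the script contains the PING_TIMEOUT line exactly as shown above.

```shell
# patch_ping_timeout SCRIPT
# Save SCRIPT as SCRIPT.bak, then raise the ping timeout from 1 to 3 seconds.
patch_ping_timeout() {
  script=$1
  # Keep an untouched copy next to the script (same mode and timestamps).
  cp -p "$script" "$script.bak" || return 1
  # Read from the backup and overwrite the script in place, so the script
  # keeps its original owner and permissions. Every matching line changes.
  sed -e 's/number of loops (1 sec)/number of loops (3 sec)/' \
      -e 's/PING_TIMEOUT=" -c 1 -w 1"/PING_TIMEOUT=" -c 1 -w 3"/' \
      "$script.bak" > "$script"
}

# Example (the location of racgvip is an assumption; verify it first):
#   patch_ping_timeout "$ORA_CRS_HOME/bin/racgvip"
```

Run it only while the node applications are stopped, as in step 1, and compare the script with its .bak copy before restarting them.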