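Oracle version: 10.2.0.5 RAC
Operating system: Red Hat 5.8
Symptom: the VIP and the listener fail to start
For audit reasons, the customer had to remove the default gateway and add routes manually instead. After the default gateway was removed, the VIP would not start on two RAC clusters. We resolved the problem after analyzing the logs.
The VIP log showed the following errors, which appear to be related to the default gateway: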
2016-01-17 12:45:13.230: [ RACG][3049252608] [1549][3049252608][ora.racdb1.vip]:checkIf: Default gateway is not defined (host=racdb1)
Interface eth0 checked failed (host=racdb1)
Invalid parameters, or failed to bring up VIP (host=racdb1)
I then tried to start the VIP resource on its own, which failed with the following errors:
crs_start ora.racdb1.vip
Attempting to start 'ora.racdb1.vip' on member 'racdb1'
Start of 'ora.racdb1.vip' on member 'racdb1' failed.
Attempting to start 'ora.racdb1.vip' on member 'racdb1'
Start of 'ora.racdb1.vip' on member 'racdb1' failed.
CRS-1006: No more members to consider
CRS-0215: Could not start resource 'ora.racdb1.vip'.
From MOS and other references, we found that Oracle 10g RAC checks for a default gateway when it starts the VIP. If no default gateway exists, the start fails.
(10.2/11.1: CRS-0215: Could not start resource 'ora.<nodename>.vip' due to gateway issue (Doc ID 356535.1))
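Before touching the clusterware, it helps to confirm whether the node really has no default route. The following is a minimal sketch, assuming the routing table is listed with `ip route show`. The function name `has_default_route` is made up for illustration and is not part of any Oracle tooling.

```shell
#!/bin/sh
# Sketch: report whether a routing table listing contains a default route.
# Reads the text printed by `ip route show` on stdin.
# Returns 0 if a default route is present, non-zero otherwise.
has_default_route() {
    # `ip route show` prints the default route as a line starting with "default".
    grep -q '^default[[:space:]]'
}

# Usage on a live node (not executed here):
#   if ip route show | has_default_route; then
#       echo "default gateway defined"
#   else
#       echo "no default gateway"
#   fi
```

On the nodes described here, the check would report that no default gateway is defined, which matches the "Default gateway is not defined" message in the VIP log.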
# When the script sets the VIP to an interface, it adds a route to default
# gateway for that interface. It makes sure the node will use the interface
# which VIP is set for going network traffic. ……
# - Variable FAIL_WHEN_DEFAULTGW_NOT_FOUND to configure if checkIf() returns
# failure when default gateway is not found. If mii-tool works,
# default gateway is not needed in checkIf().
FAIL_WHEN_DEFAULTGW_NOT_FOUND=0
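The decision these comments describe can be modelled roughly as below. This is a hypothetical sketch and not the racgvip source. The function name `gateway_check_result` and its arguments are assumptions made for illustration, and the mii-tool path mentioned in the comments is left out.

```shell
#!/bin/sh
# Hypothetical model of the behaviour the script comments describe.
# This is NOT the racgvip source and it ignores the mii-tool branch.
#   $1 = default gateway found by the caller ("" if none was found)
#   $2 = value of FAIL_WHEN_DEFAULTGW_NOT_FOUND (1 = fail, 0 = tolerate)
gateway_check_result() {
    gw="$1"
    fail_flag="$2"
    if [ -z "$gw" ] && [ "$fail_flag" = "1" ]; then
        # No gateway and the flag demands one: report failure.
        echo "Default gateway is not defined"
        return 1
    fi
    # Either a gateway exists or its absence is tolerated.
    return 0
}
```

With the flag set to 0, a missing default gateway no longer makes the check fail, which is the behaviour wanted after the customer removed the gateway.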
This document is being delivered to you via Oracle Support's Rapid Visibility (RaV) process and therefore has not been subject to an independent technical review.
APPLIES TO:
Oracle Database - Enterprise Edition - Version 10.1.0.4 to 11.1.0.7 [Release 10.1 to 11.1]
Information in this document applies to any platform.
Oracle Server Enterprise Edition - Version: 10.1.0.4 to 11.1.0.7
SYMPTOMS
The output of crs_stat -t shows the VIP as OFFLINE, and trying to start it gives the error:
CRS-0215: Could not start resource 'ora.dbtest2.vip'.
Example: crs_stat -t
Name            Type         Target   State    Host
------------------------------------------------------------
ora....st2.gsd  application  ONLINE   ONLINE   dbtest2
ora....st2.ons  application  ONLINE   ONLINE   dbtest2
ora....st2.vip  application  ONLINE   OFFLINE
# ./srvctl start nodeapps -n dbtest2
dbtest2:ora.dbtest2.vip:Interface eri0 checked failed (host=dbtest2)
dbtest2:ora.dbtest2.vip:Failed to start VIP 10.11.11.198 (host=dbtest2)
dbtest2:ora.dbtest2.vip:Interface eri0 checked failed (host=dbtest2)
dbtest2:ora.dbtest2.vip:Failed to start VIP 10.11.11.198 (host=dbtest2)
CRS-1006: No more members to consider
CRS-0215: Could not start resource 'ora.dbtest2.vip'.
CRS-0210: Could not find resource ora.dbtest2.LISTENER_DBTEST2.lsnr.
To see the nodeapps configuration, in particular for the VIP,
issue the command: srvctl config nodeapps -n <nodename> -a -g -s -l
Example:
/u01/crs/bin/srvctl config nodeapps -n dbtest2 -a -g -s -l
VIP exists.: /dbtest2-vip/10.11.11.198/255.255.255.0/eri0
GSD exists.
ONS daemon exists.
Listener does not exist.
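The "VIP exists." line packs the VIP name, address, netmask and interface into one slash-separated string. As a small sketch, assuming the line always has the shape shown above, it can be split into named fields. The function name `parse_vip_line` is illustrative only.

```shell
#!/bin/sh
# Sketch: split the "VIP exists." line printed by srvctl config nodeapps
# into its four slash-separated fields.
#   $1 = a line such as "VIP exists.: /dbtest2-vip/10.11.11.198/255.255.255.0/eri0"
parse_vip_line() {
    # Drop everything up to and including the first ": /".
    spec="${1#*: /}"
    # Split the remainder on "/" into name, address, netmask, interface.
    IFS=/ read -r vip_name vip_ip vip_mask vip_if <<EOF
$spec
EOF
    printf 'name=%s ip=%s netmask=%s interface=%s\n' \
        "$vip_name" "$vip_ip" "$vip_mask" "$vip_if"
}

# Usage:
#   parse_vip_line "VIP exists.: /dbtest2-vip/10.11.11.198/255.255.255.0/eri0"
```

Having the interface and address as separate values makes it easy to compare them against the operating system's view of that interface.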
To debug further, uncomment the environment variable _USR_ORA_DEBUG=1 in the script $ORA_CRS_HOME/bin/racgvip,
or simply, as the root user, issue the command: crsctl debug log res "ora.dbtest2.vip:5"
You can turn debugging off again with the command: crsctl debug log res "ora.dbtest2.vip:0"
Then start the VIP again using srvctl start nodeapps. On 10.2 and above, this writes a log of the VIP start problem
to the directory $ORA_CRS_HOME/log/<nodename>/racg/*vip.log
Example: the last lines of the *vip.log show:
2005-02-09 20:38:06.711: [ RACG][1] [5602][1][ora.dbtest2.vip]:
203800 [ 5604 ] Checking interface existance
203800 [ 5604 ] Calling getifbyip
203800 [ 5604 ] getifbyip: started for 10.11.11.198
203800 [ 5604 ] getifbyip: returning IF eri0:1
203800 [ 5604 ] Completed getifbyip eri0:1
203801 [ 5604 ] Completed with in
2005-02-09 20:38:06.711: [ RACG][1] [5602][1][ora.dbtest2.vip]:
itial interface test
203801 [ 5604 ] checkIf: start for if=eri0
203801 [ 5604 ] checkIf: -z defaultgw
203801 [ 5604 ] defaultgw: started
203801 [ 5604 ] defaultgw: completed with 10.11.11.1
203801 [ 5604 ] checkIf: -n defaultgw
203804 [ 5604 ] checkIf:
2005-02-09 20:38:06.711: [ RACG][1] [5602][1][ora.dbtest2.vip]:
in while, before sleep
203805 [ 5604 ] checkIf: in while, before sleep
203806 [ 5604 ] checkIf: checked if=eri0 failed
Interface eri0 checked failed (host=dbtest2)
203806 [ 5604 ] checkIf: end for if=eri0
203806 [ 5604 ] Performing CRS_STAT testing
203806
2005-02-09 20:38:06.711: [ RACG][1] [5602][1][ora.dbtest2.vip]:
[ 5604 ] Completed CRS_STAT testing
203806 [ 5604 ] Completed second gateway test
203806 [ 5604 ] Interface tests
Failed to start VIP 10.11.11.198 (host=dbtest2)
From the above we see that the VIP 10.11.11.198 is correct; however, the Oracle function checkIf
fails when it tries to reach the default gateway 10.11.11.1
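To see which gateway the script actually picked up, the address can be pulled out of the debug log. This is a minimal sketch, assuming the log contains "defaultgw: completed with <ip>" entries like the one above. The function name `extract_defaultgw` is made up for illustration.

```shell
#!/bin/sh
# Sketch: read racgvip debug log text on stdin and print the gateway address
# from each line that contains "defaultgw: completed with <ip>".
extract_defaultgw() {
    sed -n 's/.*defaultgw: completed with \([0-9.][0-9.]*\).*/\1/p'
}

# Usage on a node (not executed here; <nodename> is a placeholder):
#   cat $ORA_CRS_HOME/log/<nodename>/racg/*vip.log | extract_defaultgw
# The printed address can then be tested by hand, e.g. with ping.
```

If the address printed here does not answer a ping from the node, that is consistent with the checkIf failure shown in the log.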
CHANGES
- Either the default gateway has been changed to some other IP -or-
- It is on a different network from the client (the network where the VIP is configured) -or-
- It is no longer in use.
CAUSE
By default, the server's default gateway is used as a ping target during the Oracle RAC 10g VIP status check action.
Upon a ping failure, Oracle will decide that the current interface where the VIP is running has failed, and will initiate
an interface / internode VIP failover.
In the case above, CRS was installed on just one node, so the VIP couldn't fail over to another node, and additional errors were reported:
CRS-1006: No more members to consider
CRS-0215: Could not start resource 'ora.dbtest2.vip'.
SOLUTION
10.1.0.4 and above introduced a parameter FAIL_WHEN_DEFAULTGW_NOT_FOUND in the file
$ORA_CRS_HOME/bin/racgvip to address this problem.
The following steps will fix the VIP starting problem for above mentioned scenario.
1- Stop nodeapps.
2- As root,
edit the script $ORA_CRS_HOME/bin/racgvip (for example with vi) and set the
variable FAIL_WHEN_DEFAULTGW_NOT_FOUND=0
3- Start nodeapps; the resources should now show as ONLINE.
You may proceed with netca and dbca to create a RAC database after this.
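The edit in step 2 can also be scripted. The following is a sketch only, assuming the variable is assigned on a line of its own starting at column 0, as in the excerpt quoted earlier. The function name `set_defaultgw_flag` is hypothetical, and a backup copy is kept next to the script.

```shell
#!/bin/sh
# Sketch: set FAIL_WHEN_DEFAULTGW_NOT_FOUND in a copy-safe way.
#   $1 = path to the racgvip script, $2 = new value (0 or 1)
# Assumes the assignment sits on its own line starting at column 0.
set_defaultgw_flag() {
    script="$1"
    value="$2"
    # Keep a backup before editing.
    cp -p "$script" "$script.bak" || return 1
    # Rewrite the assignment; redirecting into the original keeps its permissions.
    sed "s/^FAIL_WHEN_DEFAULTGW_NOT_FOUND=.*/FAIL_WHEN_DEFAULTGW_NOT_FOUND=$value/" \
        "$script.bak" > "$script"
}

# Usage as root (not executed here; <nodename> is a placeholder):
#   srvctl stop nodeapps -n <nodename>
#   set_defaultgw_flag "$ORA_CRS_HOME/bin/racgvip" 0
#   srvctl start nodeapps -n <nodename>
```

Because racgvip lives in the CRS home on each node, the same change would have to be made on every node of the cluster, and it may need to be reapplied after patching if the script is replaced.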