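While installing Oracle RAC, the graphical installer reported an error at the network interface selection step: the nodes could not reach each other.
[INS-41112] Specified network interface doesnt maintain connectivity across cluster nodes
Database version: 12.1.0.2
Operating system: HP-UX 11.31
Possible causes:
1. The /etc/hosts file
2. SSH equivalence: besides the public network, equivalence over the interconnect (heartbeat) network must be verified as well
3. A gateway or firewall (especially common on Linux systems)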
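The first item in the list can be checked mechanically. The following is a minimal sketch, not part of the original troubleshooting: it parses hosts-file text and reports any node name that is missing or mapped to an unexpected address. The `-priv` host names and the sample file content are assumptions made up for illustration; only the four IP addresses and the names xcywa1 and xcywa2 come from the cluvfy output in this post.

```python
def parse_hosts(text):
    """Map each host name or alias in hosts-file text to the set of IPs it appears with."""
    names = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        ip, hostnames = fields[0], fields[1:]
        for name in hostnames:
            names.setdefault(name, set()).add(ip)
    return names


def check_hosts(text, expected):
    """Return a list of problems: names that are missing or map to the wrong IP."""
    names = parse_hosts(text)
    problems = []
    for name, ip in expected.items():
        found = names.get(name, set())
        if not found:
            problems.append(f"{name}: missing")
        elif found != {ip}:
            problems.append(f"{name}: expected {ip}, found {sorted(found)}")
    return problems


# Hypothetical hosts content; the "-priv" names are assumed for illustration.
SAMPLE_HOSTS = """\
10.241.8.10   xcywa1
10.241.8.11   xcywa2
10.10.10.10   xcywa1-priv
10.10.10.11   xcywa2-priv   # interconnect
"""

EXPECTED = {
    "xcywa1": "10.241.8.10",
    "xcywa2": "10.241.8.11",
    "xcywa1-priv": "10.10.10.10",
    "xcywa2-priv": "10.10.10.11",
}

if __name__ == "__main__":
    # An empty list means every expected name maps to exactly one, correct IP.
    print(check_hosts(SAMPLE_HOSTS, EXPECTED))
```

Running the same check against the real /etc/hosts content of every node is a quick way to confirm that the files agree with each other.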
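Our checks found nothing wrong on the database side: the hosts file and SSH equivalence were both correct. We have also installed Oracle RAC on HP-UX in production many times, so a configuration mistake on our side was unlikely. We asked the on-site HP engineer, who said that HP-UX has no command for turning off a firewall the way Linux does, and that no firewall was present.
The on-site network engineers likewise said that the network had no security restrictions. They cleared some leftover configuration that might have interfered, but that did not help either. We seemed to be at a stalemate, with every party saying the fault was not on their side.
Further analysis:
lan16 is the public network and lan17 is the interconnect (heartbeat) network. The installer reported no error for the interconnect, and the heartbeat link is confirmed to be a direct connection. lan16, the public network, passes through a 10 Gigabit switch, so the problem looked like a security restriction or a firewall on the network side.
Running the pre-check script produced the following errors: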
xcywa2:/u01/media/grid> ./runcluvfy.sh comp nodecon -i lan16 -n xcywa1,xcywa2 -verbose
Verifying node connectivity
Checking node connectivity...
Checking hosts config file...
Node Name Status
------------------------------------ ------------------------
xcywa2 passed
xcywa1 passed
Verification of the hosts config file successful
Interface information for node "xcywa2"
Name IP Address Subnet Gateway Def. Gateway HW Address MTU
------ --------------- --------------- --------------- --------------- ----------------- ------
lan16 10.241.8.11 10.241.8.0 10.241.8.11 10.241.8.254 8A:98:AD:92:B3:BD 1500
Interface information for node "xcywa1"
Name IP Address Subnet Gateway Def. Gateway HW Address MTU
------ --------------- --------------- --------------- --------------- ----------------- ------
lan16 10.241.8.10 10.241.8.0 10.241.8.10 10.241.8.254 F2:97:2C:48:01:C1 1500
Check: Node connectivity using interfaces on subnet "10.241.8.0"
Check: Node connectivity of subnet "10.241.8.0"
Source Destination Connected?
------------------------------ ------------------------------ ----------------
xcywa2[10.241.8.11] xcywa1[10.241.8.10] yes
Result: Node connectivity passed for subnet "10.241.8.0" with node(s) xcywa2,xcywa1
Check: TCP connectivity of subnet "10.241.8.0"
Source Destination Connected?
------------------------------ ------------------------------ ----------------
xcywa2 : 10.241.8.11 xcywa2 : 10.241.8.11 passed
xcywa1 : 10.241.8.10 xcywa2 : 10.241.8.11 failed
ERROR:
PRVG-11850 : The system call "connect" failed with error "238" while executing exectask on node "xcywa1"
Connection timed out
xcywa2 : 10.241.8.11 xcywa1 : 10.241.8.10 failed
ERROR:
PRVG-11850 : The system call "connect" failed with error "238" while executing exectask on node "xcywa2"
Connection timed out
xcywa1 : 10.241.8.10 xcywa1 : 10.241.8.10 failed
ERROR:
PRVG-11850 : The system call "connect" failed with error "239" while executing exectask on node "xcywa1"
Connection refused
Result: TCP connectivity check failed for subnet "10.241.8.0"
Checking subnet mask consistency...
Subnet mask consistency check passed for subnet "10.241.8.0".
Subnet mask consistency check passed.
Result: Node connectivity check failed
Verification of node connectivity was unsuccessful on all the specified nodes.
xcywa2:/u01/media/grid> ./runcluvfy.sh comp nodecon -i lan17 -n xcywa1,xcywa2 -verbose
Verifying node connectivity
Checking node connectivity...
Checking hosts config file...
Node Name Status
------------------------------------ ------------------------
xcywa2 passed
xcywa1 passed
Verification of the hosts config file successful
Interface information for node "xcywa2"
Name IP Address Subnet Gateway Def. Gateway HW Address MTU
------ --------------- --------------- --------------- --------------- ----------------- ------
lan17 10.10.10.11 10.10.10.0 10.10.10.11 10.241.8.254 D6:2A:4D:6B:98:8C 1500
Interface information for node "xcywa1"
Name IP Address Subnet Gateway Def. Gateway HW Address MTU
------ --------------- --------------- --------------- --------------- ----------------- ------
lan17 10.10.10.10 10.10.10.0 10.10.10.10 10.241.8.254 F2:69:5F:F0:72:53 1500
Check: Node connectivity using interfaces on subnet "10.10.10.0"
Check: Node connectivity of subnet "10.10.10.0"
Source Destination Connected?
------------------------------ ------------------------------ ----------------
xcywa2[10.10.10.11] xcywa1[10.10.10.10] yes
Result: Node connectivity passed for subnet "10.10.10.0" with node(s) xcywa2,xcywa1
Check: TCP connectivity of subnet "10.10.10.0"
Source Destination Connected?
------------------------------ ------------------------------ ----------------
xcywa2 : 10.10.10.11 xcywa2 : 10.10.10.11 passed
xcywa1 : 10.10.10.10 xcywa2 : 10.10.10.11 failed
ERROR:
PRVG-11850 : The system call "connect" failed with error "238" while executing exectask on node "xcywa1"
Connection timed out
xcywa2 : 10.10.10.11 xcywa1 : 10.10.10.10 failed
ERROR:
PRVG-11850 : The system call "connect" failed with error "238" while executing exectask on node "xcywa2"
Connection timed out
xcywa1 : 10.10.10.10 xcywa1 : 10.10.10.10 failed
ERROR:
PRVG-11850 : The system call "connect" failed with error "239" while executing exectask on node "xcywa1"
Connection refused
Result: TCP connectivity check failed for subnet "10.10.10.0"
Checking subnet mask consistency...
Subnet mask consistency check passed for subnet "10.10.10.0".
Subnet mask consistency check passed.
Result: Node connectivity check failed
Verification of node connectivity was unsuccessful on all the specified nodes.
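With two long listings to compare, it can help to reduce each one to its TCP connectivity rows. The sketch below is an illustration only: it assumes the row and PRVG-11850 layout exactly as they appear in the output above, and the regular expressions are not an official cluvfy format definition.

```python
import re

# Row layout as seen in the listing: "node : ip  node : ip  passed|failed"
ROW = re.compile(r"^(\S+) : (\S+)\s+(\S+) : (\S+)\s+(passed|failed)\s*$")
ERR = re.compile(r'PRVG-11850 : The system call "(\w+)" failed with error "(\d+)"')


def parse_tcp_checks(output):
    """Return one dict per TCP connectivity row, with errno and reason if it failed."""
    results = []
    for raw in output.splitlines():
        line = raw.strip()
        row = ROW.match(line)
        if row:
            _, src_ip, _, dst_ip, status = row.groups()
            results.append({"source": src_ip, "destination": dst_ip,
                            "status": status, "errno": None, "reason": None})
            continue
        if not results:
            continue
        err = ERR.search(line)
        if err:
            results[-1]["errno"] = int(err.group(2))
        elif results[-1]["errno"] is not None and results[-1]["reason"] is None and line:
            # The first non-empty line after the PRVG-11850 line is the error text.
            results[-1]["reason"] = line
    return results


SAMPLE = '''\
xcywa2 : 10.241.8.11 xcywa2 : 10.241.8.11 passed
xcywa1 : 10.241.8.10 xcywa2 : 10.241.8.11 failed
ERROR:
PRVG-11850 : The system call "connect" failed with error "238" while executing exectask on node "xcywa1"
Connection timed out
xcywa2 : 10.241.8.11 xcywa1 : 10.241.8.10 failed
ERROR:
PRVG-11850 : The system call "connect" failed with error "238" while executing exectask on node "xcywa2"
Connection timed out
xcywa1 : 10.241.8.10 xcywa1 : 10.241.8.10 failed
ERROR:
PRVG-11850 : The system call "connect" failed with error "239" while executing exectask on node "xcywa1"
Connection refused
Result: TCP connectivity check failed for subnet "10.241.8.0"
'''

if __name__ == "__main__":
    for check in parse_tcp_checks(SAMPLE):
        print(check)
```

Applied to both listings, this makes it easy to see that lan16 and lan17 fail in the same pattern: the two cross-node connections time out, and the xcywa1-to-xcywa1 connection is refused.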
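We can see that the checks fail on both the public network and the interconnect, yet the graphical installer only flagged node connectivity on the public interface. Which of the two is accurate? This was confusing, and it made it even harder to judge whether the problem lay in the network or in the operating system.
Searching MOS (My Oracle Support) for the error message turned up the following note:
To be clear up front, all we got from this note was the check command; it offered nothing for diagnosing the problem. The end of the note says to run the check against each interface separately and then look up further notes according to the result. However, the commands above did not return the PRVF-7617 error that those notes cover.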
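The two error texts in the listings point in different directions. As a general rule, "Connection timed out" means no reply came back at all (packets dropped or the destination unreachable), while "Connection refused" means the target host answered but nothing was listening on that port. A small probe can reproduce that distinction independently of cluvfy. This is a sketch under assumptions: the cluvfy output does not show which port exectask uses, so the port is a placeholder to be chosen by whoever runs the test, and `probe_tcp` is a hypothetical helper name.

```python
import socket


def probe_tcp(host, port, timeout=3.0, source_ip=None):
    """Try one TCP connection and classify the outcome.

    source_ip optionally binds the local end to a specific interface address,
    which matters when a host has several interfaces (for example lan16, lan17).
    """
    source = (source_ip, 0) if source_ip else None
    try:
        with socket.create_connection((host, port), timeout=timeout,
                                      source_address=source):
            return "connected"
    except ConnectionRefusedError:
        return "refused"      # host replied, but no listener on that port
    except (socket.timeout, TimeoutError):
        return "timed out"    # no reply within the timeout
    except OSError as exc:
        return f"error: {exc}"
```

For example, with a test listener started on xcywa1 on a port of your choosing, `probe_tcp("10.241.8.10", port, source_ip="10.241.8.11")` run on xcywa2 exercises the same public-network path that the check above reports as failed.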
APPLIES TO: Oracle Database - Enterprise Edition - Version 11.2.0.1 and later
The note lists problems, solutions or workarounds that's related to the following 11gR2 GI OUI error:
[FATAL] [INS-41112] Specified network interface doesnt maintain connectivity across cluster nodes.
PRVG-11850 : The system call "string" failed with error "number" while executing exectask on node "racnode"
CAUSE: Installer has detected that network interface eth1 does not maintain connectivity on all cluster nodes.
runcluvfy.sh comp nodecon -i -n ,, -verbose
Refer to note 1335136.1 for details.
Refer to note 1429104.1 for details.
The cause is the installation files in staged area are corrupted, download again and install
NOTE:1429104.1 - PRVF-6020 : Different MTU values used across network interfaces in subnet "10.10.10.0"
Let's also look at the content of NOTE:1335136.1 - PRVF-7617: TCP connectivity check failed for subnet.
It consists mostly of bug information, or tells you that check results for unrelated subnets can be ignored.
APPLIES TO: Oracle Database - Enterprise Edition - Version 11.2.0.1 and later
PURPOSE
The note is to list problems, solutions or workarounds that's related to the following error:
PRVF-7617: TCP connectivity check failed for subnet
Result: Node connectivity failed for subnet "10.10.16.0"
[INS-41110] Specified network interface doesnt maintain connectivity across cluster nodes.
DETAILS
CVU checks network interfaces that's marked "do not use", fixed in 11.2.0.3 GI PSU1
Happens on Linux if network adapter virbr0 exists, fixed in 11.2.0.3.
As Solaris does not support the socket option SO_RCVTIMEO, TCP server fails to start:
When more than one network interface are on the same subnet, it is possible that the wrong interface is used to verify TCP connectivity.
Refer to note 1286394.1 for details.
CVU trace:
[7041@racnode1] [Thread-408] [ 2013-06-13 12:41:17.772 GMT+04:00 ] [StreamReader.run:65] OUTPUT>/usr/sbin/ping -i 192.168.169.2 192.168.169.2 3
/usr/sbin/ping: sendto Network is unreachable
Manually run the "ping -i" command, receives same error
To find out current "hostmodel":
# ipadm show-prop -p hostmodel ip
To change hostmodel:
ipadm set-prop -p hostmodel=weak ipv4
The workaround is to set hostmodel to weak
In addition, Solaris bug 16827053 is open to fix on OS level.
The bug is closed as duplicate of internal bug 17070860 which is fixed in 11.2.0.4
runcluvfy.sh comp nodecon -i -n ,, -verbose
Sample output
Check: Node connectivity for interface "eth1"
Check: TCP connectivity of subnet "10.64.131.0"
Result: Node connectivity check passed
If the error happened on network that's not related to Oracle Clusterware, it can be ignored, i.e. if happened on administrative network and not affecting anything, it can be ignored.
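At this point we still have no clear indication of where the problem lies. Is it the network, or HP's operating system? See the second post in this series.
From the "ITPUB博客" (ITPUB blog), link: http://blog.itpub.net/29582917/viewspace-2122703/. If you repost this article, please credit the source; otherwise legal action may be taken.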