If the MTU size of the heartbeat (cluster interconnect) network cards differs between cluster nodes, the RAC instances may fail to start.
APPLIES TO:
Oracle Server - Enterprise Edition - Version 9.0.1.0 to 11.2.0.3 [Release 9.0.1 to 11.2]
Information in this document applies to any platform.
***Checked for relevance on 07-Jan-2010***
SYMPTOMS
If the MTU size on the network cards used for the interconnect differs across the cluster member nodes, the RAC instance(s) will not start.
CHANGES
Network configuration
CAUSE
The MTU size is set on the private network interface (the heartbeat interface used for the interconnect). For example, the interfaces on two cluster member nodes:
node 1
eth0 Link encap:Ethernet HWaddr 00:0E:0C:08:4B:D5
inet addr: xxx.x.x.x Bcast:xxx.x.x.x Mask:255.255.255.0
inet6 addr: fe80::20e:cff:fe08:4bd5/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:9000 Metric:1
node 2
eth0 Link encap:Ethernet HWaddr 00:0E:0C:08:03:59
inet addr: xxx.x.x.x Bcast:xxx.x.x.x Mask:255.255.255.0
inet6 addr: fe80::20e:cff:fe08:359/64 Scope:Link
UP BROADCAST RUNNING MULTICAST *MTU:1500* Metric:1
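The mismatch above can also be spotted mechanically. As a minimal sketch (assuming ifconfig-style output; the two sample lines below mirror the node 1 / node 2 listing above), a small shell function can extract the MTU field so the values collected on each node can be compared:

```shell
# Sketch: pull the MTU value out of an ifconfig status line so the
# values gathered from each cluster node can be compared. The sample
# lines copy the node 1 / node 2 output shown above.
node1_line='UP BROADCAST RUNNING MULTICAST MTU:9000 Metric:1'
node2_line='UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1'

get_mtu() {
  # Print the number that follows "MTU:"
  printf '%s\n' "$1" | sed -n 's/.*MTU:\([0-9][0-9]*\).*/\1/p'
}

mtu1=$(get_mtu "$node1_line")
mtu2=$(get_mtu "$node2_line")

if [ "$mtu1" != "$mtu2" ]; then
  # This is exactly the configuration that makes startup hang
  echo "MTU mismatch: node1=$mtu1 node2=$mtu2"
else
  echo "MTU consistent: $mtu1"
fi
```

In practice the same function would be fed the live output of `/sbin/ifconfig eth0` on each node rather than hard-coded sample lines.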
If different MTU sizes are configured, instance startup will hang with the following errors in the alert log:
Tue Mar 1 01:50:35 2005
lmon registered with NM - instance id 2 (internal mem no 1)
Tue Mar 1 01:50:36 2005
Reconfiguration started (old inc 0, new inc 2)
List of nodes:
0 1
Global Resource Directory frozen
Update rdomain variables
Communication channels reestablished
* domain 0 valid = 0 according to instance 0
Tue Mar 1 01:55:44 2005
IPC Send timeout to 0.0 inc 9 for msg type 53 from opid 5
Tue Mar 1 01:59:25 2005
Trace dumping is performing id=[cdmp_20050301095925]
Tue Mar 1 01:59:31 2005
Reconfiguration started (old inc 2, new inc 3)
List of nodes:
1
Typically you see timeouts in the alert log and in the traces of the background processes (LMD and LMON).
SOLUTION
- Identify the interface being used by Oracle RAC with oradebug ipc (see Metalink Note 181489.1).
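A hedged sketch of that first step (setmypid and ipc are real oradebug subcommands; the exact trace-file contents and location vary by release and are assumptions here):

```sql
SQL> oradebug setmypid     -- attach oradebug to the current session
SQL> oradebug ipc          -- dump IPC information to a trace file
-- Inspect the resulting trace file (under user_dump_dest in these
-- releases); it records the IP address/interface the instance is
-- actually using for the interconnect.
```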
- Check the network configuration of that interface, for example with ifconfig: /sbin/ifconfig eth0
- Ping the IP address of the interconnect network card with a packet size that fits all interfaces. Use the -M do switch to avoid packet fragmentation, for example:
ping <nodename> -s <biggest-size-that fits> -M do
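The <biggest-size-that fits> value follows from the smallest MTU on the path: the IPv4 header (20 bytes) and the ICMP header (8 bytes) are subtracted from the MTU to get the largest ping payload that travels unfragmented. A sketch of that arithmetic, using the 1500-byte MTU from the node 2 output above:

```shell
# Largest ICMP payload that fits without fragmentation:
# MTU minus the 20-byte IPv4 header minus the 8-byte ICMP header.
MTU=1500
PAYLOAD=$((MTU - 20 - 8))
echo "largest non-fragmenting payload for MTU $MTU: $PAYLOAD"
# For MTU 1500 this yields 1472, i.e. ping <nodename> -s 1472 -M do
```

If a ping with that payload succeeds on one node's interface but fails on another's, the failing interface has the smaller MTU.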
- Configure the cluster interconnect interfaces to have the same MTU size on all cluster member nodes.
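On Linux the fix can be applied both immediately and persistently. A sketch, assuming a Red Hat-style system where eth0 is the interconnect interface and 9000 is the MTU agreed for all nodes (both values are assumptions to adapt to your environment):

```shell
# Immediate change (lost on reboot) - run as root on every node:
/sbin/ifconfig eth0 mtu 9000

# Persistent change on Red Hat-style systems: add MTU=9000 to the
# interface configuration file, then restart networking:
echo "MTU=9000" >> /etc/sysconfig/network-scripts/ifcfg-eth0
```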
PS: The official documentation (http://docs.oracle.com/cd/B19306_01/server.102/b14237/initparams025.htm#REFRN10017) adds the following:
The CLUSTER_INTERCONNECTS parameter specifies a private network for the cluster; it influences which network interfaces the GCS and GES services use.
The parameter serves two main purposes:
1. Overriding the default interconnect.
2. Adding bandwidth when a single network cannot meet the bandwidth requirements of a RAC database.
CLUSTER_INTERCONNECTS explicitly overrides:
1. The network classifications stored in the OCR (viewable with the oifcfg command).
2. The default interconnect chosen by Oracle.
The default value is null; the parameter can contain one or more IP addresses separated by colons.
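For completeness, a hedged example of setting the parameter per instance (the IP addresses and SID names are placeholders, not values from this note):

```sql
-- Pin each instance's interconnect explicitly; one colon-separated
-- list of addresses per instance (hypothetical addresses shown):
ALTER SYSTEM SET cluster_interconnects = '10.0.0.1'
  SCOPE = SPFILE SID = 'rac1';
ALTER SYSTEM SET cluster_interconnects = '10.0.0.2'
  SCOPE = SPFILE SID = 'rac2';
```

A restart is required for the SPFILE-scoped change to take effect.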