How to Modify Private Network Information in Oracle Clusterware (文档 ID 283684.1)

转到底部转到底部

2013-11-8HOWTO
为此文档评级通过电子邮件发送此文档的链接在新窗口中打开文档可打印页

In this Document

 Goal
 Solution
 Case I. Changing private hostname
 Case II. Changing private IP only without changing network interface, subnet and netmask
 Case III. Changing private network MTU only
 Case IV. Changing private network interface name, subnet or netmask
 A. For pre-11gR2 Oracle Clusterware
 B. For 11gR2 and higher
 Something to note for 11gR2
 Notes for Windows Systems
 Ramifications of Changing Interface Names Using oifcfg
 Oifcfg Usage
 References

APPLIES TO:

Oracle Database - Enterprise Edition - Version 10.1.0.2 to 11.2.0.3 [Release 10.1 to 11.2]
Information in this document applies to any platform.

GOAL

The purpose of this note is to describe how to change or update the private network (cluster_interconnect) information in Oracle Clusterware. 

It may be necessary to change or update interface names, or subnet associated with an interface if there is a network change affecting the servers, or if the original information that was input during the installation was incorrect.   It may also be the case that for some reason, the Oracle Interface Configuration Assistant  ('oifcfg')  did not succeed during the installation.

Please refer to Note 276434.1 for modifying public network and VIP associated information
and refer to Note 1386709.1 for basics of IPv4 subnet and Oracle Clusterware.

Note: for Oracle Engineered system (Exadata) and Oracle Database Applicance (ODA), please do not make such changes following this note.

 

SOLUTION

Network information(interface, subnet and role of each interface) for Oracle Clusterware is managed by 'oifcfg', but actual IP address for each interfaces are not, 'oifcfg' can not update IP address information. 'oifcfg getif' can be used to find out currently configured interfaces in OCR:

% $CRS_HOME/bin/oifcfg getif 
eth0 10.2.156.0 global public 
eth1 192.168.0.0 global cluster_interconnect

On Unix/Linux systems, the interface names are generally assigned by the OS, and standard names vary by platform. For Windows systems, see additional notes below. Above example shows currently interface eth0 is used for public with subnet 10.2.156.0, and eth1 for cluster_interconnect/private with subnet 192.168.0.0.

The 'public' network is for database client communication (VIP also uses the same network though it's stored in OCR as separate entry), whereas the 'cluster_interconnect' network is for RDBMS/ASM cache fusion. Starting with 11gR2, cluster_interconnect is also used for clusterware heartbeats - this is significant change compare to prior release as pre-11gR2 uses the private nodename that were specified at installation time for clusterware heartbeats.

If the subnet or interface name for 'cluster_interconnect' interface is incorrect, it needs to be changed as crs/grid user.

Case I. Changing private hostname

In pre-11.2 Oracle Clusterware, private hostname is recorded in OCR, it can not be updated. Generally private hostname is not required to change. Its associated IP can be changed. The only way to change private hostname is by deleting/adding nodes, or reinstall Oracle Clusterware.

In 11.2 Grid Infrastructure, private hostname is no longer recorded in OCR and there is no dependancy on the private hostname. It can be changed freely in /etc/hosts.

Case II. Changing private IP only without changing network interface, subnet and netmask

For example, private IP is changed from 192.168.1.10 to 192.168.1.21, network interface name and subnet remain the same,.

Simply shutdown Oracle Clusterware stack on the node where change required, make IP modification at OS layer (eg: /etc/hosts, OS network config etc) for private network, restart Oracle Clusterware stack will complete the task.

Case III. Changing private network MTU only

For example, private network MTU is changed from 1500 to 9000 (enable jumbo frame), network interface name and subnet remain the same.

1. Shutdown Oracle Clusterware stack on all nodes
2. Make the required network change of MTU size at OS network layer, ensure private network is available with the desired MTU size, ping with the desired MTU size works on all cluster nodes
3. Restart Oracle Clusterware stack on all nodes

Case IV. Changing private network interface name, subnet or netmask

Note: When the netmask is changed but the subnet ID doesn't change, for example:
The netmask is changed from 255.255.0.0 to 255.255.255.0 with private IP like 192.168.0.x, the subnet ID remains the same as 192.168.0.0, the network interface name is not changed.
Please follow the same procedure as outlined in Case II.

When the netmask is changed, the associated subnet ID is often changed. Oracle only store network interface name and subnet ID in OCR, not the netmask. Oifcfg command can be used for such change, oifcfg commands only require to run on 1 of the cluster node, not all.

A. For pre-11gR2 Oracle Clusterware

1. Use oifcfg to add the new private network information, delete the old private network information:

% $ORA_CRS_HOME/bin/oifcfg/oifcfg setif -global <if_name>/<subnet>:cluster_interconnect
% $ORA_CRS_HOME/bin/oifcfg/oifcfg delif -global <if_name>[/<subnet>]]

For example:
% $ORA_CRS_HOME/bin/oifcfg setif -global  eth3/ 192.168.2.0:cluster_interconnect
% $ORA_CRS_HOME/bin/oifcfg delif -global eth1/192.168.1.0

To verify the change
% $ORA_CRS_HOME/bin/oifcfg getif   
eth0 10.2.166.0 global public 
eth3 192.168.2.0 global cluster_interconnect

2. Shutdown Oracle Clusterware stack

As root user: # crsctl stop crs

3. Make required network change at OS level, /etc/hosts file should be modified on all nodes to reflect the change.
Ensure the new network is available on all cluster nodes:

% ping <private hostname/IP>
% ifconfig -a  on Unix/Linux 
or 
% ipconfig /all on windows

4. restart the Oracle Clusterware stack

As root user: # crsctl start crs

Note:  If running OCFS2 on Linux, one  may also need to change the private IP address that OCFS2 is using to communicate with other nodes.   For more information, please refer to  Note 604958.1

 

B. For 11gR2 and higher

As of 11.2 Grid Infrastructure, the private network configuration is not only stored in OCR but also in the gpnp profile.  If the private network is not available or its definition is incorrect, the CRSD process will not start and any subsequent changes to the OCR will be impossible. Therefore care needs to be taken when making modifications to the configuration of the private network. It is important to perform the changes in the correct order. Please also note that manual modification of gpnp profile is not supported.

Please take a backup of profile.xml on all cluster nodes before proceeding, as grid user:
$ cd $GRID_HOME/gpnp/<hostname>/profiles/peer/
$ cp -p profile.xml profile.xml.bk


1. Ensure Oracle Clusterware is running on ALL cluster nodes in the cluster

2. As grid user:

Get the existing information. For example:

$ oifcfg getif
eth1 100.17.10.0 global public
eth0 192.168.0.0 global cluster_interconnect


Add the new cluster_interconnect information:

$ oifcfg setif -global <interface>/<subnet>:cluster_interconnect

For example:
a. add a new interface bond0 with the same subnet
$ oifcfg setif -global bond0/192.168.0.0:cluster_interconnect

b. add a new subnet with the same interface name but different subnet or new interface name
$ oifcfg setif -global eth0/192.65.0.0:cluster_interconnect
or
$ oifcfg setif -global eth3/192.168.1.96:cluster_interconnect

 

1. This can be done with -global option even if the interface is not available yet, but this can not be done with -node option if the interface is not available, it will lead to node eviction.

2. If the interface is available on the server, subnet address can be identified by command:
$ oifcfg iflist

It lists the network interface and its subnet address. This command can be run even if Oracle Clusterware is not running.  Please note, subnet address might not be in the format of x.y.z.0, it can be x.y.z.24, x.y.z.64 or x.y.z.128 etc. For example,
$ oifcfg iflist 
lan1 18.1.2.0
lan2  10.2.3.64        << this is the private network subnet address associated with private network IP: 10.2.3.86

3. If it is for adding a 2nd private network, not replacing the existing private network, please ensure  MTU size of both interfaces are the same, otherwise instance startup will report error:
ORA-27504: IPC error creating OSD context
ORA-27300: OS system dependent operation:if MTU failed with status: 0
ORA-27301: OS failure message: Error 0
ORA-27302: failure occurred at: skgxpcini2
ORA-27303: additional information: requested interface lan1:801 has a different MTU (1500) than lan3:801 (9000), which is not supported. Check output from ifconfig command


Verify the change:

$ oifcfg getif


3. Shutdown Oracle Clusterware on all nodes and disable the Oracle Clusterware as root user:

# crsctl stop crs
# crsctl disable crs


4. Make the network configuration change at OS level as required, ensure the new interface is available on all nodes after the change.

$ ifconfig -a
$ ping <private hostname>


5. Enable Oracle Clusterware and restart Oracle Clusterware on all nodes as root user:

# crsctl enable crs
# crsctl start crs


6. Remove the old interface if required:

$ oifcfg delif -global <if_name>[/<subnet>]
eg:
$ oifcfg delif -global eth0/192.168.0.0

 

Something to note for 11gR2


1. If underlying network configuration has been changed, but oifcfg has not been run to make the same change,  then upon Oracle Clusterware restart, the CRSD will not be able to start.

The crsd.log will show:

2010-01-30 09:22:47.234: [ default][2926461424] CRS Daemon Starting
..
2010-01-30 09:22:47.273: [ GPnP][2926461424]clsgpnp_Init: [at clsgpnp0.c:837] GPnP client pid=7153, tl=3, f=0
2010-01-30 09:22:47.282: [ OCRAPI][2926461424] clsu_get_private_ip_addresses: no ip addresses found.
2010-01-30 09:22:47.282: [GIPCXCPT][2926461424] gipcShutdownF: skipping shutdown, count 2, from [ clsinet.c : 1732], ret gipcretSuccess (0)
2010-01-30 09:22:47.283: [GIPCXCPT][2926461424] gipcShutdownF: skipping shutdown, count 1, from [ clsgpnp0.c : 1021], ret gipcretSuccess (0)
[ OCRAPI][2926461424]a_init_clsss: failed to call clsu_get_private_ip_addr (7)
2010-01-30 09:22:47.285: [ OCRAPI][2926461424]a_init:13!: Clusterware init unsuccessful : [44]
2010-01-30 09:22:47.285: [ CRSOCR][2926461424] OCR context init failure. Error:  PROC-44: Error in network address and interface operations Network address and interface operations error [7]
2010-01-30 09:22:47.285: [ CRSD][2926461424][PANIC] CRSD exiting: Could not init OCR, code: 44
2010-01-30 09:22:47.285: [ CRSD][2926461424] Done.

Above errors indicate a mismatch between OS setting (oifcfg iflist) and gpnp profile setting profile.xml.

Workaround: restore the OS network configuration back to the original status, start Oracle Clusterware. Then follow above steps to make the changes again. 

If the underlying network has not been changed, but oifcfg setif has been run with a wrong subnet address or interface name, same issue will happen.



2. If any one node is down in the cluster, oifcfg command will fail with error:

$ oifcfg setif -global bond0/192.168.0.0:cluster_interconnect
PRIF-26: Error in update the profiles in the cluster

Workaround: start Oracle Clusterware on the node where it is not running. Ensure Oracle Clusterware is up on all cluster nodes. If the node is down for any OS reason, please remove the node from the cluster before performing private network change.

3. If a user other than Grid Infrastructure owner issues above command, it will fail with same error:

$ oifcfg setif -global bond0/192.168.0.0:cluster_interconnect
PRIF-26: Error in update the profiles in the cluster

Workaround: ensure to login as Grid Infrastructure owner to perform such command.

4. From 11.2.0.2 onwards, if attempt to delete the last private interface (cluster_interconnect) without adding a new one first, following error will occur:

PRIF-31: Failed to delete the specified network interface because it is the last private interface

Workaround: Add new private interface first before deleting the old private interface.

5. If Oracle Clusterware is down on the node, the following error is expected:

$ oifcfg getif
PRIF-10: failed to initialize the cluster registry

Workaround: Start the Oracle Clusterware on the node

 

Notes for Windows Systems

The syntax for changing the interfaces on Windows/RAC clusters is the same as on Unix/Linux, but the interface names will be slightly different. On Windows systems, the default names assigned to the interfaces are generally named such as:

Local Area Connection
Local Area Connection 1 
Local Area Connection 2

If using an interface name that has space in it, the name must be enclosed in quotes. Also, be aware that it is case sensitive.  For example, on Windows,  to set cluster_interconnect:

C:\oracle\product\10.2.0\crs\bin\oifcfg setif -global "Local Area Connection 1"/192.168.1.0:cluster_interconnect

However, it is best practice on Windows to rename the interfaces to be more meaningful, such as renaming them to 'ocwpublic' and 'ocwprivate'.   If interface names are renamed after Oracle Clusterware is installed, then you will need to run 'oifcfg'  to add the new interface and delete the old one, as described above.

You can view the available interface names on each node by running the command:

oifcfg iflist -p -n

This command must be run on each node to verify the interface names are defined the same.

Ramifications of Changing Interface Names Using oifcfg

For the Private interface, the database will use the interface stored in the OCR and defined as a 'cluster_interconnect' for cache fusion traffic.  The cluster_interconnect information is available at startup in the alert log, after the parameter listing - for example:

For pre 11.2.0.2:
Cluster communication is configured to use the following interface(s) for this instance 
192.168.1.1


For 11.2.0.2+: (HAIP address will show in alert log instead of private IP)
Cluster communication is configured to use the following interface(s) for this instance
  169.254.86.97

If this is incorrect, then instance is required to restart once the OCR entry is corrected. This applies to ASM instances and Database instances alike. On Windows systems, after shutting down the instance, it is also required to stop/restart the OracleService<SID> (or OracleASMService<ASMSID> before the OCR will be re-read.

 

Oifcfg Usage

To see the full options of oifcfg, simply type:

$ $ORA_CRS_HOME/bin/oifcfg

 

Database - RAC/Scalability Community
To discuss this topic further with Oracle experts and industry peers, we encourage you to review, join or start a discussion in the My Oracle Support Database - RAC/Scalability Community

REFERENCES

NOTE:1386709.1 - The Basics of IPv4 Subnet and Oracle Clusterware
NOTE:276434.1 - How to Modify Public Network Information including VIP in Oracle Clusterware
NOTE:604958.1 - OCFS2 Node Fence Caused by Removing the External Network Cable
NOTE:1054902.1 - How to Validate Network and Name Resolution Setup for the Clusterware and RAC
  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
Applies to: Oracle Server - Enterprise Edition - Version: 10.2.0.4 to 11.1.0.6 Information in this document applies to any platform. Description The following errors may be reported: ORA-00603: ORACLE server session terminated by fatal error ORA-27504: IPC error creating OSD context ORA-27300: OS system dependent operation:sskgxp_select failed with status: 3 ORA-27301: OS failure message: No such process ORA-27302: failure occurred at: skgxpvfymmtu ORA-27303: additional information: MTU could not be verified. Did not receive valid message. These errors are caused by more aggressive checking introduced in 10.2.0.4. Likelihood of Occurrence This only affects Oracle Real Application Clusters and can be reported in ASM as well as database instances. The issue was introduced in Oracle 10.2.0.4 so earlier versions are not affected Possible Symptoms The errors above can be reported by Oracle shadow processes or by background processes. Additionally they may be reported in the alert.log. The symptoms can include: - process failure - startup failure - processes spinning in function sskgxp_select with high CPU usage Workaround or Resolution There is no workaround available. However, if the instance fails to start, a reboot of the server supporting the instance will usually allow startup to succeed. Patches At the time of writing, patches were under development on top of 10.2.0.4 on some platforms. Please check Patch 7331323 for availability on your platform. The problem is resolved in 11.1.0.7. References NOTE:419937.1 - Alert.log shows frequently "skgxpvfymmtu: process failed because of a resource problem in the OS" NOTE:419871.1 - Failures due to "skgxpvfymmtu: process failed because of a resource problem in the OS" on 32-bit Linux

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值