In this Document
Symptoms
Cause
Solution
This document is being delivered to you via Oracle Support's Rapid Visibility (RaV) process and therefore has not been subject to an independent technical review.
Applies to:
Oracle Server - Enterprise Edition - Version: 11.2.0.2 and later [Release: 11.2 and later]
Information in this document applies to any platform.
Symptoms
root.sh failed on the second node. rootcrs_node2.log shows:
...
2011-10-14 12:19:00: Configuring ASM via ASMCA
2011-10-14 12:19:00: Executing as oracle: /oracle/app/product/11.2.0/grid/bin/asmca -silent -diskGroupName DATA -diskList '/asmdisks/10GB/c0t60000970000192604380533031384543d0s4,/asmdisks/10GB/c0t60000970000192604380533031384544d0s4,/asmdisks/10GB/c0t60000970000192604380533031384545d0s4' -redundancy NORMAL -diskString '/asmdisks/10GB/*' -configureLocalASM
2011-10-14 12:19:00: Running as user oracle: /oracle/app/product/11.2.0/grid/bin/asmca -silent -diskGroupName DATA -diskList '/asmdisks/10GB/c0t60000970000192604380533031384543d0s4,/asmdisks/10GB/c0t60000970000192604380533031384544d0s4,/asmdisks/10GB/c0t60000970000192604380533031384545d0s4' -redundancy NORMAL -diskString '/asmdisks/10GB/*' -configureLocalASM
2011-10-14 12:19:00: Invoking "/oracle/app/product/11.2.0/grid/bin/asmca -silent -diskGroupName DATA -diskList '/asmdisks/10GB/c0t60000970000192604380533031384543d0s4,/asmdisks/10GB/c0t60000970000192604380533031384544d0s4,/asmdisks/10GB/c0t60000970000192604380533031384545d0s4' -redundancy NORMAL -diskString '/asmdisks/10GB/*' -configureLocalASM" as user "oracle"
2011-10-14 12:19:00: Executing /bin/su oracle -c "/oracle/app/product/11.2.0/grid/bin/asmca -silent -diskGroupName DATA -diskList '/asmdisks/10GB/c0t60000970000192604380533031384543d0s4,/asmdisks/10GB/c0t60000970000192604380533031384544d0s4,/asmdisks/10GB/c0t60000970000192604380533031384545d0s4' -redundancy NORMAL -diskString '/asmdisks/10GB/*' -configureLocalASM"
2011-10-14 12:19:00: Executing cmd: /bin/su oracle -c "/oracle/app/product/11.2.0/grid/bin/asmca -silent -diskGroupName DATA -diskList '/asmdisks/10GB/c0t60000970000192604380533031384543d0s4,/asmdisks/10GB/c0t60000970000192604380533031384544d0s4,/asmdisks/10GB/c0t60000970000192604380533031384545d0s4' -redundancy NORMAL -diskString '/asmdisks/10GB/*' -configureLocalASM"
2011-10-14 12:19:29: Command output:
>
> Disk Group DATA creation failed with the following message:
> Disk Group DATA already exists. Cannot be created again
>
>End Command output
2011-10-14 12:19:29: Configuration of ASM ... failed
ASMCA log shows:
[main] [ 2011-10-14 12:19:27.813 EDT ] [UsmcaLogger.logInfo:142] Instance running true
[main] [ 2011-10-14 12:19:27.823 EDT ] [UsmcaLogger.logInfo:142] Diskgroup does not exist, creating..
...
[main] [ 2011-10-14 12:19:28.380 EDT ] [UsmcaLogger.logInfo:142] Disk Group DATA creation failed with the following message:
[main] [ 2011-10-14 12:19:28.380 EDT ] [UsmcaLogger.logInfo:142] Disk Group DATA already exists. Cannot be created again
...
[main] [ 2011-10-14 12:19:28.381 EDT ] [UsmcaLogger.logInfo:142] Diskgroup creation is not successful.
On the first node, root.sh did not fail: Oracle Clusterware is up and running, as is +ASM1, which has the ASM diskgroup mounted.
Cause
The ASM disks are visible and accessible from the second node, so the problem is not caused by the storage. However, a 'ps -ef | grep -i asm' on the second node shows several ASM background processes running, but they belong to +ASM1 when they should belong to +ASM2. A 'cat /etc/oratab' on the second node confirms that +ASM1 (not +ASM2) is configured there.
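The oratab check above can be sketched as follows. The file content and paths here are hypothetical stand-ins based on this note's example; the real file is /etc/oratab (or /var/opt/oracle/oratab on Solaris):

```shell
# Hypothetical copy of what node 2's oratab looked like in this case;
# the real file is /etc/oratab (or /var/opt/oracle/oratab on Solaris).
cat > /tmp/oratab.sample <<'EOF'
+ASM1:/u01/app/11.2.0/grid:N            # line added by Agent
EOF

# On the second node we would expect the SID +ASM2; finding +ASM1 confirms
# that the first node's ASM instance was mistakenly configured here.
found_sid=$(grep '^+ASM' /tmp/oratab.sample | cut -d: -f1)
echo "found ${found_sid} on node 2 (expected +ASM2)"
```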
Having +ASM1 on more than one node occurs when root.sh is executed on the second node before root.sh completes on the first node. Because Oracle Clusterware has not yet finished registering +ASM1 on the first node (where root.sh was started first and is still running), root.sh on the second node is unaware that the +ASM1 configuration is in progress, so it proceeds to configure +ASM1 on the second node, which eventually fails.
To confirm this, check the timestamps for rootcrs_node<x>.log from all nodes.
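The timestamp comparison can be sketched as below, with hypothetical values standing in for the real rootcrs_node<x>.log entries (typically under <gi_home>/cfgtoollogs/crsconfig/). If node 2's root.sh started before node 1's finished, the runs overlapped:

```shell
# Hypothetical timestamps copied from each node's rootcrs_node<x>.log.
node1_root_end="2011-10-14 12:25:10"    # last entry in node 1's log
node2_root_start="2011-10-14 12:19:00"  # first entry in node 2's log

# These "YYYY-MM-DD HH:MM:SS" timestamps sort lexically, so a plain
# string comparison is enough to detect the overlap.
if [[ "$node2_root_start" < "$node1_root_end" ]]; then
  echo "overlap: node 2's root.sh started before node 1's completed"
fi
```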
Solution
Because ASMCA does not yet have an option to remove an ASM instance, correct this with the following manual steps, which recreate ASM on the node where it initially failed:
1) Deconfigure Grid Infrastructure (GI, gi_home) on the second node:
As root
# cd <gi_home>/crs/install
# ./rootcrs.pl -verbose -deconfig -force
Remove the ASM entry from the second node's /etc/oratab. Example:
+ASM1:/u01/app/11.2.0/grid:N # line added by Agent
Backup and remove any ASM-named files from the second node's <gi_home>/dbs directory. Examples:
ab_+ASM1.dat hc_+ASM2.dat
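The oratab and dbs cleanup in step 1 can be sketched as below. All paths and filenames here are demo stand-ins so the sketch is runnable; a real run edits /etc/oratab and <gi_home>/dbs as root, after the rootcrs.pl deconfig:

```shell
# Demo stand-ins; substitute the real Grid home and /etc/oratab.
gi_home=/tmp/gi_home_demo           # stand-in for <gi_home>
oratab=/tmp/oratab.demo             # stand-in for /etc/oratab
mkdir -p "$gi_home/dbs"
echo '+ASM1:/u01/app/11.2.0/grid:N # line added by Agent' > "$oratab"
touch "$gi_home/dbs/ab_+ASM1.dat" "$gi_home/dbs/hc_+ASM1.dat"

# Back up oratab, then strip the +ASM entry
# (grep -v exits non-zero when no lines remain, hence the || true).
cp "$oratab" "$oratab.bak"
grep -v '^+ASM' "$oratab.bak" > "$oratab" || true

# Back up, then remove, the ASM-named files from <gi_home>/dbs.
mkdir -p "$gi_home/dbs_backup"
mv "$gi_home"/dbs/*ASM* "$gi_home/dbs_backup/"
```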
2) Reconfigure GI on the node where it initially failed (second node):
As root
# cd <gi_home>
# ./root.sh
3) Checks:
root.sh on node2 must succeed. Example (last line):
Configure Oracle Grid Infrastructure for a Cluster ... succeeded
As the gi_home owner/user:
$ cd <gi_home>/bin
$ ./crsctl stat res -t <-- must show Oracle Clusterware, ASM, etc running on both nodes (+ASM1 on first node, +ASM2 on second node)