In this Document
Symptoms
Cause
Solution
This document is being delivered to you via Oracle Support's Rapid Visibility (RaV) process and therefore has not been subject to an independent technical review.
Applies to:
Oracle Server - Enterprise Edition - Version: 11.2.0.2 and later [Release: 11.2 and later]
Information in this document applies to any platform.
Symptoms
root.sh failed on the second node. rootcrs_node2.log shows:
...
2011-10-14 12:19:00: Configuring ASM via ASMCA
2011-10-14 12:19:00: Executing as oracle: /oracle/app/product/11.2.0/grid/bin/asmca -silent -diskGroupName DATA -diskList '/asmdisks/10GB/c0t60000970000192604380533031384543d0s4,/asmdisks/10GB/c0t60000970000192604380533031384544d0s4,/asmdisks/10GB/c0t60000970000192604380533031384545d0s4' -redundancy NORMAL -diskString '/asmdisks/10GB/*' -configureLocalASM
2011-10-14 12:19:00: Running as user oracle: /oracle/app/product/11.2.0/grid/bin/asmca -silent -diskGroupName DATA -diskList '/asmdisks/10GB/c0t60000970000192604380533031384543d0s4,/asmdisks/10GB/c0t60000970000192604380533031384544d0s4,/asmdisks/10GB/c0t60000970000192604380533031384545d0s4' -redundancy NORMAL -diskString '/asmdisks/10GB/*' -configureLocalASM
2011-10-14 12:19:00: Invoking "/oracle/app/product/11.2.0/grid/bin/asmca -silent -diskGroupName DATA -diskList '/asmdisks/10GB/c0t60000970000192604380533031384543d0s4,/asmdisks/10GB/c0t60000970000192604380533031384544d0s4,/asmdisks/10GB/c0t60000970000192604380533031384545d0s4' -redundancy NORMAL -diskString '/asmdisks/10GB/*' -configureLocalASM" as user "oracle"
2011-10-14 12:19:00: Executing /bin/su oracle -c "/oracle/app/product/11.2.0/grid/bin/asmca -silent -diskGroupName DATA -diskList '/asmdisks/10GB/c0t60000970000192604380533031384543d0s4,/asmdisks/10GB/c0t60000970000192604380533031384544d0s4,/asmdisks/10GB/c0t60000970000192604380533031384545d0s4' -redundancy NORMAL -diskString '/asmdisks/10GB/*' -configureLocalASM"
2011-10-14 12:19:00: Executing cmd: /bin/su oracle -c "/oracle/app/product/11.2.0/grid/bin/asmca -silent -diskGroupName DATA -diskList '/asmdisks/10GB/c0t60000970000192604380533031384543d0s4,/asmdisks/10GB/c0t60000970000192604380533031384544d0s4,/asmdisks/10GB/c0t60000970000192604380533031384545d0s4' -redundancy NORMAL -diskString '/asmdisks/10GB/*' -configureLocalASM"
2011-10-14 12:19:29: Command output:
>
> Disk Group DATA creation failed with the following message:
> Disk Group DATA already exists. Cannot be created again
>
>End Command output
2011-10-14 12:19:29: Configuration of ASM ... failed
ASMCA log shows:
[main] [ 2011-10-14 12:19:27.813 EDT ] [UsmcaLogger.logInfo:142] Instance running true
[main] [ 2011-10-14 12:19:27.823 EDT ] [UsmcaLogger.logInfo:142] Diskgroup does not exist, creating..
...
[main] [ 2011-10-14 12:19:28.380 EDT ] [UsmcaLogger.logInfo:142] Disk Group DATA creation failed with the following message:
[main] [ 2011-10-14 12:19:28.380 EDT ] [UsmcaLogger.logInfo:142] Disk Group DATA already exists. Cannot be created again
...
[main] [ 2011-10-14 12:19:28.381 EDT ] [UsmcaLogger.logInfo:142] Diskgroup creation is not successful.
On the first node, root.sh did not fail: Oracle Clusterware is up and running, as is +ASM1, which has the ASM diskgroup mounted.
Cause
The ASM disks are visible and accessible from the second node, so the problem is not caused by the storage. However, a 'ps -ef | grep -i asm' on the second node shows several ASM background processes running, but they belong to +ASM1 when they should belong to +ASM2. A 'cat /etc/oratab' on the second node confirms that +ASM1 (not +ASM2) is configured there.
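The oratab check above can be sketched as follows. The file content and paths here are hypothetical stand-ins based on this note's example; the real file is /etc/oratab (or /var/opt/oracle/oratab on Solaris):

```shell
# Hypothetical copy of what node 2's oratab looked like in this case;
# the real file is /etc/oratab (or /var/opt/oracle/oratab on Solaris).
cat > /tmp/oratab.sample <<'EOF'
+ASM1:/u01/app/11.2.0/grid:N            # line added by Agent
EOF

# On the second node we would expect the SID +ASM2; finding +ASM1 confirms
# that the first node's ASM instance was mistakenly configured here.
found_sid=$(grep '^+ASM' /tmp/oratab.sample | cut -d: -f1)
echo "found ${found_sid} on node 2 (expected +ASM2)"
```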
Having +ASM1 on more than one node occurs when root.sh is executed on the second node before root.sh completes on the first node. Because Oracle Clusterware has not yet finished registering +ASM1 on the first node (where root.sh was started first and is still running), root.sh on the second node is unaware that the +ASM1 configuration is in progress, so it proceeds to configure +ASM1 on the second node, which eventually fails.
To confirm this, check the timestamps for rootcrs_node<x>.log from all nodes.
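The timestamp comparison can be sketched as below, with hypothetical values standing in for the real rootcrs_node<x>.log entries (typically under <gi_home>/cfgtoollogs/crsconfig/). If node 2's root.sh started before node 1's finished, the runs overlapped:

```shell
# Hypothetical timestamps copied from each node's rootcrs_node<x>.log.
node1_root_end="2011-10-14 12:25:10"    # last entry in node 1's log
node2_root_start="2011-10-14 12:19:00"  # first entry in node 2's log

# These "YYYY-MM-DD HH:MM:SS" timestamps sort lexically, so a plain
# string comparison is enough to detect the overlap.
if [[ "$node2_root_start" < "$node1_root_end" ]]; then
  echo "overlap: node 2's root.sh started before node 1's completed"
fi
```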
Solution
Because ASMCA does not yet have an option to remove an ASM instance, correct this with the following manual steps, which recreate ASM on the node where it initially failed:
1) Deconfigure Grid Infrastructure (GI, gi_home) on the second node:
As root
# cd <gi_home>/crs/install
# ./rootcrs.pl -verbose -deconfig -force
Remove the ASM entry from the second node's /etc/oratab. Example:
+ASM1:/u01/app/11.2.0/grid:N # line added by Agent
Backup and remove any ASM-named files from the second node's <gi_home>/dbs directory. Examples:
ab_+ASM1.dat hc_+ASM2.dat
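The oratab and dbs cleanup in step 1 can be sketched as below. All paths and filenames here are demo stand-ins so the sketch is runnable; a real run edits /etc/oratab and <gi_home>/dbs as root, after the rootcrs.pl deconfig:

```shell
# Demo stand-ins; substitute the real Grid home and /etc/oratab.
gi_home=/tmp/gi_home_demo           # stand-in for <gi_home>
oratab=/tmp/oratab.demo             # stand-in for /etc/oratab
mkdir -p "$gi_home/dbs"
echo '+ASM1:/u01/app/11.2.0/grid:N # line added by Agent' > "$oratab"
touch "$gi_home/dbs/ab_+ASM1.dat" "$gi_home/dbs/hc_+ASM1.dat"

# Back up oratab, then strip the +ASM entry
# (grep -v exits non-zero when no lines remain, hence the || true).
cp "$oratab" "$oratab.bak"
grep -v '^+ASM' "$oratab.bak" > "$oratab" || true

# Back up, then remove, the ASM-named files from <gi_home>/dbs.
mkdir -p "$gi_home/dbs_backup"
mv "$gi_home"/dbs/*ASM* "$gi_home/dbs_backup/"
```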
2) Reconfigure GI on the node where it initially failed (second node):
As root
# cd <gi_home>
# ./root.sh
3) Checks:
root.sh on node2 must succeed. Example (last line):
Configure Oracle Grid Infrastructure for a Cluster ... succeeded
As the gi_home owner/user:
$ cd <gi_home>/bin
$ ./crsctl stat res -t <-- must show Oracle Clusterware, ASM, etc running on both nodes (+ASM1 on first node, +ASM2 on second node)