ORA-15040, ORA-15066, ORA-15042 when ASM disk is not present in all nodes of a RAC Cluster

In this Document
Symptoms
10.2.0.2 and below
10.2.0.3 and above
Cause
10.2.0.2 and below
10.2.0.3 and above
Solution
10.2.0.2 and below
10.2.0.3 and above
How to verify the disk is present in all the nodes of the cluster.
References


Applies to:

Oracle Server - Enterprise Edition - Version: 10.1.0.2 to 10.2.0.2 - Release: 10.1 to 10.2
Information in this document applies to any platform.

Symptoms

In environments running RAC and ASM, before a disk becomes part of a diskgroup, it has to be validated on all nodes of the cluster. When this validation fails, the results differ by version, particularly for environments running 10.2.0.2 and below.

10.2.0.2 and below


The diskgroup is mounted in the ASM instance where the operation was executed, but is not mounted in other ASM instances of the cluster where the disk was missing.
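
One way to confirm this from any node is a query against the global view GV$ASM_DISKGROUP (a sketch; TEST1 is only an example diskgroup name):

SQL> select inst_id, name, state from gv$asm_diskgroup where name = 'TEST1';

Instances where the diskgroup is not mounted either return no row or show the diskgroup as DISMOUNTED.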

Trying to mount the diskgroup can report errors like:

ORA-15001: diskgroup "1" does not exist or is not mounted
ORA-15040: diskgroup is incomplete
ORA-15066: offlining disk "" may result in a data loss
ORA-15042: ASM disk "3" is missing

or

SQL> alter diskgroup test1 mount;
alter diskgroup test1 mount
*
ERROR at line 1:
ORA-15032: not all alterations performed
ORA-15040: diskgroup is incomplete
ORA-15042: ASM disk "1" is missing

When adding disks to the diskgroup:

SQL> alter diskgroup test1 add disk '/dev/asmdisk_KH5' force;
alter diskgroup test1 add disk '/dev/asmdisk_KH5' force
*
ERROR at line 1:
ORA-15032: not all alterations performed
ORA-15075: disk(s) are not visible cluster-wide

10.2.0.3 and above


Starting with 10.2.0.3, if the disk cannot be validated on all nodes of the cluster, it will not be added to the diskgroup. During the process, however, the header of the disk is formatted with information that makes it look like a valid ASM disk.

Trying to drop the disk will fail with errors ORA-15032 and ORA-15054:

SQL> alter diskgroup test drop disk '/dev/raw/raw8';
alter diskgroup test drop disk '/dev/raw/raw8'
*
ERROR at line 1:
ORA-15032 : not all alterations performed
ORA-15054 : disk "/DEV/RAW/RAW8" does not exist in diskgroup "TEST"

Trying to add the disk again will fail with errors ORA-15032 and ORA-15020:
SQL> alter diskgroup test add disk '/dev/raw/raw8';
alter diskgroup TEST add disk '/dev/raw/raw8'
*
ERROR at line 1:
ORA-15032 : not all alterations performed
ORA-15020 : discovered duplicate ASM disk "TEST_0005"






Cause

10.2.0.2 and below


When a disk is added to the diskgroup, it has to be validated on all the nodes in the cluster. If the disk is not present on one of the nodes, the alert.log file of the ASM instance where the command was executed (the master instance) will report the following messages:

SQL> ALTER DISKGROUP DG3_DBDATA ADD DISK '/dev/raw/raw8' SIZE 25595M
Mon Dec 12 15:16:10 2005
NOTE: reconfiguration of group 1/0xfaa07f1e (DG3_DBDATA), full=1
Mon Dec 12 15:16:10 2005
NOTE: initializing header on grp 1 disk DG3_DBDATA_0003
NOTE: cache opening disk 3 of grp 1: DG3_DBDATA_0003 path:/dev/raw/raw8
NOTE: requesting all-instance disk validation for group=1

At this point the disk has been partially added (receiving a name, number, etc.), and it needs to be validated across all the instances.

The other instances in the cluster will trigger a disk scan and read the headers of their disks, looking for the newly added disk. If the new disk is not found on a node, the master instance (where the operation was started) will report the following messages in its alert.log:

NOTE: disk validation pending for group 1/0xfaa07f1e (DG3_DBDATA)
SUCCESS: validated disks for 1/0xfaa07f1e (DG3_DBDATA)
SUCCESS: refreshed PST for 1/0xfaa07f1e (DG3_DBDATA)
Received dirty detach msg from node 1 for dom 1

From the last message we can identify the node where the disk was not discovered: node 1.

Reviewing the alert.log of the ASM instance on that particular node will show errors like:

NOTE: reconfiguration of group 1/0xfaa064f3 (DG3_DBDATA), full=1
NOTE: disk validation pending for group 1/0xfaa064f3 (DG3_DBDATA)
ERROR: group 1/0xfaa064f3 (DG3_DBDATA): could not validate disk 3
SUCCESS: validated disks for 1/0xfaa064f3 (DG3_DBDATA)
SUCCESS: refreshed PST for 1/0xfaa064f3 (DG3_DBDATA)
ERROR: ORA-15040 thrown in RBAL for group number 1
Mon Dec 12 15:16:15 2005
Errors in file /oralog/+asm2_rbal_8654.trc:
ORA-15040: diskgroup is incomplete
ORA-15066: offlining disk "" may result in a data loss
ORA-15042: ASM disk "3" is missing

If the diskgroup is using external redundancy, it will be dismounted on the instances where the new disk was not discovered. Before mounting the diskgroup, the disks have to be available across all the nodes.
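
To check on which instances the new disk is actually discovered, a query along these lines can be used (a sketch; the path is only an example taken from the log above):

SQL> select inst_id, path, header_status, mount_status
       from gv$asm_disk
      where path = '/dev/raw/raw8';

Instances that cannot discover the device simply do not return a row for that path.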

10.2.0.3 and above

Starting with 10.2.0.3, the situation changed: the disk will not be added to the diskgroup. Also, the messages reported in the instance where the disk was not found are slightly different:

NOTE: reconfiguration of group 2/0xc9489bc5 (TEST), full=1
NOTE: disk validation pending for group 2/0xc9489bc5 (TEST)
ERROR: group 2/0xc9489bc5 (TEST): could not validate disk 3

Solution

10.2.0.2 and below

1. DO NOT manipulate the header of the disk.

Remember that the disk has been "partially" added to the diskgroup, which means it already contains an ASM header and metadata, and the diskgroup is available on some of the nodes. If the header of the new disk is changed outside of ASM, ASM will not be able to reconfigure the disks.
2. DO NOT dismount the diskgroup.

Do not try to dismount the diskgroup from the ASM instances where it is still mounted. Also do not try to reboot the nodes.
3. Generate a manual rebalance.

If the problem is reported when creating the diskgroup, it will be created and mounted without problems on the ASM instance where the command was executed, but an error will be reported when mounting the diskgroup on the other nodes of the cluster. Before mounting the diskgroup, verify the disks are present in all the nodes, following the guidelines in the section "How to verify the disk is present in all the nodes of the cluster" below.

If the problem is caused after adding disks to an existent diskgroup:

The disk added is in a "partial" status. Triggering a manual rebalance will expel this disk from the diskgroup. Remember, the diskgroup will be mounted on the ASM instance where the command was executed.

SQL> alter diskgroup <diskgroup_name> rebalance power X;

Review V$ASM_OPERATION and look for rows related to the particular diskgroup. If the view returns no rows for that diskgroup, no rebalance operation is running: either it did not start or it has already finished. Then try to mount the diskgroup on one of the instances where it is not mounted.
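
To check V$ASM_OPERATION for the diskgroup, a query like the following can be used (a sketch; TEST1 is only an example diskgroup name):

SQL> select operation, state, power, est_minutes
       from v$asm_operation
      where group_number = (select group_number
                              from v$asm_diskgroup
                             where name = 'TEST1');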

If the disk is present in all the nodes, another way to expel the disk is running the following commands:

SQL> alter diskgroup <diskgroup_name> drop disk <disk_name>;
SQL> alter diskgroup <diskgroup_name> undrop disks;

This will trigger a rebalance operation and will remove the incomplete disk.
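
For illustration only, assuming the partially added disk received the name TEST1_0003 in diskgroup TEST1 (both names are just examples):

SQL> alter diskgroup test1 drop disk TEST1_0003;
SQL> alter diskgroup test1 undrop disks;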

Before trying to add the disk again, verify the disk is present in all the nodes of the cluster.

10.2.0.3 and above

The disk is not part of the diskgroup, so it is not required to run drop or rebalance commands.


Note how the disk is displayed in v$asm_disk

In this example, the disks were using ASMLIB
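
A listing like the one below can be produced with a query such as the following (a sketch selecting only the columns shown in the table):

SQL> select group_number, disk_number, path, header_status, mount_status, name
       from v$asm_disk
      order by group_number, disk_number;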


GROUP_NUMBER DISK_NUMBER PATH       HEADER_STATUS MOUNT_STATUS NAME
           0           8 ORCL:DISK4 MEMBER        IGNORED
           1           0 ORCL:DISK1 MEMBER        CACHED       DISK1
           2           1 ORCL:DISK2 MEMBER        CACHED       DISK2
           3           2 ORCL:DISK3 MEMBER        CACHED       DISK3

The following columns identify that the disk is not part of any diskgroup:

MOUNT_STATUS = IGNORED
GROUP_NUMBER = 0 (the group number used when a disk is not mounted by any diskgroup)

Don't get confused when using kfed on the problematic disk, as it will show a valid ASM disk header.
Remember that the header is formatted at the beginning of the operation and is never reverted if the process fails.

After validating that the disk is present in all nodes, clear the content of the disk or rebuild the ASMLIB disk, and then add it to the diskgroup.
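
As an illustration only (the device and disk names below are examples, not taken from this case; double-check the target device before running anything destructive), the content can be cleared with dd, or the ASMLIB disk can be recreated and rescanned:

# clear the beginning of the candidate device (example device name)
dd if=/dev/zero of=/dev/raw/raw8 bs=1024k count=100

# or, when using ASMLIB, recreate the disk (DISK4 and /dev/sdd1 are examples)
/etc/init.d/oracleasm deletedisk DISK4
/etc/init.d/oracleasm createdisk DISK4 /dev/sdd1
/etc/init.d/oracleasm scandisks     # run on the remaining nodes as well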



How to verify the disk is present in all the nodes of the cluster.


It is not required that all the disks have the same name across all the nodes; a consistent naming scheme simply makes administration easier when there is a large number of disks. What is relevant for ASM is that all the disks can be discovered through the asm_diskstring parameter. When the disk is validated on the other nodes, ASM reads the header of the disk in order to complete the operation.

There are default discovery paths for each platform, and those directories will be scanned even if the asm_diskstring parameter is not set. Devices with the right ownership and permissions will be discovered.
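
A quick way to review the discovery string and the devices each instance actually sees (run on every ASM instance; a sketch):

SQL> show parameter asm_diskstring
SQL> select path, header_status, mount_status from v$asm_disk order by path;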

Some of the common causes for a disk not being discovered are:

* If using ASMLIB, the command /etc/init.d/oracleasm scandisks was not executed on the remaining nodes of the cluster.
* Incorrect permissions: the oracle user is not the owner or cannot read/write the device.
* The disk is not present under any name.
* On Linux, when using raw bindings and the raw name is consistent across the nodes, either the device is incorrect (a different disk) or there are missing entries in the file /proc/partitions.
* The name could be the same across all the nodes, but not pointing to the same device.
* On IBM AIX, when using disks without LVM, there are particular attributes required for the disks (Table 1).

STORAGE TYPE                      ATTRIBUTE       VALUE
IBM STORAGE (ESS, FasTt, DSXXX)   reserve_policy  no_reserve
EMC STORAGE                       reserve_lock    no
Table 1
For example, ESS disks configured with reserve_policy=single_path will allow the diskgroup to be mounted on only one node. As an additional test, the dd command can be used to read from the device on the second node, as shown after the listing below; a failure of this operation at the operating system level demonstrates that the cause is related to the disk configuration.

lsattr -E -l hdisk19|grep reser
reserve_policy no_reserve Reserve Policy True
lsattr -E -l hdisk21|grep reser
reserve_policy single_path Reserve Policy True
*********** will not be discovered on a second node
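
The read test mentioned above could look like this (a sketch; rhdisk21 is the character device for the example disk shown in the lsattr output):

dd if=/dev/rhdisk21 of=/dev/null bs=4096 count=256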
How to verify the disk is the same across all the nodes.

* If using ASMLIB, run /etc/init.d/oracleasm listdisks on all the nodes and verify the output is the same. Another option is running ls -l /dev/oracleasm/disks.

* Use kfed to review the content of the header.

On each node execute:

$ kfed read <device_name> text=name.txt
* Compare the output of the txt files. It has to be the same.

* On the node where the disk was discovered, this is part of the valid output:
kfbh.endian: 1 ; 0x000: 0x01
kfbh.hard: 130 ; 0x001: 0x82
kfbh.type: 1 ; 0x002: KFBTYP_DISKHEAD
kfbh.datfmt: 1 ; 0x003: 0x01
kfbh.block.blk: 0 ; 0x004: T=0 NUMB=0x0
kfbh.block.obj: 2147483649 ; 0x008: TYPE=0x8 NUMB=0x1
kfbh.check: 3189683974 ; 0x00c: 0xbe1eb706
/* SOME DATA REMOVED INTENTIONALLY */
kfdhdb.dsknum: 1 ; 0x024: 0x0001
kfdhdb.grptyp: 1 ; 0x026: KFDGTP_EXTERNAL
kfdhdb.hdrsts: 3 ; 0x027: KFDHDR_MEMBER
kfdhdb.dskname: DATA_0001 ; 0x028: length=9
kfdhdb.grpname: DATA ; 0x048: length=4
kfdhdb.fgname: DATA_0001 ; 0x068: length=9
* On the failing node, if the disk is not the same, you will get a different output or the operation will fail:
KFED-00303: unable to open file ''

Source: ITPUB blog, http://blog.itpub.net/11420681/viewspace-1059403/
