ORA-15040, ORA-15066, ORA-15042 when ASM disk is not present in all nodes of a RAC Cluster

In this Document
Symptoms
10.2.0.2 and below
10.2.0.3 and above
Cause
10.2.0.2 and below
10.2.0.3 and above
Solution
10.2.0.2 and below
10.2.0.3 and above
How to verify the disk is present in all the nodes of the cluster.
References


Applies to:

Oracle Server - Enterprise Edition - Version: 10.1.0.2 to 10.2.0.2 - Release: 10.1 to 10.2
Information in this document applies to any platform.

Symptoms

In environments running RAC and ASM, before a disk becomes part of a diskgroup, it has to be validated on all nodes of the cluster. When this validation fails, the results differ by version, particularly for environments running 10.2.0.2 and below.

10.2.0.2 and below


The diskgroup is mounted in the ASM instance where the operation was executed, but is not mounted in other ASM instances of the cluster where the disk was missing.
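
One way to confirm this from any node is a query against the global view GV$ASM_DISKGROUP (a sketch; TEST1 is only an example diskgroup name):

SQL> select inst_id, name, state from gv$asm_diskgroup where name = 'TEST1';

Instances where the diskgroup is not mounted either return no row or show the diskgroup as DISMOUNTED.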

Trying to mount the diskgroup can report errors like:

ORA-15001: diskgroup "1" does not exist or is not mounted
ORA-15040: diskgroup is incomplete
ORA-15066: offlining disk "" may result in a data loss
ORA-15042: ASM disk "3" is missing

or

SQL> alter diskgroup test1 mount;
alter diskgroup test1 mount
*
ERROR at line 1:
ORA-15032: not all alterations performed
ORA-15040: diskgroup is incomplete
ORA-15042: ASM disk "1" is missing

When adding disks to the diskgroup:

SQL> alter diskgroup test1 add disk '/dev/asmdisk_KH5' force;
alter diskgroup test1 add disk '/dev/asmdisk_KH5' force
*
ERROR at line 1:
ORA-15032: not all alterations performed
ORA-15075: disk(s) are not visible cluster-wide

10.2.0.3 and above


Starting with 10.2.0.3, if the disk cannot be validated on all nodes of the cluster, it will not be added to the diskgroup. During the process, however, the header of the disk is formatted with information that makes it look like a valid ASM disk.

Trying to drop the disk will fail with errors ORA-15032 and ORA-15054:

SQL> alter diskgroup test drop disk '/dev/raw/raw8';
alter diskgroup test drop disk '/dev/raw/raw8'
*
ERROR at line 1:
ORA-15032 : not all alterations performed
ORA-15054 : disk "/DEV/RAW/RAW8" does not exist in diskgroup "TEST"

Trying to add the disk again will fail with errors ORA-15032 and ORA-15020:
SQL> alter diskgroup test add disk '/dev/raw/raw8';
alter diskgroup TEST add disk '/dev/raw/raw8'
*
ERROR at line 1:
ORA-15032 : not all alterations performed
ORA-15020 : discovered duplicate ASM disk "TEST_0005"






Cause

10.2.0.2 and below


When a disk is added to the diskgroup, it has to be validated on all the nodes in the cluster. If the disk is not present on one of the nodes, the alert.log file of the ASM instance where the command was executed (the master instance) will report the following messages:

SQL> ALTER DISKGROUP DG3_DBDATA ADD DISK '/dev/raw/raw8' SIZE 25595M
Mon Dec 12 15:16:10 2005
NOTE: reconfiguration of group 1/0xfaa07f1e (DG3_DBDATA), full=1
Mon Dec 12 15:16:10 2005
NOTE: initializing header on grp 1 disk DG3_DBDATA_0003
NOTE: cache opening disk 3 of grp 1: DG3_DBDATA_0003 path:/dev/raw/raw8
NOTE: requesting all-instance disk validation for group=1

At this point the disk has been partially added (receiving a name, number, etc.), and it needs to be validated across all the instances.

The other instances in the cluster will trigger a disk scan and read the headers of their disks, looking for the newly added disk. If the new disk is not found on a node, the master instance (where the operation was started) will report the following messages in its alert.log:

NOTE: disk validation pending for group 1/0xfaa07f1e (DG3_DBDATA)
SUCCESS: validated disks for 1/0xfaa07f1e (DG3_DBDATA)
SUCCESS: refreshed PST for 1/0xfaa07f1e (DG3_DBDATA)
Received dirty detach msg from node 1 for dom 1

From the last message we can identify the node where the disk was not discovered: node 1.

Reviewing the alert.log of the ASM instance on that particular node will show errors like:

NOTE: reconfiguration of group 1/0xfaa064f3 (DG3_DBDATA), full=1
NOTE: disk validation pending for group 1/0xfaa064f3 (DG3_DBDATA)
ERROR: group 1/0xfaa064f3 (DG3_DBDATA): could not validate disk 3
SUCCESS: validated disks for 1/0xfaa064f3 (DG3_DBDATA)
SUCCESS: refreshed PST for 1/0xfaa064f3 (DG3_DBDATA)
ERROR: ORA-15040 thrown in RBAL for group number 1
Mon Dec 12 15:16:15 2005
Errors in file /oralog/+asm2_rbal_8654.trc:
ORA-15040: diskgroup is incomplete
ORA-15066: offlining disk "" may result in a data loss
ORA-15042: ASM disk "3" is missing

If the diskgroup is using external redundancy, it will be dismounted on the instances where the new disk was not discovered. Before mounting the diskgroup, the disks have to be available across all the nodes.
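
To check on which instances the new disk is actually discovered, a query along these lines can be used (a sketch; the path is only an example taken from the log above):

SQL> select inst_id, path, header_status, mount_status
       from gv$asm_disk
      where path = '/dev/raw/raw8';

Instances that cannot discover the device simply do not return a row for that path.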

10.2.0.3 and above

Starting with 10.2.0.3, the situation changed: the disk will not be added to the diskgroup. Also, the messages reported in the instance where the disk was not found are slightly different:

NOTE: reconfiguration of group 2/0xc9489bc5 (TEST), full=1
NOTE: disk validation pending for group 2/0xc9489bc5 (TEST)
ERROR: group 2/0xc9489bc5 (TEST): could not validate disk 3

Solution

10.2.0.2 and below

1. DO NOT manipulate the header of the disk.

Remember that the disk has been "partially" added to the diskgroup, which means it already contains an ASM header and metadata, and the diskgroup is available on some of the nodes. If the header of the new disk is changed outside of ASM, ASM will not be able to reconfigure the disks.
2. DO NOT dismount the diskgroup.

Do not try to dismount the diskgroup from the ASM instances where it is still mounted. Also do not try to reboot the nodes.
3. Generate a manual rebalance.

If the problem is reported when creating the diskgroup, it will be created and mounted without problems on the ASM instance where the command was executed, but an error will be reported when mounting the diskgroup on the other nodes of the cluster. Before mounting the diskgroup, verify the disks are present in all the nodes, following the guidelines in the section "How to verify the disk is present in all the nodes of the cluster" below.

If the problem is caused after adding disks to an existent diskgroup:

The disk added is in a "partial" status. Triggering a manual rebalance will expel this disk from the diskgroup. Remember, the diskgroup will be mounted on the ASM instance where the command was executed.

SQL> alter diskgroup <diskgroup_name> rebalance power X;

Review V$ASM_OPERATION and look for rows related to the particular diskgroup. If the view returns no rows for that diskgroup, no rebalance operation is running: either it did not start or it has already finished. Then try to mount the diskgroup on one of the instances where it is not mounted.
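
To check V$ASM_OPERATION for the diskgroup, a query like the following can be used (a sketch; TEST1 is only an example diskgroup name):

SQL> select operation, state, power, est_minutes
       from v$asm_operation
      where group_number = (select group_number
                              from v$asm_diskgroup
                             where name = 'TEST1');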

If the disk is present in all the nodes, another way to expel the disk is running the following commands:

SQL> alter diskgroup <diskgroup_name> drop disk <disk_name>;
SQL> alter diskgroup <diskgroup_name> undrop disks;

This will trigger a rebalance operation and will remove the incomplete disk.
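
For illustration only, assuming the partially added disk received the name TEST1_0003 in diskgroup TEST1 (both names are just examples):

SQL> alter diskgroup test1 drop disk TEST1_0003;
SQL> alter diskgroup test1 undrop disks;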

Before trying to add the disk again, verify the disk is present in all the nodes of the cluster.

10.2.0.3 and above

The disk is not part of the diskgroup, so it is not required to run drop or rebalance commands.


Note how the disk is displayed in v$asm_disk

In this example, the disks were using ASMLIB
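
A listing like the one below can be produced with a query such as the following (a sketch selecting only the columns shown in the table):

SQL> select group_number, disk_number, path, header_status, mount_status, name
       from v$asm_disk
      order by group_number, disk_number;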


GROUP_NUMBER DISK_NUMBER PATH       HEADER_STATUS MOUNT_STATUS NAME
           0           8 ORCL:DISK4 MEMBER        IGNORED
           1           0 ORCL:DISK1 MEMBER        CACHED       DISK1
           2           1 ORCL:DISK2 MEMBER        CACHED       DISK2
           3           2 ORCL:DISK3 MEMBER        CACHED       DISK3

The following columns identify that the disk is not part of any diskgroup:

MOUNT_STATUS = IGNORED
GROUP_NUMBER = 0 (the group number used when a disk is not mounted by any diskgroup)

Don't get confused when using kfed on the problematic disk, as it will show a valid ASM disk header.
Remember that the header is formatted at the beginning of the operation and is never reverted if the process fails.

After validating that the disk is present in all nodes, clear the content of the disk or rebuild the ASMLIB disk, and then add it to the diskgroup.
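
As an illustration only (the device and disk names below are examples, not taken from this case; double-check the target device before running anything destructive), the content can be cleared with dd, or the ASMLIB disk can be recreated and rescanned:

# clear the beginning of the candidate device (example device name)
dd if=/dev/zero of=/dev/raw/raw8 bs=1024k count=100

# or, when using ASMLIB, recreate the disk (DISK4 and /dev/sdd1 are examples)
/etc/init.d/oracleasm deletedisk DISK4
/etc/init.d/oracleasm createdisk DISK4 /dev/sdd1
/etc/init.d/oracleasm scandisks     # run on the remaining nodes as well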



How to verify the disk is present in all the nodes of the cluster.


It is not required that all the disks have the same name across all the nodes; a consistent naming scheme simply makes administration easier when there is a large number of disks. What is relevant for ASM is that all the disks can be discovered through the asm_diskstring parameter. When the disk is validated on the other nodes, ASM reads the header of the disk in order to complete the operation.

There are default discovery paths for each platform, and those directories will be scanned even if the asm_diskstring parameter is not set. Devices with the right ownership and permissions will be discovered.
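
A quick way to review the discovery string and the devices each instance actually sees (run on every ASM instance; a sketch):

SQL> show parameter asm_diskstring
SQL> select path, header_status, mount_status from v$asm_disk order by path;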

Some of the common causes for a disk not being discovered are:

* If using ASMLIB, the command /etc/init.d/oracleasm scandisks was not executed on the remaining nodes of the cluster.
* Incorrect permissions: the oracle user is not the owner or cannot read/write the device.
* The disk is not present under any name.
* On Linux, when using raw bindings and the raw name is consistent across the nodes, either the device is incorrect (a different disk) or there are missing entries in the file /proc/partitions.
* The name could be the same across all the nodes, but not pointing to the same device.
* On IBM AIX, when using disks without LVM, there are particular attributes required for the disks (Table 1).

STORAGE TYPE                      ATTRIBUTE       VALUE
IBM STORAGE (ESS, FasTt, DSXXX)   reserve_policy  no_reserve
EMC STORAGE                       reserve_lock    no
Table 1
For example, ESS disks configured with reserve_policy=single_path will allow the diskgroup to be mounted on only one node. As an additional test, the dd command can be used to read from the device on the second node, as shown after the listing below; a failure of this operation at the operating system level demonstrates that the cause is related to the disk configuration.

lsattr -E -l hdisk19|grep reser
reserve_policy no_reserve Reserve Policy True
lsattr -E -l hdisk21|grep reser
reserve_policy single_path Reserve Policy True
*********** will not be discovered on a second node
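
The read test mentioned above could look like this (a sketch; rhdisk21 is the character device for the example disk shown in the lsattr output):

dd if=/dev/rhdisk21 of=/dev/null bs=4096 count=256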
How to verify the disk is the same across all the nodes.

* If using ASMLIB, run /etc/init.d/oracleasm listdisks on all the nodes and verify the output is the same. Another option is running ls -l /dev/oracleasm/disks.

* Use kfed to review the content of the header.

On each node execute:

$ kfed read <device_name> text=name.txt
* Compare the output of the txt files. It has to be the same.

* On the node where the disk was discovered, this is part of the valid output:
kfbh.endian: 1 ; 0x000: 0x01
kfbh.hard: 130 ; 0x001: 0x82
kfbh.type: 1 ; 0x002: KFBTYP_DISKHEAD
kfbh.datfmt: 1 ; 0x003: 0x01
kfbh.block.blk: 0 ; 0x004: T=0 NUMB=0x0
kfbh.block.obj: 2147483649 ; 0x008: TYPE=0x8 NUMB=0x1
kfbh.check: 3189683974 ; 0x00c: 0xbe1eb706
/* SOME DATA REMOVED INTENTIONALLY */
kfdhdb.dsknum: 1 ; 0x024: 0x0001
kfdhdb.grptyp: 1 ; 0x026: KFDGTP_EXTERNAL
kfdhdb.hdrsts: 3 ; 0x027: KFDHDR_MEMBER
kfdhdb.dskname: DATA_0001 ; 0x028: length=9
kfdhdb.grpname: DATA ; 0x048: length=4
kfdhdb.fgname: DATA_0001 ; 0x068: length=9
* On the failing node, if the disk is not the same, you will get a different output or the operation will fail:
KFED-00303: unable to open file ''

Source: ITPUB blog, http://blog.itpub.net/11420681/viewspace-1059403/
