ORA-15042 Failure: Last-Resort Recovery ---- 惜分飞

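A friend asked us for help with a recovery. An ASM disk group was built on 19 LUNs, and one of those LUNs had developed a problem. Their fix was to add a new LUN and drop the old one, but the operation hung halfway through: the bad LUN was damaged at the storage layer, so the rebalance could not complete. The storage engineer then carried on repairing the faulty LUN and, very luckily, got it working again. In the excitement, however, the new LUN was deleted directly on the storage array, even though part of the data had already been rebalanced onto it. At that point the ASM disk group went down completely and could no longer be mounted, and we were asked for recovery support. For certain reasons the LUN itself could not be recovered at the storage level, so the only option left was for us to provide a database-level recovery.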

Mon Sep 21 19:52:35 2015
SQL> alter diskgroup  dg_XFF add disk '/dev/rhdisk116' size 716800M drop disk dg_XFF_0012
NOTE: Assigning number (1,20) to disk ( /dev/rhdisk116 )
NOTE: requesting all-instance membership refresh for group=1
NOTE: initializing header on grp 1 disk DG_XFF_0020
NOTE: requesting all-instance disk validation for group=1
Mon Sep 21 19:52:44 2015
NOTE: skipping rediscovery for group 1 /0xb94738f1 (DG_XFF) on local instance.
NOTE: requesting all-instance disk validation for group=1
NOTE: skipping rediscovery for group 1 /0xb94738f1 (DG_XFF) on local instance.
NOTE: initiating PST update: grp = 1
Mon Sep 21 19:52:44 2015
GMON updating group 1 at 25 for pid 27, osid 12124486
NOTE: PST update grp = 1 completed successfully
NOTE: membership refresh pending for group 1 /0xb94738f1 (DG_XFF)
GMON querying group 1 at 26 for pid 18, osid 10092734
NOTE: cache opening disk 20 of grp 1: DG_XFF_0020 path: /dev/rhdisk116
GMON querying group 1 at 27 for pid 18, osid 10092734
SUCCESS: refreshed membership for 1 /0xb94738f1 (DG_XFF)
Mon Sep 21 19:52:47 2015
SUCCESS: alter diskgroup  dg_XFF add disk '/dev/rhdisk116' size 716800M drop disk dg_XFF_0012
NOTE: starting rebalance of group 1 /0xb94738f1 (DG_XFF) at power 1
Starting background process ARB0
Mon Sep 21 19:52:47 2015
ARB0 started with pid=28, OS id =10944804
NOTE: assigning ARB0 to group 1 /0xb94738f1 (DG_XFF) with 1 parallel I/O
NOTE: Attempting voting file refresh on diskgroup DG_XFF
Mon Sep 21 20:35:06 2015
SQL> ALTER DISKGROUP DG_XFF MOUNT  /* asm agent *//* {1:51107:7083} */
NOTE: cache registered group DG_XFF number=1 incarn=0xdd6f975a
NOTE: cache began mount (first) of group DG_XFF number=1 incarn=0xdd6f975a
NOTE: Assigning number (1,0) to disk ( /dev/rhdisk10 )
NOTE: Assigning number (1,1) to disk ( /dev/rhdisk11 )
NOTE: Assigning number (1,2) to disk ( /dev/rhdisk16 )
NOTE: Assigning number (1,3) to disk ( /dev/rhdisk17 )
NOTE: Assigning number (1,4) to disk ( /dev/rhdisk22 )
NOTE: Assigning number (1,5) to disk ( /dev/rhdisk23 )
NOTE: Assigning number (1,6) to disk ( /dev/rhdisk28 )
NOTE: Assigning number (1,7) to disk ( /dev/rhdisk29 )
NOTE: Assigning number (1,8) to disk ( /dev/rhdisk33 )
NOTE: Assigning number (1,9) to disk ( /dev/rhdisk34 )
NOTE: Assigning number (1,10) to disk ( /dev/rhdisk4 )
NOTE: Assigning number (1,11) to disk ( /dev/rhdisk40 )
NOTE: Assigning number (1,12) to disk ( /dev/rhdisk41 )
NOTE: Assigning number (1,13) to disk ( /dev/rhdisk45 )
NOTE: Assigning number (1,14) to disk ( /dev/rhdisk46 )
NOTE: Assigning number (1,15) to disk ( /dev/rhdisk5 )
NOTE: Assigning number (1,16) to disk ( /dev/rhdisk52 )
NOTE: Assigning number (1,17) to disk ( /dev/rhdisk53 )
NOTE: Assigning number (1,18) to disk ( /dev/rhdisk57 )
NOTE: Assigning number (1,19) to disk ( /dev/rhdisk58 )
Wed Sep 30 11:08:07 2015
NOTE: start heartbeating (grp 1)
GMON querying group 1 at 33 for pid 35, osid 4194488
NOTE: Assigning number (1,20) to disk ()
GMON querying group 1 at 34 for pid 35, osid 4194488
NOTE: cache dismounting (clean) group 1 /0xDD6F975A (DG_XFF)
NOTE: dbwr not being msg'd to dismount
NOTE: lgwr not being msg'd to dismount
NOTE: cache dismounted group 1 /0xDD6F975A (DG_XFF)
NOTE: cache ending mount (fail) of group DG_XFF number=1 incarn=0xdd6f975a
NOTE: cache deleting context for group DG_XFF 1 /0xdd6f975a
GMON dismounting group 1 at 35 for pid 35, osid 4194488
NOTE: Disk  in mode 0x8 marked for de-assignment
NOTE: Disk  in mode 0x8 marked for de-assignment
NOTE: Disk  in mode 0x8 marked for de-assignment
NOTE: Disk  in mode 0x8 marked for de-assignment
NOTE: Disk  in mode 0x8 marked for de-assignment
NOTE: Disk  in mode 0x8 marked for de-assignment
NOTE: Disk  in mode 0x8 marked for de-assignment
NOTE: Disk  in mode 0x8 marked for de-assignment
NOTE: Disk  in mode 0x8 marked for de-assignment
NOTE: Disk  in mode 0x8 marked for de-assignment
NOTE: Disk  in mode 0x8 marked for de-assignment
NOTE: Disk  in mode 0x8 marked for de-assignment
NOTE: Disk  in mode 0x8 marked for de-assignment
NOTE: Disk  in mode 0x8 marked for de-assignment
NOTE: Disk  in mode 0x8 marked for de-assignment
NOTE: Disk  in mode 0x8 marked for de-assignment
NOTE: Disk  in mode 0x8 marked for de-assignment
NOTE: Disk  in mode 0x8 marked for de-assignment
NOTE: Disk  in mode 0x8 marked for de-assignment
NOTE: Disk  in mode 0x8 marked for de-assignment
NOTE: Disk  in mode 0x8 marked for de-assignment
ERROR: diskgroup DG_XFF was not mounted
ORA-15032: not all alterations performed
ORA-15040: diskgroup is incomplete
ORA-15042: ASM disk "20" is missing from group number "1"
ERROR: ALTER DISKGROUP DG_XFF MOUNT  /* asm agent *//* {1:51107:7083} */
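
The mount log above can also be read mechanically: every "Assigning number (group,disk) to disk (path)" line pairs a disk number with a device path, and an empty path is what ends up as ORA-15042. The following is a minimal sketch, not something used in the actual recovery, that picks out such disks from alert-log lines. The function name `find_missing_disks` and the regular expression are my own assumptions, written against the line format shown in this post only.

```python
import re

# Matches ASM alert-log lines of the form seen in this post, for example:
#   NOTE: Assigning number (1,20) to disk ( /dev/rhdisk116 )
#   NOTE: Assigning number (1,20) to disk ()
# Optional spaces inside the parentheses are tolerated.
ASSIGN_RE = re.compile(
    r"Assigning number \((\d+),(\d+)\) to disk \(\s*(\S*?)\s*\)"
)


def find_missing_disks(log_lines, group_number):
    """Return the disk numbers whose most recent assignment has no path.

    Hypothetical helper: it only keeps the last assignment seen per disk,
    so a disk that was added with a path and later shows up with an empty
    path is reported as missing.
    """
    latest_path = {}
    for line in log_lines:
        match = ASSIGN_RE.search(line)
        if match and int(match.group(1)) == group_number:
            latest_path[int(match.group(2))] = match.group(3)
    return sorted(disk for disk, path in latest_path.items() if not path)


sample = [
    "NOTE: Assigning number (1,20) to disk ( /dev/rhdisk116 )",
    "NOTE: Assigning number (1,19) to disk ( /dev/rhdisk58 )",
    "NOTE: Assigning number (1,20) to disk ()",
]
print(find_missing_disks(sample, 1))  # prints [20]
```

Fed the three sample lines taken from the log, it reports disk 20, the same disk named in the ORA-15042 message.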

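The cause is fairly obvious here: because the storage engineer deleted the LUN outright, disk group DG_XFF lost ASM disk 20 and can no longer be mounted directly. The disk group had already been rebalancing for a fairly long time, so the lost disk held a large amount of data, including metadata. Even if we modified the PST (Partnership and Status Table) to get the disk group mounted, which would not necessarily succeed, a large amount of data would still be lost, and the data inside could not necessarily be pulled out directly. Had the disk merely been added without any rebalance taking place for some reason, we could simply have modified the PST to get the disk group mounted. In a situation like this one, the only thing we can do is scan the disks at the low level and generate the data files (some files have their metadata on the lost LUN, so copying files or unloading data directly from the surviving metadata would lose a large amount of data), and then unload the data from those files to complete the recovery. The disks that need to be recovered are: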

grp# dsk# bsize ausize disksize diskname        groupname       path
---- ---- ----- ------ -------- --------------- --------------- -------------
    1    0  4096  4096K   179200 DG_XFF_0000     DG_XFF          /dev/rhdisk10
    1    1  4096  4096K   179200 DG_XFF_0001     DG_XFF          /dev/rhdisk11
    1    2  4096  4096K   179200 DG_XFF_0002     DG_XFF          /dev/rhdisk16
    1    3  4096  4096K   179200 DG_XFF_0003     DG_XFF          /dev/rhdisk17
    1    4  4096  4096K   179200 DG_XFF_0004     DG_XFF          /dev/rhdisk22
    1    5  4096  4096K   179200 DG_XFF_0005     DG_XFF          /dev/rhdisk23
    1    6  4096  4096K   179200 DG_XFF_0006     DG_XFF          /dev/rhdisk28
    1    7  4096  4096K   179200 DG_XFF_0007     DG_XFF          /dev/rhdisk29
    1    8  4096  4096K   179200 DG_XFF_0008     DG_XFF          /dev/rhdisk33
    1    9  4096  4096K   179200 DG_XFF_0009     DG_XFF          /dev/rhdisk34
    1   10  4096  4096K   179200 DG_XFF_0010     DG_XFF          /dev/rhdisk4
    1   11  4096  4096K   179200 DG_XFF_0011     DG_XFF          /dev/rhdisk40
    1   12  4096  4096K   179200 DG_XFF_0012     DG_XFF          /dev/rhdisk41
    1   13  4096  4096K   179200 DG_XFF_0013     DG_XFF          /dev/rhdisk45
    1   14  4096  4096K   179200 DG_XFF_0014     DG_XFF          /dev/rhdisk46
    1   15  4096  4096K   179200 DG_XFF_0015     DG_XFF          /dev/rhdisk5
    1   16  4096  4096K   179200 DG_XFF_0016     DG_XFF          /dev/rhdisk52
    1   17  4096  4096K   179200 DG_XFF_0017     DG_XFF          /dev/rhdisk53
    1   18  4096  4096K   179200 DG_XFF_0018     DG_XFF          /dev/rhdisk57
    1   19  4096  4096K   179200 DG_XFF_0019     DG_XFF          /dev/rhdisk58
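
To get a rough feel for how much ground a low-level scan has to cover, the listing above can be turned into an allocation-unit (AU) count. This is only back-of-the-envelope arithmetic under one loud assumption: the listing does not state the unit of the disksize column, and I am assuming it is MB. The function and variable names are hypothetical.

```python
# Values copied from the disk listing in this post.
DISK_COUNT = 20        # surviving disks, dsk# 0 through 19
DISK_SIZE_MB = 179200  # "disksize" column; the MB unit is an ASSUMPTION
AU_SIZE_KB = 4096      # "ausize" column, shown as 4096K


def allocation_units_per_disk(disk_size_mb, au_size_kb):
    """Number of whole allocation units that fit on one disk."""
    return disk_size_mb * 1024 // au_size_kb


aus_per_disk = allocation_units_per_disk(DISK_SIZE_MB, AU_SIZE_KB)
total_aus = aus_per_disk * DISK_COUNT

print(aus_per_disk)  # prints 44800
print(total_aus)     # prints 896000
```

If the MB assumption holds, each surviving disk contributes 44800 AUs of 4 MB each, so the scan has to look at roughly 896000 AUs across the 20 disks.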

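We were fairly lucky this time: the lost disk group was only a business-data disk group, and it held just 19 tablespaces and 10 partitioned tables. With the data dictionary intact, we therefore only had to recover the data of the 10 partitioned tables (6443 partitions in total). The overall recovery result was as follows: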
[Figure: RECOVER (screenshot of the recovery results)]


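By total data volume, the recovery ratio is 6003.26953/6027.26935*100% = 99.6018127%. For a case in which a LUN was lost after most of the rebalance onto it had already been done, still being able to recover this much data is a very satisfying result overall.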
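
The ratio is easy to re-check. The snippet below is just a sketch of that arithmetic with the two figures quoted in this post; the post does not state their unit, and the function name is my own.

```python
def recovery_ratio_percent(recovered, total):
    """Share of the total data volume that was recovered, in percent."""
    if total <= 0:
        raise ValueError("total must be positive")
    return recovered / total * 100


# Figures quoted in the post (unit not stated there).
pct = recovery_ratio_percent(6003.26953, 6027.26935)
print(f"{pct:.7f}%")  # prints 99.6018127%
```

Put the other way round, about 0.4% of the data volume could not be recovered.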

If you run into a situation like this and cannot resolve it, please contact us; we provide professional Oracle database recovery support.

Original post: ORA-15042: ASM disk “N” is missing from group number “M” failure recovery
