____________________________________________________________________________________________
On a large database system, after a disk group had been dropped, starting the database with srvctl ("srvctl start instance -d gzdw -i gzdw1") failed with the following error:
    PRCR-1013 : Failed to start resource ora.gzdw.db
    PRCR-1064 : Failed to start resource ora.gzdw.db on node jczc-1
    CRS-5017: The resource action "ora.GDATA.dg start" encountered the following error:
    ORA-15032: not all alterations performed
    ORA-15017: diskgroup "GDATA" cannot be mounted
    ORA-15040: diskgroup is incomplete
    . For details refer to "(:CLSN00107:)" in "/oracle/app/11.2.0/grid/log/jczc-1/agent/crsd/oraagent_gzgdb//oraagent_gzgdb.log".
    CRS-2674: Start of 'ora.GDATA.dg' on 'jczc-1' failed
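Starting the instance with sqlplus did work, but the resources were still in an abnormal state: not only the diskgroup resource, the db resource state was wrong as well. This had to be understood properly.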
SOLUTION
_____________________________________________________________________________________________
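The root cause turned out to be Bug 19016181 - Drop diskgroup does not delete CRS resource and database dependency profile (Doc ID 19016181.8): dropping a disk group does not automatically remove the CRS resource or the database dependency profile. There is no patch at the moment, so the cleanup has to be done by hand; see the procedure at the end of this post.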
The analysis went as follows:
1. The error points at the diskgroup, so check the ASM log first:
    Mon Jul 13 17:04:13 2015
    SQL> ALTER DISKGROUP GDATA MOUNT /* asm agent *//* {1:8011:1192} */
    NOTE: cache registered group GDATA number=5 incarn=0xe33c47b2
    NOTE: cache began mount (first) of group GDATA number=5 incarn=0xe33c47b2
    Mon Jul 13 17:04:13 2015
    ERROR: no read quorum in group: required 2, found 0 disks    <<<
    NOTE: cache dismounting (clean) group 5/0xE33C47B2 (GDATA)
    NOTE: messaging CKPT to quiesce pins Unix process pid: 50830, image: oracle@jczc-1 (TNS V1-V3)
    NOTE: dbwr not being msg'd to dismount
    NOTE: lgwr not being msg'd to dismount
    NOTE: cache dismounted group 5/0xE33C47B2 (GDATA)
    NOTE: cache ending mount (fail) of group GDATA number=5 incarn=0xe33c47b2
    NOTE: cache deleting context for group GDATA 5/0xe33c47b2
    GMON dismounting group 5 at 80 for pid 30, osid 50830
    ERROR: diskgroup GDATA was not mounted
    ORA-15032: not all alterations performed
    ORA-15017: diskgroup "GDATA" cannot be mounted
    ORA-15040: diskgroup is incomplete
    ERROR: ALTER DISKGROUP GDATA MOUNT /* asm agent *//* {1:8011:1192} */
While mounting GDATA, ASM found 0 disks. Could this be a permissions problem?
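When the ASM log is long, the quorum line is easy to miss. Below is a minimal Python sketch that pulls the "required / found" counts out of log text. The regular expression assumes the exact message wording quoted above, and the function name `find_quorum_failures` is made up for this illustration.

```python
import re

# Assumption: the message wording matches the alert-log line quoted in this post;
# other versions may phrase it differently.
QUORUM_RE = re.compile(
    r"ERROR: no read quorum in group: required (\d+), found (\d+) disks"
)


def find_quorum_failures(log_text):
    """Return a list of (required, found) tuples, one per quorum error line."""
    return [(int(req), int(found)) for req, found in QUORUM_RE.findall(log_text)]


# A few lines copied from the log excerpt above.
sample = """\
NOTE: cache began mount (first) of group GDATA number=5 incarn=0xe33c47b2
ERROR: no read quorum in group: required 2, found 0 disks
NOTE: cache dismounting (clean) group 5/0xE33C47B2 (GDATA)
"""

for required, found in find_quorum_failures(sample):
    print(f"quorum failure: required={required}, found={found}")
```

A result with `found` equal to 0 means ASM discovered no member disk of the group at all, which is what prompted the checks that follow.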
2. Check the permissions on the storage devices
    lrwxrwxrwx 1 gzgdb oinstall 8 Jul 8 17:04 /dev/mapper/data_disk001 -> ../dm-11
    lrwxrwxrwx 1 gzgdb oinstall 7 Jul 8 17:04 /dev/mapper/data_disk002 -> ../dm-9
    lrwxrwxrwx 1 gzgdb oinstall 8 Jul 8 17:04 /dev/mapper/data_disk003 -> ../dm-14
    lrwxrwxrwx 1 gzgdb oinstall 8 Jul 8 17:04 /dev/mapper/data_disk004 -> ../dm-10
    lrwxrwxrwx 1 gzgdb oinstall 8 Jul 8 17:04 /dev/mapper/data_disk005 -> ../dm-12
    lrwxrwxrwx 1 gzgdb oinstall 7 Jul 8 17:04 /dev/mapper/data_disk006 -> ../dm-8
    lrwxrwxrwx 1 gzgdb oinstall 8 Jul 8 17:04 /dev/mapper/data_disk007 -> ../dm-13
    lrwxrwxrwx 1 gzgdb oinstall 7 Jul 8 17:04 /dev/mapper/data_disk008 -> ../dm-7
    lrwxrwxrwx 1 gzgdb oinstall 7 Jul 8 17:04 /dev/mapper/ocrvt_disk001 -> ../dm-6
    lrwxrwxrwx 1 gzgdb oinstall 7 Jul 8 17:04 /dev/mapper/ocrvt_disk002 -> ../dm-4
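The ownership should be gzgdb:asmadmin everywhere, but here it is gzgdb:oinstall. That is a discrepancy, yet it would not make the disks undiscoverable, and the other disk groups mount fine. So storage permissions are ruled out as the cause.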
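Comparing ownership by eye across many devices is error-prone, so here is a small sketch that flags entries whose group differs from the expected one. It assumes the nine-plus-field `ls -l` layout shown above; the helper name `wrong_group_entries` and the second sample line (with group asmadmin) are invented for the illustration.

```python
def wrong_group_entries(ls_output, expected_group="asmadmin"):
    """Return (path, group) for each `ls -l` line whose group is not the expected one."""
    mismatches = []
    for line in ls_output.splitlines():
        fields = line.split()
        # Assumed layout: perms links owner group size month day time path [-> target]
        if len(fields) < 9:
            continue
        group, path = fields[3], fields[8]
        if group != expected_group:
            mismatches.append((path, group))
    return mismatches


# First line is from the listing above; the second is a hypothetical "correct" entry.
sample = """\
lrwxrwxrwx 1 gzgdb oinstall 8 Jul 8 17:04 /dev/mapper/data_disk001 -> ../dm-11
lrwxrwxrwx 1 gzgdb asmadmin 7 Jul 8 17:04 /dev/mapper/data_disk002 -> ../dm-9
"""

for path, group in wrong_group_entries(sample):
    print(f"{path}: group is {group}")
```

In this case every device carries the same unexpected group, including those of disk groups that mount without trouble, which supports ruling permissions out.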
3. Look at the state of the GDATA diskgroup resource
    NAME           TARGET  STATE    SERVER
    -------------------------------------------------------------------------------
    ora.GDATA.dg   ONLINE  OFFLINE  jczc-1
TARGET is ONLINE but STATE is OFFLINE, so something really is wrong with this disk group.
4. Check the disk group state from inside the instance to see whether it looks normal
SQL> select GROUP_NUMBER,NAME,STATE from v$asm_diskgroup;
    GROUP_NUMBER NAME                           STATE
    ------------ ------------------------------ -----------
               1 GDATA1                         MOUNTED
               2 GDATA2                         MOUNTED
               3 OCR_VOTE                       MOUNTED
               4 ODSDATA                        MOUNTED
There is no GDATA disk group at all: it had already been dropped. So why does CRS still hold a resource for it? Was the resource simply never updated?
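The comparison made here (diskgroup resources known to CRS versus disk groups known to ASM) can be sketched in a few lines of Python. This assumes diskgroup resources follow the `ora.<NAME>.dg` naming seen in this post; the function name and the input lists are illustrative and would normally come from `crsctl status resource` and `v$asm_diskgroup`.

```python
def stale_diskgroup_resources(crs_resources, asm_diskgroups):
    """Return CRS diskgroup resources that have no matching ASM disk group."""
    known = {name.upper() for name in asm_diskgroups}
    stale = []
    for res in crs_resources:
        # Assumption: diskgroup resources are named ora.<DISKGROUP>.dg
        if res.startswith("ora.") and res.endswith(".dg"):
            diskgroup = res[len("ora."):-len(".dg")]
            if diskgroup.upper() not in known:
                stale.append(res)
    return stale


# Illustrative inputs based on the names that appear in this post.
crs = ["ora.GDATA.dg", "ora.GDATA1.dg", "ora.GDATA2.dg", "ora.ODSDATA.dg", "ora.gzdw.db"]
asm = ["GDATA1", "GDATA2", "OCR_VOTE", "ODSDATA"]

print(stale_diskgroup_resources(crs, asm))
```

With these inputs only `ora.GDATA.dg` is reported, matching what was found by hand.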
5. Try re-creating the disk group to verify that the resource was just not updated
SQL> select GROUP_NUMBER,DISK_NUMBER,PATH,OS_MB/1024 OS_GB,HEADER_STATUS from v$asm_disk where HEADER_STATUS <> 'MEMBER';
    GROUP_NUMBER DISK_NUMBER PATH                        OS_GB HEADER_STATUS
    ------------ ----------- --------------------------- ----- -------------
               0           0 /dev/mapper/app4data         2048 CANDIDATE
               0           1 /dev/mapper/app3data         2048 CANDIDATE
               0           2 /dev/mapper/crsdata             8 CANDIDATE
               0          11 /dev/mapper/ocrvt_disk002       5 FORMER
               0           4 /dev/mapper/app1data         2048 CANDIDATE
               0           5 /dev/mapper/app2data         2048 CANDIDATE
               0           3 /dev/mapper/dbdata             60 CANDIDATE

    SQL> create diskgroup GDATA external redundancy disk '/dev/mapper/ocrvt_disk002';
    Diskgroup created.
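Starting the instance again now succeeds.
So why is the CRS resource not updated when a disk group is dropped?
6. Try removing the resource by hand, reproducing the earlier situation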
    # Drop the disk group first
    SQL> drop diskgroup GDATA;
    Diskgroup dropped.
    srvctl stop diskgroup -g GDATA
    srvctl disable diskgroup -g GDATA
    [gzgdb@jczc-1 ~]$ srvctl remove diskgroup -g GDATA    <<<< removing the resource fails
    PRCA-1002 : Failed to remove CRS resource ora.GDATA.dg for ASM Disk Group GDATA
    PRCR-1028 : Failed to remove resource ora.GDATA.dg
    PRCR-1072 : Failed to unregister resource ora.GDATA.dg
    CRS-0222: Resource 'ora.GDATA.dg' has dependency error.
'ora.GDATA.dg' cannot be removed; the message says a dependency on it still exists.
7. Check the dependency held by the DB resource
    [gzgdb@jczc-1 ~]$ crsctl status resource ora.gzdw.db -dependency
    ================================================================================
    Resource Start Dependencies
    ================================================================================
    ----------------------------------ora.gzdw.db-----------------------------------
    ora.gzdw.db(ora.database.type)->
    | ora.GDATA.dg(ora.diskgroup.type)[hard,pullup]
    | | ora.asm(ora.asm.type)[hard,pullup]
    | | | ora.LISTENER.lsnr(ora.listener.type)[weak]
    | | | | type:ora.cluster_vip_net1.type[hard:type,pullup:type]
    | | | | | ora.net1.network(ora.network.type)[hard,pullup]
    | ora.ODSDATA.dg(ora.diskgroup.type)[hard,pullup]
    | | ora.asm(ora.asm.type)[hard,pullup]
    | | | ora.LISTENER.lsnr(ora.listener.type)[weak]
    | | | | type:ora.cluster_vip_net1.type[hard:type,pullup:type]
    | | | | | ora.net1.network(ora.network.type)[hard,pullup]
    | ora.GDATA1.dg(ora.diskgroup.type)[hard,pullup]
    | | ora.asm(ora.asm.type)[hard,pullup]
    | | | ora.LISTENER.lsnr(ora.listener.type)[weak]
    | | | | type:ora.cluster_vip_net1.type[hard:type,pullup:type]
    | | | | | ora.net1.network(ora.network.type)[hard,pullup]
    | ora.GDATA2.dg(ora.diskgroup.type)[hard,pullup]
    | | ora.asm(ora.asm.type)[hard,pullup]
    | | | ora.LISTENER.lsnr(ora.listener.type)[weak]
    | | | | type:ora.cluster_vip_net1.type[hard:type,pullup:type]
    | | | | | ora.net1.network(ora.network.type)[hard,pullup]
    | type:ora.listener.type[weak:type]
    | | type:ora.cluster_vip_net1.type[hard:type,pullup:type]
    | | | ora.net1.network(ora.network.type)[hard,pullup]
    | type:ora.scan_listener.type[weak:type:global]
    | | ora.scan1.vip(ora.scan_vip.type)[hard,pullup]
    | | | ora.net1.network(ora.network.type)[hard,pullup:global]
    | | | type:ora.scan_vip.type[dispersion:type:active]
    | | type:ora.scan_listener.type[dispersion:type:active]
    | ora.ons(ora.ons.type)[weak:uniform]
    | | ora.net1.network(ora.network.type)[hard,pullup]
    | ora.gns[weak:global]
    [gzgdb@jczc-1 ~]$ crsctl status resource ora.gzdw.db -p|grep DEPENDEN
    START_DEPENDENCIES=hard(ora.GDATA.dg,ora.ODSDATA.dg,ora.GDATA1.dg,ora.GDATA2.dg) weak(type:ora.listener.type,global:type:ora.scan_listener.type,uniform:ora.ons,global:ora.gns) pullup(ora.GDATA.dg,ora.ODSDATA.dg,ora.GDATA1.dg,ora.GDATA2.dg)
    STOP_DEPENDENCIES=hard(intermediate:ora.asm,shutdown:ora.ODSDATA.dg,shutdown:ora.GDATA1.dg,shutdown:ora.GDATA2.dg)
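The DB resource does indeed still depend on the dropped diskgroup resource. START_DEPENDENCIES references the stale GDATA resource (in both the hard and pullup lists), while STOP_DEPENDENCIES does not, so only START_DEPENDENCIES needs to be modified.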
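To double-check which dependency lists mention a given resource, the attribute value can be split into its `kind(entries)` groups. The following is a rough sketch rather than a full parser: it assumes the flat `kind(a,b,...)` syntax seen in the output above (no nested parentheses) and treats the last `:`-separated field of each entry as the resource name. Both function names are made up for this post.

```python
import re


def parse_dependencies(value):
    """Split 'hard(a,b) weak(c)' into {'hard': ['a', 'b'], 'weak': ['c']}."""
    # Assumption: groups are flat, i.e. no parentheses inside the parentheses.
    return {kind: body.split(",") for kind, body in re.findall(r"(\w+)\(([^)]*)\)", value)}


def kinds_referencing(value, resource):
    """Return the dependency kinds (hard, weak, pullup, ...) that mention `resource`."""
    hits = []
    for kind, entries in parse_dependencies(value).items():
        # Entries may carry modifiers such as 'shutdown:ora.X.dg'; compare the last field.
        if any(entry.split(":")[-1] == resource for entry in entries):
            hits.append(kind)
    return hits


start_deps = (
    "hard(ora.GDATA.dg,ora.ODSDATA.dg,ora.GDATA1.dg,ora.GDATA2.dg) "
    "weak(type:ora.listener.type,global:type:ora.scan_listener.type,uniform:ora.ons,global:ora.gns) "
    "pullup(ora.GDATA.dg,ora.ODSDATA.dg,ora.GDATA1.dg,ora.GDATA2.dg)"
)
stop_deps = (
    "hard(intermediate:ora.asm,shutdown:ora.ODSDATA.dg,"
    "shutdown:ora.GDATA1.dg,shutdown:ora.GDATA2.dg)"
)

print("START:", kinds_referencing(start_deps, "ora.GDATA.dg"))
print("STOP: ", kinds_referencing(stop_deps, "ora.GDATA.dg"))
```

The exact-match comparison matters here: a substring test for "GDATA" would also hit ora.GDATA1.dg and ora.GDATA2.dg.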
8. How to remove the dependency? Modify the attribute as follows
    [gzgdb@jczc-1 ~]$ crsctl getperm resource ora.gzdw.db
    Name: ora.gzdw.db
    owner:gzodb:rwx,pgrp:oinstall:r--,other::r--,group:dba:r-x,group:oinstall:r-x,user:gzgdb:r-x
    [gzgdb@jczc-1 ~]$ su - gzodb    <<< only gzodb has permission to modify this resource, so switch to it; if no account has permission, change it with setperm
    jczc-1[/home/gzodb]$crsctl modify res ora.gzdw.db -attr "START_DEPENDENCIES=hard(ora.ODSDATA.dg,ora.GDATA1.dg,ora.GDATA2.dg)"
    CRS-4228: Value of attribute 'ora.GDATA1.dg' is missing
    CRS-4000: Command Modify failed, or completed with errors.

Why the modify fails is explained in "crsctl modify resource" fails with "CRS-4228: Value of attribute ')' is missing" (Doc ID 958455.1). The form that works wraps the whole attribute value in single quotes inside the double-quoted argument:

    jczc-1[/home/gzodb]$crsctl modify res ora.gzdw.db -attr "START_DEPENDENCIES='hard(ora.ODSDATA.dg,ora.GDATA1.dg,ora.GDATA2.dg) weak(type:ora.listener.type,global:type:ora.scan_listener.type,uniform:ora.ons,global:ora.gns) pullup(ora.ODSDATA.dg,ora.GDATA1.dg,ora.GDATA2.dg)'"
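As noted above, STOP_DEPENDENCIES was not modified here because it contains no reference to GDATA. If it did, it would have to be modified too; it depends on the situation.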
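Editing a long dependency string by hand invites typos, so the new value can also be generated. The sketch below removes one resource from every dependency group and builds the `crsctl modify` command line with the single-quote wrapping used in the working command above. It makes the same assumptions as a flat `kind(a,b,...)` syntax, the helper names are invented, and the command is only printed, never executed.

```python
import re


def drop_resource(value, resource):
    """Return the dependency string with `resource` removed from every group."""
    def clean(match):
        kind, body = match.group(1), match.group(2)
        # Compare the last ':' field so modifiers like 'shutdown:' are handled.
        kept = [e for e in body.split(",") if e.split(":")[-1] != resource]
        # Drop the whole group if nothing is left in it.
        return f"{kind}({','.join(kept)})" if kept else ""

    cleaned = re.sub(r"(\w+)\(([^)]*)\)", clean, value)
    return " ".join(cleaned.split())  # normalise whitespace left by dropped groups


def modify_command(db_resource, attr, value):
    """Build the crsctl command text; the value is wrapped in single quotes."""
    return f"crsctl modify res {db_resource} -attr \"{attr}='{value}'\""


start_deps = (
    "hard(ora.GDATA.dg,ora.ODSDATA.dg,ora.GDATA1.dg,ora.GDATA2.dg) "
    "weak(type:ora.listener.type,global:type:ora.scan_listener.type,uniform:ora.ons,global:ora.gns) "
    "pullup(ora.GDATA.dg,ora.ODSDATA.dg,ora.GDATA1.dg,ora.GDATA2.dg)"
)

new_value = drop_resource(start_deps, "ora.GDATA.dg")
print(modify_command("ora.gzdw.db", "START_DEPENDENCIES", new_value))
```

Printing the command instead of running it leaves room to review the value against the `crsctl status resource ... -p` output before applying it as the resource owner.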
9. Remove the resource and restart the instance
    srvctl remove diskgroup -g GDATA
    srvctl start instance -d gzdw -i gzdw1
Everything is back to normal.
10. The SR confirmed that this problem is caused by Bug 19016181
Bug 19016181 - Drop diskgroup does not delete CRS resource and database dependency profile (Doc ID 19016181.8)
Affected version: 11.2.0.4. There is no patch at present; the issue is fixed in 12.2. So it is worth writing the procedure down.
11. Procedure for dropping a disk group
    SQL> drop diskgroup GDATA;
    srvctl stop diskgroup -g GDATA
    crsctl status resource ora.gzdw.db -dependency
    crsctl status resource ora.gzdw.db -p|grep DEPENDEN
    crsctl getperm resource ora.gzdw.db
    crsctl modify res ora.gzdw.db -attr "START_DEPENDENCIES='hard(ora.ODSDATA.dg,ora.GDATA1.dg,ora.GDATA2.dg) weak(type:ora.listener.type,global:type:ora.scan_listener.type,uniform:ora.ons,global:ora.gns) pullup(ora.ODSDATA.dg,ora.GDATA1.dg,ora.GDATA2.dg)'"
    crsctl modify res ora.gzdw.db -attr STOP_DEPENDENCIES    <<< only if STOP_DEPENDENCIES references the dropped disk group; the value is left out here
    srvctl remove diskgroup -g GDATA
    srvctl start instance -d gzdw -i gzdw1
Source: ITPUB blog, link: http://blog.itpub.net/30264304/viewspace-1732408/. Please credit the source when reposting; otherwise legal action may be taken.