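Summary of the approach:
A test database at the customer's site had not been started for nearly six months. Reportedly it was abandoned after the database went down. Recently the customer's analysts needed some of the data in it and asked whether it could be brought back up. They said the configuration of this cluster had hardly been touched since, except that a single instance had been created on top of it.
1. Once the authorization and agreements at company level were in place, I started working on the case.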
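2. After connecting over VPN, I used a few system commands to check whether the two nodes could see the mapped multipath devices. fdisk showed nothing. The system log showed that the storage had previously been attached over iSCSI, so I reconfigured iSCSI and adjusted the multipath parameters, after which both nodes discovered the disks.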
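3. The customer uses ASMLib as the disk driver. Running /etc/init.d/oracleasm scandisks found nothing, so I went through the oracleasm configuration and scanned again, after which the ASM disks showed up.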
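4. Running crs_stat -t (the 10g-style command) and crsctl status resource -t returned an OCR-related error (I did not record the exact message), and none of the related services had started. The CSS and CRS logs showed that the voting disk was the problem.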
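5. I checked the voting disk backup and tried to restore from it, but that failed. In the end I re-formatted the voting disks; see my note "RAC voting disk management" for details.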
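The storage checks in steps 2 and 3 can be sketched as an ordered command list. This is only a dry-run sketch: the notes do not record the exact commands or the iSCSI portal address, so the `portal` argument and the `192.0.2.10` placeholder are assumptions, and nothing is executed.

```python
import shlex


def storage_check_plan(portal):
    """Return, in order, commands for re-discovering iSCSI-backed ASM disks.

    `portal` is a hypothetical iSCSI target address; the original notes
    do not record the real one.
    """
    return [
        ["fdisk", "-l"],                        # are any shared disks visible?
        ["multipath", "-ll"],                   # list current multipath maps
        ["iscsiadm", "-m", "session"],          # list existing iSCSI sessions
        ["iscsiadm", "-m", "discovery", "-t", "sendtargets", "-p", portal],
        ["iscsiadm", "-m", "node", "--loginall=all"],   # log in to found targets
        ["multipath", "-r"],                    # reload the multipath maps
        ["/etc/init.d/oracleasm", "scandisks"],  # rescan for ASMLib disks
        ["/etc/init.d/oracleasm", "listdisks"],  # list the ASMLib disk labels
    ]


if __name__ == "__main__":
    # Print the plan only; each command should be reviewed before it is run as root.
    for argv in storage_check_plan("192.0.2.10"):
        print(shlex.join(argv))
```

Printing the plan rather than running it keeps the sketch safe to try on any machine; on the real nodes each command would be run by hand, on both nodes.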
##  STATE    File Universal Id                File Name        Disk group
--  -----    -----------------                ---------        ---------
 1. ONLINE   add441701e954fd3bffb1689c36bac90 (ORCL:VODISK1)   [VODISK]
 2. ONLINE   6e7c237aca724fe3bfec34182251157a (ORCL:VODISK3)   [VODISK]
 3. ONLINE   1d1179441de14fd4bfaddc9dbd43915b (ORCL:VODISK2)   [VODISK]
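A listing like the one above can be checked mechanically. The following is a minimal sketch, not part of the original procedure: it parses the rows and tests whether a strict majority of voting disks is ONLINE, which is what CSS needs to keep a node in the cluster. The function names are made up for illustration.

```python
import re

# Rows copied from the voting disk listing in this note.
SAMPLE = """\
 1. ONLINE add441701e954fd3bffb1689c36bac90 (ORCL:VODISK1) [VODISK]
 2. ONLINE 6e7c237aca724fe3bfec34182251157a (ORCL:VODISK3) [VODISK]
 3. ONLINE 1d1179441de14fd4bfaddc9dbd43915b (ORCL:VODISK2) [VODISK]
"""

# index, state, 32-hex-digit file universal id, (file name), [disk group]
ROW = re.compile(
    r"^\s*(\d+)\.\s+(\S+)\s+([0-9a-f]{32})\s+\((\S+)\)\s+\[(\S+)\]\s*$"
)


def parse_votedisks(text):
    """Return one dict per voting disk row; header lines are ignored."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            index, state, fuid, path, group = match.groups()
            rows.append({"index": int(index), "state": state,
                         "fuid": fuid, "path": path, "group": group})
    return rows


def has_majority(rows):
    """True when more than half of the listed voting disks are ONLINE."""
    online = sum(1 for row in rows if row["state"] == "ONLINE")
    return online > len(rows) / 2


if __name__ == "__main__":
    disks = parse_votedisks(SAMPLE)
    print(sorted({d["group"] for d in disks}), has_majority(disks))
```

The `group` field is also the name to pass to `srvctl status diskgroup -g`, as opposed to the individual disk names in the `path` field.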
[grid@lmocm190 ~]$ ocrconfig -showbackup
lmocm189 2014/03/11 10:43:48 /u01/app/11.2/grid/cdata/dominic/backup00.ocr
lmocm189 2014/03/11 06:43:47 /u01/app/11.2/grid/cdata/dominic/backup01.ocr
lmocm189 2014/03/11 02:43:46 /u01/app/11.2/grid/cdata/dominic/backup02.ocr
lmocm189 2014/03/10 02:43:41 /u01/app/11.2/grid/cdata/dominic/day.ocr
lmocm189 2014/03/08 18:43:15 /u01/app/11.2/grid/cdata/dominic/week.ocr
PROT-25: Manual backups for the Oracle Cluster Registry are not available
[grid@lmocm190 ~]$ ocrconfig -showbackup manual
PROT-25: Manual backups for the Oracle Cluster Registry are not available
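To pick the newest automatic OCR backup from a listing like this, the lines can be sorted by timestamp. This is a small sketch under the assumption that each backup line has exactly four whitespace-separated fields (node, date, time, path), as in the output above; message lines such as PROT-25 are skipped.

```python
from datetime import datetime

# Lines copied from the `ocrconfig -showbackup` output in this note.
SAMPLE = """\
lmocm189 2014/03/11 10:43:48 /u01/app/11.2/grid/cdata/dominic/backup00.ocr
lmocm189 2014/03/11 06:43:47 /u01/app/11.2/grid/cdata/dominic/backup01.ocr
lmocm189 2014/03/11 02:43:46 /u01/app/11.2/grid/cdata/dominic/backup02.ocr
lmocm189 2014/03/10 02:43:41 /u01/app/11.2/grid/cdata/dominic/day.ocr
lmocm189 2014/03/08 18:43:15 /u01/app/11.2/grid/cdata/dominic/week.ocr
PROT-25: Manual backups for the Oracle Cluster Registry are not available
"""


def parse_backups(text):
    """Return (timestamp, node, path) tuples, newest first."""
    backups = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 4:
            continue  # not a backup line (for example a PROT- message)
        node, day, clock, path = parts
        try:
            taken = datetime.strptime(f"{day} {clock}", "%Y/%m/%d %H:%M:%S")
        except ValueError:
            continue  # four fields, but not in the date/time layout we expect
        backups.append((taken, node, path))
    return sorted(backups, reverse=True)


if __name__ == "__main__":
    newest = parse_backups(SAMPLE)[0]
    print(newest[1], newest[2])
```

Note that the backups live on the node named in the first column (lmocm189 here), so the file has to be read from that node.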
[grid@lmocm190 ~]$ ocrcheck -local
Status of Oracle Local Registry is as follows :
Version : 3
Total space (kbytes) : 262120
Used space (kbytes) : 2672
Available space (kbytes) : 259448
ID : 636928871
Device/File Name : /u01/app/11.2/grid/cdata/lmocm190.olr
Device/File integrity check succeeded
Local registry integrity check succeeded
Logical corruption check bypassed due to non-privileged user
[grid@lmocm190 ~]$ srvctl status diskgroup -g VOTEDISK
PRCA-1000 : ASM Disk Group VOTEDISK does not exist
PRCR-1001 : Resource ora.VOTEDISK.dg does not exist
[grid@lmocm190 ~]$ srvctl status diskgroup -g VODISK1
PRCA-1000 : ASM Disk Group VODISK1 does not exist
PRCR-1001 : Resource ora.VODISK1.dg does not exist
[grid@lmocm190 ~]$ srvctl status diskgroup -g VODISK
Disk Group VODISK is running on lmocm190,lmocm189
[grid@lmocm189 ~]$ crs_stat -t
Name           Type           Target    State     Host
------------------------------------------------------------
ora....ELOG.dg ora....up.type ONLINE    ONLINE    lmocm189
ora....FILE.dg ora....up.type ONLINE    ONLINE    lmocm189
ora....ER.lsnr ora....er.type ONLINE    ONLINE    lmocm189
ora....N1.lsnr ora....er.type ONLINE    ONLINE    lmocm189
ora.LOGFILE.dg ora....up.type ONLINE    ONLINE    lmocm189
ora.VODISK.dg  ora....up.type ONLINE    ONLINE    lmocm189
ora.asm        ora.asm.type   ONLINE    ONLINE    lmocm189
ora.cvu        ora.cvu.type   ONLINE    ONLINE    lmocm189
ora.dominic.db ora....se.type ONLINE    ONLINE    lmocm189
ora.gsd        ora.gsd.type   OFFLINE   OFFLINE
ora....SM1.asm application    ONLINE    ONLINE    lmocm189
ora....89.lsnr application    ONLINE    ONLINE    lmocm189
ora....189.gsd application    OFFLINE   OFFLINE
ora....189.ons application    ONLINE    ONLINE    lmocm189
ora....189.vip ora....t1.type ONLINE    ONLINE    lmocm189
ora....SM2.asm application    ONLINE    ONLINE    lmocm190
ora....90.lsnr application    ONLINE    ONLINE    lmocm190
ora....190.gsd application    OFFLINE   OFFLINE
ora....190.ons application    ONLINE    ONLINE    lmocm190
ora....190.vip ora....t1.type ONLINE    ONLINE    lmocm190
ora....network ora....rk.type ONLINE    ONLINE    lmocm189
ora.oc4j       ora.oc4j.type  ONLINE    ONLINE    lmocm189
ora.ons        ora.ons.type   ONLINE    ONLINE    lmocm189
ora....ry.acfs ora....fs.type ONLINE    ONLINE    lmocm189
ora.scan1.vip  ora....ip.type ONLINE    ONLINE    lmocm189
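When reading a long crs_stat -t listing, the rows that matter are those whose target is ONLINE but whose state is not. The sketch below is an illustration rather than anything used in the original work; it parses a few rows taken from the output above and separates real problems from resources that are OFFLINE on purpose (in 11.2 the gsd resources are normally OFFLINE).

```python
# A few rows copied from the crs_stat -t output in this note.
SAMPLE = """\
Name Type Target State Host
------------------------------------------------------------
ora.VODISK.dg ora....up.type ONLINE ONLINE lmocm189
ora.dominic.db ora....se.type ONLINE ONLINE lmocm189
ora.gsd ora.gsd.type OFFLINE OFFLINE
ora....189.gsd application OFFLINE OFFLINE
"""


def parse_crs_stat(text):
    """Turn crs_stat -t rows into dicts; header and rule lines are skipped."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        # A data row has name, type, target, state and an optional host.
        if len(parts) < 4 or parts[2] not in ("ONLINE", "OFFLINE"):
            continue
        rows.append({
            "name": parts[0],
            "type": parts[1],
            "target": parts[2],
            "state": parts[3],
            "host": parts[4] if len(parts) > 4 else "",
        })
    return rows


def needs_attention(rows):
    """Names of resources that should be ONLINE but are not."""
    return [r["name"] for r in rows
            if r["target"] == "ONLINE" and r["state"] != "ONLINE"]


def offline_by_design(rows):
    """Names of resources whose target itself is OFFLINE."""
    return [r["name"] for r in rows if r["target"] == "OFFLINE"]


if __name__ == "__main__":
    resources = parse_crs_stat(SAMPLE)
    print(needs_attention(resources), offline_by_design(resources))
```

Because the parser splits on any whitespace, it works on the output whether or not the columns are aligned.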
[root@lmocm189 bin]# ./srvctl start instance -d dominic -i dominic1,dominic2
PRCR-1013 : Failed to start resource ora.dominic.db
PRCR-1064 : Failed to start resource ora.dominic.db on node lmocm190
CRS-5017: The resource action "ora.dominic.db start" encountered the following error:
ORA-01078: failure in processing system parameters
ORA-01565: error in identifying file '+DATAFILE/dominic/spfiledominic.ora'
ORA-17503: ksfdopn:2 Failed to open file +DATAFILE/dominic/spfiledominic.ora
ORA-12547: TNS:lost contact
. For details refer to "(:CLSN00107:)" in "/u01/app/11.2/grid/log/lmocm190/agent/crsd/oraagent_oracle//oraagent_oracle.log".
CRS-2674: Start of 'ora.dominic.db' on 'lmocm190' failed
PRCR-1064 : Failed to start resource ora.dominic.db on node lmocm189
CRS-5017: The resource action "ora.dominic.db start" encountered the following error:
ORA-00205: error in identifying control file, check alert log for more info
. For details refer to "(:CLSN00107:)" in "/u01/app/11.2/grid/log/lmocm189/agent/crsd/oraagent_oracle//oraagent_oracle.log".
CRS-2674: Start of 'ora.dominic.db' on 'lmocm189' failed
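The two nodes fail for different reasons, which is easy to miss in the raw output. As a rough sketch (the helper name is hypothetical), the ORA- codes can be grouped by the node named in the preceding PRCR-1064 line:

```python
import re

# Error lines copied from the srvctl output in this note (paths shortened out).
SAMPLE = """\
PRCR-1013 : Failed to start resource ora.dominic.db
PRCR-1064 : Failed to start resource ora.dominic.db on node lmocm190
CRS-5017: The resource action "ora.dominic.db start" encountered the following error:
ORA-01078: failure in processing system parameters
ORA-01565: error in identifying file '+DATAFILE/dominic/spfiledominic.ora'
ORA-17503: ksfdopn:2 Failed to open file +DATAFILE/dominic/spfiledominic.ora
ORA-12547: TNS:lost contact
CRS-2674: Start of 'ora.dominic.db' on 'lmocm190' failed
PRCR-1064 : Failed to start resource ora.dominic.db on node lmocm189
CRS-5017: The resource action "ora.dominic.db start" encountered the following error:
ORA-00205: error in identifying control file, check alert log for more info
CRS-2674: Start of 'ora.dominic.db' on 'lmocm189' failed
"""

NODE = re.compile(r"PRCR-1064 : Failed to start resource \S+ on node (\S+)")
ORA = re.compile(r"\bORA-\d{5}\b")


def ora_errors_by_node(text):
    """Map each failing node to the ORA- codes reported after its PRCR-1064 line."""
    result = {}
    current = None
    for line in text.splitlines():
        match = NODE.search(line)
        if match:
            current = match.group(1)
            result[current] = []
            continue
        if current is not None:
            result[current].extend(ORA.findall(line))
    return result


if __name__ == "__main__":
    for node, codes in ora_errors_by_node(SAMPLE).items():
        print(node, codes)
```

Grouped this way, lmocm190 fails while reading the spfile from +DATAFILE, whereas lmocm189 gets as far as identifying the control file and fails there with ORA-00205.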
[grid@lmocm189 ~]$ crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
[grid@lmocm189 ~]$ crsctl check css
CRS-4529: Cluster Synchronization Services is online
SQL> select * from v$active_instances;
INST_NUMBER INST_NAME
----------- ------------------------------------------------------------
1 lmocm189:dominic1
2 lmocm190:dominic2
SQL> select inst_id,instance_number,instance_name,host_name,status from gv$instance;
INST_ID INSTANCE_NUMBER INSTANCE_NAME
---------- --------------- ----------------
HOST_NAME STATUS
---------------------------------------------------------------- ------------
2 2 dominic2
lmocm190 STARTED
1 1 dominic1
lmocm189
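8. With these errors the picture became clear: the control file was most likely damaged. There were two options:
Option 1: Re-create the control file. This is tedious, because with the database down the paths of all the redo log files and datafiles are unknown and would have to be looked up with asmcmd as the grid user.
Option 2: Check whether RMAN is configured for controlfile autobackup, which backs up the control file automatically whenever datafiles or tablespaces change.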
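9. I logged in with RMAN and ran show all, which confirmed that the setting was on. Under $ORACLE_HOME/dbs I found an image copy of the control file. Since all the archived logs were still present, I restored the control file with restore, and the problem was solved.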
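The exact RMAN commands were not recorded, so the following is only a sketch of the usual sequence for restoring a control file from a copy on disk and then opening the database. The path passed in is a hypothetical placeholder, and the helper names are made up; the first function checks `show all` output for the autobackup setting, the second builds the RMAN script text without running it.

```python
def autobackup_enabled(show_all_output):
    """True if RMAN `show all` output reports controlfile autobackup as ON."""
    return any(
        line.strip().upper().startswith("CONFIGURE CONTROLFILE AUTOBACKUP ON")
        for line in show_all_output.splitlines()
    )


def rman_restore_script(controlfile_copy):
    """Build an RMAN script that restores the control file from a copy.

    `controlfile_copy` is a hypothetical path; the notes only say the copy
    was found under $ORACLE_HOME/dbs.
    """
    return "\n".join([
        "startup nomount;",
        f"restore controlfile from '{controlfile_copy}';",
        "alter database mount;",
        "recover database;",                 # applies the archived logs
        "alter database open resetlogs;",    # required after a controlfile restore
    ])


if __name__ == "__main__":
    # Placeholder path for illustration only.
    print(rman_restore_script("/path/to/dbs/controlfile_copy"))
```

On a RAC database this would typically be run from one instance only, with the other instance started after the database has been opened.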
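Postscript: problems like this are common in day-to-day work. Before acting, it pays to have an overall plan and to gather as much information as possible first (log files, status values, configuration parameters and so on). And for a DBA, an up-to-date backup is essential.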