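ASM OCR Disk Group Rebuild Test
In this RAC environment, the disk group ocr_vot stores the OCR file and the voting files. It uses normal redundancy with 3 failure groups. This test deliberately destroys the header of an ASM disk with dd, then checks whether the RAC cluster can start again and whether ocr_vot can be rebuilt.
1. Check the disk group that currently holds the OCR and voting files
There are 3 failure groups, each with one 4 GB disk.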
SQL> select name,state,total_mb/1024 total_gb,free_mb,required_mirror_free_mb,usable_file_mb from v$asm_diskgroup where name like 'OCR%';
NAME STATE TOTAL_GB FREE_MB REQUIRED_MIRROR_FREE_MB USABLE_FILE_MB
---------- ----------- ---------- ---------- ----------------------- --------------
OCR_VOT MOUNTED 12 11248 4096 3576
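The USABLE_FILE_MB value in this output can be checked by hand. For a normal-redundancy disk group it is the free space minus the space reserved for re-mirroring, divided by 2 because every extent is stored twice. A small sketch using the numbers from the query above:

```shell
#!/bin/sh
# Values copied from the v$asm_diskgroup output above.
free_mb=11248
required_mirror_free_mb=4096

# Normal redundancy keeps two copies of each extent, hence the division by 2.
usable_file_mb=$(( (free_mb - required_mirror_free_mb) / 2 ))
echo "$usable_file_mb"
```

The result matches the USABLE_FILE_MB column (3576).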
SQL> select GROUP_NUMBER,DISK_NUMBER,MOUNT_STATUS,HEADER_STATUS,MODE_STATUS,STATE,NAME,FAILGROUP,PATH,FAILGROUP_TYPE from v$asm_disk where name like 'OCR%' order by path;
GROUP_NUMBER DISK_NUMBER MOUNT_S HEADER_STATU MODE_ST STATE NAME FAILGROUP PATH FAILGRO
------------ ----------- ------- ------------ ------- -------- ------------ ------------ -------------------- -------
3 2 CACHED MEMBER ONLINE NORMAL OCR_VOT_0002 OCR_FG1 /dev/asm_4g_1 REGULAR
3 0 CACHED MEMBER ONLINE NORMAL OCR_VOT_0000 OCR_FG2 /dev/asm_4g_2 REGULAR
3 1 CACHED MEMBER ONLINE NORMAL OCR_VOT_0001 OCR_FG3 /dev/asm_4g_3 REGULAR
2. First dd: overwrite the header of the disk in one OCR failure group
dd if=/dev/zero of=/dev/asm_4g_1 bs=1024 count=1
SQL> select GROUP_NUMBER,DISK_NUMBER,MOUNT_STATUS,HEADER_STATUS,MODE_STATUS,STATE,NAME,FAILGROUP,PATH,FAILGROUP_TYPE from v$asm_disk where name like 'OCR%' order by path;
GROUP_NUMBER DISK_NUMBER MOUNT_S HEADER_STATU MODE_ST STATE NAME FAILGROUP PATH FAILGRO
------------ ----------- ------- ------------ ------- -------- ------------ ------------ -------------------- -------
3 2 CACHED CANDIDATE ONLINE NORMAL OCR_VOT_0002 OCR_FG1 /dev/asm_4g_1 REGULAR
3 0 CACHED MEMBER ONLINE NORMAL OCR_VOT_0000 OCR_FG2 /dev/asm_4g_2 REGULAR
3 1 CACHED MEMBER ONLINE NORMAL OCR_VOT_0001 OCR_FG3 /dev/asm_4g_3 REGULAR
As shown, the HEADER_STATUS of the disk in failure group OCR_FG1 has changed to CANDIDATE.
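When repeating a destructive test like this, it may be worth saving the region that dd is about to zero first. The following is only a sketch: the helper name backup_header and the backup path are made up for illustration, and writing the saved bytes back onto a disk that ASM has touched in the meantime was not tried in this test.

```shell
#!/bin/sh
# backup_header SRC DST: copy the first 1 KiB of SRC (the same region the
# dd in step 2 overwrites) to DST. Helper name and paths are illustrative.
backup_header() {
    dd if="$1" of="$2" bs=1024 count=1 2>/dev/null
}

# Example (not executed here):
#   backup_header /dev/asm_4g_1 /tmp/asm_4g_1.hdr
# Possible restore, untested in this walkthrough:
#   dd if=/tmp/asm_4g_1.hdr of=/dev/asm_4g_1 bs=1024 count=1 conv=notrunc
```

The function only reads from the source, so running it against a live ASM disk should not change the disk.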
3. Restart the RAC cluster
[root@rac1 ~]# crsctl stop cluster -all
[root@rac1 ~]# crsctl start cluster -all
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'rac1'
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'rac2'
CRS-2676: Start of 'ora.cssdmonitor' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'rac1'
CRS-2672: Attempting to start 'ora.diskmon' on 'rac1'
CRS-2676: Start of 'ora.diskmon' on 'rac1' succeeded
CRS-2676: Start of 'ora.cssdmonitor' on 'rac2' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'rac2'
CRS-2672: Attempting to start 'ora.diskmon' on 'rac2'
CRS-2676: Start of 'ora.diskmon' on 'rac2' succeeded
CRS-2676: Start of 'ora.cssd' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.ctssd' on 'rac1'
CRS-2676: Start of 'ora.cssd' on 'rac2' succeeded
CRS-2672: Attempting to start 'ora.ctssd' on 'rac2'
CRS-2672: Attempting to start 'ora.cluster_interconnect.haip' on 'rac2'
CRS-2676: Start of 'ora.ctssd' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.evmd' on 'rac1'
CRS-2672: Attempting to start 'ora.cluster_interconnect.haip' on 'rac1'
CRS-2676: Start of 'ora.ctssd' on 'rac2' succeeded
CRS-2672: Attempting to start 'ora.evmd' on 'rac2'
CRS-2676: Start of 'ora.evmd' on 'rac1' succeeded
CRS-2676: Start of 'ora.evmd' on 'rac2' succeeded
CRS-2676: Start of 'ora.cluster_interconnect.haip' on 'rac2' succeeded
CRS-2672: Attempting to start 'ora.asm' on 'rac2'
CRS-2676: Start of 'ora.cluster_interconnect.haip' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.asm' on 'rac1'
CRS-2674: Start of 'ora.asm' on 'rac2' failed
CRS-2674: Start of 'ora.asm' on 'rac1' failed
CRS-2673: Attempting to stop 'ora.cluster_interconnect.haip' on 'rac1'
CRS-2677: Stop of 'ora.cluster_interconnect.haip' on 'rac1' succeeded
CRS-4705: Start of Clusterware failed on node rac1.
CRS-4705: Start of Clusterware failed on node rac2.
CRS-4000: Command Start failed, or completed with errors.
The RAC cluster can no longer start.
[root@rac1 ~]# crsctl stat res -t -init
--------------------------------------------------------------------------------
NAME TARGET STATE SERVER STATE_DETAILS
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
1 ONLINE INTERMEDIATE rac1 OCR not started
ora.cluster_interconnect.haip
1 ONLINE OFFLINE
ora.crf
1 ONLINE ONLINE rac1
ora.crsd
1 ONLINE OFFLINE
ora.cssd
1 ONLINE ONLINE rac1
ora.cssdmonitor
1 ONLINE ONLINE rac1
ora.ctssd
1 ONLINE ONLINE rac1 ACTIVE:0
ora.diskmon
1 OFFLINE OFFLINE
ora.evmd
1 ONLINE INTERMEDIATE rac1
ora.gipcd
1 ONLINE ONLINE rac1
ora.gpnpd
1 ONLINE ONLINE rac1
ora.mdnsd
1 ONLINE ONLINE rac1
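The crsd component cannot start, and the ASM instance does not come up properly either (ora.asm is INTERMEDIATE with "OCR not started").
4. Stop CRS on both nodes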
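Reading a listing like this by eye gets tedious, so a small filter can pull out the resources that are not ONLINE. This is a sketch that assumes the two-line layout shown above (a resource name line followed by an instance line of the form "1 TARGET STATE ..."); the function name not_online is made up for illustration.

```shell
#!/bin/sh
# not_online: read `crsctl stat res -t -init` output on stdin and print
# "name TARGET STATE" for every resource whose STATE is not ONLINE.
not_online() {
    awk '
        /^ora\./ { name = $1; next }               # resource name line
        $1 ~ /^[0-9]+$/ && $3 != "ONLINE" {        # instance line: id TARGET STATE
            print name, $2, $3
        }
    '
}

# Example: crsctl stat res -t -init | not_online
```

Against the output above it would list ora.asm, ora.cluster_interconnect.haip, ora.crsd, ora.diskmon and ora.evmd.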
[root@rac1 ~]# crsctl stop crs -f
[root@rac2 ~]# crsctl stop crs -f
5. Try to start the stack on one node in exclusive mode without CRS (-excl -nocrs)
[root@rac1 ~]# crsctl start crs -excl -nocrs
CRS-4123: Oracle High Availability Services has been started.
CRS-2672: Attempting to start 'ora.mdnsd' on 'rac1'
CRS-2676: Start of 'ora.mdnsd' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.gpnpd' on 'rac1'
CRS-2676: Start of 'ora.gpnpd' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'rac1'
CRS-2672: Attempting to start 'ora.gipcd' on 'rac1'
CRS-2676: Start of 'ora.cssdmonitor' on 'rac1' succeeded
CRS-2676: Start of 'ora.gipcd' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'rac1'
CRS-2672: Attempting to start 'ora.diskmon' on 'rac1'
CRS-2676: Start of 'ora.diskmon' on 'rac1' succeeded
CRS-4402: The CSS daemon was started in exclusive mode but found an active CSS daemon on node rac2, number 2, and is terminating
CRS-2674: Start of 'ora.cssd' on 'rac1' failed
CRS-2679: Attempting to clean 'ora.cssd' on 'rac1'
CRS-2681: Clean of 'ora.cssd' on 'rac1' succeeded
CRS-2673: Attempting to stop 'ora.gipcd' on 'rac1'
CRS-2677: Stop of 'ora.gipcd' on 'rac1' succeeded
CRS-2673: Attempting to stop 'ora.cssdmonitor' on 'rac1'
CRS-2677: Stop of 'ora.cssdmonitor' on 'rac1' succeeded
CRS-2673: Attempting to stop 'ora.gpnpd' on 'rac1'
CRS-2677: Stop of 'ora.gpnpd' on 'rac1' succeeded
CRS-2673: Attempting to stop 'ora.mdnsd' on 'rac1'
CRS-2677: Stop of 'ora.mdnsd' on 'rac1' succeeded
CRS-4000: Command Start failed, or completed with errors.
This attempt failed: according to CRS-4402, cssd on rac1 found an active CSS daemon on rac2 and terminated, after which the resources already started on rac1 were stopped again.
6. Edit the parameter file used to configure CRS during the RAC installation
vi /u01/app/11.2.0/grid/crs/install/crsconfig_params
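This file is the configuration that the root script reads during the grid installation; it is mainly used to initialize the ASM disk group for the OCR and voting files.
It now needs to be modified so that the OCR is placed on a fresh disk.
Change the following two parameters on both nodes.
Just make sure the redundancy mode and the new ASM disk are correct.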
ASM_REDUNDANCY=External
ASM_DISKS=/dev/asm_4g_1
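Since the same two parameters have to be changed on both nodes, the edit can also be scripted instead of done in vi. A minimal sketch, assuming each parameter sits on its own KEY=VALUE line; the helper name set_param is made up for illustration, and it keeps a one-time .bak copy of the file.

```shell
#!/bin/sh
# set_param FILE KEY VALUE: save FILE as FILE.bak (first call only), then
# rewrite the line that starts with "KEY=" to "KEY=VALUE".
# VALUE must not contain "|", which is used as the sed delimiter.
set_param() {
    [ -e "$1.bak" ] || cp -p "$1" "$1.bak" || return 1
    sed "s|^$2=.*|$2=$3|" "$1" > "$1.tmp" &&
    cat "$1.tmp" > "$1" &&      # write back in place, keeping owner and mode
    rm -f "$1.tmp"
}

# Example (not executed here; run on both nodes):
#   f=/u01/app/11.2.0/grid/crs/install/crsconfig_params
#   set_param "$f" ASM_REDUNDANCY External
#   set_param "$f" ASM_DISKS /dev/asm_4g_1
```

Diffing the file against its .bak copy afterwards is a quick way to confirm that only the two intended lines changed.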
7. Remove the old RAC configuration on both nodes
Node 1:
/u01/app/11.2.0/grid/crs/install/rootcrs.pl -deconfig -force
Node 2:
/u01/app/11.2.0/grid/crs/install/rootcrs.pl -deconfig -force -lastnode
crsctl delete for vds in ocr_vot ... failed
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'rac2'
CRS-2673: Attempting to stop 'ora.mdnsd' on 'rac2'
CRS-2673: Attempting to stop 'ora.crf' on 'rac2'
CRS-2677: Stop of 'ora.mdnsd' on 'rac2' succeeded
CRS-2677: Stop of 'ora.crf' on 'rac2' succeeded
CRS-2673: Attempting to stop 'ora.gipcd' on 'rac2'
CRS-2677: Stop of 'ora.gipcd' on 'rac2' succeeded
CRS-2673: Attempting to stop 'ora.gpnpd' on 'rac2'
CRS-2677: Stop of 'ora.gpnpd' on 'rac2' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'rac2' has completed
CRS-4133: Oracle High Availability Services has been stopped.
error: package cvuqdisk is not installed
Successfully deconfigured Oracle clusterware stack on this node
8. Re-run the root script on both nodes to restore the OCR and voting files
[root@rac1 ~]# /u01/app/11.2.0/grid/root.sh
Performing root user operation for Oracle 11g
The following environment variables are set as:
ORACLE_OWNER= grid
ORACLE_HOME= /u01/app/11.2.0/grid
Enter the full pathname of the local bin directory: [/usr/local/bin]:
The contents of "dbhome" have not changed. No need to overwrite.
The contents of "oraenv" have not changed. No need to overwrite.
The contents of "coraenv" have not changed. No need to overwrite.
Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root script.
Now product-specific root actions will be performed.
Using configuration parameter file: /u01/app/11.2.0/grid/crs/install/crsconfig_params
User ignored Prerequisites during installation
Installing Trace File Analyzer
OLR initialization - successful
Adding Clusterware entries to inittab
CRS-2672: Attempting to start 'ora.mdnsd' on 'rac1'
CRS-2676: Start of 'ora.mdnsd' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.gpnpd' on 'rac1'
CRS-2676: Start of 'ora.gpnpd' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'rac1'
CRS-2672: Attempting to start 'ora.gipcd' on 'rac1'
CRS-2676: Start of 'ora.cssdmonitor' on 'rac1' succeeded
CRS-2676: Start of 'ora.gipcd' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'rac1'
CRS-2672: Attempting to start 'ora.diskmon' on 'rac1'
CRS-2676: Start of 'ora.diskmon' on 'rac1' succeeded
CRS-2676: Start of 'ora.cssd' on 'rac1' succeeded
ASM created and started successfully.
Disk Group ocr_vot created successfully.
clscfg: -install mode specified
Successfully accumulated necessary OCR keys.
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
Successful addition of voting disk a7a2e67f1fd94f36bf36632755cc652d.
Successfully replaced voting disk group with +ocr_vot.
CRS-4266: Voting file(s) successfully replaced
## STATE File Universal Id File Name Disk group
-- ----- ----------------- --------- ---------
1. ONLINE a7a2e67f1fd94f36bf36632755cc652d (/dev/asm_4g_1) [OCR_VOT]
Located 1 voting disk(s).
CRS-2672: Attempting to start 'ora.asm' on 'rac1'
CRS-2676: Start of 'ora.asm' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.OCR_VOT.dg' on 'rac1'
CRS-2676: Start of 'ora.OCR_VOT.dg' on 'rac1' succeeded
Preparing packages...
cvuqdisk-1.0.9-1.x86_64
Configure Oracle Grid Infrastructure for a Cluster ... succeeded
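After root.sh finishes, the voting file listing (as printed in the output above, and also available from `crsctl query css votedisk`) can be checked with a small filter. This is a sketch that assumes the row layout shown above, where each voting file line starts with an index such as "1." followed by its state; the function name online_votedisks is made up for illustration.

```shell
#!/bin/sh
# online_votedisks: read a voting file listing on stdin and print how many
# entries are reported as ONLINE.
online_votedisks() {
    awk '$1 ~ /^[0-9]+\.$/ && $2 == "ONLINE" { n++ } END { print n + 0 }'
}

# Example: crsctl query css votedisk | online_votedisks
```

With external redundancy, as configured in step 6, a single ONLINE voting file is the expected result.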
9. Register the listener, database, and instances in the OCR
[grid@rac1:/home/grid]$srvctl add listener -l listener
[oracle@rac1:/home/oracle]$srvctl add database -d orcl -o /u01/app/oracle/product/11.2.0/db_1/ -c RAC
[oracle@rac1:/home/oracle]$srvctl add instance -d orcl -i orcl1 -n rac1
[oracle@rac1:/home/oracle]$srvctl add instance -d orcl -i orcl2 -n rac2
10. Restart the RAC cluster
[root@rac1 ~]# crsctl stop cluster -all
[root@rac1 ~]# crsctl start cluster -all
11. Mount the ASM disk groups in the ASM instance on both nodes
SQL> alter diskgroup data mount;
Diskgroup altered.
SQL> alter diskgroup arch mount;
Diskgroup altered.
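Even though the old OCR disk group was lost, the existing data disk groups can still be mounted.
My guess is that the ASM instance uses "asm_diskstring" to locate the ASM disks and read their headers;
the ASM disk header contains the disk group name, the membership information, and the disk number.
This parameter presumably comes from the root script reading "ASM_DISCOVERY_STRING=/dev/asm*" in the configuration file /u01/app/11.2.0/grid/crs/install/crsconfig_params.
12. Start the database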
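To illustrate the idea that discovery depends on what is stored in the disk header, here is a sketch that checks a device for the "ORCLDISK" marker. Both the marker string and its position at byte offset 32 are assumptions about the on-disk header layout rather than something verified in this test, and the function name has_asm_marker is made up for illustration.

```shell
#!/bin/sh
# has_asm_marker PATH: succeed if bytes 32..39 of PATH read "ORCLDISK".
# Offset and marker are assumptions about the ASM header layout.
has_asm_marker() {
    marker=$(dd if="$1" bs=1 skip=32 count=8 2>/dev/null | tr -d '\0')
    [ "$marker" = "ORCLDISK" ]
}

# Example (not executed here):
#   for d in /dev/asm*; do
#       has_asm_marker "$d" && echo "$d: marker present" || echo "$d: no marker"
#   done
```

Under these assumptions, a disk whose first 1024 bytes were zeroed as in step 2 would report no marker, while the untouched data disks would still carry it.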
[root@rac1 ~]# srvctl start database -d orcl
13. Check the status
[root@rac2 ~]# crsctl stat res -t
--------------------------------------------------------------------------------
NAME TARGET STATE SERVER STATE_DETAILS
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.ARCH.dg
ONLINE ONLINE rac1
ONLINE ONLINE rac2
ora.DATA.dg
ONLINE ONLINE rac1
ONLINE ONLINE rac2
ora.LISTENER.lsnr
ONLINE ONLINE rac1
ONLINE ONLINE rac2
ora.OCR_VOT.dg
ONLINE ONLINE rac1
ONLINE ONLINE rac2
ora.TEST.dg
ONLINE OFFLINE rac1
OFFLINE OFFLINE rac2
ora.asm
ONLINE ONLINE rac1 Started
ONLINE ONLINE rac2 Started
ora.gsd
OFFLINE OFFLINE rac1
OFFLINE OFFLINE rac2
ora.net1.network
ONLINE ONLINE rac1
ONLINE ONLINE rac2
ora.ons
ONLINE ONLINE rac1
ONLINE ONLINE rac2
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
1 ONLINE ONLINE rac2
ora.cvu
1 ONLINE ONLINE rac2
ora.oc4j
1 ONLINE ONLINE rac2
ora.orcl.db
1 ONLINE ONLINE rac1 Open
2 ONLINE ONLINE rac2 Open
ora.rac1.vip
1 ONLINE ONLINE rac1
ora.rac2.vip
1 ONLINE ONLINE rac2
ora.scan1.vip
1 ONLINE ONLINE rac2
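The RAC components are up; the only resources still shown as OFFLINE are ora.TEST.dg (which was not mounted in step 11) and ora.gsd.
Summary: when the disk group holding the OCR is damaged, it can be rebuilt; as long as the disk groups holding the data files are intact, the database instances can be recovered normally.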