Checking the logs
Linux: /var/log/messages
RAC:
su - grid
$ORACLE_HOME/log/<hostname>/
Cluster alert log: alertrac2.log (that is, alert<hostname>.log)
Daemon logs are in subdirectories:
cssd
ohasd
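The log locations above can be printed by one small helper. This is only a sketch: the function name grid_log_paths is hypothetical, the Grid home and hostname are passed in as arguments, and the layout (alert<hostname>.log, cssd/ocssd.log, ohasd/ohasd.log) is assumed to match the 11.2 paths that appear in these notes.

```shell
# Print the main Grid Infrastructure log files for one node.
# Usage: grid_log_paths <grid_home> <hostname>
# Assumption: 11.2-style layout under <grid_home>/log/<hostname>/.
grid_log_paths() {
    home=$1
    host=$2
    base="${home}/log/${host}"
    printf '%s\n' \
        "${base}/alert${host}.log" \
        "${base}/cssd/ocssd.log" \
        "${base}/ohasd/ohasd.log"
}

grid_log_paths /u01/11.2.0/grid rac2
```

The printed paths can be passed straight to tail -f while the stack is starting.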
The cluster does not start
[root@rac1 ~]# crsctl stat res -t -init
ora.asm
1 ONLINE OFFLINE Instance Shutdown
ora.cluster_interconnect.haip
1 ONLINE OFFLINE
ora.crf
1 ONLINE ONLINE rac1
ora.crsd
1 OFFLINE OFFLINE
ora.cssd
1 ONLINE OFFLINE STARTING
ora.cssdmonitor
1 ONLINE ONLINE rac1
ora.ctssd
1 ONLINE OFFLINE
ora.diskmon
1 OFFLINE OFFLINE
ora.evmd
1 OFFLINE OFFLINE
ora.gipcd
1 ONLINE ONLINE rac1
ora.gpnpd
1 ONLINE ONLINE rac1
ora.mdnsd
1 ONLINE ONLINE rac1
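To spot the resources that should be up but are not, the output above can be filtered. A minimal sketch, assuming the layout shown here (a resource name line starting with "ora." followed by rows of the form "<n> TARGET STATE ..."); the function name stuck_resources is hypothetical.

```shell
# Read `crsctl stat res -t -init` output on stdin and print every resource
# whose TARGET is ONLINE but whose STATE is something else.
stuck_resources() {
    awk '
        /^ora\./ { name = $1; next }    # remember the current resource name
        $1 ~ /^[0-9]+$/ && $2 == "ONLINE" && $3 != "ONLINE" { print name, $3 }
    '
}

# Demo on a few rows copied from the output above.
stuck_resources <<'EOF'
ora.asm
1 ONLINE OFFLINE Instance Shutdown
ora.crf
1 ONLINE ONLINE rac1
ora.crsd
1 OFFLINE OFFLINE
ora.cssd
1 ONLINE OFFLINE STARTING
EOF
```

On a live node the same filter would be fed by: crsctl stat res -t -init | stuck_resources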
[root@rac2 ~]# ps -ef|grep d.bin
root 10009 1 5 05:11 ? 00:00:01 /u01/11.2.0/grid/bin/ohasd.bin reboot
grid 10122 1 1 05:11 ? 00:00:00 /u01/11.2.0/grid/bin/oraagent.bin
grid 10134 1 0 05:11 ? 00:00:00 /u01/11.2.0/grid/bin/mdnsd.bin
grid 10146 1 1 05:11 ? 00:00:00 /u01/11.2.0/grid/bin/gpnpd.bin
grid 10159 1 0 05:11 ? 00:00:00 /u01/11.2.0/grid/bin/gipcd.bin
root 10162 1 1 05:11 ? 00:00:00 /u01/11.2.0/grid/bin/orarootagent.bin
root 10176 1 1 05:11 ? 00:00:00 /u01/11.2.0/grid/bin/osysmond.bin
root 10187 1 1 05:11 ? 00:00:00 /u01/11.2.0/grid/bin/cssdmonitor
root 10206 1 1 05:11 ? 00:00:00 /u01/11.2.0/grid/bin/cssdagent
grid 10220 1 2 05:11 ? 00:00:00 /u01/11.2.0/grid/bin/ocssd.bin
Startup hangs at this point.
Check ocssd.log:
2020-08-28 14:54:50.687: [ SKGFD][3340285696]UFS discovery with :/dev/raw/*:
2020-08-28 14:54:50.687: [ SKGFD][3340285696]Fetching UFS disk :/dev/raw/rawctl:
2020-08-28 14:54:50.687: [ SKGFD][3340285696]Fetching UFS disk :/dev/raw/raw1:
2020-08-28 14:54:50.687: [ SKGFD][3340285696]Fetching UFS disk :/dev/raw/raw2:
2020-08-28 14:54:50.687: [ SKGFD][3340285696]Fetching UFS disk :/dev/raw/raw3:
2020-08-28 14:54:50.687: [ SKGFD][3340285696]OSS discovery with :/dev/raw/*:
2020-08-28 14:54:50.687: [ CSSD][3340285696]clssnmvDiskVerify: Successful discovery of 0 disks
[cssd(5162)]CRS-1714:Unable to discover any voting files, retrying discovery in 15 seconds; Details at (:CSSNM00070:) in /g01/11.2.0/grid/log/vrh1/cssd/ocssd.log
2012-08-09 03:35:41.207
[cssd(5162)]CRS-1714:Unable to discover any voting files, retrying discovery in 15 seconds; Details at (:CSSNM00070:) in /g01/11.2.0/grid/log/vrh1/cssd/ocssd.log
2012-08-09 03:35:56.240
[cssd(5162)]CRS-1714:Unable to discover any voting files, retrying discovery in 15 seconds; Details at (:CSSNM00070:) in /g01/11.2.0/grid/log/vrh1/cssd/ocssd.log
2012-08-09 03:36:11.284
[cssd(5162)]CRS-1714:Unable to discover any voting files, retrying discovery in 15 seconds; Details at (:CSSNM00070:) in /g01/11.2.0/grid/log/vrh1/cssd/ocssd.log
2012-08-09 03:36:26.305
[cssd(5162)]CRS-1714:Unable to discover any voting files, retrying discovery in 15 seconds; Details at (:CSSNM00070:) in /g01/11.2.0/grid/log/vrh1/cssd/ocssd.log
2012-08-09 03:36:41.328
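A quick way to confirm that CSSD is stuck in this retry loop is to count the CRS-1714 lines in the alert log. This is a sketch; the function name count_crs_code is hypothetical, and it only counts lines that contain the code followed by a colon.

```shell
# Count the lines on stdin that contain the given CRS message code.
# Usage: count_crs_code CRS-1714 < alert<hostname>.log
count_crs_code() {
    # grep -c prints 0 and returns non-zero when nothing matches,
    # so `|| true` keeps the function usable under `set -e`.
    grep -c -- "$1:" || true
}

# Demo on two entries copied from the alert log above.
count_crs_code CRS-1714 <<'EOF'
[cssd(5162)]CRS-1714:Unable to discover any voting files, retrying discovery in 15 seconds; Details at (:CSSNM00070:) in /g01/11.2.0/grid/log/vrh1/cssd/ocssd.log
2012-08-09 03:35:41.207
[cssd(5162)]CRS-1714:Unable to discover any voting files, retrying discovery in 15 seconds; Details at (:CSSNM00070:) in /g01/11.2.0/grid/log/vrh1/cssd/ocssd.log
2012-08-09 03:35:56.240
EOF
```

A count that keeps growing every 15 seconds matches the "retrying discovery" behaviour in the messages.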
Use dd to zero the header of the diskgroup that holds the OCR and votedisk, simulating diskgroup corruption:
- Check the votedisk and the OCR backups
[root@vrh1 ~]# crsctl query css votedisk
[root@rac1 ~]# ocrconfig -showbackup
PROT-26: Oracle Cluster Registry backup locations were retrieved from a local copy
rac2 2020/08/14 10:17:53 /u01/11.2.0/grid/cdata/rac-cluster/backup00.ocr
rac2 2020/08/14 06:17:52 /u01/11.2.0/grid/cdata/rac-cluster/backup01.ocr
rac1 2020/08/14 01:26:34 /u01/11.2.0/grid/cdata/rac-cluster/backup02.ocr
rac1 2020/08/13 07:11:06 /u01/11.2.0/grid/cdata/rac-cluster/day.ocr
rac1 2020/08/10 10:35:10 /u01/11.2.0/grid/cdata/rac-cluster/week.ocr
rac2 2020/08/14 04:15:25 /u01/11.2.0/grid/cdata/rac-cluster/backup_20200814_041525.ocr
rac1 2019/09/08 08:33:56 /u01/11.2.0/grid/cdata/rac-cluster/backup_20190908_083356.ocr
rac1 2019/09/08 08:32:57 /u01/11.2.0/grid/cdata/rac-cluster/backup_20190908_083257.ocr
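When there are many backups, the newest one can be picked out of this listing mechanically. A minimal sketch, assuming every backup row has the form "<node> <YYYY/MM/DD> <HH:MM:SS> <path>" as above; the function name latest_ocr_backup is hypothetical.

```shell
# Read `ocrconfig -showbackup` output on stdin and print the path of the
# most recent backup. Non-backup lines (such as PROT-26) are skipped.
latest_ocr_backup() {
    awk 'NF >= 4 && $2 ~ /^[0-9][0-9][0-9][0-9]\/[0-9][0-9]\/[0-9][0-9]$/ { print $2, $3, $4 }' |
        LC_ALL=C sort |      # fixed-width date and time sort correctly as text
        tail -n 1 |
        awk '{ print $3 }'
}

# Demo on rows copied from the listing above.
latest_ocr_backup <<'EOF'
PROT-26: Oracle Cluster Registry backup locations were retrieved from a local copy
rac2 2020/08/14 10:17:53 /u01/11.2.0/grid/cdata/rac-cluster/backup00.ocr
rac2 2020/08/14 06:17:52 /u01/11.2.0/grid/cdata/rac-cluster/backup01.ocr
rac1 2020/08/13 07:11:06 /u01/11.2.0/grid/cdata/rac-cluster/day.ocr
rac1 2019/09/08 08:33:56 /u01/11.2.0/grid/cdata/rac-cluster/backup_20190908_083356.ocr
EOF
```

The first column shows which node holds the file, so the restore has to run where that file actually exists.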
- Completely shut down clusterware (OHASD) on all nodes
crsctl stop has -f
- GetAsmDH.sh: a backup script for ASM disk headers (not present on my machine)
Make it a habit to back up the ASM header before any dangerous operation.
[grid@vrh1 ~]$ ./GetAsmDH.sh
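If GetAsmDH.sh is not available, a plain dd copy is a possible fallback. This is a rough sketch and not a substitute for the script: it simply saves the first 1 MiB of a device, which is the same region the dd step in these notes overwrites. The function names and the backup file location are hypothetical.

```shell
# Copy the first 1 MiB of a disk (or file) to a backup file.
# Usage: backup_disk_header <device> <backup_file>
backup_disk_header() {
    dd if="$1" of="$2" bs=1048576 count=1 2>/dev/null
}

# Write a saved header back. Only do this after confirming the target
# really is the disk the backup was taken from.
# Usage: restore_disk_header <backup_file> <device>
restore_disk_header() {
    # conv=notrunc keeps the rest of the target intact when it is a regular file
    dd if="$1" of="$2" bs=1048576 count=1 conv=notrunc 2>/dev/null
}
```

For example, backup_disk_header /dev/sdb1 /tmp/sdb1_header.bak would be run as root before the dd step.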
- Use dd to corrupt the diskgroup that holds the OCR and votedisk
dd if=/dev/zero of=/dev/sdb1 bs=1024k count=1
partprobe /dev/sdb
- Start the cluster with -excl -nocrs; this starts the ASM instance but not CRS
[root@vrh1 vrh1]# crsctl start crs -excl -nocrs
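- Recreate the diskgroup that held the OCR and votedisk; note that compatible.asm must be 11.2:
create diskgroup OCR_VOTE external redundancy disk '/dev/raw/raw1' ATTRIBUTE 'compatible.asm'='11.2.0.0.0';
Note: the new diskgroup must be given the original diskgroup's name (ocr_vote here); otherwise an error is raised.
The correct order is to restore the OCR first, then replace the votedisk.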
[root@vrh1 ~]# ocrconfig -restore /g01/11.2.0/grid/cdata/vrh-cluster/backup00.ocr
[root@vrh1 ~]# ocrcheck
Status of Oracle Cluster Registry is as follows :
Version : 3
Total space (kbytes) : 262120
Used space (kbytes) : 3180
Available space (kbytes) : 258940
ID : 1238458014
Device/File Name : +OCR_VOTE
Device/File integrity check succeeded
Device/File not configured
Device/File not configured
Device/File not configured
Device/File not configured
Cluster registry integrity check succeeded
Logical corruption check succeeded
Note: the next step replaces the votedisk, not the OCR.
- Restore the votedisk; you may hit the following error:
[grid@vrh1 ~]$ crsctl replace votedisk +OCR_VOTE
CRS-4602: Failed 27 to add voting file 2e4e0fe285924f86bf5473d00dcc0388.
CRS-4602: Failed 27 to add voting file 4fa54bb0cc5c4fafbf1a9be5479bf389.
CRS-4602: Failed 27 to add voting file a109ead9ea4e4f28bfe233188623616a.
CRS-4602: Failed 27 to add voting file 042c9fbd71b54f5abfcd3ab3408f3cf3.
CRS-4602: Failed 27 to add voting file 7b5a8cd24f954fafbf835ad78615763f.
Failed to replace voting disk group with +OCR_VOTE
CRS-4000: Command Replace failed, or completed with errors.
The ASM parameters need to be reconfigured and ASM restarted:
SQL> alter system set asm_diskstring='/dev/asm*';
System altered.
SQL> create spfile from memory;
This fails with an error.
Create the ASM SPFILE manually with the following steps:
1) Create the ASM PFILE:
[root@rac2 ~]# vi /tmp/asm_pfile.txt
Add the following parameters:
*.asm_power_limit=1
*.diagnostic_dest='/u01/app/grid/11.2.0/log'
*.instance_type='asm'
*.large_pool_size=12M
*.remote_login_passwordfile='EXCLUSIVE'
2) Create the SPFILE:
[root@rac2 ~]# su - grid
[grid@rac2 ~]$ sqlplus / as sysasm
SQL*Plus: Release 11.2.0.3.0 Production on Fri Nov 9 00:03:22 2012
Copyright © 1982, 2011, Oracle. All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options
SQL> create spfile='+ocr_vote' from pfile='/tmp/asm_pfile.txt';
File created.
SQL> show parameter spfile
NAME TYPE VALUE
spfile string
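Restart by repeating the second and fifth steps above (shut down clusterware with crsctl stop has -f, then start it with crsctl start crs -excl -nocrs).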
[root@rac2 bin]# su - grid
[grid@rac2 ~]$ sqlplus / as sysasm
SQL*Plus: Release 11.2.0.3.0 Production on Fri Nov 9 00:32:25 2012
Copyright © 1982, 2011, Oracle. All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options
SQL> show parameter spfile
NAME TYPE VALUE
spfile string +OCR/wstcluster/asmparameterfile/registry.253.798854653
The SPFILE is now configured.
SQL> alter system set asm_diskstring='/dev/asm*';
System altered.
SQL> create spfile from memory;
File created.
SQL> startup force mount;
ORA-32004: obsolete or deprecated parameter(s) specified for ASM instance
ASM instance started
Total System Global Area 283930624 bytes
Fixed Size 2227664 bytes
Variable Size 256537136 bytes
ASM Cache 25165824 bytes
ASM diskgroups mounted
SQL> show parameter spfile
NAME TYPE VALUE
spfile string /g01/11.2.0/grid/dbs/spfile+AS
M1.ora
[grid@vrh1 trace]$ crsctl replace votedisk +OCR_VOTE
[root@rac1 ~]# cat /etc/oracle/ocr.loc
#Device/file +DATA getting replaced by device +ocr_vote
ocrconfig_loc=+ocr (note the OCR diskgroup name here)
local_only=false
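Since the diskgroup name in ocr.loc matters, it helps to read it out before recreating the diskgroup. A minimal sketch, assuming the key=value layout shown above; the function name ocr_location is hypothetical.

```shell
# Print the value of ocrconfig_loc from ocr.loc-style text on stdin.
# Comment lines are ignored, and so is anything after the first
# whitespace in the value (such as a trailing annotation).
ocr_location() {
    awk -F= '/^ocrconfig_loc=/ { split($2, v, /[ \t]/); print v[1]; exit }'
}

# Demo on the file contents shown above.
ocr_location <<'EOF'
#Device/file +DATA getting replaced by device +ocr_vote
ocrconfig_loc=+ocr
local_only=false
EOF
```

On a node this would be run as ocr_location < /etc/oracle/ocr.loc, and the result compared with the name used in the create diskgroup statement.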
- Restart the cluster
As root:
crsctl stop crs -f
crsctl start crs
Check the cluster logs.
disk.EnableUUID = "true"