After a storage repair, the cluster resources on node 1 of a RAC cluster would not start.
CRS log:
2018-07-12 21:20:40.484 [CRSD(23288)]CRS-0804: Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage Storage layer error [Insufficient quorum to open OCR devices] [0]]. .
2018-07-12 21:20:40.531 [CRSD(23304)]CRS-8500: Oracle Clusterware CRSD process is starting with operating system process ID 23304
Cluster status:
[root@ybnode01 ~]# /grid/app/12.1.0/grid/bin/crsctl stat res -t -init
--------------------------------------------------------------------------------
Name Target State Server State details
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
1 ONLINE ONLINE ybnode01 Started,STABLE
ora.cluster_interconnect.haip
1 ONLINE ONLINE ybnode01 STABLE
ora.crf
1 ONLINE ONLINE ybnode01 STABLE
ora.crsd
1 ONLINE OFFLINE STABLE
ora.cssd
1 ONLINE ONLINE ybnode01 STABLE
ora.cssdmonitor
1 ONLINE ONLINE ybnode01 STABLE
ora.ctssd
1 ONLINE ONLINE ybnode01 OBSERVER,STABLE
ora.diskmon
1 OFFLINE OFFLINE STABLE
ora.drivers.acfs
1 ONLINE ONLINE ybnode01 STABLE
ora.evmd
1 ONLINE INTERMEDIATE ybnode01 STABLE
ora.gipcd
1 ONLINE ONLINE ybnode01 STABLE
ora.gpnpd
1 ONLINE ONLINE ybnode01 STABLE
ora.mdnsd
1 ONLINE ONLINE ybnode01 STABLE
ora.storage
1 ONLINE ONLINE ybnode01 STABLE
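ora.crsd is OFFLINE, and the CRS log reports a problem reading the storage (PROC-26, insufficient quorum to open the OCR devices).
Next, the ASM alert log:
Thu Jul 12 21:20:45 2018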
Errors in file /grid/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_ora_23400.trc:
ORA-15081: failed to submit an I/O operation to a disk
Thu Jul 12 21:20:46 2018
NOTE: [crsd.bin@ybnode01 (TNS V1-V3) 23402] opening OCR file +DG_OCR.255.4294967295
Thu Jul 12 21:20:46 2018
Errors in file /grid/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_ora_23420.trc:
ORA-15025: could not open disk "/dev/asm/hdisk005"
ORA-27041: unable to open file
Linux-x86_64 Error: 13: Permission denied
Additional information: 3
Thu Jul 12 21:20:46 2018
Errors in file /grid/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_ora_23420.trc:
ORA-15081: failed to submit an I/O operation to a disk
Thu Jul 12 21:20:47 2018
NOTE: [crsd.bin@ybnode01 (TNS V1-V3) 23422] opening OCR file +DG_OCR.255.4294967295
Thu Jul 12 21:20:47 2018
Errors in file /grid/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_ora_23429.trc:
ORA-15025: could not open disk "/dev/asm/hdisk005"
ORA-27041: unable to open file
Linux-x86_64 Error: 13: Permission denied
Additional information: 3
Thu Jul 12 21:20:47 2018
Errors in file /grid/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_ora_23429.trc:
ORA-15081: failed to submit an I/O operation to a disk
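When an alert log repeats the same few errors, counting the distinct ORA- codes makes the pattern easier to see. The following is a minimal sketch, not part of the original troubleshooting; the function name summarize_ora_errors is hypothetical, and the log path is whatever alert or trace file you are inspecting.

```shell
# Hypothetical helper: count how often each ORA- code appears in a log file.
# Usage: summarize_ora_errors /path/to/alert_or_trace_file
summarize_ora_errors() {
    # -o prints only the matching part, -E enables extended regex
    grep -oE 'ORA-[0-9]+' "$1" | sort | uniq -c | sort -rn
}
```

For the excerpt above this would list ORA-15081, ORA-15025 and ORA-27041 with their counts, most frequent first.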
Check the ASM disk group mount status with asmcmd:
ASMCMD> lsdg
State Type Rebal Sector Block AU Total_MB Free_MB Req_mir_free_MB Usable_file_MB Offline_disks Voting_files Name
MOUNTED EXTERN N 512 4096 1048576 5242880 2549402 0 2549402 0 N DG_DATA/
MOUNTED EXTERN N 512 4096 1048576 1048576 1048325 0 1048325 0 N DG_FRA/
MOUNTED NORMAL N 512 4096 1048576 10240 9894 0 4947 1 N DG_OCR/
MOUNTED NORMAL N 512 4096 8388608 30720 20664 10240 5212 0 Y DG_VOTE/
I compared this with the status on the other, healthy nodes, and it is the same.
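The lsdg output is wide, and the Offline_disks column is easy to overlook (DG_OCR shows 1 above). A small sketch for flagging such rows follows; the function name flag_offline_disks is hypothetical, and it assumes the header line is the first line of input and contains a column literally named Offline_disks, as in the output shown here.

```shell
# Hypothetical helper: read lsdg output on stdin and print every disk group
# whose Offline_disks value is greater than zero, as "<name> <count>".
flag_offline_disks() {
    awk '
        # Locate the Offline_disks column by its header name, not by position.
        NR == 1 { for (i = 1; i <= NF; i++) if ($i == "Offline_disks") col = i; next }
        # The disk group name is the last field of each data row.
        col && $col > 0 { print $NF, $col }
    '
}
```

It could be fed with something like `asmcmd lsdg | flag_offline_disks`, run as the grid user with the ASM environment set.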
Check the disk permissions:
lrwxrwxrwx 1 root root 8 Jul 12 21:55 /dev/asm/hdisk006 -> ../dm-13
lrwxrwxrwx 1 root root 7 Jul 12 21:55 /dev/asm/hdisk005 -> ../dm-6
brw-rw---- 1 grid asmadmin 253, 13 Jul 12 21:55 /dev/dm-13
brw-rw---- 1 grid asmadmin 253, 6 Jul 12 21:55 /dev/dm-6
The disks are owned by grid:asmadmin, the same as on the other nodes, and multipath is healthy.
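Because /dev/asm/* are symlinks owned by root, the ownership that matters is that of the dm device each link points to. The check above can be sketched as a small function; the name show_device_owner is hypothetical, and it assumes GNU readlink and stat as found on Linux.

```shell
# Hypothetical helper: for each path given, resolve symlinks and print the
# real target with its owner:group and octal mode.
show_device_owner() {
    for link in "$@"; do
        # readlink -f follows the whole symlink chain to the real path
        target=$(readlink -f "$link") || continue
        stat -c '%n %U:%G %a' "$target"
    done
}
```

For example, `show_device_owner /dev/asm/hdisk005 /dev/asm/hdisk006` would be expected to report grid:asmadmin 660 on the two dm devices listed above.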
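Reading the disk header of /dev/asm/hdisk005 with kfed works, and the header information is normal.
At this point things look strange: ASM reports errors, yet the ASM disk groups are mounted; CRS still does not start, while disk permissions and disk reads are both fine. So why are I/O errors being reported?
Finally I checked the notorious file $ORACLE_HOME/bin/oracle, and sure enough its permissions were wrong. After changing its ownership to oracle:asmadmin and its mode to 6751, the cluster started normally.
From now on, whenever a cluster fails to start, the permissions of $ORACLE_HOME/bin/oracle are a must-check.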
[oracle@ybnode01 ~]$ ls -lrt $ORACLE_HOME/bin/oracle
-rwsr-s--x 1 oracle asmadmin 324277793 Jun 14 04:56 /oracle/app/oracle/12.1.0/bin/oracle
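The check recommended above can be scripted so it is easy to repeat on every node. This is a minimal sketch under stated assumptions: the function name check_binary_perms is hypothetical, GNU stat is assumed, and the expected values (mode 6751, owner oracle:asmadmin) are taken from this article and may differ in other installations.

```shell
# Hypothetical helper: compare a file's octal mode and owner:group with the
# expected values.
# Usage: check_binary_perms FILE EXPECTED_MODE EXPECTED_OWNER:GROUP
# Returns 0 on match, 1 on mismatch, 2 if the file cannot be inspected.
check_binary_perms() {
    file=$1; want_mode=$2; want_owner=$3
    if [ ! -e "$file" ]; then
        echo "MISSING $file"
        return 2
    fi
    # %a = octal mode including setuid/setgid bits, %U:%G = owner and group
    actual=$(stat -c '%a %U:%G' "$file") || return 2
    if [ "$actual" = "$want_mode $want_owner" ]; then
        echo "OK $file ($actual)"
        return 0
    fi
    echo "MISMATCH $file: have $actual, want $want_mode $want_owner"
    return 1
}
```

On the node in this article the call would be `check_binary_perms "$ORACLE_HOME/bin/oracle" 6751 oracle:asmadmin`.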