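System environment: CentOS 6.8
Database version: 11.2.0.4.0
After the Huawei multipath software was replaced with the device-mapper-multipath package that ships with CentOS, RAC cluster startup hung at the CSSD service, whose state stayed at STARTING.
In 11gR2 the CRS service depends on ASM because the OCR is stored in ASM, so if ASM cannot start properly, CRS cannot work either:
Cluster log: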
2022-03-26 12:08:02.469:
[ohasd(28938)]CRS-2112:The OLR service started on node RAC01.
2022-03-26 12:08:02.486:
[ohasd(28938)]CRS-1301:Oracle High Availability Service started on node RAC01.
2022-03-26 12:08:02.493:
[ohasd(28938)]CRS-8017:location: /etc/oracle/lastgasp has 2 reboot advisory log files, 0 were announced and 0 errors occurred
2022-03-26 12:08:06.212:
[/u01/app/11.2.0/grid/bin/orarootagent.bin(29065)]CRS-2302:Cannot get GPnP profile. Error CLSGPNP_NO_DAEMON (GPNPD daemon is not running).
2022-03-26 12:08:10.405:
[ohasd(28938)]CRS-2302:Cannot get GPnP profile. Error CLSGPNP_NO_DAEMON (GPNPD daemon is not running).
2022-03-26 12:08:10.412:
[gpnpd(29482)]CRS-2328:GPNPD started on node RAC01.
2022-03-26 12:08:12.745:
[cssd(29564)]CRS-1713:CSSD daemon is started in clustered mode
2022-03-26 12:08:14.621:
[ohasd(28938)]CRS-2767:Resource state recovery not attempted for 'ora.diskmon' as its target state is OFFLINE
2022-03-26 12:08:14.621:
[ohasd(28938)]CRS-2769:Unable to failover resource 'ora.diskmon'.
2022-03-26 12:08:17.884:
[cssd(29564)]CRS-1714:Unable to discover any voting files, retrying discovery in 15 seconds; Details at (:CSSNM00070:) in /u01/app/11.2.0/grid/log/RAC01/cssd/ocssd.log
2022-03-26 12:08:32.900:
[cssd(29564)]CRS-1714:Unable to discover any voting files, retrying discovery in 15 seconds; Details at (:CSSNM00070:) in /u01/app/11.2.0/grid/log/RAC01/cssd/ocssd.log
2022-03-26 12:08:47.916:
[cssd(29564)]CRS-1714:Unable to discover any voting files, retrying discovery in 15 seconds; Details at (:CSSNM00070:) in /u01/app/11.2.0/grid/log/RAC01/cssd/ocssd.log
2022-03-26 12:09:02.932:
[cssd(29564)]CRS-1714:Unable to discover any voting files, retrying discovery in 15 seconds; Details at (:CSSNM00070:) in /u01/app/11.2.0/grid/log/RAC01/cssd/ocssd.log
2022-03-26 12:09:17.949:
[cssd(29564)]CRS-1714:Unable to discover any voting files, retrying discovery in 15 seconds; Details at (:CSSNM00070:) in /u01/app/11.2.0/grid/log/RAC01/cssd/ocssd.log
2022-03-26 12:09:32.965:
[cssd(29564)]CRS-1714:Unable to discover any voting files, retrying discovery in 15 seconds; Details at (:CSSNM00070:) in /u01/app/11.2.0/grid/log/RAC01/cssd/ocssd.log
2022-03-26 12:09:47.981:
[cssd(29564)]CRS-1714:Unable to discover any voting files, retrying discovery in 15 seconds; Details at (:CSSNM00070:) in /u01/app/11.2.0/grid/log/RAC01/cssd/ocssd.log
2022-03-26 12:10:02.998:
[cssd(29564)]CRS-1714:Unable to discover any voting files, retrying discovery in 15 seconds; Details at (:CSSNM00070:) in /u01/app/11.2.0/grid/log/RAC01/cssd/ocssd.log
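The repeated CRS-1714 entries can be summarized with a small script. The following is a minimal Python sketch, assuming the alert log keeps the two-line layout shown above (a timestamp line followed by a message line); the function name `retry_times` and the embedded sample are illustrative only and not part of any Oracle tooling.

```python
import re
from datetime import datetime

# Assumed layout: a timestamp line, then "[process(pid)]CRS-nnnn:message".
TS_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}):\s*$")
MSG_RE = re.compile(r"^\[(?P<proc>[^\]]+)\](?P<code>CRS-\d+):")


def retry_times(lines, code="CRS-1714"):
    """Return the timestamps of every entry that carries the given CRS code."""
    times, pending = [], None
    for line in lines:
        line = line.strip()
        ts = TS_RE.match(line)
        if ts:
            # Remember the timestamp until the message line that follows it.
            pending = datetime.strptime(ts.group(1), "%Y-%m-%d %H:%M:%S.%f")
            continue
        msg = MSG_RE.match(line)
        if msg and pending is not None and msg.group("code") == code:
            times.append(pending)
        pending = None
    return times


# Three entries copied from the cluster log above.
sample = """\
2022-03-26 12:08:12.745:
[cssd(29564)]CRS-1713:CSSD daemon is started in clustered mode
2022-03-26 12:08:17.884:
[cssd(29564)]CRS-1714:Unable to discover any voting files, retrying discovery in 15 seconds; Details at (:CSSNM00070:) in /u01/app/11.2.0/grid/log/RAC01/cssd/ocssd.log
2022-03-26 12:08:32.900:
[cssd(29564)]CRS-1714:Unable to discover any voting files, retrying discovery in 15 seconds; Details at (:CSSNM00070:) in /u01/app/11.2.0/grid/log/RAC01/cssd/ocssd.log
""".splitlines()

times = retry_times(sample)
# Seconds between consecutive retries, rounded to whole seconds.
gaps = [round((b - a).total_seconds()) for a, b in zip(times, times[1:])]
print(len(times), gaps)  # prints: 2 [15]
```

A steady 15-second gap with no other CSSD messages in between matches the symptom described here: CSSD keeps retrying voting file discovery and never gets past it.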
ocssd.log:
2022-03-26 12:09:47.981: [ CLSF][3717953280]checksum failed for disk:/dev/asm-datadisk01-new:
2022-03-26 12:09:47.981: [ CLSF][3717953280]Error: obj 2147483658 blk 0 name 'check_kfbh' num1 1289612970 num2 2751807285
2022-03-26 12:09:47.981: [ CLSF][3717953280]bh: ptr 0x7f5dc8138e00 size 512
2022-03-26 12:09:47.981: [ SKGFD][3717953280]bh: dump of 0x0x7f5dc8138e00, len 512
2022-03-26 12:09:47.981: [ SKGFD][3717953280]0x0x7f5dc8138e00 01 82 01 01 00 00 00 00 - 0a 00 00 80 aa ee dd 4c ...............L
2022-03-26 12:09:47.981: [ SKGFD][3717953280]0x0x7f5dc8138e10 a0 6f 80 15 00 00 00 00 - 00 00 00 00 00 00 00 00 .o..............
2022-03-26 12:09:47.981: [ SKGFD][3717953280]0x0x7f5dc8138e20 4f 52 43 4c 44 49 53 4b - 00 00 00 00 00 00 00 00 ORCLDISK........
2022-03-26 12:09:47.981: [ SKGFD][3717953280]0x0x7f5dc8138e30 00 00 00 00 00 00 00 00 - 00 00 00 00 00 00 00 00 ................
2022-03-26 12:09:47.981: [ SKGFD][3717953280]0x0x7f5dc8138e40 00 00 20 0b 0a 00 01 03 - 44 41 54 41 5f 30 30 31 .. .....DATA_001
2022-03-26 12:09:47.981: [ SKGFD][3717953280]0x0x7f5dc8138e50 30 00 00 00 00 00 00 00 - 00 00 00 00 00 00 00 00 0...............
2022-03-26 12:09:47.981: [ SKGFD][3717953280]0x0x7f5dc8138e60 00 00 00 00 00 00 00 00 - 44 41 54 41 00 00 00 00 ........DATA....
2022-03-26 12:09:47.981: [ SKGFD][3717953280]0x0x7f5dc8138e70 00 00 00 00 00 00 00 00 - 00 00 00 00 00 00 00 00 ................
2022-03-26 12:09:47.981: [ SKGFD][3717953280]0x0x7f5dc8138e80 00 00 00 00 00 00 00 00 - 44 41 54 41 5f 30 30 31 ........DATA_001
2022-03-26 12:09:47.981: [ SKGFD][3717953280]0x0x7f5dc8138e90 30 00 00 00 00 00 00 00 - 00 00 00 00 00 00 00 00 0...............
2022-03-26 12:09:47.981: [ SKGFD][3717953280]0x0x7f5dc8138ea0 00 00 00 00 00 00 00 00 - 00 00 00 00 00 00 00 00 ................
2022-03-26 12:09:47.981: [ SKGFD][3717953280]0x0x7f5dc8138eb0 00 00 00 00 00 00 00 00 - 00 00 00 00 00 00 00 00 ................
2022-03-26 12:09:47.981: [ SKGFD][3717953280]0x0x7f5dc8138ec0 00 00 00 00 00 00 00 00 - d1 8e f9 01 00 1c 9b 10 ................
2022-03-26 12:09:47.981: [ SKGFD][3717953280]0x0x7f5dc8138ed0 d1 8e f9 01 00 28 9b 10 - 00 02 00 10 00 00 10 00 .....(..........
2022-03-26 12:09:47.981: [ SKGFD][3717953280]0x0x7f5dc8138ee0 80 bc 01 00 00 00 20 00 - 14 00 00 00 01 00 00 00 ...... .........
2022-03-26 12:09:47.981: [ SKGFD][3717953280]0x0x7f5dc8138ef0 02 00 00 00 12 1c 0d 00 - 0a 00 ff ff ff ff ff ff ................
2022-03-26 12:09:47.981: [ SKGFD][3717953280]0x0x7f5dc8138f00 00 00 10 0a ab 51 f8 01 - 00 b0 35 65 00 00 00 00 .....Q....5e....
2022-03-26 12:09:47.981: [ SKGFD][3717953280]0x0x7f5dc8138f10 00 00 00 00 00 00 00 00 - 00 00 00 00 00 00 00 00 ................
2022-03-26 12:09:47.981: [ SKGFD][3717953280]0x0x7f5dc8138f20 00 00 00 00 00 00 00 00 - 00 00 00 00 00 00 00 00 ................
2022-03-26 12:09:47.981: [ SKGFD][3717953280]0x0x7f5dc8138f30 00 00 00 00 00 00 00 00 - 00 00 00 00 00 00 00 00 ................
2022-03-26 12:09:47.981: [ SKGFD][3717953280]0x0x7f5dc8138f40 00 00 00 00 00 00 00 00 - 00 00 00 00 00 00 00 00 ................
2022-03-26 12:09:47.981: [ SKGFD][3717953280]0x0x7f5dc8138f50 00 00 00 00 00 00 00 00 - 00 00 00 00 00 00 00 00 ................
2022-03-26 12:09:47.981: [ SKGFD][3717953280]0x0x7f5dc8138f60 00 00 00 00 00 00 00 00 - 00 00 00 00 00 00 00 00 ................
2022-03-26 12:09:47.981: [ SKGFD][3717953280]0x0x7f5dc8138f70 00 00 00 00 00 00 00 00 - 00 00 00 00 00 00 00 00 ................
2022-03-26 12:09:47.981: [ SKGFD][3717953280]0x0x7f5dc8138f80 00 00 00 00 00 00 00 00 - 00 00 00 00 00 00 00 00 ................
2022-03-26 12:09:47.981: [ SKGFD][3717953280]0x0x7f5dc8138f90 00 00 00 00 00 00 00 00 - 00 00 00 00 00 00 00 00 ................
2022-03-26 12:09:47.981: [ SKGFD][3717953280]0x0x7f5dc8138fa0 00 00 00 00 00 00 00 00 - 00 00 00 00 00 00 00 00 ................
2022-03-26 12:09:47.981: [ SKGFD][3717953280]0x0x7f5dc8138fb0 00 00 00 00 00 00 00 00 - 00 00 00 00 00 00 00 00 ................
2022-03-26 12:09:47.981: [ SKGFD][3717953280]0x0x7f5dc8138fc0 00 00 00 00 00 00 00 00 - 00 00 00 00 00 00 00 00 ................
2022-03-26 12:09:47.981: [ SKGFD][3717953280]0x0x7f5dc8138fd0 00 00 00 00 00 00 00 00 - 00 00 00 00 00 00 00 00 ................
2022-03-26 12:09:47.981: [ SKGFD][3717953280]0x0x7f5dc8138fe0 00 00 00 00 00 00 00 00 - 00 00 00 00 00 00 00 00 ................
2022-03-26 12:09:47.981: [ SKGFD][3717953280]0x0x7f5dc8138ff0 00 00 00 00 00 00 00 00 - 00 00 00 00 00 00 00 00 ................
2022-03-26 12:09:47.981: [ SKGFD][3717953280]Lib :UFS:: closing handle 0x7f5dc81388e0 for disk :/dev/asm-datadisk01-new:
2022-03-26 12:09:47.981: [ SKGFD][3717953280]Lib :UFS:: closing handle 0x7f5dc8139110 for disk :/dev/asm-datadisk02-new:
2022-03-26 12:09:47.981: [ SKGFD][3717953280]Lib :UFS:: closing handle 0x7f5dc813abb0 for disk :/dev/asm-datadisk09-new:
2022-03-26 12:09:47.981: [ SKGFD][3717953280]Lib :UFS:: closing handle 0x7f5dc813b660 for disk :/dev/asm-datadisk03-new:
2022-03-26 12:09:47.981: [ SKGFD][3717953280]Lib :UFS:: closing handle 0x7f5dc813c280 for disk :/dev/asm-datadisk10-new:
2022-03-26 12:09:47.981: [ SKGFD][3717953280]Lib :UFS:: closing handle 0x7f5dc81413c0 for disk :/dev/asm-datadisk05-new:
2022-03-26 12:09:47.981: [ SKGFD][3717953280]Lib :UFS:: closing handle 0x7f5dc8141fe0 for disk :/dev/asm-datadisk04-new:
2022-03-26 12:09:47.981: [ SKGFD][3717953280]Lib :UFS:: closing handle 0x7f5dc8142c00 for disk :/dev/asm-datadisk06-new:
2022-03-26 12:09:47.981: [ SKGFD][3717953280]Lib :UFS:: closing handle 0x7f5dc8143820 for disk :/dev/asm-datadisk08-new:
2022-03-26 12:09:47.981: [ SKGFD][3717953280]Lib :UFS:: closing handle 0x7f5dc8144440 for disk :/dev/asm-datadisk07-new:
2022-03-26 12:09:47.981: [ CSSD][3717953280]clssnmvDiskVerify: Successful discovery of 0 disks
2022-03-26 12:09:47.981: [ CSSD][3717953280]clssnmCompleteInitVFDiscovery: Completing initial voting file discovery
2022-03-26 12:09:47.981: [ CSSD][3717953280]clssnmvFindInitialConfigs: No voting files found
2022-03-26 12:09:47.982: [ CSSD][3717953280](:CSSNM00070:)clssnmCompleteInitVFDiscovery: Voting file not found. Retrying discovery in 15 seconds
2022-03-26 12:09:47.986: [ CSSD][3720496896]clssgmExecuteClientRequest(): type(37) size(80) only connect and exit messages are allowed before lease acquisition proc(0x7f5dd406ed80) client((nil))
2022-03-26 12:09:47.986: [ CSSD][3720496896]clssgmDeadProc: proc 0x7f5dd406ed80
2022-03-26 12:09:47.986: [ CSSD][3720496896]clssgmDestroyProc: cleaning up proc(0x7f5dd406ed80) con(0x896) skgpid ospid 29509 with 0 clients, refcount 0
2022-03-26 12:09:47.986: [ CSSD][3720496896]clssgmDiscEndpcl: gipcDestroy 0x896
2022-03-26 12:09:52.573: [ CSSD][3720496896]clssscSelect: cookie accept request 0x26156b0
checksum failed for disk:/dev/asm-datadisk01-new:
Error: obj 2147483658 blk 0 name 'check_kfbh' num1 1289612970 num2 2751807285
bh: ptr 0x7f5dc8138e00 size 512
bh: dump of 0x0x7f5dc8138e00, len 512
clssnmvDiskVerify: Successful discovery of 0 disks
In short, CSSD cannot find the voting disk.
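The numbers in the check_kfbh error line can be matched against the hex dump. The sketch below is a hedged illustration in Python: the field offsets (block number at 0x04, object number at 0x08, checksum at 0x0c, and the three name strings at 0x20, 0x48 and 0x68) are an assumption about the ASM disk header layout, and the helper `cstring` is hypothetical. Only the rows needed for the illustration are copied from the dump.

```python
import struct

# Selected 16-byte rows of the 512-byte dump of /dev/asm-datadisk01-new,
# copied from ocssd.log and keyed by their offset within the block.
rows = {
    0x00: "01 82 01 01 00 00 00 00 0a 00 00 80 aa ee dd 4c",
    0x20: "4f 52 43 4c 44 49 53 4b 00 00 00 00 00 00 00 00",
    0x40: "00 00 20 0b 0a 00 01 03 44 41 54 41 5f 30 30 31",
    0x50: "30 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00",
    0x60: "00 00 00 00 00 00 00 00 44 41 54 41 00 00 00 00",
}

buf = bytearray(512)
for offset, hexrow in rows.items():
    buf[offset:offset + 16] = bytes.fromhex(hexrow)

# Assumed little-endian layout of the first 16 bytes: four single-byte fields,
# then block number, object number and stored checksum as 32-bit integers.
endian, hard, btype, datfmt, blk, obj, check = struct.unpack("<4B3I", bytes(buf[:16]))


def cstring(data, offset, maxlen=32):
    """Read a NUL-terminated ASCII string starting at the given offset."""
    raw = bytes(data[offset:offset + maxlen])
    return raw.split(b"\x00", 1)[0].decode("ascii")


print(blk, obj, check)  # prints: 0 2147483658 1289612970
print(cstring(buf, 0x20), cstring(buf, 0x48), cstring(buf, 0x68))  # prints: ORCLDISK DATA_0010 DATA
```

Under these assumptions, `blk 0`, `obj 2147483658` and `num1 1289612970` in the error line are the values stored in the header itself, which suggests that `num2` is the value CSSD computed when it verified the block. The strings also show that the block belongs to disk DATA_0010 of disk group DATA.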
Solution:
1. First, shut down the OHASD service completely:
crsctl stop has -f
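2. Start the stack with -excl -nocrs; this brings up the lower stack, including the ASM instance, in exclusive mode without starting the CRS daemon (crsd):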
crsctl start crs -excl -nocrs
3. Set the ASM instance's asm_diskstring to the current ASM disk paths and rebuild the spfile:
[root@RAC01 ~]# su - grid
[grid@RAC01 ~]$ sqlplus / as sysasm
SQL*Plus: Release 11.2.0.4.0 Production on Sun Jul 15 04:40:40 2012
Copyright (c) 1982, 2011, Oracle. All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options
SQL> ALTER SYSTEM SET asm_diskgroups = CRS, DATA;
System altered.
SQL> alter system set asm_diskstring='/dev/asm*';
System altered.
SQL> alter diskgroup CRS mount;
Diskgroup altered.
SQL> alter diskgroup DATA mount;
Diskgroup altered.
SQL> create spfile from memory;
File created.
SQL> startup force mount;
ORA-32004: obsolete or deprecated parameter(s) specified for ASM instance
ASM instance started
Total System Global Area 283930624 bytes
Fixed Size 2227664 bytes
Variable Size 256537136 bytes
ASM Cache 25165824 bytes
ASM diskgroups mounted
SQL> show parameter spfile
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
spfile string /g01/grid/app/11.2.0/grid/dbs/
spfile+ASM1.ora
SQL> show parameter disk
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
asm_diskgroups string CRS, DATA
asm_diskstring string /dev/asm*
SQL> create pfile from spfile;
File created.
SQL> create spfile='+CRS' from pfile;
File created.
SQL> startup force;
ORA-32004: obsolete or deprecated parameter(s) specified for ASM instance
ASM instance started
Total System Global Area 283930624 bytes
Fixed Size 2227664 bytes
Variable Size 256537136 bytes
ASM Cache 25165824 bytes
ASM diskgroups mounted
SQL> show parameter spfile
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
spfile string +CRS/RAC-cluster/asmparameterfile/registry.253.788682933
The steps above changed asm_diskstring and updated the SPFILE stored in the ASM disk group. Because ASM uses a shared SPFILE, the other nodes generally need no further action.
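Before relying on a new discovery string, it can help to check which device paths it would cover. The following Python sketch only approximates ASM discovery with shell-style wildcards from the standard `fnmatch` module, which is an assumption and not how ASM itself performs discovery; `/dev/sda1` is a hypothetical path added to show a non-match, and `discovered` is an illustrative helper name.

```python
from fnmatch import fnmatchcase


def discovered(paths, diskstring):
    """Return the paths matched by any pattern in a comma-separated disk string."""
    patterns = [p.strip().strip("'") for p in diskstring.split(",")]
    return [path for path in paths
            if any(fnmatchcase(path, pattern) for pattern in patterns)]


paths = [
    "/dev/asm-crsdisk-new",     # voting disk path from this article
    "/dev/asm-datadisk01-new",  # data disk paths seen in ocssd.log
    "/dev/asm-datadisk02-new",
    "/dev/sda1",                # hypothetical non-ASM device
]

print(discovered(paths, "/dev/asm*"))
```

With the value set above, `/dev/asm*`, the three `asm-` devices match and the hypothetical `/dev/sda1` does not.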
4. Use the crsctl replace votedisk command to relocate the voting disk:
[root@RAC01 ~]# crsctl replace votedisk +CRS
Successful addition of voting disk b0d8ba07a9684fcfbfe7660e829128d5.
Successfully replaced voting disk group with +CRS.
CRS-4266: Voting file(s) successfully replaced
[root@RAC01 ~]# crsctl query css votedisk
## STATE File Universal Id File Name Disk group
-- ----- ----------------- --------- ---------
1. ONLINE b0d8ba07a9684fcfbfe7660e829128d5 (/dev/asm-crsdisk-new) [CRS]
Located 1 voting disk(s).
[root@RAC01 ~]# ocrcheck
Status of Oracle Cluster Registry is as follows :
Version : 3
Total space (kbytes) : 262120
Used space (kbytes) : 2940
Available space (kbytes) : 259180
ID : 2028826513
Device/File Name : +CRS
Device/File integrity check succeeded
Device/File not configured
Device/File not configured
Device/File not configured
Device/File not configured
Cluster registry integrity check succeeded
Logical corruption check succeeded
The commands above moved the voting disk onto the new ASM disk and confirmed that both the voting disk and the OCR are usable.
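The voting disk check can also be scripted. Here is a minimal Python sketch that parses text in the shape of the `crsctl query css votedisk` output shown above; it assumes that column layout, and `parse_votedisks` is an illustrative name. It reads captured text only and does not call crsctl.

```python
import re

# Assumed row shape: " 1. ONLINE <32 hex chars> (<path>) [<disk group>]"
ROW_RE = re.compile(
    r"^\s*(\d+)\.\s+(\S+)\s+([0-9a-f]{32})\s+\((.+?)\)\s+\[(\w+)\]\s*$"
)


def parse_votedisks(output):
    """Return one dict per voting disk row found in the command output."""
    disks = []
    for line in output.splitlines():
        m = ROW_RE.match(line)
        if m:
            disks.append({
                "state": m.group(2),
                "fuid": m.group(3),
                "path": m.group(4),
                "group": m.group(5),
            })
    return disks


# Output copied from the transcript above.
sample = """\
## STATE File Universal Id File Name Disk group
-- ----- ----------------- --------- ---------
1. ONLINE b0d8ba07a9684fcfbfe7660e829128d5 (/dev/asm-crsdisk-new) [CRS]
Located 1 voting disk(s).
"""

disks = parse_votedisks(sample)
all_online = bool(disks) and all(d["state"] == "ONLINE" for d in disks)
print(disks[0]["path"], disks[0]["group"], all_online)
```

An empty result is treated as a failure here, since "no rows" is exactly the situation CSSD was reporting before the fix.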
5. Restart the CRS service:
[root@RAC01 ~]# crsctl stop crs
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'RAC01'
CRS-2673: Attempting to stop 'ora.ctssd' on 'RAC01'
CRS-2673: Attempting to stop 'ora.asm' on 'RAC01'
CRS-2673: Attempting to stop 'ora.mdnsd' on 'RAC01'
CRS-2677: Stop of 'ora.mdnsd' on 'RAC01' succeeded
CRS-2677: Stop of 'ora.ctssd' on 'RAC01' succeeded
CRS-2677: Stop of 'ora.asm' on 'RAC01' succeeded
CRS-2673: Attempting to stop 'ora.cluster_interconnect.haip' on 'RAC01'
CRS-2677: Stop of 'ora.cluster_interconnect.haip' on 'RAC01' succeeded
CRS-2673: Attempting to stop 'ora.cssd' on 'RAC01'
CRS-2677: Stop of 'ora.cssd' on 'RAC01' succeeded
CRS-2673: Attempting to stop 'ora.gipcd' on 'RAC01'
CRS-2677: Stop of 'ora.gipcd' on 'RAC01' succeeded
CRS-2673: Attempting to stop 'ora.gpnpd' on 'RAC01'
CRS-2677: Stop of 'ora.gpnpd' on 'RAC01' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'RAC01' has completed
CRS-4133: Oracle High Availability Services has been stopped.
[root@RAC01 ~]# crsctl start crs
CRS-4123: Oracle High Availability Services has been started.
[root@RAC01 ~]# crsctl status res -t -init
--------------------------------------------------------------------------------
NAME TARGET STATE SERVER STATE_DETAILS
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
1 ONLINE ONLINE RAC01 Started
ora.cluster_interconnect.haip
1 ONLINE ONLINE RAC01
ora.crf
1 OFFLINE OFFLINE
ora.crsd
1 ONLINE ONLINE RAC01
ora.cssd
1 ONLINE ONLINE RAC01
ora.cssdmonitor
1 ONLINE ONLINE RAC01
ora.ctssd
1 ONLINE ONLINE RAC01 OBSERVER
ora.diskmon
1 OFFLINE OFFLINE
ora.evmd
1 ONLINE ONLINE RAC01
ora.gipcd
1 ONLINE ONLINE RAC01
ora.gpnpd
1 ONLINE ONLINE RAC01
ora.mdnsd
1 ONLINE ONLINE RAC01
Because the shared ASM SPFILE was updated above, the other nodes generally have no problems; after a plain restart, CRS works normally on them.
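To confirm on each node that nothing is stuck, the resource listing can be compared target against state. This Python sketch assumes the two-line-per-resource layout of `crsctl status res -t -init` shown above; the function names are illustrative, and the last sample is a hypothetical output added to show what a stuck CSSD would look like.

```python
def parse_init_resources(output):
    """Map each resource name to its (target, state) pair."""
    resources, name = {}, None
    for line in output.splitlines():
        parts = line.split()
        if not parts:
            continue
        if parts[0].startswith("ora."):
            name = parts[0]
        elif name and parts[0].isdigit() and len(parts) >= 3:
            # Detail line: instance number, target, state, then optional columns.
            resources[name] = (parts[1], parts[2])
    return resources


def mismatched(resources):
    """Return the resources whose current state differs from their target."""
    return sorted(n for n, (target, state) in resources.items() if target != state)


# A few rows copied from the listing above.
sample = """\
ora.asm
1 ONLINE ONLINE RAC01 Started
ora.crf
1 OFFLINE OFFLINE
ora.cssd
1 ONLINE ONLINE RAC01
ora.diskmon
1 OFFLINE OFFLINE
"""

# Hypothetical output for a node where CSSD has not come up.
stuck = "ora.cssd\n1 ONLINE OFFLINE\n"

print(mismatched(parse_init_resources(sample)))  # prints: []
print(mismatched(parse_init_resources(stuck)))   # prints: ['ora.cssd']
```

Resources such as ora.crf and ora.diskmon, whose target is OFFLINE, are not flagged because their state matches their target.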
The repair procedure above is based on the article "Changing the ASM disk path in 11gR2 RAC" (在11gR2 RAC中修改ASM DISK Path磁盘路径).