OS: CentOS 7.4
DB: Oracle 19c RAC (19.3)
After one RAC node was rebooted, checking the cluster status reported an error:
CRS-4535: Cannot communicate with Cluster Ready Services
# /u01/app/grid/product/19.0.0/grid_1/bin/crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
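When this check is scripted, a small filter can flag the failing line. The sketch below assumes only what the output above shows: the CRS-4535 code appears when CRSD cannot be reached. The function name `crs_check_output_ok` is hypothetical, and GRID_HOME is simply the path used in this post.

```shell
# Hypothetical helper: reads `crsctl check crs` output on stdin and fails
# if CRS-4535 ("Cannot communicate with Cluster Ready Services") appears.
crs_check_output_ok() {
  # "|| true" keeps a no-match grep from aborting scripts run with set -e
  errors=$(grep 'CRS-4535' || true)
  if [ -n "$errors" ]; then
    echo "CRSD unreachable: $errors" >&2
    return 1
  fi
  return 0
}

# Usage as root on a cluster node (GRID_HOME is the path used in this post):
#   GRID_HOME=/u01/app/grid/product/19.0.0/grid_1
#   "$GRID_HOME/bin/crsctl" check crs | crs_check_output_ok || echo "CRSD is down"
```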
The check shows that no crsd.bin process is running (the only match is the grep command itself):
# ps -ef | grep crsd.bin
root 12232 4773 0 10:07 pts/1 00:00:00 grep --color=auto crsd.bin
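Because `ps -ef | grep crsd.bin` always matches its own grep line, a check that lists only process names avoids that false hit. This is a minimal sketch using standard procps `ps`; the function name `process_running` is hypothetical.

```shell
# Hypothetical helper: succeeds if a process with exactly this name is running.
process_running() {
  # ps -eo comm= prints only process names, so unlike
  # `ps -ef | grep crsd.bin` the grep command never matches itself;
  # -x requires a whole-line match and -F treats the "." literally.
  ps -eo comm= | grep -qxF -- "$1"
}

# Usage:
#   process_running crsd.bin || echo "crsd.bin is not running"
```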
Checking the status of the ora.crsd resource shows the problem (target ONLINE, state OFFLINE):
# /u01/app/grid/product/19.0.0/grid_1/bin/crsctl stat res -t -init
--------------------------------------------------------------------------------
Name Target State Server State details
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
1 ONLINE OFFLINE STABLE
ora.cluster_interconnect.haip
1 ONLINE ONLINE node2 STABLE
ora.crf
1 ONLINE ONLINE node2 STABLE
ora.crsd
1 ONLINE OFFLINE STABLE
ora.cssd
1 ONLINE ONLINE node2 STABLE
ora.cssdmonitor
1 ONLINE ONLINE node2 STABLE
ora.ctssd
1 ONLINE OFFLINE STABLE
ora.diskmon
1 OFFLINE OFFLINE STABLE
ora.evmd
1 ONLINE ONLINE node2 STABLE
ora.gipcd
1 ONLINE ONLINE node2 STABLE
ora.gpnpd
1 ONLINE ONLINE node2 STABLE
ora.mdnsd
1 ONLINE ONLINE node2 STABLE
ora.storage
1 ONLINE ONLINE node2 STABLE
--------------------------------------------------------------------------------
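In a longer listing it is easy to miss which resources failed. The awk sketch below prints every resource whose target is ONLINE but whose state is OFFLINE; it assumes the layout shown above (a line starting with `ora.` followed by an instance line of the form `1 TARGET STATE ...`). The function name is hypothetical.

```shell
# Hypothetical helper: reads `crsctl stat res -t -init` output on stdin and
# prints resources that should be ONLINE but are OFFLINE.
offline_init_resources() {
  awk '
    $1 ~ /^ora\./ { name = $1; next }      # resource name line
    name != "" && $1 ~ /^[0-9]+$/ {        # instance line: "1 TARGET STATE ..."
      if ($2 == "ONLINE" && $3 == "OFFLINE") print name
      name = ""
    }
  '
}

# Usage:
#   /u01/app/grid/product/19.0.0/grid_1/bin/crsctl stat res -t -init | offline_init_resources
```

Against the listing above this should report ora.asm, ora.crsd and ora.ctssd; ora.diskmon is skipped because its target is OFFLINE as well.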
Check the ASM disks:
# ls -l /dev/sd*
brw-rw---- 1 root disk 8, 0 Feb 16 09:57 /dev/sda
brw-rw---- 1 root disk 8, 1 Feb 16 09:57 /dev/sda1
brw-rw---- 1 root disk 8, 2 Feb 16 09:57 /dev/sda2
brw-rw---- 1 grid asmadmin 8, 16 Feb 16 10:12 /dev/sdb
brw-rw---- 1 grid asmadmin 8, 32 Feb 16 09:58 /dev/sdc
brw-rw---- 1 grid asmadmin 8, 48 Feb 16 09:58 /dev/sdd
brw-rw---- 1 grid asmadmin 8, 64 Feb 16 09:58 /dev/sde
# ls -l /dev/asm*
lrwxrwxrwx 1 root root 3 Feb 16 09:58 /dev/asm-diskb -> sdb
lrwxrwxrwx 1 root root 3 Feb 16 09:58 /dev/asm-diskc -> sdc
lrwxrwxrwx 1 root root 3 Feb 16 09:58 /dev/asm-diskd -> sdd
lrwxrwxrwx 1 root root 3 Feb 16 09:58 /dev/asm-diske -> sde
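The same ownership check can be run over every udev symlink. This sketch assumes GNU coreutils `stat` (as on CentOS 7) and takes grid:asmadmin as the expected owner because that is what the listing above shows; the function name `check_asm_disk` is hypothetical.

```shell
# Hypothetical helper: $1 = device path (e.g. a /dev/asm-disk* symlink),
# $2 = expected owner:group (defaults to grid:asmadmin, as in this post).
check_asm_disk() {
  dev=$1
  want=${2:-grid:asmadmin}
  # -L follows the symlink so the real /dev/sdX node is checked
  have=$(stat -L -c '%U:%G' -- "$dev") || return 2
  if [ "$have" != "$want" ]; then
    echo "$dev: owner $have, expected $want" >&2
    return 1
  fi
  return 0
}

# Usage:
#   for d in /dev/asm-disk*; do check_asm_disk "$d"; done
```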
Check the CRS alert log $ORACLE_BASE/diag/crs/$HOSTNAME/crs/alert/log.xml:
<msg time='2021-02-16T09:58:20.860+08:00' org_id='oracle' comp_id='crs'
msg_id='clsdadr_process_queue:4927:2974305713' type='UNKNOWN' group='CLSDADR'
level='16' host_id='node2' host_addr='192.168.56.12'
pid='6543'>
<txt>2021-02-16 09:58:20.843 [OCTSSD(6543)]CRS-2407: The new Cluster Time Synchronization Service reference node is host node1.
</txt>
</msg>
<msg time='2021-02-16T09:58:21.016+08:00' org_id='oracle' comp_id='crs'
msg_id='clsdadr_process_queue:4927:2974305713' type='UNKNOWN' group='CLSDADR'
level='16' host_id='node2' host_addr='192.168.56.12'
pid='6543'>
<txt>2021-02-16 09:58:21.008 [OCTSSD(6543)]CRS-2419: The clock on host node2 differs from mean cluster time by 3066359431 microseconds. The Cluster Time Synchronization Service will not perform time synchronization because the time difference is beyond the permissible offset of 600 seconds. Details in /u01/app/gridbase/19.0.0/grid_1/diag/crs/node2/crs/trace/octssd.trc.
</txt>
</msg>
<msg time='2021-02-16T09:58:21.110+08:00' org_id='oracle' comp_id='crs'
msg_id='clsdadr_process_queue:4927:2974305713' type='UNKNOWN' group='CLSDADR'
level='16' host_id='node2' host_addr='192.168.56.12'
pid='6543'>
<txt>2021-02-16 09:58:21.110 [OCTSSD(6543)]CRS-2402: The Cluster Time Synchronization Service aborted on host node2. Details at (:ctsselect_mstm4:) in /u01/app/gridbase/19.0.0/grid_1/diag/crs/node2/crs/trace/octssd.trc.
</txt>
</msg>
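Since log.xml wraps each message in XML, pulling out just the CRS-NNNN lines makes it easier to read. This is a minimal grep sketch that relies only on the `<txt>` layout shown above; the function name `crs_messages` is hypothetical.

```shell
# Hypothetical helper: print every "CRS-NNNN: ..." message text found in the
# given alert log.xml files (-h drops file names, -o keeps only the match).
crs_messages() {
  grep -oh -e 'CRS-[0-9][0-9]*:.*' -- "$@"
}

# Usage (path as given in this post):
#   crs_messages "$ORACLE_BASE/diag/crs/$HOSTNAME/crs/alert/log.xml" | tail -n 20
```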
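The log shows that the clock on node2 differs too much from node1, which CTSS chose as its reference node. The offset is beyond the permissible 600 seconds, so CTSS refuses to synchronize the time and aborts on node2. This matches ora.ctssd being OFFLINE in the resource listing, and ora.crsd does not come up either.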
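To put the CRS-2419 figure in perspective, the offset is reported in microseconds while the limit is in seconds. The two values below are copied from that message; the arithmetic is plain POSIX shell.

```shell
# Values copied from the CRS-2419 message above
offset_us=3066359431
limit_s=600
offset_s=$((offset_us / 1000000))   # integer seconds
echo "node2 offset: ${offset_s}s (about $((offset_s / 60)) min), CTSS limit: ${limit_s}s"
if [ "$offset_s" -gt "$limit_s" ]; then
  echo "offset exceeds the limit, so CTSS will not adjust the clock itself"
fi
```

This prints `node2 offset: 3066s (about 51 min), CTSS limit: 600s`, i.e. node2 was roughly five times over the limit.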
After correcting the time on node2, try starting CRSD manually:
# /u01/app/grid/product/19.0.0/grid_1/bin/crsctl start res ora.crsd -init
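To confirm that the time adjustment actually took effect, one rough check is to compare epoch seconds on both nodes. This sketch assumes passwordless ssh from node2 to node1 (hostnames from this post); the helper name `clock_offset` is hypothetical.

```shell
# Hypothetical helper: absolute difference between two epoch timestamps (seconds)
clock_offset() {
  d=$(( $1 - $2 ))
  if [ "$d" -lt 0 ]; then
    d=$(( 0 - d ))
  fi
  echo "$d"
}

# Usage on node2; the ssh round trip adds up to a second or so of error,
# which is fine for spotting an offset measured in minutes.
#   off=$(clock_offset "$(ssh node1 date +%s)" "$(date +%s)")
#   echo "node1 vs node2: ${off}s"
```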