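The host of database node 1 was rebooted. After the reboot, neither ASM nor the database started normally, so the corresponding agents were checked.
Troubleshooting
1 Check the HAS status.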
[grid@orcldb1 trace]$ ps -ef|grep has
root 60734 1 0 11:21 ? 00:00:00 /bin/sh /etc/init.d/init.ohasd run >/dev/null 2>&1 </dev/null
root 70798 1 0 11:22 ? 00:00:07 /u01/app/19c/grid/bin/ohasd.bin reboot
[grid@orcldb1 trace]$
Check the number of agents
[grid@orcldb1 trace]$ ps -ef|grep agent
root 72466 1 0 11:22 ? 00:00:02 /u01/app/19c/grid/bin/orarootagent.bin
grid 73261 1 0 11:22 ? 00:00:01 /u01/app/19c/grid/bin/oraagent.bin
root 75573 1 0 11:22 ? 00:00:00 /u01/app/19c/grid/bin/cssdagent --- Normally there should be 6 agents: 3 started by HAS and 3 started by crsd. Evidently crsd did not start properly.
grid 96525 93759 0 11:43 pts/0 00:00:00 grep --color=auto agent
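The agent count above can be checked without counting by eye. The following is a minimal sketch, not part of the original investigation: `count_agents` is a hypothetical helper, and its pattern covers only the three binary names that appear in the listing above (an assumption; a healthy node may run other agent binaries as well).

```shell
# Count Clusterware agent processes in `ps -ef` output read from stdin.
# Assumption: only the binary names seen in the listing above are matched.
count_agents() {
    grep -cE '/(orarootagent\.bin|oraagent\.bin|cssdagent)( |$)'
}

# Sample input copied from the listing above; the grep line itself is ignored
# because it does not contain a path to one of the agent binaries.
count_agents <<'EOF'
root      72466      1  0 11:22 ?        00:00:02 /u01/app/19c/grid/bin/orarootagent.bin
grid      73261      1  0 11:22 ?        00:00:01 /u01/app/19c/grid/bin/oraagent.bin
root      75573      1  0 11:22 ?        00:00:00 /u01/app/19c/grid/bin/cssdagent
grid      96525  93759  0 11:43 pts/0    00:00:00 grep --color=auto agent
EOF
```

On a live node the same helper could be fed directly with `ps -ef | count_agents`; a result below the expected 6 points at crsd and its agents not running.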
[root@orcldb1 ~]# lsof -p 65527 |grep "trc"
ohasd.bin 65527 root 1u REG 8,3 6256 201923036 /u01/app/grid/crsdata/orcldb1/output/ohasdOUT.trc
ohasd.bin 65527 root 2u REG 8,3 6256 201923036 /u01/app/grid/crsdata/orcldb1/output/ohasdOUT.trc
ohasd.bin 65527 root 4u REG 8,3 6256 201923036 /u01/app/grid/crsdata/orcldb1/output/ohasdOUT.trc
ohasd.bin 65527 root 66w REG 8,3 20507881 273343167 /u01/app/grid/diag/crs/orcldb1/crs/trace/ohasd.trc
[root@orcldb1 ~]# cd /u01/app/grid/diag/crs/orcldb1/crs/trace
Check alert.log and the cluster resource status
[grid@orcldb1 trace]$
[grid@orcldb1 trace]$ crsctl status res -t -init
--------------------------------------------------------------------------------
Name Target State Server State details
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
1 ONLINE OFFLINE STABLE
ora.cluster_interconnect.haip
1 ONLINE ONLINE orcldb1 STABLE
ora.crf
1 ONLINE ONLINE orcldb1 STABLE
**ora.crsd
1 ONLINE OFFLINE STABLE**
ora.cssd
1 ONLINE ONLINE orcldb1 STABLE
ora.cssdmonitor
1 ONLINE ONLINE orcldb1 STABLE
ora.ctssd
1 ONLINE OFFLINE STABLE
ora.diskmon
1 OFFLINE OFFLINE STABLE
ora.drivers.acfs
1 ONLINE ONLINE orcldb1 STABLE
ora.evmd
1 ONLINE ONLINE orcldb1 STABLE
ora.gipcd
1 ONLINE ONLINE orcldb1 STABLE
ora.gpnpd
1 ONLINE ONLINE orcldb1 STABLE
ora.mdnsd
1 ONLINE ONLINE orcldb1 STABLE
ora.storage
1 ONLINE ONLINE orcldb1 STABLE
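In a listing like the one above, the interesting rows are those whose Target is ONLINE while the State is OFFLINE. A minimal sketch for filtering them, assuming the two-line layout shown above (resource name on one line, instance row on the next); `list_wanted_but_offline` is a hypothetical helper name:

```shell
# Print resources whose Target is ONLINE but whose State is OFFLINE.
# Reads `crsctl status res -t -init` output on stdin.
list_wanted_but_offline() {
    awk '
        /^ora\./ { name = $1; next }                      # remember the resource name
        $1 ~ /^[0-9]+$/ && $2 == "ONLINE" && $3 == "OFFLINE" { print name }
    '
}

# Abbreviated sample in the same layout as the listing above.
list_wanted_but_offline <<'EOF'
ora.asm
      1        ONLINE  OFFLINE                               STABLE
ora.crf
      1        ONLINE  ONLINE       orcldb1                  STABLE
ora.crsd
      1        ONLINE  OFFLINE                               STABLE
ora.diskmon
      1        OFFLINE OFFLINE                               STABLE
EOF
```

On a live node this would be used as `crsctl status res -t -init | list_wanted_but_offline`. Applied to the full listing above, the rows that qualify are ora.asm, ora.crsd and ora.ctssd; ora.diskmon is skipped because its Target is OFFLINE as well.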
Try to start the crsd resource manually
> [grid@orcldb1 trace]$ crsctl start resource "ora.crsd" -init
CRS-2672: Attempting to start 'ora.ctssd' on 'orcldb1'
CRS-2676: Start of 'ora.ctssd' on 'orcldb1' succeeded
CRS-2672: Attempting to start 'ora.asm' on 'orcldb1'
CRS-2672: Attempting to start 'ora.crsd' on 'orcldb1'
CRS-2676: Start of 'ora.asm' on 'orcldb1' succeeded
CRS-2676: Start of 'ora.crsd' on 'orcldb1' succeeded
> [grid@orcldb1 trace]$ systemctl status ntpd.service
● ntpd.service - Network Time Service
Loaded: loaded (/usr/lib/systemd/system/ntpd.service; enabled; vendor preset: disabled)
Active: active (running) since Fri 2020-11-13 11:32:15 CST; 14min ago
Process: 59267 ExecStart=/usr/sbin/ntpd -u ntp:ntp $OPTIONS (code=exited, status=0/SUCCESS)
Main PID: 59310 (ntpd)
Tasks: 2
CGroup: /system.slice/ntpd.service
├─59310 /usr/sbin/ntpd -u ntp:ntp -x -u ntp:ntp -p /var/run/ntpd.pid
└─59386 /usr/sbin/ntpd -u ntp:ntp -x -u ntp:ntp -p /var/run/ntpd.pid
[grid@orcldb1 trace]$
After that, the database and ASM both started normally. Could crsd have been disabled?
> [grid@orcldb1 trace]$ crsctl status resource "ora.crsd" -init -p|grep -i "enable"
ENABLED=1
RESOURCE_USE_ENABLED=1
[grid@orcldb1 trace]$
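The `grep -i "enable"` above also matches RESOURCE_USE_ENABLED, so the two attributes are easy to confuse. A minimal sketch that reads only the ENABLED attribute from the profile output; `resource_enabled` is a hypothetical helper, and reading the value 1 as "enabled" follows the interpretation used in the check above:

```shell
# Report the ENABLED attribute from `crsctl status resource <name> -init -p`
# output read on stdin. Anchoring on ^ENABLED= keeps RESOURCE_USE_ENABLED
# from being picked up by mistake.
resource_enabled() {
    value=$(sed -n 's/^ENABLED=\([0-9][0-9]*\)$/\1/p' | head -n 1)
    if [ -z "$value" ]; then
        echo "unknown"
    elif [ "$value" = "1" ]; then
        echo "enabled"
    else
        echo "disabled"
    fi
}

# Sample input copied from the output above.
printf 'ENABLED=1\nRESOURCE_USE_ENABLED=1\n' | resource_enabled
```

On a live node the input would come from `crsctl status resource "ora.crsd" -init -p | resource_enabled`.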
2020-11-13 12:09:20.897 [OCSSD(67590)]CRS-1720: Cluster Synchronization Services daemon (CSSD) is ready for operation.
2020-11-13 12:09:21.027 [OCTSSD(67944)]CRS-8500: Oracle Clusterware OCTSSD process is starting with operating system process ID 67944
2020-11-13 12:09:21.835 [OCTSSD(67944)]CRS-2403: The Cluster Time Synchronization Service on host eomsdb1 is in observer mode.
2020-11-13 12:09:23.093 [OCTSSD(67944)]CRS-2407: The new Cluster Time Synchronization Service reference node is host eomsdb2.
2020-11-13 12:09:23.093 [OCTSSD(67944)]CRS-2401: The Cluster Time Synchronization Service started on host eomsdb1.
**2020-11-13 12:09:23.135 [OCTSSD(67944)]CRS-2419: The clock on host eomsdb1 differs from mean cluster time by 627364947 microseconds.
The Cluster Time Synchronization Service will not perform time synchronization because the time difference is beyond the permissible offset of 600 seconds. Details in** /u01/app/grid/diag/crs/eomsdb1/crs/trace/octssd.trc.
2020-11-13 12:09:23.831 [OCTSSD(67944)]CRS-2402: The Cluster Time Synchronization Service aborted on host eomsdb1.
Details at (:ctsselect_msm3:) in /u01/app/grid/diag/crs/eomsdb1/crs/trace/octssd.trc.
2020-11-13 14:15:41.203 [OCTSSD(186370)]CRS-8500: Oracle Clusterware OCTSSD process is starting with operating system process ID 186370
2020-11-13 14:15:42.014 [OCTSSD(186370)]CRS-2403: The Cluster Time Synchronization Service on host eomsdb1 is in observer mode.
2020-11-13 14:15:43.276 [OCTSSD(186370)]CRS-2407: The new Cluster Time Synchronization Service reference node is host eomsdb2.
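The CRS-2419 entry in the log above reports the offset in microseconds, while the permissible limit is quoted in seconds, so the comparison is easy to misread. A minimal sketch of the arithmetic, using hypothetical helper names (`extract_offset_us`, `offset_vs_limit`) and taking the 600-second limit from the message text itself; the optional minus sign in the pattern is an assumption in case the value is ever reported as signed:

```shell
# Pull the microsecond offset out of a CRS-2419 alert-log line read on stdin.
extract_offset_us() {
    sed -n 's/.*CRS-2419:.*differs from mean cluster time by \(-\{0,1\}[0-9][0-9]*\) microseconds.*/\1/p'
}

# Compare an offset in microseconds with a limit in seconds (default 600).
offset_vs_limit() {
    offset_us=${1#-}                          # compare the magnitude only
    limit_us=$(( ${2:-600} * 1000000 ))       # convert the limit to microseconds
    if [ "$offset_us" -gt "$limit_us" ]; then
        echo "exceeds"
    else
        echo "within"
    fi
}

# Sample line copied from the alert log above.
line='2020-11-13 12:09:23.135 [OCTSSD(67944)]CRS-2419: The clock on host eomsdb1 differs from mean cluster time by 627364947 microseconds.'
offset=$(printf '%s\n' "$line" | extract_offset_us)
echo "offset_us=$offset -> $(offset_vs_limit "$offset")"
```

627364947 microseconds is about 627.36 seconds, roughly 27 seconds over the 600-second limit, which matches the CRS-2402 abort that follows in the log.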