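In a two-node RAC, the clusterware (CRS) stack on node 1 failed to start. Analysis showed that the gipcd process on node 2 was misbehaving, which broke communication between the nodes. Restarting gipcd.bin on node 2 resolved the problem. In terms of symptoms, ora.crsd and ora.evmd could not start, while the other components were normal.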
1. Checks and analysis
1.1. Node 1 clusterware alert log
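The following excerpt is from the node 1 clusterware alert log and covers the manual restart at 13:08. The olsnodes.log messages (CRS-0009/CRS-0019, log file reopened and file rotation terminated) have always appeared in this environment and can be ignored here.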
2018-11-26 13:08:29.521: [client(892)]CRS-0009:log file "/home/u01/app/grid/11.2.0/product/log/sxmms1/client/olsnodes.log" reopened
2018-11-26 13:08:29.521: [client(892)]CRS-0019:file rotation terminated. log file: "/home/u01/app/grid/11.2.0/product/log/sxmms1/client/olsnodes.log"
2018-11-26 13:08:42.421: [ohasd(903)]CRS-2112:The OLR service started on node sxmms1.
2018-11-26 13:08:42.433: [ohasd(903)]CRS-1301:Oracle High Availability Service started on node sxmms1.
2018-11-26 13:08:42.433: [ohasd(903)]CRS-8017:location: /etc/oracle/lastgasp has 2 reboot advisory log files, 0 were announced and 0 errors occurred
2018-11-26 13:08:45.864: [/home/u01/app/grid/11.2.0/product/bin/orarootagent.bin(948)]CRS-2302:Cannot get GPnP profile. Error CLSGPNP_NO_DAEMON (GPNPD daemon is not running).
2018-11-26 13:08:51.238: [gpnpd(1118)]CRS-2328:GPNPD started on node sxmms1.
2018-11-26 13:08:53.710: [cssd(1184)]CRS-1713:CSSD daemon is started in clustered mode
2018-11-26 13:08:55.508: [ohasd(903)]CRS-2767:Resource state recovery not attempted for 'ora.diskmon' as its target state is OFFLINE
2018-11-26 13:08:55.509: [ohasd(903)]CRS-2769:Unable to failover resource 'ora.diskmon'.
2018-11-26 13:09:03.406: [cssd(1184)]CRS-1707:Lease acquisition for node sxmms1 number 1 completed
2018-11-26 13:09:04.658: [cssd(1184)]CRS-1605:CSSD voting file is online: ORCL:OCR2; details in /home/u01/app/grid/11.2.0/product/log/sxmms1/cssd/ocssd.log.
2018-11-26 13:09:07.670: [cssd(1184)]CRS-1601:CSSD Reconfiguration complete. Active nodes are sxmms1 sxmms2 .
2018-11-26 13:09:09.989: [ctssd(1269)]CRS-2407:The new Cluster Time Synchronization Service reference node is host sxmms2.
2018-11-26 13:09:09.990: [ctssd(1269)]CRS-2401:The Cluster Time Synchronization Service started on host sxmms1.
2018-11-26 13:09:11.701: [ohasd(903)]CRS-2767:Resource state recovery not attempted for 'ora.diskmon' as its target state is OFFLINE
2018-11-26 13:09:11.701: [ohasd(903)]CRS-2769:Unable to failover resource 'ora.diskmon'.
2018-11-26 13:10:08.710: [/home/u01/app/grid/11.2.0/product/bin/orarootagent.bin(1129)]CRS-5818:Aborted command 'start' for resource 'ora.ctssd'. Details at (:CRSAGF00113:) {0:0:2} in /home/u01/app/grid/11.2.0/product/log/sxmms1/agent/ohasd/orarootagent_root/orarootagent_root.log.
2018-11-26 13:10:12.714: [ohasd(903)]CRS-2757:Command 'Start' timed out waiting for response from the resource 'ora.ctssd'. Details at (:CRSPE00111:) {0:0:2} in /home/u01/app/grid/11.2.0/product/log/sxmms1/ohasd/ohasd.log.
[client(1584)]CRS-10001:26-Nov-18 13:10 ACFS-9391: Checking for existing ADVM/ACFS installation.
[client(1589)]CRS-10001:26-Nov-18 13:10 ACFS-9392: Validating ADVM/ACFS installation files for operating system.
[client(1591)]CRS-10001:26-Nov-18 13:10 ACFS-9393: Verifying ASM Administrator setup.
[client(1594)]CRS-10001:26-Nov-18 13:10 ACFS-9308: Loading installed ADVM/ACFS drivers.
[client(1597)]CRS-10001:26-Nov-18 13:10 ACFS-9154: Loading 'oracleoks.ko' driver.
[client(1625)]CRS-10001:26-Nov-18 13:10 ACFS-9154: Loading 'oracleadvm.ko' driver.
[client(1653)]CRS-10001:26-Nov-18 13:10 ACFS-9154: Loading 'oracleacfs.ko' driver.
[client(1764)]CRS-10001:26-Nov-18 13:10 ACFS-9327: Verifying ADVM/ACFS devices.
[client(1773)]CRS-10001:26-Nov-18 13:10 ACFS-9156: Detecting control device '/dev/asm/.asm_ctl_spec'.
[client(1777)]CRS-10001:26-Nov-18 13:10 ACFS-9156: Detecting control device '/dev/ofsctl'.
[client(1782)]CRS-10001:26-Nov-18 13:10 ACFS-9322: completed
2018-11-26 13:10:14.067: [ohasd(903)]CRS-2807:Resource 'ora.asm' failed to start automatically.
2018-11-26 13:10:14.067: [ohasd(903)]CRS-2807:Resource 'ora.crsd' failed to start automatically.
2018-11-26 13:10:14.067: [ohasd(903)]CRS-2807:Resource 'ora.evmd' failed to start automatically.
2018-11-26 13:11:42.738: [ohasd(903)]CRS-2765:Resource 'ora.ctssd' has failed on server 'sxmms1'.
2018-11-26 13:11:45.381: [ctssd(2151)]CRS-2407:The new Cluster Time Synchronization Service reference node is host sxmms2.
2018-11-26 13:11:45.382: [ctssd(2151)]CRS-2401:The Cluster Time Synchronization Service started on host sxmms1.
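On a busy node the alert log is long, and the relevant entries (start timeouts, aborted commands, resources that failed to start) are easy to miss. The script below is a minimal sketch for pulling them out. It is not an Oracle tool. It assumes the 11.2 alert log layout shown above, where an optional "YYYY-MM-DD HH:MM:SS.mmm: " timestamp is followed by "[process]CRS-nnnn:message". The helper names (parse_alert, problems) are hypothetical, and so is the PROBLEM_CODES set. That set simply lists the codes that matter in this case: CRS-2757, CRS-2765, CRS-2807 and CRS-5818.

```python
import re
from collections import namedtuple

# Hypothetical helper (not an Oracle utility): extracts CRS-nnnn entries from an
# 11.2 clusterware alert log. Most entries start with a timestamp such as
# "2018-11-26 13:10:14.067: "; the ACFS client lines have no timestamp.
ENTRY = re.compile(
    r"(?:(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}): )?"
    r"\[(?P<proc>[^\]]+)\](?P<code>CRS-\d{4,5}):(?P<msg>.*)"
)

# Assumption: the codes treated as problems in this case; extend as needed.
PROBLEM_CODES = {"CRS-2757", "CRS-2765", "CRS-2807", "CRS-5818"}

Entry = namedtuple("Entry", "ts proc code msg")


def parse_alert(lines):
    """Yield one Entry per line that looks like a CRS alert log message."""
    for line in lines:
        m = ENTRY.search(line)
        if m:
            # m["ts"] is None for lines without a leading timestamp
            yield Entry(m["ts"], m["proc"], m["code"], m["msg"].strip())


def problems(lines):
    """Return only the entries whose CRS code is in PROBLEM_CODES."""
    return [e for e in parse_alert(lines) if e.code in PROBLEM_CODES]


if __name__ == "__main__":
    sample = [
        "2018-11-26 13:10:14.067: [ohasd(903)]CRS-2807:Resource 'ora.crsd' failed to start automatically.",
        "[client(1782)]CRS-10001:26-Nov-18 13:10 ACFS-9322: completed",
    ]
    for e in problems(sample):
        print(e.ts, e.proc, e.code, e.msg)
```

When run against the excerpt above, this kind of filter would surface the ora.ctssd abort/timeout (CRS-5818, CRS-2757), the CRS-2807 failures for ora.asm, ora.crsd and ora.evmd, and the later CRS-2765 for ora.ctssd. It also skips the informational GPnP/CSSD/ACFS lines.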
1.2. Node 1 agent log analysis
Only part of the log is excerpted here. The log shows that many components timed out while starting.
/home/u01/app/grid/11.2.0/product/log/sxmms1/agent/ohasd/orarootagent_root/orarootagent_root.log
2018-11-26 13:10:06.792: [ora.ctssd][2525660928]{0:0:2} [start] clsdmc_respget return: status=0, ecode=0, returnbuf=[0x7f51780ce0c0], buflen=8
2018-11-26 13:10:06.792: [ora.ctssd][2525660928]{0:0:2} [start] Start: Extended check return buffer: "? with length of 8
2018-11-26 13:10:06.792: [ora.ctssd][2525660928]{0:0:2} [start] translateReturnCodes, return = 0, state detail = Checkcb data [0x7f51780ce0c0]: mode[0xc0] offset[0 ms].
[ clsdmc][2525660928]CLSDMC.C returnbuflen=8, extraDataBuf=C0, returnbuf=7805FCE0
2018-11-26 13:10:07.793: [ora.ctssd][2525660928]{0:0:2} [start] clsdmc_respget return: status=0, ecode=0, returnbuf=[0x7f517805fce0], buflen=8
2018-11-26 13:10:07.793: [ora.ctssd][2525660928]{0:0:2} [start] Start: Extended check return buffer: "? with length of 8
2018-11-26 13:10:07.793: [ora.ctssd][2525660928]{0:0:2} [start] translateReturnCodes, return = 0, state detail = Checkcb data [0x7f517805fce0]: mode[0xc0] offset[0 ms].
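In the excerpt, the agent re-checks ora.ctssd once per second during its [start] command. To see which resources spent a long time in a start loop across the whole orarootagent_root.log, you can group lines by resource and command. The script below is a minimal sketch of that approach and is not an Oracle utility. It assumes the line layout shown above: "timestamp: [resource][thread]{id} [command] message". It also counts one poll per translateReturnCodes line, which is an assumption based only on this excerpt. The function name summarize is hypothetical.

```python
import re
from datetime import datetime

# Hypothetical helper (not an Oracle utility): groups agent log lines such as
# "2018-11-26 13:10:06.792: [ora.ctssd][2525660928]{0:0:2} [start] translateReturnCodes, ..."
# by (resource, command) and reports how long each command kept logging.
LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}): "
    r"\[(?P<res>[^\]]+)\]\[\d+\]\{[^}]*\} \[(?P<cmd>\w+)\] (?P<msg>.*)$"
)
TS_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def summarize(lines):
    """Map (resource, command) -> (first_ts, last_ts, polls, elapsed_seconds).

    Assumption: one "poll" per translateReturnCodes line, which the agent
    writes after each check of the resource in the excerpt above.
    """
    seen = {}
    for line in lines:
        m = LINE.match(line.strip())
        if not m:
            continue  # e.g. "[ clsdmc]..." lines have no timestamp or command tag
        ts = datetime.strptime(m["ts"], TS_FORMAT)
        key = (m["res"], m["cmd"])
        first, last, polls = seen.get(key, (ts, ts, 0))
        if m["msg"].startswith("translateReturnCodes"):
            polls += 1
        seen[key] = (min(first, ts), max(last, ts), polls)
    return {
        key: (first, last, polls, (last - first).total_seconds())
        for key, (first, last, polls) in seen.items()
    }
```

On the full log, pairs such as ("ora.ctssd", "start") that keep polling for a long time (here, until the CRS-5818 abort at 13:10:08.710 in the alert log) stand out by their elapsed_seconds. Short-lived commands stay near zero.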