1 Description
An Oracle RAC 12c environment was built on VMware Workstation. root.sh completed normally on the first node, but failed on the second node with the following errors:
CRS-2883: Resource 'ora.cssdmonitor' failed during Clusterware stack start.
CRS-4406: Oracle High Availability Services synchronous start failed.
CRS-4000: Command Start failed, or completed with errors.
2013/07/10 16:44:25 CLSRSC-117: Failed to start Oracle Clusterware stack
Died at /u01/app/12.1.0.1/grid/crs/install/crsinstall.pm line 941.
The command '/u01/app/12.1.0.1/grid/perl/bin/perl -I/u01/app/12.1.0.1/grid/perl/lib -I/u01/app/12.1.0.1/grid/crs/install /u01/app/12.1.0.1/grid/crs/install/rootcrs.pl ' execution failed
The symptoms are identical to those described in the thread below.
https://community.oracle.com/thread/2558691
Since that thread does not include the alert log from node2, it is added here:
/u01/app/12.1.0/grid_1/log/node2/alertnode2.log
… …
2014-07-05 00:05:42.793:
[cssd(63139)]CRS-1656:The CSS daemon is terminating due to a fatal error; Details at (:CSSSC00012:) in /u01/app/12.1.0/grid_1/log/node2/cssd/ocssd.log
2014-07-05 00:05:42.884:
[cssd(63139)]CRS-1603:CSSD on node node2 shutdown by user.
2014-07-05 00:06:03.574:
[cssd(63285)]CRS-1713:CSSD daemon is started in rim mode
2014-07-05 00:09:06.499:
[cssd(63285)]CRS-1656:The CSS daemon is terminating due to a fatal error; Details at (:CSSSC00012:) in /u01/app/12.1.0/grid_1/log/node2/cssd/ocssd.log
2014-07-05 00:09:06.550:
[cssd(63285)]CRS-1603:CSSD on node node2 shutdown by user.
2014-07-05 00:09:26.125:
[ohasd(62943)]CRS-2878:Failed to restart resource 'ora.cssd'
2014-07-05 00:09:29.242:
[cssd(63420)]CRS-1713:CSSD daemon is started in rim mode
2014-07-05 00:12:34.544:
[cssd(63420)]CRS-1656:The CSS daemon is terminating due to a fatal error; Details at (:CSSSC00012:) in /u01/app/12.1.0/grid_1/log/node2/cssd/ocssd.log
2014-07-05 00:12:34.617:
[cssd(63420)]CRS-1603:CSSD on node node2 shutdown by user.
2014-07-05 00:12:54.206:
[ohasd(62943)]CRS-2878:Failed to restart resource 'ora.cssd'
… ….
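The alert log alternates a timestamp line with a message line, so the restart loop (CRS-1713 start, CRS-1656 fatal error, CRS-1603 shutdown) is easier to see once the entries are paired up. Below is a minimal sketch in Python, assuming that two-line layout holds throughout the file; the function names and the `SAMPLE` text (copied from the excerpt above) are illustrative only and not part of any Oracle tooling.

```python
import re
from collections import Counter

# Sample lines copied from the alert log excerpt above.
SAMPLE = """\
2014-07-05 00:05:42.793:
[cssd(63139)]CRS-1656:The CSS daemon is terminating due to a fatal error; Details at (:CSSSC00012:) in /u01/app/12.1.0/grid_1/log/node2/cssd/ocssd.log
2014-07-05 00:05:42.884:
[cssd(63139)]CRS-1603:CSSD on node node2 shutdown by user.
2014-07-05 00:06:03.574:
[cssd(63285)]CRS-1713:CSSD daemon is started in rim mode
"""

# A timestamp line, e.g. "2014-07-05 00:05:42.793:"
TS_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}):\s*$")
# A message line, e.g. "[cssd(63139)]CRS-1656:The CSS daemon ..."
MSG_RE = re.compile(r"^\[(\w+)\((\d+)\)\](CRS-\d+):(.*)$")


def parse_alert_log(text):
    """Pair each timestamp line with the message line that follows it."""
    entries = []
    pending_ts = None
    for raw in text.splitlines():
        line = raw.strip()
        ts = TS_RE.match(line)
        if ts:
            pending_ts = ts.group(1)
            continue
        msg = MSG_RE.match(line)
        if msg and pending_ts is not None:
            daemon, pid, code, message = msg.groups()
            entries.append({
                "time": pending_ts,
                "daemon": daemon,
                "pid": int(pid),
                "code": code,
                "message": message.strip(),
            })
            pending_ts = None
    return entries


def count_codes(entries):
    """Count how often each CRS message code appears."""
    return Counter(entry["code"] for entry in entries)


if __name__ == "__main__":
    for entry in parse_alert_log(SAMPLE):
        print(entry["time"], entry["daemon"], entry["pid"], entry["code"])
```

Run against the full alertnode2.log, the per-code counts show how many times ohasd restarted cssd before giving up.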
Following the hint in the alert log, I checked ocssd.log. The error output at the time of the failure is as follows:
2014-07-05 00:05:42.789: [GIPCHDEM][1471661824]gipchaDaemonCreateResolveResponse: creating resolveResponse for host:node1, port:bcm_node-cluster, haname:, ret:1
2014-07-05 00:05:42.789: [GIPCHTHR][1473238784]gipchaWorkerProcessClientResolveResponse: resolve from connect FAILED for host 'node1', port 'bcm_node-cluster' with ret:gipcretFail (1)
2014-07-05 00:05:42.792: [GIPCXCPT][1971111456]gipcInternalConnectSync: failed sync request, ret gipcretFail (1)
2014-07-05 00:05:42.792: [GIPCXCPT][1971111456]gipcConnectSyncF [clssbcm_ActiveConnect : clssbcm.c : 1507]: EXCEPTION[ ret gipcretFail (1) ] failed sync connect endp 0x1c93e30 [0000000000000292] { gipcEndpoint : localAddr 'gipcha://node2:9b60-0940-5594-3072', remoteAddr 'gipcha://node1:bcm_node-cluster', numPend 0, numReady 0, numDone 0, numDead 0, numTransfer 0, objFlags 0x0, pidPeer 0, readyRef (nil), ready 0, wobj 0x1f496b0, sendp (nil)flags 0x200b861a, usrFlags 0x20010 }, addr 0x1c939d0 [0000000000000290] { gipcAddress : name 'gipcha://node1:bcm_node-cluster', objFlags 0x0, addrFlags 0x4 }, flags 0x0
2014-07-05 00:05:42.793: [ CSSD][1971111456]clssscPipeConnect: Failed to connect node with, connection string gipcha://node1:bcm_node-cluster, trying a different node
2014-07-05 00:05:42.793: [ CSSD][1971111456]ASSERT clsssc.c 7202
2014-07-05 00:05:42.793: [ CSSD][1971111456](:CSSSC00014:)clssscPipeConnect: Could not connect to any given hub endpoints Given endpoint count 1
2014-07-05 00:05:42.793: [ CSSD][1971111456]###################################
2014-07-05 00:05:42.793: [ CSSD][1971111456]clssscExit: CSSD aborting from thread Main
2014-07-05 00:05:42.793: [ CSSD][1971111456]###################################
2014-07-05 00:05:42.793: [ CSSD][1971111456](:CSSSC00012:)clssscExit: A fatal error occurred and the CSS daemon is terminating abnormally
2014-07-05 00:05:42.794: [ CSSD][1971111456]####### Begin Diagnostic Dump #######
2014-07-05 00:05:42.794: [ CSSD][1971111456]### Begin diagnostic data for the Core layer ###
2014-07-05 00:05:42.794: [ CSSD][1971111456]Initialization state clssscInitNODENUM (0x00000001) not set
2014-07-05 00:05:42.794: [ CSSD][1971111456]Initialization state clssscInitSKGXN_DONE (0x00000008) not set
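The key information in this excerpt is which remote host and port could not be resolved. A minimal sketch in Python for pulling that pair out of ocssd.log is shown below; it assumes the "resolve from connect FAILED" message keeps the wording seen above, and the function name is hypothetical.

```python
import re

# Lines copied from the ocssd.log excerpt above.
SAMPLE = """\
2014-07-05 00:05:42.789: [GIPCHTHR][1473238784]gipchaWorkerProcessClientResolveResponse: resolve from connect FAILED for host 'node1', port 'bcm_node-cluster' with ret:gipcretFail (1)
2014-07-05 00:05:42.793: [ CSSD][1971111456]clssscPipeConnect: Failed to connect node with, connection string gipcha://node1:bcm_node-cluster, trying a different node
2014-07-05 00:05:42.793: [ CSSD][1971111456](:CSSSC00014:)clssscPipeConnect: Could not connect to any given hub endpoints Given endpoint count 1
"""

# Assumption: the message wording matches the excerpt exactly.
RESOLVE_FAIL_RE = re.compile(
    r"resolve from connect FAILED for host '([^']+)', port '([^']+)'"
)


def failed_resolves(text):
    """Return the unique (host, port) pairs that failed to resolve, sorted."""
    return sorted(set(RESOLVE_FAIL_RE.findall(text)))


if __name__ == "__main__":
    for host, port in failed_resolves(SAMPLE):
        print("could not reach", host, "on port", port)
```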
2 Environment
Hostname | IP address | Description | OS version | NIC | VM network setting |
localhost | 192.168.12.100 | RAC shared storage (openfiler) | openfileresa-2.99.1-x86_64 | eth0 | bridge |
node1 | 192.168.12.11 | RAC node 1 public IP | rhel-server-6.5-x86_64 | eth0 | bridge |
node1 | 10.12.12.1 | RAC node 1 private IP | rhel-server-6.5-x86_64 | eth1 | bridge |
node2 | 192.168.12.12 | RAC node 2 public IP | rhel-server-6.5-x86_64 | eth0 | bridge |
node2 | 10.12.12.2 | RAC node 2 private IP | rhel-server-6.5-x86_64 | eth1 | bridge |
Contents of the hosts file:
[root@node2 ~]# cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.12.11 node1
192.168.12.101 node1-vip
10.12.12.1 node1-priv1
192.168.70.11 node1-priv2
192.168.12.12 node2
192.168.12.102 node2-vip
10.12.12.2 node2-priv1
192.168.70.12 node2-priv2
192.168.12.201 node-cluster-scan
192.168.12.100 openfiler12
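To see which names in this hosts file belong to the private interconnect, the entries can be grouped by subnet. The following Python sketch does that; the /24 prefix length is an assumption (the netmask is not given in this post), and the helper names are illustrative.

```python
import ipaddress

# The hosts file shown above.
HOSTS = """\
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.12.11 node1
192.168.12.101 node1-vip
10.12.12.1 node1-priv1
192.168.70.11 node1-priv2
192.168.12.12 node2
192.168.12.102 node2-vip
10.12.12.2 node2-priv1
192.168.70.12 node2-priv2
192.168.12.201 node-cluster-scan
192.168.12.100 openfiler12
"""


def parse_hosts(text):
    """Map each hostname to the first address listed for it."""
    mapping = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        addr, *names = line.split()
        for name in names:
            mapping.setdefault(name, addr)
    return mapping


def names_in_subnet(mapping, cidr):
    """Return the hostnames whose address falls inside the given network."""
    net = ipaddress.ip_network(cidr)
    return sorted(
        name for name, addr in mapping.items()
        if ipaddress.ip_address(addr) in net
    )


if __name__ == "__main__":
    hosts = parse_hosts(HOSTS)
    # Assumption: the private interconnect uses a /24 network.
    print(names_in_subnet(hosts, "10.12.12.0/24"))
```

The names returned for the 10.12.12.* network are the ones to test when checking the interconnect between the two nodes.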
3 Solution
3.1 Locating the problem
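The errors in ocssd.log show that the failure occurs at gipchaDaemonCreateResolveResponse: creating resolveResponse for host:node1, port:bcm_node-cluster, haname:, ret:1
that is, while building the resolve response for host node1. The two hosts can ping each other by hostname and get replies, but pinging each other's private IP addresses (10.12.12.*) fails. The root cause is therefore a communication failure on the private interconnect between the two nodes at installation time, so the next step is to fix the private IP connectivity.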
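The manual ping test described above can be repeated for every address in one pass. Here is a minimal sketch in Python, assuming a Linux `ping` that accepts `-c` (count) and `-W` (timeout in seconds); the function names are hypothetical, and the ping function is passed in as a parameter so the logic can be exercised without touching the network.

```python
import subprocess


def ping_once(target, timeout_s=2):
    """Return True if one ICMP echo request to target gets a reply."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), target],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def check_targets(targets, ping=ping_once):
    """Ping every target and return a {target: reachable} mapping."""
    return {target: ping(target) for target in targets}


def unreachable(results):
    """Return the targets that did not answer, sorted."""
    return sorted(target for target, ok in results.items() if not ok)


if __name__ == "__main__":
    # Run from node2: node1's public name and its private address.
    results = check_targets(["node1", "10.12.12.1"])
    print("unreachable:", unreachable(results))
```

In the situation described here, running this from node2 would be expected to report 10.12.12.1 as unreachable while node1 answers.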
3.2 Fixing the problem
To be continued.
4 Personal summary
5 References
https://community.oracle.com/thread/2558691
Source: ITPUB blog, http://blog.itpub.net/11780477/viewspace-1210545/