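After logging in to the system remotely, I confirmed that the clusterware and the instance on node 2 had indeed not started, while those on node 1 were running, so the system could still serve requests.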
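Checking the system log showed that when the host was rebooted, the eth1 NIC (the heartbeat network) failed to come up, with an error that its IP was already in use by another host.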
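Checking the cluster log showed an error when the CSSD process was started.
The cssd log reports "has a disk HB, but no network HB", so the preliminary conclusion was that a network problem was preventing the clusterware on the node from starting.
ocssd.log: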
- 2016-11-01 11:47:39.244: [ CSSD][88397568]clssscSelect: cookie accept request 0x7f22fc084650
- 2016-11-01 11:47:39.244: [ CSSD][88397568]clssscevtypSHRCON: getting client with cmproc 0x7f22fc084650
- 2016-11-01 11:47:39.244: [ CSSD][88397568]clssgmRegisterClient: proc(4/0x7f22fc084650), client(4/0x7f22fc077720)
- 2016-11-01 11:47:39.244: [ CSSD][88397568]clssgmExecuteClientRequest(): type(6) size(684) only connect and exit messages are allowed before lease acquisition proc(0x7f22fc084650) client(0x7f22fc077720)
- 2016-11-01 11:47:39.245: [ CSSD][88397568]clssgmDiscEndpcl: gipcDestroy 0x4c9
- 2016-11-01 11:47:39.579: [ CSSD][88397568]clssscSelect: cookie accept request 0x7f22fc06f050
- 2016-11-01 11:47:39.580: [ CSSD][88397568]clssscevtypSHRCON: getting client with cmproc 0x7f22fc06f050
- 2016-11-01 11:47:39.580: [ CSSD][88397568]clssgmRegisterClient: proc(3/0x7f22fc06f050), client(1/0x7f22fc07d2e0)
- 2016-11-01 11:47:39.580: [ CSSD][88397568]clssgmExecuteClientRequest(): type(6) size(684) only connect and exit messages are allowed before lease acquisition proc(0x7f22fc06f050) client(0x7f22fc07d2e0)
- 2016-11-01 11:47:39.580: [ CSSD][88397568]clssgmDiscEndpcl: gipcDestroy 0x4e1
- 2016-11-01 11:47:39.587: [ CSSD][104158976]clssnmlfmtlease: uniqueness 1477972049, gipc addr gipcha://oproject2:nm2_oprojec-cluster
- 2016-11-01 11:47:39.590: [ CSSD][104158976]clssnmvStatusBlkInit: myinfo nodename oproject2, uniqueness 1477972049
- 2016-11-01 11:47:39.628: [ CSSD][104158976]clssnmlgetslot:lease acquisition for node oproject2/slot 2 completed in 5030 msecs
- 2016-11-01 11:47:39.636: [ CSSD][104158976]clssnmvDHBValidateNcopy: node 1, oproject1, has a disk HB, but no network HB, DHB has rcfg 366471331, wrtcnt, 22240309, LATS 4294065090, lastSeqNo 0, uniqueness 1470976437, timestamp 1477972068/800668394
- 2016-11-01 11:47:39.636: [ CSSD][104158976]clssnmvDHBValidateNcopy: node 2, oproject2, has a disk HB, but no network HB, DHB has rcfg 366471330, wrtcnt, 4686552, LATS 4294065090, lastSeqNo 0, uniqueness 1477972049, timestamp 1473386969/1790883314
- 2016-11-01 11:47:39.659: [ SKGFD][104158976]Lib :UFS:: closing handle 0x2483ab0 for disk :/dev/CRS1:
- 2016-11-01 11:47:39.659: [ CSSD][104158976]clssnmInitNodeDB: Initializing with OCR id 0
- 2016-11-01 11:47:39.666: [ CSSD][88397568]clssscSelect: cookie accept request 0x24b0040
- 2016-11-01 11:47:39.666: [ CSSD][88397568]clssgmAllocProc: (0x7f22fc09a380) allocated
- 2016-11-01 11:47:39.666: [ CSSD][88397568]clssgmClientConnectMsg: properties of cmProc 0x7f22fc09a380 - 1,2,3,4,5
- 2016-11-01 11:47:39.666: [ CSSD][88397568]clssgmClientConnectMsg: Connect from con(0x52d) proc(0x7f22fc09a380) pid(2684) version 11:2:1:4, properties: 1,2,3,4,5
- 2016-11-01 11:47:39.666: [ CSSD][88397568]clssgmClientConnectMsg: msg flags 0x0000
- 2016-11-01 11:47:39.667: [ CSSD][88397568]clssgmExecuteClientRequest(): type(37) size(80) only connect and exit messages are allowed before lease acquisition proc(0x7f22fc09a380) client((nil))
- 2016-11-01 11:47:39.667: [ CSSD][88397568]clssgmDeadProc: proc 0x7f22fc09a380
- 2016-11-01 11:47:39.667: [ CSSD][88397568]clssgmDestroyProc: cleaning up proc(0x7f22fc09a380) con(0x52d) skgpid ospid 2684 with 0 clients, refcount 0
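The key lines in the excerpt above are the two clssnmvDHBValidateNcopy entries. As a rough sketch, assuming the log keeps the line format shown here, a few lines of Python can pull out which nodes are reported with a disk heartbeat but no network heartbeat. The function name and the abbreviated sample lines are illustrative only and are not part of any Oracle tooling.

```python
import re

# Pattern modelled on the clssnmvDHBValidateNcopy lines in the excerpt above.
HB_PATTERN = re.compile(
    r"clssnmvDHBValidateNcopy: node (\d+), ([^,\s]+), "
    r"has a disk HB, but no network HB"
)


def nodes_without_network_hb(log_lines):
    """Return sorted (node_number, node_name) pairs flagged in the log."""
    found = set()
    for line in log_lines:
        match = HB_PATTERN.search(line)
        if match:
            found.add((int(match.group(1)), match.group(2)))
    return sorted(found)


# Sample lines abbreviated from the excerpt (trailing fields omitted).
SAMPLE = [
    "2016-11-01 11:47:39.628: [ CSSD][104158976]clssnmlgetslot:lease "
    "acquisition for node oproject2/slot 2 completed in 5030 msecs",
    "2016-11-01 11:47:39.636: [ CSSD][104158976]clssnmvDHBValidateNcopy: "
    "node 1, oproject1, has a disk HB, but no network HB, DHB has rcfg 366471331",
    "2016-11-01 11:47:39.636: [ CSSD][104158976]clssnmvDHBValidateNcopy: "
    "node 2, oproject2, has a disk HB, but no network HB, DHB has rcfg 366471330",
]

if __name__ == "__main__":
    for number, name in nodes_without_network_hb(SAMPLE):
        print(f"node {number} ({name}): disk HB present, network HB missing")
```

Both nodes show up here, which fits the reading that the voting disk is reachable while the private interconnect is not.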
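An SSH login over the heartbeat IP failed. I then asked the system engineer to check what was using the address, and learned that the 38 subnet carries storage traffic and also serves as the database heartbeat network; the heartbeat IP had been taken by a port that connects another server to the storage.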
After the address on the host that was occupying the heartbeat IP was changed, the database started successfully.
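A conflict like this can be probed before the interconnect interface is brought back up. The sketch below is one possible approach, not the procedure used in this case: it wraps the duplicate address detection mode of the iputils arping utility (`-D`, which exits with 0 when no other host answers for the address). The interface name eth1 comes from the system log above; the address 192.0.2.10 is a made-up placeholder, since the real heartbeat IP is not given here, and the helper names are hypothetical.

```python
import ipaddress
import subprocess


def build_dad_command(interface, address, count=2):
    """Build an arping duplicate-address-detection command (iputils syntax)."""
    # IPv4Address raises ValueError if the address is malformed.
    ip = ipaddress.IPv4Address(address)
    return ["arping", "-D", "-I", interface, "-c", str(count), str(ip)]


def address_is_free(interface, address):
    """Run the probe and report whether no other host answered.

    Assumes iputils arping is installed and the caller has the privileges
    it needs (usually root). A non-zero exit status can also mean the
    probe itself failed, so treat False as "needs a closer look".
    """
    result = subprocess.run(
        build_dad_command(interface, address),
        capture_output=True,
        text=True,
    )
    return result.returncode == 0


if __name__ == "__main__":
    # Placeholder address; substitute the node's real heartbeat IP.
    print(" ".join(build_dad_command("eth1", "192.0.2.10")))
```

The command is built separately from the call that runs it so that it can be reviewed or logged before anything is sent on the heartbeat network.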
Source: "ITPUB博客", link: http://blog.itpub.net/29123031/viewspace-2132267/. If you repost this article, please credit the source; otherwise legal liability will be pursued.