Oracle 10g CSSD logs: analyzing a CSSD startup failure

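The Oracle Cluster Synchronization Services Daemon (OCSSD) is the cluster synchronization background process. In 10g it is started through the chain init -> init.cssd -> cssd/oprocd/cssdmonitor. It manages cluster synchronization, cluster membership, and group membership. It relies on two basic heartbeat mechanisms, the network heartbeat and the disk heartbeat, to confirm that the nodes can communicate normally, which prevents the unsynchronized, inconsistent writes that a split-brain would cause. Every second, each node sends a heartbeat over the private network and performs one disk heartbeat against a voting disk. If either heartbeat stays abnormal beyond the allowed time, the failed node is evicted from the cluster by vote.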
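As a rough illustration of the eviction rule described above, the following sketch decides whether a node should be evicted based on how old its last network and disk heartbeats are. The function name `should_evict` and the timeout values in the example call are assumptions chosen for illustration; this is not Oracle's implementation, and the numbers are not presented as Oracle defaults.

```shell
#!/bin/bash
# Hypothetical sketch of the two-heartbeat rule; not Oracle's actual code.
# Arguments (all in seconds):
#   $1 now           current time
#   $2 last_net      time of the last network heartbeat seen from the node
#   $3 last_disk     time of the last disk heartbeat seen from the node
#   $4 net_timeout   allowed silence on the private network (assumed value)
#   $5 disk_timeout  allowed silence on the voting disk (assumed value)
should_evict() {
    local now=$1 last_net=$2 last_disk=$3 net_timeout=$4 disk_timeout=$5
    if (( now - last_net > net_timeout )); then
        echo "evict: network heartbeat missing"
    elif (( now - last_disk > disk_timeout )); then
        echo "evict: disk heartbeat missing"
    else
        echo "ok"
    fi
}

# Example: the network heartbeat was last seen 90 s ago with a 60 s limit.
should_evict 1000 910 995 60 200
```

Either heartbeat going silent is enough to trigger eviction, which is why the two cases below look identical from `crsctl` even though one is a storage problem and the other a network problem.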

Environment:

[root@trsen01 network-scripts]# uname -a

Linux trsen01.com 2.6.18-194.el5 #1 SMP Tue Mar 16 21:52:39 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux

[oracle@trsen01 ~]$ sqlplus / as sysdba

SQL*Plus: Release 10.2.0.5.0 - Production on Wed Dec 3 15:31:15 2014

Copyright (c) 1982, 2010, Oracle. All Rights Reserved.

Connected to:

Oracle Database 10g Enterprise Edition Release 10.2.0.5.0 - 64bit Production

With the Partitioning, Real Application Clusters, OLAP, Data Mining

and Real Application Testing options

SQL> select * from v$version;

BANNER

----------------------------------------------------------------

Oracle Database 10g Enterprise Edition Release 10.2.0.5.0 - 64bi

PL/SQL Release 10.2.0.5.0 - Production

CORE 10.2.0.5.0 Production

TNS for Linux: Version 10.2.0.5.0 - Production

NLSRTL Version 10.2.0.5.0 - Production

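Scenario: after a UPS failure caused an abnormal power loss in the machine room, none of the cluster nodes would come up. One node had an inconsistent OCR state, and the other had a damaged motherboard.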

----------------- disk heartbeat problem -----------------

[root@trsen01 bin]# ./crsctl start crs

Failure 1 contacting CSS daemon

Cannot communicate with CRS

Cannot communicate with EVM

[root@trsen01 log]# ps -ef | grep d.b

root 15417 16372 0 15:54 pts/0 00:00:00 grep d.b

[root@trsen01 log]# ocrcheck

PROT-602: Failed to retrieve data from the cluster registry

[root@trsen01 log]#

[root@trsen01 log]# crsctl query css votedisk

OCR initialization failed accessing OCR device: PROC-26: Error while accessing the physical storage Operating System error [Invalid argument] [2]

Rather than digging through the crsd or cssd logs at this point, go straight to the OS log.

[root@trsen01 log]# cat messages.1 | grep dependencies. | more

Nov 28 01:07:12 trsen01 logger: Cluster Ready Services waiting on dependencies. Diagnostics in /tmp/crsctl.13715.

Nov 28 01:07:12 trsen01 logger: Cluster Ready Services waiting on dependencies. Diagnostics in /tmp/crsctl.13606.

Nov 28 01:07:12 trsen01 logger: Cluster Ready Services waiting on dependencies. Diagnostics in /tmp/crsctl.13761.

Nov 28 01:08:12 trsen01 logger: Cluster Ready Services waiting on dependencies. Diagnostics in /tmp/crsctl.13715.

Nov 28 01:08:12 trsen01 logger: Cluster Ready Services waiting on dependencies. Diagnostics in /tmp/crsctl.13606.

…….

…..

[root@trsen01 log]# cat /tmp/crsctl.13715

OCR initialization failed accessing OCR device: PROC-26: Error while accessing the physical storage Operating System error [No such file or directory] [2]
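The hop above, from the syslog message to the diagnostics file it names, can be sketched as a small helper. The function name `diagnostics_files` is made up for this example; it only assumes the message wording shown in the transcript, and the log path is whatever file you pass in.

```shell
#!/bin/bash
# Hypothetical helper: list the unique diagnostics files named in
# "waiting on dependencies" syslog lines of the given log file.
diagnostics_files() {
    local logfile=$1
    grep 'waiting on dependencies' "$logfile" \
        | sed -n 's|.*Diagnostics in \(/tmp/crsctl\.[0-9][0-9]*\).*|\1|p' \
        | sort -u
}

# Usage sketch (commented out because it depends on the local system):
# for f in $(diagnostics_files /var/log/messages.1); do
#     [ -f "$f" ] && { echo "== $f"; cat "$f"; }
# done
```

Deduplicating matters here because the same diagnostics file is logged again every minute while CRS keeps waiting.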

Initial diagnosis: either the OCR file is corrupted, or the disk that holds it has failed.

[root@trsen01 log]# ls -trl /etc/oracle/ocr.loc

-rw-r--r-- 1 root oinstall 45 Sep 29 2011 /etc/oracle/ocr.loc

[root@trsen01 log]# more /etc/oracle/ocr.loc

ocrconfig_loc=/dev/raw/raw1

local_only=FALSE

[root@trsen01 log]# ls -l /dev/raw/raw*

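Surprisingly, the raw devices are not there at all. Rebooting the host resolved the problem. My initial suspicion is that both the back-end storage and the host lost power, and that during startup the state of the storage disks was out of step with the state of the host. So the same symptom does not necessarily point to the same cause, such as a corrupted OCR.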
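The manual check above (read `ocr.loc`, then look for the device it names) can be condensed into one step. This is a minimal sketch; `check_ocr_device` is a hypothetical helper name, and it only tests that the path exists, not that the OCR contents are valid.

```shell
#!/bin/bash
# Hypothetical helper: read the OCR location from an ocr.loc-style file
# and report whether that path is present on this host.
check_ocr_device() {
    local locfile=${1:-/etc/oracle/ocr.loc}
    local dev
    dev=$(sed -n 's/^ocrconfig_loc=//p' "$locfile" | head -n 1)
    if [ -z "$dev" ]; then
        echo "no ocrconfig_loc entry in $locfile"
        return 2
    fi
    if [ -e "$dev" ]; then
        echo "present: $dev"
    else
        echo "missing: $dev"
        return 1
    fi
}
```

A "missing" result right after a power loss suggests checking whether the storage was visible when the host booted before assuming the OCR itself is damaged.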

----------------- network heartbeat problem -----------------

This one was caused by the damaged motherboard. After the repair, CRS would not start, and the error was the same again:

[root@trsen01 bin]# ./crsctl start crs

Failure 1 contacting CSS daemon

Cannot communicate with CRS

Cannot communicate with EVM

[root@trsen01 log]# ps -ef | grep d.b

root 15417 16372 0 15:54 pts/0 00:00:00 grep d.b

[oracle@trsen02 admin]$ ocrcheck

Status of Oracle Cluster Registry is as follows :

Version : 2

Total space (kbytes) : 102184

Used space (kbytes) : 4364

Available space (kbytes) : 97820

ID : 1138124715

Device/File Name : /dev/raw/raw1

Device/File integrity check succeeded

Device/File not configured

Cluster registry integrity check succeeded

[oracle@trsen02 admin]$ ls -trl /etc/oracle/ocr.loc

-rw-r--r-- 1 root oinstall 45 Sep 29 2011 /etc/oracle/ocr.loc

[oracle@trsen02 admin]$ more /etc/oracle/ocr.loc

ocrconfig_loc=/dev/raw/raw1

local_only=FALSE

[oracle@trsen02 admin]$ ls -ltr /dev/raw/raw*

crw-rw---- 1 root oinstall 162, 1 Dec 3 09:18 /dev/raw/raw1

crw-rw---- 1 oracle dba 162, 2 Dec 3 16:41 /dev/raw/raw2

Nothing wrong there, so on to the logs.

[root@trsen02 log]# more /var/log/messages

Nov 28 01:08:12 trsen01 logger: Cluster Ready Services waiting on dependencies. Diagnostics in /tmp/crsctl.16063.

[root@trsen02 log]# more /tmp/crsctl.16063

Failed 3 to bind listening endpoint: (ADDRESS=(PROTOCOL=tcp)(HOST=trsen02-priv

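The error is different this time. Checking the network configuration showed that the private network interface was missing, so I promptly set up the private network interface.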
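The check described above can be sketched in two small steps: pull the host name out of the endpoint string, then see whether its address is configured on any interface. Both function names are hypothetical, the usage lines assume `getent` and an iproute2 `ip` that supports `-o`, and the commands in the usage sketch are left commented out because they depend on the local host.

```shell
#!/bin/bash
# Hypothetical helper: extract the HOST value from an endpoint string read
# on standard input, even when the string is truncated as in the log above.
endpoint_host() {
    sed -n 's/.*(HOST=\([^)]*\).*/\1/p'
}

# Hypothetical helper: succeed if the given IPv4 address appears in
# `ip -o -4 addr show` style output read from standard input
# (field 4 of each line is "address/prefix").
addr_is_configured() {
    local ip=$1
    awk -v ip="$ip" '{ split($4, a, "/"); if (a[1] == ip) found = 1 }
                     END { exit found ? 0 : 1 }'
}

# Usage sketch (assumes the private name resolves via /etc/hosts or DNS):
# host=$(endpoint_host < /tmp/crsctl.16063)
# addr=$(getent hosts "$host" | awk '{ print $1; exit }')
# ip -o -4 addr show | addr_is_configured "$addr" \
#     || echo "$host ($addr) is not configured on any interface"
```

If the private name resolves but its address is on no interface, CSS has nothing to bind its listening endpoint to, which matches the error in the diagnostics file.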

In general, when CSS fails to start, the cause is more often the OCR file or the disk than the network.
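Since both cases start with the same `crsctl` error, a first triage step is to sort the diagnostics file by the two signatures seen in this post. The sketch below does only that; `classify_css_failure` and its category labels are made up for illustration and cover just these two error strings.

```shell
#!/bin/bash
# Hypothetical helper: classify a crsctl diagnostics file by the two error
# signatures that appear in this post. Anything else is reported as unknown.
classify_css_failure() {
    local diag=$1
    if grep -q 'PROC-26' "$diag"; then
        echo "storage: OCR device not accessible"
    elif grep -q 'bind listening endpoint' "$diag"; then
        echo "network: cannot bind private interconnect endpoint"
    else
        echo "unknown"
    fi
}

# Usage sketch: classify_css_failure /tmp/crsctl.13715
```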
