Grid Infrastructure resources fail to start: analysis of common causes
OHASD fails to start
Cause analysis:
1 The OS runlevel is set incorrectly
--Linux runlevels for reference:
rc0.d - System Halted
rc1.d - Single User Mode
rc2.d - Single User Mode with Networking
rc3.d - Multi-User Mode - boot up in text mode
rc4.d - Not yet Defined
rc5.d - Multi-User Mode - boot up in X Windows
rc6.d - Shutdown & Reboot
Check the runlevels configured for ohasd
oracle@> more /etc/inittab | grep ohasd
h1:2:respawn:/etc/init.ohasd run >/dev/null 2>&1
Check the current system runlevel
oracle@> who -r
. run-level 2 Jul 03 07:46 2 0 S
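The two checks above can be combined into one comparison. The following is a minimal sketch, assuming an inittab-style entry like the one shown; the function names `ohasd_runlevels` and `runlevel_enabled` are made up for this illustration and are not part of any Oracle tooling.

```shell
# Print the runlevel field of the first active init.ohasd entry.
# Reads inittab-style text (id:runlevels:action:process) on stdin.
ohasd_runlevels() {
    awk -F: '!/^[[:space:]]*#/ && $4 ~ /init\.ohasd/ { print $2; exit }'
}

# Succeed if runlevel $1 appears in the runlevel string $2 (for example "35").
runlevel_enabled() {
    [ -n "$1" ] || return 1
    case "$2" in
        *"$1"*) return 0 ;;
        *)      return 1 ;;
    esac
}

# Possible usage on a host that still has /etc/inittab:
#   current=$(who -r | sed -n 's/.*run-level \([0-9sS]\).*/\1/p')
#   runlevel_enabled "$current" "$(ohasd_runlevels < /etc/inittab)" \
#       || echo "init.ohasd is not configured for runlevel $current"
```

If the current runlevel is not among the configured ones, init never spawns init.ohasd, which matches the failure described in this item.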
2 Whether init.ohasd run is running
If init.ohasd run is not running, ohasd.bin will not start
oracle@> ps -ef | grep ohasd | grep -v grep
root 7340038 1 10 Jul 03 - 1022:34 /u001/app/11.2.0.2/grid/bin/ohasd.bin reboot
root 9568408 1 0 Jul 03 - 0:00 /bin/sh /etc/init.ohasd run
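If init.ohasd cannot start in time, an error like the following is reported: "[ohasd()] CRS-0715:Oracle High Availability Service has timed out waiting for init.ohasd to be started."
Note: starting with Enterprise Linux 6 (OL6/RHEL6), inittab is deprecated and init.ohasd is configured under /etc/init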
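The process check in this item can be scripted. This is a rough sketch that only classifies `ps -ef` output; the function name `ohasd_process_state` and its status strings are invented for this example.

```shell
# Read `ps -ef` output on stdin and report the OHASD startup state:
#   "init.ohasd not running"  the wrapper script was never spawned
#   "ohasd.bin not running"   the wrapper runs but the daemon is absent
#   "ok"                      both processes are present
ohasd_process_state() {
    awk '
        /init\.ohasd run/ { wrapper = 1 }
        /ohasd\.bin/      { daemon = 1 }
        END {
            if (!wrapper)     print "init.ohasd not running"
            else if (!daemon) print "ohasd.bin not running"
            else              print "ok"
        }'
}

# Possible usage:
#   ps -ef | ohasd_process_state
```

The wrapper is tested first because, as noted above, ohasd.bin does not start without it.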
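3 Whether Clusterware autostart is enabled
Run $GRID_HOME/bin/crsctl config crs to check whether CRS is set to start automatically
If autostart is disabled, the OS log shows the following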
Feb 29 16:20:36 racnode1 logger: Oracle Cluster Ready Services startup disabled.
Feb 29 16:20:36 racnode1 logger: Could not access /var/opt/oracle/scls_scr/racnode1/root/ohasdstr
--the file cannot be accessed or does not exist
4 Whether the Oracle Local Registry (OLR) is accessible
ls -altr $GRID_HOME/cdata/*.olr
If the OLR is inaccessible or corrupted, ohasd.log contains entries like the following
2010-01-24 22:59:10.470: [ default][1373676464] Initializing OLR
2010-01-24 22:59:10.472: [ OCROSD][1373676464]utopen:6m':failed in stat OCR file/disk /ocw/grid/cdata/rac1.olr, errno=2, os err string=No such file or directory
2010-01-24 22:59:10.472: [ OCROSD][1373676464]utopen:7:failed to open any OCR file/disk, errno=2, os err string=No such file or directory
2010-01-24 22:59:10.473: [ OCRRAW][1373676464]proprinit: Could not open raw device
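To find out which OLR file ohasd tried to open, the path can be pulled out of log lines like the ones above. This sketch assumes the message wording shown in this excerpt; the function name `olr_stat_failures` is hypothetical.

```shell
# Read ohasd.log text on stdin and print each distinct OLR path
# that was reported with "failed in stat OCR file/disk".
olr_stat_failures() {
    sed -n 's/.*failed in stat OCR file\/disk \([^,]*\),.*/\1/p' | sort -u
}

# Possible usage (the log path follows the layout seen elsewhere in this post):
#   olr_stat_failures < "$GRID_HOME/log/$(hostname)/ohasd/ohasd.log"
```

Each path it prints can then be checked with ls -l, and `ocrcheck -local` can be used to verify the OLR itself.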
5 ohasd.bin cannot access the network socket files
Network socket files are usually located under /tmp or /var/opt
ohasd.log shows the following:
2010-06-29 10:31:01.570: [ COMMCRS][1206901056]clsclisten: Permission denied for (ADDRESS=(PROTOCOL=ipc)(KEY=procr_local_conn_0_PROL))
2010-06-29 10:31:01.571: [ OCRSRV][1217390912]th_listen: CLSCLISTEN failed clsc_ret= 3, addr= [(ADDRESS=(PROTOCOL=ipc)(KEY=procr_local_conn_0_PROL))]
2010-06-29 10:31:01.571: [ OCRSRV][3267002960]th_init: Local listener did not reach valid state
6 ohasd.bin cannot access its log directory
Check the OS messages; syslog shows the following
Feb 20 10:47:08 racnode1 OHASD[9566]: OHASD exiting; Directory /ocw/grid/log/racnode1/ohasd not found.
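7 ohasd.bin is running but hangs during startup
ps -ef | grep ohasd.bin shows that ohasd.bin has started, but ohasd.log has not been updated for a long time. Tracing the process with truss shows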
15058/1: 0.1995 close(2147483646) Err#9 EBADF
15058/1: 0.1996 close(2147483645) Err#9 EBADF
pstack on ohasd.bin shows
_close sclssutl_closefiledescriptors main ..
This is caused by bug 11834289, which is fixed in 11.2.0.3
OHASD agents fail to start
ohasd.bin spawns four agents
oraagent: responsible for ora.asm, ora.evmd, ora.gipcd, ora.gpnpd, ora.mdnsd etc
orarootagent: responsible for ora.crsd, ora.ctssd, ora.diskmon, ora.drivers.acfs etc
cssdagent / cssdmonitor: responsible for ora.cssd(for ocssd.bin) and ora.cssdmonitor(for cssdmonitor itself)
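1 The most common problem is that the agent has no permission on its own log directory
2 The agent binary is corrupted and the agent cannot start; the log shows the following: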
2011-05-03 11:11:13.189
[ohasd(25303)]CRS-5828:Could not start agent '/ocw/grid/bin/orarootagent_grid'. Details at (:CRSAGF00130:) {0:0:2} in /ocw/grid/log/racnode1/ohasd/ohasd.log.
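OCSSD.bin fails to start
ocssd.bin needs the following conditions to start
1 The GPnP profile is accessible
--the profile stores the CSS DiscoveryString
--(relevant when the voting disk is not stored in ASM)
2 The voting disk is accessible
Look up the DiscoveryString in the GPnP profile from step 1 to locate it
3 The network is working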
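Reading the DiscoveryString out of the GPnP profile can be sketched as below. The sketch assumes the profile XML carries the value as a DiscoveryString attribute on a CSS-Profile element; the function name `css_discovery_string` is invented here, and the exact profile layout should be verified against a real profile.

```shell
# Read GPnP profile XML on stdin and print the DiscoveryString
# attribute of the CSS-Profile element (assumed layout).
css_discovery_string() {
    # The profile is often a single long line, so split it after every
    # tag first, then extract the attribute from the CSS-Profile tag only.
    tr '>' '\n' | sed -n '/CSS-Profile/ s/.*DiscoveryString="\([^"]*\)".*/\1/p'
}

# Possible usage:
#   "$GRID_HOME/bin/gpnptool" get 2>/dev/null | css_discovery_string
```

Restricting the match to the CSS-Profile tag matters because the ASM section of the profile may carry its own DiscoveryString attribute.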
CRSD.bin fails to start
1 Whether ocssd is running
2 Whether the OCR is accessible
3 The crsd.bin pid file exists and points to the crsd.bin process
oracle@ justin> pwd
/u001/app/11.2.0.2/grid/crs/init
oracle@ justin> more justin.pid
22347868
oracle@ justin> ps -ef | grep 22347868
root 22347868 1 6 Jul 03 - 1279:53 /u001/app/11.2.0.2/grid/bin/crsd.bin reboot
If this file does not exist, or its pid points to a process other than crsd.bin, crsd cannot start normally; see orarootagent_root.log for details
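The pid file check can be automated along these lines. This is a sketch under the assumption that the pid file is named after the host, as in the example above; the function name `pid_is_crsd` is hypothetical.

```shell
# Succeed if PID $1 appears in the `ps -ef` output on stdin
# and the matching line is a crsd.bin process.
pid_is_crsd() {
    awk -v pid="$1" '$2 == pid && /crsd\.bin/ { found = 1 } END { exit !found }'
}

# Possible usage:
#   pidfile="$GRID_HOME/crs/init/$(hostname).pid"
#   ps -ef | pid_is_crsd "$(cat "$pidfile")" \
#       || echo "stale or missing pid in $pidfile"
```

Comparing against the PID column rather than grepping the whole line avoids false matches on parent PIDs or CPU times that happen to contain the same digits.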
4 The CRSD executables have incorrect permissions
--check crsd.bin and crsd under $GRID_HOME/bin
Reference: My Oracle Support note 1050908.1
Source: ITPUB blog, link: http://blog.itpub.net/15480802/viewspace-742159/. If you repost this article, please credit the source; otherwise legal action may be taken.