Oracle RAC node 2 went down and then failed to start

Two days ago, node 2 of one of our clusters went down.

Checking the cluster alert log:

2017-05-12 15:35:41.738
[cssd(743)]CRS-1714:Unable to discover any voting files, retrying discovery in 15 seconds; Details at (:CSSNM00070:) in /u01/app/11.2.0/grid/log/proddb-2/cssd/ocssd.log
2017-05-12 15:35:56.746
[cssd(743)]CRS-1714:Unable to discover any voting files, retrying discovery in 15 seconds; Details at (:CSSNM00070:) in /u01/app/11.2.0/grid/log/proddb-2/cssd/ocssd.log
2017-05-12 15:36:11.754
[cssd(743)]CRS-1714:Unable to discover any voting files, retrying discovery in 15 seconds; Details at (:CSSNM00070:) in /u01/app/11.2.0/grid/log/proddb-2/cssd/ocssd.log
2017-05-12 15:36:26.761
[cssd(743)]CRS-1714:Unable to discover any voting files, retrying discovery in 15 seconds; Details at (:CSSNM00070:) in /u01/app/11.2.0/grid/log/proddb-2/cssd/ocssd.log


Checking ocssd.log:

OCSSD LOG
------------------
Filename=ocssd.log

2017-05-05 13:46:56.088: [ CSSD][704857856]clssscGetParameterProfile: buffer passed for parameter VF discovery (2) is too short, required 23, passed 20
2017-05-05 13:46:56.088: [ CSSD][704857856]clssnmReadDiscoveryProfile: voting file discovery string(/u02/oradata/grid/vote)
2017-05-05 13:46:56.088: [ CSSD][704857856]clssnmvDDiscThread: using discovery string /u02/oradata/grid/vote for initial discovery
2017-05-05 13:46:56.088: [ SKGFD][704857856]Discovery with str:/u02/oradata/grid/vote:

2017-05-05 13:46:56.088: [ SKGFD][704857856]UFS discovery with :/u02/oradata/grid/vote:

2017-05-05 13:46:56.089: [ SKGFD][704857856]Fetching UFS disk :/u02/oradata/grid/vote:

2017-05-05 13:46:56.089: [ SKGFD][704857856]OSS discovery with :/u02/oradata/grid/vote:

2017-05-05 13:46:56.089: [ CSSD][704857856]clssnmvDiskVerify: Successful discovery of 0 disks
2017-05-05 13:46:56.089: [ CSSD][704857856]clssnmCompleteInitVFDiscovery: Completing initial voting file discovery
2017-05-05 13:46:56.089: [ CSSD][704857856]clssnmvFindInitialConfigs: No voting files found

2017-05-05 13:46:56.089: [ CSSD][704857856](:CSSNM00070:)clssnmCompleteInitVFDiscovery: Voting file not found. Retrying discovery in 15 seconds
2017-05-05 13:47:12.576: [ CSSD][4018611968]clsu_load_ENV_levels: Module = CSSD, LogLevel = 2, TraceLevel = 0
2017-05-05 13:47:12.576: [ CSSD][4018611968]clsu_load_ENV_levels: Module = GIPCNM, LogLevel = 2, TraceLevel = 0
2017-05-05 13:47:12.576: [ CSSD][4018611968]clsu_load_ENV_levels: Module = GIPCGM, LogLevel = 2, TraceLevel = 0
2017-05-05 13:47:12.576: [ CSSD][4018611968]clsu_load_ENV_levels: Module = GIPCCM, LogLevel = 2, TraceLevel = 0
2017-05-05 13:47:12.576: [ CSSD][4018611968]clsu_load_ENV_levels: Module = CLSF, LogLevel = 0, TraceLevel = 0
2017-05-05 13:47:12.576: [ CSSD][4018611968]clsu_load_ENV_levels: Module = SKGFD, LogLevel = 0, TraceLevel = 0
2017-05-05 13:47:12.576: [ CSSD][4018611968]clsu_load_ENV_levels: Module = GPNP, LogLevel = 1, TraceLevel = 0
2017-05-05 13:47:12.576: [ CSSD][4018611968]clsu_load_ENV_levels: Module = OLR, LogLevel = 0, TraceLevel = 0
[ CSSD][4018611968]clsugetconf : Configuration type [4].
2017-05-05 13:47:12.576: [ CSSD][4018611968]clssscmain: Starting CSS daemon, version 11.2.0.3.0, in (clustered) mode with uniqueness value 1493963232
2017-05-05 13:47:12.576: [ CSSD][4018611968]clssscmain: Environment is production
2017-05-05 13:47:12.576: [ CSSD][4018611968]clssscmain: Core file size limit extended
2017-05-05 13:47:12.579: [ CSSD][4018611968]clssscmain: GIPCHA down 0
2017-05-05 13:47:12.580: [ CSSD][4018611968]clssscGetParameterOLR: OLR fetch for parameter logsize (8) failed with rc 21
2017-05-05 13:47:12.580: [ CSSD][4018611968]clssscExtendLimits: The current soft limit for file descriptors is 65536, hard limit is 65536
2017-05-05 13:47:12.580: [ CSSD][4018611968]clssscExtendLimits: The current soft limit for locked memory is 4294967295, hard limit is 4294967295
2017-05-05 13:47:12.580: [ CSSD][4018611968]clssscGetParameterOLR: OLR fetch for parameter priority (15) failed with rc 21
2017-05-05 13:47:12.580: [ CSSD][4018611968]clssscSetPrivEnv: Setting priority to 4
2017-05-05 13:47:12.586: [ CSSD][4018611968]clssscSetPrivEnv: Can't access local IPMI device--no device configured or driver missing/incompatible. IPMI support may be available with static IP configuration.
2017-05-05 13:47:12.586: [ CSSD][4018611968]clssscmain: Running as user grid
2017-05-05 13:47:12.587: [ CSSD][4018611968]clssscmain: RT queue setting is at default value
2017-05-05 13:47:12.587: [ CSSD][4018611968]clssscGetParameterOLR: OLR fetch for parameter auth rep (9) failed with rc 21
2017-05-05 13:47:12.587: [ CSSD][4018611968]clssscGetParameterOLR: OLR fetch for parameter diagwait (14) failed with rc 21
2017-05-05 13:47:12.590: [ CSSD][4018611968]clssnmInitNMInfoMin: Initializing first-reconfig to (0)
[ clsdmt][4009936640]Listening to (ADDRESS=(PROTOCOL=ipc)(KEY=proddb-2DBG_CSSD))
2017-05-05 13:47:12.590: [ clsdmt][4009936640]PID for the Process [17269], connkey 4
2017-05-05 13:47:12.590: [ CSSD][4018611968]clssscmain: initgminfo done
2017-05-05 13:47:12.590: [ CSSD][4003038976]clssgmclientlsnr: Spawned
2017-05-05 13:47:12.590: [ CSSD][4003038976]clssgmEvtInformation: reqtype (13) cmProc ((nil)) client ((nil))
2017-05-05 13:47:12.590: [ CSSD][4003038976]clssgmEvtInformation: reqtype (13) req (0x7fe9e4000920)
2017-05-05 13:47:12.590: [ CSSD][4003038976]clssnmQueueNotification: type (13) 0x7fe9e4000920
2017-05-05 13:47:12.591: [ CSSD][4003038976]clssgmclientlsnr: listening on clsc://(ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_proddb-2_)(GIPCID=00000000-00000000-17269))
2017-05-05 13:47:12.591: [ GPNP][4018611968]clsgpnp_Init: [at clsgpnp0.c:585] '/u01/app/11.2.0/grid' in effect as GPnP home base.
2017-05-05 13:47:12.591: [ GPNP][4018611968]clsgpnp_Init: [at clsgpnp0.c:619] GPnP pid=17269, GPNP comp tracelevel=1, depcomp tracelevel=0, tlsrc:ORA_DAEMON_LOGGING_LEVELS, apitl:0, complog:1, tstenv:0, devenv:0, envopt:0, flags=3
2017-05-05 13:47:12.613: [ GPNP][4018611968]clsgpnpkwf_initwfloc: [at clsgpnpkwf.c:399] Using FS Wallet Location : /u01/app/11.2.0/grid/gpnp/proddb-2/wallets/peer/

[ CLWAL][4018611968]clsw_Initialize: OLR initlevel [70000]
2017-05-05 13:47:12.628: [ GPNP][4018611968]clsgpnp_profileCallUrlInt: [at clsgpnp.c:2104] get-profile call to url "ipc://GPNPD_proddb-2" disco "" [f=0 claimed- host: cname: seq: auth:]
2017-05-05 13:47:12.634: [ GPNP][4018611968]clsgpnp_profileCallUrlInt: [at clsgpnp.c:2234] Result: (0) CLSGPNP_OK. Successful get-profile CALL to remote "ipc://GPNPD_proddb-2" disco ""
2017-05-05 13:47:12.635: [ CSSD][4018611968]clssscGetParameterProfile: profile fetch failed for parameter ocrid (4) with return code 5
2017-05-05 13:47:12.635: [ CSSD][4018611968]clssscmain: OCRID is 0
2017-05-05 13:47:12.635: [ CSSD][4018611968]clssscmain: Cluster GUID is 208b625386b2df39bfb02751ce50ee56
2017-05-05 13:47:12.635: [ CSSD][4018611968]clssnmNotifyReq: type (12)
2017-05-05 13:47:12.635: [ CSSD][4018611968]clssscmain: last used node number 2
2017-05-05 13:47:12.635: [ CSSD][4018611968]clssscGetParameterProfile: buffer passed for parameter VF discovery (2) is too short, required 23, passed 20
2017-05-05 13:47:12.635: [ CSSD][4018611968]clssnmReadDiscoveryProfile: voting file discovery string(/u02/oradata/grid/vote)
2017-05-05 13:47:12.635: [ CSSD][4018611968]clssnkInit: NK generic layer initializing.
2017-05-05 13:47:12.637: [ SKGFD][4000347904]NOTE: No asm libraries found in the system

2017-05-05 13:47:12.637: [ CLSF][4000347904]Allocated CLSF context
2017-05-05 13:47:12.637: [ CSSD][4000347904]clssnmvDDiscThread: using discovery string /u02/oradata/grid/vote for initial discovery
2017-05-05 13:47:12.637: [ SKGFD][4000347904]Discovery with str:/u02/oradata/grid/vote:

2017-05-05 13:47:12.637: [ SKGFD][4000347904]UFS discovery with :/u02/oradata/grid/vote:

2017-05-05 13:47:12.638: [ SKGFD][4000347904]Fetching UFS disk :/u02/oradata/grid/vote:

2017-05-05 13:47:12.638: [ SKGFD][4000347904]OSS discovery with :/u02/oradata/grid/vote:

2017-05-05 13:47:12.638: [ CSSD][4000347904]clssnmvDiskVerify: Successful discovery of 0 disks
2017-05-05 13:47:12.638: [ CSSD][4000347904]clssnmCompleteInitVFDiscovery: Completing initial voting file discovery
2017-05-05 13:47:12.638: [ CSSD][4000347904]clssnmvFindInitialConfigs: No voting files found
2017-05-05 13:47:12.639: [ CSSD][4000347904](:CSSNM00070:)clssnmCompleteInitVFDiscovery: Voting file not found. Retrying discovery in 15 seconds
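
The two facts that matter in this log are the discovery string CSSD is using and how many disks it found with it. As a rough sketch (not an Oracle tool), the snippet below pulls both out of ocssd.log lines. The function name `summarize_discovery` and the regular expressions are my own and assume the message wording shown in the excerpt above.

```python
import re

# Patterns assume the 11.2.0.3 message wording seen in the excerpt above.
DISCOVERY_RE = re.compile(r"voting file discovery string\((?P<path>[^)]*)\)")
FOUND_RE = re.compile(r"Successful discovery of (?P<count>\d+) disks")


def summarize_discovery(lines):
    """Return (discovery_strings, disk_counts) found in ocssd.log lines."""
    strings, counts = [], []
    for line in lines:
        m = DISCOVERY_RE.search(line)
        if m:
            strings.append(m.group("path"))
        m = FOUND_RE.search(line)
        if m:
            counts.append(int(m.group("count")))
    return strings, counts


# Two lines copied from the log above.
sample = [
    "2017-05-05 13:47:12.635: [ CSSD][4018611968]clssnmReadDiscoveryProfile: "
    "voting file discovery string(/u02/oradata/grid/vote)",
    "2017-05-05 13:47:12.638: [ CSSD][4000347904]clssnmvDiskVerify: "
    "Successful discovery of 0 disks",
]
strings, counts = summarize_discovery(sample)
print(strings, counts)
```

A correct discovery string paired with a count of 0 suggests that the path itself is right and that something about the file (existence, type, or access rights) is stopping CSSD from accepting it.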


The voting file /u02/oradata/grid/vote exists and its permissions look normal; nobody has changed the permissions on this file.


node 1:

[root@PRODDB-1 ~]# /u01/app/11.2.0/grid/bin/crsctl query css votedisk
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. ONLINE   483976892bf34f7ebfdecd4d03533205 (/u02/oradata/grid/vote) []
Located 1 voting disk(s).
[root@PRODDB-1 ~]# ls -l /u02/oradata/grid/
total 23600
-rw-r----- 1 oracle oinstall 272756736 May 11 12:37 ocr
-rw-r----- 1 oracle oinstall 21004800 May 11 14:27 vote

node 2:

[root@PRODDB-2 ~]# /u01/app/11.2.0/grid/bin/crsctl query css votedisk
Unable to communicate with the Cluster Synchronization Services daemon.
[root@PRODDB-2 ~]# ls -l /u02/oradata/grid/
total 23600
-rw-r----- 1 oracle oinstall 272756736 May 11 12:37 ocr
-rw-r----- 1 oracle oinstall 21004800 May 11 14:27 vote


This was baffling, so I opened an SR with Oracle and traced the ocssd process:
1. crsctl start crs

2. Get the ocssd.bin PID:
ps -ef|grep ocssd.bin

3. Execute the following command:
strace -fto /tmp/ocssd_strace.log -p <PID>

Wait 10 minutes, then cancel the command.

Checking the trace log:

STRACE OUTPUT
-------------------------
Filename=ocssd_strace.zip

22372 14:54:05 write(4, "2017-05-11 14:54:05.449: [ GP"..., 188) = 188
22594 14:54:05 stat("/u02/oradata/grid/vote", <unfinished ...>
22372 14:54:05 write(4, "2017-05-11 14:54:05.450: [ CS"..., 603) = 603
22372 14:54:05 futex(0xf2ec94, FUTEX_WAIT_PRIVATE, 899, NULL <unfinished ...>
22594 14:54:05 <... stat resumed> {st_mode=S_IFREG|0640, st_size=21004800, ...}) = 0
22594 14:54:05 stat("/u02/oradata/grid/vote", {st_mode=S_IFREG|0640, st_size=21004800, ...}) = 0
22594 14:54:05 access("/u02/oradata/grid/vote", R_OK|W_OK) = -1 EACCES (Permission denied) <=====
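
A ten-minute strace of ocssd.bin is long, and the one line that matters is easy to miss. As a minimal sketch, the snippet below filters a `strace -f -t` log for system calls that failed with a given errno. The function name `failed_syscalls` and the regular expression are my own and assume the `PID HH:MM:SS call(args) = -1 ERRNO (...)` layout shown above.

```python
import re

# Matches completed calls that returned -1, e.g.
#   22594 14:54:05 access("/path", R_OK|W_OK) = -1 EACCES (Permission denied)
# "<unfinished ...>" and "<... resumed>" fragments are deliberately not matched.
FAILED_RE = re.compile(
    r"^(?P<pid>\d+)\s+(?P<time>\d{2}:\d{2}:\d{2})\s+"
    r"(?P<call>\w+)\((?P<args>.*)\)\s+=\s+-1\s+(?P<errno>E[A-Z]+)\b"
)


def failed_syscalls(lines, errno="EACCES"):
    """Return (pid, call, args) for every call that failed with errno."""
    hits = []
    for line in lines:
        m = FAILED_RE.match(line)
        if m and m.group("errno") == errno:
            hits.append((m.group("pid"), m.group("call"), m.group("args")))
    return hits


# Lines copied from the strace output above.
sample = [
    '22594 14:54:05 stat("/u02/oradata/grid/vote", '
    '{st_mode=S_IFREG|0640, st_size=21004800, ...}) = 0',
    '22594 14:54:05 access("/u02/oradata/grid/vote", R_OK|W_OK) '
    '= -1 EACCES (Permission denied) <=====',
]
hits = failed_syscalls(sample)
for pid, call, args in hits:
    print(pid, call, args)
```

On the excerpt above this leaves only the `access` call: `stat` succeeds, so the file is visible, but the combined read and write check is refused.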

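Searching the web for this error turned up an article describing the same failure, except that there the permission problem was on ASM disk devices, whereas our company uses Veritas Storage Foundation: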
http://www.askmaclean.com/archives/discover-your-missed-asm-disks.html

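So I tried changing the permissions of /u02/oradata/grid/vote from 640 to 644. The log still showed that the voting file could not be discovered. After changing them to 664, CRS started normally. This fits the ocssd.log line "clssscmain: Running as user grid": the file is owned by oracle:oinstall, so grid presumably reaches it through the oinstall group, and the R_OK|W_OK check can only pass once the group has write permission.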
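
To make the 640 / 644 / 664 sequence concrete, here is a small model of the classic Unix permission check that `access()` applies. It is only a sketch: it ignores ACLs and the root bypass, the numeric uid/gid values are made up, and it assumes that grid is a member of the oinstall group, which the listings above do not actually show.

```python
import os  # used only for the R_OK / W_OK constants


def access_allowed(mode, file_uid, file_gid, proc_uid, proc_gids, want):
    """Classic Unix check: select exactly one permission class, then test bits.

    want is a bitmask of os.R_OK (4), os.W_OK (2) and os.X_OK (1).
    ACLs and the root bypass are not modeled.
    """
    if proc_uid == file_uid:
        bits = (mode >> 6) & 0o7   # owner class
    elif file_gid in proc_gids:
        bits = (mode >> 3) & 0o7   # group class
    else:
        bits = mode & 0o7          # other class
    return (bits & want) == want


# Hypothetical ids: the real values on PRODDB-2 are not in the post.
ORACLE_UID, GRID_UID, OINSTALL_GID = 501, 502, 500
GRID_GROUPS = {OINSTALL_GID}       # assumption: grid belongs to oinstall

results = {}
for mode in (0o640, 0o644, 0o664):
    # The file is owned by oracle:oinstall; ocssd runs as grid.
    results[mode] = access_allowed(
        mode, ORACLE_UID, OINSTALL_GID, GRID_UID, GRID_GROUPS,
        os.R_OK | os.W_OK,
    )
    print(f"{mode:o}", results[mode])
```

Under these assumptions the check fails for 640 and 644 and succeeds only for 664. Going from 640 to 644 adds read permission for "other" users only, which a group member never uses, so it could not have helped.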

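What still puzzles me is that nobody changed this file's permissions. If they were never changed, how did the cluster start when it was first built? And if they were changed (deliberate human action can be ruled out), how did that happen?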
