Analyzing CRS-1714: Unable to discover any voting files on an 11.2.0.3 RAC


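Conclusions:

   1, The dependencies between RAC processes in 11.2.0.3, and across Oracle versions in general, keep evolving. Understanding how the RAC processes depend on each other is critical
   2, CRS-1714: Unable to discover any voting files is only a symptom and does not necessarily mean the voting disk is damaged; analyze the corresponding logs to find the real cause
   3, If the profile.xml configuration file used by the GPNPD process on a RAC node (OLR) is corrupted, the damaged node may have to be rebuilt
   4, Before deleting or adding a RAC node, read the official documentation carefully, because it distinguishes many different cases
   5, Most importantly, if you get stuck while analyzing the logs or have never seen a similar problem, search MOS for the relevant keywords, for example GPNP PROFILE in this case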

Analysis:

1, On a two-node 11.2.0.4 RAC running Red Hat 6.4, the CRSD process did not start; the cluster alert log shows that no voting disk could be found
2015-09-16 16:53:36.138
[cssd(25059)]CRS-1714:Unable to discover any voting files, retrying discovery in 15 seconds; Details at (:CSSNM00070:) in /u01/grid/11.2.0.4/log/jingfa1/cssd/ocssd.log
2015-09-16 16:53:51.176
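
The CRS-1714 line itself names the detail log to read next. As a minimal sketch (the helper name find_crs_1714 and the regular expression are illustrative assumptions, modeled only on the single alert line shown above), the PID, the message tag and the ocssd.log path can be pulled out like this:

```python
import re

# Pattern modeled on the CRS-1714 alert line quoted above (an assumption:
# other Grid Infrastructure versions may format the line differently).
CRS_1714 = re.compile(
    r"\[cssd\((?P<pid>\d+)\)\]CRS-1714:.*?"
    r"Details at \((?P<tag>:[A-Z0-9]+:)\) in (?P<log>\S+)"
)

def find_crs_1714(alert_text):
    """Return (pid, tag, detail_log_path) for each CRS-1714 line found."""
    return [
        (int(m.group("pid")), m.group("tag"), m.group("log"))
        for m in CRS_1714.finditer(alert_text)
    ]

sample = (
    "[cssd(25059)]CRS-1714:Unable to discover any voting files, "
    "retrying discovery in 15 seconds; Details at (:CSSNM00070:) in "
    "/u01/grid/11.2.0.4/log/jingfa1/cssd/ocssd.log"
)
print(find_crs_1714(sample))
```

The tag (:CSSNM00070:) is the string to search for inside ocssd.log.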




2, Run the following command to stop all Oracle-related processes on both nodes
/u01/grid/11.2.0.4/bin/crsctl stop crs






3, Confirm that all Oracle processes on both nodes have stopped


ps -ef|grep d.bin
root      1077 24425  0 09:00 pts/1    00:00:00 grep d.bin
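
When this check has to be repeated on several nodes, the "only the grep itself is left" test can be automated. A minimal sketch, assuming the default `ps -ef` column layout shown above; the function name remaining_daemons is hypothetical:

```python
def remaining_daemons(ps_output):
    """Return ps -ef lines whose command mentions d.bin, ignoring grep itself."""
    hits = []
    for line in ps_output.splitlines():
        # ps -ef columns: UID PID PPID C STIME TTY TIME CMD
        fields = line.split(None, 7)
        if len(fields) < 8:
            continue
        cmd = fields[7]
        if "d.bin" in cmd and not cmd.startswith("grep"):
            hits.append(line)
    return hits

stopped = "root      1077 24425  0 09:00 pts/1    00:00:00 grep d.bin"
running = "grid     18060     1  0 09:45 ?        00:00:01 /u01/grid/11.2.0.4/bin/gpnpd.bin"

print(remaining_daemons(stopped))   # empty list: the stack is down
print(remaining_daemons(running))   # one line: gpnpd.bin is still running
```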


4, On node 1, start CRS in exclusive mode
/u01/grid/11.2.0.4/bin/crsctl start crs -excl -nocrs




5, On node 1, check whether the ASM processes have started




6, On node 1, check whether the cluster processes have started in exclusive mode




7, On node 1, check whether the OCR is healthy
/u01/grid/11.2.0.4/bin/ocrcheck






8, If the OCR is not healthy and a backup exists, restore the OCR from the backup
/u01/grid/11.2.0.4/bin/ocrconfig -showbackup




/u01/grid/11.2.0.4/bin/ocrconfig -restore <OCR backup file>




9, On node 1, as the grid user, check whether the disk group holding the OCR and voting disk exists; it does
  1* select disk_number,path from v$asm_disk
SQL> /


DISK_NUMBER PATH
----------- --------------------------------------------------
          0 /dev/ocr_vote
          0 /dev/data


SQL> 
SQL> 
SQL> show parameter disk_


NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
asm_diskgroups                       string      DATA
asm_diskstring                       string      /dev/*
SQL> select name,sector_size,block_size,allocation_unit_size/1024/1024 as au_mb from v$asm_diskgroup;


NAME                           SECTOR_SIZE BLOCK_SIZE      AU_MB
------------------------------ ----------- ---------- ----------
DATA                                   512       4096          2
OCRVOTE                                512       4096          2




10, On node 1, check whether the voting disk is working; it indeed cannot be found
/u01/grid/11.2.0.4/bin/crsctl query css votedisk


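11, The asm_diskgroups value in step 9 shows that only the DATA disk group is listed and OCRVOTE is not, so I changed the parameter to make the ASM instance mount both OCRVOTE and DATA at startup.
    My expectation was that the disk group holding the voting disk would then be mounted automatically when the ASM instance starts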




alter system set asm_diskgroups=data,ocrvote sid='*';






show parameter disk_
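
The comparison behind steps 9 and 11 (disk groups that exist versus disk groups listed in asm_diskgroups) is a simple set difference. A minimal sketch with the values copied from the SQL output above; the function name missing_from_parameter is hypothetical, and the inputs are assumed to be pasted in by hand rather than queried from the instance:

```python
def missing_from_parameter(asm_diskgroups_value, diskgroup_names):
    """Return disk groups that exist but are not listed in asm_diskgroups."""
    listed = {
        name.strip().upper()
        for name in asm_diskgroups_value.split(",")
        if name.strip()
    }
    return sorted(n.upper() for n in diskgroup_names if n.upper() not in listed)

# Values taken from "show parameter disk_" and v$asm_diskgroup in step 9.
print(missing_from_parameter("DATA", ["DATA", "OCRVOTE"]))
# After the ALTER SYSTEM in step 11 nothing should be missing.
print(missing_from_parameter("data,ocrvote", ["DATA", "OCRVOTE"]))
```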


12, Stop the CRS cluster processes on node 1
/u01/grid/11.2.0.4/bin/crsctl stop crs


13, Restart the cluster processes on both nodes and check whether CRSD comes up; the problem persists and the voting disk still cannot be found
/u01/grid/11.2.0.4/bin/crsctl start crs


14, Stop the cluster processes on both nodes, then start them on node 1 in exclusive mode


/u01/grid/11.2.0.4/bin/crsctl stop crs


/u01/grid/11.2.0.4/bin/crsctl start crs -excl -nocrs


15, On node 1, repair the voting disk by replacing it directly onto the ocrvote disk group
/u01/grid/11.2.0.4/bin/crsctl replace votedisk +ocrvote


16, On node 1, check whether the voting disk is now healthy
/u01/grid/11.2.0.4/bin/crsctl query css votedisk


17, Stop the cluster processes on node 1, then restart the cluster processes on both nodes
/u01/grid/11.2.0.4/bin/crsctl stop crs


/u01/grid/11.2.0.4/bin/crsctl start crs






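 18, On both nodes, confirm that the voting disk works (the command below returns results only once the CRSD process is up, otherwise the output is empty, and CRSD is the last of the cluster processes to start). Node 1 is now fine, but CRSD still fails to start on node 2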
 /u01/grid/11.2.0.4/bin/crsctl query css votedisk




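19, Check the grid user's trace file on node 2: the cluster GUID stored in the voting disk does not match the one in the GPnP profile, which is why node 2 ultimately cannot discover the voting disk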
2015-09-16 17:58:51.847: [    CSSD][1851041536]clssnmvDiskVerify: discovered a potential voting file
2015-09-16 17:58:51.847: [   SKGFD][1851041536]Handle 0x7fd95808f980 from lib :UFS:: for disk :/dev/ocr_vote:




 --- Here the cluster GUID in the voting disk is found to differ from the cluster GUID in the GPnP profile
2015-09-16 17:58:51.965: [    CSSD][1851041536]clssnmvDiskCreate: Cluster guid 0acef774f25dcfb0bf3d0c7b3db02abe found in voting disk /dev/ocr_vote does not match with the 
cluster guid 7d8026436ade6fe0ff597a0f6df497e1 obtained from the GPnP profile
-- The voting disk is removed
2015-09-16 17:58:51.965: [    CSSD][1851041536]clssnmvDiskDestroy: removing the voting disk /dev/ocr_vote
2015-09-16 17:58:51.965: [   SKGFD][1851041536]Lib :UFS:: closing handle 0x7fd95808f980 for disk :/dev/ocr_vote:
-- No voting disk is found
2015-09-16 17:58:51.965: [    CSSD][1851041536]clssnmvDiskVerify: Successful discovery of 0 disks
2015-09-16 17:58:51.965: [    CSSD][1851041536]clssnmCompleteInitVFDiscovery: Completing initial voting file discovery
2015-09-16 17:58:51.965: [    CSSD][1851041536]clssnmvFindInitialConfigs: No voting files found
2015-09-16 17:58:51.965: [    CSSD][1851041536](:CSSNM00070:)clssnmCompleteInitVFDiscovery: Voting file not found. Retrying discovery in 15 seconds
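
The decisive line in this excerpt is the clssnmvDiskCreate message. A minimal sketch for pulling both GUIDs out of ocssd.log text; the regular expression is an assumption modeled on the single message above, and it tolerates the line wrap seen in the excerpt:

```python
import re

# Modeled on the clssnmvDiskCreate message above; \s+ absorbs the line wrap.
GUID_MISMATCH = re.compile(
    r"Cluster guid (?P<disk_guid>[0-9a-f]{32}) found in voting disk "
    r"(?P<disk>\S+) does not match with the\s+cluster guid "
    r"(?P<profile_guid>[0-9a-f]{32}) obtained from the GPnP profile"
)

def find_guid_mismatch(log_text):
    """Return (disk, disk_guid, profile_guid) for the first mismatch, or None."""
    m = GUID_MISMATCH.search(log_text)
    if m is None:
        return None
    return m.group("disk"), m.group("disk_guid"), m.group("profile_guid")

sample = (
    "2015-09-16 17:58:51.965: [    CSSD][1851041536]clssnmvDiskCreate: "
    "Cluster guid 0acef774f25dcfb0bf3d0c7b3db02abe found in voting disk "
    "/dev/ocr_vote does not match with the \n"
    "cluster guid 7d8026436ade6fe0ff597a0f6df497e1 obtained from the GPnP profile"
)
print(find_guid_mismatch(sample))
```

The second GUID is the value to look for in the node's profile.xml.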


21, On node 2, let's see what the GPnP process is
[grid@jingfa2 jingfa2]$ ps -ef|grep -i gpnp
grid      5238 32255  0 10:02 pts/1    00:00:00 grep -i gpnp
grid     18060     1  0 09:45 ?        00:00:01 /u01/grid/11.2.0.4/bin/gpnpd.bin


22, On node 2, find where the GPnP profile file is stored
[grid@jingfa2 gpnpd]$ locate gpnp|grep -i --color profile
/u01/grid/11.2.0.4/gpnp/profiles
/u01/grid/11.2.0.4/gpnp/jingfa2/profiles
/u01/grid/11.2.0.4/gpnp/jingfa2/profiles/peer
/u01/grid/11.2.0.4/gpnp/jingfa2/profiles/peer/pending.xml
/u01/grid/11.2.0.4/gpnp/jingfa2/profiles/peer/profile.old
/u01/grid/11.2.0.4/gpnp/jingfa2/profiles/peer/profile.xml  -- my guess is that this is the file
/u01/grid/11.2.0.4/gpnp/jingfa2/profiles/peer/profile_orig.xml
/u01/grid/11.2.0.4/gpnp/profiles/peer
/u01/grid/11.2.0.4/gpnp/profiles/peer/profile.xml
/u01/grid/11.2.0.4/gpnp/profiles/peer/profile_orig.xml




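23, Look at the contents of the GPnP profile on node 2. The GUID 7d8026436ade6fe0ff597a0f6df497e1 appears in /u01/grid/11.2.0.4/gpnp/jingfa2/profiles/peer/profile.xml, so this is indeed the file.
    I also compared it with the same file on node 1, where 0acef774f25dcfb0bf3d0c7b3db02abe can be found, so I tried to update the GUID by hand, replacing 7d8026436ade6fe0ff597a0f6df497e1 with 0acef774f25dcfb0bf3d0c7b3db02abe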


0acef774f25dcfb0bf3d0c7b3db02abe


[grid@jingfa2 gpnpd]$ more /u01/grid/11.2.0.4/gpnp/jingfa2/profiles/peer/profile.xml|grep -i --color 7d8026436ade6fe0ff597a0f6df497e1
<?xml version="1.0" encoding="UTF-8"?><gpnp:GPnP-Profile Version="1.0" xmlns="http://www.grid-pnp.org/2005/11/gpnp-profile" xmlns:gpnp="http://www.grid-pnp.org/2005/11/gpnp-profile" 
xmlns:orcl="http://www.oracle.com/gpnp/2005/11/gpnp-profile" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.grid-pnp.org/2005/11/gpnp-profile 
gpnp-profile.xsd" ProfileSequence="7" ClusterUId="7d8026436ade6fe0ff597a0f6df497e1" ClusterName="jingfa-scan" PALocation=""><gpnp:Network-Profile><gpnp:HostNetwork id="gen" 
HostName="*"><gpnp:Network id="net1" IP="192.168.0.0" Adapter="eth0" Use="public"/><gpnp:Network id="net2" IP="10.0.0.0" Adapter="eth1" 
Use="cluster_interconnect"/></gpnp:HostNetwork></gpnp:Network-Profile><orcl:CSS-Profile id="css" DiscoveryString="+asm" 
LeaseDuration="400"/><orcl:ASM-Profile id="asm" DiscoveryString="/dev/ocr*" 
SPFile="+OCRVOTE/jingfa-scan/asmparameterfile/registry.253.849167179"/><ds:Signature 
xmlns:ds="http://www.w3.org/2000/09/xmldsig#"><ds:SignedInfo><ds:CanonicalizationMethod Algorithm="http://www.w3.org/2001/10/xml-exc-c14n#"/>
<ds:SignatureMethod Algorithm="http://www.w3.org/2000/09/xmldsig#rsa-sha1"/><ds:Reference URI=""><ds:Transforms>
<ds:Transform Algorithm="http://www.w3.org/2000/09/xmldsig#enveloped-signature"/><ds:Transform Algorithm="http://www.w3.org/2001/10/xml-exc-c14n#"> 
<InclusiveNamespaces xmlns="http://www.w3.org/2001/10/xml-exc-c14n#" PrefixList="gpnp orcl xsi"/></ds:Transform></ds:Transforms>
<ds:DigestMethod Algorithm="http://www.w3.org/2000/09/xmldsig#sha1"/><ds:DigestValue>cPtosOiD17nSId/92MTAPaQ+dLU=</ds:DigestValue></ds:Reference>
</ds:SignedInfo><ds:SignatureValue>Ca56sx6DgsCSxrRqPz2ReOzhkf9eYiqVYuj2XLadwuBURX2PL+nYD7LhLFFj27EpuSIx0SfGVhOPm/i016ws7tWATeSKBJDVyTAELgBEYPsMumW4vKm7rVXs
SbVJolycA3pFHtGqZ7FZjzSXxdj5Xq4LlBLGVWR3gYKnqxuRGv0=</ds:SignatureValue>
</ds:Signature></gpnp:GPnP-Profile>
[grid@jingfa2 gpnpd]$ 
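
Instead of grepping for a known GUID, the ClusterUId attribute can be read from the profile's root element and compared across nodes. A minimal sketch using Python's standard xml.etree.ElementTree; the two sample strings are shortened stand-ins for each node's profile.xml (only the root element with the attributes seen above), not the real files:

```python
import xml.etree.ElementTree as ET

def cluster_uid(profile_xml):
    """Return the ClusterUId attribute of the GPnP profile root element."""
    root = ET.fromstring(profile_xml)
    return root.attrib.get("ClusterUId")

# Shortened stand-ins for profile.xml on each node (assumption: the real
# files carry the same root attributes as the output shown above).
node1_profile = (
    '<gpnp:GPnP-Profile Version="1.0" '
    'xmlns:gpnp="http://www.grid-pnp.org/2005/11/gpnp-profile" '
    'ClusterUId="0acef774f25dcfb0bf3d0c7b3db02abe" ClusterName="jingfa-scan"/>'
)
node2_profile = (
    '<gpnp:GPnP-Profile Version="1.0" '
    'xmlns:gpnp="http://www.grid-pnp.org/2005/11/gpnp-profile" '
    'ClusterUId="7d8026436ade6fe0ff597a0f6df497e1" ClusterName="jingfa-scan"/>'
)

uid1, uid2 = cluster_uid(node1_profile), cluster_uid(node2_profile)
print(uid1, uid2, "match" if uid1 == uid2 else "MISMATCH")
```

A mismatch between the two values corresponds to the clssnmvDiskCreate message in step 19.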


24, Back up the file on node 2 before changing it
cp /u01/grid/11.2.0.4/gpnp/jingfa2/profiles/peer/profile.xml  /u01/grid/11.2.0.4/gpnp/jingfa2/profiles/peer/profile.xml.20150917bak


vi /u01/grid/11.2.0.4/gpnp/jingfa2/profiles/peer/profile.xml


:s/7d8026436ade6fe0ff597a0f6df497e1/0acef774f25dcfb0bf3d0c7b3db02abe/g


Then save the file




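25, Restart the cluster processes on node 2. The cluster processes on node 1 restarted as well and, oddly, the change made in step 24 had been reverted. I forced the edit again and restarted the cluster processes on node 2 once more.
     Repeated attempts show that the gpnpd process restores this file, so editing it by hand has no effect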


26, Since the approach above does not work, try another one: compare the agent processes on the two nodes


[root@jingfa1 ~]# ps -ef|grep agent|grep grid|grep -v grep
grid      3647     1  0 09:44 ?        00:00:10 /u01/grid/11.2.0.4/bin/oraagent.bin
root      3660     1  0 09:4