Analyzing CRS-1714: Unable to discover any voting files on an 11.2.0.3 RAC

clg10051

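Conclusions:

1. The process dependency mechanism in RAC keeps evolving across Oracle versions, 11.2.0.3 included. It is crucial to understand the dependencies among the RAC processes as well as you can.
2. CRS-1714:Unable to discover any voting files is only a symptom. It does not necessarily mean the voting disk is damaged; the real cause has to be worked out from the corresponding logs.
3. If the profile.xml configuration file used by a RAC node's GPNPD process (OLR) is damaged, the damaged node may have to be rebuilt.
4. Before deleting or adding a RAC node, read the official documentation in detail, because it distinguishes many separate cases.
5. Most important of all: if you get stuck while analyzing the logs, or have never met a similar problem before, search MOS by keyword, for example GPNP PROFILE in this case.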

Analysis:

1. On a two-node 11.2.0.4 RAC running on Red Hat 6.4, the CRSD process did not start. The cluster alert log shows that the voting disk cannot be found.
2015-09-16 16:53:36.138
[cssd(25059)]CRS-1714:Unable to discover any voting files, retrying discovery in 15 seconds; Details at (:CSSNM00070:) in /u01/grid/11.2.0.4/log/jingfa1/cssd/ocssd.log
2015-09-16 16:53:51.176
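
The alert entry itself names the log to read next. A minimal sketch, assuming the message keeps the "Details at ... in <path>" shape shown above; crs1714_log_path is a hypothetical helper name, not an Oracle tool:

```shell
# Hypothetical helper: read alert-log text on stdin and print the log file
# path that each CRS-1714 line points to (the token after the last " in ").
crs1714_log_path() {
  grep 'CRS-1714' | sed -n 's#.* in \(/[^ ]*\)$#\1#p'
}
```

For example, piping the cluster alert log into crs1714_log_path prints the ocssd.log path to open for details.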

2. Run the following command on both nodes to stop all Oracle-related processes
/u01/grid/11.2.0.4/bin/crsctl stop crs

3. Confirm that all Oracle processes on both nodes are stopped

ps -ef|grep d.bin
root 1077 24425 0 09:00 pts/1 00:00:00 grep d.bin
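
In the output above the only match is the grep command itself, which is easy to misread. A small sketch that counts real *d.bin daemons and ignores grep; count_dbin is a hypothetical helper that reads `ps -ef` style text on stdin:

```shell
# Hypothetical helper: count lines that mention a *d.bin daemon,
# skipping any line that belongs to a grep command.
count_dbin() {
  awk '/d\.bin/ && !/grep/ { n++ } END { print n + 0 }'
}
```

Used as `ps -ef | count_dbin`, a result of 0 means no clusterware daemon is left on that node.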

4. On node 1, start CRS in exclusive mode
/u01/grid/11.2.0.4/bin/crsctl start crs -excl -nocrs

5. On node 1, check whether the ASM process has started

6. On node 1, check whether the cluster processes have started in exclusive mode

7. On node 1, check whether the OCR is healthy
/u01/grid/11.2.0.4/bin/ocrcheck

8. If the OCR is not healthy and a backup exists, restore the OCR from that backup
/u01/grid/11.2.0.4/bin/ocrconfig -showbackup

/u01/grid/11.2.0.4/bin/ocrconfig -restore <OCR backup file>

9. On node 1, as the grid user, check whether the disks and disk groups holding the OCR and voting disk exist. They do.
1* select disk_number,path from v$asm_disk
SQL> /

DISK_NUMBER PATH
----------- --------------------------------------------------
          0 /dev/ocr_vote
          0 /dev/data

SQL>
SQL>
SQL> show parameter disk_

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
asm_diskgroups                       string      DATA
asm_diskstring                       string      /dev/*
SQL> select name,sector_size,block_size,allocation_unit_size/1024/1024 as au_mb from v$asm_diskgroup;

NAME                           SECTOR_SIZE BLOCK_SIZE      AU_MB
------------------------------ ----------- ---------- ----------
DATA                                   512       4096          2
OCRVOTE                                512       4096          2

10. On node 1, check whether the voting disk is working. It is indeed not discovered.
/u01/grid/11.2.0.4/bin/crsctl query css votedisk

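11. The asm_diskgroups value in step 9 shows that only the DATA disk group is mounted, not OCRVOTE. So I changed the parameter so that the ASM instance mounts both OCRVOTE and DATA at startup. My thinking was that the disk group holding the voting disk would then be mounted automatically whenever the ASM instance starts.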

alter system set asm_diskgroups=data,ocrvote sid='*';

show parameter disk_
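
The comparison behind this step, disk groups known to ASM versus the asm_diskgroups value, can be sketched as a small shell helper. missing_diskgroups is a hypothetical name; it takes the parameter value as its argument and one disk group name per line on stdin, and it only compares names, it does not query ASM:

```shell
# Hypothetical helper: print every disk group name read from stdin that is
# not listed in the comma-separated asm_diskgroups value passed as $1.
# Names are compared case-insensitively.
missing_diskgroups() {
  configured=$(printf '%s' "$1" | tr 'a-z' 'A-Z' | tr ',' ' ')
  while IFS= read -r dg; do
    [ -n "$dg" ] || continue
    name=$(printf '%s' "$dg" | tr 'a-z' 'A-Z')
    case " $configured " in
      *" $name "*) ;;                    # already listed, nothing to report
      *) printf '%s\n' "$name" ;;        # known to ASM but not auto-mounted
    esac
  done
}
```

With the values from step 9, `printf 'DATA\nOCRVOTE\n' | missing_diskgroups DATA` reports OCRVOTE as the group missing from the parameter.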

12. Stop the CRS cluster processes on node 1
/u01/grid/11.2.0.4/bin/crsctl stop crs

13. Restart the cluster processes on both nodes and check whether crsd comes up. The problem persists: the voting disk still cannot be found.
/u01/grid/11.2.0.4/bin/crsctl start crs

14. Stop the cluster processes on both nodes, then start them on node 1 in exclusive mode

/u01/grid/11.2.0.4/bin/crsctl stop crs

/u01/grid/11.2.0.4/bin/crsctl start crs -excl -nocrs

15. On node 1, repair the voting disk by replacing it directly into the ocrvote disk group
/u01/grid/11.2.0.4/bin/crsctl replace votedisk +ocrvote

16. On node 1, check whether the voting disk is now healthy
/u01/grid/11.2.0.4/bin/crsctl query css votedisk

17. Stop the cluster processes on the node, then restart the cluster processes on both nodes
/u01/grid/11.2.0.4/bin/crsctl stop crs

/u01/grid/11.2.0.4/bin/crsctl start crs

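18. On both nodes, confirm whether the voting disk works (the command below returns results only once the CRSD process is up, otherwise it prints nothing, and CRSD is the last of the cluster processes to start). Node 1 is now fine, but on node 2 the CRSD process still does not come up.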
/u01/grid/11.2.0.4/bin/crsctl query css votedisk

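19. Check the grid user's trace file on node 2. The cluster GUID recorded in the voting disk does not match the one in the GPnP profile, which is why node 2 ultimately cannot discover the voting disk.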
2015-09-16 17:58:51.847: [ CSSD][1851041536]clssnmvDiskVerify: discovered a potential voting file
2015-09-16 17:58:51.847: [ SKGFD][1851041536]Handle 0x7fd95808f980 from lib :UFS:: for disk :/dev/ocr_vote:

--- Here CSSD finds that the cluster GUID stored in the voting disk differs from the cluster GUID in the GPnP profile
2015-09-16 17:58:51.965: [ CSSD][1851041536]clssnmvDiskCreate: Cluster guid 0acef774f25dcfb0bf3d0c7b3db02abe found in voting disk /dev/ocr_vote does not match with the
cluster guid 7d8026436ade6fe0ff597a0f6df497e1 obtained from the GPnP profile
-- The voting disk is removed from the candidate list
2015-09-16 17:58:51.965: [ CSSD][1851041536]clssnmvDiskDestroy: removing the voting disk /dev/ocr_vote
2015-09-16 17:58:51.965: [ SKGFD][1851041536]Lib :UFS:: closing handle 0x7fd95808f980 for disk :/dev/ocr_vote:
-- No voting disk is found
2015-09-16 17:58:51.965: [ CSSD][1851041536]clssnmvDiskVerify: Successful discovery of 0 disks
2015-09-16 17:58:51.965: [ CSSD][1851041536]clssnmCompleteInitVFDiscovery: Completing initial voting file discovery
2015-09-16 17:58:51.965: [ CSSD][1851041536]clssnmvFindInitialConfigs: No voting files found
2015-09-16 17:58:51.965: [ CSSD][1851041536](:CSSNM00070:)clssnmCompleteInitVFDiscovery: Voting file not found. Retrying discovery in 15 seconds
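
To pull the two GUIDs out of a long ocssd.log without reading it line by line, something like the following may help. It is a sketch that assumes the clssnmvDiskCreate message keeps the wording shown above; cluster_guids is a hypothetical helper name:

```shell
# Hypothetical helper: read ocssd.log text on stdin and print every 32-digit
# hex value that directly follows the words "Cluster guid" / "cluster guid".
# For the mismatch message, the first value is the GUID found in the voting
# disk and the second is the GUID taken from the GPnP profile.
cluster_guids() {
  grep -o '[Cc]luster guid [0-9a-f]\{32\}' | awk '{ print $3 }'
}
```

If the two printed values differ, the node's GPnP profile and the voting disk belong to different cluster identities, which is the situation in this case.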

21. Let's see what the GPnP process looks like on node 2
[grid@jingfa2 jingfa2]$ ps -ef|grep -i gpnp
grid 5238 32255 0 10:02 pts/1 00:00:00 grep -i gpnp
grid 18060 1 0 09:45 ? 00:00:01 /u01/grid/11.2.0.4/bin/gpnpd.bin

22. On node 2, find where the gpnp profile file lives
[grid@jingfa2 gpnpd]$ locate gpnp|grep -i --color profile
/u01/grid/11.2.0.4/gpnp/profiles
/u01/grid/11.2.0.4/gpnp/jingfa2/profiles
/u01/grid/11.2.0.4/gpnp/jingfa2/profiles/peer
/u01/grid/11.2.0.4/gpnp/jingfa2/profiles/peer/pending.xml
/u01/grid/11.2.0.4/gpnp/jingfa2/profiles/peer/profile.old
/u01/grid/11.2.0.4/gpnp/jingfa2/profiles/peer/profile.xml --I guess this is the file
/u01/grid/11.2.0.4/gpnp/jingfa2/profiles/peer/profile_orig.xml
/u01/grid/11.2.0.4/gpnp/profiles/peer
/u01/grid/11.2.0.4/gpnp/profiles/peer/profile.xml
/u01/grid/11.2.0.4/gpnp/profiles/peer/profile_orig.xml

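23. Look at the content of node 2's gpnp profile. The file /u01/grid/11.2.0.4/gpnp/jingfa2/profiles/peer/profile.xml contains the GUID 7d8026436ade6fe0ff597a0f6df497e1, so this is indeed the file.
I also compared it with the same file on node 1, where 0acef774f25dcfb0bf3d0c7b3db02abe can be found. So I tried to update the GUID by hand, replacing 7d8026436ade6fe0ff597a0f6df497e1 with 0acef774f25dcfb0bf3d0c7b3db02abe.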

0acef774f25dcfb0bf3d0c7b3db02abe

[grid@jingfa2 gpnpd]$ more /u01/grid/11.2.0.4/gpnp/jingfa2/profiles/peer/profile.xml|grep -i --color 7d8026436ade6fe0ff597a0f6df497e1
<?xml version="1.0" encoding="UTF-8"?><gpnp:GPnP-Profile Version="1.0" xmlns="http://www.grid-pnp.org/2005/11/gpnp-profile" xmlns:gpnp="http://www.grid-pnp.org/2005/11/gpnp-profile"
xmlns:orcl="http://www.oracle.com/gpnp/2005/11/gpnp-profile" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.grid-pnp.org/2005/11/gpnp-profile
gpnp-profile.xsd" ProfileSequence="7" ClusterUId="7d8026436ade6fe0ff597a0f6df497e1" ClusterName="jingfa-scan" PALocation=""><gpnp:Network-Profile><gpnp:HostNetwork id="gen"
HostName="*"><gpnp:Network id="net1" IP="192.168.0.0" Adapter="eth0" Use="public"/><gpnp:Network id="net2" IP="10.0.0.0" Adapter="eth1"
Use="cluster_interconnect"/></gpnp:HostNetwork></gpnp:Network-Profile><orcl:CSS-Profile id="css" DiscoveryString="+asm"
LeaseDuration="400"/><orcl:ASM-Profile id="asm" DiscoveryString="/dev/ocr*"
SPFile="+OCRVOTE/jingfa-scan/asmparameterfile/registry.253.849167179"/><ds:Signature
xmlns:ds="http://www.w3.org/2000/09/xmldsig#"><ds:SignedInfo><ds:CanonicalizationMethod Algorithm="http://www.w3.org/2001/10/xml-exc-c14n#"/>
<ds:SignatureMethod Algorithm="http://www.w3.org/2000/09/xmldsig#rsa-sha1"/><ds:Reference URI=""><ds:Transforms>
<ds:Transform Algorithm="http://www.w3.org/2000/09/xmldsig#enveloped-signature"/><ds:Transform Algorithm="http://www.w3.org/2001/10/xml-exc-c14n#">
<InclusiveNamespaces xmlns="http://www.w3.org/2001/10/xml-exc-c14n#" PrefixList="gpnp orcl xsi"/></ds:Transform></ds:Transforms>
<ds:DigestMethod Algorithm="http://www.w3.org/2000/09/xmldsig#sha1"/><ds:DigestValue>cPtosOiD17nSId/92MTAPaQ+dLU=</ds:DigestValue></ds:Reference>
</ds:SignedInfo><ds:SignatureValue>Ca56sx6DgsCSxrRqPz2ReOzhkf9eYiqVYuj2XLadwuBURX2PL+nYD7LhLFFj27EpuSIx0SfGVhOPm/i016ws7tWATeSKBJDVyTAELgBEYPsMumW4vKm7rVXs
SbVJolycA3pFHtGqZ7FZjzSXxdj5Xq4LlBLGVWR3gYKnqxuRGv0=</ds:SignatureValue>
</ds:Signature></gpnp:GPnP-Profile>
[grid@jingfa2 gpnpd]$
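
Rather than grepping for a GUID you already know, you can print whatever ClusterUId a profile holds and compare the result across nodes. A minimal sketch that treats the profile as plain text and does not validate the XML or its signature; cluster_uid is a hypothetical helper name:

```shell
# Hypothetical helper: print the value of the first ClusterUId="..."
# attribute found in the profile file given as $1.
cluster_uid() {
  grep -o 'ClusterUId="[^"]*"' "$1" | head -n 1 | cut -d'"' -f2
}
```

Running `cluster_uid /u01/grid/11.2.0.4/gpnp/jingfa2/profiles/peer/profile.xml` on node 2 and the matching path on node 1 gives the two values to compare.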

24. Back up this file on node 2 before changing it
cp /u01/grid/11.2.0.4/gpnp/jingfa2/profiles/peer/profile.xml /u01/grid/11.2.0.4/gpnp/jingfa2/profiles/peer/profile.xml.20150917bak

vi /u01/grid/11.2.0.4/gpnp/jingfa2/profiles/peer/profile.xml

:%s/7d8026436ade6fe0ff597a0f6df497e1/0acef774f25dcfb0bf3d0c7b3db02abe/g

Then save the file.

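25. Restart the cluster processes on node 2. The cluster processes on node 1 restarted as well, and strangely the change made in step 24 had been reverted. I forced the change again and restarted the cluster processes on node 2 once more.
After repeated attempts it is clear that the gpnp process restores this file, so editing it by hand has no effect.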

26. Since the approach above does not work, try another one: compare the agent processes on the two nodes

[root@jingfa1 ~]# ps -ef|grep agent|grep grid|grep -v grep
grid 3647 1 0 09:44 ? 00:00:10 /u01/grid/11.2.0.4/bin/oraagent.bin
root 3660 1 0 09:4