http://blog.csdn.net/tianlesoftware/article/details/6728885
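1. Overview
In an earlier article:
http://blog.csdn.net/tianlesoftware/article/details/5331067
I noted that OCSSD is the most critical Clusterware process: if it fails, the system reboots. OCSSD provides the CSS (Cluster Synchronization Service) service, which monitors cluster state in real time through several heartbeat mechanisms and provides basic cluster functions such as split-brain protection.
CSS has two heartbeat mechanisms: the Network Heartbeat over the private network and the Disk Heartbeat through the Voting Disk.
Each heartbeat has a maximum allowed delay. For the Disk Heartbeat this delay is called IOT (I/O Timeout); for the Network Heartbeat it is called MC (Misscount). Both parameters are expressed in seconds, and by default IOT is greater than MC. Oracle determines both values automatically by default, and adjusting them is not recommended.
The current values can be checked with the following commands: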
$crsctl get css disktimeout
$crsctl get css misscount
For example:
[oracle@rac1 ~]$ crsctl get css disktimeout
200
[oracle@rac1 ~]$ crsctl get css misscount
60
These are the default values of the two parameters.
2. Related MOS articles
How to start/stop the 10g CRS ClusterWare [ID 309542.1]
10g RAC: Steps To Increase CSS Misscount, Reboottime and Disktimeout [ID 284752.1]
CSS Timeout Computation in Oracle Clusterware [ID 294430.1]
RAC Assurance Support Team: RAC and Oracle Clusterware Starter Kit and Best Practices (Generic) [ID 810394.1]
2.1 Steps to modify CSS Misscount:
1) Shut down CRS on all but one node. For exact steps use Note 309542.1
2) Execute crsctl as root to modify the misscount:
$ORA_CRS_HOME/bin/crsctl set css misscount <n>
where <n> is the maximum i/o latency to the voting disk +1 second
3) Reboot the node where the adjustment was made
4) Start all other nodes shut down in step 1
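The rule in step 2 (maximum voting disk I/O latency plus one second) is simple arithmetic. The sketch below uses two hypothetical helpers, `misscount_for_latency` and `print_set_misscount`, and assumes the maximum latency has already been measured in whole seconds by some other means. It only prints the command; it does not run crsctl.

```shell
#!/bin/sh
# Hypothetical helper: misscount = maximum voting disk I/O latency + 1 second.
misscount_for_latency() {
    max_latency=$1    # whole seconds, assumed to be measured separately
    echo $((max_latency + 1))
}

# Hypothetical helper: print (do not execute) the command from step 2.
print_set_misscount() {
    printf '%s\n' "\$ORA_CRS_HOME/bin/crsctl set css misscount $1"
}

# Example with an assumed maximum latency of 59 seconds.
print_set_misscount "$(misscount_for_latency 59)"
```

Printing the command rather than running it keeps the sketch safe to try anywhere; the actual change still has to follow the shutdown and reboot sequence in the steps above.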
With Patch:4896338 for 10.2.0.1 there are two additional settings that can be tuned. This change is incorporated into the 10.2.0.2 and 10.1.0.6 patchsets.
The following are only relevant on 10.2.0.1 with Patch:4896338. In addition to MissCount, CSS now has two more parameters:
1) reboottime (default 3 seconds) - the amount of time allowed for a node to complete a reboot after the CSS daemon has been evicted (i.e. how long it takes for the machine to completely shut down when you do a reboot).
2) disktimeout (default 200 seconds) - the maximum amount of time allowed for a voting file I/O to complete; if this time is exceeded the voting disk will be marked as offline. Note that this is also the amount of time that will be required for initial cluster formation, i.e. when no nodes have previously been up and in a cluster.
$CRS_HOME/bin/crsctl set css reboottime <n> [-force] (<n> is seconds)
$CRS_HOME/bin/crsctl set css disktimeout <n> [-force] (<n> is seconds)
Confirm the new css misscount setting via ocrdump
2.2 CSS Timeout Computation in Oracle Clusterware
2.2.1 MISSCOUNT DEFINITION AND DEFAULT VALUES
The CSS misscount parameter represents the maximum time, in seconds, that a network heartbeat can be missed before entering into a cluster reconfiguration to evict the node. The following are the default values for the misscount parameter and their respective versions when using Oracle Clusterware* in seconds:
*CSS misscount default value when using vendor (non-Oracle) clusterware is 600 seconds. This is to allow the vendor clusterware ample time to resolve any possible split brain scenarios.
On AIX platforms with HACMP starting with 10.2.0.3 BP#1, the misscount is 30. This is documented in Note 551658.1
2.2.2 CSS HEARTBEAT MECHANISMS AND THEIR INTERRELATIONSHIP
The synchronization services component (CSS) of the Oracle Clusterware maintains two heartbeat mechanisms
1.) the disk heartbeat to the voting device and
2.) the network heartbeat across the interconnect which establish and confirm valid node membership in the cluster.
Both of these heartbeat mechanisms have an associated timeout value. The disk heartbeat has an internal i/o timeout interval (DTO, Disk TimeOut), in seconds, within which an i/o to the voting disk must complete. The misscount parameter (MC), as stated above, is the maximum time, in seconds, that a network heartbeat can be missed. The disk heartbeat i/o timeout interval is directly related to the misscount parameter setting. There has been some variation in this relationship between versions as described below:
9.x.x.x | NOTE, MISSCOUNT WAS A DIFFERENT ENTITY IN THIS RELEASE |
10.1.0.2 | No one should be on this version |
10.1.0.3 | DTO = MC - 15 seconds |
10.1.0.4 | DTO = MC - 15 seconds |
10.1.0.4 + Unpublished Bug 3306964 | DTO = MC - 3 seconds |
10.1.0.4 with CRS II Merge patch | DTO = Disktimeout (defaults to 200 seconds) normally, OR Misscount seconds only during initial cluster formation or slightly before reconfiguration |
10.1.0.5 | IOT = MC - 3 seconds |
10.2.0.1 + Fix for unpublished Bug 4896338 | IOT = Disktimeout (defaults to 200 seconds) normally, OR Misscount seconds only during initial cluster formation or slightly before reconfiguration |
10.2.0.2 | Same as above (10.2.0.1 with Patch Bug:4896338) |
10.1 - 11.1 | During node join and leave (reconfiguration) in a cluster we need to reconfigure; in that particular case we use the Short Disk TimeOut (SDTO), which in all versions is SDTO = MC - ... |
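The version rules in the table above can be worked through with concrete numbers. The sketch below is only an illustration: `css_disk_timeout` is a hypothetical helper, it covers only the rows that the table states completely (10.1.0.3, 10.1.0.4, 10.1.0.5 and 10.2.0.2), and the phase names `normal` and `formation` are labels invented here for "normally" versus "initial cluster formation or slightly before reconfiguration".

```shell
#!/bin/sh
# Hypothetical helper: voting disk i/o timeout (seconds) per the table above.
# Usage: css_disk_timeout VERSION MC [DISKTIMEOUT] [PHASE]
#   DISKTIMEOUT defaults to 200; PHASE is "normal" (default) or "formation".
css_disk_timeout() {
    version=$1
    mc=$2
    disktimeout=${3:-200}
    phase=${4:-normal}
    case "$version" in
        10.1.0.3|10.1.0.4)
            echo $((mc - 15)) ;;          # DTO = MC - 15 seconds
        10.1.0.5)
            echo $((mc - 3)) ;;           # IOT = MC - 3 seconds
        10.2.0.2)
            if [ "$phase" = "normal" ]; then
                echo "$disktimeout"       # Disktimeout in normal operation
            else
                echo "$mc"                # Misscount during cluster formation
            fi ;;
        *)
            echo "unsupported version: $version" >&2
            return 1 ;;
    esac
}

# Worked examples with MC = 60 seconds.
css_disk_timeout 10.1.0.3 60                 # prints 45
css_disk_timeout 10.1.0.5 60                 # prints 57
css_disk_timeout 10.2.0.2 60 200 normal      # prints 200
css_disk_timeout 10.2.0.2 60 200 formation   # prints 60
```

With a misscount of 60 seconds, the older releases leave well under a minute for a voting disk i/o to complete, while 10.2.0.2 allows the full disktimeout in normal operation.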
Source: ITPUB Blog, link: http://blog.itpub.net/24867586/viewspace-711638/. If you reproduce this article, please credit the source; otherwise legal liability will be pursued.