check if private interconnect CRS can trans net heartbeat-1445075.1

Node reboot or eviction: How to check if your private interconnect CRS can transmit network heartbeats (Doc ID 1445075.1)


Applies to:

Oracle Server - Enterprise Edition - Version to [Release 10.1 to 11.2]
Information in this document applies to any platform.


Frequently, in the case of node reboots, the log of the CSS daemon process (ocssd.log) indicates that the network heartbeat from one or more remote nodes was not received. For example, ocssd.log may contain the message "CRS-1610:Network communication with node xxxxxx (3) missing for 90% of timeout interval.  Removal of this node from cluster in 2.656 seconds". The node is subsequently rebooted, either to avoid a split brain or because it was evicted by another node.
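To check whether a node's ocssd.log already records missed network heartbeats, you can search it for the CRS-1610 message quoted above. The sketch below goes beyond this note in two assumptions: that the 50% and 75% warnings use CRS-1612 and CRS-1611 (the 11.2 message numbers), and that ocssd.log lives under `<CRS or Grid home>/log/<hostname>/cssd/`. The function name `find_heartbeat_warnings` is hypothetical, and the log path is passed in so you can adjust it for your installation.

```shell
#!/bin/sh
# Hypothetical helper: list network-heartbeat warnings in an ocssd.log file.
# Assumption: CRS-1612 / CRS-1611 / CRS-1610 are the 50% / 75% / 90%
# warnings (11.2 message numbers). The caller supplies the log path.
find_heartbeat_warnings() {
    # -n prefixes each matching line with its line number in the log
    grep -n 'CRS-161[012]' "$1"
}
```

For example, on 11.2, assuming `GRID_HOME` points at the Grid Infrastructure home: `find_heartbeat_warnings "$GRID_HOME/log/$(hostname -s)/cssd/ocssd.log"`.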

The script in this note performs the network connectivity check using ssh. This check complements ping and traceroute: ssh uses the TCP protocol, while ping uses ICMP and traceroute on Linux/Unix uses UDP (traceroute on Windows uses ICMP).

Network communication involves both the physical connection and the OS protocol layers, such as IP, UDP, and TCP.

CRS in 10g and 11.1 uses TCP to communicate, so testing the connection with ssh, which exercises the TCP and IP layers as well as the physical link, is a better test than ping or traceroute.

Because CRS in 11.2 uses UDP to communicate, using ssh to test TCP is not the optimal test, but it still complements the traceroute test.

The script tests the private interconnect once every 5 seconds, so it puts an insignificant load on the server.
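Because each loop iteration sleeps 5 seconds, the number of iterations per day, and with it the log size, has a simple upper bound. The estimate below assumes about 36 bytes per node per iteration (a 7-byte hostname line such as "drrac1" plus a 29-byte line of default English `date` output) and the 2 bytes written by the two empty echo lines; $N$ is the number of nodes. The ssh round trips make each iteration slightly longer than 5 seconds, so real figures are lower.

```latex
\[
I_{\max} = \frac{86400\ \text{s/day}}{5\ \text{s}} = 17280\ \text{iterations/day}
\]
\[
S(N) \approx I_{\max}\,(36N + 2)\ \text{bytes/day}
\]
\[
S(2) \approx 17280 \times 74 = 1\,278\,720\ \text{bytes} \approx 1.2\ \text{MiB},
\qquad
S(3) \approx 17280 \times 110 = 1\,900\,800\ \text{bytes} \approx 1.8\ \text{MiB}
\]
```

In practice the daily log is therefore on the order of a megabyte or two, which is still small.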


1) Create a file in a location of your choice and copy and paste the following lines into it:


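#!/bin/sh
# Replace every <...> placeholder before running the script (see steps 2 to 5).
export TODAY=`date "+%Y%m%d"`
while [ "$TODAY" -lt <stop_date> ] # format needs to be YYYYMMDD
do
    export TODAY=`date "+%Y%m%d"`
    export LOGFILE=<log_directory>/interconnect_test_${TODAY}.log
    ssh <node1_private_interconnect> "hostname; date" >> "$LOGFILE" 2>&1
    ssh <node2_private_interconnect> "hostname; date" >> "$LOGFILE" 2>&1

    echo "" >> "$LOGFILE"
    echo "" >> "$LOGFILE"

    sleep 5
done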

2) Replace the node placeholder in each ssh line with a real private interconnect IP address or private interconnect host name. For each node, the script runs the commands "hostname" and "date" over ssh and appends their output to the log file.

3) If there are more than two nodes in the cluster, add one more line per additional node:
ssh <nodeN_private_interconnect> "hostname; date" >> $LOGFILE 2>&1
Make sure that the script issues ssh to every node, including the local node.

4) Replace the directory placeholder in LOGFILE with a real directory where the output of the script will go.
The log file will likely grow by only about one to two MB per day, depending on the number of nodes, so you do not need a large amount of space.
You can also delete old log files regularly.

5) Replace the stop-date placeholder in the while condition with the date on which you want the script to stop running. The format must be YearMonthDay (YYYYMMDD), for example 20121231 for December 31, 2012.

6) Save the file and issue "chmod +x <script_name>" to make the script executable.

7) Make sure that ssh over the private interconnect works without asking for a password.
It is best to first test the ssh connection over the private interconnect from every node to every other node, including itself (the local node).

8) Issue "nohup ./<script_name> &" to run the script in the background.
Run this script from every node in the cluster.
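Step 7 can be scripted. The sketch below (the function name `check_passwordless_ssh` is hypothetical) runs the trivial command `true` on each host with two standard OpenSSH options: `BatchMode=yes`, which makes ssh fail instead of prompting for a password or host-key confirmation, and `ConnectTimeout=5`, which bounds how long an unreachable host can stall the check. Run it on every node, passing all private interconnect names, including the local node's.

```shell
#!/bin/sh
# Hypothetical preflight check: verify non-interactive ssh to every host
# given as an argument. Prints OK/FAIL per host; returns 1 if any failed.
check_passwordless_ssh() {
    failed=0
    for host in "$@"; do
        # stdin from /dev/null so ssh cannot consume the caller's input
        if ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" true \
               </dev/null >/dev/null 2>&1; then
            echo "OK   $host"
        else
            echo "FAIL $host"
            failed=1
        fi
    done
    return $failed
}
```

With the host names from the three-node example in this note: `check_passwordless_ssh drrac1-priv drrac2-priv drrac3-priv`.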


How to interpret the output in the log file:

When there is a problem with the private interconnect, or when a node is down, the dates in the log file no longer appear once every 5 seconds; the interval becomes longer.

If the difference between successive dates is more than 10 seconds while the script was running, then the network or server is having a serious delay in transmitting network heartbeats. If the difference is greater than 30 seconds, the node will be rebooted, so you will likely not see a difference greater than 30 seconds.

Find out the approximate time when the node was rebooted, and check when the script showed its last output before the reboot. If the time difference is more than 15 seconds, then a network problem is the cause of the missing network heartbeats. Investigate why ssh (a regular OS command) hung.
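Scanning the log by eye for long intervals can be automated. The sketch below (`find_heartbeat_gaps` is a hypothetical name) assumes the remote `date` output is in the default English format, for example "Mon Dec 31 10:00:05 CST 2012", so the time is the fourth of six fields. It also assumes the node clocks are synchronized, because each date line is printed by a different remote node. It prints every pair of successive date lines that are more than a threshold apart, 10 seconds by default to match the guideline above.

```shell
#!/bin/sh
# Hypothetical helper: report gaps between successive "date" lines in an
# interconnect_test_YYYYMMDD.log file that exceed a threshold in seconds.
# Usage: find_heartbeat_gaps LOGFILE [THRESHOLD_SECONDS]
find_heartbeat_gaps() {
    awk -v limit="${2:-10}" '
        # A date line has six fields and an HH:MM:SS fourth field;
        # hostname lines and ssh error messages do not match.
        NF == 6 && $4 ~ /^[0-9][0-9]:[0-9][0-9]:[0-9][0-9]$/ {
            split($4, t, ":")
            now = t[1] * 3600 + t[2] * 60 + t[3]   # seconds since midnight
            if (seen) {
                gap = now - prev
                if (gap < 0) gap += 86400          # the log crossed midnight
                if (gap > limit)
                    printf "gap of %d s: %s -> %s\n", gap, prev_line, $0
            }
            prev = now; prev_line = $0; seen = 1
        }
    ' "$1"
}
```

For example, `find_heartbeat_gaps /tmp/interconnect_test_20121231.log 15` lists only the gaps longer than 15 seconds. Pure awk arithmetic on the time field is used instead of `date -d`, so the sketch does not depend on GNU date parsing time zone names.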


The following script is an example for a three-node cluster:


#!/bin/sh
export TODAY=`date "+%Y%m%d"`
while [ "$TODAY" -lt 20121231 ] # format needs to be YYYYMMDD
do
    export TODAY=`date "+%Y%m%d"`
    export LOGFILE=/tmp/interconnect_test_${TODAY}.log
    ssh drrac1-priv "hostname; date" >> "$LOGFILE" 2>&1
    ssh drrac2-priv "hostname; date" >> "$LOGFILE" 2>&1
    ssh drrac3-priv "hostname; date" >> "$LOGFILE" 2>&1

    echo "" >> "$LOGFILE"
    echo "" >> "$LOGFILE"

    sleep 5
done
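The three-node example repeats one ssh line per node. A variant that takes the stop date, log directory, and node list as arguments avoids editing the loop body when nodes are added. The function names `probe_once` and `run_until` are hypothetical; the loop logic (one ssh per node every 5 seconds, two blank separator lines, one log file per day) is intended to match the note's script.

```shell
#!/bin/sh
# Hypothetical variant of the note's script: the node list is passed as
# arguments instead of being hard-coded as one ssh line per node.

# Run "hostname; date" on every node once and append the output to $1.
probe_once() {
    logfile=$1; shift
    for node in "$@"; do
        ssh "$node" "hostname; date" >> "$logfile" 2>&1
    done
    echo "" >> "$logfile"
    echo "" >> "$logfile"
}

# Usage: run_until STOP_DATE(YYYYMMDD) LOG_DIR NODE...
# Probes every node every 5 seconds until the date reaches STOP_DATE.
run_until() {
    stop_date=$1; log_dir=$2; shift 2
    today=$(date "+%Y%m%d")
    while [ "$today" -lt "$stop_date" ]; do
        today=$(date "+%Y%m%d")
        probe_once "$log_dir/interconnect_test_${today}.log" "$@"
        sleep 5
    done
}
```

The three-node example corresponds to adding the line `run_until 20121231 /tmp drrac1-priv drrac2-priv drrac3-priv` at the end of the file and starting it with nohup as in step 8.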


From "ITPUB Blog", link: . If reposting, please credit the source; otherwise legal liability will be pursued.

