Analysis of a Failure to Install CRS on the Cluster Nodes

A few days ago, a friend who was installing CRS found that the pre-installation check could not find the nodes:

$ ./cluvfy/runcluvfy.sh stage -post hwos -n node1,node2 -verbose
notice: JAVA_HOME not set in the environment.

Performing post-checks for hardware and operating system setup

Checking node reachability...

Check: Node reachability from node "node1"
  Destination Node                      Reachable?
  ------------------------------------  ------------------------
  node1                                 no
  node2                                 no

Result: Node reachability check failed from node "node1".

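Faced with this, I first asked him whether the hostnames really were node1 and node2, and whether both addresses were correctly defined in /etc/hosts. He said all of that was fine. Next I asked whether the network interfaces were in the right order: on HP-UX, the private and public NICs sometimes have device names that do not correspond across the two servers, which can also cause baffling errors. That checked out as well.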
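The /etc/hosts part of that checklist is easy to script. The sketch below is a minimal Python example, not part of the original troubleshooting; `parse_hosts` and `missing_nodes` are hypothetical helper names, and it assumes the usual hosts-file layout (an IP address followed by one or more names, with `#` starting a comment). It only inspects the file's contents; it does not test DNS or the NIC-ordering issue.

```python
def parse_hosts(text):
    """Map each hostname/alias in /etc/hosts-style text to the first IP listed for it."""
    mapping = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding blanks
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            # Hostnames are case-insensitive; keep the first address seen.
            mapping.setdefault(name.lower(), ip)
    return mapping


def missing_nodes(text, nodes):
    """Return the node names that have no entry in the hosts text."""
    mapping = parse_hosts(text)
    return [n for n in nodes if n.lower() not in mapping]


if __name__ == "__main__":
    # Hypothetical sample: node1 is defined, node2 is not.
    sample = "127.0.0.1 localhost loopback\n192.168.1.150 node1\n"
    print(missing_nodes(sample, ["node1", "node2"]))  # ['node2']
```

On a real node one would pass the contents of /etc/hosts instead of the sample string, once on each server.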

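We also checked the permissions on the hosts and services files; they were fine too, with read permission for world (other). Evidently the problem lay elsewhere. So what next?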
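The world-read check on the hosts and services files can be expressed in a few lines of Python. This is a hedged sketch: `is_world_readable` is a hypothetical helper that only looks at the "other" read bit in the file mode; it does not account for ACLs or for permissions on the parent directories.

```python
import os
import stat


def is_world_readable(path):
    """True if the 'other' (world) read bit is set in the file's mode."""
    return bool(os.stat(path).st_mode & stat.S_IROTH)


if __name__ == "__main__":
    for path in ("/etc/hosts", "/etc/services"):
        if os.path.exists(path):
            mode = oct(os.stat(path).st_mode & 0o777)  # permission bits only
            state = "ok" if is_world_readable(path) else "NOT world-readable"
            print(f"{path}: {mode} {state}")
```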

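When the usual methods fail to locate a problem, tracing the failing configuration step with an operating-system tracing tool is a very effective approach. I suggested he trace the check with tusc and see what turned up.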
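A full tusc trace of the installer is long, so it helps to narrow it to the calls that reveal which programs and files are touched. The following Python sketch (hypothetical helper `interesting_calls`) assumes each syscall line begins with the call name followed by `(`, as in the trace excerpt that follows; if your trace lines carry a prefix such as a process ID, the regular expression would need adjusting.

```python
import re

# Syscalls that usually show which programs and files a process touches.
INTERESTING = {"execve", "access", "stat", "open", "vfork", "fork"}

# Assumed line shape: the syscall name at the start of the line, then "(".
CALL_RE = re.compile(r"^\s*([a-z_][a-z0-9_]*)\(")


def interesting_calls(trace_text, names=INTERESTING):
    """Yield (syscall, line) pairs for trace lines whose syscall is in names."""
    for line in trace_text.splitlines():
        m = CALL_RE.match(line)
        if m and m.group(1) in names:
            yield m.group(1), line.strip()
```

For example, feeding it the text of a saved trace file and printing the `execve` entries lists the helper programs the check launches, which is where the suspicious call below shows up.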

The tusc output quickly revealed a suspicious spot:

stat("/usr/sbin/ping", 0x742ff5e4) .................................................................... = 0
access("/usr/sbin/ping", X_OK) ........................................................................ = 0
pipe() ................................................................................................ = 9 (10)
pipe() ................................................................................................ = 11 (12)
pipe() ................................................................................................ = 13 (14)
vfork() ............................................................ (returning as child ...) ......... , 2452
execve("/tmp/2437/jre/bin/HPUXChildWrapper", 0x1923d0, 0xf5230) ....................................... [entry]
    argv[0] @ 0x192410: "/tmp/2437/jre/bin/HPUXChildWrapper"
    argv[1] @ 0x742ff460: "9"
    argv[2] @ 0x742ff470: "12"
    argv[3] @ 0x742ff480: "14"
    argv[4] @ 0xffffffffc1be1140:
    argv[5] @ 0x742ff038: "/usr/sbin/ping"
    argv[6] @ 0x192170: "node1"
    argv[7] @ 0x192188: "-n"
    argv[8] @ 0x1921a0: "1"
    argv[9] @ 0x1921e8: "-m"
    argv[10] @ 0x192200: "3"
    env[0] @ 0x77ff00f2: "_=/tmp/2437/jre/bin/java"
    env[1] @ 0x77ff010b: "MANPATH=/usr/share/man/%L:/usr/share/man:/usr/contrib/man/%L:/usr/contrib/man:/usr/local/man/%L:/usr/local/man:/opt/mx/share/man:/opt/upgrade/share/man/%L:/opt/upgrade/share/man:/opt/pd/share/man/%L:/opt/pd/share/man:/opt/pd/share/man/%L:/opt/pd/share/man:/opt/pd/share/man/%L:/opt/pd/share/man:/opt/resmon/share/man:/opt/gnome/man:/opt/openssl/man:/opt/openssl/prngd/man:/opt/wbem/share/man:/opt/hparray/share/man/%L:/opt/hparray/share/man:/opt/graphics/common/man:/usr/dt/share/man:/opt/samba/man:/opt/perl/man:/opt/ignite/share/man/%L:/opt/ignite/share/man:/opt/ssh/share/man"
    env[2] @ 0x77ff034e: "SHLIB_PATH=/tmp/2437/jre/lib/PA_RISC2.0:/tmp/2437/jre/lib/PA_RISC2.0/server:/tmp/2437/jre/../lib/PA_RISC2.0:/tmp/2437/lib32:/tmp/2437/srvm/lib32:"
    env[3] @ 0x77ff03e0: "PATH=/usr/bin:/usr/ccs/bin:/usr/contrib/bin:/opt/hparray/bin:/opt/nettladm/bin:/opt/upgrade/bin:/opt/fcms/bin:/opt/pd/bin:/opt/resmon/bin:/opt/gnome/bin:/usr/bin/X11:/usr/contrib/bin/X11:/opt/mozilla:/opt/wbem/bin:/opt/wbem/sbin:/opt/graphics/common/bin:/usr/sbin/diag/contrib:/opt/mx/bin:/opt/perl/bin:/opt/ssh/bin:."
    env[4] @ 0x77ff051e: "COLUMNS=125"
    env[5] @ 0x77ff052a: "ORACLE_BASE=/home/oracle/base"
    env[6] @ 0x77ff0548: "EDITOR=vi"
    env[7] @ 0x77ff0552: "LOGNAME=oracle"
    env[8] @ 0x77ff0561: "CV_DESTLOC=/tmp"
    env[9] @ 0x77ff0571: "ERASE=^H"
    env[10] @ 0x77ff057a: "CV_JDKHOME=/tmp/2437/jre"
    env[11] @ 0x77ff0593: "CRS_HOME=/home/oracle/crsHome"
    env[12] @ 0x77ff05b1: "SHELL=/sbin/sh"
    env[13] @ 0x77ff05c0: "HOME=/home/oracle"
    env[14] @ 0x77ff05d2: "LD_LIBRARY_PATH=/tmp/2437/lib:"
    env[15] @ 0x77ff05f1: "TERM=vt100"
    env[16] @ 0x77ff05fc: "CV_HOME=/tmp/2437"
    env[17] @ 0x77ff060e: "PWD=/home/oracle"
    env[18] @ 0x77ff061f: "TZ=EAT-8"
    env[19] @ 0x77ff0628: "LINES=47"
    env[20] @ 0xffffffffc1be0fa8:
    env[21] @ 0xffffffffc1be0fd0:

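The trace shows that the check calls ping, so presumably the CRS installer uses the ping command to test node health. The arguments that follow show ping being invoked with the -n and -m options, so I had him run /usr/sbin/ping node1 -n 1 -m 3 by hand.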
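Reading the ping command line off the `argv[]` entries by hand is easy here, but it can also be done mechanically. This Python sketch (hypothetical helpers `exec_argv` and `wrapped_command`) assumes the `argv[N] @ address: "value"` layout seen in the trace above, and that the real command begins at the ping path, with the earlier arguments belonging to the `HPUXChildWrapper` launcher. It needs Python 3.8+ for `shlex.join`.

```python
import re
import shlex

# Matches tusc argument lines such as:  argv[6] @ 0x192170: "node1"
ARGV_RE = re.compile(r'argv\[(\d+)\] @ 0x[0-9a-fA-F]+:\s*"(.*)"')


def exec_argv(trace_text):
    """Collect the quoted argv[] values of an execve, ordered by numeric index."""
    args = {}
    for line in trace_text.splitlines():
        m = ARGV_RE.search(line)
        if m:  # unquoted (empty) entries such as argv[4] are skipped
            args[int(m.group(1))] = m.group(2)
    return [args[i] for i in sorted(args)]


def wrapped_command(trace_text, program="/usr/sbin/ping"):
    """Return a shell-quoted command line starting at `program`, dropping wrapper args."""
    argv = exec_argv(trace_text)
    return shlex.join(argv[argv.index(program):])  # ValueError if program is absent
```

Sorting by the integer index matters: a plain string sort would put `argv[10]` before `argv[2]`.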

The result made everything click: this ping had no -m option at all.

# ping node1 -n 1
PING node1: 64 byte packets
64 bytes from 192.168.1.150: icmp_seq=0. time=0. ms

----node1 PING Statistics----
1 packets transmitted, 1 packets received, 0% packet loss
round-trip (ms)  min/avg/max = 0/0/0

# ping node1 -n 1 -m 3
Usage:  ping [-oprv] [-i address] [-t ttl] host [-n count]
        ping [-oprv] [-i address] [-t ttl] host packet-size [[-n] count]
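The usage message already answers the question: -n is listed, -m is not. As a sketch of checking that mechanically, the Python code below pulls option letters out of a usage synopsis written in the bracketed style shown above and reports which of the flags seen in the trace are missing. It is a heuristic (hypothetical helpers `usage_options` and `unsupported`) that assumes single-letter options in square brackets; it is not a general usage-message parser.

```python
import re


def usage_options(usage):
    """Collect single-letter options from a bracketed usage synopsis."""
    opts = set()
    # Flags without arguments, possibly clustered: [-oprv], or the [-n] in [[-n] count]
    for m in re.finditer(r"\[-([A-Za-z]+)\]", usage):
        opts.update(m.group(1))
    # Options followed by an argument name: [-i address], [-t ttl], [-n count]
    for m in re.finditer(r"\[-([A-Za-z])\s", usage):
        opts.add(m.group(1))
    return opts


def unsupported(usage, wanted):
    """Return, sorted, the wanted option letters that the synopsis does not mention."""
    return sorted(set(wanted) - usage_options(usage))
```

Applied to the HP-UX usage text above with the flags from the trace (`"nm"`), it reports `["m"]`.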

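Now the cause was clear: most likely the HP-UX patches were incomplete. A check showed that the GOLDQPK11i patch bundle had not been installed. After installing it, /usr/sbin/ping node1 -n 1 -m 3 ran without error, and when CRS was installed again, everything went normally.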
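On HP-UX, installed bundles can be listed with `swlist -l bundle`. The Python sketch below wraps that command to check for a bundle by name. The helper names are hypothetical, and the parser assumes an output format with `#`-prefixed comment lines and one bundle per line, the bundle name being the first field; compare against the real output on your system before relying on it.

```python
import shutil
import subprocess


def bundle_installed(swlist_output, bundle):
    """True if some non-comment line of swlist output has `bundle` as its first field."""
    for line in swlist_output.splitlines():
        line = line.strip()
        if line and not line.startswith("#") and line.split()[0] == bundle:
            return True
    return False


def check_bundle(bundle="GOLDQPK11i"):
    """Run `swlist -l bundle` and report whether the bundle is installed.

    Returns None when swlist is not on PATH (i.e. not an HP-UX host).
    """
    if shutil.which("swlist") is None:
        return None
    out = subprocess.run(["swlist", "-l", "bundle"],
                         capture_output=True, text=True).stdout
    return bundle_installed(out, bundle)
```

Matching only the first field avoids false positives from a bundle whose description merely mentions the name.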

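The main lesson of this case is how to use tusc for in-depth diagnosis. Grabbing random fixes from the web as soon as a problem appears simply does not work. In troubleshooting, a sound line of reasoning matters most.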

Author: 白鳝
