Summary of Issues Installing RAC on AIX 6.1

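I recently installed a RAC cluster on AIX 6.1. I haven't done many of these installs and ran into quite a few problems, so I'm recording them here: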

OS version:

HOST_NAM_1:/#oslevel -s
6100-04-02-1007

HA version:

HOST_NAM_1:/#lslpp -l cluster.*
Fileset Level State Description
----------------------------------------------------------------------------
Path: /usr/lib/objrepos
cluster.adt.es.client.include
5.5.0.0 COMMITTED ES Client Include Files
cluster.adt.es.client.samples.clinfo
5.5.0.0 COMMITTED ES Client CLINFO Samples
cluster.adt.es.client.samples.clstat
5.5.0.1 COMMITTED ES Client Clstat Samples


### RSH error

#rsh HOST_NAM_2 date
rshd: 0826-813 Permission is denied.

Relevant configuration files:

#cat .rhosts
HOST_NAM_1 root
HOST_NAM_2 root
HOST_NAM_1 oracle
HOST_NAM_2 oracle
 
#cat /etc/hosts.equiv
HOST_NAM_1 root
HOST_NAM_2 root
HOST_NAM_1 oracle
HOST_NAM_2 oracle

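Here HOST_NAM_1 and HOST_NAM_2 are the hostnames.

The root cause is in /etc/hosts: the hostname must not appear only as an alias. Alternatively, don't put aliases in ".rhosts" and "hosts.equiv". This presumably has to do with how the client address is resolved back to a name.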

Original hosts configuration:

175.16.1.11    HOST_NAM_1_boot1  HOST_NAM_1
175.16.1.12 HOST_NAM_2_boot1 HOST_NAM_2
 
192.168.10.17 HOST_NAM_1_boot2
192.168.10.18 HOST_NAM_2_boot2
 
192.168.10.16 HOST_NAM_2_vip
192.168.10.15 HOST_NAM_1_vip

Changed to:

175.16.1.11    HOST_NAM_1_boot1  
175.16.1.12 HOST_NAM_2_boot1
 
192.168.10.17 HOST_NAM_1 HOST_NAM_1_boot2
192.168.10.18 HOST_NAM_2 HOST_NAM_2_boot2
 
192.168.10.16 HOST_NAM_2_vip
192.168.10.15 HOST_NAM_1_vip
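To check a hosts file for this condition before retrying rsh, the role a name plays on its line can be tested with awk. The sketch below is a hypothetical helper, hosts_name_role, that assumes the author's reading of the problem (the hostname should be the first name after the IP address, not a trailing alias). It only reads the file you pass it and ignores DNS or NIS lookups.

```shell
#!/bin/sh
# Sketch: report how NAME appears in an /etc/hosts-style file.
# Prints "primary" if NAME is the first name after the IP on some line,
# "alias" if it only appears as a later (alias) field, "missing" otherwise.
# Usage: hosts_name_role HOSTS_FILE NAME
hosts_name_role() {
    awk -v name="$2" '
        # skip comment lines and lines without at least IP + name
        /^[ \t]*#/ || NF < 2 { next }
        {
            if ($2 == name) { role = "primary"; exit }
            for (i = 3; i <= NF; i++)
                if ($i == name && role == "") role = "alias"
        }
        END { print (role == "" ? "missing" : role) }
    ' "$1"
}
```

For example, `hosts_name_role /etc/hosts HOST_NAM_1` should print "primary" on each node after the change above.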

### rootpre.sh error

I had seen this in the documentation before the install; I'm noting it here as a caveat:
The Oracle 10gR2 OUI and configuration assistant programs do not recognize AIX 6 V6.1 as a supported release.
Running rootpre.sh reports:

Configuring Asynchronous I/O....
 
Asynchronous I/O is not installed on this system.
 
You will need to install it, and either configure it yourself using
 
'smit aio' or rerun the Oracle root installation procedure.
 
 
 
Configuring POSIX Asynchronous I/O....
 
Posix Asynchronous I/O is not installed on this system.
 
You will need to install it, and either configure it yourself using
 
'smit aio' or rerun the Oracle root installation procedure.

Solution: download patch 6718715 and run the rootpre.sh it contains.

Reference: note 282036.1

### VIPCA error

During VIPCA the VIP would not come up. The log shows:

Interface en4 checked failed (host=HOST_NAM_1)
Invalid parameters, or failed to bring up VIP (host=HOST_NAM_1)

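Cause: the VIP is bound to the server's integrated adapter, a Logical Host Ethernet Port (lp-hea).
The entstat output for an LHEA differs from that of a regular adapter, so racgvip's link check fails.

Solution:
Edit the racgvip script, find the line
$ENTSTAT -d $_IF and change it to: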

$ENTSTAT -d $_IF | $GREP -iEq '.*lan.*state.*:.*operational.*|.*link.*status.*:.*up.*|.*port.*operational.*state.*:.*up.*|.*driver.*flags.*:.*up.*'

Reference: note 959746.1
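The modified line accepts any of four "link is up" wordings, case-insensitively. A minimal sketch that wraps the same regular expression in a function reading entstat-style text on stdin, so the pattern can be tried against saved output before editing racgvip. The function name link_is_up and the sample lines in the usage note are hypothetical, not real LHEA output.

```shell
#!/bin/sh
# Sketch: the link-state test from the modified racgvip line, reading
# entstat-style text on stdin instead of running $ENTSTAT directly.
# Returns 0 if any "link up" pattern matches, 1 otherwise.
link_is_up() {
    grep -iEq '.*lan.*state.*:.*operational.*|.*link.*status.*:.*up.*|.*port.*operational.*state.*:.*up.*|.*driver.*flags.*:.*up.*'
}
```

On the node itself, something like `entstat -d en4 | link_is_up && echo up` would exercise the pattern against the real adapter.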

### ONS fails to start

Log errors:

Failed to get IP for localhost (0)
Failed to get IP for localhost (0)
Failed to get IP for localhost (0)
onsctl: ons failed to start

Solution:
The original hosts file had no entry for localhost:

127.0.0.1    loopback

Changed to:

127.0.0.1    loopback localhost
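A quick pre-install check for this case can be scripted. The hypothetical helper below, has_localhost, assumes the fix is simply that some line maps 127.0.0.1 to the name localhost (as primary name or alias), which is what the change above achieves.

```shell
#!/bin/sh
# Sketch: succeed if a hosts-style file maps 127.0.0.1 to "localhost".
# Usage: has_localhost [HOSTS_FILE]   (defaults to /etc/hosts)
has_localhost() {
    awk '
        /^[ \t]*#/ { next }                      # skip comments
        $1 == "127.0.0.1" {
            for (i = 2; i <= NF; i++)
                if ($i == "localhost") found = 1
        }
        END { exit(found ? 0 : 1) }
    ' "${1:-/etc/hosts}"
}
```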

### Error when upgrading CRS to 10.2.0.4

After the upgrade, running root102.sh reports:

# ./root102.sh
Error : Please change the CRS_ORACLE_USER id oracle
to have the following OS capabilities :
< CAP_PROPAGATE CAP_BYPASS_RAC_VMM CAP_NUMA_ATTACH >

Solution:

#chuser capabilities=CAP_BYPASS_RAC_VMM,CAP_PROPAGATE,CAP_NUMA_ATTACH oracle
#lsuser -f oracle | grep capabilities
capabilities=CAP_BYPASS_RAC_VMM,CAP_PROPAGATE,CAP_NUMA_ATTACH

I've hit this error before, and it is also mentioned in the upgrade documentation.
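Because the order of entries in the capabilities attribute can vary, a check that looks for each required capability by name is more robust than comparing the whole string. A minimal sketch, assuming the input is the capabilities=... line printed by the lsuser command above; the function name missing_caps is hypothetical.

```shell
#!/bin/sh
# Sketch: check a "capabilities=..." line (e.g. from
# `lsuser -f oracle | grep capabilities`) for the three capabilities
# root102.sh asks for. Prints the missing ones and returns 1 if any are absent.
missing_caps() {
    caps_list=${1#*=}                 # drop everything up to the first "="
    missing=""
    for cap in CAP_PROPAGATE CAP_BYPASS_RAC_VMM CAP_NUMA_ATTACH; do
        case ",$caps_list," in
            *",$cap,"*) ;;            # present
            *) missing="$missing $cap" ;;
        esac
    done
    if [ -n "$missing" ]; then
        echo "missing:$missing"
        return 1
    fi
    return 0
}
```

Usage: `missing_caps "$(lsuser -f oracle | grep capabilities)" && echo ok` before rerunning root102.sh.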

### VIP fails to start after upgrading CRS to 10.2.0.4

This one was tricky. The log had very little information, just one line:

Invalid parameters, or failed to bring up VIP (host=HOST_NAM_1)

I then used crsctl to enable debug logging on the VIP resource to collect more information:

#crsctl debug log res "ora.host_nam_2.vip:5" 
Set Resource Debug Module: ora.host_nam_2.vip Level: 5
#srvctl start nodeapps -n host_nam_2
CRS-0233: Resource or relatives are currently involved with another operation.
host_nam_2:ora.host_nam_2.vip:Thu Mar 25 13:57:25 GMT+08:00 2010 [ 360824 ] Checking interface existance
host_nam_2:ora.host_nam_2.vip:Thu Mar 25 13:57:25 GMT+08:00 2010 [ 360824 ] Calling getifbyip
host_nam_2:ora.host_nam_2.vip:Thu Mar 25 13:57:25 GMT+08:00 2010 [ 360824 ] getifbyip: started for 192.168.10.16
host_nam_2:ora.host_nam_2.vip:Thu Mar 25 13:57:26 GMT+08:00 2010 [ 360824 ] getifbyip: checking if failover is happening ()
host_nam_2:ora.host_nam_2.vip:Thu Mar 25 13:57:26 GMT+08:00 2010 [ 360824 ] getifbyip: failover is not happening ()
host_nam_2:ora.host_nam_2.vip:Thu Mar 25 13:57:26 GMT+08:00 2010 [ 360824 ] Completed getifbyip
host_nam_2:ora.host_nam_2.vip:Thu Mar 25 13:57:26 GMT+08:00 2010 [ 360824 ] ping_vip 192.168.10.16 started
host_nam_2:ora.host_nam_2.vip:Thu Mar 25 13:57:26 GMT+08:00 2010 [ 360824 ] About to execute : /usr/sbin/ping -c 1 -w 1 192.168.10.16
host_nam_2:ora.host_nam_2.vip:Thu Mar 25 13:57:27 GMT+08:00 2010 [ 360824 ] ping_vip: 192.168.10.16 is not pingable, _count = 1
host_nam_2:ora.host_nam_2.vip:Thu Mar 25 13:57:27 GMT+08:00 2010 [ 360824 ] Completed with initial interface test
host_nam_2:ora.host_nam_2.vip:Thu Mar 25 13:57:27 GMT+08:00 2010 [ 360824 ] Broadcast = 192.168.10.255
host_nam_2:ora.host_nam_2.vip:Thu Mar 25 13:57:27 GMT+08:00 2010 [ 360824 ] Interface tests
host_nam_2:ora.host_nam_2.vip:Thu Mar 25 13:57:27 GMT+08:00 2010 [ 360824 ] checkIf: start for if=en4
host_nam_2:ora.host_nam_2.vip:Thu Mar 25 13:57:27 GMT+08:00 2010 [ 360824 ] IsIfAlive: start for if=en4
host_nam_2:ora.host_nam_2.vip:Thu Mar 25 13:57:27 GMT+08:00 2010 [ 360824 ] defaultgw: started
host_nam_2:ora.host_nam_2.vip:Thu Mar 25 13:57:27 GMT+08:00 2010 [ 360824 ] defaultgw: completed with 192.168.10.254
host_nam_2:ora.host_nam_2.vip:Thu Mar 25 13:57:27 GMT+08:00 2010 [ 360824 ] About to execute command: /usr/sbin/ping -S 192.168.10.18 -c 1 -w 1 192.168.10.254
host_nam_2:ora.host_nam_2.vip:Thu Mar 25 13:57:28 GMT+08:00 2010 [ 360824 ] About to execute command: /usr/sbin/ping -S 192.168.10.18 -c 1 -w 1 192.168.10.254
host_nam_2:ora.host_nam_2.vip:Thu Mar 25 13:57:29 GMT+08:00 2010 [ 360824 ] IsIfAlive: RX packets checked if=en4 failed
host_nam_2:ora.host_nam_2.vip:Thu Mar 25 13:57:29 GMT+08:00 2010 [ 360824 ] Interface en4 checked failed (host=HOST_NAM_2)
host_nam_2:ora.host_nam_2.vip:Thu Mar 25 13:57:29 GMT+08:00 2010 [ 360824 ] IsIfAlive: end for if=en4
host_nam_2:ora.host_nam_2.vip:Thu Mar 25 13:57:29 GMT+08:00 2010 [ 360824 ] checkIf: end for if=en4
host_nam_2:ora.host_nam_2.vip:Invalid parameters, or failed to bring up VIP (host=HOST_NAM_2)
host_nam_2:ora.host_nam_2.vip:Thu Mar 25 13:57:30 GMT+08:00 2010 [ 307376 ] Checking interface existance
host_nam_2:ora.host_nam_2.vip:Thu Mar 25 13:57:30 GMT+08:00 2010 [ 307376 ] Calling getifbyip
host_nam_2:ora.host_nam_2.vip:Thu Mar 25 13:57:30 GMT+08:00 2010 [ 307376 ] getifbyip: started for 192.168.10.16
host_nam_2:ora.host_nam_2.vip:Thu Mar 25 13:57:30 GMT+08:00 2010 [ 307376 ] getifbyip: checking if failover is happening ()
host_nam_2:ora.host_nam_2.vip:Thu Mar 25 13:57:30 GMT+08:00 2010 [ 307376 ] getifbyip: failover is not happening ()
host_nam_2:ora.host_nam_2.vip:Thu Mar 25 13:57:30 GMT+08:00 2010 [ 307376 ] Completed getifbyip
host_nam_2:ora.host_nam_2.vip:Thu Mar 25 13:57:30 GMT+08:00 2010 [ 307376 ] ping_vip 192.168.10.16 started
host_nam_2:ora.host_nam_2.vip:Thu Mar 25 13:57:30 GMT+08:00 2010 [ 307376 ] About to execute : /usr/sbin/ping -c 1 -w 1 192.168.10.16
host_nam_2:ora.host_nam_2.vip:Thu Mar 25 13:57:31 GMT+08:00 2010 [ 307376 ] ping_vip: 192.168.10.16 is not pingable, _count = 1
host_nam_2:ora.host_nam_2.vip:Thu Mar 25 13:57:31 GMT+08:00 2010 [ 307376 ] Completed with initial interface test
host_nam_2:ora.host_nam_2.vip:Thu Mar 25 13:57:31 GMT+08:00 2010 [ 307376 ] Broadcast = 192.168.10.255
CRS-1006: No more members to consider
CRS-0215: Could not start resource 'ora.host_nam_2.vip'.
CRS-0210: Could not find resource ora.host_nam_2.LISTENER_HOST_NAM_2.lsnr.

Only then did a search turn up the relevant information:

Bug 8413088: VIP CANNOT START ON AIX 6.1 BECAUSE NETSTAT HAS A NEW COLUMN.
Bug 9157855: DURING RESTART OR WHEN ONE OF THE TWO NODE CLUSTER IS DOWN, VIP RESOURCE FAILS

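This problem can also appear after applying the CRS PSU. Bug 8413088 says netstat on AIX 6.1 has a new column, so a fixed field number such as $5 presumably no longer points at the packet counter. It can be fixed by editing the racgvip script: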

10.2.0.4:

_O1=`$NETSTAT -n -I $_IF | $AWK "{ if (/^$_IF/) {print \\$5; exit}}"`
_O2=`$NETSTAT -n -I $_IF | $AWK "{ if (/^$_IF/) {print \\$5; exit}}"`

After the PSU:

_O1=`$NETSTAT -n -I $_IF -p tcp | $GREP -iE ".*packets received$" | $AWK "{print \\$1; exit}"`
_O2=`$NETSTAT -n -I $_IF -p tcp | $GREP -iE ".*packets received$" | $AWK "{print \\$1; exit}"`

Final version:

_O1=`$NETSTAT -n -I $_IF -p ip | $GREP -iE ".*packets received$" | $AWK "{print \\$1; exit}"`
_O2=`$NETSTAT -n -I $_IF -p ip | $GREP -iE ".*packets received$" | $AWK "{print \\$1; exit}"`
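The final version no longer depends on column position. It reads the labelled "packets received" line from the IP protocol statistics. Judging by the debug log (two gateway pings followed by "IsIfAlive: RX packets checked if=en4 failed"), racgvip samples this counter before and after pinging and treats an unchanged value as a dead interface. The sketch below assumes that comparison. The function names rx_packets and rx_increased and the sample text in the test are hypothetical; the real comparison logic inside racgvip is not copied here.

```shell
#!/bin/sh
# Sketch: extract the first "packets received" counter from
# `netstat -p ip`-style text on stdin, as the modified _O1/_O2 lines do.
rx_packets() {
    grep -iE '.*packets received$' | awk '{print $1; exit}'
}

# Assumed liveness rule: the interface counts as alive if the counter
# grew between the two samples ($1 = before, $2 = after).
rx_increased() {
    [ "$2" -gt "$1" ]
}
```

On AIX, `netstat -n -I en4 -p ip | rx_packets` run twice around a ping of the gateway would show whether the counter moves.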

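This one was painful and cost a lot of time. While reading the documentation before the install I had noticed this bug, and both times the VIP failed to start I compared the symptoms against it to see whether this was the cause. I still didn't spot it until the debug output revealed it.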

### Upgrading the RDBMS to 10.2.0.4: installer reports a java process still running

Oracle Universal Installer has detected that
there are processes running in the
currently selected Oracle Home. The
following processes need to be shutdown
before continuing:
java

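At first there were also CRS processes. After stopping CRS, one java process still blocked the installer. I used fuser to find every process using the $ORACLE_HOME directory and killed them all, and also killed every process listed by ps -ef | grep java except the installer's own, but it still didn't help.

Eventually I found the solution on Metalink. Before the upgrade: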

cd /usr/sbin/
mv fuser fuser.orig
touch /usr/sbin/fuser
chmod +x /usr/sbin/fuser

After the upgrade completes, restore it:

cd /usr/sbin/
cp fuser.orig fuser

A sneaky trick indeed...

Reference: note 975597.1
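The trick presumably works because OUI calls fuser on the Oracle home, and an empty executable prints nothing, so no process appears to be using it. The sketch below wraps both steps as functions that take the directory as a parameter (on the real system it would be /usr/sbin, run as root). The function names are hypothetical. Unlike the steps above, restore_fuser uses mv rather than cp, so the original binary comes back with its original permissions and no fuser.orig is left behind.

```shell
#!/bin/sh
# Sketch of the fuser workaround, parameterised on DIR (normally /usr/sbin).

# stub_fuser DIR: move the real fuser aside and put an empty executable in its place.
stub_fuser() {
    dir=$1
    if [ -e "$dir/fuser.orig" ]; then
        echo "fuser.orig already exists; not overwriting it" >&2
        return 1
    fi
    mv "$dir/fuser" "$dir/fuser.orig" || return 1
    : > "$dir/fuser"          # empty file: running it prints nothing
    chmod +x "$dir/fuser"
}

# restore_fuser DIR: put the real fuser back once the upgrade has finished.
restore_fuser() {
    dir=$1
    if [ ! -e "$dir/fuser.orig" ]; then
        echo "no fuser.orig to restore" >&2
        return 1
    fi
    mv "$dir/fuser.orig" "$dir/fuser"   # mv keeps the original file's mode
}
```

Usage: `stub_fuser /usr/sbin` before starting the 10.2.0.4 installer, then `restore_fuser /usr/sbin` once it finishes.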

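### Error starting CRS after applying the PSU to the database

After both CRS and the database had been upgraded and patched, starting CRS, the VIPs, and other resources failed with: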

HOST_NAM_1:/#crsctl start crs
exec(): 0509-036 Cannot load program /app/oracle/product/10204/db_1/bin/crsctl.bin because of the following errors:
0509-150 Dependent module libhasgen10.a(shr_hasgen10.o) could not be loaded.
0509-022 Cannot load module libhasgen10.a(shr_hasgen10.o).
0509-026 System error: A file or directory in the path name does not exist.
 
HOST_NAM_1:/app/oracle/product/10204/db_1/bin#./srvctl stop nodeapps -n host_nam_1
./srvctl[187]: %s_jreLocation%/bin/java: not found.
 
HOST_NAM_1:/app/oracle/product/10204/crs_1/lib#srvctl stop nodeapps -n host_nam_1
/app/oracle/product/10204/db_1/bin/srvctl[187]: %s_jreLocation%/bin/java: not found.

Solution: change the environment variables so that these commands run from the crs_/bin/ directory instead of the database home.
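One way to do this, assuming PATH is the variable at fault, is to put the Clusterware home's bin directory ahead of the database home's. In the sketch below, CRS_HOME and prepend_path are assumptions: the default value is only the crs_1 path that appears in the error output above, so set CRS_HOME to your own Clusterware home.

```shell
#!/bin/sh
# Sketch: make the Clusterware (CRS) home's bin directory come first in PATH,
# so crsctl/srvctl resolve there rather than in the RDBMS home.
# Assumption: CRS_HOME is the Clusterware home; the default is illustrative.
CRS_HOME=${CRS_HOME:-/app/oracle/product/10204/crs_1}

# prepend_path DIR: put DIR at the front of PATH, dropping any existing copy of it.
prepend_path() {
    new=$1
    result=$new
    old_ifs=$IFS
    IFS=:
    for p in $PATH; do
        [ "$p" = "$new" ] || result="$result:$p"
    done
    IFS=$old_ifs
    PATH=$result
    export PATH
}
```

Usage: `prepend_path "$CRS_HOME/bin"`, then `which crsctl` should point into the CRS home.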

### DBCA error

During DBCA, the instance creation step failed with:

ORA-27504: IPC error creating OSD context
ORA-27300: OS system dependent operation:sendmsg failed with status: 59
ORA-27301: OS failure message: Message too long
ORA-27302: failure occurred at: sskgxpsnd1
ORA-27303: additional information: MTU verification failed to send msg

原因:The problem was caused by incorrect UDP and TCP packet settings.

Solution: change the following parameters:

no -o tcp_sendspace=262144
no -o tcp_recvspace=262144
no -o udp_sendspace=65536
no -o udp_recvspace=262144
no -o rfc1323=1

The previous values were too small:

tcp_sendspace 131072 
tcp_recvspace 131072

Use no -a to view the current parameter settings.

Reference: note 300956.1
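no -a prints one "name = value" line per tunable, so its output can be checked against the values above. The sketch below is a hypothetical checker, check_no_params, that reads that output on stdin and reports any of the five parameters that is lower than the value set here or missing. Note that no -o changes only the running value; on AIX 5.3 and later, adding -p also records it in /etc/tunables/nextboot so it survives a reboot.

```shell
#!/bin/sh
# Sketch: compare `no -a` style "name = value" lines on stdin with the
# values set above; print each parameter that is below target or absent.
check_no_params() {
    awk -F'=' '
        BEGIN {
            want["tcp_sendspace"] = 262144
            want["tcp_recvspace"] = 262144
            want["udp_sendspace"] = 65536
            want["udp_recvspace"] = 262144
            want["rfc1323"] = 1
        }
        {
            name = $1; gsub(/[ \t]/, "", name)   # strip padding around the name
            val = $2;  gsub(/[ \t]/, "", val)
            if (name in want) {
                seen[name] = 1
                if (val + 0 < want[name]) print name " " val " < " want[name]
            }
        }
        END { for (n in want) if (!(n in seen)) print n " not found" }
    '
}
```

Usage on AIX: `no -a | check_no_params`; no output means all five parameters are at least the values used here.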

— The End —

From the ITPUB blog, link: http://blog.itpub.net/777981/viewspace-670607/. Please credit the source when reposting; otherwise legal action may be taken.
