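A rant: PowerHA 7.1 relies on CAA (Cluster Aware AIX) for cluster management and configuration, and it has far too many bugs. All I wanted was to test the cl_rsh command, and I got stuck at the very first step: defining the cluster.
AIX version: 6100-08-02-1316; HA version: 7.1.2.0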
    ERROR: failed to create the cluster according to the provided specifications.
    Running the delete code to attempt to ensure that no partial instance of the cluster remains...
    Host not found, try again.
    Warning: could not resolve "test42" to an FQDN and an alias. Might be a private, non-resolvable label ("test42").
    Warning: a valid entry is missing from the /etc/cluster/rhosts file. A boot IP address or fully qualified host name for each node must be entered in that file on all nodes in the cluster.
    Please consider adding either "test42" or "192.168.3.42" to all the /etc/cluster/rhosts files in your cluster, then restart clcomd on each node. For example:
    echo "192.168.3.42" >>/etc/cluster/rhosts
    stopsrc -s clcomd; sleep 2; startsrc -s clcomd
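After some digging I found that the real problem is FQDN (Fully Qualified Domain Name) resolution. clcomd, the daemon CAA depends on, requires the host name to be resolved back to an FQDN. It does this with the command host -n <hostname>. What on earth is that? Running host with no arguments prints the usage: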
    Usage: host [-n] HostName
    The -n uses a newer version of the host command
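The usage text only says "newer version" and never explains what it adds. Tracing it with truss showed that it calls /usr/bin/hostnew. That hostnew must read /etc/resolv.conf, so it is essentially a newer nslookup.
With an empty /etc/resolv.conf, host -n just waits until it times out and then fails. Setting the lookup order in /etc/netsvc.conf did not help either, although that setting does work for the original host command.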
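For reference, the lookup order in /etc/netsvc.conf is set with a hosts line like the one below. It tells the resolver to try /etc/hosts (local) first and then DNS (bind). This shows the standard AIX syntax and is not necessarily the exact line I used. As described above, it changes what the original host command does but has no effect on host -n.

```
# /etc/netsvc.conf: resolve via /etc/hosts first, then DNS
hosts = local, bind
```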
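In the end I renamed the original binary to /usr/bin/host.orig and put a small ksh wrapper in its place as /usr/bin/host. The wrapper simply drops the -n flag:

    #!/bin/ksh
    # Wrapper installed as /usr/bin/host; the real binary was renamed to /usr/bin/host.orig.
    # Drop a leading -n so the lookup goes through the original host command,
    # which honors /etc/netsvc.conf instead of requiring /etc/resolv.conf.
    if [ "$1" = "-n" ]; then
        shift
    fi
    exec /usr/bin/host.orig "$@"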
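Below is a minimal sketch of putting the wrapper in place. It assumes the script has been saved as host.wrapper, which is a hypothetical file name. The function install_host_wrapper is also a hypothetical helper. Its target directory defaults to /usr/bin, so the steps can be rehearsed in a scratch directory first. The check on host.orig keeps a second run from overwriting the real binary with the wrapper. On a real node it must run as root, on every node.

```shell
#!/bin/ksh
# Hypothetical helper: replace <dir>/host with the -n-stripping wrapper,
# keeping the real binary as <dir>/host.orig.
#   $1 = target directory (default /usr/bin)
#   $2 = path to the saved wrapper script
install_host_wrapper() {
    dir=${1:-/usr/bin}
    wrapper=$2
    # Move the real binary aside only once; on a re-run host.orig already
    # exists and must not be replaced by the wrapper.
    if [ ! -e "$dir/host.orig" ]; then
        mv "$dir/host" "$dir/host.orig" || return 1
    fi
    cp "$wrapper" "$dir/host" || return 1
    chmod 755 "$dir/host"
}
```

On a node this would be invoked as install_host_wrapper /usr/bin ./host.wrapper. After that, host and host -n should give the same answer for each node name.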
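Then restart clcomd (note: clcomd, not clcomdES), e.g. stopsrc -s clcomd; startsrc -s clcomd. That finally did it, and the cluster definition went through.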
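[By the way] Once the authorized nodes are listed in /etc/cluster/rhosts, clrsh works just like rsh. It runs commands on remote nodes as root without a password, which opens up a lot of possibilities.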
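The error message suggests adding nodes to /etc/cluster/rhosts with a plain echo >>. Repeating that on every node can easily leave duplicate lines. Below is a minimal sketch of an idempotent version. The function add_rhosts_entry is a hypothetical helper, and each entry is assumed to be a boot IP address or an FQDN on its own line, as the error message describes.

```shell
#!/bin/ksh
# Hypothetical helper: append a node entry (boot IP or FQDN) to a
# rhosts-style file only if that exact line is not already present.
#   $1 = file (e.g. /etc/cluster/rhosts)
#   $2 = entry (e.g. 192.168.3.42)
add_rhosts_entry() {
    file=$1
    entry=$2
    # -F: match literally (dots in IPs are not regex wildcards)
    # -x: match the whole line, -q: no output, exit status only
    if [ -f "$file" ] && grep -qxF -- "$entry" "$file"; then
        return 0
    fi
    echo "$entry" >> "$file"
}
```

Run it on every node for every cluster member. Then restart clcomd on each node, as the error message says.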