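This environment was built for a customer: a fairly small application deployed on Tomcat. The original plan was an HA setup with manual switchover, but the customer found that too cumbersome and insisted on automatic failover. Since the system runs on Red Hat, RHCS was the first thing that came to mind, but I ran into many problems while building the RHCS cluster. First there was the issue described at http://www.itlaowu.com/post/100.html, where the IP resource reported a conflict and would not take effect; now adding the Tomcat startup script as a resource fails.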
During repeated testing I ran into the following log entries:
Oct 25 02:36:19 redhat52 kernel: dlm: Using TCP for communications
Oct 25 02:36:19 redhat52 clurgmgrd[15723]: <notice> Resource Group Manager Starting
Oct 25 02:36:20 redhat52 clurgmgrd[15723]: <notice> stop on script "tomcat.sh" returned 5 (program not installed)
Oct 25 02:36:26 redhat52 clurgmgrd[15723]: <notice> Starting stopped service service:hatest
Oct 25 02:36:26 redhat52 clurgmgrd[15723]: <notice> Starting stopped service service:channel
Oct 25 02:36:26 redhat52 clurgmgrd[15723]: <notice> start on script "tomcat.sh" returned 5 (program not installed)
Oct 25 02:36:26 redhat52 clurgmgrd[15723]: <warning> #68: Failed to start service:channel; return value: 1
Oct 25 02:36:26 redhat52 clurgmgrd[15723]: <notice> Stopping service service:channel
Oct 25 02:36:26 redhat52 clurgmgrd[15723]: <notice> stop on script "tomcat.sh" returned 5 (program not installed)
Oct 25 02:36:26 redhat52 clurgmgrd[15723]: <crit> #12: RG service:channel failed to stop; intervention required
Oct 25 02:36:26 redhat52 clurgmgrd[15723]: <notice> Service service:channel is failed
Oct 25 02:36:26 redhat52 clurgmgrd[15723]: <crit> #13: Service service:channel failed to stop cleanly
Oct 25 02:36:28 redhat52 avahi-daemon[3290]: Registering new address record for 192.168.183.111 on bond0.
Oct 25 02:36:29 redhat52 clurgmgrd[15723]: <notice> Service service:hatest started

Oct 25 03:41:24 redhat52 clurgmgrd[15723]: <notice> Starting disabled service service:channel
Oct 25 03:41:24 redhat52 clurgmgrd: [15723]: <err> script:tomcat: start of /app/tomcat/tomcat5528/tomcat5528/bin/tomcat.sh failed (returned 127)
Oct 25 03:41:24 redhat52 clurgmgrd[15723]: <notice> start on script "tomcat" returned 1 (generic error)
Oct 25 03:41:25 redhat52 clurgmgrd[15723]: <warning> #68: Failed to start service:channel; return value: 1
Oct 25 03:41:25 redhat52 clurgmgrd[15723]: <notice> Stopping service service:channel
Oct 25 03:41:25 redhat52 clurgmgrd: [15723]: <err> script:tomcat: stop of /app/tomcat/tomcat5528/tomcat5528/bin/tomcat.sh failed (returned 127)
Oct 25 03:41:25 redhat52 clurgmgrd[15723]: <notice> stop on script "tomcat" returned 1 (generic error)
Oct 25 03:41:25 redhat52 clurgmgrd[15723]: <crit> #12: RG service:channel failed to stop; intervention required
Oct 25 03:41:25 redhat52 clurgmgrd[15723]: <notice> Service service:channel is failed
Oct 25 03:41:25 redhat52 clurgmgrd[15723]: <crit> #13: Service service:channel failed to stop cleanly

Oct 25 03:47:06 redhat52 clurgmgrd[15723]: <err> #43: Service service:channel has failed; can not start.
Oct 25 03:47:06 redhat52 clurgmgrd[15723]: <crit> #13: Service service:channel failed to stop cleanly
Oct 25 03:47:11 redhat52 clurgmgrd[15723]: <err> #43: Service service:channel has failed; can not start.
Oct 25 03:47:11 redhat52 clurgmgrd[15723]: <crit> #13: Service service:channel failed to stop cleanly
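The exit codes in these logs are the first clue. They follow the OCF convention (the log itself labels 5 as "program not installed" and 1 as "generic error"; in the same convention 4 means insufficient permissions), while 127 is the shell's own status for a command that could not be found, which can be the script path itself or a command the script calls. The function below is a hypothetical pre-flight check (check_rg_script is a made-up name, not an RHCS tool) that rules out the simplest causes before a script is referenced from cluster.conf:

```shell
#!/bin/sh
# Hypothetical pre-flight check for an rgmanager <script> resource.
# Return codes loosely mirror the OCF values seen in the log:
#   5 = file not found, 4 = not executable,
#   1 = interpreter named on the "#!" line is missing, 0 = no problem found.
check_rg_script() {
    f=$1
    if [ ! -f "$f" ]; then
        echo "missing: $f"
        return 5
    fi
    if [ ! -x "$f" ]; then
        echo "not executable: $f"
        return 4
    fi
    # Extract the interpreter path from a "#!" first line, if there is one
    interp=$(head -n 1 "$f" | sed -n 's/^#![[:space:]]*\([^[:space:]]*\).*/\1/p')
    if [ -n "$interp" ] && [ ! -x "$interp" ]; then
        echo "interpreter not found: $interp"
        return 1
    fi
    echo "ok: $f"
    return 0
}
```

Run it on both nodes against the path given in cluster.conf, for example `check_rg_script /app/tomcat/tomcat5528/tomcat5528/bin/qidong.sh`. It only covers the file itself; a 127 raised by a command inside the script needs the script to be run by hand.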
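I searched online without finding a good solution, so I reworked a few scripts myself; once they were added as a resource, the service started normally. qidong.sh is the wrapper the cluster calls, kaishi.sh starts Tomcat through jsvc, and guanbi.sh stops it (the names are pinyin for "start up", "begin" and "shut down"). The scripts are as follows: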
more /app/tomcat/tomcat5528/tomcat5528/bin/qidong.sh
#!/bin/sh
# Startup script for Tomcat

JAVA_HOME=/usr/java/jdk1.6.0_13
export JAVA_HOME
CATALINA_HOME=/app/tomcat/tomcat5528/tomcat5528
TOMCAT_USER=tomcat
export CATALINA_HOME
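start_tomcat=$CATALINA_HOME/bin/kaishi.sh
stop_tomcat=$CATALINA_HOME/bin/guanbi.sh
# Same pid file that kaishi.sh and guanbi.sh pass to jsvc
PID_FILE=$CATALINA_HOME/bin/jsvc.pid
RETVAL=0

start() {
echo -n "Starting tomcat: "
${start_tomcat}
RETVAL=$?
# Report the real result; the cluster only looks at the exit code
if [ $RETVAL -eq 0 ]; then echo "tomcat start ok."; else echo "tomcat start failed."; fi
}
stop() {
echo -n "Shutting down tomcat: "
# rgmanager also runs stop on a service that is not running; that must succeed
if [ ! -f "$PID_FILE" ] || ! kill -0 "$(cat "$PID_FILE")" 2>/dev/null; then
echo "tomcat not running."
rm -f "$PID_FILE"
RETVAL=0
return
fi
${stop_tomcat}
RETVAL=$?
if [ $RETVAL -eq 0 ]; then echo "tomcat stop ok."; else echo "tomcat stop failed."; fi
}
# rgmanager polls "status": 0 means running, non-zero triggers recovery
status() {
if [ -f "$PID_FILE" ] && kill -0 "$(cat "$PID_FILE")" 2>/dev/null; then
echo "tomcat is running."
RETVAL=0
else
echo "tomcat is stopped."
RETVAL=3
fi
}

# See how we were called

case "$1" in
start)
start
;;
stop)
stop
;;
restart)
stop
sleep 10
start
;;
status)
status
;;
*)
echo "Usage: $0 {start|stop|restart|status}"
RETVAL=2
esac

exit $RETVAL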
#
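Before adding a wrapper like this to cluster.conf, it helps to call it the way the cluster will: rgmanager runs it with start, stop and status and judges the result by the exit code alone. The helper below is a hypothetical sketch (run_like_cluster is not an RHCS tool). It runs each action from / with an almost empty environment, which is an assumption about a daemon's context rather than a copy of rgmanager's exact environment, and prints each exit code:

```shell
#!/bin/sh
# Hypothetical helper: run a resource script from / with a minimal
# environment and print each action's exit code.
# The script path must be absolute, as it is in cluster.conf.
run_like_cluster() {
    f=$1
    shift
    for action in "$@"; do
        rc=0
        # env -i drops the login environment (JAVA_HOME, custom PATH, ...)
        (cd / && env -i PATH=/sbin:/bin:/usr/sbin:/usr/bin "$f" "$action") \
            >/dev/null 2>&1 || rc=$?
        echo "$action -> $rc"
    done
}
```

For example, `run_like_cluster /app/tomcat/tomcat5528/tomcat5528/bin/qidong.sh status start status stop status`. A script that works from an interactive shell but returns 127 here is probably relying on something from the login environment or the current directory.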
Contents of kaishi.sh:
more /app/tomcat/tomcat5528/tomcat5528/bin/kaishi.sh
#!/bin/sh
JAVA_HOME=/usr/java/jdk1.6.0_13
CD=/app/tomcat/tomcat5528/tomcat5528/bin
CATALINA_HOME=${CD}/..
DAEMON_HOME=${CD}
TOMCAT_USER=tomcat

# for multi instances adapt those lines.
TMP_DIR=/tmp
PID_FILE=${CD}/jsvc.pid
CATALINA_BASE=${CD}/..

#CATALINA_OPTS="-Djava.library.path=/home/jfclere/jakarta-tomcat-connectors/jni/native/.libs"
CLASSPATH=\
$JAVA_HOME/lib/tools.jar:\
$CATALINA_HOME/bin/commons-daemon.jar:\
$CATALINA_HOME/bin/bootstrap.jar

#
# Start Tomcat
#
$DAEMON_HOME/jsvc \
-user $TOMCAT_USER \
-home $JAVA_HOME \
-Dcatalina.home=$CATALINA_HOME \
-Dcatalina.base=$CATALINA_BASE \
-Djava.io.tmpdir=$TMP_DIR \
-wait 10 \
-pidfile $PID_FILE \
-outfile $CATALINA_HOME/logs/catalina.out \
-errfile '&1' \
$CATALINA_OPTS \
-cp $CLASSPATH \
org.apache.catalina.startup.Bootstrap
#
# To get a verbose JVM
#-verbose \
# To get a debug of jsvc.
#-debug \
exit $?
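kaishi.sh assumes several things exist on whichever node it runs on: the jsvc binary in bin/, the three jars on its classpath, the tomcat account and the logs directory used by -outfile. cluster.conf defines no shared storage, so presumably each node has its own copy of /app/tomcat, and a missing piece on the standby node would only surface at failover time. The function below is a hypothetical check (its name and argument order are made up) to run on each node; it returns the number of problems found:

```shell
#!/bin/sh
# Hypothetical check of what kaishi.sh needs on the local node.
# Arguments: JAVA_HOME, CATALINA_HOME, Tomcat user. Returns the problem count.
check_jsvc_prereqs() {
    java_home=$1
    catalina_home=$2
    user=$3
    missing=0
    # jsvc is a native binary built from commons-daemon; it must be executable
    if [ ! -x "$catalina_home/bin/jsvc" ]; then
        echo "jsvc not executable: $catalina_home/bin/jsvc"
        missing=$((missing + 1))
    fi
    # The jars kaishi.sh puts on jsvc's -cp
    for jar in "$java_home/lib/tools.jar" \
               "$catalina_home/bin/commons-daemon.jar" \
               "$catalina_home/bin/bootstrap.jar"; do
        if [ ! -f "$jar" ]; then
            echo "missing jar: $jar"
            missing=$((missing + 1))
        fi
    done
    # jsvc -user runs Tomcat under this account
    if ! id "$user" >/dev/null 2>&1; then
        echo "no such user: $user"
        missing=$((missing + 1))
    fi
    # -outfile writes to logs/catalina.out
    if [ ! -d "$catalina_home/logs" ]; then
        echo "missing directory: $catalina_home/logs"
        missing=$((missing + 1))
    fi
    return $missing
}
```

For example: `check_jsvc_prereqs /usr/java/jdk1.6.0_13 /app/tomcat/tomcat5528/tomcat5528 tomcat && echo ready`.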

Contents of guanbi.sh:
more /app/tomcat/tomcat5528/tomcat5528/bin/guanbi.sh
#!/bin/sh
JAVA_HOME=/usr/java/jdk1.6.0_13
CD=/app/tomcat/tomcat5528/tomcat5528/bin
CATALINA_HOME=${CD}/..
DAEMON_HOME=${CD}
TOMCAT_USER=tomcat

# for multi instances adapt those lines.
TMP_DIR=/tmp
PID_FILE=${CD}/jsvc.pid
CATALINA_BASE=${CD}/..

#CATALINA_OPTS="-Djava.library.path=/home/jfclere/jakarta-tomcat-connectors/jni/native/.libs"
CLASSPATH=\
$JAVA_HOME/lib/tools.jar:\
$CATALINA_HOME/bin/commons-daemon.jar:\
$CATALINA_HOME/bin/bootstrap.jar
$DAEMON_HOME/jsvc \
-stop \
-pidfile $PID_FILE \
org.apache.catalina.startup.Bootstrap
exit $?

The cluster configuration file is as follows:
more /etc/cluster/cluster.conf
<?xml version="1.0"?>
<cluster alias="hatest" config_version="26" name="hatest">
<fence_daemon post_fail_delay="0" post_join_delay="3"/>
<clusternodes>
<clusternode name="redhat52" nodeid="1" votes="1">
<fence/>
</clusternode>
<clusternode name="redhat6" nodeid="2" votes="1">
<fence/>
</clusternode>
</clusternodes>
<cman expected_votes="1" two_node="1"/>
<fencedevices/>
<rm>
<failoverdomains>
<failoverdomain name="hatest" ordered="0" restricted="0">
<failoverdomainnode name="redhat52" priority="1"/>
<failoverdomainnode name="redhat6" priority="1"/>
</failoverdomain>
</failoverdomains>
<resources>
<script file="/app/tomcat/tomcat5528/tomcat5528/bin/qidong.sh" name="tomcat"/>
</resources>
<service autostart="1" name="hatest" recovery="relocate">
<ip address="192.168.183.111/24" monitor_link="1"/>
</service>
<service autostart="1" domain="hatest" name="channel" recovery="relocate">
<script ref="tomcat"/>
</service>
</rm>
</cluster>

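With the setup above, both the Tomcat script resource and the IP address resource fail over automatically. One problem remains unsolved, though. Because no fence devices are used, when cman is started on only one node, startup stalls for a long time at the fencing step and sometimes ends with an error. If cman is started on both nodes at the same time, it comes up quickly without any pause. I don't know how to configure the cluster so that it checks node status faster at startup.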