HACMP application process monitor

http://echo.sharera.com/blog/BlogTopic/22668.htm


The hard part is finally over: I have solved the problem that had been tormenting me for so long.

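First, the Oracle listener and the main Oracle process are monitored separately, so that a problem in one does not interfere with the other and trigger unnecessary restarts. The two monitors are ListenerMonitor and oraMonitor.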

For the listener I took the lazy route and wrote no script: it uses HACMP's built-in application process monitor, which watches the process itself. The configuration is as follows:

                                          [Entry Fields]
* Monitor Name                            ListenerMonitor
  Application Server(s) to Monitor        oracle                                      +
* Monitor Mode                            [Long-running monitoring]                   +
* Processes to Monitor                    [tnslsnr]
* Process Owner                           [oracle]
  Instance Count                          [1]                                         #
* Stabilization Interval                  [60]                                        #
* Restart Count                           [3]                                         #
  Restart Interval                        [198]                                       #
* Action on Application Failure           [fallover]                                  +
  Notify Method                           []
  Cleanup Method                          [/home/script/oracle/ListenerClear.sh]
  Restart Method                          [/home/script/oracle/ListernerRestart.sh]
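
Before relying on the process monitor, it can help to confirm that the name given in Processes to Monitor is what ps actually reports, and that the owner and instance count are what you expect. The following is a minimal sketch and not part of the original setup: count_owned_procs is a hypothetical helper name, and it assumes ps accepts the POSIX "-o user= -o comm=" output format.

```shell
#!/bin/sh
# Hypothetical helper (not from the original post): count processes that have
# a given owner and command name. Reads "user command" pairs on stdin.
count_owned_procs()
{
    awk -v owner="$1" -v name="$2" \
        '$1 == owner && $2 == name { n++ } END { print n + 0 }'
}

# Usage on a live node; the result should equal Instance Count while the
# listener is running:
#   ps -e -o user= -o comm= | count_owned_procs oracle tnslsnr
```

Reading the process list from stdin keeps the helper easy to try out with canned input before pointing it at a real node.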

The cleanup and restart scripts are trivial: each is just the single command that stops or starts the listener, so I will not go into them here.
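
For readers who want a starting point, here is a rough sketch of what such scripts might contain. The original scripts are not shown, so everything below is an assumption: it uses the default listener name, reuses the ORACLE_HOME, SU and OWNER values from SqlTest.sh, and combines both actions in one file purely for illustration.

```shell
#!/bin/sh
# Sketch only: the author's ListenerClear.sh / ListernerRestart.sh are not
# shown in the post. Values below are assumed to match SqlTest.sh.
ORACLE_HOME=${ORACLE_HOME:-/oracle/product/9.2.0.4}
SU=${SU:-/bin/su}
OWNER=${OWNER:-oracle}

# What a cleanup method might do: stop the (default) listener as oracle.
listener_stop()
{
    $SU - $OWNER -c "$ORACLE_HOME/bin/lsnrctl stop"
}

# What a restart method might do: start the (default) listener as oracle.
listener_start()
{
    $SU - $OWNER -c "$ORACLE_HOME/bin/lsnrctl start"
}

# Dispatch on the first argument; with no argument nothing is run.
case "${1:-}" in
    stop)  listener_stop ;;
    start) listener_start ;;
esac
```

In practice the two actions would live in the two separate files named in the Cleanup Method and Restart Method fields.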

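Monitoring Oracle itself is not as simple as checking whether the oracle process is alive. The monitor also has to verify that Oracle can actually respond to requests, so it has to be a Custom Monitor. One annoying thing about HACMP is that it provides no ready-made monitoring agents for common software, so everything has to be written by hand. That took a lot of effort...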

The configuration itself is still simple. The main settings are:

                                          [Entry Fields]
* Monitor Name                            oraMonitor
  Application Server(s) to Monitor        oracle                                      +
* Monitor Mode                            [Long-running monitoring]                   +
* Monitor Method                          [/home/script/oracle/SqlTest.sh]
  Monitor Interval                        [30]                                        #
  Hung Monitor Signal                     [9]                                         #
* Stabilization Interval                  [180]                                       #
  Restart Count                           [3]                                         #
  Restart Interval                        [693]                                       #
* Action on Application Failure           [fallover]                                  +
  Notify Method                           []
  Cleanup Method                          [/home/script/oracle/oraclear.sh]
  Restart Method                          [/home/script/oracle/orarestart.sh]
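
The Restart Interval values in the two panels (198 and 693) are consistent with a rule of "10% more than Restart Count × Stabilization Interval" for the process monitor and "10% more than Restart Count × (Stabilization Interval + Monitor Interval)" for the custom monitor. The post does not say how the numbers were chosen, so treat this reading as an assumption based on how the arithmetic lines up. restart_interval is a hypothetical helper name used only for this check:

```shell
#!/bin/sh
# Assumed rule: interval = 1.1 * restart_count * seconds_per_attempt,
# done in integer arithmetic as (x * 11) / 10.
restart_interval()
{
    # $1 = restart count, $2 = seconds allowed per restart attempt
    echo $(( $1 * $2 * 11 / 10 ))
}

# ListenerMonitor: stabilization interval only
restart_interval 3 60                 # prints 198

# oraMonitor: stabilization interval plus monitor interval
restart_interval 3 $(( 180 + 30 ))    # prints 693
```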

The contents of SqlTest.sh are as follows:

#!/bin/sh
# Custom monitor method: exits 0 when Oracle answers a query, non-zero otherwise
ORACLE_HOME=/oracle/product/9.2.0.4
LOGFILE=/tmp/oracle.log

SU=/bin/su
OWNER=oracle

# Append one message to the log file
to_log()
{
    echo "$1" >> "$LOGFILE"
}

# Print the current time as [YYYY-MM-DD HH:MM:SS]
show_time()
{
    echo "[`date '+%Y-%m-%d %H:%M:%S'`]"
}

export ORACLE_HOME OWNER SU

#1.0 check the ORACLE_HOME
if [ ! -d "$ORACLE_HOME" ]
then
    to_log "`show_time` Oracle home directory $ORACLE_HOME does not exist"
    exit 99
fi

#2.0 check the sqlplus executable
if [ ! -f "$ORACLE_HOME/bin/sqlplus" ]
then
    to_log "`show_time` sqlplus does not exist"
    exit 99
fi

#3.0 query the instance as the oracle user and capture the sqlplus output
SQLRESULT=`$SU - $OWNER -c "$ORACLE_HOME/bin/sqlplus /nolog" <<EOF
connect / as sysdba
select version,status,logins from v\\\$instance;
exit;
EOF
`

#4.0 look for an ORA- error in the query result
# SQLRESULT is deliberately unquoted so the output collapses to one line;
# the text after the second SQL> prompt is then the result of the select
if [ `echo $SQLRESULT | awk -F"SQL>" '{print $3}' | awk '/.*ORA-/' | wc -l` -ne 0 ]
then
    to_log "`show_time` Monitoring returned the output :\n$SQLRESULT"
    exit 99
fi
to_log "`show_time` the Oracle Process is normal!"
exit 0
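
The error check at the end of the script can be tried out without a database by feeding it canned sqlplus output. The sketch below is an illustration rather than the author's code: query_has_ora_error is a hypothetical name, and it expresses the same "third field contains ORA-" test through awk's exit status instead of wc -l.

```shell
#!/bin/sh
# Hypothetical helper: returns 0 when the text after the second "SQL>" prompt
# contains an ORA- error, mirroring the field that SqlTest.sh inspects.
query_has_ora_error()
{
    # $1 is left unquoted on purpose so the output collapses to one line,
    # which is what makes the "SQL>" field splitting work.
    echo $1 | awk -F"SQL>" '
        { found = ($3 ~ /ORA-/) }
        END { exit found ? 0 : 1 }'
}

# Usage:
#   if query_has_ora_error "$SQLRESULT"; then echo "query failed"; fi
```

Because only the third field is inspected, an ORA- error printed by the connect step itself lands in the second field and would not be caught by this test. Whether to widen the check to the whole output is left to the reader.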

That is basically all there is to it. I hope it gives those who come after me a few ideas.
