环境:
AIX5L+HACMP+ORACLE9I
分析alert,没有发现错误:
Fri Apr 27 17:08:05 2007
Thread 1 advanced to log sequence 612
Current log# 2 seq# 612 mem# 0: /oracle2/oracle/oradata/mssdb2/redo02.log
Fri Apr 27 17:08:05 2007
ARC0: Evaluating archive log 1 thread 1 sequence 611
ARC0: Beginning to archive log 1 thread 1 sequence 611
Creating archive destination LOG_ARCHIVE_DEST_1: '/oracle2/oracle/oradata/oraclearchivelog/1_611.dbf'
ARC0: Completed archiving log 1 thread 1 sequence 611
Fri Apr 27 17:18:52 2007
Shutting down instance: further logons disabled
Shutting down instance (immediate)
License high water mark = 117
Waiting for dispatcher 'D000' to shutdown
All dispatchers and shared servers shutdown
Fri Apr 27 17:20:49 2007
ALTER DATABASE CLOSE NORMAL
Fri Apr 27 17:20:49 2007
SMON: disabling tx recovery
SMON: disabling cache recovery
Fri Apr 27 17:20:49 2007
Shutting down archive processes
Archiving is disabled
Fri Apr 27 17:20:49 2007
ARCH shutting down
Fri Apr 27 17:20:49 2007
ARCH shutting down
Fri Apr 27 17:20:49 2007
ARC1: Archival stopped
Fri Apr 27 17:20:49 2007
ARC0: Archival stopped
Fri Apr 27 17:20:49 2007
Thread 1 closed at log sequence 612
Successful close of redo thread 1.
Fri Apr 27 17:20:50 2007
Completed: ALTER DATABASE CLOSE NORMAL
Fri Apr 27 17:20:50 2007
ALTER DATABASE DISMOUNT
Completed: ALTER DATABASE DISMOUNT
ARCH: Archiving is disabled
Shutting down archive processes
Archiving is disabled
Archive process shutdown avoided: 0 active
ARCH: Archiving is disabled
Shutting down archive processes
Archiving is disabled
Archive process shutdown avoided: 0 active
Fri Apr 27 17:22:39 2007
检查hacmp的日志,发现出错时刻hacmp正在执行切换:
hacmp的out文件:
:server_restart[144] :server_restart[144] odmget -qmonitor=ora2_db and name=NOTIFY_METHOD HACMPmonitor
:server_restart[144] cut -d" -f2
:server_restart[144] grep value =
NOTIFY=
:server_restart[146] :server_restart[146] cut -d -f1
:server_restart[146] echo
NOTIFY_SCRIPT=
:server_restart[146] [[ -n ]]
:server_restart[162] :server_restart[162] odmget -qmonitor=ora2_db and name=CLEANUP_METHOD HACMPmonitor
:server_restart[162] cut -d" -f2
:server_restart[162] egrep value =
CLEANUP=/hacmpscript/ora2stop-db.sh
:server_restart[163] :server_restart[163] cut -d -f1
:server_restart[163] echo /hacmpscript/ora2stop-db.sh
CLEANUP_SCRIPT=/hacmpscript/ora2stop-db.sh
:server_restart[163] [[ -n /hacmpscript/ora2stop-db.sh ]]
:server_restart[163] [[ -x /hacmpscript/ora2stop-db.sh ]]
:server_restart[167] dspmsg scripts.cat 9334 server_restart: Calling user specified cleanup method /hacmpscript/ora2stop-db.shn server_restart /hacmpscript/ora2stop-db.sh
server_restart: Calling user specified cleanup method /hacmpscript/ora2stop-db.sh
:server_restart[168] /hacmpscript/ora2stop-db.sh
:server_restart[168] ODMDIR=/etc/objrepos
SQL*Plus: Release 9.2.0.1.0 - Production on Wed May 9 16:26:13 2007
Copyright (c) 1982, 2002, Oracle Corporation. All rights reserved.
SQL> Connected.
SQL> Database closed.
Database dismounted.
ORACLE instance shut down.
SQL> Disconnected from Oracle9i Enterprise Edition Release 9.2.0.1.0 - 64bit Production
With the Partitioning, OLAP and Oracle Data Mining options
JServer Release 9.2.0.1.0 - Production
检查hacmp的test脚本,没有发现问题:
test-db-mssdb2.sh的内容:
#!/bin/sh
#
# Copyright (c) MDCL Inc. All Rights Reserved.
# Created by Li Han Guang 2006.01.16
#
# Should return 0 if the service was start succesfully; otherwise,
# return non-zero
#
# check the following process:
# ora_smon_$sid
# ora_pmon_$sid
# ora_dbw0_$sid
# ora_lgwr_$sid
# ora_reco_$sid
# database connection
ORACLE_SID="mssdb2"
ORACLE_HOME=/oracle2/oracle/product/9.2.0
ORA_USER="oracle2"
#ISQL=svrmgrl
ISQL="sqlplus /nolog"
LOG_FILE="/tmp/test_mssdb2.log"
#
# log messages, need var $LOG_FILE
#
log()
{
echo [`date +"%Y-%m-%d %H:%M:%S"`] $1 >> $LOG_FILE
echo [`date +"%Y-%m-%d %H:%M:%S"`] $1
}
get_status()
{
str=$1
# ps -ef | cut -c52- | grep $str | grep -v grep > /dev/null 2>&1
ps -ef | grep $str | grep -v grep > /dev/null 2>&1
STATUS=$?
case $STATUS in
0) return 0;;
*) return 1;;
esac
}
#
#checks for smon process
#
check_smon()
{
sid=$1
get_status ora_smon_$sid
return $?
}
#
#checks for pmon process
#
check_pmon()
{
sid=$1
get_status ora_pmon_$sid
return $?
}
#
#checks for dbw0 process
#
check_dbw0()
{
sid=$1
get_status ora_dbw0_$sid
return $?
}
#checks for lgwr process
#
check_lgwr()
{
sid=$1
get_status ora_lgwr_$sid
status=$?
return ${status}
}
#
#checks for reco process
#
check_reco()
{
sid=$1
get_status ora_reco_$sid
return $?
}
####################################
# start to check oracle processes
####################################
#checks for database connection
#
su - $ORA_USER -c "${ISQL} > /tmp/$$.out 2> /tmp/$$.err " < connect / as sysdba;
select sysdate from dual;
exit;
EOF
grep SYSDATE /tmp/$$.out >/dev/null 2>&1
if test "$?" -ne 0
then
# cat /tmp/$$.*
log "ORACLE: Cannot connect to database for ORACLE_SID=${ORACLE_SID}."
exit 1
else
log "Oracle instance ${ORACLE_SID} can be connected"
fi
rm -f /tmp/$$.*
check_smon ${ORACLE_SID}
if test "$?" -ne 0
then
log "ORACLE: ora_smon_${ORACLE_SID} is NOT running."
exit 1
else
log "ORACLE: ora_smon_${ORACLE_SID} is running"
fi
check_pmon ${ORACLE_SID}
if test "$?" -ne 0
then
log "ORACLE: ora_pmon_${ORACLE_SID} is NOT running."
exit 1
else
log "ORACLE: ora_pmon_${ORACLE_SID} is running"
fi
check_dbw0 ${ORACLE_SID}
if test "$?" -ne 0
then
log "ORACLE: ora_dbw0_${ORACLE_SID} is NOT running."
exit 1
else
log "ORACLE: ora_dbw0_${ORACLE_SID} is running"
fi
check_lgwr ${ORACLE_SID}
if test "$?" -ne 0
then
log "ORACLE: ora_lgwr_${ORACLE_SID} is NOT running."
exit 1
else
log "ORACLE: ora_lgwr_${ORACLE_SID} is running"
fi
check_reco ${ORACLE_SID}
if test "$?" -ne 0
then
log "ORACLE: ora_reco_${ORACLE_SID} is NOT running."
exit 1
else
log "ORACLE: ora_reco_${ORACLE_SID} is running"
fi
exit 0
根据hacmp的test脚本检查test脚本对应的日志778500.out发现连接数据库存在问题:
SQL*Plus: Release 9.2.0.1.0 - Production on Fri Apr 27 17:12:32 2007
Copyright (c) 1982, 2002, Oracle Corporation. All rights reserved.
SQL> ERROR:
ORA-12540: TNS:internal limit restriction exceeded
SQL> SP2-0640: Not connected
SQL>
继续检查ORACLE数据库的监听日志LISTERNER.LOG,发现如下错误:
27-APR-2007 17:10:33 * (CONNECT_DATA=(SID=mssdb2)(CID=(PROGRAM=)(HOST=__jdbc__)(USER=))) * (ADDRESS=(PROTOCOL=tcp)(HOST=10.192.139.194)(PORT=40707)) * establish * mssdb2 * 12500
TNS-12500: TNS:listener failed to start a dedicated server process
TNS-12540: TNS:internal limit restriction exceeded
TNS-12560: TNS:protocol adapter error
TNS-00510: Internal limit restriction exceeded
IBM/AIX RISC System/6000 Error: 11: Resource temporarily unavailable
27-APR-2007 17:12:25 * 12502
TNS-12502: TNS:listener received no CONNECT_DATA from client
No longer listening on: (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=10.192.139.202)(PORT=1521)))
27-APR-2007 17:14:43 * (CONNECT_DATA=(CID=(PROGRAM=)(HOST=Unicom_ora2)(USER=oracle2))(COMMAND=stop)(ARGUMENTS=64)(SERVICE=LISTENER)(VERSION=153092352)) * stop * 0
TNSLSNR for IBM/AIX RISC System/6000: Version 9.2.0.1.0 - Production on 27-APR-2007 17:22:47
Copyright (c) 1991, 2002, Oracle Corporation. All rights reserved.
最后分析该问题引起的可能原因:
一、 AIX操作系统参数maxuproc:
执行相应的检查,发现该参数使用的默认值,需要调整:
$ lsattr -El sys0|grep maxuproc
maxuproc 128 Maximum number of PROCESSES allowed per user True
二、 ORACLE数据库的参数PROCESSES和PGA_AGGREGATE_TARGET
执行相应的检查,发现该2个参数都为默认值,需要调整:
SQL> show parameter processes
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
aq_tm_processes integer 1
db_writer_processes integer 1
job_queue_processes integer 10
log_archive_max_processes integer 2
processes integer 150
SQL> show parameter pga
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
pga_aggregate_target big integer 25165824
汇总建议,提交客户。
来自 “ ITPUB博客 ” ,链接:http://blog.itpub.net/7916042/viewspace-915337/,如需转载,请注明出处,否则将追究法律责任。
转载于:http://blog.itpub.net/7916042/viewspace-915337/