oracle宕机原因排查,oracle不定期的出现宕机的问题诊断

环境:

AIX5L+HACMP+ORACLE9I

分析alert,没有发现错误:

Fri Apr 27 17:08:05 2007

Thread 1 advanced to log sequence 612

Current log# 2 seq# 612 mem# 0: /oracle2/oracle/oradata/mssdb2/redo02.log

Fri Apr 27 17:08:05 2007

ARC0: Evaluating archive log 1 thread 1 sequence 611

ARC0: Beginning to archive log 1 thread 1 sequence 611

Creating archive destination LOG_ARCHIVE_DEST_1: '/oracle2/oracle/oradata/oraclearchivelog/1_611.dbf'

ARC0: Completed archiving log 1 thread 1 sequence 611

Fri Apr 27 17:18:52 2007

Shutting down instance: further logons disabled

Shutting down instance (immediate)

License high water mark = 117

Waiting for dispatcher 'D000' to shutdown

All dispatchers and shared servers shutdown

Fri Apr 27 17:20:49 2007

ALTER DATABASE CLOSE NORMAL

Fri Apr 27 17:20:49 2007

SMON: disabling tx recovery

SMON: disabling cache recovery

Fri Apr 27 17:20:49 2007

Shutting down archive processes

Archiving is disabled

Fri Apr 27 17:20:49 2007

ARCH shutting down

Fri Apr 27 17:20:49 2007

ARCH shutting down

Fri Apr 27 17:20:49 2007

ARC1: Archival stopped

Fri Apr 27 17:20:49 2007

ARC0: Archival stopped

Fri Apr 27 17:20:49 2007

Thread 1 closed at log sequence 612

Successful close of redo thread 1.

Fri Apr 27 17:20:50 2007

Completed: ALTER DATABASE CLOSE NORMAL

Fri Apr 27 17:20:50 2007

ALTER DATABASE DISMOUNT

Completed: ALTER DATABASE DISMOUNT

ARCH: Archiving is disabled

Shutting down archive processes

Archiving is disabled

Archive process shutdown avoided: 0 active

ARCH: Archiving is disabled

Shutting down archive processes

Archiving is disabled

Archive process shutdown avoided: 0 active

Fri Apr 27 17:22:39 2007

检查hacmp的日志,发现出错时刻hacmp正在执行切换:

hacmp的out文件:

:server_restart[144] :server_restart[144] odmget -qmonitor=ora2_db and name=NOTIFY_METHOD HACMPmonitor

:server_restart[144] cut -d" -f2

:server_restart[144] grep value =

NOTIFY=

:server_restart[146] :server_restart[146] cut -d -f1

:server_restart[146] echo

NOTIFY_SCRIPT=

:server_restart[146] [[ -n ]]

:server_restart[162] :server_restart[162] odmget -qmonitor=ora2_db and name=CLEANUP_METHOD HACMPmonitor

:server_restart[162] cut -d" -f2

:server_restart[162] egrep value =

CLEANUP=/hacmpscript/ora2stop-db.sh

:server_restart[163] :server_restart[163] cut -d -f1

:server_restart[163] echo /hacmpscript/ora2stop-db.sh

CLEANUP_SCRIPT=/hacmpscript/ora2stop-db.sh

:server_restart[163] [[ -n /hacmpscript/ora2stop-db.sh ]]

:server_restart[163] [[ -x /hacmpscript/ora2stop-db.sh ]]

:server_restart[167] dspmsg scripts.cat 9334 server_restart: Calling user specified cleanup method /hacmpscript/ora2stop-db.shn server_restart /hacmpscript/ora2stop-db.sh

server_restart: Calling user specified cleanup method /hacmpscript/ora2stop-db.sh

:server_restart[168] /hacmpscript/ora2stop-db.sh

:server_restart[168] ODMDIR=/etc/objrepos

SQL*Plus: Release 9.2.0.1.0 - Production on Wed May 9 16:26:13 2007

Copyright (c) 1982, 2002, Oracle Corporation. All rights reserved.

SQL> Connected.

SQL> Database closed.

Database dismounted.

ORACLE instance shut down.

SQL> Disconnected from Oracle9i Enterprise Edition Release 9.2.0.1.0 - 64bit Production

With the Partitioning, OLAP and Oracle Data Mining options

JServer Release 9.2.0.1.0 - Production

检查hacmp的test脚本,没有发现问题:

test-db-mssdb2.sh的内容:

#!/bin/sh

#

# Copyright (c) MDCL Inc. All Rights Reserved.

# Created by Li Han Guang 2006.01.16

#

# Should return 0 if the service was start succesfully; otherwise,

# return non-zero

#

# check the following process:

# ora_smon_$sid

# ora_pmon_$sid

# ora_dbw0_$sid

# ora_lgwr_$sid

# ora_reco_$sid

# database connection

ORACLE_SID="mssdb2"

ORACLE_HOME=/oracle2/oracle/product/9.2.0

ORA_USER="oracle2"

#ISQL=svrmgrl

ISQL="sqlplus /nolog"

LOG_FILE="/tmp/test_mssdb2.log"

#

# log messages, need var $LOG_FILE

#

log()

{

echo [`date +"%Y-%m-%d %H:%M:%S"`] $1 >> $LOG_FILE

echo [`date +"%Y-%m-%d %H:%M:%S"`] $1

}

get_status()

{

str=$1

# ps -ef | cut -c52- | grep $str | grep -v grep > /dev/null 2>&1

ps -ef | grep $str | grep -v grep > /dev/null 2>&1

STATUS=$?

case $STATUS in

0) return 0;;

*) return 1;;

esac

}

#

#checks for smon process

#

check_smon()

{

sid=$1

get_status ora_smon_$sid

return $?

}

#

#checks for pmon process

#

check_pmon()

{

sid=$1

get_status ora_pmon_$sid

return $?

}

#

#checks for dbw0 process

#

check_dbw0()

{

sid=$1

get_status ora_dbw0_$sid

return $?

}

#checks for lgwr process

#

check_lgwr()

{

sid=$1

get_status ora_lgwr_$sid

status=$?

return ${status}

}

#

#checks for reco process

#

check_reco()

{

sid=$1

get_status ora_reco_$sid

return $?

}

####################################

# start to check oracle processes

####################################

#checks for database connection

#

su - $ORA_USER -c "${ISQL} > /tmp/$$.out 2> /tmp/$$.err " < connect / as sysdba;

select sysdate from dual;

exit;

EOF

grep SYSDATE /tmp/$$.out >/dev/null 2>&1

if test "$?" -ne 0

then

# cat /tmp/$$.*

log "ORACLE: Cannot connect to database for ORACLE_SID=${ORACLE_SID}."

exit 1

else

log "Oracle instance ${ORACLE_SID} can be connected"

fi

rm -f /tmp/$$.*

check_smon ${ORACLE_SID}

if test "$?" -ne 0

then

log "ORACLE: ora_smon_${ORACLE_SID} is NOT running."

exit 1

else

log "ORACLE: ora_smon_${ORACLE_SID} is running"

fi

check_pmon ${ORACLE_SID}

if test "$?" -ne 0

then

log "ORACLE: ora_pmon_${ORACLE_SID} is NOT running."

exit 1

else

log "ORACLE: ora_pmon_${ORACLE_SID} is running"

fi

check_dbw0 ${ORACLE_SID}

if test "$?" -ne 0

then

log "ORACLE: ora_dbw0_${ORACLE_SID} is NOT running."

exit 1

else

log "ORACLE: ora_dbw0_${ORACLE_SID} is running"

fi

check_lgwr ${ORACLE_SID}

if test "$?" -ne 0

then

log "ORACLE: ora_lgwr_${ORACLE_SID} is NOT running."

exit 1

else

log "ORACLE: ora_lgwr_${ORACLE_SID} is running"

fi

check_reco ${ORACLE_SID}

if test "$?" -ne 0

then

log "ORACLE: ora_reco_${ORACLE_SID} is NOT running."

exit 1

else

log "ORACLE: ora_reco_${ORACLE_SID} is running"

fi

exit 0

根据hacmp的test脚本检查test脚本对应的日志778500.out发现连接数据库存在问题:

SQL*Plus: Release 9.2.0.1.0 - Production on Fri Apr 27 17:12:32 2007

Copyright (c) 1982, 2002, Oracle Corporation. All rights reserved.

SQL> ERROR:

ORA-12540: TNS:internal limit restriction exceeded

SQL> SP2-0640: Not connected

SQL>

继续检查ORACLE数据库的监听日志LISTERNER.LOG,发现如下错误:

27-APR-2007 17:10:33 * (CONNECT_DATA=(SID=mssdb2)(CID=(PROGRAM=)(HOST=__jdbc__)(USER=))) * (ADDRESS=(PROTOCOL=tcp)(HOST=10.192.139.194)(PORT=40707)) * establish * mssdb2 * 12500

TNS-12500: TNS:listener failed to start a dedicated server process

TNS-12540: TNS:internal limit restriction exceeded

TNS-12560: TNS:protocol adapter error

TNS-00510: Internal limit restriction exceeded

IBM/AIX RISC System/6000 Error: 11: Resource temporarily unavailable

27-APR-2007 17:12:25 * 12502

TNS-12502: TNS:listener received no CONNECT_DATA from client

No longer listening on: (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=10.192.139.202)(PORT=1521)))

27-APR-2007 17:14:43 * (CONNECT_DATA=(CID=(PROGRAM=)(HOST=Unicom_ora2)(USER=oracle2))(COMMAND=stop)(ARGUMENTS=64)(SERVICE=LISTENER)(VERSION=153092352)) * stop * 0

TNSLSNR for IBM/AIX RISC System/6000: Version 9.2.0.1.0 - Production on 27-APR-2007 17:22:47

Copyright (c) 1991, 2002, Oracle Corporation. All rights reserved.

最后分析该问题引起的可能原因:

一、 AIX操作系统参数maxuproc:

执行相应的检查,发现该参数使用的默认值,需要调整:

$ lsattr -El sys0|grep maxuproc

maxuproc 128 Maximum number of PROCESSES allowed per user True

二、 ORACLE数据库的参数PROCESSES和PGA_AGGREGATE_TARGET

执行相应的检查,发现该2个参数都为默认值,需要调整:

SQL> show parameter processes

NAME TYPE VALUE

------------------------------------ ----------- ------------------------------

aq_tm_processes integer 1

db_writer_processes integer 1

job_queue_processes integer 10

log_archive_max_processes integer 2

processes integer 150

SQL> show parameter pga

NAME TYPE VALUE

------------------------------------ ----------- ------------------------------

pga_aggregate_target big integer 25165824

汇总建议,提交客户。

  • 0
    点赞
  • 1
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值