TNS-12518 TNS-12549 TNS-12560 TNS-00519

416 篇文章 0 订阅
162 篇文章 0 订阅

 TNS-12518: TNS:listener could not hand off client connection
TNS-12549: TNS:operating system resource quota exceeded
TNS-12560: TNS:protocol adapter error
TNS-00519: Operating system resource quota exceeded
IBM/AIX RISC System/6000 Error: 12: Not enough space

 

ORACLE 10G RAC 监听服务异常中断(offline)分析

作者:shenguokun
http://sinobestdba.itpub.net
1 问题描述
管理人员发现Oracle10g RAC集群第一个节点主机上的 ora.apprac1.LISTENER_apprac1.lsnr与ora.apprac1.ons服务处于offline状态,把这两个服务手工重启后隔一段这两个服务又自动变成offline。


2 故障分析
2.1 检查数据库运行状态
oracle@apprac1$ crstat
HA Resource Target State
----------- ------ -----
ora.apprac1.LISTENER_apprac1.lsnr OFFLINE OFFLINE
ora.apprac1.ons ONLINE OFFLINE
ora.apprac1.vip ONLINE ONLINE on apprac1
ora.apprac2.LISTENER_apprac2.lsnr ONLINE ONLINE on apprac2
ora.apprac2.ons ONLINE ONLINE on apprac2
ora.apprac2.vip ONLINE ONLINE on apprac2
ora.dgyd.db ONLINE ONLINE on apprac1
ora.dgyd.inst1.inst ONLINE ONLINE on apprac1
ora.dgyd.dgyd2.inst ONLINE ONLINE on apprac2
故障现象是ora.apprac1.LISTENER_apprac1.lsnr服务与ora.apprac1.ons处于offline状态。
2.2 数据库报警日志分析
alert_inst1.log
Sun Feb 28 21:58:31 2010
Process startup failed, error stack:
Sun Feb 28 21:58:31 2010
Errors in file /oracle/admin/dgyd/bdump/inst1_psp0_356718.trc:
ORA-27300: OS system dependent operation:fork failed with status: 0
ORA-27301: OS failure message: Error 0
ORA-27302: failure occurred at: skgpspawn5
ORA-27303: additional information: skgpspawn5
Sun Feb 28 21:58:32 2010
Process m000 died, see its trace file
Sun Feb 28 21:58:32 2010
ksvcreate: Process(m000) creation failed
Sun Feb 28 21:59:58 2010
Process startup failed, error stack:
Sun Feb 28 21:59:58 2010
Errors in file /oracle/admin/dgyd/bdump/inst1_psp0_356718.trc:
ORA-27300: OS system dependent operation:fork failed with status: 2
ORA-27301: OS failure message: No such file or directory
ORA-27302: failure occurred at: skgpspawn5
ORA-27303: additional information: skgpspawn5
Sun Feb 28 21:59:59 2010
Process J000 died, see its trace file
Sun Feb 28 21:59:59 2010
kkjcre1p: unable to spawn jobq slave process
Sun Feb 28 21:59:59 2010
Errors in file /oracle/admin/dgyd/bdump/inst1_cjq0_184864.trc:
从数据库日志可以看出:在故障发生期间,由于OS资源原因,oracle数据库的jobq slave进程无法启动。
2.3 操作系统资源使用情况分析
oracle@apprac1$ vmstat 1 20
System configuration: lcpu=12 mem=45312MB
kthr memory page faults cpu
----- ----------- ------------------------ ------------ -----------
r b avm fre re pi po fr sr cy in sy cs us sy id wa
3 0 9861313 9273 0 0 0 0 0 0 809 101745 6087 18 3 79 0
0 0 9861327 9259 0 0 0 0 0 0 1694 119757 9918 21 3 75 0
1 0 9834443 36207 0 39 0 44 543 0 411 44502 4505 7 3 87 2
0 0 9836958 33116 0 6 0 0 0 0 741 271949 25123 15 5 79 1
1 0 9834389 35685 0 0 0 0 0 0 376 92789 9585 17 2 81 0
1 0 9834382 35691 0 0 0 0 0 0 553 95995 5002 17 2 80 0
2 0 9834381 35692 0 0 0 0 0 0 259 86042 4031 14 2 84 0
1 0 9834378 35599 0 6 0 0 0 0 700 120049 6419 33 3 64 0
1 0 9834377 35600 0 0 0 0 0 0 282 99036 4170 16 2 82 0
0 0 9834371 35606 0 0 0 0 0 0 433 100106 5049 20 2 78 0
1 0 9834377 35536 0 4 0 0 0 0 791 111374 5407 24 3 73 0
0 0 9834377 35536 0 0 0 0 0 0 1143 111069 6707 26 3 70 0
2 0 9834379 35502 0 2 0 0 0 0 1162 103518 6228 20 3 75 2
1 0 9834379 35470 0 2 0 0 0 0 1178 101210 5693 19 3 73 5
从vmstat结果可以看到:在故障发生期间,操作系统可用内存非常少。
2.4 检查主机配置
prtconf
System Model: IBM,9117-570
Machine Serial Number: 65DCAEC
Processor Type: PowerPC_POWER5
Number Of Processors: 6
Processor Clock Speed: 1900 MHz
CPU Type: 64-bit
Kernel Type: 64-bit
LPAR Info: 2 apprac1
Memory Size: 45312 MB
Good Memory Size: 45312 MB
Platform Firmware level: SF240_284
Firmware Version: IBM,SF240_284
Console Login: enable
Auto Restart: true
Full Core: false
主机内存:45312 MB
2.5 数据库内存参数
sga_target = 32212254720
pga_aggregate_target = 4294967296
数据库SGA+PGA = 30G+4G = 34G
2.6 数据库监听日志
listener_apprac1.log
(CONNECT_DATA=(SID=inst1)(CID=(PROGRAM=)(HOST=__jdbc__)(USER=))) * (ADDRESS=(
PROTOCOL=tcp)(HOST=192.168.100.8)(PORT=3476)) * establish * inst1 * 12518
TNS-12518: TNS:listener could not hand off client connection
TNS-12549: TNS:operating system resource quota exceeded
TNS-12560: TNS:protocol adapter error
TNS-00519: Operating system resource quota exceeded
IBM/AIX RISC System/6000 Error: 12: Not enough space
01-MAR-2010 16:59:52 * (CONNECT_DATA=(SID=inst1)(CID=(PROGRAM=)(HOST=__jdbc__)(USER=))) * (ADDRESS=(
PROTOCOL=tcp)(HOST=192.168.100.8)(PORT=3477)) * establish * inst1 * 12518
TNS-12518: TNS:listener could not hand off client connection
TNS-12549: TNS:operating system resource quota exceeded
TNS-12560: TNS:protocol adapter error
TNS-00519: Operating system resource quota exceeded
IBM/AIX RISC System/6000 Error: 12: Not enough space
从数据库监听日志可以看到:在故障发生期间,操作系统可用内存不足导致监听程序无法spawn服务器进程。
2.7 ora.apprac1.LISTENER_apprac1.lsnr服务日志
ora.apprac1.LISTENER_apprac1.lsnr.log
2010-02-28 20:40:16.504: [ RACG][1] [582038][1][ora.apprac1.LISTENER_apprac1.lsnr]: clsrcexecut: scls_process_spawn() error, status = 1, command = /oracle/product/db/bin/racgeut
Category: 1234
Operation: scls_process_spawn
Location: read_pipe
Other: EOF on read pipe
OS Error: Not enough space
2010-02-28 20:40:16.508: [ RACG][1] [582038][1][ora.apprac1.LISTENER_apprac1.lsnr]: end for resource = ora.apprac1.LISTENER_apprac1.lsnr, action = check, status = 1, time = 0.136s
从ora.apprac1.LISTENER_apprac1.lsnr.log日志可以看到:在故障发生期间,由于OS资源原因,监听程序无法spawn服务器进程。
2.8 数据库集群进程crsd.log日志
crsd.log
2010-02-28 20:38:18.396: [ CRSEVT][629]32CAAMonitorHandler :: 0:Could not execute /oracle/product/crs/bin/racgwrap(check) for ora.dgyd.db
category: 1234, operation: scls_process_spawn, loc: read_pipe, OS error: 12, other: EOF on read pipe
2010-02-28 20:38:18.406: [ CRSAPP][629]32CheckResource error for ora.dgyd.db error code = -1
2010-02-28 20:40:16.514: [ CRSAPP][636]32CheckResource error for ora.apprac1.LISTENER_apprac1.lsnr error code = 1
2010-02-28 20:40:16.754: [ CRSRES][636]32In stateChanged, ora.apprac1.LISTENER_apprac1.lsnr target is ONLINE
2010-02-28 20:40:16.796: [ CRSRES][636]32ora.apprac1.LISTENER_apprac1.lsnr on apprac1 went OFFLINE unexpectedly
……
2010-02-28 20:45:51.989: [ CRSAPP][653]32CheckResource error for ora.apprac1.ons error code = 1
2010-02-28 20:45:51.991: [ CRSRES][653]32In stateChanged, ora.apprac1.ons target is ONLINE
2010-02-28 20:45:51.991: [ CRSRES][653]32ora.apprac1.ons on apprac1 went OFFLINE unexpectedly
……
从crsd.log日志可以看出:在故障发生期间,监听程序无法spawn服务器进程导致ora.apprac1.LISTENER_apprac1.lsnr服务异常中断offline,由于该服务offline又进一步导致ora.apprac1.ons服务异常中断offline 。

3 故障解决方法
故障解决思路:降低数据库SGA内存占用量,以留出更多的内存给数据库服务器进程使用,并重启数据库及操作系统,以释放文件型内存。
操作步骤如下:
1. 修改sga_target参数,从30G修改为26G
alter system set sga_target=26G scope=spfile sid='inst1';
2. 关闭数据库
shutdown immediate
3. 关闭数据库集群
crsctl stop crs
4. 关闭HACMP
smitty clstop
5. 重启数据库服务器
shutdown -Fr
6. 启动HACMP
smitty clstart
7. 启动数据库集群
crsctl start crs
8. 启动数据库
startup
进行以上处理后,ora.apprac1.LISTENER_apprac1.lsnr与ora.apprac1.ons服务恢复正常。
4 总结及建议
通过对出现故障时数据库运行状态、数据库日志、数据库集群日志、主机性能、主机日志等综合分析后认为:故障发生时操作系统可用内存(free)非常少,数据库监听程序没能申请到足够的内存去spawn服务器进程,导致ora.apprac1.LISTENER_apprac1.lsnr服务异常中断(offline),由于该服务offline又进一步导致了ora.apprac1.ons服务异常中断(offline) 。
通常情况下,在数据库服务器中占用系统内存的组件包括:数据库SGA、数据库PGA、数据库服务器进程、文件系统缓存、操作系统内核等。必须要合理设置数据库SGA及PGA参数,以留出足够的内存给数据库服务器进程(与客户端连接请求一一对应)。
metalink工程师给出的建议如下:
ACTION PLAN
=========
Hi,
1. From the current collected diagnose information...the cause of this issue may be that there is not enough free memory on OS.
2. The possible reasons that OS memory is not enough are the below:
-- Memory leak occurred.
-- OS memory was occupied by some other processes which used a lot of memory.
3. I have checked and researched furtherly...but regret that can not find the relevant known issues or Bugs which reported the issue about memory leak on 10.2.0.4 version.
So please help to try to follow up the below action plans in order to avoid this issue occurring again in the future:
1). Try to decrease the SGA size...so that more memory can be given to OS.
2). Monitor the memory usage to check and find which processes are costing a lot of memory by the below methods:
-- ps -efo "vsz,pid,user,cpu,thcount,comm" | sort -n
In addition...the following 2 Metalink/MOS Notes are for your reference...You can review them from Metalink/MOS website:
-- Note 477522.1:How To Troubleshoot Memory Leaks on UNIX
-- Note 835254.1:Script To Monitor RDBMS Session UGA and PGA Current And

 

from:http://blog.itpub.net/post/42827/511862

TNS-12541: TNS: 无监听程序 TNS-12560: TNS: 协议适配器错误 TNS-00511: 无监听程序 文章转自:http://www.luocs.com/archives/464.html 此文版权归作者 – yaogang所有,转载请注明yaogang©www.luocs.com。 Luocs说:这是我一个朋友的一个监听器问题解决案例,这是昨天发生的事情,我一直跟朋友一起Troubleshooting,折腾了半天最后是BUG所致。再次汗颜,Windows平台惹不起啊!好,那么下面开始分享我朋友的案例! 环境描述: OS : Windows Server 2008 64Bit (做了HA) DB : 11.1.0.7.0 排错过程: 前天应用不能访问数据库了 (后台应用能访问数据库),故障发生。 马上登录到服务器里查看监听状态,发现有TNS-12541 ,TNS-12560等错误 Luocs补充:我跟朋友要了错误代码,但他没有保存,就直接贴图。 从计算器的管理 –> 服务选项 –> 检查oracle 监听服务程序,发现该服务已经停止。 手动把监听服务启动,这时候服务状态上显示为已启动,但在CMD窗口执行lsnrctl status的时候依然返回错误信息: C:\>lsnrctl status LSNRCTL for 64-bit Windows: Version 11.1.0.7.0 - Production on 12-11月-2012 18:1 8:32 Copyright (c) 1991, 2008, Oracle. All rights reserved. 正在连接到 (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=10.10.203.218)(PORT=1521))) TNS-12541: TNS: 无监听程序 TNS-12560: TNS: 协议适配器错误 TNS-00511: 无监听程序 64-bit Windows Error: 61: Unknown error 正在连接到 (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=EXTPROC1521))) 过段时间回显非常慢。 然后我检查了下告警日志,大量的ora错误 Fatal NI connect error 12170. VERSION INFORMATION: TNS for 64-bit Windows: Version 11.1.0.7.0 - Production Oracle Bequeath NT Protocol Adapter for 64-bit Windows: Version 11.1.0.7.0 - Production Windows NT TCP/IP NT Protocol Adapter for 64-bit Windows: Version 11.1.0.7.0 - Production Time: 12-11月-2012 15:23:33 Tracing not turned on. Tns error struct: ns main err code: 12535 TNS-12535: TNS: 操作超时 ns secondary err code: 12560 nt main err code: 0 nt secondary err code: 0 nt OS err code: 0 Client address: ORA-609 : opiodr aborting process unknown ospid (4116_6104) 这时候朋友怀疑是不是监听器配置问题,就把原先的监听器删除重建了下,问题依然。 网上有个解决TNS-12535错误的案例,平台和版本都很类似,如下: 1、在 sqlnet.ora文件中 增加如下行: DIAG_ADR_ENABLED = OFF 2、在listener.ora文件中增加如下行: DIAG_ADR_ENABLED_ = OFF 如何监听是listener时,则前面的名称为:DIAG_ADR_ENABLED_LISTENER = OFF 3、重新启动windows服务管理中的监听程序.先停止,然后再重新启动. 4、检查结果.发现可以了,返回的值在10毫秒.有时为0毫秒.成功!! 但这并不是问题发生原因,在继续排查过程中偶然发现监听日志大小居然为4G。然后把这现象告诉了Luocs。 过了一会儿,Luocs回应是Oracle一个BUG,BUG号为9879101 : THE CONNECT THROUGH LISTENER WAS SLOW WHEN LISTNER LOG GROWED 4GB。 Luocs还提供了MOS上一篇文章,ID 1319797.1 : WINDOWS: Listener Hangs & Lsnrctl Commands Are Slow or Hang,里面给出了解决方法: You can solve this problem by deleting the large listener in $ORACLE_BASE\diag\tnslsnr\\listener\trace\.log 1) Stop the listener process using the command line or Control Panel Service. 2) Delete the log file(s) that are at or approaching the 4G size limit at this location: $ORACLE_BASE\diag\tnslsnr\\listener\trace\.log 3) Issue any lsnrctl command and you will see a new listener.log in its place under: $ORACLE_BASE\diag\tnslsnr\\listener\trace\ Since ADR Diagnostics are enabled for this listener these steps cannot be done dynamically using the lsnrctl utility. e.g. LSNRCTL>set log_file mylog Will yield: TNS-01251: Cannot set trace/log directory under ADR. However, it is possible to disable the flat file listener logging using the following commands: LSNRCTL>set current_listener LSNRCTL>set log_status OFF LSNRCTL>save_config 我就按照以上说明如下进行: 1)LSNRCTL进入交互模式 2)执行set current_listener LISTENER 3)set log_status off 4)stop 停止监听器 5)手工删除ADR指定的监听日志路径下的listener.log文件 6)start重启监听器 7)status查看状态 到此问题解决。
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值