引言:
客户反映通过数据库监控平台发现某数据库监听listener好像在某几个点有问题,现象如下:
分析过程:
1,登陆数据库,查看监听状态是否正常,监听一切正常
[vsxx][/oracle]$lsnrctl status
LSNRCTL for IBM/AIX RISC System/6000: Version 10.2.0.5.0 - Production on 02-SEP-2015 12:00:13
Copyright (c) 1991, 2010, Oracle. All rights reserved.
Connecting to (ADDRESS=(PROTOCOL=tcp)(HOST=)(PORT=1521))
STATUS of the LISTENER
------------------------
Alias LISTENER_VSOPDB1
Version TNSLSNR for IBM/AIX RISC System/6000: Version 10.2.0.5.0 - Production
Start Date 10-DEC-2014 23:46:56
Uptime 265 days 12 hr. 13 min. 18 sec
Trace Level off
Security ON: Local OS Authentication
SNMP ON
Listener Parameter File /oracle/app/oracle/product/10.2.0/db_1/network/admin/listener.ora
Listener Log File /oracle/app/oracle/product/10.2.0/db_1/network/log/listener_vsopdb1.log
Listening Endpoints Summary...
(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=&ip1)(PORT=1521)))
(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=&ip2)(PORT=1521)))
(DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=EXTPROC)))
Services Summary...
Service "+ASM" has 1 instance(s).
Instance "+ASM1", status BLOCKED, has 1 handler(s) for this service...
Service "+ASM_XPT" has 1 instance(s).
Instance "+ASM1", status BLOCKED, has 1 handler(s) for this service...
Service "PLSExtProc" has 1 instance(s).
Instance "PLSExtProc", status UNKNOWN, has 1 handler(s) for this service...
Service "VSOPDB" has 2 instance(s).
Instance "VSOPDB1", status READY, has 2 handler(s) for this service...
Instance "VSOPDB2", status READY, has 1 handler(s) for this service...
Service "VSOPDB_XPT" has 2 instance(s).
Instance "VSOPDB1", status READY, has 2 handler(s) for this service...
Instance "VSOPDB2", status READY, has 1 handler(s) for this service...
The command completed successfully
[vsopdb1][/oracle]$
2,查看客户反映问题时间点的监听日志,也是正常的
^C[vsopdb1][/oracle]$tail -100f /oracle/app/oracle/product/10.2.0/db_1/network/log/listener_vsopdb1.log
02-SEP-2015 11:28:49 * (CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=VSOPDB)(INSTANCE_NAME=VSOPDB1)(CID=(PROGRAM=E:\ETL_CLOUD_81_7\ETL_CLOUD_81_7.exe)(HOST=CLOUD2)(USER=Administrator))) * (ADDRESS=(PROTOCOL=tcp)(HOST=172.19.11.81)(PORT=4923)) * establish * VSOPDB * 0
02-SEP-2015 11:28:49 * (CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=VSOPDB)(INSTANCE_NAME=VSOPDB1)(CID=(PROGRAM=D:\ETL_CLOUD_81_1\ETL_CLOUD_81_1.exe)(HOST=CLOUD2)(USER=Administrator))) * (ADDRESS=(PROTOCOL=tcp)(HOST=172.19.11.81)(PORT=4924)) * establish * VSOPDB * 0
02-SEP-2015 11:28:49 * (CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=VSOPDB)(INSTANCE_NAME=VSOPDB1)(CID=(PROGRAM=E:\ETL_CLOUD_81_7\ETL_CLOUD_81_7.exe)(HOST=CLOUD2)(USER=Administrator))) * (ADDRESS=(PROTOCOL=tcp)(HOST=172.19.11.81)(PORT=4926)) * establish * VSOPDB * 0
02-SEP-2015 11:29:17 * (CONNECT_DATA=(CID=(PROGRAM=)(HOST=vsopdb1)(USER=patrol))(COMMAND=status)(ARGUMENTS=64)(SERVICE=LISTENER_VSOPDB1)(VERSION=169870592)) * status * 0
02-SEP-2015 11:29:42 * (CONNECT_DATA=(CID=(PROGRAM=)(HOST=__jdbc__)(USER=))(SERVER=DEDICATED)(SERVICE_NAME=VSOPDB)(INSTANCE_NAME=VSOPDB1)) * (ADDRESS=(PROTOCOL=tcp)(HOST=172.19.37.23)(PORT=64194)) * establish * VSOPDB * 0
02-SEP-2015 11:29:42 * service_update * VSOPDB1 * 0
02-SEP-2015 11:29:42 * service_update * VSOPDB1 * 0
02-SEP-2015 11:29:45 * service_update * VSOPDB1 * 0
02-SEP-2015 11:29:45 * service_update * VSOPDB1 * 0
02-SEP-2015 11:29:49 * (CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=VSOPDB)(CID=(PROGRAM=tclsh)(HOST=crm2sch)(USER=oracle))) * (ADDRESS=(PROTOCOL=tcp)(HOST=172.19.31.127)(PORT=46757)) * establish * VSOPDB * 0
02-SEP-2015 11:29:50 * (CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=VSOPDB)(CID=(PROGRAM=tclsh)(HOST=crm2sch)(USER=oracle))) * (ADDRESS=(PROTOCOL=tcp)(HOST=172.19.31.127)(PORT=46767)) * establish * VSOPDB * 0
02-SEP-2015 11:30:09 * service_update * VSOPDB2 * 0
02-SEP-2015 11:32:51 * (CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=VSOPDB)(CID=(PROGRAM=tclsh)(HOST=crm2sch)(USER=oracle))) * (ADDRESS=(PROTOCOL=tcp)(HOST=172.19.31.127)(PORT=46802)) * establish * VSOPDB * 0
02-SEP-2015 11:33:03 * service_update * +ASM1 * 0
02-SEP-2015 11:35:05 * (CONNECT_DATA=(CID=(PROGRAM=)(HOST=vsopdb1)(USER=oracle))(COMMAND=status)(ARGUMENTS=64)(SERVICE=LISTENER_VSOPDB1)(VERSION=169870592)) * status * 0
02-SEP-2015 11:35:54 * (CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=VSOPDB)(CID=(PROGRAM=tclsh)(HOST=crm2sch)(USER=oracle))) * (ADDRESS=(PROTOCOL=tcp)(HOST=172.19.31.127)(PORT=46875)) * establish * VSOPDB * 0
02-SEP-2015 11:36:59 * (CONNECT_DATA=(CID=(PROGRAM=)(HOST=vsopdb1)(USER=oracle))(COMMAND=status)(ARGUMENTS=64)(SERVICE=LISTENER)(VERSION=169870592)) * status * 0
02-SEP-2015 11:37:15 * service_update * +ASM1 * 0
02-SEP-2015 11:38:28 * (CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=VSOPDB)(CID=(PROGRAM=sqlplus@vsopdb1)(HOST=vsopdb1)(USER=oracle))) * (ADDRESS=(PROTOCOL=tcp)(HOST=172.19.11.131)(PORT=61887)) * establish * VSOPDB * 0
3,我们再把时间扩展下往前推,看看监听日志是否有报错信息,无错误信息
more alert|grep ora_
more alert|grep tns-
more alert|grep refuse
4,查看告警日志也未发现相关错误,到此可见数据库监听是正常的,通知客户数据库运行平稳,后续继续监控监听运行状态
Wed Sep 02 08:02:45 BEIST 2015
SUCCESS: diskgroup DGARCH was dismounted
Wed Sep 02 08:02:52 BEIST 2015
SUCCESS: diskgroup DGARCH was mounted
Wed Sep 02 08:02:59 BEIST 2015
SUCCESS: diskgroup DGARCH was dismounted
Wed Sep 02 08:03:12 BEIST 2015
ALTER DATABASE BACKUP CONTROLFILE TO '/usr/omni/tmp/ctrl_vsopdb1.dbf' REUSE
Wed Sep 02 08:03:13 BEIST 2015
Completed: ALTER DATABASE BACKUP CONTROLFILE TO '/usr/omni/tmp/ctrl_vsopdb1.dbf' REUSE
Wed Sep 02 11:37:15 BEIST 2015
Thread 1 advanced to log sequence 31277 (LGWR switch)
Current log# 1 seq# 31277 mem# 0: +DGDATA/vsopdb/onlinelog/group_1.257.740791841
Wed Sep 02 11:37:15 BEIST 2015
SUCCESS: diskgroup DGARCH was mounted
Wed Sep 02 11:37:41 BEIST 2015
SUCCESS: diskgroup DGARCH was dismounted
Wed Sep 02 12:02:41 BEIST 2015
5,随路我们学习下监听日志相关知识,监听日志文件包括如下5种类型的信息
6,我们先看audit trail部分,下为其记录的格式类型
可见:上述信息以*分隔各个节,
timestamp表示客户连接的具体时间
connect data表示客户端一些标识信息,包括客户端的机器名称以及客户端的操作系统用户
protoco info为可选部分,表示客户连接采用的通讯协议
event表示表示事件类型
[sid|service]为可选部分,表示客户端通过哪个数据库SID或SERVICE连接
return code表示客户端连接是否成功,0表示连接成功,非0表示不成功
如下为一个真实的示例
14-MAY-2009 00:29:54 *
(connect_data=(cid=(program=)(host=sales-server)(user=jdoe))(command=reload)
(arguments=64)(service=listener)(version=135290880))
* reload * 0
7,继续看service registration event部分,这是ORACLE PMON定期和监听进程交互的注册信息
pmon会向监听进程注册的信息如下:
数据库每个实例的服务名service
数据库的实例名称instance
数据库服务对应的句柄可用(分为专用服务器模式和共享服务器模式)
数据库实例以及调度进程和节点的压力负载情况,便于分发到不同的RAC数据库实例以及控制客户端的连接
动态监听端口信息
PMON服务注册包括事件:
service_register PMON向监听器注册
service_update PMON更新向监听器注册的数据库信息
service_died 监听和PMON的通讯中断,所有数据库在监听的注册信息全要丢弃,且客户端不能连接到数据库,除非PMON
再次注册数据库信息到监听
PMON注册服务的记录类型
相关含义略,因同上
pmon注册服务的实例
14-MAY-2009 15:28:43 * service_register * sales * 0
14-MAY-2009 15:28:43 * service_register * sales * 0
8,继续看direct hand-off event
下面为其事件的类型信息(适用于共享服务器模式),表示监听把客户端请求转发给调度器进程
相关含义略,同上
21-MAY-2009 10:54:55 * oracle.aurora.net.SALESHttp2 * handoff * 0
9,接着我们再看subscription for ons node down event
如果数据库启动时配置了ONS配置文件可用的话,监听会订阅ons(oracle notification service)节点故障事件
这种订阅的作用在于,如果监听接收到源自ONS进程说某个某个数据库节点DOWN了,就会自动移除这个数据库节点的服务信息
注意:监听采用异步订阅这种事件
如下为此事件的实例(发现如下事件的原因于ONS进程可能未在主机上运行,因为监听是通过ONS订阅这些信息的)
WARNING: Subscription for node down event still pending
如果监听订阅ONS相关事件PENDING,不会影响监听的其它功能正常运行
10,最后来看oracle clusterware notification
如果ORACLE RAC集群进程启动时,数据库监听将会通知ORACLE集群进程,它会把集群进程启动的相关不同运行状态记录到监听日志中,注意必须是成功通知,才会记录,如果通知失败,不会记录
如下为实例
Listener completed notification to CRS on start
Listener completed notification to CRS on stop