前天在业务高峰期收到监控系统的告警邮件,说生产库(2节点的rac)的实例1连不上了,部分应用报ORA-12516, TNS:listener could not find available handler with matching protocol stack错误。
看到这个错误后,立即登录到数据库服务器上查看crs的各资源的状态:
[oracle@xxxxdb-01 ~]$ crs_stat -t
Name Type Target State Host
------------------------------------------------------------
ora....SM1.asm application ONLINE ONLINE xxxx-db-01
ora....01.lsnr application ONLINE ONLINE xxxx-db-01
ora....-01.gsd application ONLINE ONLINE xxxx-db-01
ora....-01.ons application ONLINE ONLINE xxxx-db-01
ora....-01.vip application ONLINE ONLINE xxxx-db-01
ora....SM2.asm application ONLINE ONLINE xxxx-db-02
ora....02.lsnr application ONLINE ONLINE xxxx-db-02
ora....-02.gsd application ONLINE ONLINE xxxx-db-02
ora....-02.ons application ONLINE ONLINE xxxx-db-02
ora....-02.vip application ONLINE ONLINE xxxx-db-02
ora.emspda.db application ONLINE ONLINE xxxx-db-01
ora....a1.inst application ONLINE ONLINE xxxx-db-01
ora....a2.inst application ONLINE ONLINE xxxx-db-02
Name Type Target State Host
------------------------------------------------------------
ora....SM1.asm application ONLINE ONLINE xxxx-db-01
ora....01.lsnr application ONLINE ONLINE xxxx-db-01
ora....-01.gsd application ONLINE ONLINE xxxx-db-01
ora....-01.ons application ONLINE ONLINE xxxx-db-01
ora....-01.vip application ONLINE ONLINE xxxx-db-01
ora....SM2.asm application ONLINE ONLINE xxxx-db-02
ora....02.lsnr application ONLINE ONLINE xxxx-db-02
ora....-02.gsd application ONLINE ONLINE xxxx-db-02
ora....-02.ons application ONLINE ONLINE xxxx-db-02
ora....-02.vip application ONLINE ONLINE xxxx-db-02
ora.emspda.db application ONLINE ONLINE xxxx-db-01
ora....a1.inst application ONLINE ONLINE xxxx-db-01
ora....a2.inst application ONLINE ONLINE xxxx-db-02
所有的资源的状态都是正常;
立刻查看数据库的sessions和processes参数的设置:
SQL> show parameter sessions
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
java_max_sessionspace_size integer 0
java_soft_sessionspace_limit integer 0
license_max_sessions integer 0
license_sessions_warning integer 0
logmnr_max_persistent_sessions integer 1
sessions integer 1105
shared_server_sessions integer
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
java_max_sessionspace_size integer 0
java_soft_sessionspace_limit integer 0
license_max_sessions integer 0
license_sessions_warning integer 0
logmnr_max_persistent_sessions integer 1
sessions integer 1105
shared_server_sessions integer
SQL> show parameter process
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
aq_tm_processes integer 0
db_writer_processes integer 2
gcs_server_processes integer 4
job_queue_processes integer 10
log_archive_max_processes integer 2
processes integer 1000
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
aq_tm_processes integer 0
db_writer_processes integer 2
gcs_server_processes integer 4
job_queue_processes integer 10
log_archive_max_processes integer 2
processes integer 1000
再查看数据库的链接的session数量:
SQL> SELECT t.INST_ID,COUNT(*) FROM gv$session t GROUP BY t.INST_ID;
INST_ID COUNT(*)
---------- ----------
1 991
2 349
INST_ID COUNT(*)
---------- ----------
1 991
2 349
通过这个分析,原因基本找到了,实例1上的session数接近极值了,所以部分应用连接的时候报ora-12516错误。
现在来分析为何rac两个节点的链接数量这么不均衡呢?当初他们配置应用程序链接的时候我都是让他们采用负载均衡的方式配的,按理他们是没有直接链接实例1的。
先查看究竟是哪些session链接到实例1上:
SELECT t.INST_ID,t.MACHINE,COUNT(*) FROM gv$session t GROUP BY t.INST_ID,t.MACHINE;
发现有两台机器(server1,server2)链接实例1极多。80%都是通过这两台机器(server1,server2)连上去,而这两台机器(server1,server2)恰好是另外一套rac的两个节点,而且这两台机器(server1,server2)是通过dblink访问实例1的。
查看(server1,server2)上配置dblink在tnsnames下的连接串配置,发现这里面当初那个开发人员在建dblink时只写上了实例1这一台机器的主机地址,原因找到了,下面通过修改tnsnames的连接串配置,改为rac的配置方式,并且在server1上将实例1的ip地址写在上面,实例2的ip地址写在下面,在server2上调换一下,将实例2的ip写在上面,将实例1的ip写在下面:
具体如下:
server1:
proc =
(DESCRIPTION =
(ADDRESS_LIST =
(ADDRESS = (PROTOCOL = TCP)(HOST = inst1ip)(PORT = 1521))
(ADDRESS = (PROTOCOL = TCP)(HOST = inst2ip)(PORT = 1521))
(LOADBALANCE = yes)
(FAILOVER = ON)
)
(CONNECT_DATA =
(SERVER = DEDICATED)
(SERVICE_NAME = abc)
(FAILOVER_MODE =
(TYPE = select)
(METHOD = basic)
)
)
)
(DESCRIPTION =
(ADDRESS_LIST =
(ADDRESS = (PROTOCOL = TCP)(HOST = inst1ip)(PORT = 1521))
(ADDRESS = (PROTOCOL = TCP)(HOST = inst2ip)(PORT = 1521))
(LOADBALANCE = yes)
(FAILOVER = ON)
)
(CONNECT_DATA =
(SERVER = DEDICATED)
(SERVICE_NAME = abc)
(FAILOVER_MODE =
(TYPE = select)
(METHOD = basic)
)
)
)
server2:
proc =
(DESCRIPTION =
(ADDRESS_LIST =
(ADDRESS = (PROTOCOL = TCP)(HOST = inst2ip)(PORT = 1521))
(ADDRESS = (PROTOCOL = TCP)(HOST = inst1ip)(PORT = 1521))
(LOADBALANCE = yes)
(FAILOVER = ON)
)
(CONNECT_DATA =
(SERVER = DEDICATED)
(SERVICE_NAME = abc)
(FAILOVER_MODE =
(TYPE = select)
(METHOD = basic)
)
)
)
(DESCRIPTION =
(ADDRESS_LIST =
(ADDRESS = (PROTOCOL = TCP)(HOST = inst2ip)(PORT = 1521))
(ADDRESS = (PROTOCOL = TCP)(HOST = inst1ip)(PORT = 1521))
(LOADBALANCE = yes)
(FAILOVER = ON)
)
(CONNECT_DATA =
(SERVER = DEDICATED)
(SERVICE_NAME = abc)
(FAILOVER_MODE =
(TYPE = select)
(METHOD = basic)
)
)
)
改完后,重启链接server1,server2上的服务,重启完后,生产库上的session链接情况如下:
SQL> SELECT t.INST_ID,COUNT(*) FROM gv$session t GROUP BY t.INST_ID;
INST_ID COUNT(*)
---------- ----------
1 410
2 533
INST_ID COUNT(*)
---------- ----------
1 410
2 533
基本均衡了,问题解决!
来自 “ ITPUB博客 ” ,链接:http://blog.itpub.net/12129601/viewspace-718664/,如需转载,请注明出处,否则将追究法律责任。
转载于:http://blog.itpub.net/12129601/viewspace-718664/