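Today I found that one listener in the cluster was down, although the database could still be reached from both rac1 and rac2. The affected listener turned out to be the one run by the grid user.
Its status showed that the listener supported no services, so I suspected a problem with the listener's services and checked the possibilities one by one.
1. Check the cluster listener status: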
[grid@rac101 ~]$ lsnrctl status
Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER)))
STATUS of the LISTENER
------------------------
Alias LISTENER
Version TNSLSNR for Linux: Version 12.2.0.1.0 - Production
Start Date 10-JAN-2023 15:21:20
Uptime 0 days 0 hr. 28 min. 38 sec
Trace Level off
Security ON: Local OS Authentication
SNMP OFF
Listener Parameter File /u01/app/12.2.0/grid/network/admin/listener.ora
Listener Log File /u01/app/grid/diag/tnslsnr/rac101/listener/alert/log.xml
Listening Endpoints Summary...
(DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=LISTENER)))
The listener supports no services
The command completed successfully
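The last two lines show "The listener supports no services", and only the IPC endpoint is listed under Listening Endpoints Summary.
2. Check the cluster resources: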
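The two symptoms above (no TCP endpoint, no services) can be picked out of the status text mechanically. This is a minimal sketch, not part of the original troubleshooting: check_listener_status is a hypothetical helper name, and it only inspects text shaped like the lsnrctl status output shown above.

```shell
# Hypothetical helper: reads `lsnrctl status` output on stdin and reports
# whether any TCP endpoint is listed and whether any service is registered.
check_listener_status() {
    awk '
        { line = tolower($0) }
        # Only count endpoints that appear after the summary header.
        line ~ /listening endpoints summary/        { inlist = 1 }
        inlist && line ~ /protocol=tcp/             { tcp++ }
        line ~ /the listener supports no services/  { nosvc = 1 }
        END {
            if (tcp == 0) print "no TCP endpoint"
            if (nosvc)    print "no services registered"
            if (tcp > 0 && !nosvc) print "ok"
        }'
}

# Usage on a cluster node, as the grid user:
#   lsnrctl status | check_listener_status
```

For the broken listener above, this would report both problems, which points at the listener failing to open its TCP address rather than at the database instances.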
[grid@rac101 ~]$ crsctl stat res -t
--------------------------------------------------------------------------------
Name Target State Server State details
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.ASMNET1LSNR_ASM.lsnr
ONLINE ONLINE rac101 STABLE
ONLINE ONLINE rac102 STABLE
ora.DATA.dg
ONLINE ONLINE rac101 STABLE
ONLINE ONLINE rac102 STABLE
ora.FLARC.dg
ONLINE ONLINE rac101 STABLE
ONLINE ONLINE rac102 STABLE
ora.LISTENER.lsnr
ONLINE INTERMEDIATE rac101 Not All Endpoints Re
gistered,STABLE
ONLINE ONLINE rac102 STABLE
ora.MGMT.dg
ONLINE ONLINE rac101 STABLE
ONLINE ONLINE rac102 STABLE
ora.OCR.dg
ONLINE ONLINE rac101 STABLE
ONLINE ONLINE rac102 STABLE
ora.chad
ONLINE ONLINE rac101 STABLE
ONLINE ONLINE rac102 STABLE
ora.net1.network
ONLINE ONLINE rac101 STABLE
ONLINE ONLINE rac102 STABLE
ora.ons
ONLINE ONLINE rac101 STABLE
ONLINE ONLINE rac102 STABLE
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
1 ONLINE ONLINE rac102 STABLE
ora.MGMTLSNR
1 ONLINE ONLINE rac101 169.254.178.129 192.
168.1.101,STABLE
ora.asm
1 ONLINE ONLINE rac101 Started,STABLE
2 ONLINE ONLINE rac102 Started,STABLE
3 OFFLINE OFFLINE STABLE
ora.cvu
1 ONLINE ONLINE rac101 STABLE
ora.mgmtdb
1 ONLINE ONLINE rac101 Open,STABLE
ora.qosmserver
1 ONLINE ONLINE rac101 STABLE
ora.rac10.db
1 ONLINE ONLINE rac101 Open,HOME=/u01/app/o
racle/product/12.2.0
/db_1,STABLE
2 ONLINE ONLINE rac102 Open,HOME=/u01/app/o
racle/product/12.2.0
/db_1,STABLE
ora.rac101.vip
1 ONLINE ONLINE rac101 STABLE
ora.rac102.vip
1 ONLINE ONLINE rac102 STABLE
ora.scan1.vip
1 ONLINE ONLINE rac102 STABLE
The State details column for ora.LISTENER.lsnr on rac101 shows: Not All Endpoints Registered,STABLE
3. Check the listener alert log:
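With this many resources it is easy to overlook the one row that is not ONLINE. The sketch below filters the crsctl stat res -t text for rows whose Target is ONLINE but whose State is something else. find_unhealthy_resources is a hypothetical helper name, and the parsing assumes the column layout shown above (an optional instance number, then Target, State, Server).

```shell
# Hypothetical helper: reads `crsctl stat res -t` output on stdin and prints
# "<resource> <state> <server>" for rows with Target ONLINE and State not ONLINE.
find_unhealthy_resources() {
    awk '
        # A line starting with "ora." names the resource for the rows below it.
        $1 ~ /^ora\./ { res = $1; next }
        {
            # Cluster resources carry a leading instance number; local ones do not.
            i = ($1 ~ /^[0-9]+$/) ? 2 : 1
            target = $i
            state  = $(i + 1)
            if (target == "ONLINE" && state != "" && state != "ONLINE")
                print res, state, $(i + 2)
        }'
}

# Usage on a cluster node, as the grid user:
#   crsctl stat res -t | find_unhealthy_resources
```

On the output above this would print only the ora.LISTENER.lsnr row for rac101.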
[grid@rac101 alert]$ pwd
/u01/app/grid/diag/tnslsnr/rac101/listener/alert
[grid@rac101 alert]$ vi log.xml
<msg time='2023-01-06T16:05:25.769+08:00' org_id='oracle' comp_id='tnslsnr'
type='UNKNOWN' level='16' host_id='rac101'
host_addr='**.**.**.**' pid='86228'>
<txt>TNS-12542: TNS:address already in use
TNS-12560: TNS:protocol adapter error
TNS-00512: Address already in use
Linux Error: 98: Address already in use
TNS-12542: TNS:address already in use
TNS-12560: TNS:protocol adapter error
TNS-00512: Address already in use
Linux Error: 98: Address already in use
</txt>
</msg>
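The log contains the errors above, so I suspected that some other process had started and was already occupying the listener's address.
4. Check whether another tnslsnr process is occupying the address: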
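"Address already in use" can also be checked from the socket side by asking which process is listening on the listener port. This is a rough sketch and not something from the original session: port_holders is a hypothetical helper, it assumes the ss tool from iproute2 and its usual -ltnp column layout, and it uses port 1521 because that is the port the healthy listener reports. The process column is normally only filled in when ss runs as root.

```shell
# Hypothetical helper: reads `ss -ltnp` output on stdin and prints the local
# address and the last column (process info, if visible) for sockets
# listening on the given port.
port_holders() {
    port=$1
    awk -v p=":$port" '
        # Column 4 is the local address; match only an exact ":<port>" suffix.
        $1 == "LISTEN" && substr($4, length($4) - length(p) + 1) == p {
            print $4, $NF
        }'
}

# Usage (as root, so the process column is populated):
#   ss -ltnp | port_holders 1521
```

If the port is held by a tnslsnr that does not belong to the grid user, that is the process blocking the Grid Infrastructure listener.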
[grid@rac101 ~]$ ps -ef|grep tns
root 22 2 0 2022 ? 00:00:00 [netns]
oracle 15085 1 0 2022 ? 00:05:01 /u01/app/oracle/product/12.2.0/db_1/bin/tnslsnr LISTENER -inherit
grid 86194 1 0 2022 ? 00:50:43 /u01/app/12.2.0/grid/bin/tnslsnr ASMNET1LSNR_ASM -no_crs_notify -inherit
grid 161529 1 0 2022 ? 00:04:15 /u01/app/12.2.0/grid/bin/tnslsnr MGMTLSNR -no_crs_notify -inherit
grid 174439 1 0 16:01 ? 00:00:00 /u01/app/12.2.0/grid/bin/tnslsnr LISTENER -no_crs_notify -inherit
grid 180539 179284 0 16:21 pts/1 00:00:00 grep --color=auto tns
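There is a listener started by the oracle user (someone else had started a listener as oracle). I killed process 15085 and then checked the listener status again; it was back to normal.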
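Spotting the stray listener by eye works, but the same check can be sketched as a filter over the ps -ef text. foreign_listeners is a hypothetical helper name. It assumes the ps -ef column layout shown above and treats grid as the expected owner by default, which matches this cluster but may differ elsewhere.

```shell
# Hypothetical helper: reads `ps -ef` output on stdin and prints
# "<user> <pid> <listener name>" for tnslsnr processes not owned by the
# expected user (default: grid).
foreign_listeners() {
    owner=${1:-grid}
    awk -v o="$owner" '
        # Field 8 is the tnslsnr binary path, field 9 the listener name.
        /bin\/tnslsnr / && $1 != o { print $1, $2, $9 }'
}

# Usage on a cluster node:
#   ps -ef | foreign_listeners grid
```

Before killing anything this flags, it is worth confirming with whoever started it that the listener is not in use.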
[grid@rac101 ~]$ ps -ef|grep tns
root 22 2 0 2022 ? 00:00:00 [netns]
grid 86194 1 0 2022 ? 00:50:43 /u01/app/12.2.0/grid/bin/tnslsnr ASMNET1LSNR_ASM -no_crs_notify -inherit
grid 161529 1 0 2022 ? 00:04:15 /u01/app/12.2.0/grid/bin/tnslsnr MGMTLSNR -no_crs_notify -inherit
grid 174439 1 0 16:01 ? 00:00:00 /u01/app/12.2.0/grid/bin/tnslsnr LISTENER -no_crs_notify -inherit
grid 181503 179284 0 16:25 pts/1 00:00:00 grep --color=auto tns
[grid@rac101 ~]$ lsnrctl status
LSNRCTL for Linux: Version 12.2.0.1.0 - Production on 10-JAN-2023 16:28:07
Copyright (c) 1991, 2016, Oracle. All rights reserved.
Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER)))
STATUS of the LISTENER
------------------------
Alias LISTENER
Version TNSLSNR for Linux: Version 12.2.0.1.0 - Production
Start Date 10-JAN-2023 16:01:58
Uptime 0 days 0 hr. 26 min. 9 sec
Trace Level off
Security ON: Local OS Authentication
SNMP OFF
Listener Parameter File /u01/app/12.2.0/grid/network/admin/listener.ora
Listener Log File /u01/app/grid/diag/tnslsnr/rac101/listener/alert/log.xml
Listening Endpoints Summary...
(DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=LISTENER)))
(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=**.**.**.**)(PORT=1521)))
(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=**.**.**.**)(PORT=1521)))
(DESCRIPTION=(ADDRESS=(PROTOCOL=tcps)(HOST=rac101)(PORT=5500))(Security=(my_wallet_directory=/u01/app/oracle/product/12.2.0/db_1/admin/RAC10/xdb_wallet))(Presentation=HTTP)(Session=RAW))
(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=rac101)(PORT=8088))(Presentation=HTTP)(Session=RAW))
Services Summary...
Service "+ASM" has 1 instance(s).
Instance "+ASM1", status READY, has 1 handler(s) for this service...
Service "+ASM_DATA" has 1 instance(s).
Instance "+ASM1", status READY, has 1 handler(s) for this service...
Service "+ASM_FLARC" has 1 instance(s).
Instance "+ASM1", status READY, has 1 handler(s) for this service...
Service "+ASM_MGMT" has 1 instance(s).
Instance "+ASM1", status READY, has 1 handler(s) for this service...
Service "+ASM_OCR" has 1 instance(s).
Instance "+ASM1", status READY, has 1 handler(s) for this service...
Service "RAC10" has 3 instance(s).
Instance "RAC101", status UNKNOWN, has 1 handler(s) for this service...
Instance "RAC101", status READY, has 1 handler(s) for this service...
Instance "RAC102", status UNKNOWN, has 1 handler(s) for this service...
Service "RAC10XDB" has 1 instance(s).
Instance "RAC101", status READY, has 1 handler(s) for this service...
The command completed successfully
Finally, check the cluster resource status again to confirm that it is back to normal.