Handling a pacemaker HA cluster that fails to start the Oracle listener

Project scenario:

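I previously helped a customer build an HA cluster (pacemaker) on Oracle Linux 7.9, and during recent testing I found a bug that appears after a cluster switchover.
Because the software is installed entirely on shared storage, the listener only needs to be configured with the VIP.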


Problem ① description (hostnames anonymized)

After the switchover, the listener and the database instance on node 2 were both Stopped, and the HA message was fairly explicit: listener_orac_start_0 on cxl-pcs02 'unknown error'

[root@cxl-pcs01 ~]# pcs status
Cluster name: cluster01
Stack: corosync
Current DC: cxl-pcs01 (version 1.1.23-1.0.1.el7-9acf116022) - partition with quorum
Last updated: Wed Sep 14 19:03:29 2022
Last change: Wed Sep 14 18:40:13 2022 by root via cibadmin on cxl-pcs01

2 nodes configured
8 resource instances configured

Online: [ cxl-pcs01 cxl-pcs02 ]

Full list of resources:

 Resource Group: oracle
     clustervip01	(ocf::heartbeat:IPaddr2):	Started cxl-pcs02
     vg01	(ocf::heartbeat:LVM):	Started cxl-pcs02
     vg02	(ocf::heartbeat:LVM):	Started cxl-pcs02
     data1	(ocf::heartbeat:Filesystem):	Started cxl-pcs02
     data2	(ocf::heartbeat:Filesystem):	Started cxl-pcs02
     listener_orac	(ocf::heartbeat:oralsnr):	Stopped
     orac	(ocf::heartbeat:oracle):	Stopped
 sbd_fencing	(stonith:fence_sbd):	Started cxl-pcs01

Failed Resource Actions:
* listener_orac_start_0 on cxl-pcs01 'unknown error' (1): call=48, status=complete, exitreason='Listener listener appears to have started, but is not running properly: ',
    last-rc-change='Wed Sep 14 19:01:46 2022', queued=0ms, exec=60196ms
* listener_orac_start_0 on cxl-pcs02 'unknown error' (1): call=44, status=complete, exitreason='Listener listener appears to have started, but is not running properly: ',
    last-rc-change='Wed Sep 14 19:00:38 2022', queued=0ms, exec=61821ms

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
  sbd: active/enabled

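With that message in hand I checked the listener.ora configuration and, sure enough, found a problem: an address line with the local host's own IP had been added earlier for testing. Since listener.ora sits on shared storage and is used by whichever node is active, an address that exists on only one node cannot be bound after a switchover to the other node.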

[oracle@cxl-pcs01 admin]$ cat listener.ora
# listener.ora Network Configuration File: /data1/app/oracle/product/11.2.0/db_1/network/admin/listener.ora
# Generated by Oracle configuration tools.
LISTENER =
  (DESCRIPTION_LIST =
    (DESCRIPTION =
      (ADDRESS = (PROTOCOL = IPC)(KEY = EXTPROC1521))
      # Local host IP:
      (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.84.173)(PORT = 1521))
      # Virtual IP:
      (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.125.231)(PORT = 1521))
    )
  )
SID_LIST_LISTENER =
  (SID_LIST =
    (SID_DESC =
      (GLOBAL_DBNAME = orac)
      (ORACLE_HOME = /data1/app/oracle/product/11.2.0/db_1)
      (SID_NAME = orac)
    )
  )


ADR_BASE_LISTENER = /data1/app/oracle

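Problem ② description (hostnames anonymized)

After I removed the cxl-pcs01 host IP from the listener configuration, the problem remained: the VIP turned out not to be in the same subnet as the physical IP. I had not considered this before, so I changed the virtual IP to an address in the same subnet.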

[root@cxl-pcs01 ~]# pcs resource update clustervip01 ip=192.168.84.167 cidr_netmask=24 op monitor interval=30s

[oracle@cxl-pcs01 admin]$ cat listener.ora
# listener.ora Network Configuration File: /data1/app/oracle/product/11.2.0/db_1/network/admin/listener.ora
# Generated by Oracle configuration tools.
LISTENER =
  (DESCRIPTION_LIST =
    (DESCRIPTION =
      (ADDRESS = (PROTOCOL = IPC)(KEY = EXTPROC1521))
      # Virtual IP:
      (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.84.167)(PORT = 1521))
    )
  )
SID_LIST_LISTENER =
  (SID_LIST =
    (SID_DESC =
      (GLOBAL_DBNAME = orac)
      (ORACLE_HOME = /data1/app/oracle/product/11.2.0/db_1)
      (SID_NAME = orac)
    )
  )


ADR_BASE_LISTENER = /data1/app/oracle
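
The subnet mismatch above is easy to miss by eye, so a small check can help before assigning a VIP. The following is only a sketch in POSIX shell: the helper names ip_to_int and same_subnet are made up for illustration, and it assumes the physical interface uses the same /24 prefix that was passed as cidr_netmask.

```shell
# Hypothetical helpers, not part of pcs or any resource agent.

# Convert a dotted-quad IPv4 address to a 32-bit integer.
# Runs in a subshell so the caller's IFS and positional parameters are untouched.
ip_to_int() (
    IFS=.
    set -- $1
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
)

# same_subnet IP1 IP2 PREFIX
# Returns 0 when both addresses fall in the same network for the given
# prefix length, non-zero otherwise.
same_subnet() (
    ip1=$(ip_to_int "$1")
    ip2=$(ip_to_int "$2")
    mask=$(( (4294967295 << (32 - $3)) & 4294967295 ))
    [ $(( ip1 & mask )) -eq $(( ip2 & mask )) ]
)
```

With the addresses from this post, same_subnet 192.168.84.173 192.168.125.231 24 returns non-zero (the original VIP), while same_subnet 192.168.84.173 192.168.84.167 24 returns zero (the corrected VIP).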

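Problem ③ description (hostnames anonymized)

It still did not work, which made no sense. I deleted the resources, recreated them, and changed the resource dependencies and start order, but that turned out to be unrelated.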

[root@cxl-pcs01 ~]# pcs resource delete orac
[root@cxl-pcs01 ~]# pcs resource delete listener_orac
[root@cxl-pcs01 ~]# pcs resource create orac oracle sid="orac" --group=oracle
Assumed agent name 'ocf:heartbeat:oracle' (deduced from 'oracle')
[root@cxl-pcs01 ~]# pcs constraint colocation add orac with listener_orac
[root@cxl-pcs01 ~]# pcs constraint order start listener_orac then start orac
Adding listener_orac orac (kind: Mandatory) (Options: first-action=start then-action=start)

[root@cxl-pcs01 ~]# pcs constraint show --full
Location Constraints:
  Resource: clustervip01
    Enabled on: cxl-pcs01 (score:INFINITY) (role: Started) (id:cli-prefer-clustervip01)
  Resource: vg01
    Enabled on: cxl-pcs01 (score:INFINITY) (role: Started) (id:cli-prefer-vg01)
  Resource: vg02
    Enabled on: cxl-pcs01 (score:INFINITY) (role: Started) (id:cli-prefer-vg02)
Ordering Constraints:
  start clustervip01 then start vg01 (kind:Mandatory) (id:order-clustervip01-vg01-mandatory)
  start clustervip01 then start vg02 (kind:Mandatory) (id:order-clustervip01-vg02-mandatory)
  start vg01 then start data1 (kind:Mandatory) (id:order-vg01-data1-mandatory)
  start vg02 then start data2 (kind:Mandatory) (id:order-vg02-data2-mandatory)
  start data1 then start orac (kind:Mandatory) (id:order-data1-listener_orac-mandatory)
  start orac then start listener_orac (kind:Mandatory) (id:order-listener_orac-orac-mandatory)
Colocation Constraints:
  vg01 with clustervip01 (score:INFINITY) (id:colocation-vg01-clustervip01-INFINITY)
  vg02 with clustervip01 (score:INFINITY) (id:colocation-vg02-clustervip01-INFINITY)
  data1 with vg01 (score:INFINITY) (id:colocation-data1-vg01-INFINITY)
  data2 with vg02 (score:INFINITY) (id:colocation-data2-vg02-INFINITY)
  orac with data1 (score:INFINITY) (id:colocation-listener_orac-data1-INFINITY)
  listener_orac with orac (score:INFINITY) (id:colocation-orac-listener_orac-INFINITY)
Ticket Constraints:

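After changing the resource order it was clear that only the listener was failing, and that the HA stack itself was stopping it. Running a manual debug-start of the listener resource showed the following: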
> stderr: Attempting to contact (DESCRIPTION = (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.125.231)(PORT = 1521)) (CONNECT_DATA = (SERVER = DEDICATED) (SERVICE_NAME = orac)))

[root@cxl-pcs01 ~]# pcs resource debug-start listener_orac
Operation start for listener_orac (ocf:heartbeat:oralsnr) returned: 'unknown error' (1)
 >  stderr: ocf-exit-reason:tnsping orac failed: 
 >  stderr: TNS Ping Utility for Linux: Version 11.2.0.4.0 - Production on 14-SEP-2022 18:34:36
 >  stderr: 
 >  stderr: Copyright (c) 1997, 2013, Oracle.  All rights reserved.
 >  stderr: 
 >  stderr: Used parameter files:
 >  stderr: /data1/app/oracle/product/11.2.0/db_1/network/admin/sqlnet.ora
 >  stderr: 
 >  stderr: 
 >  stderr: Used TNSNAMES adapter to resolve the alias
 >  stderr: Attempting to contact (DESCRIPTION = (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.125.231)(PORT = 1521)) (CONNECT_DATA = (SERVER = DEDICATED) (SERVICE_NAME = orac)))
 >  stderr: TNS-12535: TNS:operation timed out
 >  stderr: ocf-exit-reason:Listener listener appears to have started, but is not running properly: 
 >  stderr: LSNRCTL for Linux: Version 11.2.0.4.0 - Production on 14-SEP-2022 18:34:36
 >  stderr: 
 >  stderr: Copyright (c) 1991, 2013, Oracle.  All rights reserved.
 >  stderr: 
 >  stderr: Starting /data1/app/oracle/product/11.2.0/db_1/bin/tnslsnr: please wait...
 >  stderr: 
 >  stderr: TNSLSNR for Linux: Version 11.2.0.4.0 - Production
 >  stderr: System parameter file is /data1/app/oracle/product/11.2.0/db_1/network/admin/listener.ora
 >  stderr: Log messages written to /data1/app/oracle/diag/tnslsnr/cxl-pcs01/listener/alert/log.xml
 >  stderr: Listening on: (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=EXTPROC1521)))
 >  stderr: Listening on: (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.84.167)(PORT=1521)))
 >  stderr: 
 >  stderr: Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=EXTPROC1521)))
 >  stderr: STATUS of the LISTENER
 >  stderr: ------------------------
 >  stderr: Alias                     listener
 >  stderr: Version                   TNSLSNR for Linux: Version 11.2.0.4.0 - Production
 >  stderr: Start Date                14-SEP-2022 18:34:36
 >  stderr: Uptime                    0 days 0 hr. 0 min. 0 sec
 >  stderr: Trace Level               off
 >  stderr: Security                  ON: Local OS Authentication
 >  stderr: SNMP                      OFF
 >  stderr: Listener Parameter File   /data1/app/oracle/product/11.2.0/db_1/network/admin/listener.ora
 >  stderr: Listener Log File         /data1/app/oracle/diag/tnslsnr/cxl-pcs01/listener/alert/log.xml
 >  stderr: Listening Endpoints Summary...
 >  stderr:   (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=EXTPROC1521)))
 >  stderr:   (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.84.167)(PORT=1521)))
 >  stderr: Services Summary...
 >  stderr: Service "orac" has 1 instance(s).
 >  stderr:   Instance "orac", status UNKNOWN, has 1 handler(s) for this service...
 >  stderr: The command completed successfully
 >  stderr: Last login: Wed Sep 14 18:28:25 CST 2022 on pts/0
 >  stderr: Sep 14 18:35:36 ERROR: Probable Oracle configuration error

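I had already changed the subnet, so why was an IP from the 125 subnet still showing up? Then it hit me: the entry in tnsnames.ora had never been updated.
After the fix: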

[oracle@cxl-pcs01 admin]$ cat tnsnames.ora 
# tnsnames.ora Network Configuration File: /data1/app/oracle/product/11.2.0/db_1/network/admin/tnsnames.ora
# Generated by Oracle configuration tools.

ORAC =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.84.167)(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = orac)
    )
  )

This time, after restarting the pacemaker service, everything worked.


Root cause analysis:

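The cause is that when pacemaker starts the listener, the oralsnr resource agent runs tnsping against the service name. If tnsping cannot connect, the agent treats the start as failed and shuts the listener down on its own. The address in tnsnames.ora therefore has to be updated whenever the listener address changes.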
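
Since the failure came from tnsnames.ora pointing at an address the listener no longer used, the two files can be compared before handing control back to the cluster. This is a rough sketch in POSIX shell, not part of the oralsnr agent: net_hosts and tns_matches_listener are hypothetical names, and it assumes each ADDRESS entry sits on a single line, as in the files shown in this post.

```shell
# Hypothetical helpers for illustration only.

# Print the unique HOST values found in an Oracle Net configuration file
# (listener.ora or tnsnames.ora), one per line. Comment lines are skipped.
net_hosts() {
    grep -v '^[[:space:]]*#' "$1" \
        | sed -n 's/.*[Hh][Oo][Ss][Tt][[:space:]]*=[[:space:]]*\([^)[:space:]]*\).*/\1/p' \
        | sort -u
}

# tns_matches_listener LISTENER_ORA TNSNAMES_ORA
# Returns 0 when every host referenced in tnsnames.ora is also an address
# in listener.ora; otherwise reports the first mismatch and returns 1.
tns_matches_listener() (
    listener_hosts=$(net_hosts "$1")
    for h in $(net_hosts "$2"); do
        echo "$listener_hosts" | grep -qxF "$h" || {
            echo "tnsnames.ora host $h is not a listener address" >&2
            exit 1
        }
    done
)
```

Run as tns_matches_listener listener.ora tnsnames.ora from the network/admin directory. With the stale 192.168.125.231 entry it would report a mismatch; once both files use 192.168.84.167 it returns 0. It only compares literal HOST strings, so a hostname in one file and an IP in the other would be flagged even if they resolve to the same address.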
