ORACLE Cluster Management: DNS Failure Handling

Latest recommended article published on 2023-04-18 09:14:22

执笔画情ora

Category: ORACLE Database RAC Management

Original link: https://yq.aliyun.com/articles/283742

1. Environment:

This is a RAC system deployed four years ago. It had run well without incident and was essentially unattended day to day.

OS: Red Hat Enterprise Linux 5.8 x86_64

DB: Oracle Database Enterprise Edition 11.2.0.4

GI: Oracle Grid Infrastructure 11.2.0.4

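2. Problem description:

Yesterday, shortly before the end of the workday, on-site staff reported a failure: the database could not be connected to, and clients received ORA-12547: TNS:lost contact. My first guess was a network or listener problem, but when I asked the on-site staff to run tnsping and ping, both came back normal.

3. Symptoms:

After arriving on site, I first checked the database status and found that the database instances were stopped. The log showed no obvious cause beyond the instance terminating with error 119: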

Starting up:

Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production

With the Partitioning, Real Application Clusters, OLAP, Data Mining

and Real Application Testing options.

ORACLE_HOME = /u01/app/oracle/11.2.0.4/product/db_1

System name: Linux

Node name: db01

Release: 3.8.13-44.1.1.el6uek.x86_64

Version: #2 SMP Wed Sep 10 06:10:25 PDT 2014

Machine: x86_64

VM name: VMWare Version: 6

Using parameter settings in server-side pfile /u01/app/oracle/11.2.0.4/product/db_1/dbs/initwoo1.ora

System parameters with non-default values:

processes = 600

sessions = 922

spfile = "+DATA/woo/spfilewoo.ora"

nls_language = "SIMPLIFIED CHINESE"

nls_territory = "CHINA"

memory_target = 1584M

control_files = "+DATA/woo/controlfile/current.260.930748953"

control_files = "+FRA01/woo/controlfile/current.256.930748953"

db_block_size = 8192

compatible = "11.2.0.4.0"

cluster_database = TRUE

db_create_file_dest = "+DATA"

db_recovery_file_dest = "+FRA01"

db_recovery_file_dest_size= 4407M

thread = 1

undo_tablespace = "UNDOTBS1"

instance_number = 1

remote_login_passwordfile= "EXCLUSIVE"

db_domain = ""

dispatchers = "(PROTOCOL=TCP) (SERVICE=wooXDB)"

remote_listener = "scan.prudentwoo.com:1521"

audit_file_dest = "/u01/app/oracle/admin/woo/adump"

audit_trail = "DB"

db_name = "woo"

open_cursors = 300

diagnostic_dest = "/u01/app/oracle"

Cluster communication is configured to use the following interface(s) for this instance

169.254.51.38

169.254.243.157

cluster interconnect IPC version:Oracle UDP/IP (generic)

IPC Vendor 1 proto 2

Fri Dec 16 15:24:55 2016

USER (ospid: 4044): terminating the instance due to error 119

Instance terminated by USER, pid = 404

Database status:

[oracle@db01 ~]$ crsctl status res -t

[oracle@db01 ~]$ srvctl status database -d woo

Instance woo1 is not running on node db01

Instance woo2 is not running on node db02

4. Starting the database manually:

[oracle@db01 trace]$ srvctl start database -d woo

PRCR-1079 : Failed to start resource ora.woo.db

CRS-5017: The resource action "ora.woo.db start" encountered the following error:

ORA-00119: invalid specification for system parameter REMOTE_LISTENER

ORA-00132: syntax error or unresolved network name 'scan.prudentwoo.com:1521'

. For details refer to "(:CLSN00107:)" in "/u01/app/11.2.0.4/product/grid/log/db02/agent/crsd/oraagent_oracle/oraagent_oracle.log".

CRS-5017: The resource action "ora.woo.db start" encountered the following error:

ORA-00119: invalid specification for system parameter REMOTE_LISTENER

ORA-00132: syntax error or unresolved network name 'scan.prudentwoo.com:1521'

. For details refer to "(:CLSN00107:)" in "/u01/app/11.2.0.4/product/grid/log/db01/agent/crsd/oraagent_oracle/oraagent_oracle.log".

CRS-2674: Start of 'ora.woo.db' on 'db02' failed

CRS-2674: Start of 'ora.woo.db' on 'db01' failed

CRS-2632: There are no more servers to try to place resource 'ora.woo.db' on that would satisfy its placement policy

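5. Problem analysis:

The manual start shows that the database cannot start normally: the attempt fails with ORA-00119 and ORA-00132, which matches the error 119 recorded in the log.

The startup message points to the SCAN. REMOTE_LISTENER is set to scan.prudentwoo.com:1521, and because that network name cannot be resolved, the database cannot start.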
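
The step of reading the failing network name out of the startup error can be sketched as a small shell helper. This is only an illustration and not part of the original procedure: the function name `extract_unresolved_name` is hypothetical, and the pattern assumes the ORA-00132 message format shown in the srvctl output above.

```shell
# Read error text on stdin and print the quoted network name from any
# ORA-00132 line. Duplicates (one message per node) are collapsed.
extract_unresolved_name() {
    sed -n "s/.*ORA-00132:.*unresolved network name '\([^']*\)'.*/\1/p" | sort -u
}

# Example using the message from the failed start above.
msg="ORA-00132: syntax error or unresolved network name 'scan.prudentwoo.com:1521'"
target=$(printf '%s\n' "$msg" | extract_unresolved_name)

# Split "host:port"; this assumes the name carries a port suffix.
host=${target%%:*}
port=${target##*:}
echo "host=$host port=$port"
```

Splitting the host from the port gives the exact name to test against the resolver in the next step.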

6. Check the SCAN configuration:

#check scan info:

[oracle@db01 ~]$ srvctl config scan

SCAN name: scan.prudentwoo.com, Network: 1/192.168.84.0/255.255.255.0/eth0

SCAN VIP name: scan1, IP: /scan.prudentwoo.com/192.168.84.21

SCAN VIP name: scan2, IP: /scan.prudentwoo.com/192.168.84.22

SCAN VIP name: scan3, IP: /scan.prudentwoo.com/192.168.84.20

[oracle@db01 ~]$ ping 192.168.84.20 -c 2

PING 192.168.84.20 (192.168.84.20) 56(84) bytes of data.

64 bytes from 192.168.84.20: icmp_seq=1 ttl=64 time=0.032 ms

64 bytes from 192.168.84.20: icmp_seq=2 ttl=64 time=0.039 ms

--- 192.168.84.20 ping statistics ---

2 packets transmitted, 2 received, 0% packet loss, time 1000ms

rtt min/avg/max/mdev = 0.032/0.035/0.039/0.006 ms

[oracle@db01 ~]$ ping 192.168.84.21 -c 2

PING 192.168.84.21 (192.168.84.21) 56(84) bytes of data.

64 bytes from 192.168.84.21: icmp_seq=1 ttl=64 time=0.231 ms

64 bytes from 192.168.84.21: icmp_seq=2 ttl=64 time=0.292 ms

--- 192.168.84.21 ping statistics ---

2 packets transmitted, 2 received, 0% packet loss, time 1001ms

rtt min/avg/max/mdev = 0.231/0.261/0.292/0.034 ms

[oracle@db01 ~]$ ping 192.168.84.22 -c 2

PING 192.168.84.22 (192.168.84.22) 56(84) bytes of data.

64 bytes from 192.168.84.22: icmp_seq=1 ttl=64 time=0.024 ms

64 bytes from 192.168.84.22: icmp_seq=2 ttl=64 time=0.034 ms

--- 192.168.84.22 ping statistics ---

2 packets transmitted, 2 received, 0% packet loss, time 999ms

rtt min/avg/max/mdev = 0.024/0.029/0.034/0.005 ms

[oracle@db01 ~]$ ping scan.prudentwoo.com -c 2

ping: unknown host scan.prudentwoo.com

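As we can see, all three addresses behind the SCAN are reachable, so the SCAN VIPs themselves are fine. Pinging the SCAN domain name, however, reports an unknown host, meaning the name cannot be resolved. The next step is therefore to examine the domain name service.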
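
The manual checks above (ping every SCAN VIP, then ping the SCAN name) can be driven from the `srvctl config scan` output itself. The sketch below is a hedged illustration: the helper names `scan_vip_ips` and `scan_name` are hypothetical, and the parsing assumes the 11.2-style output lines shown above. The sample text is copied from that output so the block runs without a cluster.

```shell
# Print the IP address of every SCAN VIP found in `srvctl config scan`
# output read from stdin (the IP is the last "/"-separated field).
scan_vip_ips() {
    awk -F/ '/^SCAN VIP name:/ { print $NF }'
}

# Print the SCAN name from the same output.
scan_name() {
    sed -n 's/^SCAN name: \([^,]*\),.*/\1/p'
}

# Sample input copied from the output above.
sample="SCAN name: scan.prudentwoo.com, Network: 1/192.168.84.0/255.255.255.0/eth0
SCAN VIP name: scan1, IP: /scan.prudentwoo.com/192.168.84.21
SCAN VIP name: scan2, IP: /scan.prudentwoo.com/192.168.84.22
SCAN VIP name: scan3, IP: /scan.prudentwoo.com/192.168.84.20"

printf '%s\n' "$sample" | scan_name
printf '%s\n' "$sample" | scan_vip_ips
```

On a cluster node, piping the real output through the helpers, for example `srvctl config scan | scan_vip_ips | while read -r ip; do ping -c 2 "$ip"; done`, would repeat the reachability test, and `getent hosts scan.prudentwoo.com` would test name resolution separately from reachability.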

7. Check the domain name (DNS) service on both database servers; the result shows that the DNS server does not run on either of them:

#check dns client and server:

[oracle@db01 ~]$ /sbin/chkconfig --list|grep named

[oracle@db01 ~]$ ssh db02 '/sbin/chkconfig --list|grep named'

[oracle@db01 ~]$

check dns client:

[oracle@db01 ~]$ cat /etc/resolv.conf

search prudentwoo.com

nameserver 192.168.84.15

[oracle@db01 ~]$ ping 192.168.84.15 -c 2

PING 192.168.84.15 (192.168.84.15) 56(84) bytes of data.

From 192.168.84.11 icmp_seq=1 Destination Host Unreachable

From 192.168.84.11 icmp_seq=2 Destination Host Unreachable

--- 192.168.84.15 ping statistics ---

2 packets transmitted, 0 received, +2 errors, 100% packet loss, time 3007ms

pipe 2
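
The client-side check above (read the nameserver from /etc/resolv.conf, then ping it) can be sketched as two shell functions. This is a minimal sketch under stated assumptions: the function names `list_nameservers` and `check_nameservers` are hypothetical, and ping's `-W` timeout option is the iputils one found on Linux. A successful ping only shows that the host is up; it does not prove that the DNS service is answering queries.

```shell
# Print each nameserver address listed in a resolv.conf-style file.
# Defaults to /etc/resolv.conf when no file is given.
list_nameservers() {
    awk '$1 == "nameserver" { print $2 }' "${1:-/etc/resolv.conf}"
}

# Ping every configured nameserver twice and report reachability.
check_nameservers() {
    for ns in $(list_nameservers "$1"); do
        if ping -c 2 -W 2 "$ns" >/dev/null 2>&1; then
            echo "reachable:   $ns"
        else
            echo "UNREACHABLE: $ns"
        fi
    done
}

# Demonstrate the parser on a copy of the resolv.conf shown above.
tmp=$(mktemp)
printf 'search prudentwoo.com\nnameserver 192.168.84.15\n' > "$tmp"
list_nameservers "$tmp"
rm -f "$tmp"
```

Running `check_nameservers` with no argument on a database node would test every nameserver in the live /etc/resolv.conf, which in this incident would have flagged 192.168.84.15.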

8. Repair the DNS server; the name now resolves normally:

[oracle@db01 ~]$ ping scan.prudentwoo.com -c 2

PING scan.prudentwoo.com (192.168.84.21) 56(84) bytes of data.

64 bytes from scan.prudentwoo.com (192.168.84.21): icmp_seq=1 ttl=64 time=0.494 ms

64 bytes from scan.prudentwoo.com (192.168.84.21): icmp_seq=2 ttl=64 time=0.289 ms

--- scan.prudentwoo.com ping statistics ---

2 packets transmitted, 2 received, 0% packet loss, time 1001ms

rtt min/avg/max/mdev = 0.289/0.391/0.494/0.104 ms

9. Start the database again:

[oracle@db01 ~]$ srvctl start database -d woo

[oracle@db01 ~]$ srvctl status database -d woo

Instance woo1 is running on node db01

Instance woo2 is running on node db02
