Server node status down: the node 2 server crashed and the node 1 database cannot provide service

OS: Red Hat 4

Database: 10.2.0.5

Environment: RAC + ASM

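Problem description: The node 2 server crashed, and the node 1 database became unusable as well. Logging in with SQL*Plus reported "Connected to an idle instance." I then tried restarting CRS and found that the instance hung in the NOMOUNT state. Finally I booted the node 2 server, after which both node 1 and node 2 started normally and could provide service again. Below is the node 1 alert log.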

(Sorry, my forum level is too low to upload attachments, so please bear with the pasted text.)

Sun Dec 06 01:53:23 CST 2015

IPC Send timeout detected.Sender: ospid 23060

Receiver: inst 2 binc 429457193 ospid 28587

Sun Dec 06 01:53:55 CST 2015

IPC Send timeout to 1.0 inc 4 for msg type 8 from opid 20

Sun Dec 06 01:53:55 CST 2015

Communications reconfiguration: instance_number 2

Sun Dec 06 01:54:25 CST 2015

Trace dumping is performing id=[cdmp_20151206015355]

Sun Dec 06 01:54:26 CST 2015

Trace dumping is performing id=[cdmp_20151206015355]

Sun Dec 06 01:55:45 CST 2015

Evicting instance 2 from cluster

Sun Dec 06 01:56:09 CST 2015

Waiting for instances to leave:

2

Sun Dec 06 01:56:29 CST 2015

Waiting for instances to leave:

2

Sun Dec 06 01:56:32 CST 2015

IPC Send timeout detected.Sender: ospid 23052

Receiver: inst 2 binc 429457193 ospid 28587

Sun Dec 06 01:56:49 CST 2015

Waiting for instances to leave:

2

Sun Dec 06 01:57:00 CST 2015

IPC Send timeout to 1.0 inc 4 for msg type 12 from opid 19

Sun Dec 06 01:57:09 CST 2015

Waiting for instances to leave:

2

Sun Dec 06 01:57:29 CST 2015

Waiting for instances to leave:

2

Sun Dec 06 01:57:49 CST 2015

Waiting for instances to leave:

2

Sun Dec 06 01:58:09 CST 2015

Waiting for instances to leave:

2

Sun Dec 06 01:58:12 CST 2015

IPC Send timeout detected.Sender: ospid 23011

Receiver: inst 2 binc 429457193 ospid 28587

Sun Dec 06 01:58:29 CST 2015

Waiting for instances to leave:

2

Sun Dec 06 01:58:38 CST 2015

IPC Send timeout to 1.0 inc 4 for msg type 65521 from opid 6

Sun Dec 06 01:58:49 CST 2015

Waiting for instances to leave:

2

Sun Dec 06 01:59:09 CST 2015

Waiting for instances to leave:

2

Sun Dec 06 01:59:29 CST 2015

Waiting for instances to leave:

2

Sun Dec 06 01:59:49 CST 2015

Waiting for instances to leave:

2

Sun Dec 06 02:00:08 CST 2015

IPC Send timeout detected.Sender: ospid 1757

Receiver: inst 2 binc 429457241 ospid 28589

Sun Dec 06 02:00:09 CST 2015

Waiting for instances to leave:

2

Sun Dec 06 02:00:29 CST 2015

Waiting for instances to leave:

2

Sun Dec 06 02:00:39 CST 2015

IPC Send timeout to 1.1 inc 4 for msg type 65521 from opid 28

Sun Dec 06 02:00:49 CST 2015

Waiting for instances to leave:

2

Sun Dec 06 02:01:09 CST 2015

Waiting for instances to leave:

2

IPC Send timeout detected.Sender: ospid 23009

Receiver: inst 2 binc 429457241 ospid 28589

Sun Dec 06 02:01:39 CST 2015

IPC Send timeout to 1.1 inc 4 for msg type 2 from opid 5

IPC Send timeout detected.Sender: ospid 23009

Receiver: inst 2 binc 429457349 ospid 28580

Sun Dec 06 02:02:07 CST 2015

Waiting for instances to leave:

2

Sun Dec 06 02:02:27 CST 2015

Waiting for instances to leave:

2

Sun Dec 06 02:02:47 CST 2015

Waiting for instances to leave:

2

Sun Dec 06 02:03:07 CST 2015

Waiting for instances to leave:

2

Sun Dec 06 02:03:10 CST 2015

IPC Send timeout detected.Sender: ospid 12974

Receiver: inst 2 binc 429457241 ospid 28589

Sun Dec 06 02:03:27 CST 2015

Waiting for instances to leave:

2

Sun Dec 06 02:03:41 CST 2015

IPC Send timeout to 1.1 inc 4 for msg type 32 from opid 50

Sun Dec 06 02:03:47 CST 2015

Waiting for instances to leave:

2

Sun Dec 06 02:04:07 CST 2015

Waiting for instances to leave:

2

Sun Dec 06 02:04:25 CST 2015

IPC Send timeout detected.Sender: ospid 23035

Receiver: inst 2 binc 429457241 ospid 28589

Sun Dec 06 02:04:27 CST 2015

Waiting for instances to leave:

2

Sun Dec 06 02:04:47 CST 2015

Waiting for instances to leave:

2

Sun Dec 06 02:04:50 CST 2015

IPC Send timeout to 1.1 inc 4 for msg type 65521 from opid 12

Sun Dec 06 02:05:07 CST 2015

Waiting for instances to leave:

2

Sun Dec 06 02:05:24 CST 2015

IPC Send timeout detected.Sender: ospid 23018

Receiver: inst 2 binc 429457241 ospid 28589

Sun Dec 06 02:05:27 CST 2015

Waiting for instances to leave:

2

Sun Dec 06 02:05:36 CST 2015

IPC Send timeout detected.Sender: ospid 23013

Receiver: inst 2 binc 429457241 ospid 28589

Sun Dec 06 02:05:39 CST 2015

IPC Send timeout detected.Sender: ospid 29391

Receiver: inst 2 binc 429457193 ospid 28587

Sun Dec 06 02:05:47 CST 2015

Waiting for instances to leave:

2

Sun Dec 06 02:05:49 CST 2015

IPC Send timeout to 1.1 inc 4 for msg type 65521 from opid 8

Sun Dec 06 02:06:03 CST 2015

IPC Send timeout to 1.1 inc 4 for msg type 73 from opid 7

Sun Dec 06 02:06:07 CST 2015

Waiting for instances to leave:

2

Sun Dec 06 02:06:11 CST 2015

IPC Send timeout to 1.0 inc 4 for msg type 16 from opid 42

Sun Dec 06 02:06:27 CST 2015

Waiting for instances to leave:

2

Sun Dec 06 02:06:47 CST 2015

Waiting for instances to leave:

2

Sun Dec 06 02:07:07 CST 2015

Waiting for instances to leave:

2

Sun Dec 06 02:07:12 CST 2015

IPC Send timeout detected.Sender: ospid 15618

Receiver: inst 2 binc 429457193 ospid 28587

Sun Dec 06 02:07:27 CST 2015

Waiting for instances to leave:

2

Sun Dec 06 02:07:39 CST 2015

IPC Send timeout to 1.0 inc 4 for msg type 12 from opid 48

Sun Dec 06 02:07:47 CST 2015

Waiting for instances to leave:

2

Sun Dec 06 02:08:07 CST 2015

Waiting for instances to leave:

2

Sun Dec 06 02:08:27 CST 2015

Waiting for instances to leave:

2

……

Errors in file /oracle/product/db_1/admin/odi/bdump/odi1_q001_973.trc:

ORA-00018: maximum number of sessions exceeded

Sun Dec 06 02:44:49 CST 2015

Waiting for instances to leave:

2

Sun Dec 06 02:45:09 CST 2015

Waiting for instances to leave:

2

Sun Dec 06 02:45:23 CST 2015

Errors in file /oracle/product/db_1/admin/odi/bdump/odi1_q000_29469.trc:

ORA-00018: maximum number of sessions exceeded

……

Sun Dec 06 07:39:46 CST 2015

Waiting for instances to leave:

2

Sun Dec 06 07:40:06 CST 2015

Waiting for instances to leave:

2

Sun Dec 06 07:40:26 CST 2015

Waiting for instances to leave:

2

……

Sun Dec 06 11:53:12 CST 2015

>>> WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK! pid=52

Sun Dec 06 11:53:19 CST 2015

Waiting for instances to leave:

2

System State dumped to trace file /oracle/product/db_1/admin/odi/udump/odi1_ora_385.trc

Sun Dec 06 11:53:19 CST 2015

Waiting for instances to leave:

2

Sun Dec 06 11:53:40 CST 2015

Waiting for instances to leave:

2

……

Sun Dec 06 13:01:04 CST 2015

Shutting down instance (abort)

License high water mark = 269

Sun Dec 06 13:01:04 CST 2015

LGWR waiting for instance termination

Sun Dec 06 13:01:06 CST 2015

Instance terminated by USER, pid = 10631

Sun Dec 06 13:16:15 CST 2015

Starting ORACLE instance (normal)

LICENSE_MAX_SESSION = 0

LICENSE_SESSIONS_WARNING = 0

Interface type 1 eth1 100.100.100.0 configured from OCR for use as a cluster interconnect

Interface type 1 eth0 10.150.241.0 configured from OCR for use as a public interface

Picked latch-free SCN scheme 3

Using LOG_ARCHIVE_DEST_10 parameter default value as USE_DB_RECOVERY_FILE_DEST

LICENSE_MAX_USERS = 0

SYS auditing is enabled

ksdpec: called for event 13740 prior to event group initialization

Starting up ORACLE RDBMS Version: 10.2.0.5.0.

System parameters with non-default values:

processes = 500

sessions = 555

sga_max_size = 12884901888

__shared_pool_size = 2466250752

__large_pool_size = 33554432

__java_pool_size = 33554432

__streams_pool_size = 16777216

spfile = +DATA/odi/spfileodi.ora

nls_language = AMERICAN

sga_target = 12884901888

control_files = +DATA/odi/controlfile/current.260.713399465, +RECOVERY/odi/controlfile/current.256.713399465

db_block_size = 8192

__db_cache_size = 10317987840

compatible = 10.2.0.3.0

db_file_multiblock_read_count= 16

cluster_database = TRUE

cluster_database_instances= 2

db_create_file_dest = +DATA

db_recovery_file_dest = +RECOVERY

db_recovery_file_dest_size= 1073741824

thread = 1

instance_number = 1

undo_management = AUTO

undo_tablespace = UNDOTBS1

_smu_debug_mode = 0

remote_login_passwordfile= EXCLUSIVE

audit_sys_operations = TRUE

db_domain =

dispatchers =

local_listener = odi1

remote_listener = LISTENERS_ODI

job_queue_processes = 10

background_dump_dest = /oracle/product/db_1/admin/odi/bdump

user_dump_dest = /oracle/product/db_1/admin/odi/udump

core_dump_dest = /oracle/product/db_1/admin/odi/cdump

audit_file_dest = /oracle/product/db_1/admin/odi/adump

audit_trail = OS

db_name = odi

open_cursors = 300

pga_aggregate_target = 3363831808

Cluster communication is configured to use the following interface(s) for this instance

100.100.100.125

Sun Dec 06 13:16:16 CST 2015

cluster interconnect IPC version:Oracle UDP/IP (generic)

IPC Vendor 1 proto 2

PMON started with pid=2, OS id=18901

DIAG started with pid=3, OS id=18903

PSP0 started with pid=4, OS id=18910

LMON started with pid=5, OS id=18912

LMD0 started with pid=6, OS id=18914

LMS0 started with pid=7, OS id=18916

LMS1 started with pid=8, OS id=18920

LMS2 started with pid=9, OS id=18924

LMS3 started with pid=10, OS id=18928

LMS4 started with pid=11, OS id=18932

LMS5 started with pid=12, OS id=18936

MMAN started with pid=13, OS id=18940

DBW0 started with pid=14, OS id=18958

DBW1 started with pid=15, OS id=19148

DBW2 started with pid=16, OS id=19207

LGWR started with pid=17, OS id=19211

CKPT started with pid=18, OS id=19214

SMON started with pid=19, OS id=19218

RECO started with pid=20, OS id=19220

CJQ0 started with pid=21, OS id=19222

MMON started with pid=22, OS id=19224

MMNL started with pid=23, OS id=19227

Sun Dec 06 13:16:17 CST 2015

lmon registered with NM - instance id 1 (internal mem no 0)

Sun Dec 06 13:18:48 CST 2015

oracle@ypodidb1 (LMON) (ospid: 18912) detects hung instances during IMR reconfiguration

oracle@ypodidb1 (LMON) (ospid: 18912) tries to kill the instance 2.

Please check instance 2's alert log and LMON trace file for more details.

Sun Dec 06 13:20:03 CST 2015

Remote instance kill is issued with system inc 0 and reason 0x20000000

Remote instance kill map (size 1) : 2

Sun Dec 06 13:21:17 CST 2015

Error: KGXGN polling error (15)

Sun Dec 06 13:21:17 CST 2015

Errors in file /oracle/product/db_1/admin/odi/udump/odi1_ora_18345.trc:

ORA-00600: internal error code, arguments: [ksqsgn:join], [error in lmon process], [32], [], [], [], [], []

Sun Dec 06 13:21:17 CST 2015

Errors in file /oracle/product/db_1/admin/odi/bdump/odi1_lmon_18912.trc:

ORA-29702: error occurred in Cluster Group Service operation

LMON: terminating instance due to error 29702

Sun Dec 06 13:21:25 CST 2015

Shutting down instance (abort)

License high water mark = 1

Sun Dec 06 13:21:27 CST 2015

Instance terminated by LMON, pid = 18912

Sun Dec 06 13:21:30 CST 2015

Instance terminated by USER, pid = 19385

Sun Dec 06 13:30:46 CST 2015

Starting ORACLE instance (normal)

LICENSE_MAX_SESSION = 0

LICENSE_SESSIONS_WARNING = 0

Interface type 1 eth1 100.100.100.0 configured from OCR for use as a cluster interconnect

Interface type 1 eth0 10.150.241.0 configured from OCR for use as a public interface

Picked latch-free SCN scheme 3

Using LOG_ARCHIVE_DEST_10 parameter default value as USE_DB_RECOVERY_FILE_DEST

LICENSE_MAX_USERS = 0

SYS auditing is enabled

ksdpec: called for event 13740 prior to event group initialization

Starting up ORACLE RDBMS Version: 10.2.0.5.0.

System parameters with non-default values:

processes = 500

sessions = 555

sga_max_size = 12884901888

__shared_pool_size = 2466250752

__large_pool_size = 33554432

__java_pool_size = 33554432

__streams_pool_size = 16777216

spfile = +DATA/odi/spfileodi.ora

nls_language = AMERICAN

sga_target = 12884901888

control_files = +DATA/odi/controlfile/current.260.713399465, +RECOVERY/odi/controlfile/current.256.713399465

db_block_size = 8192

__db_cache_size = 10317987840

compatible = 10.2.0.3.0

db_file_multiblock_read_count= 16

cluster_database = TRUE

cluster_database_instances= 2

db_create_file_dest = +DATA

db_recovery_file_dest = +RECOVERY

db_recovery_file_dest_size= 1073741824

thread = 1

instance_number = 1

undo_management = AUTO

undo_tablespace = UNDOTBS1

_smu_debug_mode = 0

remote_login_passwordfile= EXCLUSIVE

audit_sys_operations = TRUE

db_domain =

dispatchers =

local_listener = odi1

remote_listener = LISTENERS_ODI

job_queue_processes = 10

background_dump_dest = /oracle/product/db_1/admin/odi/bdump

user_dump_dest = /oracle/product/db_1/admin/odi/udump

core_dump_dest = /oracle/product/db_1/admin/odi/cdump

audit_file_dest = /oracle/product/db_1/admin/odi/adump

audit_trail = OS

db_name = odi

open_cursors = 300

pga_aggregate_target = 3363831808

Cluster communication is configured to use the following interface(s) for this instance

100.100.100.125

Sun Dec 06 13:30:47 CST 2015

cluster interconnect IPC version:Oracle UDP/IP (generic)

IPC Vendor 1 proto 2

PMON started with pid=2, OS id=6220

DIAG started with pid=3, OS id=6222

PSP0 started with pid=4, OS id=6224

LMON started with pid=5, OS id=6226

LMD0 started with pid=6, OS id=6228

LMS0 started with pid=7, OS id=6230

LMS1 started with pid=8, OS id=6234

LMS2 started with pid=9, OS id=6238

LMS3 started with pid=10, OS id=6242

LMS4 started with pid=11, OS id=6246

LMS5 started with pid=12, OS id=6250

MMAN started with pid=13, OS id=6259

DBW0 started with pid=14, OS id=6261

DBW1 started with pid=15, OS id=6263

DBW2 started with pid=16, OS id=6265

LGWR started with pid=17, OS id=6267

CKPT started with pid=18, OS id=6269

SMON started with pid=19, OS id=6271

RECO started with pid=20, OS id=6273

CJQ0 started with pid=21, OS id=6275

MMON started with pid=22, OS id=6277

MMNL started with pid=23, OS id=6279

Sun Dec 06 13:30:49 CST 2015

lmon registered with NM - instance id 1 (internal mem no 0)

Sun Dec 06 13:33:20 CST 2015

oracle@ypodidb1 (LMON) (ospid: 6226) detects hung instances during IMR reconfiguration

oracle@ypodidb1 (LMON) (ospid: 6226) tries to kill the instance 2.

Please check instance 2's alert log and LMON trace file for more details.

Sun Dec 06 13:34:35 CST 2015

Remote instance kill is issued with system inc 0 and reason 0x20000000

Remote instance kill map (size 1) : 2

Sun Dec 06 13:35:49 CST 2015

Error: KGXGN polling error (15)

Sun Dec 06 13:35:49 CST 2015

Errors in file /oracle/product/db_1/admin/odi/bdump/odi1_lmon_6226.trc:

ORA-29702: error occurred in Cluster Group Service operation

LMON: terminating instance due to error 29702

Instance terminated by LMON, pid = 6226

The alert log stops here; nothing further was written.
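Since the alert log above is long and repetitive, a small script can help condense it into counts and time ranges per message type. The following is only a rough sketch in Python, assuming the log uses the timestamp-line format shown above (for example `Sun Dec 06 01:53:23 CST 2015`) with messages on the lines that follow. The label names in `PATTERNS` and the function names are hypothetical and chosen for illustration; the matched substrings are copied from the log.

```python
import re
from collections import Counter
from datetime import datetime

# Timestamp lines such as "Sun Dec 06 01:53:23 CST 2015" (format assumed from the log above).
TS_RE = re.compile(
    r"^[A-Z][a-z]{2} ([A-Z][a-z]{2})\s+(\d{1,2}) (\d{2}):(\d{2}):(\d{2}) [A-Z]+ (\d{4})$"
)
MONTHS = {name: num for num, name in enumerate(
    "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec".split(), start=1)}

# Hypothetical labels mapped to substrings that appear in the alert log.
PATTERNS = {
    "ipc_send_timeout": "IPC Send timeout detected",
    "waiting_for_leave": "Waiting for instances to leave",
    "ora_error": "ORA-",
    "instance_terminated": "Instance terminated by",
}


def parse_timestamp(line):
    """Return a datetime for a timestamp line, or None for any other line."""
    m = TS_RE.match(line.strip())
    if not m or m.group(1) not in MONTHS:
        return None
    mon, day, hh, mm, ss, year = m.groups()
    return datetime(int(year), MONTHS[mon], int(day), int(hh), int(mm), int(ss))


def summarize(lines):
    """Count pattern hits and record the first and last timestamp per label."""
    counts = Counter()
    first, last = {}, {}
    current = None  # most recent timestamp line seen so far
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        ts = parse_timestamp(line)
        if ts is not None:
            current = ts
            continue
        for label, needle in PATTERNS.items():
            if needle in line:
                counts[label] += 1
                if current is not None:
                    first.setdefault(label, current)
                    last[label] = current
    return counts, first, last


# A few lines copied from the alert log above, used as sample input.
sample = """\
Sun Dec 06 01:53:23 CST 2015
IPC Send timeout detected.Sender: ospid 23060
Receiver: inst 2 binc 429457193 ospid 28587
Sun Dec 06 01:55:45 CST 2015
Evicting instance 2 from cluster
Sun Dec 06 01:56:09 CST 2015
Waiting for instances to leave:
2
Sun Dec 06 01:56:29 CST 2015
Waiting for instances to leave:
2
""".splitlines()

counts, first, last = summarize(sample)
for label, n in counts.items():
    print(label, n, first.get(label), last.get(label))
```

To run it against the real file, the sample could be replaced with the lines of the alert log read from `background_dump_dest`. The first and last timestamps for the "Waiting for instances to leave" label would then show how long node 1 kept waiting for instance 2 to leave.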

Next is the node 1 crsd log:

2015-12-06 01:25:14.947: [ OCRSRV][1220598112]Failure in renaming file [/oracle/product/crs_1/cdata/crs/13263954] to [/oracle/product/crs_1/cdata/crs/backup00.ocr]

2015-12-06 05:25:17.718: [ OCRSRV][1220598112]Failure in renaming file [/oracle/product/crs_1/cdata/crs/41982163] to [/oracle/product/crs_1/cdata/crs/backup00.ocr]

2015-12-06 09:25:18.526: [ OCRSRV][1220598112]Failure in renaming file [/oracle/product/crs_1/cdata/crs/35192301] to [/oracle/product/crs_1/cdata/crs/backup00.ocr]

2015-12-06 12:53:16.047: [ CRSRES][1522735456]0ora.ypodidb1.gsd target set to OFFLINE before stop action

2015-12-06 12:53:16.047: [ CRSRES][1522735456]0StopResource: setting CLI values

2015-12-06 12:53:16.050: [ CRSRES][1545816416]0ora.ypodidb1.ons target set to OFFLINE before stop action

2015-12-06 12:53:16.050: [ CRSRES][1545816416]0StopResource: setting CLI values

2015-12-06 12:53:16.053: [ CRSRES][1522735456]0Attempting to stop `ora.ypodidb1.gsd` on member `ypodidb1`

2015-12-06 12:53:16.066: [ CRSRES][1493350752]0ora.ypodidb2.gsd target set to OFFLINE before stop action

2015-12-06 12:53:16.066: [ CRSRES][1493350752]0StopResource: setting CLI values

2015-12-06 12:53:16.069: [ CRSRES][1495452000]0ora.ypodidb2.ons target set to OFFLINE before stop action

2015-12-06 12:53:16.069: [ CRSRES][1495452000]0StopResource: setting CLI values

2015-12-06 12:53:16.074: [ CRSRES][1497553248]0ora.odi.db target set to OFFLINE before stop action

2015-12-06 12:53:16.074: [ CRSRES][1497553248]0StopResource: setting CLI values

2015-12-06 12:53:16.083: [ CRSRES][1545816416]0Attempting to stop `ora.ypodidb1.ons` on member `ypodidb1`

2015-12-06 12:53:16.086: [ CRSRES][1497553248]0Attempting to stop `ora.odi.db` on member `ypodidb1`

2015-12-06 12:53:16.174: [ CRSRES][1550018912]0ora.odi.odi1.inst target set to OFFLINE before stop action

2015-12-06 12:53:16.174: [ CRSRES][1550018912]0StopResource: setting CLI values

2015-12-06 12:53:16.179: [ CRSRES][1554221408]0ora.odi.odi2.inst target set to OFFLINE before stop action

2015-12-06 12:53:16.179: [ CRSRES][1554221408]0StopResource: setting CLI values

2015-12-06 12:53:16.194: [ CRSRES][1550018912]0Attempting to stop `ora.odi.odi1.inst` on member `ypodidb1`

2015-12-06 12:53:16.330: [ CRSRES][1522735456]0Stop of `ora.ypodidb1.gsd` on member `ypodidb1` succeeded.

2015-12-06 12:53:16.355: [ CRSRES][1545816416]0Stop of `ora.ypodidb1.ons` on member `ypodidb1` succeeded.

2015-12-06 12:53:23.600: [ CRSAPP][1550018912]0StopResource error for ora.odi.odi1.inst error code = 1

2015-12-06 12:53:23.608: [ CRSRES][1550018912][ALERT]0`ora.odi.odi1.inst` on member `ypodidb1` has experienced an unrecoverable failure.

2015-12-06 12:53:23.608: [ CRSRES][1550018912]0Human intervention required to resume its availability.

2015-12-06 13:01:01.927: [ CRSRES][1545816416]0StopResource: setting CLI values

2015-12-06 13:01:01.947: [ CRSRES][1545816416]0Attempting to stop `ora.odi.odi1.inst` on member `ypodidb1`

2015-12-06 13:01:40.045: [ CRSRES][1545816416]0Stop of `ora.odi.odi1.inst` on member `ypodidb1` succeeded.

2015-12-06 13:03:46.182: [ CRSEVT][1497553248]0CAAMonitorHandler :: 0:Could not join /oracle/product/crs_1/bin/racgwrap(stop)

category: 1234, operation: scls_process_join, loc: childcrash, OS error: 0, other: Abnormal termination of the child

2015-12-06 13:03:46.182: [ CRSEVT][1497553248]0CAAMonitorHandler :: 0:Action Script /oracle/product/crs_1/bin/racgwrap(stop) timed out for ora.odi.db! (timeout=600)

2015-12-06 13:03:46.182: [ CRSAPP][1497553248]0StopResource error for ora.odi.db error code = -2

2015-12-06 13:03:46.186: [ CRSRES][1497553248][ALERT]0`ora.odi.db` on member `ypodidb1` has experienced an unrecoverable failure.

2015-12-06 13:03:46.186: [ CRSRES][1497553248]0Human intervention required to resume its availability.

2015-12-06 13:16:13.534: [ CRSRES][1545816416]0startRunnable: setting CLI values

2015-12-06 13:16:13.544: [ CRSRES][1545816416]0Attempting to start `ora.odi.odi1.inst` on member `ypodidb1`

2015-12-06 13:21:25.196: [ CRSAPP][1545816416]0StartResource error for ora.odi.odi1.inst error code = 1

2015-12-06 13:21:31.706: [ CRSRES][1545816416]0Start of `ora.odi.odi1.inst` on member `ypodidb1` failed.

2015-12-06 13:25:20.297: [ OCRSRV][1220598112]Failure in renaming file [/oracle/product/crs_1/cdata/crs/18000758] to [/oracle/product/crs_1/cdata/crs/backup00.ocr]
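To see at a glance which CRS resources failed and when, the crsd log can be filtered in the same spirit. This is a minimal sketch, assuming every entry follows the `timestamp: [ COMPONENT][thread]message` layout seen above. The `FAILURE_HINTS` keyword list and the function name are my own assumptions, not anything defined by CRS.

```python
import re
from collections import defaultdict

# Layout assumed from the crsd log above: "timestamp: [ COMPONENT][thread][ALERT]message".
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}): "
    r"\[\s*(\w+)\]\[(\d+)\](\[ALERT\])?(.*)$"
)
# CRS resource names such as ora.odi.db or ora.odi.odi1.inst.
RESOURCE_RE = re.compile(r"\bora(?:\.\w+)+")
# Hypothetical keyword list for spotting failure messages.
FAILURE_HINTS = ("error code", "unrecoverable failure", "timed out", "failed")


def failures_by_resource(lines):
    """Group failure-looking crsd entries by the resource they mention."""
    result = defaultdict(list)
    for raw in lines:
        m = LINE_RE.match(raw.strip())
        if not m:
            continue  # continuation lines and blanks are skipped
        ts, component, _thread, _alert, message = m.groups()
        if not any(hint in message for hint in FAILURE_HINTS):
            continue
        res = RESOURCE_RE.search(message)
        if res:
            result[res.group(0)].append((ts, component, message))
    return dict(result)


# Lines copied from the crsd log above, used as sample input.
sample = """\
2015-12-06 12:53:23.600: [ CRSAPP][1550018912]0StopResource error for ora.odi.odi1.inst error code = 1
2015-12-06 12:53:23.608: [ CRSRES][1550018912][ALERT]0`ora.odi.odi1.inst` on member `ypodidb1` has experienced an unrecoverable failure.
2015-12-06 13:01:40.045: [ CRSRES][1545816416]0Stop of `ora.odi.odi1.inst` on member `ypodidb1` succeeded.
2015-12-06 13:03:46.182: [ CRSAPP][1497553248]0StopResource error for ora.odi.db error code = -2
2015-12-06 13:21:31.706: [ CRSRES][1545816416]0Start of `ora.odi.odi1.inst` on member `ypodidb1` failed.
""".splitlines()

result = failures_by_resource(sample)
for resource, events in result.items():
    print(resource)
    for ts, component, _message in events:
        print("   ", ts, component)
```

Matching these timestamps against the alert log should make it easier to line up each failed stop or start attempt with what the instance was doing at that moment.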
