OS: Red Hat 4
Database: 10.2.0.5
Environment: RAC + ASM
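Problem description: the node 2 server went down, and the database on node 1 became unusable as well. Connecting with SQL*Plus reported "Connected to an idle instance". I then tried restarting CRS and found that the instance hung in the NOMOUNT state. Finally, once I started the node 2 server again, both node 1 and node 2 started normally and are serving requests. Below is the node 1 alert log.
(Sorry, my forum level is too low to upload attachments, so please bear with reading it inline.)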
Sun Dec 06 01:53:23 CST 2015
IPC Send timeout detected.Sender: ospid 23060
Receiver: inst 2 binc 429457193 ospid 28587
Sun Dec 06 01:53:55 CST 2015
IPC Send timeout to 1.0 inc 4 for msg type 8 from opid 20
Sun Dec 06 01:53:55 CST 2015
Communications reconfiguration: instance_number 2
Sun Dec 06 01:54:25 CST 2015
Trace dumping is performing id=[cdmp_20151206015355]
Sun Dec 06 01:54:26 CST 2015
Trace dumping is performing id=[cdmp_20151206015355]
Sun Dec 06 01:55:45 CST 2015
Evicting instance 2 from cluster
Sun Dec 06 01:56:09 CST 2015
Waiting for instances to leave:
2
Sun Dec 06 01:56:29 CST 2015
Waiting for instances to leave:
2
Sun Dec 06 01:56:32 CST 2015
IPC Send timeout detected.Sender: ospid 23052
Receiver: inst 2 binc 429457193 ospid 28587
Sun Dec 06 01:56:49 CST 2015
Waiting for instances to leave:
2
Sun Dec 06 01:57:00 CST 2015
IPC Send timeout to 1.0 inc 4 for msg type 12 from opid 19
Sun Dec 06 01:57:09 CST 2015
Waiting for instances to leave:
2
Sun Dec 06 01:57:29 CST 2015
Waiting for instances to leave:
2
Sun Dec 06 01:57:49 CST 2015
Waiting for instances to leave:
2
Sun Dec 06 01:58:09 CST 2015
Waiting for instances to leave:
2
Sun Dec 06 01:58:12 CST 2015
IPC Send timeout detected.Sender: ospid 23011
Receiver: inst 2 binc 429457193 ospid 28587
Sun Dec 06 01:58:29 CST 2015
Waiting for instances to leave:
2
Sun Dec 06 01:58:38 CST 2015
IPC Send timeout to 1.0 inc 4 for msg type 65521 from opid 6
Sun Dec 06 01:58:49 CST 2015
Waiting for instances to leave:
2
Sun Dec 06 01:59:09 CST 2015
Waiting for instances to leave:
2
Sun Dec 06 01:59:29 CST 2015
Waiting for instances to leave:
2
Sun Dec 06 01:59:49 CST 2015
Waiting for instances to leave:
2
Sun Dec 06 02:00:08 CST 2015
IPC Send timeout detected.Sender: ospid 1757
Receiver: inst 2 binc 429457241 ospid 28589
Sun Dec 06 02:00:09 CST 2015
Waiting for instances to leave:
2
Sun Dec 06 02:00:29 CST 2015
Waiting for instances to leave:
2
Sun Dec 06 02:00:39 CST 2015
IPC Send timeout to 1.1 inc 4 for msg type 65521 from opid 28
Sun Dec 06 02:00:49 CST 2015
Waiting for instances to leave:
2
Sun Dec 06 02:01:09 CST 2015
Waiting for instances to leave:
2
IPC Send timeout detected.Sender: ospid 23009
Receiver: inst 2 binc 429457241 ospid 28589
Sun Dec 06 02:01:39 CST 2015
IPC Send timeout to 1.1 inc 4 for msg type 2 from opid 5
IPC Send timeout detected.Sender: ospid 23009
Receiver: inst 2 binc 429457349 ospid 28580
Sun Dec 06 02:02:07 CST 2015
Waiting for instances to leave:
2
Sun Dec 06 02:02:27 CST 2015
Waiting for instances to leave:
2
Sun Dec 06 02:02:47 CST 2015
Waiting for instances to leave:
2
Sun Dec 06 02:03:07 CST 2015
Waiting for instances to leave:
2
Sun Dec 06 02:03:10 CST 2015
IPC Send timeout detected.Sender: ospid 12974
Receiver: inst 2 binc 429457241 ospid 28589
Sun Dec 06 02:03:27 CST 2015
Waiting for instances to leave:
2
Sun Dec 06 02:03:41 CST 2015
IPC Send timeout to 1.1 inc 4 for msg type 32 from opid 50
Sun Dec 06 02:03:47 CST 2015
Waiting for instances to leave:
2
Sun Dec 06 02:04:07 CST 2015
Waiting for instances to leave:
2
Sun Dec 06 02:04:25 CST 2015
IPC Send timeout detected.Sender: ospid 23035
Receiver: inst 2 binc 429457241 ospid 28589
Sun Dec 06 02:04:27 CST 2015
Waiting for instances to leave:
2
Sun Dec 06 02:04:47 CST 2015
Waiting for instances to leave:
2
Sun Dec 06 02:04:50 CST 2015
IPC Send timeout to 1.1 inc 4 for msg type 65521 from opid 12
Sun Dec 06 02:05:07 CST 2015
Waiting for instances to leave:
2
Sun Dec 06 02:05:24 CST 2015
IPC Send timeout detected.Sender: ospid 23018
Receiver: inst 2 binc 429457241 ospid 28589
Sun Dec 06 02:05:27 CST 2015
Waiting for instances to leave:
2
Sun Dec 06 02:05:36 CST 2015
IPC Send timeout detected.Sender: ospid 23013
Receiver: inst 2 binc 429457241 ospid 28589
Sun Dec 06 02:05:39 CST 2015
IPC Send timeout detected.Sender: ospid 29391
Receiver: inst 2 binc 429457193 ospid 28587
Sun Dec 06 02:05:47 CST 2015
Waiting for instances to leave:
2
Sun Dec 06 02:05:49 CST 2015
IPC Send timeout to 1.1 inc 4 for msg type 65521 from opid 8
Sun Dec 06 02:06:03 CST 2015
IPC Send timeout to 1.1 inc 4 for msg type 73 from opid 7
Sun Dec 06 02:06:07 CST 2015
Waiting for instances to leave:
2
Sun Dec 06 02:06:11 CST 2015
IPC Send timeout to 1.0 inc 4 for msg type 16 from opid 42
Sun Dec 06 02:06:27 CST 2015
Waiting for instances to leave:
2
Sun Dec 06 02:06:47 CST 2015
Waiting for instances to leave:
2
Sun Dec 06 02:07:07 CST 2015
Waiting for instances to leave:
2
Sun Dec 06 02:07:12 CST 2015
IPC Send timeout detected.Sender: ospid 15618
Receiver: inst 2 binc 429457193 ospid 28587
Sun Dec 06 02:07:27 CST 2015
Waiting for instances to leave:
2
Sun Dec 06 02:07:39 CST 2015
IPC Send timeout to 1.0 inc 4 for msg type 12 from opid 48
Sun Dec 06 02:07:47 CST 2015
Waiting for instances to leave:
2
Sun Dec 06 02:08:07 CST 2015
Waiting for instances to leave:
2
Sun Dec 06 02:08:27 CST 2015
Waiting for instances to leave:
2
……
Errors in file /oracle/product/db_1/admin/odi/bdump/odi1_q001_973.trc:
ORA-00018: maximum number of sessions exceeded
Sun Dec 06 02:44:49 CST 2015
Waiting for instances to leave:
2
Sun Dec 06 02:45:09 CST 2015
Waiting for instances to leave:
2
Sun Dec 06 02:45:23 CST 2015
Errors in file /oracle/product/db_1/admin/odi/bdump/odi1_q000_29469.trc:
ORA-00018: maximum number of sessions exceeded
……
Sun Dec 06 07:39:46 CST 2015
Waiting for instances to leave:
2
Sun Dec 06 07:40:06 CST 2015
Waiting for instances to leave:
2
Sun Dec 06 07:40:26 CST 2015
Waiting for instances to leave:
2
……
Sun Dec 06 11:53:12 CST 2015
>>> WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK! pid=52
Sun Dec 06 11:53:19 CST 2015
Waiting for instances to leave:
2
System State dumped to trace file /oracle/product/db_1/admin/odi/udump/odi1_ora_385.trc
Sun Dec 06 11:53:19 CST 2015
Waiting for instances to leave:
2
Sun Dec 06 11:53:40 CST 2015
Waiting for instances to leave:
2
……
Sun Dec 06 13:01:04 CST 2015
Shutting down instance (abort)
License high water mark = 269
Sun Dec 06 13:01:04 CST 2015
LGWR waiting for instance termination
Sun Dec 06 13:01:06 CST 2015
Instance terminated by USER, pid = 10631
Sun Dec 06 13:16:15 CST 2015
Starting ORACLE instance (normal)
LICENSE_MAX_SESSION = 0
LICENSE_SESSIONS_WARNING = 0
Interface type 1 eth1 100.100.100.0 configured from OCR for use as a cluster interconnect
Interface type 1 eth0 10.150.241.0 configured from OCR for use as a public interface
Picked latch-free SCN scheme 3
Using LOG_ARCHIVE_DEST_10 parameter default value as USE_DB_RECOVERY_FILE_DEST
LICENSE_MAX_USERS = 0
SYS auditing is enabled
ksdpec: called for event 13740 prior to event group initialization
Starting up ORACLE RDBMS Version: 10.2.0.5.0.
System parameters with non-default values:
processes = 500
sessions = 555
sga_max_size = 12884901888
__shared_pool_size = 2466250752
__large_pool_size = 33554432
__java_pool_size = 33554432
__streams_pool_size = 16777216
spfile = +DATA/odi/spfileodi.ora
nls_language = AMERICAN
sga_target = 12884901888
control_files = +DATA/odi/controlfile/current.260.713399465, +RECOVERY/odi/controlfile/current.256.713399465
db_block_size = 8192
__db_cache_size = 10317987840
compatible = 10.2.0.3.0
db_file_multiblock_read_count= 16
cluster_database = TRUE
cluster_database_instances= 2
db_create_file_dest = +DATA
db_recovery_file_dest = +RECOVERY
db_recovery_file_dest_size= 1073741824
thread = 1
instance_number = 1
undo_management = AUTO
undo_tablespace = UNDOTBS1
_smu_debug_mode = 0
remote_login_passwordfile= EXCLUSIVE
audit_sys_operations = TRUE
db_domain =
dispatchers =
local_listener = odi1
remote_listener = LISTENERS_ODI
job_queue_processes = 10
background_dump_dest = /oracle/product/db_1/admin/odi/bdump
user_dump_dest = /oracle/product/db_1/admin/odi/udump
core_dump_dest = /oracle/product/db_1/admin/odi/cdump
audit_file_dest = /oracle/product/db_1/admin/odi/adump
audit_trail = OS
db_name = odi
open_cursors = 300
pga_aggregate_target = 3363831808
Cluster communication is configured to use the following interface(s) for this instance
100.100.100.125
Sun Dec 06 13:16:16 CST 2015
cluster interconnect IPC version:Oracle UDP/IP (generic)
IPC Vendor 1 proto 2
PMON started with pid=2, OS id=18901
DIAG started with pid=3, OS id=18903
PSP0 started with pid=4, OS id=18910
LMON started with pid=5, OS id=18912
LMD0 started with pid=6, OS id=18914
LMS0 started with pid=7, OS id=18916
LMS1 started with pid=8, OS id=18920
LMS2 started with pid=9, OS id=18924
LMS3 started with pid=10, OS id=18928
LMS4 started with pid=11, OS id=18932
LMS5 started with pid=12, OS id=18936
MMAN started with pid=13, OS id=18940
DBW0 started with pid=14, OS id=18958
DBW1 started with pid=15, OS id=19148
DBW2 started with pid=16, OS id=19207
LGWR started with pid=17, OS id=19211
CKPT started with pid=18, OS id=19214
SMON started with pid=19, OS id=19218
RECO started with pid=20, OS id=19220
CJQ0 started with pid=21, OS id=19222
MMON started with pid=22, OS id=19224
MMNL started with pid=23, OS id=19227
Sun Dec 06 13:16:17 CST 2015
lmon registered with NM - instance id 1 (internal mem no 0)
Sun Dec 06 13:18:48 CST 2015
oracle@ypodidb1 (LMON) (ospid: 18912) detects hung instances during IMR reconfiguration
oracle@ypodidb1 (LMON) (ospid: 18912) tries to kill the instance 2.
Please check instance 2's alert log and LMON trace file for more details.
Sun Dec 06 13:20:03 CST 2015
Remote instance kill is issued with system inc 0 and reason 0x20000000
Remote instance kill map (size 1) : 2
Sun Dec 06 13:21:17 CST 2015
Error: KGXGN polling error (15)
Sun Dec 06 13:21:17 CST 2015
Errors in file /oracle/product/db_1/admin/odi/udump/odi1_ora_18345.trc:
ORA-00600: internal error code, arguments: [ksqsgn:join], [error in lmon process], [32], [], [], [], [], []
Sun Dec 06 13:21:17 CST 2015
Errors in file /oracle/product/db_1/admin/odi/bdump/odi1_lmon_18912.trc:
ORA-29702: error occurred in Cluster Group Service operation
LMON: terminating instance due to error 29702
Sun Dec 06 13:21:25 CST 2015
Shutting down instance (abort)
License high water mark = 1
Sun Dec 06 13:21:27 CST 2015
Instance terminated by LMON, pid = 18912
Sun Dec 06 13:21:30 CST 2015
Instance terminated by USER, pid = 19385
Sun Dec 06 13:30:46 CST 2015
Starting ORACLE instance (normal)
LICENSE_MAX_SESSION = 0
LICENSE_SESSIONS_WARNING = 0
Interface type 1 eth1 100.100.100.0 configured from OCR for use as a cluster interconnect
Interface type 1 eth0 10.150.241.0 configured from OCR for use as a public interface
Picked latch-free SCN scheme 3
Using LOG_ARCHIVE_DEST_10 parameter default value as USE_DB_RECOVERY_FILE_DEST
LICENSE_MAX_USERS = 0
SYS auditing is enabled
ksdpec: called for event 13740 prior to event group initialization
Starting up ORACLE RDBMS Version: 10.2.0.5.0.
System parameters with non-default values:
processes = 500
sessions = 555
sga_max_size = 12884901888
__shared_pool_size = 2466250752
__large_pool_size = 33554432
__java_pool_size = 33554432
__streams_pool_size = 16777216
spfile = +DATA/odi/spfileodi.ora
nls_language = AMERICAN
sga_target = 12884901888
control_files = +DATA/odi/controlfile/current.260.713399465, +RECOVERY/odi/controlfile/current.256.713399465
db_block_size = 8192
__db_cache_size = 10317987840
compatible = 10.2.0.3.0
db_file_multiblock_read_count= 16
cluster_database = TRUE
cluster_database_instances= 2
db_create_file_dest = +DATA
db_recovery_file_dest = +RECOVERY
db_recovery_file_dest_size= 1073741824
thread = 1
instance_number = 1
undo_management = AUTO
undo_tablespace = UNDOTBS1
_smu_debug_mode = 0
remote_login_passwordfile= EXCLUSIVE
audit_sys_operations = TRUE
db_domain =
dispatchers =
local_listener = odi1
remote_listener = LISTENERS_ODI
job_queue_processes = 10
background_dump_dest = /oracle/product/db_1/admin/odi/bdump
user_dump_dest = /oracle/product/db_1/admin/odi/udump
core_dump_dest = /oracle/product/db_1/admin/odi/cdump
audit_file_dest = /oracle/product/db_1/admin/odi/adump
audit_trail = OS
db_name = odi
open_cursors = 300
pga_aggregate_target = 3363831808
Cluster communication is configured to use the following interface(s) for this instance
100.100.100.125
Sun Dec 06 13:30:47 CST 2015
cluster interconnect IPC version:Oracle UDP/IP (generic)
IPC Vendor 1 proto 2
PMON started with pid=2, OS id=6220
DIAG started with pid=3, OS id=6222
PSP0 started with pid=4, OS id=6224
LMON started with pid=5, OS id=6226
LMD0 started with pid=6, OS id=6228
LMS0 started with pid=7, OS id=6230
LMS1 started with pid=8, OS id=6234
LMS2 started with pid=9, OS id=6238
LMS3 started with pid=10, OS id=6242
LMS4 started with pid=11, OS id=6246
LMS5 started with pid=12, OS id=6250
MMAN started with pid=13, OS id=6259
DBW0 started with pid=14, OS id=6261
DBW1 started with pid=15, OS id=6263
DBW2 started with pid=16, OS id=6265
LGWR started with pid=17, OS id=6267
CKPT started with pid=18, OS id=6269
SMON started with pid=19, OS id=6271
RECO started with pid=20, OS id=6273
CJQ0 started with pid=21, OS id=6275
MMON started with pid=22, OS id=6277
MMNL started with pid=23, OS id=6279
Sun Dec 06 13:30:49 CST 2015
lmon registered with NM - instance id 1 (internal mem no 0)
Sun Dec 06 13:33:20 CST 2015
oracle@ypodidb1 (LMON) (ospid: 6226) detects hung instances during IMR reconfiguration
oracle@ypodidb1 (LMON) (ospid: 6226) tries to kill the instance 2.
Please check instance 2's alert log and LMON trace file for more details.
Sun Dec 06 13:34:35 CST 2015
Remote instance kill is issued with system inc 0 and reason 0x20000000
Remote instance kill map (size 1) : 2
Sun Dec 06 13:35:49 CST 2015
Error: KGXGN polling error (15)
Sun Dec 06 13:35:49 CST 2015
Errors in file /oracle/product/db_1/admin/odi/bdump/odi1_lmon_6226.trc:
ORA-29702: error occurred in Cluster Group Service operation
LMON: terminating instance due to error 29702
Instance terminated by LMON, pid = 6226
The alert log stops here; nothing further was written.
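To make the long alert-log excerpt easier to digest, here is a rough Python sketch (a hypothetical helper, not Oracle tooling) that collects the timestamps of the "Waiting for instances to leave" polls and counts which remote processes (inst/binc/ospid) the IPC Send timeouts were aimed at. It assumes the timestamp format and message wording exactly as they appear in this excerpt (e.g. `Sun Dec 06 01:53:23 CST 2015`) and an English/C locale for day and month names.

```python
import re
import sys
from collections import Counter
from datetime import datetime

# Timestamp lines as they appear in this excerpt, e.g. "Sun Dec 06 01:53:23 CST 2015".
TS_RE = re.compile(r"^\w{3} \w{3} \d{2} \d{2}:\d{2}:\d{2} \w+ \d{4}$")
# Second line of an "IPC Send timeout detected" message.
RECV_RE = re.compile(r"Receiver: inst (\d+) binc (\d+) ospid (\d+)")


def parse_ts(line):
    """Parse an alert-log timestamp, dropping the time-zone token (e.g. CST)."""
    parts = line.split()
    return datetime.strptime(" ".join(parts[:4] + parts[5:]), "%a %b %d %H:%M:%S %Y")


def summarize(text):
    """Return (timestamps of 'Waiting for instances to leave' polls,
    Counter keyed by (inst, binc, ospid) of IPC send-timeout receivers)."""
    current = None
    waits = []
    receivers = Counter()
    for raw in text.splitlines():
        line = raw.strip()
        if TS_RE.match(line):
            current = parse_ts(line)
        elif line.startswith("Waiting for instances to leave") and current is not None:
            waits.append(current)
        else:
            m = RECV_RE.search(line)
            if m:
                receivers[m.groups()] += 1
    return waits, receivers


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python summarize_alert.py <alert log file>
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        waits, receivers = summarize(f.read())
    if waits:
        print(f"{len(waits)} polls from {waits[0]} to {waits[-1]} ({waits[-1] - waits[0]})")
    for (inst, binc, ospid), n in receivers.most_common():
        print(f"inst {inst} binc {binc} ospid {ospid}: {n} IPC send timeouts")
```

Run against the full log, this shows at a glance how long node 1 kept polling for instance 2 to leave and which instance 2 processes stopped answering.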
Next is the node 1 crsd log:
2015-12-06 01:25:14.947: [ OCRSRV][1220598112]Failure in renaming file [/oracle/product/crs_1/cdata/crs/13263954] to [/oracle/product/crs_1/cdata/crs/backup00.ocr]
2015-12-06 05:25:17.718: [ OCRSRV][1220598112]Failure in renaming file [/oracle/product/crs_1/cdata/crs/41982163] to [/oracle/product/crs_1/cdata/crs/backup00.ocr]
2015-12-06 09:25:18.526: [ OCRSRV][1220598112]Failure in renaming file [/oracle/product/crs_1/cdata/crs/35192301] to [/oracle/product/crs_1/cdata/crs/backup00.ocr]
2015-12-06 12:53:16.047: [ CRSRES][1522735456]0ora.ypodidb1.gsd target set to OFFLINE before stop action
2015-12-06 12:53:16.047: [ CRSRES][1522735456]0StopResource: setting CLI values
2015-12-06 12:53:16.050: [ CRSRES][1545816416]0ora.ypodidb1.ons target set to OFFLINE before stop action
2015-12-06 12:53:16.050: [ CRSRES][1545816416]0StopResource: setting CLI values
2015-12-06 12:53:16.053: [ CRSRES][1522735456]0Attempting to stop `ora.ypodidb1.gsd` on member `ypodidb1`
2015-12-06 12:53:16.066: [ CRSRES][1493350752]0ora.ypodidb2.gsd target set to OFFLINE before stop action
2015-12-06 12:53:16.066: [ CRSRES][1493350752]0StopResource: setting CLI values
2015-12-06 12:53:16.069: [ CRSRES][1495452000]0ora.ypodidb2.ons target set to OFFLINE before stop action
2015-12-06 12:53:16.069: [ CRSRES][1495452000]0StopResource: setting CLI values
2015-12-06 12:53:16.074: [ CRSRES][1497553248]0ora.odi.db target set to OFFLINE before stop action
2015-12-06 12:53:16.074: [ CRSRES][1497553248]0StopResource: setting CLI values
2015-12-06 12:53:16.083: [ CRSRES][1545816416]0Attempting to stop `ora.ypodidb1.ons` on member `ypodidb1`
2015-12-06 12:53:16.086: [ CRSRES][1497553248]0Attempting to stop `ora.odi.db` on member `ypodidb1`
2015-12-06 12:53:16.174: [ CRSRES][1550018912]0ora.odi.odi1.inst target set to OFFLINE before stop action
2015-12-06 12:53:16.174: [ CRSRES][1550018912]0StopResource: setting CLI values
2015-12-06 12:53:16.179: [ CRSRES][1554221408]0ora.odi.odi2.inst target set to OFFLINE before stop action
2015-12-06 12:53:16.179: [ CRSRES][1554221408]0StopResource: setting CLI values
2015-12-06 12:53:16.194: [ CRSRES][1550018912]0Attempting to stop `ora.odi.odi1.inst` on member `ypodidb1`
2015-12-06 12:53:16.330: [ CRSRES][1522735456]0Stop of `ora.ypodidb1.gsd` on member `ypodidb1` succeeded.
2015-12-06 12:53:16.355: [ CRSRES][1545816416]0Stop of `ora.ypodidb1.ons` on member `ypodidb1` succeeded.
2015-12-06 12:53:23.600: [ CRSAPP][1550018912]0StopResource error for ora.odi.odi1.inst error code = 1
2015-12-06 12:53:23.608: [ CRSRES][1550018912][ALERT]0`ora.odi.odi1.inst` on member `ypodidb1` has experienced an unrecoverable failure.
2015-12-06 12:53:23.608: [ CRSRES][1550018912]0Human intervention required to resume its availability.
2015-12-06 13:01:01.927: [ CRSRES][1545816416]0StopResource: setting CLI values
2015-12-06 13:01:01.947: [ CRSRES][1545816416]0Attempting to stop `ora.odi.odi1.inst` on member `ypodidb1`
2015-12-06 13:01:40.045: [ CRSRES][1545816416]0Stop of `ora.odi.odi1.inst` on member `ypodidb1` succeeded.
2015-12-06 13:03:46.182: [ CRSEVT][1497553248]0CAAMonitorHandler :: 0:Could not join /oracle/product/crs_1/bin/racgwrap(stop)
category: 1234, operation: scls_process_join, loc: childcrash, OS error: 0, other: Abnormal termination of the child
2015-12-06 13:03:46.182: [ CRSEVT][1497553248]0CAAMonitorHandler :: 0:Action Script /oracle/product/crs_1/bin/racgwrap(stop) timed out for ora.odi.db! (timeout=600)
2015-12-06 13:03:46.182: [ CRSAPP][1497553248]0StopResource error for ora.odi.db error code = -2
2015-12-06 13:03:46.186: [ CRSRES][1497553248][ALERT]0`ora.odi.db` on member `ypodidb1` has experienced an unrecoverable failure.
2015-12-06 13:03:46.186: [ CRSRES][1497553248]0Human intervention required to resume its availability.
2015-12-06 13:16:13.534: [ CRSRES][1545816416]0startRunnable: setting CLI values
2015-12-06 13:16:13.544: [ CRSRES][1545816416]0Attempting to start `ora.odi.odi1.inst` on member `ypodidb1`
2015-12-06 13:21:25.196: [ CRSAPP][1545816416]0StartResource error for ora.odi.odi1.inst error code = 1
2015-12-06 13:21:31.706: [ CRSRES][1545816416]0Start of `ora.odi.odi1.inst` on member `ypodidb1` failed.
2015-12-06 13:25:20.297: [ OCRSRV][1220598112]Failure in renaming file [/oracle/product/crs_1/cdata/crs/18000758] to [/oracle/product/crs_1/cdata/crs/backup00.ocr]
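In the same spirit, a hypothetical Python sketch for the crsd excerpt: it picks out the lines that look like resource failures (containing "[ALERT]" or "error code", or ending in "failed.") together with the `ora.*` resource they mention. The line layout (`YYYY-MM-DD HH:MM:SS.mmm: [ COMPONENT][thread]message`) is inferred from the lines above, not taken from any Oracle specification.

```python
import re
import sys
from datetime import datetime

# Layout inferred from the excerpt, e.g.
# "2015-12-06 13:03:46.186: [ CRSRES][1497553248][ALERT]0`ora.odi.db` on member ..."
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}): \[\s*(\w+)\]\[(\d+)\](.*)$"
)
# CRS resource names such as ora.odi.odi1.inst or ora.odi.db.
RES_RE = re.compile(r"ora\.[\w.]+")


def failures(text):
    """Return (timestamp, component, resource, message) for failure-looking lines."""
    events = []
    for raw in text.splitlines():
        m = LINE_RE.match(raw.strip())
        if not m:
            continue  # skips wrapped continuation lines such as "category: 1234, ..."
        ts, component, _thread, msg = m.groups()
        if "[ALERT]" in msg or "error code" in msg or msg.endswith("failed."):
            res = RES_RE.search(msg)
            events.append((
                datetime.strptime(ts, "%Y-%m-%d %H:%M:%S.%f"),
                component,
                res.group(0) if res else None,
                msg,
            ))
    return events


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python crsd_failures.py <crsd.log>
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        for ts, component, res, msg in failures(f.read()):
            print(ts, component, res, msg, sep=" | ")
```

The filter is deliberately loose; "Stop of ... succeeded." lines and the routine OCR backup rename messages are not reported.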