This post was last edited by 鸣雏之叶1 on 2015-4-2 15:13
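Database version: 10.2.0.4
OS version: Red Hat 4.7, 64-bit
We have a two-node RAC in which, every few months, a heartbeat problem causes one of the nodes (at random) to reboot. It happened again today at 15:00. Below are the logs from around 15:00 from each component; my own interpretation is at the end.
Database alert log on node 1 (note the parts in red):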
Mon Mar 30 14:13:57 2015
Thread 1 advanced to log sequence 16643 (LGWR switch)
Current log# 8 seq# 16643 mem# 0: +ORADATA_DG/newsdb/onlinelog/group_8.287.710176595
Current log# 8 seq# 16643 mem# 1: +RECOVERY_DG/newsdb/onlinelog/group_8.264.710176595
Mon Mar 30 15:02:18 2015
Reconfiguration started (old inc 4, new inc 6)
List of nodes:
0
Global Resource Directory frozen
* dead instance detected - domain 0 invalid = TRUE
Communication channels reestablished
Master broadcasted resource hash value bitmaps
Non-local Process blocks cleaned out
Mon Mar 30 15:02:18 2015
LMS 1: 1 GCS shadows cancelled, 0 closed
Mon Mar 30 15:02:18 2015
LMS 0: 3 GCS shadows cancelled, 0 closed
Set master node info
Submitted all remote-enqueue requests
Dwn-cvts replayed, VALBLKs dubious
All grantable enqueues granted
Post SMON to start 1st pass IR
Mon Mar 30 15:02:18 2015
Instance recovery: looking for dead threads
Mon Mar 30 15:02:18 2015
Beginning instance recovery of 1 threads
Mon Mar 30 15:02:19 2015
LMS 0: 135888 GCS shadows traversed, 0 replayed
Mon Mar 30 15:02:19 2015
LMS 1: 136628 GCS shadows traversed, 0 replayed
Mon Mar 30 15:02:19 2015
Submitted all GCS remote-cache requests
Fix write in gcs resources
Reconfiguration complete
Mon Mar 30 15:02:19 2015
parallel recovery started with 7 processes
Mon Mar 30 15:02:19 2015
Started redo scan
Mon Mar 30 15:02:19 2015
Completed redo scan
10256 redo blocks read, 1346 data blocks need recovery
Mon Mar 30 15:02:21 2015
Started redo application at
Thread 2: logseq 16182, block 148446
Mon Mar 30 15:02:21 2015
Recovery of Online Redo Log: Thread 2 Group 9 Seq 16182 Reading mem 0
Mem# 0: +ORADATA_DG/newsdb/onlinelog/group_9.288.710176595
Mem# 1: +RECOVERY_DG/newsdb/onlinelog/group_9.265.710176597
Mon Mar 30 15:02:21 2015
Completed redo application
Mon Mar 30 15:02:21 2015
Completed instance recovery at
Thread 2: logseq 16182, block 158702, scn 3238289658
1078 data blocks read, 1414 data blocks written, 10256 redo blocks read
Mon Mar 30 15:02:21 2015
Thread 2 advanced to log sequence 16183 (thread recovery)
Mon Mar 30 15:04:50 2015
Thread 1 advanced to log sequence 16644 (LGWR switch)
Current log# 2 seq# 16644 mem# 0: +ORADATA_DG/newsdb/onlinelog/group_2.262.710169121
Current log# 2 seq# 16644 mem# 1: +RECOVERY_DG/newsdb/onlinelog/group_2.258.710169121
Mon Mar 30 15:05:03 2015
Reconfiguration started (old inc 6, new inc 8)
List of nodes:
0 1
Global Resource Directory frozen
Communication channels reestablished
Master broadcasted resource hash value bitmaps
Non-local Process blocks cleaned out
Mon Mar 30 15:05:03 2015
LMS 1: 0 GCS shadows cancelled, 0 closed
Mon Mar 30 15:05:03 2015
LMS 0: 0 GCS shadows cancelled, 0 closed
Set master node info
Submitted all remote-enqueue requests
Dwn-cvts replayed, VALBLKs dubious
All grantable enqueues granted
Mon Mar 30 15:05:03 2015
LMS 1: 8632 GCS shadows traversed, 4001 replayed
Mon Mar 30 15:05:03 2015
LMS 0: 8563 GCS shadows traversed, 4001 replayed
LMS 0: 8601 GCS shadows traversed, 4001 replayed
Mon Mar 30 15:05:03 2015
LMS 1: 8606 GCS shadows traversed, 4001 replayed
Mon Mar 30 15:05:03 2015
LMS 0: 8524 GCS shadows traversed, 4001 replayed
Mon Mar 30 15:05:03 2015
LMS 1: 8628 GCS shadows traversed, 4001 replayed
Mon Mar 30 15:05:03 2015
LMS 0: 8589 GCS shadows traversed, 4001 replayed
Mon Mar 30 15:05:03 2015
LMS 1: 8522 GCS shadows traversed, 4001 replayed
LMS 1: 8550 GCS shadows traversed, 4001 replayed
LMS 1: 8580 GCS shadows traversed, 4001 replayed
Mon Mar 30 15:05:03 2015
LMS 0: 8755 GCS shadows traversed, 4001 replayed
LMS 0: 8514 GCS shadows traversed, 4001 replayed
LMS 0: 8549 GCS shadows traversed, 4001 replayed
LMS 0: 8642 GCS shadows traversed, 4001 replayed
Mon Mar 30 15:05:03 2015
LMS 1: 8560 GCS shadows traversed, 4001 replayed
LMS 1: 8551 GCS shadows traversed, 4001 replayed
LMS 1: 8673 GCS shadows traversed, 4001 replayed
LMS 1: 8567 GCS shadows traversed, 4001 replayed
Mon Mar 30 15:05:03 2015
LMS 0: 8470 GCS shadows traversed, 4001 replayed
LMS 0: 8615 GCS shadows traversed, 4001 replayed
LMS 0: 8671 GCS shadows traversed, 4001 replayed
LMS 0: 8601 GCS shadows traversed, 4001 replayed
Mon Mar 30 15:05:03 2015
LMS 1: 8599 GCS shadows traversed, 4001 replayed
LMS 1: 8637 GCS shadows traversed, 4001 replayed
LMS 1: 8657 GCS shadows traversed, 4001 replayed
LMS 1: 8621 GCS shadows traversed, 4001 replayed
Mon Mar 30 15:05:03 2015
LMS 0: 8535 GCS shadows traversed, 4001 replayed
Mon Mar 30 15:05:03 2015
LMS 1: 8539 GCS shadows traversed, 4001 replayed
Mon Mar 30 15:05:03 2015
LMS 0: 8561 GCS shadows traversed, 4001 replayed
Mon Mar 30 15:05:03 2015
LMS 1: 8323 GCS shadows traversed, 3875 replayed
Mon Mar 30 15:05:03 2015
LMS 0: 8566 GCS shadows traversed, 4001 replayed
LMS 0: 7752 GCS shadows traversed, 3587 replayed
Mon Mar 30 15:05:03 2015
Submitted all GCS remote-cache requests
Post SMON to start 1st pass IR
Fix write in gcs resources
Reconfiguration complete
Mon Mar 30 15:11:45 2015
db_recovery_file_dest_size of 102400 MB is 1.23% used. This is a
user-specified limit on the amount of space that will be used by this
database for recovery-related files, and does not reflect the amount of
space available in the underlying filesystem or ASM diskgroup.
Mon Mar 30 16:03:34 2015
Thread 1 advanced to log sequence 16645 (LGWR switch)
Current log# 1 seq# 16645 mem# 0: +ORADATA_DG/newsdb/onlinelog/group_1.261.710169121
Current log# 1 seq# 16645 mem# 1: +RECOVERY_DG/newsdb/onlinelog/group_1.257.710169121
Database alert log on node 2 (it contains no record of the node restarting):
Mon Mar 30 12:34:12 2015
Thread 2 advanced to log sequence 16182 (LGWR switch)
Current log# 9 seq# 16182 mem# 0: +ORADATA_DG/newsdb/onlinelog/group_9.288.710176595
Current log# 9 seq# 16182 mem# 1: +RECOVERY_DG/newsdb/onlinelog/group_9.265.710176597
Mon Mar 30 15:21:36 2015
Starting ORACLE instance (normal)
LICENSE_MAX_SESSION = 0
LICENSE_SESSIONS_WARNING = 0
Interface type 1 bond1 100.100.101.0 configured from OCR for use as a cluster interconnect
Interface type 1 bond0 192.168.2.0 configured from OCR for use as a public interface
Picked latch-free SCN scheme 3
Autotune of undo retention is turned on.
LICENSE_MAX_USERS = 0
SYS auditing is disabled
ksdpec: called for event 13740 prior to event group initialization
Starting up ORACLE RDBMS Version: 10.2.0.4.0.
System parameters with non-default values:
processes = 800
sessions = 885
shared_pool_size = 805306368
large_pool_size = 218103808
java_pool_size = 167772160
spfile = +ORADATA_DG/newsdb/spfilenewsdb.ora
nls_language = SIMPLIFIED CHINESE
nls_territory = CHINA
control_files = +ORADATA_DG/newsdb/controlfile/current.260.710169117, +RECOVERY_DG/newsdb/controlfile/current.256.710169117
db_block_size = 8192
db_cache_size = 3221225472
compatible = 10.2.0.3.0
log_archive_dest_1 = LOCATION=+ORADATA_DG/
log_archive_format = %t_%s_%r.dbf
db_file_multiblock_read_count= 16
cluster_database = TRUE
cluster_database_instances= 2
db_create_file_dest = +ORADATA_DG
db_recovery_file_dest = +RECOVERY_DG
db_recovery_file_dest_size= 107374182400
thread = 2
instance_number = 2
undo_management = AUTO
undo_tablespace = UNDOTBS2
remote_login_passwordfile= EXCLUSIVE
db_domain =
dispatchers = (PROTOCOL=TCP) (SERVICE=newsdbXDB)
local_listener = (ADDRESS = (PROTOCOL = TCP)(HOST = newsrac2-vip)(PORT = 1521))
remote_listener = LISTENERS_NEWSDB
job_queue_processes = 10
cursor_sharing = SIMILAR
background_dump_dest = /opt/app/admin/newsdb/bdump
user_dump_dest = /opt/app/admin/newsdb/udump
core_dump_dest = /opt/app/admin/newsdb/cdump
audit_file_dest = /opt/app/admin/newsdb/adump
db_name = newsdb
open_cursors = 600
pga_aggregate_target = 536870912
Cluster communication is configured to use the following interface(s) for this instance
100.100.101.8
Mon Mar 30 15:21:37 2015
cluster interconnect IPC version:Oracle UDP/IP (generic)
IPC Vendor 1 proto 2
PMON started with pid=2, OS id=29763
DIAG started with pid=3, OS id=29765
PSP0 started with pid=4, OS id=29767
LMON started with pid=5, OS id=29769
LMD0 started with pid=6, OS id=29771
LMS0 started with pid=7, OS id=29773
LMS1 started with pid=8, OS id=29777
MMAN started with pid=9, OS id=29781
DBW0 started with pid=10, OS id=29783
LGWR started with pid=11, OS id=29785
CKPT started with pid=12, OS id=29792
SMON started with pid=13, OS id=29794
RECO started with pid=14, OS id=29796
CJQ0 started with pid=15, OS id=29798
MMON started with pid=16, OS id=29800
Mon Mar 30 15:21:37 2015
starting up 1 dispatcher(s) for network address '(ADDRESS=(PARTIAL=YES)(PROTOCOL=TCP))'...
MMNL started with pid=17, OS id=29802
Mon Mar 30 15:21:37 2015
starting up 1 shared server(s) ...
Mon Mar 30 15:21:38 2015
lmon registered with NM - instance id 2 (internal mem no 1)
Mon Mar 30 15:21:39 2015
Reconfiguration started (old inc 0, new inc 8)
List of nodes:
0 1
Global Resource Directory frozen
* allocate domain 0, invalid = TRUE
Communication channels reestablished
* domain 0 valid = 1 according to instance 0
Mon Mar 30 15:21:39 2015
Master broadcasted resource hash value bitmaps
Non-local Process blocks cleaned out
Mon Mar 30 15:21:39 2015
LMS 0: 0 GCS shadows cancelled, 0 closed
Mon Mar 30 15:21:39 2015
LMS 1: 0 GCS shadows cancelled, 0 closed
Set master node info
Submitted all remote-enqueue requests
Dwn-cvts replayed, VALBLKs dubious
All grantable enqueues granted
Mon Mar 30 15:21:39 2015
LMS 0: 0 GCS shadows traversed, 0 replayed
Mon Mar 30 15:21:39 2015
LMS 1: 0 GCS shadows traversed, 0 replayed
Mon Mar 30 15:21:39 2015
Submitted all GCS remote-cache requests
Fix write in gcs resources
Reconfiguration complete
LCK0 started with pid=20, OS id=29819
Mon Mar 30 15:21:40 2015
ALTER DATABASE MOUNT
Mon Mar 30 15:21:40 2015
Starting background process ASMB
ASMB started with pid=22, OS id=29830
Starting background process RBAL
RBAL started with pid=23, OS id=29834
Loaded ASM Library - Generic Linux, version 2.0.2 (KABI_V2) library for asmlib interface
Mon Mar 30 15:21:43 2015
SUCCESS: diskgroup ORADATA_DG was mounted
SUCCESS: diskgroup RECOVERY_DG was mounted
Mon Mar 30 15:21:47 2015
Setting recovery target incarnation to 2
Mon Mar 30 15:21:48 2015
Successful mount of redo thread 2, with mount id 1061939212
Mon Mar 30 15:21:48 2015
Database mounted in Shared Mode (CLUSTER_DATABASE=TRUE)
Completed: ALTER DATABASE MOUNT
Mon Mar 30 15:21:48 2015
ALTER DATABASE OPEN
Picked broadcast on commit scheme to generate SCNs
Mon Mar 30 15:21:48 2015
LGWR: STARTING ARCH PROCESSES
ARC0 started with pid=26, OS id=29879
Mon Mar 30 15:21:48 2015
ARC0: Archival started
ARC1: Archival started
LGWR: STARTING ARCH PROCESSES COMPLETE
ARC1 started with pid=27, OS id=29883
Mon Mar 30 15:21:48 2015
Thread 2 opened at log sequence 16183
Current log# 10 seq# 16183 mem# 0: +ORADATA_DG/newsdb/onlinelog/group_10.289.710176597
Current log# 10 seq# 16183 mem# 1: +RECOVERY_DG/newsdb/onlinelog/group_10.266.710176597
Successful open of redo thread 2
Mon Mar 30 15:21:48 2015
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Mon Mar 30 15:21:48 2015
ARC1: Becoming the 'no FAL' ARCH
ARC1: Becoming the 'no SRL' ARCH
Mon Mar 30 15:21:48 2015
ARC0: Becoming the heartbeat ARCH
Mon Mar 30 15:21:48 2015
SMON: enabling cache recovery
Mon Mar 30 15:21:49 2015
Successfully onlined Undo Tablespace 5.
Mon Mar 30 15:21:49 2015
SMON: enabling tx recovery
Mon Mar 30 15:21:49 2015
Database Characterset is ZHS16GBK
Opening with internal Resource Manager plan
where NUMA PG = 1, CPUs = 8
replication_dependency_tracking turned off (no async multimaster replication found)
Starting background process QMNC
QMNC started with pid=30, OS id=29906
Mon Mar 30 15:21:52 2015
Completed: ALTER DATABASE OPEN
Mon Mar 30 15:21:55 2015
ALTER SYSTEM SET service_names='newsora' SCOPE=MEMORY SID='newsdb2';
Mon Mar 30 15:31:55 2015
ALTER SYSTEM SET service_names='newsora','newsdb' SCOPE=MEMORY SID='newsdb2';
Mon Mar 30 16:09:01 2015
Thread 2 advanced to log sequence 16184 (LGWR switch)
Current log# 11 seq# 16184 mem# 0: +ORADATA_DG/newsdb/onlinelog/group_11.290.710176599
Current log# 11 seq# 16184 mem# 1: +RECOVERY_DG/newsdb/onlinelog/group_11.267.710176599
Next, the clusterware alert log on node 1:
2014-12-06 21:43:56.758
[crsd(28613)]CRS-1205:Auto-start failed for the CRS resource . Details in newsrac1.
[cssd(29161)]CRS-1601:CSSD Reconfiguration complete. Active nodes are newsrac1 newsrac2 .
2015-03-30 15:01:45.687
[cssd(29161)]CRS-1612:node newsrac2 (2) at 50% heartbeat fatal, eviction in 29.254 seconds
2015-03-30 15:01:46.689
[cssd(29161)]CRS-1612:node newsrac2 (2) at 50% heartbeat fatal, eviction in 28.254 seconds
2015-03-30 15:02:00.715
[cssd(29161)]CRS-1611:node newsrac2 (2) at 75% heartbeat fatal, eviction in 14.234 seconds
2015-03-30 15:02:09.731
[cssd(29161)]CRS-1610:node newsrac2 (2) at 90% heartbeat fatal, eviction in 5.214 seconds
2015-03-30 15:02:10.733
[cssd(29161)]CRS-1610:node newsrac2 (2) at 90% heartbeat fatal, eviction in 4.214 seconds
2015-03-30 15:02:11.735
[cssd(29161)]CRS-1610:node newsrac2 (2) at 90% heartbeat fatal, eviction in 3.214 seconds
2015-03-30 15:02:12.737
[cssd(29161)]CRS-1610:node newsrac2 (2) at 90% heartbeat fatal, eviction in 2.204 seconds
2015-03-30 15:02:13.739
[cssd(29161)]CRS-1610:node newsrac2 (2) at 90% heartbeat fatal, eviction in 1.204 seconds
2015-03-30 15:02:14.742
[cssd(29161)]CRS-1610:node newsrac2 (2) at 90% heartbeat fatal, eviction in 0.204 seconds
2015-03-30 15:02:15.204
[cssd(29161)]CRS-1607:CSSD evicting node newsrac2. Details in /opt/app/oracle/product/10.2.0/crs/log/newsrac1/cssd/ocssd.log.
[cssd(29161)]CRS-1601:CSSD Reconfiguration complete. Active nodes are newsrac1 .
2015-03-30 15:02:18.837
[crsd(28613)]CRS-1204:Recovering CRS resources for node newsrac2.
[cssd(29161)]CRS-1601:CSSD Reconfiguration complete. Active nodes are newsrac1 newsrac2 .
Following the pointer in that message, here is ocssd.log:
[ CSSD]2014-12-06 21:47:25.809 [1231087968] >TRACE: clssgmReconfigThread: completed for reconfig(2), with status(1)
[ CSSD]2015-03-30 15:01:15.629 [1147169120] >WARNING: clssnmeventhndlr: Receive failure with node 2 (newsrac2), state 3, con(0x82e580), probe((nil)), rc=11
[ CSSD]2015-03-30 15:01:15.629 [1147169120] >TRACE: clssnmDiscHelper: newsrac2, node(2) connection failed, con (0x82e580), probe((nil))
[ CSSD]2015-03-30 15:01:15.629 [1189128544] >TRACE: clssgmPeerDeactivate: node 2 (newsrac2), death 0, state 0x1 connstate 0xf
[ CSSD]2015-03-30 15:01:45.687 [1199618400] >WARNING: clssnmPollingThread: node newsrac2 (2) at 50 3.064630e-317artbeat fatal, eviction in 29.510 seconds
[ CSSD]2015-03-30 15:01:45.687 [1199618400] >TRACE: clssnmPollingThread: node newsrac2 (2) is impending reconfig, flag 1, misstime 30490
[ CSSD]2015-03-30 15:01:45.687 [1199618400] >TRACE: clssnmPollingThread: diskTimeout set to (57000)ms impending reconfig status(1)
[ CSSD]2015-03-30 15:01:46.689 [1199618400] >WARNING: clssnmPollingThread: node newsrac2 (2) at 50 3.119214e-317artbeat fatal, eviction in 28.510 seconds
[ CSSD]2015-03-30 15:02:00.715 [1199618400] >WARNING: clssnmPollingThread: node newsrac2 (2) at 75 3.119238e-317artbeat fatal, eviction in 14.490 seconds
[ CSSD]2015-03-30 15:02:09.731 [1199618400] >WARNING: clssnmPollingThread: node newsrac2 (2) at 90 3.119262e-317artbeat fatal, eviction in 5.470 seconds
[ CSSD]2015-03-30 15:02:10.733 [1199618400] >WARNING: clssnmPollingThread: node newsrac2 (2) at 90 3.119285e-317artbeat fatal, eviction in 4.470 seconds
[ CSSD]2015-03-30 15:02:11.735 [1199618400] >WARNING: clssnmPollingThread: node newsrac2 (2) at 90 3.119309e-317artbeat fatal, eviction in 3.470 seconds
[ CSSD]2015-03-30 15:02:12.737 [1199618400] >WARNING: clssnmPollingThread: node newsrac2 (2) at 90 3.119333e-317artbeat fatal, eviction in 2.460 seconds
[ CSSD]2015-03-30 15:02:13.739 [1199618400] >WARNING: clssnmPollingThread: node newsrac2 (2) at 90 3.119357e-317artbeat fatal, eviction in 1.460 seconds
[ CSSD]2015-03-30 15:02:14.742 [1199618400] >WARNING: clssnmPollingThread: node newsrac2 (2) at 90 3.119380e-317artbeat fatal, eviction in 0.460 seconds
[ CSSD]2015-03-30 15:02:15.204 [1199618400] >TRACE: clssnmPollingThread: Eviction started for node newsrac2 (2), flags 0x0001, state 3, wt4c 0
[ CSSD]2015-03-30 15:02:15.204 [1220598112] >TRACE: clssnmDoSyncUpdate: Initiating sync 3
[ CSSD]2015-03-30 15:02:15.204 [1220598112] >TRACE: clssnmDoSyncUpdate: diskTimeout set to (57000)ms
[ CSSD]2015-03-30 15:02:15.204 [1220598112] >TRACE: clssnmSetupAckWait: Ack message type (11)
[ CSSD]2015-03-30 15:02:15.204 [1220598112] >TRACE: clssnmSetupAckWait: node(1) is ALIVE
[ CSSD]2015-03-30 15:02:15.204 [1220598112] >TRACE: clssnmSendSync: syncSeqNo(3)
[ CSSD]2015-03-30 15:02:15.204 [1220598112] >TRACE: clssnmWaitForAcks: Ack message type(11), ackCount(1)
[ CSSD]2015-03-30 15:02:15.204 [1147169120] >TRACE: clssnmHandleSync: diskTimeout set to (57000)ms
[ CSSD]2015-03-30 15:02:15.204 [1147169120] >TRACE: clssnmHandleSync: Acknowledging sync: src[1] srcName[newsrac1] seq[9] sync[3]
[ CSSD]2015-03-30 15:02:15.204 [1220598112] >TRACE: clssnmWaitForAcks: done, msg type(11)
[ CSSD]2015-03-30 15:02:15.204 [1220598112] >TRACE: clssnmDoSyncUpdate: Terminating node 2, newsrac2, misstime(60000) state(5)
[ CSSD]2015-03-30 15:02:15.204 [1220598112] >TRACE: clssnmSetupAckWait: Ack message type (13)
[ CSSD]2015-03-30 15:02:15.204 [1220598112] >TRACE: clssnmSetupAckWait: node(1) is ACTIVE
[ CSSD]2015-03-30 15:02:15.204 [2538647328] >USER: NMEVENT_SUSPEND [00][00][00][06]
[ CSSD]2015-03-30 15:02:15.204 [1220598112] >TRACE: clssnmWaitForAcks: Ack message type(13), ackCount(1)
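To sanity-check the countdown in ocssd.log, here is a minimal Python sketch. It takes a few of the "eviction in N seconds" warnings above, works out the deadline each one implies, and then subtracts the miss time to estimate when node 1 last heard a heartbeat from newsrac2. The 60-second miss time is an assumption read off the misstime(60000) trace line, and the function name is just illustrative.

```python
from datetime import datetime, timedelta

# (timestamp, countdown in seconds) copied from the ocssd.log warnings above.
WARNINGS = [
    ("2015-03-30 15:01:45.687", 29.510),
    ("2015-03-30 15:01:46.689", 28.510),
    ("2015-03-30 15:02:00.715", 14.490),
    ("2015-03-30 15:02:09.731", 5.470),
    ("2015-03-30 15:02:14.742", 0.460),
]

# Assumption: misstime(60000) in the trace is the full heartbeat timeout in ms.
MISSTIME = timedelta(milliseconds=60000)


def implied_deadlines(warnings):
    """Return the eviction deadline implied by each warning line."""
    deadlines = []
    for stamp, countdown in warnings:
        logged_at = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S.%f")
        deadlines.append(logged_at + timedelta(seconds=countdown))
    return deadlines


deadlines = implied_deadlines(WARNINGS)
last_heartbeat = min(deadlines) - MISSTIME

for deadline in deadlines:
    print("implied eviction deadline:", deadline.time())
print("estimated last heartbeat :", last_heartbeat.time())
```

If the estimate lands close to the "Receive failure with node 2" line at 15:01:15.629, then from node 1's point of view the heartbeats stopped at about the moment the connection to node 2 dropped, rather than degrading gradually.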
Clusterware alert log on node 2 (no information at all about the incident):
2014-12-06 21:47:23.494
[crsd(28610)]CRS-1201:CRSD started on node newsrac2.
2015-03-30 15:21:20.843
[cssd(29310)]CRS-1605:CSSD voting file is online: /dev/raw/raw2. Details in /opt/app/oracle/product/10.2.0/crs/log/newsrac2/cssd/ocssd.log.
[cssd(29310)]CRS-1601:CSSD Reconfiguration complete. Active nodes are newsrac1 newsrac2 .
2015-03-30 15:21:25.153
[crsd(28755)]CRS-1012:The OCR service started on node newsrac2.
2015-03-30 15:21:25.164
[evmd(28716)]CRS-1401:EVMD started on node newsrac2.
2015-03-30 15:21:26.303
[crsd(28755)]CRS-1201:CRSD started on node newsrac2.
Then the system log on node 1:
Mar 30 12:36:41 newsrac1 su(pam_unix)[6177]: session closed for user oracle
Mar 30 15:01:21 newsrac1 kernel: lpfc 0000:0b:00.0: 0:1305 Link Down Event x4 received Data: x4 x20 x0
Mar 30 15:01:21 newsrac1 kernel: lpfc 0000:0b:00.0: 0:1303 Link Up Event x5 received Data: x5 x1 x10 x3
Mar 30 15:01:21 newsrac1 kernel: lpfc 0000:0b:00.1: 1:1305 Link Down Event x4 received Data: x4 x20 x0
Mar 30 15:01:21 newsrac1 kernel: lpfc 0000:0b:00.1: 1:1303 Link Up Event x5 received Data: x5 x1 x10 x3
Mar 30 15:03:03 newsrac1 kernel: lpfc 0000:0b:00.1: 1:1305 Link Down Event x6 received Data: x6 x20 x0
Mar 30 15:03:03 newsrac1 kernel: lpfc 0000:0b:00.1: 1:1303 Link Up Event x7 received Data: x7 x1 x10 x4
Mar 30 15:03:09 newsrac1 kernel: lpfc 0000:0b:00.0: 0:1305 Link Down Event x6 received Data: x6 x20 x0
Mar 30 15:03:09 newsrac1 kernel: lpfc 0000:0b:00.0: 0:1303 Link Up Event x7 received Data: x7 x1 x10 x4
System log on node 2:
Mar 29 04:03:07 newsrac2 syslogd 1.4.1: restart.
Mar 29 22:00:01 newsrac2 su(pam_unix)[26861]: session opened for user oracle by (uid=0)
Mar 29 22:20:22 newsrac2 su(pam_unix)[26861]: session closed for user oracle
Mar 30 14:59:42 newsrac2 kernel: ocssd.bin[29196]: segfault at 0000000000000008 rip 0000002a96210251 rsp 0000000045006820 error 6
Mar 30 14:59:42 newsrac2 logger: Oracle CSSD failure 139.
Mar 30 15:20:38 newsrac2 syslogd 1.4.1: restart.
Mar 30 15:20:38 newsrac2 syslog: syslogd startup succeeded
Mar 30 15:20:38 newsrac2 kernel: klogd 1.4.1, log source = /proc/kmsg started.
Mar 30 15:20:38 newsrac2 kernel: Bootdata ok (command line is ro root=LABEL=/ rhgb quiet)
Mar 30 15:20:38 newsrac2 kernel: Linux version 2.6.9-67.ELsmp (brewbuilder@hs20-bc1-5.build.redhat.com) (gcc version 3.4.6 20060404 (Red Hat 3.4.6-8)) #1 SMP Wed Nov 7 13:56:44 EST 2007
Mar 30 15:20:38 newsrac2 kernel: BIOS-provided physical RAM map:
Mar 30 15:20:38 newsrac2 kernel: BIOS-e820: 0000000000000000 - 000000000009f400 (usable)
Mar 30 15:20:38 newsrac2 kernel: BIOS-e820: 000000000009f400 - 00000000000a0000 (reserved)
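The two 14:59:42 lines above can be decoded a little further. The sketch below assumes that the 139 in "Oracle CSSD failure 139" is the exit status seen by the wrapper script (under the usual shell convention, 128 plus the signal number), and that "error 6" is the x86 page-fault error code the kernel prints with a segfault. Both function names are made up for illustration.

```python
import signal


def decode_exit_status(status):
    """Shell convention: a status above 128 means 'killed by signal (status - 128)'."""
    if status > 128:
        return signal.Signals(status - 128).name
    return None


def decode_page_fault(error_code):
    """Split an x86 page-fault error code into its three low bits."""
    return {
        "page_present": bool(error_code & 1),  # False: the page was not mapped
        "write": bool(error_code & 2),         # True: the access was a write
        "user_mode": bool(error_code & 4),     # True: the fault came from user space
    }


print(decode_exit_status(139))
print(decode_page_fault(6))
```

Read this way, the pair of lines describes ocssd.bin being killed by a segmentation fault while writing, in user mode, to an unmapped address (0x8, which looks like a small offset from a null pointer).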
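My own understanding: node 2's alert logs contain almost nothing, so going by node 1's alert log I see two possible causes:
1. The network heartbeat between the two nodes failed first, which caused one of the nodes to reboot. But in that case it is strange that node 2 recorded nothing.
2. Node 2 rebooted suddenly, which caused the heartbeat errors on node 1. The 14:59 entries in node 2's messages log also point to this cause.
So I want to use these two clues to work out why node 2 rebooted. A search on Oracle Support suggests it is a bug, and some Google results recommend upgrading the glibc package. Does anyone have any insight?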
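To compare the two explanations, it helps to put the key events from all the logs on one timeline. The Python sketch below does only that, using timestamps copied from the excerpts in this post. It assumes the clocks on the two nodes agree reasonably well, which nothing in the logs confirms, so treat the gaps as approximate.

```python
from datetime import datetime


def at(stamp):
    """Parse a timestamp with or without fractional seconds."""
    fmt = "%Y-%m-%d %H:%M:%S.%f" if "." in stamp else "%Y-%m-%d %H:%M:%S"
    return datetime.strptime(stamp, fmt)


# (timestamp, source log, event), all taken from the excerpts in this post.
EVENTS = [
    ("2015-03-30 15:02:15.204", "node1 CRS alert", "CRS-1607 evicting node newsrac2"),
    ("2015-03-30 14:59:42", "node2 syslog", "ocssd.bin segfault, CSSD failure 139"),
    ("2015-03-30 15:01:15.629", "node1 ocssd.log", "receive failure with node 2"),
    ("2015-03-30 15:20:38", "node2 syslog", "syslogd restart (node booting)"),
    ("2015-03-30 15:01:21", "node1 syslog", "lpfc link down/up events"),
    ("2015-03-30 15:02:18", "node1 DB alert", "reconfiguration, dead instance detected"),
    ("2015-03-30 15:21:36", "node2 DB alert", "starting ORACLE instance"),
]

timeline = sorted((at(stamp), source, what) for stamp, source, what in EVENTS)
start = timeline[0][0]

for when, source, what in timeline:
    offset = (when - start).total_seconds()
    print(f"+{offset:9.3f}s  {source:16} {what}")

# Gap between the segfault on node 2 and node 1 first noticing a problem.
gap = (at("2015-03-30 15:01:15.629") - at("2015-03-30 14:59:42")).total_seconds()
print("segfault to first receive failure:", gap, "seconds")
```

On these timestamps the ocssd.bin segfault on node 2 is the earliest event, before node 1 logs any heartbeat trouble, which fits explanation 2 better than explanation 1, subject to the clock caveat above.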
Log attachment: 150330.zip (1.73 MB)