RAC Node Reboot Caused by a Host Hardware Fault

Last night a RAC node rebooted. Applications were not affected, but the cause still had to be tracked down.

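1. Check the database alert.log. It shows the instance simply starting up again, with no shutdown or error entries logged before the restart.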

2012-11-11 06:00:00.091000 +08:00
Setting Resource Manager plan SCHEDULER[0x310D]:DEFAULT_MAINTENANCE_PLAN via scheduler window
Setting Resource Manager plan DEFAULT_MAINTENANCE_PLAN via parameter
Starting background process VKRM
VKRM started with pid=60, OS id=23499
2012-11-11 06:00:06.599000 +08:00
Begin automatic SQL Tuning Advisor run for special tuning task  "SYS_AUTO_SQL_TUNING_TASK"
2012-11-11 06:01:16.131000 +08:00
End automatic SQL Tuning Advisor run for special tuning task  "SYS_AUTO_SQL_TUNING_TASK"
2012-11-11 22:28:42.709000 +08:00
Adjusting the default value of parameter parallel_max_servers
from 1280 to 985 due to the value of parameter processes (1000)
Starting ORACLE instance (normal)
****************** Huge Pages Information *****************
Huge Pages memory pool detected (total: 35840 free: 35840)
DFLT Huge Pages allocation successful (allocated: 3001)
***********************************************************
2012-11-11 22:28:43.755000 +08:00
LICENSE_MAX_SESSION = 0
LICENSE_SESSIONS_WARNING = 0
2012-11-11 22:28:50.135000 +08:00
Private Interface 'bond1:1' configured from GPnP for use as a private interconnect.
  [name='bond1:1', type=1, ip=169.254.61.86, mac=00-1b-21-d5-26-b0, net=169.254.0.0/16, mask=255.255.0.0, use=haip:cluster_interconnect/62]
Public Interface 'bond0' configured from GPnP for use as a public interface.
  [name='bond0', type=1, ip=10.4.124.235, mac=e4-1f-13-80-57-c1, net=10.4.124.224/27, mask=255.255.255.224, use=public/1]
Public Interface 'bond0:1' configured from GPnP for use as a public interface.
  [name='bond0:1', type=1, ip=10.4.124.245, mac=e4-1f-13-80-57-c1, net=10.4.124.224/27, mask=255.255.255.224, use=public/1]
Picked latch-free SCN scheme 3
Using LOG_ARCHIVE_DEST_1 parameter default value as USE_DB_RECOVERY_FILE_DEST
Autotune of undo retention is turned on.
LICENSE_MAX_USERS = 0
SYS auditing is disabled
Starting up:
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP, Data Mining
and Real Application Testing options.
Using parameter settings in server-side pfile /oracle/app/oracle/product/11.2.0/db_1/dbs/initSMPDB3.ora
System parameters with non-default values:

ASM log

2012-11-11 22:28:05.078000 +08:00
* instance_number obtained from CSS = 3, checking for the existence of node 0...
* node 0 does not exist. instance_number = 3
Starting ORACLE instance (normal)
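
The check in step 1 can be sketched as a small script: find every "Starting ORACLE instance" entry in alert.log and report how long the log had been silent before it. This is only a minimal sketch, assuming the timestamp layout shown in the excerpt above (date, time with microseconds, UTC offset on a line of its own). The names `parse_ts` and `find_startups` are made up for illustration.

```python
from datetime import datetime

TS_FORMAT = "%Y-%m-%d %H:%M:%S.%f"

def parse_ts(line):
    """Return a datetime if the line is an alert.log timestamp line, else None."""
    parts = line.strip().split(" ")
    if len(parts) != 3:  # expected: date, time, UTC offset
        return None
    try:
        return datetime.strptime(parts[0] + " " + parts[1], TS_FORMAT)
    except ValueError:
        return None

def find_startups(lines):
    """Return (startup_time, previous_timestamp, gap) for each instance startup."""
    startups = []
    prev_ts = None  # timestamp of the block before the current one
    cur_ts = None   # timestamp of the block currently being read
    for line in lines:
        ts = parse_ts(line)
        if ts is not None:
            prev_ts, cur_ts = cur_ts, ts
            continue
        if line.startswith("Starting ORACLE instance") and cur_ts is not None:
            gap = cur_ts - prev_ts if prev_ts is not None else None
            startups.append((cur_ts, prev_ts, gap))
    return startups

# Lines copied from the alert.log excerpt above.
SAMPLE = [
    "2012-11-11 06:01:16.131000 +08:00",
    'End automatic SQL Tuning Advisor run for special tuning task  "SYS_AUTO_SQL_TUNING_TASK"',
    "2012-11-11 22:28:42.709000 +08:00",
    "Adjusting the default value of parameter parallel_max_servers",
    "from 1280 to 985 due to the value of parameter processes (1000)",
    "Starting ORACLE instance (normal)",
]

for started, last_seen, gap in find_startups(SAMPLE):
    print(f"startup at {started}, previous entry at {last_seen}, silent for {gap}")
```

A long silent gap on its own proves little on a quiet system. What matters here is that no shutdown or error message sits between the last entry and the startup.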

2. Linux system logs: /var/log/error and messages

error: the suspicious entry is the "Memory for crash kernel" message

Nov 11 22:22:21 dtydb5 kernel: Memory for crash kernel (0x0 to 0x0) notwithin permissible range
Nov 11 22:22:45 dtydb5 automount[17304]: lookup_read_master: lookup(nisplus): couldn't locate nis+ table auto.master
Nov 11 22:26:32 dtydb5 ntpd[19555]: 10.7.0.81 is inappropriate address for the fudge command, line ignored
Nov 11 22:26:33 dtydb5 logger: Oracle HA daemon is enabled for autostart.
Nov 11 22:26:34 dtydb5 logger: exec /oracle/11.2.0/grid/perl/bin/perl -I/oracle/11.2.0/grid/perl/lib /oracle/11.2.0/grid/bin/crswrapexece.pl /oracle/11.2.0/grid/crs/install/s_crsconfig_dtydb5_env.txt /oracle/11.2.0/grid/bin/ohasd.bin "reboot"
Nov 11 22:27:07 dtydb5 smartd[20467]: Problem creating device name scan list
Nov 11 22:27:56 dtydb5 multipathd: asm!.asm_ctl_spec: failed to store path info
Nov 11 22:27:56 dtydb5 multipathd: uevent trigger error
Nov 11 22:27:56 dtydb5 multipathd: asm!.asm_ctl_vmb: failed to store path info
Nov 11 22:27:56 dtydb5 multipathd: uevent trigger error
Nov 11 22:27:56 dtydb5 multipathd: asm!.asm_ctl_vdbg: failed to store path info
messages: syslogd restarted at 22:22:18, which is just the normal boot sequence and should not be a problem

Nov 11 22:22:18 dtydb5 syslogd 1.4.1: restart.
Nov 11 22:22:19 dtydb5 kernel: klogd 1.4.1, log source = /proc/kmsg started.
Nov 11 22:22:19 dtydb5 kernel: Linux version 2.6.18-194.el5 (mockbuild@x86-005.build.bos.redhat.com) (gcc version 4.1.2 20080704 (Red Hat 4.1.2-48)) #1 SMP Tue Mar 16 21:52:39 EDT 2010
Nov 11 22:22:19 dtydb5 kernel: Command line: ro root=/dev/rootvg/LogVol00 rhgb quiet
Nov 11 22:22:19 dtydb5 kernel: BIOS-provided physical RAM map:
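
To pin down the boot time from messages, one option is to look for the syslogd restart marker and note the last line written before it. The sketch below assumes the classic syslog line layout seen above, an English locale for month names, and a year supplied by the caller, since syslog lines do not carry one. `find_boots` and `parse_syslog_ts` are illustrative names, not existing tools.

```python
import re
from datetime import datetime

# Matches lines such as "syslogd 1.4.1: restart."
BOOT_MARKER = re.compile(r"syslogd [\d.]+: restart\.")

def parse_syslog_ts(line, year):
    """Parse the leading 'Mon DD HH:MM:SS' of a syslog line (year is assumed)."""
    return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")

def find_boots(lines, year):
    """Return (boot_time, last_line_before_boot, silence) for each boot marker."""
    boots = []
    prev = None
    for raw in lines:
        line = raw.rstrip("\n")
        if not line:
            continue
        if BOOT_MARKER.search(line):
            boot_ts = parse_syslog_ts(line, year)
            silence = boot_ts - parse_syslog_ts(prev, year) if prev else None
            boots.append((boot_ts, prev, silence))
        prev = line
    return boots

# Lines copied from the messages excerpt above.
SAMPLE = [
    "Nov 11 22:22:18 dtydb5 syslogd 1.4.1: restart.",
    "Nov 11 22:22:19 dtydb5 kernel: klogd 1.4.1, log source = /proc/kmsg started.",
]

for boot_ts, last_line, silence in find_boots(SAMPLE, 2012):
    print("boot at", boot_ts, "| last line before boot:", last_line, "| silence:", silence)
```

If the last line before the marker is ordinary activity, with no shutdown messages, that points to an abrupt reset rather than an orderly reboot.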

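3. RAC logs. The main suspicion was still that the node had been evicted from the cluster and restarted, which would have rebooted the server.

crsd log: /oracle/11.2.0/grid/log/dtydb5/crsd/crsdOUT.log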

2012-11-11 22:28:14
Changing directory to /oracle/11.2.0/grid/log/dtydb5/crsd
2012-11-11 22:28:14
CRSD REBOOT
/oracle/11.2.0/grid/log/dtydb5/crsd/crsd.l01
2012-11-11 22:20:20.413: [UiServer][1171753280] {3:22096:3634} Sending message to PE. ctx= 0xd671ea0
2012-11-11 22:20:20.414: [   CRSPE][1169652032] {3:22096:3634} Processing PE command id=593485. Description: [Stat Resource : 0x2aaaadda9a60]
2012-11-11 22:20:20.418: [UiServer][1171753280] {3:22096:3634} Done for ctx=0xd671ea0
2012-11-11 22:28:14.786: [ default][900772256] First attempt: init CSS context succeeded.
[  clsdmt][1087560000]Listening to (ADDRESS=(PROTOCOL=ipc)(KEY=dtydb5DBG_CRSD))
2012-11-11 22:28:14.791: [  clsdmt][1087560000]PID for the Process [21647], connkey 1
2012-11-11 22:28:14.792: [  clsdmt][1087560000]Creating PID [21647] file for home /oracle/11.2.0/grid host dtydb5 bin crs to /oracle/11.2.0/grid/crs/init/
2012-11-11 22:28:14.792: [  clsdmt][1087560000]Writing PID [21647] to the file [/oracle/11.2.0/grid/crs/init/dtydb5.pid]
2012-11-11 22:28:15.308: [ default][1087560000] Policy Engine is not initialized yet!
2012-11-11 22:28:15.308: [ default][900772256] CRS Daemon Starting
2012-11-11 22:28:15.311: [ default][900772256] ENV Logging level for Module: AGENT  1
2012-11-11 22:28:15.311: [ default][900772256] ENV Logging level for Module: AGFW  0
2012-11-11 22:28:15.311: [ default][900772256] ENV Logging level for Module: CLSFRAME  0

ohasd.log: /oracle/11.2.0/grid/log/dtydb5/ohasd/ohasd.log

2012-11-11 22:27:08.498: [ default][3640775072] OHASD Daemon Starting. Command string :reboot
2012-11-11 22:27:08.500: [ default][3640775072] Initializing OLR
2012-11-11 22:27:08.520: [  OCRRAW][3640775072]proprioo: for disk 0 (/oracle/11.2.0/grid/cdata/dtydb5.olr), id match (1), total id sets, (1) need recover (0), my votes (0), total votes (0), commit_lsn (4630), lsn (4630)
2012-11-11 22:27:08.520: [  OCRRAW][3640775072]proprioo: my id set: (931531576, 1028247821, 0, 0, 0)
2012-11-11 22:27:08.520: [  OCRRAW][3640775072]proprioo: 1st set: (931531576, 1028247821, 0, 0, 0)
2012-11-11 22:27:08.520: [  OCRRAW][3640775072]proprioo: 2nd set: (0, 0, 0, 0, 0)
2012-11-11 22:27:08.551: [ default][3640775072] Running mode check...
2012-11-11 22:27:08.551: [ default][3640775072] OHASD running as the Privileged user

2012-11-11 22:27:08.551: [ default][3640775072] Loading debug levels...
2012-11-11 22:27:08.553: [ default][3640775072] OCR Logging level for Module: AGFW  0
2012-11-11 22:27:08.554: [ default][3640775072] OCR Logging level for Module: CLSFRAME  0
2012-11-11 22:27:08.554: [ default][3640775072] OCR Logging level for Module: CLSVER  0
2012-11-11 22:27:08.554: [ default][3640775072] OCR Logging level for Module: CLUCLS  0
2012-11-11 22:27:08.555: [ default][3640775072] OCR Logging level for Module: CRSAPP  0
2012-11-11 22:27:08.555: [ default][3640775072] OCR Logging level for Module: CRSCCL  0

CRS alert log: alertdtydb5.log

2012-11-11 22:27:08.548
[ohasd(19651)]CRS-2112:The OLR service started on node dtydb5.
2012-11-11 22:27:08.620
[ohasd(19651)]CRS-1301:Oracle High Availability Service started on node dtydb5.
2012-11-11 22:27:08.647
[ohasd(19651)]CRS-8017:location: /etc/oracle/lastgasp has 2 reboot advisory log files, 0 were announced and 0 errors occurred
2012-11-11 22:27:10.481
[/oracle/11.2.0/grid/bin/oraagent.bin(20785)]CRS-5815:Agent '/oracle/11.2.0/grid/bin/oraagent_grid' could not find any base type entry points for type 'ora.daemon.type'. Details at (:CRSAGF00108:) {0:2:2} in /oracle/11.2.0/grid/log/dtydb5/agent/ohasd/oraagent_grid/oraagent_grid.log.
2012-11-11 22:27:10.592
[/oracle/11.2.0/grid/bin/oraagent.bin(20785)]CRS-5011:Check of resource "+ASM" failed: details at "(:CLSN00006:)" in "/oracle/11.2.0/grid/log/dtydb5/agent/ohasd/oraagent_grid/oraagent_grid.log"
2012-11-11 22:27:11.496
2012-11-11 22:27:11.496
[/oracle/11.2.0/grid/bin/orarootagent.bin(20781)]CRS-5016:Process "/oracle/11.2.0/grid/bin/acfsload" spawned by agent "/oracle/11.2.0/grid/bin/orarootagent.bin" for action "check" failed: details at "(:CLSN00010:)" in "/oracle/11.2.0/grid/log/dtydb5/agent/ohasd/orarootagent_root/orarootagent_root.log"
2012-11-11 22:27:26.622
[/oracle/11.2.0/grid/bin/oraagent.bin(20912)]CRS-5815:Agent '/oracle/11.2.0/grid/bin/oraagent_grid' could not find any base type entry points for type 'ora.daemon.type'. Details at (:CRSAGF00108:) {0:5:2} in /oracle/11.2.0/grid/log/dtydb5/agent/ohasd/oraagent_grid/oraagent_grid.log.
2012-11-11 22:27:29.974
[gpnpd(20934)]CRS-2328:GPNPD started on node dtydb5.
After checking, there were no network or disk problems, and nothing else looked wrong.
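
One way to check the eviction theory against the CRS alert log is to pair each timestamp line with the message that follows it, then list what was logged in the minutes before the boot. This is a rough sketch under the assumption that the log keeps the two-line layout shown above. The window bounds used below are picked by hand from the messages log, and the helper names are invented for illustration.

```python
from datetime import datetime

def parse_crs_alert(lines):
    """Pair each timestamp line with the message line(s) that follow it."""
    entries = []
    current_ts = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        try:
            current_ts = datetime.strptime(line, "%Y-%m-%d %H:%M:%S.%f")
        except ValueError:
            # Not a timestamp, so treat it as a message under the last timestamp.
            if current_ts is not None:
                entries.append((current_ts, line))
    return entries

def entries_in_window(entries, start, end):
    """Keep entries with start <= timestamp < end."""
    return [(ts, msg) for ts, msg in entries if start <= ts < end]

# Lines copied from the alertdtydb5.log excerpt above.
SAMPLE = [
    "2012-11-11 22:27:08.548",
    "[ohasd(19651)]CRS-2112:The OLR service started on node dtydb5.",
    "2012-11-11 22:27:08.620",
    "[ohasd(19651)]CRS-1301:Oracle High Availability Service started on node dtydb5.",
    "2012-11-11 22:27:29.974",
    "[gpnpd(20934)]CRS-2328:GPNPD started on node dtydb5.",
]

entries = parse_crs_alert(SAMPLE)
# Window before the OS boot at 22:22:18 (start bound chosen arbitrarily).
before_boot = entries_in_window(
    entries, datetime(2012, 11, 11, 22, 0, 0), datetime(2012, 11, 11, 22, 22, 18)
)
print(len(before_boot), "entries logged in the window before boot")
```

On the excerpt above the window is empty: everything shown was written during startup, after the machine had already come back.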

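4. With nothing wrong at the system level, the only thing left to look at was the server hardware.

Logging in to the server's management port through its web interface turned up the entry below, which essentially settles the question: the hardware reported "CPU 4:Cache error occurred." The event is stamped 22:19:21, about three minutes before syslogd came back up at 22:22:18, which fits a hardware-triggered reset. This one has to be handled by a hardware engineer.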

E 30 11/11/2012 22:19:21 OEM Event OEM Event CPU 4:Cache error occurred.
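
Putting the timestamps from all the logs above on one timeline makes the sequence easier to see. The sketch below simply sorts events copied from the excerpts, with fractional seconds dropped. It assumes the management controller's clock is reasonably in sync with the OS clock, which the logs do not confirm.

```python
from datetime import datetime

# (source, timestamp, message), all copied from the excerpts in this post.
EVENTS = [
    ("database alert.log", "2012-11-11 22:28:42", "Starting ORACLE instance (normal)"),
    ("ASM alert log", "2012-11-11 22:28:05", "Starting ORACLE instance (normal)"),
    ("crsdOUT.log", "2012-11-11 22:28:14", "CRSD REBOOT"),
    ("ohasd.log", "2012-11-11 22:27:08", "OHASD Daemon Starting. Command string :reboot"),
    ("messages", "2012-11-11 22:22:18", "syslogd 1.4.1: restart."),
    ("crsd.l01", "2012-11-11 22:20:20", "last entry shown before the gap"),
    ("management port event log", "2012-11-11 22:19:21", "CPU 4:Cache error occurred."),
]

def build_timeline(events):
    """Parse timestamps and return (time, source, message) sorted by time."""
    parsed = [
        (datetime.strptime(ts, "%Y-%m-%d %H:%M:%S"), src, msg)
        for src, ts, msg in events
    ]
    return sorted(parsed, key=lambda e: e[0])

def format_timeline(timeline):
    """Render each event with its offset in seconds from the first event."""
    first = timeline[0][0]
    rows = []
    for ts, src, msg in timeline:
        offset = int((ts - first).total_seconds())
        rows.append(f"{ts:%H:%M:%S}  +{offset:>4}s  {src}: {msg}")
    return rows

timeline = build_timeline(EVENTS)
for row in format_timeline(timeline):
    print(row)
```

Sorted this way, the cache error is the earliest event, and every Oracle component starts only after the OS has booted again. Note that crsd.l01 still shows an entry about a minute after the hardware event, so either the node kept running briefly or the two clocks differ slightly.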

