Red Hat Linux AS 3.0 U6, DELL 2950
Oracle RAC 9.2.0.7, OCFS, two-node RAC
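Around 11:00 today I found that on NODE02, /var/log/messages and the Oracle alert log could not be accessed: both had gone read-only. Node 1 had no problem. When I tested other commands, some of them also failed with errors. My first guess was an OS problem, so I hurried to the machine room to check the server console for errors and to see whether a disk had failed. The disks looked normal, and the server console showed the following errors: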
EXT3-fs error ( device sd(8,5)) in start_transaction:Journal has aborted
EXT3-fs error ( device sd(8,5)) in start_transaction:Journal has aborted
EXT3-fs error ( device sd(8,8)) in start_transaction:Journal has aborted
EXT3-fs error ( device sd(8,8)) in start_transaction:Journal has aborted
EXT3-fs error ( device sd(8,8)) in start_transaction:Journal has aborted
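Since the visible symptom was files turning read-only, one quick check is to look at the mount options the kernel reports in /proc/mounts. The following is a minimal Python sketch, not part of the original investigation; the helper name readonly_mounts is hypothetical. It only catches file systems whose read-only flag has actually been set (for example after an errors=remount-ro remount); an aborted journal may make writes fail even while the mount still shows rw.

```python
import os

# Hypothetical helper: report file systems whose mount options contain "ro".
# Assumes the standard /proc/mounts layout:
#   device mountpoint fstype options dump pass
def readonly_mounts(mounts_text, fstypes=("ext3",)):
    """Return (device, mountpoint) pairs that are mounted read-only.

    Only file system types listed in fstypes are considered; pass an empty
    tuple to check every type.
    """
    result = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        device, mountpoint, fstype, options = fields[:4]
        if fstypes and fstype not in fstypes:
            continue
        # Match "ro" as a whole option so "errors=remount-ro" is not a hit.
        if "ro" in options.split(","):
            result.append((device, mountpoint))
    return result


if __name__ == "__main__":
    if os.path.exists("/proc/mounts"):
        with open("/proc/mounts") as f:
            for device, mountpoint in readonly_mounts(f.read()):
                print(device, "is mounted read-only on", mountpoint)
```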
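I tried to read the log messages and the Linux dmesg output on node 2, but neither could be opened, and FTP did not work either. With no other option, I copied messages and dmesg from /var/log on node 2 straight to the shared disk, then copied them from the healthy node 1 to my office PC and sent them to the DELL service engineer. Part of the dmesg output follows: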
Device 08:40 not ready.
I/O error: dev 08:40, sector 0
Device 08:80 not ready.
I/O error: dev 08:80, sector 0
Device 08:60 not ready.
I/O error: dev 08:60, sector 0
Device 08:80 not ready.
I/O error: dev 08:80, sector 0
Device 08:60 not ready.
I/O error: dev 08:60, sector 0
Device 08:a0 not ready.
I/O error: dev 08:a0, sector 0
Device 08:a0 not ready.
I/O error: dev 08:a0, sector 0
Device 08:80 not ready.
I/O error: dev 08:80, sector 0
Device 08:80 not ready.
I/O error: dev 08:80, sector 0
Device 08:c0 not ready.
I/O error: dev 08:c0, sector 0
Device 08:a0 not ready.
I/O error: dev 08:a0, sector 0
Device 08:c0 not ready.
I/O error: dev 08:c0, sector 0
Device 08:a0 not ready.
I/O error: dev 08:a0, sector 0
Device 08:c0 not ready.
I/O error: dev 08:c0, sector 0
Device 08:c0 not ready.
I/O error: dev 08:c0, sector 0
Attached scsi generic sg0 at scsi0, channel 0, id 8, lun 0, type 13
SCSI device sdb: 555745280 512-byte hdwr sectors (284542 MB)
sdb: sdb1
SCSI device sdd: 524288000 512-byte hdwr sectors (268435 MB)
sdd: sdd1
SCSI device sdb: 555745280 512-byte hdwr sectors (284542 MB)
sdb: sdb1
SCSI device sdf: 545259520 512-byte hdwr sectors (279173 MB)
sdf: sdf1 sdf2 sdf3 sdf4
SCSI device sdd: 524288000 512-byte hdwr sectors (268435 MB)
sdd: sdd1
SCSI device sdh: 555745280 512-byte hdwr sectors (284542 MB)
sdh: sdh1
SCSI device sdf: 545259520 512-byte hdwr sectors (279173 MB)
sdf: sdf1 sdf2 sdf3 sdf4
SCSI device sdj: 524288000 512-byte hdwr sectors (268435 MB)
sdj: sdj1
SCSI device sdh: 555745280 512-byte hdwr sectors (284542 MB)
sdh: sdh1
SCSI device sdl: 545259520 512-byte hdwr sectors (279173 MB)
sdl: sdl1 sdl2 sdl3 sdl4
SCSI device sdj: 524288000 512-byte hdwr sectors (268435 MB)
sdj: sdj1
kjournald starting. Commit interval 5 seconds
EXT3 FS 2.4-0.9.19, 19 August 2002 on sd(8,3), internal journal
EXT3-fs: mounted filesystem with ordered data mode.
kjournald starting. Commit interval 5 seconds
EXT3 FS 2.4-0.9.19, 19 August 2002 on sd(8,7), internal journal
EXT3-fs: mounted filesystem with ordered data mode.
kjournald starting. Commit interval 5 seconds
EXT3 FS 2.4-0.9.19, 19 August 2002 on sd(8,5), internal journal
EXT3-fs: mounted filesystem with ordered data mode.
kjournald starting. Commit interval 5 seconds
EXT3 FS 2.4-0.9.19, 19 August 2002 on sd(8,6), internal journal
EXT3-fs: mounted filesystem with ordered data mode.
kjournald starting. Commit interval 5 seconds
EXT3 FS 2.4-0.9.19, 19 August 2002 on sd(8,2), internal journal
EXT3-fs: mounted filesystem with ordered data mode.
ide-floppy driver 0.99.newide
hda: attached ide-cdrom driver.
hda: ATAPI 24X DVD-ROM drive, 2048kB Cache, UDMA(33)
Uniform CD-ROM driver Revision: 3.12
hda: DMA disabled
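The console errors name devices as sd(major,minor) in decimal, while the dmesg I/O errors use major:minor in hexadecimal. The Python sketch below translates both forms into device names. It assumes the usual Linux 2.4 SCSI disk layout on block major 8 (16 minors per disk, minor 0 of each disk being the whole disk); the function names are hypothetical.

```python
import re
import string

# Assumption: Linux 2.4 SCSI disks on block major 8, where each disk owns
# 16 consecutive minors (minor = disk_index * 16 + partition number).
SCSI_DISK_MAJOR = 8
MINORS_PER_DISK = 16


def scsi_name(major, minor):
    """Map a (major, minor) pair on major 8 to a name such as 'sda5'."""
    if major != SCSI_DISK_MAJOR:
        raise ValueError("only SCSI disk major 8 is handled in this sketch")
    disk, part = divmod(minor, MINORS_PER_DISK)
    name = "sd" + string.ascii_lowercase[disk]
    return name if part == 0 else name + str(part)  # partition 0 = whole disk


def decode(text):
    """Decode 'sd(8,5)' (decimal) or '08:40' (hexadecimal) device strings."""
    m = re.fullmatch(r"sd\((\d+),(\d+)\)", text)
    if m:
        return scsi_name(int(m.group(1)), int(m.group(2)))
    m = re.fullmatch(r"([0-9a-fA-F]{2}):([0-9a-fA-F]{2})", text)
    if m:
        return scsi_name(int(m.group(1), 16), int(m.group(2), 16))
    raise ValueError("unrecognized device string: " + text)


if __name__ == "__main__":
    for dev in ["sd(8,5)", "sd(8,8)", "08:40", "08:60", "08:80", "08:a0", "08:c0"]:
        print(dev, "->", decode(dev))
```

Under those assumptions, sd(8,5) and sd(8,8) are partitions sda5 and sda8, while 08:40, 08:60, 08:80, 08:a0 and 08:c0 are the whole disks sde, sdg, sdi, sdk and sdm (sector 0 of each).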
/var/log/messages revealed nothing useful; it held only messages generated by the monitoring system:
Jan 13 19:44:46 DELL-RAC02 telnetd[10671]: ttloop: peer died: EOF
Jan 13 19:49:45 DELL-RAC02 telnetd[10889]: ttloop: peer died: EOF
Jan 13 19:54:45 DELL-RAC02 telnetd[11127]: ttloop: peer died: EOF
Jan 13 19:59:45 DELL-RAC02 telnetd[11379]: ttloop: peer died: EOF
Jan 13 20:04:46 DELL-RAC02 telnetd[11599]: ttloop: peer died: EOF
Jan 13 20:09:45 DELL-RAC02 telnetd[11765]: ttloop: peer died: EOF
Jan 13 20:14:45 DELL-RAC02 telnetd[11871]: ttloop: peer died: EOF
Jan 13 20:19:45 DELL-RAC02 telnetd[12028]: ttloop: peer died: EOF
Jan 13 20:24:45 DELL-RAC02 telnetd[12140]: ttloop: peer died: EOF
Jan 13 20:29:45 DELL-RAC02 telnetd[12308]: ttloop: peer died: EOF
Jan 13 20:34:46 DELL-RAC02 telnetd[12484]: ttloop: peer died: EOF
Jan 13 20:39:45 DELL-RAC02 telnetd[12678]: ttloop: peer died: EOF
Jan 13 20:44:45 DELL-RAC02 telnetd[12808]: ttloop: peer died: EOF
Jan 13 20:49:45 DELL-RAC02 telnetd[12975]: ttloop: peer died: EOF
Jan 13 20:54:45 DELL-RAC02 telnetd[13131]: ttloop: peer died: EOF
Jan 13 20:59:45 DELL-RAC02 telnetd[13301]: ttloop: peer died: EOF
Jan 13 21:04:47 DELL-RAC02 telnetd[13439]: ttloop: peer died: EOF
Jan 13 21:09:46 DELL-RAC02 telnetd[13650]: ttloop: peer died: EOF
Jan 13 21:14:46 DELL-RAC02 telnetd[13890]: ttloop: peer died: EOF
Jan 13 21:19:46 DELL-RAC02 telnetd[14099]: ttloop: peer died: EOF
Jan 13 21:24:45 DELL-RAC02 telnetd[14213]: ttloop: peer died: EOF
Jan 13 21:29:45 DELL-RAC02 telnetd[14347]: ttloop: peer died: EOF
Jan 13 21:34:46 DELL-RAC02 telnetd[14593]: ttloop: peer died: EOF
Jan 13 21:39:46 DELL-RAC02 telnetd[14885]: ttloop: peer died: EOF
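One way to support the reading that these entries come from the monitoring system is to check whether they arrive on a fixed schedule. This Python sketch (helper names are hypothetical; it assumes every entry falls on the same day) computes the gap in seconds between consecutive syslog timestamps, using the first few lines above as sample input.

```python
# Hypothetical sketch: measure the gap between consecutive syslog entries
# to see whether they arrive on a fixed schedule, as a poller's would.
# Assumes every line starts with "Mon DD HH:MM:SS" and all fall on one day.

def seconds_of_day(line):
    """Convert the HH:MM:SS field of a syslog line to seconds since midnight."""
    hh, mm, ss = (int(x) for x in line.split()[2].split(":"))
    return hh * 3600 + mm * 60 + ss


def intervals(lines):
    """Return the seconds elapsed between each pair of consecutive entries."""
    times = [seconds_of_day(line) for line in lines if line.strip()]
    return [b - a for a, b in zip(times, times[1:])]


SAMPLE = """\
Jan 13 19:44:46 DELL-RAC02 telnetd[10671]: ttloop: peer died: EOF
Jan 13 19:49:45 DELL-RAC02 telnetd[10889]: ttloop: peer died: EOF
Jan 13 19:54:45 DELL-RAC02 telnetd[11127]: ttloop: peer died: EOF
Jan 13 19:59:45 DELL-RAC02 telnetd[11379]: ttloop: peer died: EOF
Jan 13 20:04:46 DELL-RAC02 telnetd[11599]: ttloop: peer died: EOF
"""

if __name__ == "__main__":
    gaps = intervals(SAMPLE.splitlines())
    print(gaps)                  # [299, 300, 300, 301]
    print(min(gaps), max(gaps))  # 299 301
```

Applied to the whole excerpt, the gaps stay between 299 and 302 seconds, i.e. one short-lived telnet connection roughly every five minutes, which is consistent with a poller rather than with the failure itself.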
----------------------------------------------------------------------------
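A hard-disk fault can largely be ruled out, because the front panel showed no errors. The preliminary conclusion is a file system error related to Linux.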
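I searched Baidu and Google, and in most reported cases the problem went away after a reboot, so I did not expect anything serious. Still, to confirm this and to settle who was responsible, I had to check with the DELL engineers. Their answer was to reboot the system. I also emailed them asking them to find the root cause. They replied that Linux 2.6 kernels have some file system bugs, but no such bug is currently known for the 2.4 file system. They also want me to run something called EMCGrab (a script they sent) and send them its log. The root cause is still under investigation.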
From the ITPUB blog, link: http://blog.itpub.net/35489/viewspace-539597/. If you republish this, please credit the source; otherwise legal liability will be pursued.