Testing was in full swing when the database suddenly became unreachable. I logged in to the host and found the following in the database's alert_sid log:
KCF: write/open error block=0x9a6 online=1
file=2 /oracle_data1/UNDOTBS3.dbf
error=27072 txt: 'Linux Error: 5: Input/output error
Additional information: 2469'
Thu Dec 4 12:56:39 2008
Errors in file /opt/ora9/admin/tax/bdump/orcl_dbw0_9605.trc:
ORA-01242: data file suffered media failure: database in NOARCHIVELOG mode
ORA-01114: IO error writing block to file 2 (block # 2470)
ORA-01110: data file 2: '/oracle_data1/UNDOTBS3.dbf'
ORA-27072: skgfdisp: I/O error
Linux Error: 5: Input/output error
Additional information: 2469
DBW0: terminating instance due to error 1242
Instance terminated by DBW0, pid = 9605
The database was down. At first glance the cause was a disk I/O error, so the next step was to check the host's log, /var/log/messages:
Dec 4 12:52:10 tax smartd[2924]: Device: /dev/sdb, 2 Currently unreadable (pending) sectors
Dec 4 12:52:10 tax smartd[2924]: Device: /dev/sdb, 2 Offline uncorrectable sectors
Dec 4 12:56:39 tax kernel: ata1: command 0xca timeout, stat 0xd0 host_stat 0x61
Dec 4 12:56:39 tax kernel: ata1: translated ATA stat/err 0xd0/00 to SCSI SK/ASC/ASCQ 0xb/47/00
Dec 4 12:56:39 tax kernel: ata1: status=0xd0 { Busy }
Dec 4 12:56:39 tax kernel: SCSI error : <0 0 1 0> return code = 0x8000002
Dec 4 12:56:39 tax kernel: Info fld=0x5b4b38b, Current sdb: sense key Aborted Command
Dec 4 12:56:39 tax kernel: Additional sense: Scsi parity error
Dec 4 12:56:39 tax kernel: end_request: I/O error, dev sdb, sector 95728523
Dec 4 12:56:39 tax kernel: Buffer I/O error on device sdb6, logical block 1483645
Dec 4 12:56:39 tax kernel: lost page write due to I/O error on sdb6
Dec 4 12:56:39 tax kernel: Aborting journal on device sdb6.
Dec 4 12:56:39 tax kernel: ext3_abort called.
Dec 4 12:56:39 tax kernel: EXT3-fs error (device sdb6): ext3_journal_start_sb: Detected aborted journal
Dec 4 12:56:39 tax kernel: Remounting filesystem read-only
Dec 4 12:57:09 tax kernel: ata1: command 0xca timeout, stat 0xd0 host_stat 0x61
Dec 4 12:57:09 tax kernel: ata1: translated ATA stat/err 0xd0/00 to SCSI SK/ASC/ASCQ 0xb/47/00
Dec 4 12:57:09 tax kernel: ata1: status=0xd0 { Busy }
Dec 4 12:57:09 tax kernel: SCSI error : <0 0 1 0> return code = 0x8000002
Dec 4 12:57:09 tax kernel: Info fld=0x5b4b38b, Current sdb: sense key Aborted Command
Dec 4 12:57:09 tax kernel: Additional sense: Scsi parity error
Dec 4 12:57:09 tax kernel: end_request: I/O error, dev sdb, sector 41934794
Dec 4 12:57:09 tax kernel: Buffer I/O error on device sdb3, logical block 643
Dec 4 12:57:09 tax kernel: lost page write due to I/O error on sdb3
Dec 4 12:57:44 tax kernel: ata1: command 0xca timeout, stat 0xd0 host_stat 0x61
Dec 4 12:57:44 tax kernel: ata1: translated ATA stat/err 0xd0/00 to SCSI SK/ASC/ASCQ 0xb/47/00
Dec 4 12:57:44 tax kernel: ata1: status=0xd0 { Busy }
Dec 4 12:57:44 tax kernel: SCSI error : <0 0 1 0> return code = 0x8000002
Dec 4 12:57:44 tax kernel: Info fld=0x5b4b38b, Current sdb: sense key Aborted Command
Dec 4 12:57:44 tax kernel: Additional sense: Scsi parity error
Dec 4 12:57:44 tax kernel: end_request: I/O error, dev sdb, sector 83864507
Dec 4 12:57:44 tax kernel: Buffer I/O error on device sdb6, logical block 643
Dec 4 12:57:44 tax kernel: lost page write due to I/O error on sdb6
Dec 4 12:57:44 tax sshd(pam_unix)[11222]: session opened for user oracle by (uid=0)
Dec 4 12:58:03 tax sshd(pam_unix)[11276]: session opened for user oracle by (uid=0)
Dec 4 12:59:25 tax kernel: ata1: command 0xc8 timeout, stat 0xd0 host_stat 0x61
Dec 4 12:59:25 tax kernel: ata1: translated ATA stat/err 0xd0/00 to SCSI SK/ASC/ASCQ 0xb/47/00
Dec 4 12:59:25 tax kernel: ata1: status=0xd0 { Busy }
Dec 4 12:59:25 tax kernel: SCSI error : <0 0 1 0> return code = 0x8000002
Dec 4 12:59:25 tax kernel: Info fld=0x5b4b38b, Current sdb: sense key Aborted Command
Dec 4 12:59:25 tax kernel: Additional sense: Scsi parity error
Dec 4 12:59:25 tax kernel: end_request: I/O error, dev sdb, sector 41934794
Dec 4 12:59:25 tax kernel: EXT3-fs error (device sdb3): ext3_get_inode_loc: unable to read inode block - inode=12, block=643
Dec 4 12:59:25 tax kernel: Aborting journal on device sdb3.
Dec 4 12:59:55 tax kernel: ata1: command 0xca timeout, stat 0xd0 host_stat 0x61
Dec 4 12:59:55 tax kernel: ata1: translated ATA stat/err 0xd0/00 to SCSI SK/ASC/ASCQ 0xb/47/00
Dec 4 12:59:55 tax kernel: ata1: status=0xd0 { Busy }
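With this many kernel messages it helps to see at a glance which partitions are affected. The following is a minimal sketch, not something from the original troubleshooting session: the function name io_errors_by_device is made up for illustration, and it assumes the log uses the same "Buffer I/O error on device X," wording as the excerpt above.

```shell
#!/bin/sh
# Count "Buffer I/O error" kernel messages per device in a syslog-style file.
# io_errors_by_device is a hypothetical helper name, not a standard tool.
io_errors_by_device() {
    awk '/Buffer I\/O error on device/ {
             # The device name is the word right after "device", e.g. "sdb6,"
             for (i = 1; i <= NF; i++)
                 if ($i == "device") {
                     dev = $(i + 1)
                     sub(/,$/, "", dev)   # drop the trailing comma
                     count[dev]++
                 }
         }
         END { for (d in count) print d, count[d] }' "$1" | sort
}

# Usage (as root, since the log is usually not world-readable):
#   io_errors_by_device /var/log/messages
```

Run against only the excerpt shown above, this would report sdb3 once and sdb6 twice; both partitions sit on the same physical disk, /dev/sdb, which smartd had already flagged.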
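The operating system was reporting serious I/O errors in the background.
Then, when I changed into one of the partitions, I could not even create a file; the error said the file system was read-only.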
[oracle@tax oracle_data2]$ touch aa
touch: cannot touch `aa': Read-only file system
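Yet mount still listed every disk as rw (read-write), even though the kernel log above reports "Remounting filesystem read-only":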
oracle_data1]# mount
/dev/sda5 on / type ext3 (rw)
none on /proc type proc (rw)
none on /sys type sysfs (rw)
none on /dev/pts type devpts (rw,gid=5,mode=620)
usbfs on /proc/bus/usb type usbfs (rw)
/dev/sda1 on /boot type ext3 (rw)
none on /dev/shm type tmpfs (rw)
/dev/sda9 on /opt type ext2 (rw)
/dev/sdb6 on /oracle_data1 type ext3 (rw)
/dev/sdb5 on /oracle_data2 type ext3 (rw)
/dev/sdb3 on /oracle_data3 type ext3 (rw)
/dev/sdb2 on /oracle_data4 type ext3 (rw)
/dev/sdb1 on /oracle_data5 type ext3 (rw)
/dev/sda8 on /oracle_index type ext3 (rw)
/dev/sda7 on /oracle_iot type ext3 (rw)
/dev/sda6 on /oracle_tmp type ext3 (rw)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
You have new mail in /var/spool/mail/root
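A likely explanation for the mismatch, offered as an assumption rather than something verified on this host: on systems of this era, mount prints the contents of /etc/mtab, which records the options used at mount time and is not updated when the kernel itself remounts a file system read-only after an error. The kernel's current view is in /proc/mounts. A small sketch for listing what is really read-only (readonly_mounts is a hypothetical helper name):

```shell
#!/bin/sh
# List devices and mount points whose current options include "ro".
# Reads a file in /proc/mounts format:
#   device mountpoint fstype options dump pass
# readonly_mounts is a hypothetical helper name; it defaults to /proc/mounts.
readonly_mounts() {
    awk '{
             n = split($4, opts, ",")
             for (i = 1; i <= n; i++)
                 if (opts[i] == "ro") { print $1, $2; break }
         }' "${1:-/proc/mounts}"
}

# Usage:
#   readonly_mounts                # what the kernel currently has mounted ro
#   readonly_mounts /etc/mtab      # what mount believes, for comparison
```

If the two calls disagree for /oracle_data1 or /oracle_data2, that would account for touch failing while mount still shows rw.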
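Since the file system was the problem, the fix was to repair the file system. I booted into single-user mode, which is done by adding the single option to the boot entry at startup.
I then ran fsck to repair the file systems, and after the repair the system came up normally. The database is configured to start automatically, but it could only reach the mount state and reported that data files needed recovery. I ran recover database, the recovery completed, and I opened the database directly.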
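The repair steps described above can be sketched roughly as follows. This is a reconstruction under assumptions, not the exact commands that were run: the device and mount point pairs come from the mount listing earlier, the -f and -y flags to e2fsck are my choice, and the helper names plan_fsck and plan_recover are made up. The script only prints the commands so the plan can be reviewed first; nothing is executed against the disk or the database.

```shell
#!/bin/sh
# Dry-run sketch: print the repair plan instead of executing it.
# plan_fsck and plan_recover are hypothetical helper names.

# Commands for checking one ext3 file system from single-user mode.
# The file system must be unmounted while e2fsck runs.
plan_fsck() {
    dev=$1
    mnt=$2
    echo "umount $mnt"
    echo "e2fsck -f -y $dev"    # -f: force a full check, -y: answer yes to fixes
    echo "mount $dev $mnt"
}

# SQL*Plus statements for the database side, to be entered after
# connecting as SYSDBA (for example with: sqlplus "/ as sysdba").
plan_recover() {
    cat <<'EOF'
startup mount
recover database
alter database open;
EOF
}

# The two partitions that show errors in the kernel log excerpt.
plan_fsck /dev/sdb6 /oracle_data1
plan_fsck /dev/sdb3 /oracle_data3
plan_recover
```

Because the alert log shows the database running in NOARCHIVELOG mode, recover database here can only replay the online redo logs; it worked in this case because those logs survived the crash.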
Why do the disks keep having problems lately?