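Oracle throwing all sorts of errors is common, but one that dies outright with a Segmentation fault is genuinely rare.
The database version is 10.2.0.3 for Linux x86-64.
A database in another part of the company had stopped responding, and I was asked to help check it. After logging in and doing a quick check, I found that the archive log destination was full, so every write operation had to wait for archiving to complete.
Further checking showed that the whole $ORACLE_BASE directory (on the /data filesystem) had no space left: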
[oracle@sqdata backupset]$ env|grep ORACLE
ORACLE_SID=bjsqdb
ORACLE_BASE=/data/oracle
ORACLE_HOME=/data/oracle/product/10.2.0/db_1
[oracle@sqdata ~]$ df -k
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/sda2 24797412 560700 22956736 3% /
/dev/sda1 194442 15937 168466 9% /boot
/dev/sda9 313414864 297188540 48872 100% /data
tmpfs 4064128 0 4064128 0% /dev/shm
/dev/sda5 17856888 176896 16758268 2% /opt
/dev/sda8 9920592 153884 9254640 2% /tmp
/dev/sda7 11904588 4089392 7200712 37% /usr
/dev/sda3 19840924 334428 18482356 2% /var
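A check like the one above is easy to script. The following is a minimal sketch, not part of the original troubleshooting: a hypothetical bash function `fs_over_threshold` that reads `df -kP` output on stdin and prints each mount point whose Use% is at or above a given percentage. The function name and the default threshold of 90 are assumptions.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not from the original post).
# Reads `df -kP` output on stdin and prints "<mount point> <Use%>" for every
# filesystem whose Use% is at or above THRESHOLD (default 90).
fs_over_threshold() {
    local threshold=${1:-90}
    awk -v t="$threshold" '
        NR == 1 { next }                 # skip the header line
        {
            pct = $5
            sub(/%$/, "", pct)           # "100%" -> "100"
            if (pct + 0 >= t + 0)        # force a numeric comparison
                print $6, $5
        }'
}
```

For example, `df -kP | fs_over_threshold 95` would have flagged /data here. The `-P` option is used so that a long device name never wraps onto a second line and shifts the columns.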
I planned to use RMAN to delete some old backups, but unexpectedly hit an error:
[oracle@sqdata backupset]$ rman target /
Recovery Manager: Release 10.2.0.3.0 - Production on Thu Dec 9 09:25:50 2010
Copyright (c) 1982, 2005, Oracle. All rights reserved.
connected to target database: BJSQDB (DBID=657759334)
RMAN> delete obsolete;
using target database control file instead of recovery catalog
Segmentation fault
The cause was hard to pin down. First, because the disk was full, no core file was produced. Second, the error could not be reproduced reliably:
[oracle@sqdata backupset]$ df -k
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/sda2 24797412 560700 22956736 3% /
/dev/sda1 194442 15937 168466 9% /boot
/dev/sda9 313414864 297238912 0 100% /data
tmpfs 4064128 0 4064128 0% /dev/shm
/dev/sda5 17856888 176896 16758268 2% /opt
/dev/sda8 9920592 153884 9254640 2% /tmp
/dev/sda7 11904588 4089392 7200712 37% /usr
/dev/sda3 19840924 334468 18482316 2% /var
[oracle@sqdata backupset]$ rman target /
Recovery Manager: Release 10.2.0.3.0 - Production on Thu Dec 9 10:00:43 2010
Copyright (c) 1982, 2005, Oracle. All rights reserved.
connected to target database: BJSQDB (DBID=657759334)
RMAN> delete obsolete;
using target database control file instead of recovery catalog
RMAN retention policy will be applied to the command
RMAN retention policy is set to recovery window of 21 days
allocated channel: ORA_DISK_1
channel ORA_DISK_1: sid=424 devtype=DISK
no obsolete backups found
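Nor was the crash simply caused by the full disk: when the error occurred, /data still had 48872 KB available, whereas now it has 0 KB available, and yet this time RMAN completed successfully.
I searched Metalink and found several Segmentation fault bugs in 9.2, but all of them were fixed in 10.1, and I found no similar reports for 10.2.
Fortunately the problem appears only occasionally and has little effect on day-to-day use of the system.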
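Since the first crash left no core file, it may help next time to know where Linux would have written one. As a sketch, assuming the standard kernel behavior described in core(5): the dump goes wherever /proc/sys/kernel/core_pattern points, and a plain pattern such as `core` means the crashing process's current working directory. The shell prompt shows the session was in a `backupset` directory, presumably on the full /data filesystem. The function `core_target_dir` below is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: where would a core dump land?
# Arguments: the kernel core_pattern string and the crashing process's cwd.
# Rules assumed from core(5):
#   "|program ..."   -> dump is piped to a helper program, not written here
#   "/abs/path/name" -> written into that absolute directory
#   "sub/dir/name"   -> relative to the process's current working directory
#   "core" (no "/")  -> written straight into the current working directory
core_target_dir() {
    local pattern=$1 cwd=$2
    case $pattern in
        '|'*) echo "pipe" ;;
        /*)   dirname -- "$pattern" ;;
        */*)  dirname -- "$cwd/$pattern" ;;
        *)    echo "$cwd" ;;
    esac
}
```

For instance, `core_target_dir "$(cat /proc/sys/kernel/core_pattern)" "$PWD"`. If it prints a directory on /data, running `ulimit -c unlimited` (if the hard limit allows) and then `cd /tmp`, which still had about 9 GB free in the df output, before `rman target /` would give a later crash somewhere to write its core file.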
I then checked the log file from the last automatic backup:
Recovery Manager: Release 10.2.0.3.0 - Production on Thu Dec 9 00:30:03 2010
Copyright (c) 1982, 2005, Oracle. All rights reserved.
connected to target database: BJSQDB (DBID=657759334)
RMAN> 2> 3> 4> 5> 6> 7>
using target database control file instead of recovery catalog
allocated channel: d1
channel d1: sid=481 devtype=DISK
sql statement: alter system archive log current
released channel: d1
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03009: failure of sql command on default channel at 12/09/2010 00:30:08
RMAN-11003: failure during parse/execution of SQL statement: alter system archive log current
ORA-16014: log 3 sequence# 1674 not archived, no available destinations
ORA-00312: online log 3 thread 1: '/data/oracle/oradata/bjsqdb/REDOC01.LOG'
ORA-00312: online log 3 thread 1: '/data/oracle/oradata/bjsqdb/REDOC02.LOG'
ORA-00312: online log 3 thread 1: '/data/oracle/oradata/bjsqdb/REDOC03.LOG'
RMAN>
Recovery Manager complete.
---rman_archivelog and controlfile end---
---rman delete obsolete backupset---
Recovery Manager: Release 10.2.0.3.0 - Production on Thu Dec 9 00:30:08 2010
Copyright (c) 1982, 2005, Oracle. All rights reserved.
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-00554: initialization of internal recovery manager package failed
RMAN-04005: error from target database:
ORA-09945: Unable to initialize the audit trail file
Linux-x86_64 Error: 28: No space left on device
---rman delete obsolete backupset end---
---ftp file to 172.0.2.85---
---ftp end---
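Reading a nightly log like this by eye is how the two real failures (ORA-16014 and ORA-09945) were found. A small filter can pull out just the error codes. The sketch below is hypothetical, not part of the author's backup script: `log_error_codes` prints each distinct ORA- or RMAN- code, plus Linux OS error lines, in first-seen order, skipping the RMAN-00571/RMAN-00569 banner lines that only frame the error stack.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the distinct error codes found in RMAN log text.
# Reads the files given as arguments, or stdin if none are given.
# - matches ORA-nnnnn and RMAN-nnnnn codes, plus OS lines such as
#   "Linux-x86_64 Error: 28"
# - drops RMAN-00571/RMAN-00569, which only frame the error stack
# - keeps the first occurrence of each code, in order of appearance
log_error_codes() {
    grep -ohE '(ORA|RMAN)-[0-9]{5}|Linux-x86_64 Error: [0-9]+' "$@" |
        awk '$0 != "RMAN-00571" && $0 != "RMAN-00569" && !seen[$0]++'
}
```

Run against the log above, it would print RMAN-03009, RMAN-11003, ORA-16014, ORA-00312, RMAN-00554, RMAN-04005, ORA-09945 and `Linux-x86_64 Error: 28`, one per line.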
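Because space had run out, the log switch failed (ORA-16014: the online log could not be archived because no archive destination was available). The next step, deleting obsolete backups, then failed as soon as it connected to the database: with no space left, the audit trail file could not be written (ORA-09945, Linux error 28), so the RMAN connection itself failed.
For now, my only hypothesis is that this last RMAN connection ended in an abnormal state, which then led to the Segmentation fault.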
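The chain of failures in the log suggests one cheap safeguard: have the backup script refuse to run its RMAN steps when the filesystem holding the archive logs and audit files is nearly full, and report that instead. A minimal sketch, assuming a bash wrapper script; `has_free_kb` and the 1 GiB threshold in the usage example are hypothetical choices, not taken from the original script.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check for a nightly backup script.
# Usage: has_free_kb DIR MIN_KB
# Returns 0 if the filesystem holding DIR has at least MIN_KB kilobytes
# available, 1 if it has less, 2 if df gave no usable answer.
# `df -kP` prints one line per filesystem; field 4 is "Available" in KB.
has_free_kb() {
    local dir=$1 min_kb=$2 avail
    avail=$(df -kP -- "$dir" 2>/dev/null | awk 'NR == 2 { print $4 }')
    [ -n "$avail" ] || return 2
    [ "$avail" -ge "$min_kb" ]
}
```

For example, at the top of the delete-obsolete step: `has_free_kb /data 1048576 || { echo "less than 1 GiB free on /data, skipping RMAN" >&2; exit 1; }`. Failing early with a clear message is easier to diagnose than an RMAN session that dies partway through its connection.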
Source: ITPUB blog, http://blog.itpub.net/4227/viewspace-681468/. If you republish this article, please credit the source; otherwise legal liability will be pursued.