环境:主库Oracle10g + ASM,备库为基于文件系统的Data Guard灾备环境
问题的发现来源于Oracle的bdump目录下不断增长的包含m000的trace文件,生产规律大概为10分钟生成或更新一次。
通过该问题处理得到的教训是,在具备data guard的物理灾备环境中,删除表空间一定不要忘记including contents and datafiles附加语句。[@more@]
trace文件具体内容如下:
/oracle/admin/dbserv/bdump/dbserv1_m000_23519.trc
Oracle Database 10g Enterprise Edition Release 10.2.0.5.0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP, Data Mining
and Real Application Testing options
ORACLE_HOME = /oracle/product/10.2/database
System name: SunOS
Node name: dbserv-db
Release: 5.10
Version: Generic_144488-04
Machine: sun4u
Instance name: dbserv1
Redo thread mounted by this instance: 1
Oracle process number: 82
Unix process pid: 23519, image: (m000)
*** ACTION NAME:(Monitor Tablespace Thresholds) 2012-11-08 22:13:46.856
*** MODULE NAME:(MMON_SLAVE) 2012-11-08 22:13:46.856
*** SERVICE NAME:(SYS$BACKGROUND) 2012-11-08 22:13:46.856
*** SESSION ID:(891.26043) 2012-11-08 22:13:46.856
MMON encountered error 959, clearing the error
通过如下SQL查询发现,两个表的表空间不一致。即在查询结果中Name列的值多于tablespace_name列的值:
SQL> select ts.name,dt.tablespace_name from dba_tablespaces dt,sys.ts$ ts
2 where dt.tablespace_name(+)=ts.name;
根据内容及网络找不出什么具体处理方法,关注点转移了trae文件最早的生产时间,查看当时的日志。
-------------------------
Redo Shipping Client Connected as PUBLIC
-- Connected User is Valid
Tue Sep 18 13:15:31 CST 2012
drop tablespace gejie
Tue Sep 18 13:15:35 CST 2012
Deleted Oracle managed file +DATADG/dbserv/datafile/gejie.379.786564879
Completed: drop tablespace gejie
Tue Sep 18 13:15:49 CST 2012
drop tablespace gejie
Tue Sep 18 13:15:49 CST 2012
ORA-959 signalled during: drop tablespace gejie
...
Tue Sep 18 15:54:14 CST 2012
Thread 1 advanced to log sequence 24592 (LGWR switch)
进一步查找发现,在DG的灾备库上还是存在着主库上已被删除的数据文件:+DATADG/dbserv/datafile/abcd.379.786564879
怀疑本故障的来源为但是删除表空间时遗漏了删除表空间的附加语句including contents and datafiles。
在主库重新进行了drop tablespace abcd including contents and datafiles;
该语句执行失败,因为在主库abcd的表空间已经不存在了。因此在主库中重新创建的名称为abcd的表空间,数据文件的名称为备库上未被删除的原名,即:
create tablespace abcd datafile '+DATADG/dbserv/datafile/abcd.379.786564879'
现在主库和备库都有了相同名称的表空间及其数据文件。再次采用附加语句including contents and datafiles的方式在主库上进行表空间的删除。即:
drop tablespace abcd including contents and datafiles;
观察后发现,没十分钟生成一次trace文件的问题已经得以解决,但是如下语句查询结果上的差异还是存在。
SQL> select ts.name,dt.tablespace_name from dba_tablespaces dt,sys.ts$ ts
2 where dt.tablespace_name(+)=ts.name;
针对该语句查询结果不一致,具体什么方法来进行修正呢?这个答案还在寻找中...