信息采集工具:
1.dbv 2.rman backup check validate 3.analyze table 4.exdata cell
trc, trm这两天疯狂产生,占满了磁盘
alert 内容如下:( kdsgrp1 )
trc文件如下:
Trace file /u2/oracle/diag/rdbms/topprod/topprod/trace/topprod_j000_8716.trc
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
ORACLE_HOME = /u2/oracle/product/11.2.0/dbhome_1
System name: Linux
Node name: ape-ap
Release: 2.6.18-164.el5
Version: #1 SMP Tue Aug 18 15:51:48 EDT 2009
Machine: x86_64
Instance name: topprod
Redo thread mounted by this instance: 1
Oracle process number: 18
Unix process pid: 8716, image: oracle@ape-ap (J000)
*** 2017-04-13 17:36:17.193
*** SESSION ID:(289.54143) 2017-04-13 17:36:17.193
*** CLIENT ID:() 2017-04-13 17:36:17.193
*** SERVICE NAME:(SYS$USERS) 2017-04-13 17:36:17.193
*** MODULE NAME:() 2017-04-13 17:36:17.193
*** ACTION NAME:() 2017-04-13 17:36:17.193
* kdsgrp1-1: *************************************************
row 0x0080298c.1b continuation at
0x0080298c.1b file# 2 block# 10636 slot 27 not found
KDSTABN_GET: 0 ..... ntab: 1
curSlot: 27 ..... nrows: 30
kdsgrp - dump CR block dba=0x0080298c
Block header dump: 0x0080298c
Object id on Block? Y
seg/obj: 0x3812 csc: 0x00.16aedd4f itc: 2 flg: E typ: 1 - DATA
brn: 0 bdba: 0x802988 ver: 0x01 opc: 0
inc: 0 exflg: 0
Itl Xid Uba Flag Lck Scn/Fsc
0x01 0x0021.01e.000090e3 0x04021b55.0987.24 --U- 1 fsc 0x006a.1b927f4a
0x02 0x0024.007.00005aa8 0x04004675.0b12.0e --U- 1 fsc 0x006b.1b927f36
bdba: 0x0080298c
data_block_dump,data header at 0x192b94064
===============
tsiz: 0x1f98
hsiz: 0x4e
pbl: 0x192b94064
76543210
flag=--------
ntab=1
nrow=30
frre=9
fsbo=0x4e
fseo=0xedb
avsp=0x1306
tosp=0x13e5
查找file#=2 ,block#= 10636 的对象:
SELECT OWNER,
SEGMENT_NAME,
SEGMENT_TYPE,
TABLESPACE_NAME
FROM DBA_EXTENTS
WHERE FILE_ID =2
AND 10636 BETWEEN BLOCK_ID AND BLOCK_ID + BLOCKS - 1;
确认异常:
alert:
看来是表有坏块
dbv file='/u2/oracle/oradata/topprod/sysaux01.dbf' blocksize=8192
Pages Failing 一个数据块和一个索引块,估计是硬盘物理坏块
Pages Marked Corrupt 有7给块,逻辑坏块,应该可以通过recover命令修复。
还可以 通过rman再检查一下:
RMAN> backup validate datafile 2 ;
以下视图查询逻辑坏块:
select * from v$database_block_corruption where file#=2;
select count(*) from sysman.mgmt_task_qtable
返回27笔数据,可以查询。
block corrupted的表还可以访问,不过 查询时将跳过坏块(block#10636),也就是此时查出的数据是缺失一部分的。
可以通过以下方式取消skip corrupt:
若数据库有有效的RMAN备份 ,用RMAN命令恢复
RMAN> blockrecover datafile 2 block 10636;
如果通过,重复 analyze table xxx validate structure cascade online;命令看看是否有其它的corrupted block。
曾经做过以下尝试是失败的,试图重建表来避过这个错误:
select count(*) from sysman.mgmt_task_qtable
27笔数据
create table sysman.mgmt_task_qtable_bak
as select * from sysman.mgmt_task_qtable;
alter session set current_schema=sysman;
alter table mgmt_task_qtable rename to mgmt_task_qtable_old;
alter table mgmt_task_qtable_bak rename to mgmt_task_qtable;
truncate table mgmt_task_qtable;
insert into mgmt_task_qtable select * from mgmt_task_qtable_bak
也报同样的错误,看来 核心表不给动
根据MOS的文档,发生ORA-600[kdsgrp1]错误主要有以下几种原因:
1. Bug 6772911 - OERI[12700] OERI[qertbFetchByRowID] OERI[kdsgrp1] due to bad CR rollback of INDEX block (Doc ID 6772911.8)
2. Bug 9457185 - Intermittent ORA-600 [kdsgrp1] during CR read (Doc ID 9457185.8)
3. Bug 8771916 - OERI [kdsgrp1] during CR read (Doc ID 8771916.8)
4. Getting Ora-600 [kdsgrp1] On A Select (Doc ID 429746.1)
附: ORA-01578:ORACLE data block corrupted错误的处理过程
使用 recoverblock 进行恢复
选择一个块进行recoverblock
对表进行alalyze 时 仍提示ora-01578错误,而且损坏的块号仍是 23563007
rman 下使用 blockrecover corruption list 进行全部恢复,不到一秒结束,只提示使用介质恢复,仍然无法修复。
想到了另一种方法就是先设置事件SQL> alter system set events='10231 trace name context forever,level 10';
使用数据泵 跳过坏块将 相关的表导出,之后删除表再导回去,这样的问题就是可能会丢失一些数据,由于现在还只是演练环境,所以丢失也没办法,将此方法发邮件回复应用人员,准备第二天再看,但那边催得紧,回复是否可以从生产。我们回复可以,便决定从生产导出表再导入。
参考: Oracle数据块损坏的恢复实例
1.dbv 2.rman backup check validate 3.analyze table 4.exdata cell
trc, trm这两天疯狂产生,占满了磁盘
alert 内容如下:( kdsgrp1 )
trc文件如下:
Trace file /u2/oracle/diag/rdbms/topprod/topprod/trace/topprod_j000_8716.trc
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
ORACLE_HOME = /u2/oracle/product/11.2.0/dbhome_1
System name: Linux
Node name: ape-ap
Release: 2.6.18-164.el5
Version: #1 SMP Tue Aug 18 15:51:48 EDT 2009
Machine: x86_64
Instance name: topprod
Redo thread mounted by this instance: 1
Oracle process number: 18
Unix process pid: 8716, image: oracle@ape-ap (J000)
*** 2017-04-13 17:36:17.193
*** SESSION ID:(289.54143) 2017-04-13 17:36:17.193
*** CLIENT ID:() 2017-04-13 17:36:17.193
*** SERVICE NAME:(SYS$USERS) 2017-04-13 17:36:17.193
*** MODULE NAME:() 2017-04-13 17:36:17.193
*** ACTION NAME:() 2017-04-13 17:36:17.193
* kdsgrp1-1: *************************************************
row 0x0080298c.1b continuation at
0x0080298c.1b file# 2 block# 10636 slot 27 not found
KDSTABN_GET: 0 ..... ntab: 1
curSlot: 27 ..... nrows: 30
kdsgrp - dump CR block dba=0x0080298c
Block header dump: 0x0080298c
Object id on Block? Y
seg/obj: 0x3812 csc: 0x00.16aedd4f itc: 2 flg: E typ: 1 - DATA
brn: 0 bdba: 0x802988 ver: 0x01 opc: 0
inc: 0 exflg: 0
Itl Xid Uba Flag Lck Scn/Fsc
0x01 0x0021.01e.000090e3 0x04021b55.0987.24 --U- 1 fsc 0x006a.1b927f4a
0x02 0x0024.007.00005aa8 0x04004675.0b12.0e --U- 1 fsc 0x006b.1b927f36
bdba: 0x0080298c
data_block_dump,data header at 0x192b94064
===============
tsiz: 0x1f98
hsiz: 0x4e
pbl: 0x192b94064
76543210
flag=--------
ntab=1
nrow=30
frre=9
fsbo=0x4e
fseo=0xedb
avsp=0x1306
tosp=0x13e5
查找file#=2 ,block#= 10636 的对象:
SELECT OWNER,
SEGMENT_NAME,
SEGMENT_TYPE,
TABLESPACE_NAME
FROM DBA_EXTENTS
WHERE FILE_ID =2
AND 10636 BETWEEN BLOCK_ID AND BLOCK_ID + BLOCKS - 1;
确认异常:
analyze table sysman.mgmt_task_qtable validate structure cascade online;
alert:
看来是表有坏块
dbv file='/u2/oracle/oradata/topprod/sysaux01.dbf' blocksize=8192
Pages Failing 一个数据块和一个索引块,估计是硬盘物理坏块
Pages Marked Corrupt 有7给块,逻辑坏块,应该可以通过recover命令修复。
还可以 通过rman再检查一下:
RMAN> backup validate datafile 2 ;
以下视图查询逻辑坏块:
select * from v$database_block_corruption where file#=2;
select count(*) from sysman.mgmt_task_qtable
返回27笔数据,可以查询。
block corrupted的表还可以访问,不过 查询时将跳过坏块(block#10636),也就是此时查出的数据是缺失一部分的。
可以通过以下方式取消skip corrupt:
SQL> exec dbms_repair.skip_corrupt_blocks('SYS','TT1',flags=>dbms_repair.noskip_flag);
PL/SQL procedure successfully completed.
SQL> select table_name,skip_corrupt from dba_tables where table_name='TT1';
TABLE_NAME SKIP_COR
SQL>analyze table sysman.mgmt_task_qtable validate structure cascade online;ERROR at line 1:
ORA-01578: ORACLE data block corrupted (file # 2, block # 10636)
ORA-01110: data file 2: '/u01/xxx/xxx.DBF'
若数据库有有效的RMAN备份 ,用RMAN命令恢复
RMAN> blockrecover datafile 2 block 10636;
如果通过,重复 analyze table xxx validate structure cascade online;命令看看是否有其它的corrupted block。
曾经做过以下尝试是失败的,试图重建表来避过这个错误:
select count(*) from sysman.mgmt_task_qtable
27笔数据
create table sysman.mgmt_task_qtable_bak
as select * from sysman.mgmt_task_qtable;
alter session set current_schema=sysman;
alter table mgmt_task_qtable rename to mgmt_task_qtable_old;
alter table mgmt_task_qtable_bak rename to mgmt_task_qtable;
truncate table mgmt_task_qtable;
insert into mgmt_task_qtable select * from mgmt_task_qtable_bak
也报同样的错误,看来 核心表不给动
根据MOS的文档,发生ORA-600[kdsgrp1]错误主要有以下几种原因:
1. Bug 6772911 - OERI[12700] OERI[qertbFetchByRowID] OERI[kdsgrp1] due to bad CR rollback of INDEX block (Doc ID 6772911.8)
2. Bug 9457185 - Intermittent ORA-600 [kdsgrp1] during CR read (Doc ID 9457185.8)
3. Bug 8771916 - OERI [kdsgrp1] during CR read (Doc ID 8771916.8)
4. Getting Ora-600 [kdsgrp1] On A Select (Doc ID 429746.1)
附: ORA-01578:ORACLE data block corrupted错误的处理过程
使用 recoverblock 进行恢复
- set linesize 200
- select * from v$database_block_corruption;
-
- rman target /
-
- run{
- allocate channel d1 type disk;
- backup check logical validate datafile 29;
- release channel d1;
- };
-
- select * from v$database_block_corruption;
- rman target /
- RMAN> blockrecover datafile 29 block 23563007;
- set linesize 100;
- set pagesize 9999;
- col owner for a10;
- col segment_name for a30;
- col segment_type for a15;
- col tablespace_name for a20;
- alter session force parallel query parallel 32;
- select owner,segment_name,segemnt_type,tablespace_name from dba_extents where file_id=29 and 23563007 between block_id and block_id+blocks-1;
-
- analyze table IML.XXXXXX validate structure;
rman 下使用 blockrecover corruption list 进行全部恢复,不到一秒结束,只提示使用介质恢复,仍然无法修复。
想到了另一种方法就是先设置事件SQL> alter system set events='10231 trace name context forever,level 10';
使用数据泵 跳过坏块将 相关的表导出,之后删除表再导回去,这样的问题就是可能会丢失一些数据,由于现在还只是演练环境,所以丢失也没办法,将此方法发邮件回复应用人员,准备第二天再看,但那边催得紧,回复是否可以从生产。我们回复可以,便决定从生产导出表再导入。
- select owner,segment_name,segemnt_type,tablespace_name from dba_extents where file_id=29 and 23563007 between block_id and block_id+blocks-1;
参考: Oracle数据块损坏的恢复实例
来自 “ ITPUB博客 ” ,链接:http://blog.itpub.net/807718/viewspace-2137235/,如需转载,请注明出处,否则将追究法律责任。
转载于:http://blog.itpub.net/807718/viewspace-2137235/