解决sysman.mgmt_task_qtable ORA-600 kdsgrp1错误

信息采集工具:
1.dbv 2.rman backup check validate 3.analyze table 4.exdata cell

trc, trm这两天疯狂产生,占满了磁盘


alert 内容如下:( kdsgrp1


trc文件如下:
Trace file /u2/oracle/diag/rdbms/topprod/topprod/trace/topprod_j000_8716.trc
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
ORACLE_HOME = /u2/oracle/product/11.2.0/dbhome_1
System name:    Linux
Node name:    ape-ap
Release:    2.6.18-164.el5
Version:    #1 SMP Tue Aug 18 15:51:48 EDT 2009
Machine:    x86_64
Instance name: topprod
Redo thread mounted by this instance: 1
Oracle process number: 18
Unix process pid: 8716, image: oracle@ape-ap (J000)


*** 2017-04-13 17:36:17.193
*** SESSION ID:(289.54143) 2017-04-13 17:36:17.193
*** CLIENT ID:() 2017-04-13 17:36:17.193
*** SERVICE NAME:(SYS$USERS) 2017-04-13 17:36:17.193
*** MODULE NAME:() 2017-04-13 17:36:17.193
*** ACTION NAME:() 2017-04-13 17:36:17.193
 
* kdsgrp1-1: *************************************************
            row 0x0080298c.1b continuation at
            0x0080298c.1b file# 2 block# 10636 slot 27 not found
KDSTABN_GET: 0 ..... ntab: 1
curSlot: 27 ..... nrows: 30
kdsgrp - dump CR block dba=0x0080298c
Block header dump:  0x0080298c
 Object id on Block? Y
 seg/obj: 0x3812  csc: 0x00.16aedd4f  itc: 2  flg: E  typ: 1 - DATA
     brn: 0  bdba: 0x802988 ver: 0x01 opc: 0
     inc: 0  exflg: 0
 
 Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0x0021.01e.000090e3  0x04021b55.0987.24  --U-    1  fsc 0x006a.1b927f4a
0x02   0x0024.007.00005aa8  0x04004675.0b12.0e  --U-    1  fsc 0x006b.1b927f36
bdba: 0x0080298c
data_block_dump,data header at 0x192b94064
===============
tsiz: 0x1f98
hsiz: 0x4e
pbl: 0x192b94064
     76543210
flag=--------
ntab=1
nrow=30
frre=9
fsbo=0x4e
fseo=0xedb
avsp=0x1306
tosp=0x13e5

查找file#=2 ,block#= 10636 的对象:
SELECT OWNER,
       SEGMENT_NAME,
       SEGMENT_TYPE,
       TABLESPACE_NAME
FROM   DBA_EXTENTS
WHERE  FILE_ID =2
       AND 10636 BETWEEN BLOCK_ID AND BLOCK_ID + BLOCKS - 1;



确认异常:
analyze table sysman.mgmt_task_qtable validate structure cascade online;


alert:


看来是表有坏块
dbv file='/u2/oracle/oradata/topprod/sysaux01.dbf' blocksize=8192




Pages Failing 一个数据块和一个索引块,估计是硬盘物理坏块
Pages Marked Corrupt 有7给块,逻辑坏块,应该可以通过recover命令修复。

还可以 通过rman再检查一下:
RMAN> backup validate datafile 2 ;

以下视图查询逻辑坏块:
select * from v$database_block_corruption where file#=2;

select count(*) from sysman.mgmt_task_qtable
返回27笔数据,可以查询。

block corrupted的表还可以访问,不过 查询时将跳过坏块(block#10636),也就是此时查出的数据是缺失一部分的。
可以通过以下方式取消skip corrupt:

SQL> exec dbms_repair.skip_corrupt_blocks('SYS','TT1',flags=>dbms_repair.noskip_flag);

PL/SQL procedure successfully completed.

SQL> select table_name,skip_corrupt from dba_tables where table_name='TT1';

TABLE_NAME SKIP_COR

SQL>analyze table sysman.mgmt_task_qtable validate structure cascade online;

ERROR at line 1:

ORA-01578: ORACLE data block corrupted (file # 2, block # 10636)

ORA-01110: data file 2: '/u01/xxx/xxx.DBF'


若数据库有有效的RMAN备份 ,用RMAN命令恢复
RMAN> blockrecover datafile 2 block 10636;
如果通过,重复 analyze table xxx validate structure cascade online;命令看看是否有其它的corrupted block。


曾经做过以下尝试是失败的,试图重建表来避过这个错误:
select count(*) from sysman.mgmt_task_qtable
27笔数据

create table sysman.mgmt_task_qtable_bak
as select * from sysman.mgmt_task_qtable;

alter session set current_schema=sysman;
alter table mgmt_task_qtable rename to mgmt_task_qtable_old;
alter table mgmt_task_qtable_bak rename to mgmt_task_qtable;

truncate table mgmt_task_qtable;
insert into mgmt_task_qtable select * from mgmt_task_qtable_bak
也报同样的错误,看来 核心表不给动




根据MOS的文档,发生ORA-600[kdsgrp1]错误主要有以下几种原因:
1. Bug 6772911 - OERI[12700] OERI[qertbFetchByRowID] OERI[kdsgrp1] due to bad CR rollback of INDEX block (Doc ID 6772911.8)
2. Bug 9457185 - Intermittent ORA-600 [kdsgrp1] during CR read (Doc ID 9457185.8)
3. Bug 8771916 - OERI [kdsgrp1] during CR read (Doc ID 8771916.8)
4. Getting Ora-600 [kdsgrp1] On A Select (Doc ID 429746.1)




附: ORA-01578:ORACLE data block corrupted错误的处理过程
使用 recoverblock 进行恢复

  1. set linesize 200
  2. select * from v$database_block_corruption;

  3. rman target /

  4. run{
  5.     allocate channel d1 type disk;
  6.     backup check logical validate datafile 29;
  7.     release channel d1;
  8. };

  9. select * from v$database_block_corruption;
选择一个块进行recoverblock
  1. rman target /
  2. RMAN> blockrecover datafile 29 block 23563007;
  1. set linesize 100;
  2. set pagesize 9999;
  3. col owner for a10;
  4. col segment_name for a30;
  5. col segment_type for a15;
  6. col tablespace_name for a20;
  7. alter session force parallel query parallel 32;
  8. select owner,segment_name,segemnt_type,tablespace_name from dba_extents where file_id=29 and 23563007 between block_id and block_id+blocks-1;

  9. analyze table IML.XXXXXX validate structure;
对表进行alalyze 时 仍提示ora-01578错误,而且损坏的块号仍是  23563007
rman 下使用 blockrecover corruption list 进行全部恢复,不到一秒结束,只提示使用介质恢复,仍然无法修复。

想到了另一种方法就是先设置事件SQL> alter system set events='10231 trace name context forever,level 10';
使用数据泵 跳过坏块将 相关的表导出,之后删除表再导回去,这样的问题就是可能会丢失一些数据,由于现在还只是演练环境,所以丢失也没办法,将此方法发邮件回复应用人员,准备第二天再看,但那边催得紧,回复是否可以从生产。我们回复可以,便决定从生产导出表再导入。

  1. select owner,segment_name,segemnt_type,tablespace_name from dba_extents where file_id=29 and 23563007 between block_id and block_id+blocks-1;


参考: Oracle数据块损坏的恢复实例



来自 “ ITPUB博客 ” ,链接:http://blog.itpub.net/807718/viewspace-2137235/,如需转载,请注明出处,否则将追究法律责任。

转载于:http://blog.itpub.net/807718/viewspace-2137235/

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值