在一次严重故障中
控制文件重建,强行拉起库,前面的问题解决了(HA引发的血案,涉及他人误操作,具体略)
数据库和应用起来后却遇到大量cr块slot notfound的问题,时间正是业务高峰的白天,一阵头皮发麻,具体报错内容:
alert:
ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [kdsgrp1], [], [], [], [], [], [], []
相关trace
tpioadb1$ls -trl *.trc
-rw-r----- 1 oracle oinstall 181401269 Oct 14 12:27 oa_ora_17825870.trc
-rw-r----- 1 oracle oinstall 181462560 Oct 14 12:28 oa_ora_49872920.trc
-rw-r----- 1 oracle oinstall 181776046 Oct 14 12:29 oa_ora_52166770.trc
-rw-r----- 1 oracle oinstall 182010875 Oct 14 12:29 oa_ora_47317200.trc
-rw-r----- 1 oracle oinstall 182080391 Oct 14 12:30 oa_ora_198714.trc
-rw-r----- 1 oracle oinstall 182205907 Oct 14 12:30 oa_ora_47448310.trc
-rw-r----- 1 oracle oinstall 182159439 Oct 14 12:31 oa_ora_48234660.trc
-rw-r----- 1 oracle oinstall 182186772 Oct 14 12:31 oa_ora_41617502.trc
-rw-r----- 1 oracle oinstall 182186742 Oct 14 12:31 oa_ora_48824328.trc
-rw-r----- 1 oracle oinstall 182186771 Oct 14 12:31 oa_ora_52166772.trc
-rw-r----- 1 oracle oinstall 182258339 Oct 14 12:32 oa_ora_39913614.trc
-rw-r----- 1 oracle oinstall 182258369 Oct 14 12:32 oa_ora_30148840.trc
-rw-r----- 1 oracle oinstall 182375997 Oct 14 12:32 oa_ora_17825876.trc
-rw-r----- 1 oracle oinstall 182408523 Oct 14 12:32 oa_ora_66256924.trc
-rw-r----- 1 oracle oinstall 182408580 Oct 14 12:32 oa_ora_20121758.trc
-rw-r----- 1 oracle oinstall 182490385 Oct 14 12:32 oa_ora_6424806.trc
-rw-r----- 1 oracle oinstall 182490387 Oct 14 12:33 oa_ora_15272010.trc
-rw-r----- 1 oracle oinstall 182490356 Oct 14 12:33 oa_ora_4851758.trc
-rw-r----- 1 oracle oinstall 182516313 Oct 14 12:33 oa_ora_16713804.trc
-rw-r----- 1 oracle oinstall 182594432 Oct 14 12:33 oa_ora_28510210.trc
-rw-r----- 1 oracle oinstall 182616536 Oct 14 12:33 oa_ora_30015720.trc
-rw-r----- 1 oracle oinstall 182700158 Oct 14 12:33 oa_ora_25102374.trc
-rw-r----- 1 oracle oinstall 182734746 Oct 14 12:34 oa_ora_28641506.trc
-rw-r----- 1 oracle oinstall 182734715 Oct 14 12:34 oa_ora_3016860.trc
-rw-r----- 1 oracle oinstall 182841440 Oct 14 12:34 oa_ora_52429006.trc
-rw-r----- 1 oracle oinstall 182865907 Oct 14 12:34 oa_ora_47185978.trc
-rw-r----- 1 oracle oinstall 182865909 Oct 14 12:34 oa_ora_57212934.trc
-rw-r----- 1 oracle oinstall 182865907 Oct 14 12:34 oa_ora_23267338.trc
-rw-r----- 1 oracle oinstall 182865938 Oct 14 12:34 oa_ora_14354628.trc
-rw-r----- 1 oracle oinstall 70326197 Oct 14 12:34 oa_ora_7276572.trc
-rw-r----- 1 oracle oinstall 152186229 Oct 14 12:34 oa_ora_52953126.trc
-rw-r----- 1 oracle oinstall 96766948 Oct 14 12:34 oa_ora_47448312.trc
-rw-r----- 1 oracle oinstall 25950203 Oct 14 12:34 oa_ora_44040304.trc
-rw-r----- 1 oracle oinstall 124525 Oct 14 12:34 oa_ora_30015724.trc
-rw-r----- 1 oracle oinstall 11891981 Oct 14 12:34 oa_ora_23332896.trc
-rw-r----- 1 oracle oinstall 67996842 Oct 14 12:34 oa_ora_20580500.trc
-rw-r----- 1 oracle oinstall 50813235 Oct 14 12:34 oa_ora_17301742.trc
-rw-r----- 1 oracle oinstall 171017368 Oct 14 12:34 oa_ora_14811386.trc
tpioadb1$more oa_ora_52429006.trc
Trace file /home/oracle/app/diag/rdbms/oa/oa/trace/oa_ora_52429006.trc
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, Oracle Label Security, OLAP, Data Mining,
Oracle Database Vault and Real Application Testing options
ORACLE_HOME = /home/oracle/app/product/11.2.0/db_1
System name: AIX
Node name: tpioadb1
Release: 1
Version: 6
Machine: 00CB77B54C00
Instance name: oa
Redo thread mounted by this instance: 1
Oracle process number: 169
Unix process pid: 52429006, image: oracle@tpioadb1
*** 2014-10-14 12:33:21.941
*** SESSION ID:(1289.36435) 2014-10-14 12:33:21.941
*** CLIENT ID:() 2014-10-14 12:33:21.941
*** SERVICE NAME:(SYS$USERS) 2014-10-14 12:33:21.941
*** MODULE NAME:(JDBC Thin Client) 2014-10-14 12:33:21.941
*** ACTION NAME:() 2014-10-14 12:33:21.941
* kdsgrp1-1: *************************************************
row 0x024e73b1.97 continuation at
0x024e73b1.97 file# 9 block# 947121 slot 151 not found
KDSTABN_GET: 0 ..... ntab: 1
curSlot: 151 ..... nrows: 216
kdsgrp - dump CR block dba=0x024e73b1
Block header dump: 0x024e73b1
Object id on Block? Y
seg/obj: 0x1ce1e csc: 0x638.28d23ba0 itc: 2 flg: E typ: 1 - DATA
brn: 0 bdba: 0x24e7281 ver: 0x01 opc: 0
inc: 0 exflg: 0
Itl Xid Uba Flag Lck Scn/Fsc
0x01 0x003b.012.0000246a 0x00c048a5.011d.32 --U- 1 fsc 0x0016.28d23ba1
0x02 0x0002.006.00021b71 0x00c03c7f.1478.18 C--- 0 scn 0x0638.28d21c9c
bdba: 0x024e73b1
data_block_dump,data header at 0x70001028890e064
===============
通过语句找到该对象和块:
Select owner, segment_name, segment_type, partition_name,tablespace_name
From dba_extents
Where relative_fno = 9
And 947121 between block_id and (block_id+blocks-1);
dump该块,发现该块里面只有1个table的216行,由于系统已经带病运行,且无可用备份,无停机时间,意味着最多可能丢失这个表的216行数据。
dump内容如下
tpioadb1$more oa_ora_38142130.trc
Trace file /home/oracle/app/diag/rdbms/oa/oa/trace/oa_ora_38142130.trc
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, Oracle Label Security, OLAP, Data Mining,
Oracle Database Vault and Real Application Testing options
ORACLE_HOME = /home/oracle/app/product/11.2.0/db_1
System name: AIX
Node name: tpioadb1
Release: 1
Version: 6
Machine: 00CB77B54C00
Instance name: oa
Redo thread mounted by this instance: 1
Oracle process number: 177
Unix process pid: 38142130, image: oracle@tpioadb1
*** 2014-10-14 14:39:49.105
*** SESSION ID:(167.64629) 2014-10-14 14:39:49.105
*** CLIENT ID:() 2014-10-14 14:39:49.105
*** SERVICE NAME:(SYS$USERS) 2014-10-14 14:39:49.105
*** MODULE NAME:(JDBC Thin Client) 2014-10-14 14:39:49.105
*** ACTION NAME:() 2014-10-14 14:39:49.105
* kdsgrp1-1: *************************************************
row 0x024e73b1.97 continuation at
0x024e73b1.97 file# 9 block# 947121 slot 151 not found
KDSTABN_GET: 0 ..... ntab: 1
curSlot: 151 ..... nrows: 216
kdsgrp - dump CR block dba=0x024e73b1
Block header dump: 0x024e73b1
Object id on Block? Y
seg/obj: 0x1ce1e csc: 0x638.29bb4112 itc: 2 flg: E typ: 1 - DATA
brn: 0 bdba: 0x24e7281 ver: 0x01 opc: 0
inc: 0 exflg: 0
Itl Xid Uba Flag Lck Scn/Fsc
0x01 0x002d.01a.0000a6d7 0x04c08774.04a9.29 --U- 1 fsc 0x0017.29bb4115
0x02 0x002f.007.0000945e 0x00c02f56.0412.0d C--- 0 scn 0x0638.29bb08d6
bdba: 0x024e73b1
data_block_dump,data header at 0x7000105876a6064
===============
tsiz: 0x1f98
hsiz: 0x1c2
pbl: 0x7000105876a6064
76543210
flag=--------
ntab=1
nrow=216
frre=29
fsbo=0x1c2
fseo=0x7d4
avsp=0xbe9
tosp=0xc02
0xe:pti[0] nrow=216 offs=0
后通过业务分析,找到白天有dml的行,通过create as select查询者部分行,发现会话hang,报错依旧,采用逐行insert的方式,最终定位出有问题的只有一行,通过dump的行数据,将关键字段转化出来给业务确认,确认为该行为无效行,剔除该行后,安排在下班后维护窗口内重建该表,问题解决,业务未受到影响。
数据库和应用起来后却遇到大量cr块slot notfound的问题,时间正是业务高峰的白天,一阵头皮发麻,具体报错内容:
alert:
ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [kdsgrp1], [], [], [], [], [], [], []
相关trace
tpioadb1$ls -trl *.trc
-rw-r----- 1 oracle oinstall 181401269 Oct 14 12:27 oa_ora_17825870.trc
-rw-r----- 1 oracle oinstall 181462560 Oct 14 12:28 oa_ora_49872920.trc
-rw-r----- 1 oracle oinstall 181776046 Oct 14 12:29 oa_ora_52166770.trc
-rw-r----- 1 oracle oinstall 182010875 Oct 14 12:29 oa_ora_47317200.trc
-rw-r----- 1 oracle oinstall 182080391 Oct 14 12:30 oa_ora_198714.trc
-rw-r----- 1 oracle oinstall 182205907 Oct 14 12:30 oa_ora_47448310.trc
-rw-r----- 1 oracle oinstall 182159439 Oct 14 12:31 oa_ora_48234660.trc
-rw-r----- 1 oracle oinstall 182186772 Oct 14 12:31 oa_ora_41617502.trc
-rw-r----- 1 oracle oinstall 182186742 Oct 14 12:31 oa_ora_48824328.trc
-rw-r----- 1 oracle oinstall 182186771 Oct 14 12:31 oa_ora_52166772.trc
-rw-r----- 1 oracle oinstall 182258339 Oct 14 12:32 oa_ora_39913614.trc
-rw-r----- 1 oracle oinstall 182258369 Oct 14 12:32 oa_ora_30148840.trc
-rw-r----- 1 oracle oinstall 182375997 Oct 14 12:32 oa_ora_17825876.trc
-rw-r----- 1 oracle oinstall 182408523 Oct 14 12:32 oa_ora_66256924.trc
-rw-r----- 1 oracle oinstall 182408580 Oct 14 12:32 oa_ora_20121758.trc
-rw-r----- 1 oracle oinstall 182490385 Oct 14 12:32 oa_ora_6424806.trc
-rw-r----- 1 oracle oinstall 182490387 Oct 14 12:33 oa_ora_15272010.trc
-rw-r----- 1 oracle oinstall 182490356 Oct 14 12:33 oa_ora_4851758.trc
-rw-r----- 1 oracle oinstall 182516313 Oct 14 12:33 oa_ora_16713804.trc
-rw-r----- 1 oracle oinstall 182594432 Oct 14 12:33 oa_ora_28510210.trc
-rw-r----- 1 oracle oinstall 182616536 Oct 14 12:33 oa_ora_30015720.trc
-rw-r----- 1 oracle oinstall 182700158 Oct 14 12:33 oa_ora_25102374.trc
-rw-r----- 1 oracle oinstall 182734746 Oct 14 12:34 oa_ora_28641506.trc
-rw-r----- 1 oracle oinstall 182734715 Oct 14 12:34 oa_ora_3016860.trc
-rw-r----- 1 oracle oinstall 182841440 Oct 14 12:34 oa_ora_52429006.trc
-rw-r----- 1 oracle oinstall 182865907 Oct 14 12:34 oa_ora_47185978.trc
-rw-r----- 1 oracle oinstall 182865909 Oct 14 12:34 oa_ora_57212934.trc
-rw-r----- 1 oracle oinstall 182865907 Oct 14 12:34 oa_ora_23267338.trc
-rw-r----- 1 oracle oinstall 182865938 Oct 14 12:34 oa_ora_14354628.trc
-rw-r----- 1 oracle oinstall 70326197 Oct 14 12:34 oa_ora_7276572.trc
-rw-r----- 1 oracle oinstall 152186229 Oct 14 12:34 oa_ora_52953126.trc
-rw-r----- 1 oracle oinstall 96766948 Oct 14 12:34 oa_ora_47448312.trc
-rw-r----- 1 oracle oinstall 25950203 Oct 14 12:34 oa_ora_44040304.trc
-rw-r----- 1 oracle oinstall 124525 Oct 14 12:34 oa_ora_30015724.trc
-rw-r----- 1 oracle oinstall 11891981 Oct 14 12:34 oa_ora_23332896.trc
-rw-r----- 1 oracle oinstall 67996842 Oct 14 12:34 oa_ora_20580500.trc
-rw-r----- 1 oracle oinstall 50813235 Oct 14 12:34 oa_ora_17301742.trc
-rw-r----- 1 oracle oinstall 171017368 Oct 14 12:34 oa_ora_14811386.trc
tpioadb1$more oa_ora_52429006.trc
Trace file /home/oracle/app/diag/rdbms/oa/oa/trace/oa_ora_52429006.trc
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, Oracle Label Security, OLAP, Data Mining,
Oracle Database Vault and Real Application Testing options
ORACLE_HOME = /home/oracle/app/product/11.2.0/db_1
System name: AIX
Node name: tpioadb1
Release: 1
Version: 6
Machine: 00CB77B54C00
Instance name: oa
Redo thread mounted by this instance: 1
Oracle process number: 169
Unix process pid: 52429006, image: oracle@tpioadb1
*** 2014-10-14 12:33:21.941
*** SESSION ID:(1289.36435) 2014-10-14 12:33:21.941
*** CLIENT ID:() 2014-10-14 12:33:21.941
*** SERVICE NAME:(SYS$USERS) 2014-10-14 12:33:21.941
*** MODULE NAME:(JDBC Thin Client) 2014-10-14 12:33:21.941
*** ACTION NAME:() 2014-10-14 12:33:21.941
* kdsgrp1-1: *************************************************
row 0x024e73b1.97 continuation at
0x024e73b1.97 file# 9 block# 947121 slot 151 not found
KDSTABN_GET: 0 ..... ntab: 1
curSlot: 151 ..... nrows: 216
kdsgrp - dump CR block dba=0x024e73b1
Block header dump: 0x024e73b1
Object id on Block? Y
seg/obj: 0x1ce1e csc: 0x638.28d23ba0 itc: 2 flg: E typ: 1 - DATA
brn: 0 bdba: 0x24e7281 ver: 0x01 opc: 0
inc: 0 exflg: 0
Itl Xid Uba Flag Lck Scn/Fsc
0x01 0x003b.012.0000246a 0x00c048a5.011d.32 --U- 1 fsc 0x0016.28d23ba1
0x02 0x0002.006.00021b71 0x00c03c7f.1478.18 C--- 0 scn 0x0638.28d21c9c
bdba: 0x024e73b1
data_block_dump,data header at 0x70001028890e064
===============
Select owner, segment_name, segment_type, partition_name,tablespace_name
From dba_extents
Where relative_fno = 9
And 947121 between block_id and (block_id+blocks-1);
dump该块,发现该块里面只有1个table的216行,由于系统已经带病运行,且无可用备份,无停机时间,意味着最多可能丢失这个表的216行数据。
dump内容如下
tpioadb1$more oa_ora_38142130.trc
Trace file /home/oracle/app/diag/rdbms/oa/oa/trace/oa_ora_38142130.trc
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, Oracle Label Security, OLAP, Data Mining,
Oracle Database Vault and Real Application Testing options
ORACLE_HOME = /home/oracle/app/product/11.2.0/db_1
System name: AIX
Node name: tpioadb1
Release: 1
Version: 6
Machine: 00CB77B54C00
Instance name: oa
Redo thread mounted by this instance: 1
Oracle process number: 177
Unix process pid: 38142130, image: oracle@tpioadb1
*** 2014-10-14 14:39:49.105
*** SESSION ID:(167.64629) 2014-10-14 14:39:49.105
*** CLIENT ID:() 2014-10-14 14:39:49.105
*** SERVICE NAME:(SYS$USERS) 2014-10-14 14:39:49.105
*** MODULE NAME:(JDBC Thin Client) 2014-10-14 14:39:49.105
*** ACTION NAME:() 2014-10-14 14:39:49.105
* kdsgrp1-1: *************************************************
row 0x024e73b1.97 continuation at
0x024e73b1.97 file# 9 block# 947121 slot 151 not found
KDSTABN_GET: 0 ..... ntab: 1
curSlot: 151 ..... nrows: 216
kdsgrp - dump CR block dba=0x024e73b1
Block header dump: 0x024e73b1
Object id on Block? Y
seg/obj: 0x1ce1e csc: 0x638.29bb4112 itc: 2 flg: E typ: 1 - DATA
brn: 0 bdba: 0x24e7281 ver: 0x01 opc: 0
inc: 0 exflg: 0
Itl Xid Uba Flag Lck Scn/Fsc
0x01 0x002d.01a.0000a6d7 0x04c08774.04a9.29 --U- 1 fsc 0x0017.29bb4115
0x02 0x002f.007.0000945e 0x00c02f56.0412.0d C--- 0 scn 0x0638.29bb08d6
bdba: 0x024e73b1
data_block_dump,data header at 0x7000105876a6064
===============
tsiz: 0x1f98
hsiz: 0x1c2
pbl: 0x7000105876a6064
76543210
flag=--------
ntab=1
nrow=216
frre=29
fsbo=0x1c2
fseo=0x7d4
avsp=0xbe9
tosp=0xc02
0xe:pti[0] nrow=216 offs=0
后通过业务分析,找到白天有dml的行,通过create as select查询者部分行,发现会话hang,报错依旧,采用逐行insert的方式,最终定位出有问题的只有一行,通过dump的行数据,将关键字段转化出来给业务确认,确认为该行为无效行,剔除该行后,安排在下班后维护窗口内重建该表,问题解决,业务未受到影响。
来自 “ ITPUB博客 ” ,链接:http://blog.itpub.net/16179598/viewspace-1329351/,如需转载,请注明出处,否则将追究法律责任。
转载于:http://blog.itpub.net/16179598/viewspace-1329351/