概述
最近发现Oracle数据库其中一个节点告警日志频繁提示:Transaction recovery: lock conflict caught and ignored,还是有点棘手的,下面记录下问题排查的过程:
![7ab251a4f7934b21d50c01ef60731bfa.png](https://img-blog.csdnimg.cn/img_convert/7ab251a4f7934b21d50c01ef60731bfa.png)
环境:Oracle11.2.0.1 RAC 2节点 AIX 6.1
处理思路
1.找出该对象
2.查看对象的具体状态。
1、通过该对象所使用的undo段
select b.name useg, b.inst# instid, b.status$ status, a.ktuxeusn xid_usn, a.ktuxeslt xid_slot, a.ktuxesqn xid_seq, a.ktuxesiz undoblocks, a.ktuxesta txstatus from x$ktuxe a, undo$ b where a.ktuxecfl like '%DEAD%'and a.ktuxeusn = b.us#;
![3972a24e02d949f82c2f718a4a9cc180.png](https://img-blog.csdnimg.cn/img_convert/3972a24e02d949f82c2f718a4a9cc180.png)
2、通过dump undo header 寻找事务使用的undo segment
ALTER SYSTEM DUMP UNDO HEADER '_SYSSMU8_1131636851$';--查看DUMP文件位置:select * from v$diag_info WHERE name='Default Trace File';
![16fe4486b91e76c21e19b77a9bf8e935.png](https://img-blog.csdnimg.cn/img_convert/16fe4486b91e76c21e19b77a9bf8e935.png)
3、分析dump文件
导出dump文件进行分析:看state为10就知道哪个在活动了,9是正常的
![0868cd79c0a5a358b47bbf03713d1a78.png](https://img-blog.csdnimg.cn/img_convert/0868cd79c0a5a358b47bbf03713d1a78.png)
说明:
- UBA: Undo block address
- RBA: Redo block address
- Dba: Data block address
- Rdba: Root dba
- Xid: Transaction ID
- ITL: Interested Transaction List 保存在数据块的头部(事务信息部分),包含XID,UBA,LCK,FLG等重要信息
![f1aa4eadc4ca37909652a666bdaea6f9.png](https://img-blog.csdnimg.cn/img_convert/f1aa4eadc4ca37909652a666bdaea6f9.png)
usn: Undo segment number. usn表示XID_USN
usn.index.wrap# gives transaction id. wrap#表示XID_SEQ
--查看undo块地址对应十进制select to_number('0c872e87','xxxxxxxxxxxxx') from dual;-->210185863
![cd4a6c8993b92b4f299962e5471039c7.png](https://img-blog.csdnimg.cn/img_convert/cd4a6c8993b92b4f299962e5471039c7.png)
4、查看dump对应的undo块
--fileID:select DBMS_UTILITY.DATA_BLOCK_ADDRESS_FILE(210185863) from dual;--blockID:select DBMS_UTILITY.DATA_BLOCK_ADDRESS_BLOCK(210185863) from dual;--dump undo 块SQL> alter system dump datafile 50 block 470663;SQL> select * from v$diag_info WHERE name='Default Trace File';
![6d495101b13840f0be4ac9c43aa5fc9e.png](https://img-blog.csdnimg.cn/img_convert/6d495101b13840f0be4ac9c43aa5fc9e.png)
5、分析dump的undo块
把trace文件下载下来进行分析:
![9a7089ad1376ae92c5bf83b3c0d300e1.png](https://img-blog.csdnimg.cn/img_convert/9a7089ad1376ae92c5bf83b3c0d300e1.png)
- irb points to last UNDO RECORD in UNDO block. irb指向undo块里面最后一个undo记录。
- rci points to previous UNDO RECORD. if rci=0, it's the first UNDO RECORD.rci指向前一个undo记录,如果值为0则为undo的第一个记录。
- Recovery operation starts from irb and chain is followed by rci until rci is zero.恢复操作使用irb开始,并往前移动,直至rci为0。
--The transaction of 008.1d.229f824 starts recovery from UNDO RECORD of 0x37
6、分析undo record
*-----------------------------* Rec #0x1 slt: 0x1d objn: 178516(0x0002b954) objd: 178516 tblspc: 6(0x00000006)* Layer: 11 (Row) opc: 1 rci 0x00 Undo type: Regular undo Last buffer split: No Temp Object: No Tablespace Undo: No rdba: 0x0c872e86*-----------------------------KDO undo record:KTB Redo op: 0x02 ver: 0x01 compat bit: 4 (post-11) padding: 1op: C uba: 0x0c872e86.de93.2dKDO Op code: IRP row dependencies Disabled xtype: XA flags: 0x00000000 bdba: 0x01a72e1a hdba: 0x01851203itli: 1 ispac: 0 maxfr: 4858tabn: 0 slot: 14(0xe) size/delt: 292fb: --H-FL-- lb: 0x0 cc: 49null:01234567890123456789012345678901234567890123456789012345678901234567890123456789---------NNN--NN------NNNNNNN---NNNNNN-NNNNNNN---col 0: [47]..........................*-----------------------------* Rec #0x2 slt: 0x1d objn: 178517(0x0002b955) objd: 178517 tblspc: 7(0x00000007)* Layer: 10 (Index) opc: 22 rci 0x01 Undo type: Regular undo User Undo Applied Last buffer split: No Temp Object: No Tablespace Undo: No rdba: 0x00000000*-----------------------------index undo for leaf key operationsKTB Redo op: 0x02 ver: 0x01 compat bit: 4 (post-11) padding: 1op: C uba: 0x0c872e86.de93.2eDump kdilk : itl=2, kdxlkflg=0x1 sdc=34 indexid=0x1c75b83 block=0x01e30944(kdxlre): restore leaf row (clear leaf delete flags)key :(51): 2f 46 53 4c 32 2e 53 55 2d 50 56 32 30 31 39 30 31 32 31 53 48 30 33 33 36 30 30 34 33 39 2d 31 38 30 30 32 33 34 39 36 37 36 32 34 2d 30 30 31 02 c1 05keydata/bitmap: (6): 01 a7 2e 1a 00 0e.....................*-----------------------------* Rec #0x37 slt: 0x1d objn: 188387(0x0002dfe3) objd: 188387 tblspc: 8(0x00000008)* Layer: 11 (Row) opc: 1 rci 0x36 Undo type: Regular undo User Undo Applied Last buffer split: No Temp Object: No Tablespace Undo: No rdba: 0x00000000*-----------------------------KDO undo record:KTB Redo op: 0x02 ver: 0x01 compat bit: 4 (post-11) padding: 1op: C uba: 0x0c872e87.de93.2cKDO Op code: DRP row dependencies Disabled xtype: XA flags: 0x00000000 bdba: 0x10032712 hdba: 0x0200f983itli: 1 ispac: 0 maxfr: 4858tabn: 0 slot: 9(0x9) 2019-08-13 20:09:28.120262 : kjbmbassert [0x72e87.32]2019-08-13 20:09:28.120413 : kjbmsassert(0x72e87.32)(2)End dump data blocks tsn: 2 file#: 50 minblk 470663 maxblk 470663
说明:
SLT 就是我们在UNDO HEADER里面TRANSACTION TABLE的槽号
objn:OBJECT_ID
objd:DATA_OBJECT_ID。
Undo块是正向写,逆向恢复的。这个时候,我们应该从最后一个undo record往前看。也就是先分析Rec #0x37为何无法回滚
这里通过OBJECT_ID=188387可以找到对象为archive.s_ship_unit_line_dmp(select * from dba_objects where object_id=188387)的表。
然后我们通过objd,bdba,slot推算出rowid:AAAt/jAAIAQAycSAAJ
附:如何通过objd,bdba,slot推算出rowid
1、语法:
DBMS_ROWID.ROWID_CREATE (rowid_type IN NUMBER,object_number IN NUMBER,relative_fno IN NUMBER,block_number IN NUMBER,row_number IN NUMBER)
2、参数:
rowid_type:rowid类型(restricted或者extended)。设置rowid_type为0时,代表restricted ROWID(此时,将忽略参数object_number):设置rowid_type为1时,代表extended ROWID。object_number:数据对象编号(仅restricted类型rowid可用)。relative_fno:所在数据文件编号。block_number:该数据文件中的数据块编号。row_number:在该块中的行编号。--rowid类型为1(select dbms_rowid.rowid_type(rowid) from archive.s_ship_unit_line_dmp)--数据对象编号为188387(select dbms_rowid.rowid_object(ROWID) from archive.s_ship_unit_line_dmp)--数据文件编号为8(select dbms_rowid.rowid_relative_fno(ROWID) from archive.s_ship_unit_line_dmp)--block_number:bdba--row_number:slotselect dbms_rowid.rowid_create(1,188387,8,268642066,9) from archive.s_ship_unit_line_dmp where rownum=1;
![537edd6805228fdc576290e149f0d781.png](https://img-blog.csdnimg.cn/img_convert/537edd6805228fdc576290e149f0d781.png)
关键信息如下:
![26c37e2a93052277724f1f0e3aee6dab.png](https://img-blog.csdnimg.cn/img_convert/26c37e2a93052277724f1f0e3aee6dab.png)
因为里面都是16进制,所以需要做一下转换:
![7a26af824f82d6da5741a55463369ffb.png](https://img-blog.csdnimg.cn/img_convert/7a26af824f82d6da5741a55463369ffb.png)
到这里就没有继续分析下去了,应该是最后rowid推算时slot考虑的不太对,有点问题,所以定位不到这一行,没法继续深入了,然后发现告警日志又突然恢复正常了...
后面会分享更多devops和DBA方面的内容,感兴趣的朋友可以关注下~
![d4902d5895b67e5faf934535f5fe4f0f.gif](https://img-blog.csdnimg.cn/img_convert/d4902d5895b67e5faf934535f5fe4f0f.gif)