分布式事务故障处理

故障处理

故障阶段本地远端状态检查处理方法解释
PREPARE
PHASE
GCCPS本地:
select local_tran_id,state from dba_2pc_pending;
LOCAL_TRAN_ID          STATE
---------------------- ----------------
2.12.64845              collecting

远程:
select local_tran_id,state from dba_2pc_pending;
no rows selected
本地
execute DBMS_TRANSACTION.PURGE_LOST_DB_ENTRY('local_tran_id');
本地数据库要求其他点做好commit或者rollback准备,现在正在“收集”其他点的数据库的返回信息,但是远程数据库未知状态(in doubt)。我们需要将本地的Global Coordinator的状态清除掉
本地:
select local_tran_id,state from dba_2pc_pending;
LOCAL_TRAN_ID          STATE
---------------------- ----------------
2.12.64845             prepared
远程:
select local_tran_id,state from dba_2pc_pending;
no rows selected
本地:
rollback force 'local_tran_id';
execute DBMS_TRANSACTION.PURGE_LOST_DB_ENTRY('local_tran_id');
本地Global Coordinator已经做好准备,已经将分布式锁放到各个事务的表上,但是远程数据库的状态再次未知(in doubt),我们需要手工的将本地的transaction rollback掉,并且清除分布式事务信息
COMMIT
PHASE
GCCPS本地:
select local_tran_id,state from dba_2pc_pending;
LOCAL_TRAN_ID          STATE
---------------------- ----------------
2.12.64845             prepared
远程:
select local_tran_id,state from dba_2pc_pending;
LOCAL_TRAN_ID          STATE
---------------------- ----------------
1.92.66874             commited
本地:
commit force 'local_tran_id';
execute DBMS_TRANSACTION.PURGE_LOST_DB_ENTRY('local_tran_id');

远程:
execute DBMS_TRANSACTION.PURGE_LOST_DB_ENTRY('local_tran_id');
远程数据库可能已经commit,但是本地Global Coordinator未知远程数据库的状态,还是处于prepare的状态
本地:
select local_tran_id,state from dba_2pc_pending;
LOCAL_TRAN_ID          STATE
---------------------- ----------------
2.12.64845            commited
远程:
select local_tran_id,state from dba_2pc_pending;
LOCAL_TRAN_ID          STATE
---------------------- ----------------
1.92.66874             commited
本地:
execute DBMS_TRANSACTION.PURGE_LOST_DB_ENTRY('local_tran_id');

远程:
execute DBMS_TRANSACTION.PURGE_LOST_DB_ENTRY('local_tran_id');
远程数据库和本地数据库均已经完成commit,但是分布式事务的信息尚未清除,我们需要在本地和远程运行
FORGET
PHASE
GCCPS本地:
select local_tran_id,state from dba_2pc_pending;
LOCAL_TRAN_ID          STATE
---------------------- ----------------
2.12.64845            commited
远程:
select local_tran_id,state from dba_2pc_pending;
no rows selected
本地:
execute DBMS_TRANSACTION.PURGE_LOST_DB_ENTRY('local_tran_id');
远程commit point site已经完成commit,通知Global Coordinator清除本地的分布式事务信息,但是Global Coordinator没有收到该信息
      
      
特殊情况     
情况1,当我们遇到使用rollback/commit force的时候,无法正常清除分布式事务的信息,会报错ORA-02058: no prepared transaction found with ID X.XX.XXXXX时,需要通过手工方式来清除该信息在dba_2pc表中还有事务记录,但是实际已经不存在该事务了:
select local_tran_id, state from dba_2pc_pending;
 
       LOCAL_TRAN_ID          STATE
       ---------------------- ----------------
       1.92.66874             prepared
 
(注:'1.92.66874' 的结构为rbs#, slot#, wrap#,此事务在rollback segment #1)

找出使用rollback segment #1且状态是active的transaction:
  SELECT KTUXEUSN, KTUXESLT, KTUXESQN, /* Transaction ID */
         KTUXESTA Status,
         KTUXECFL Flags
  FROM x$ktuxe
  WHERE ktuxesta!='INACTIVE'
        AND ktuxeusn= 1; <== 这是rollback segment#,即rbs#
 
no rows selected
因此我们在rollback force的时候会报错:
ORA-02058: no prepared transaction found with ID 1.92.66874
我们需要如下处理:

set transaction use rollback segment SYSTEM;
 
delete from sys.pending_trans$ where local_tran_id = '1.92.66874';
delete from sys.pending_sessions$ where local_tran_id = '1.92.66874';
delete from sys.pending_sub_sessions$ where local_tran_id = '1.92.66874';
 
commit;
 
情况2,这种情况比较少见,在dba_2pc表中无法查到分布式事务信息,但是实际上却是存在该分布式事务的我们在alertlog中可以看到:
ORA-1591: lock held by in-doubt distributed transaction 1.92.66874

查询dba_2pc的表,发现没有分布式事务信息:
select local_tran_id, state from dba_2pc_pending
where local_tran_id='1.92.66874';
 
no rows selected

但是去查实际的rollback segment信息,却发现有prepared状态的分布式事务存在:
SELECT KTUXEUSN, KTUXESLT, KTUXESQN, /* Transaction ID */
       KTUXESTA Status,
       KTUXECFL Flags
FROM x$ktuxe
WHERE ktuxesta!='INACTIVE'
      AND ktuxeusn= 1;
 
  KTUXEUSN   KTUXESLT   KTUXESQN STATUS           FLAGS
---------- ---------- ---------- ---------------- ------------------------
         1         92      66874 PREPARED         SCO|COL|REV|DEAD

我们无法做commit force或者rollback force:

rollback force '1.92.66874';
ORA-02058: no prepared transaction found with ID 1.92.66874
我们用如下的方式手工清理:
alter system disable distributed recovery;

    insert into pending_trans$ (
        LOCAL_TRAN_ID,
        GLOBAL_TRAN_FMT,
        GLOBAL_ORACLE_ID,
        STATE,
        STATUS,
        SESSION_VECTOR,
        RECO_VECTOR,
        TYPE#,
        FAIL_TIME,
        RECO_TIME)
    values( '1.92.66874', /* <== 此处为你的local tran id */
        306206,                  /*                                         */
        'XXXXXXX.12345.1.2.3',   /*  这些值不必更改,   */
        'prepared','P',          /*  是静态参数,可以直接   */
        hextoraw( '00000001' ),  /*  在这个sql中使用                             */
        hextoraw( '00000000' ),  /*                                         */
        0, sysdate, sysdate );
 
    insert into pending_sessions$
    values( '1.92.66874',/* <==此处为你的local tran id  */
        1, hextoraw('05004F003A1500000104'), 'C', 0, 30258592, '', 146);
    commit;
 
    commit force '1.92.66874';
   
    此时如果commit force还是出现报错,需要继续执行:
      delete from pending_trans$ where local_tran_id='1.92.66874';
      delete from pending_sessions$ where local_tran_id='1.92.66874';
      commit;
      alter system enable distributed recovery;
 
    此时如果没有报错,则执行以下语句:
      alter system enable distributed recovery;
      connect / as sysdba
      alter session set "_smu_debug_mode" = 4;
      (注:如果使用auto的undo管理方式,需要执行此步骤,此步骤能避免在后续执行purge_lost_db_entry出现ORA-01453 的报错,详细信息可见Bug2191458)
      commit; 
      exec dbms_transaction.purge_lost_db_entry( '1.92.66874' )
 

 

分布式锁-解锁

运行shell脚本后,会生成 roll.sql文件。
只需要运行roll.sql就会解除dba_2pc_pending 中的分布式锁。
 
注意:根据情况决定shell脚本中是否需要where 条件,比如过滤status 是prepare的,还是其他的,还是全部的。
 
----------------------shell脚本-------------------------
#!/bin/bash
. /home/oracle/.profile
sqlplus -S  / as sysdba <<EOF >  roll.sql
set heading off pagesize 0 echo off linesize 200 feedback off trimspool on
select 'rollback force  ''' || LOCAL_TRAN_ID || ''';' || chr(10) ||
'execute DBMS_TRANSACTION.PURGE_LOST_DB_ENTRY(''' || LOCAL_TRAN_ID
 || ''');' || chr(10) || 'commit;' from DBA_2PC_PENDING;
exit;
EOF
 
------------------------上述sh生成的roll.sql文件格式-----------------------
 
rollback force  '1506.7.4851';
execute DBMS_TRANSACTION.PURGE_LOST_DB_ENTRY('1506.7.4851');
commit;
rollback force  '84.4.6215898';
execute DBMS_TRANSACTION.PURGE_LOST_DB_ENTRY('84.4.6215898');
commit;
 

参考

https://www.cnblogs.com/princessd8251/articles/4559993.html

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包

打赏作者

Hehuyi_In

你的鼓励将是我创作的最大动力

¥1 ¥2 ¥4 ¥6 ¥10 ¥20
扫码支付:¥1
获取中
扫码支付

您的余额不足,请更换扫码支付或充值

打赏作者

实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值