[20181030]模拟分布式事务挂起导致TX锁争用.txt

[20181030]模拟分布式事务挂起导致TX锁争用.txt


--//随着业务不断扩展,在业务中使用分布式事务的情况越来越多,除了导致scn抬高外,可能由于网络问题导致连接中断,

--//甚至由于业务连接外部网络,比较常见ora-01591错误.


$ oerr ora 1591

01591, 00000, "lock held by in-doubt distributed transaction %s"

// *Cause:  Trying to access resource that is locked by a dead two-phase commit

//          transaction that is in prepared state.

// *Action: DBA should query the pending_trans$ and related tables, and attempt

//          to repair network connection(s) to coordinator and commit point.

//          If timely repair is not possible, DBA should contact DBA at commit

//          point if known or end user for correct outcome, or use heuristic

//          default if given to issue a heuristic commit or abort command to

//          finalize the local portion of the distributed transaction.


--//网上有相关的文章提到如何解决这个问题.

--//这里模拟分布式事务挂起导致TX锁争用,模拟出现ora-01591错误.

--//测试参看链接:http://www.cnxdug.org/?p=1389


1.环境:

--//客户端:

SCOTT@book> @ ver1

PORT_STRING                    VERSION        BANNER

------------------------------ -------------- --------------------------------------------------------------------------------

x86_64/Linux 2.4.xx            11.2.0.4.0     Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production


SCOTT@book> create PUBLIC database link test033 connect to scott identified by btbtms using '192.168.100.33:1521/test:DEDICATED';

Database link created.


SCOTT@book> create table deptx as select * from deptx;

Table created.


SCOTT@book> select sysdate from dual@test033;

SYSDATE

-------------------

2018-10-31 08:55:34

--//OK,连接没有问题.

SCOTT@book> commit;

Commit complete.


2.测试:

--//为了模拟分布式事务失败,无法自动恢复的情况,需要把分布式事务自动恢复关掉。

SCOTT@book> alter system disable distributed recovery;

System altered.


--//服务端:

SCOTT@test> @ &r/ver1

PORT_STRING                    VERSION        BANNER

------------------------------ -------------- ----------------------------------------------------------------

x86_64/Linux 2.4.xx            10.2.0.4.0     Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bi


SCOTT@test> create table depty as select * from dept;

Table created.


2.在客户端执行:

SCOTT@book> update deptx         set loc = lower(loc) where deptno=10;

1 row updated.


SCOTT@book> update depty@test033 set loc = lower(loc) where deptno=20;

1 row updated.


SCOTT@book> @ &r/xid

XIDUSN_XIDSLOT_XIDSQN

------------------------------

10.2.24482


C70                                                                        XIDUSN    XIDSLOT     XIDSQN     UBAFIL     UBABLK     UBASQN     UBAREC STATUS            USED_UBLK  USED_UREC XID              ADDR             START_DATE          FLAG

---------------------------------------------------------------------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------------- ---------- ---------- ---------------- ---------------- ------------------- ----------

ALTER SYSTEM DUMP UNDO BLOCK '_SYSSMU10_1197734989$' XID 10 2 24482;           10          2      24482          3      10115       4716          1 ACTIVE                    1          1 0A000200A25F0000 0000000081BD0748 2018-10-31 08:56:30 4197891

ALTER SYSTEM DUMP UNDO HEADER '_SYSSMU10_1197734989$';

ALTER SYSTEM DUMP DATAFILE 3 BLOCK 10115;


SCOTT@book> COMMIT COMMENT 'ORA-2PC-CRASH-TEST-4';

COMMIT COMMENT 'ORA-2PC-CRASH-TEST-4'

*

ERROR at line 1:

ORA-02054: transaction 10.2.24482 in-doubt

ORA-02059: ORA-2PC-CRASH-TEST-4 in commit comment


SCOTT@book> @ &r/xid

XIDUSN_XIDSLOT_XIDSQN

------------------------------


no rows selected

--//已经没有事务.


--//"COMMIT COMMENT 'ORA-2PC-CRASH-TEST-4'"这个命令,是Oracle用来强制分布式事务失败,4代表非提交点的分布式事务失败在

--//prepare后。其它数值代表的意思如下:


1 Crash commit point after collect

2 Crash non-commit-point site after collect

3 Crash before prepare (non-commit-point site)

4 Crash after prepare (non-commit-point site)

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

5 Crash commit point site before commit

6 Crash commit point site after commit

7 Crash non-commit-point site before commit

8 Crash non-commit-point site after commit

9 Crash commit point site before forget

10 Crash non-commit-point site before forget


--//查询dba_2pc_pending视图:

SCOTT@book> select local_tran_id, GLOBAL_TRAN_ID, state from dba_2pc_pending;

LOCAL_TRAN_ID          GLOBAL_TRAN_ID            STATE

---------------------- ------------------------- ----------------

10.2.24482             BOOK.e6127bf4.10.2.24482  prepared


--//本地事务是10.2.24482(与前面能对上),全局事务是BOOK.e6127bf4.10.2.24482,状态是prepared。


--//观察lock状态:

SCOTT@book> @ &r/viewlock

no rows selected

--//没有lock.


--//如果我们再去执行更新deptx表的同一行发现会话挂起,过了就会报ORA-01591,Oracle自动检测到了这行记录被分布式事务挂起,导

--//致更新失败。

SCOTT@book> update deptx         set dname = lower(dname) where deptno=10;

update deptx         set dname = lower(dname) where deptno=10

       *

ERROR at line 1:

ORA-01591: lock held by in-doubt distributed transaction 10.2.24482

--//挂起一小会后,出现提示.


SCOTT@book> @ &r/viewlock

no rows selected


SCOTT@book> select local_tran_id, GLOBAL_TRAN_ID, state from dba_2pc_pending;

LOCAL_TRAN_ID          GLOBAL_TRAN_ID           STATE

---------------------- ------------------------ ----------------

10.2.24482             BOOK.e6127bf4.10.2.24482 prepared


--//解决很简单:

SYS@book> select local_tran_id, GLOBAL_TRAN_ID, state from dba_2pc_pending;

LOCAL_TRAN_ID          GLOBAL_TRAN_ID           STATE

---------------------- ------------------------ ----------------

10.2.24482             BOOK.e6127bf4.10.2.24482 forced commit

--//state=forced commit.


SYS@book> alter system enable distributed recovery;

System altered.


SYS@book> exec dbms_transaction.purge_lost_db_entry( '10.2.24482');

BEGIN dbms_transaction.purge_lost_db_entry( '10.2.24482'); END;

*

ERROR at line 1:

ORA-06510: PL/SQL: unhandled user-defined exception

ORA-06512: at "SYS.DBMS_TRANSACTION", line 105

ORA-06512: at line 1

--//我的测试前面alter system enable distributed recovery;,打开以后就可以提交了.


SYS@book> select * from scott.deptx where deptno=10;

    DEPTNO DNAME          LOC

---------- -------------- -------------

        10 ACCOUNTING     new york


SCOTT@book> select * from scott.depty@test033 where deptno=20;

    DEPTNO DNAME          LOC

---------- -------------- -------------

        20 RESEARCH       DALLAS


--//本地修改了,而远程修改失败.


--//链接:http://www.cnxdug.org/?p=1389,提到设置commit_point_strength参数,自己也重复测试看看.

--//以下是转载:(操作我自己重新做了一次)

如何防止进行prepared状态


如果分布式事务的操作,经常在某个重要的数据库实例上进行,那为了防止本地事务异常进入prepared状态无法自动回滚处理,就需要把

这个重要节点的commit_point_strength参数改大为dblink数据库中最大值,这样这个节点在提交分布式事务时,本地事务是不进入

prepared状态的。


如当前远程实例commit_point_strength为1,把本地节点的commit_point_strength改为100,修改这个参数需要重启数据库实例。


SYS@book> @ &r/hide commit_point_strength

NAME                  DESCRIPTION                                                   DEFAULT_VALUE SESSION_VALUE SYSTEM_VALUE

--------------------- ------------------------------------------------------------- ------------- ------------- ------------

commit_point_strength Bias this node has toward not preparing in a two-phase commit TRUE          1             1


SYS@book> alter system set commit_point_strength=100 scope=spfile;

System altered.


SYS@book> shutdown immediate ;

Database closed.

Database dismounted.

ORACLE instance shut down.


SYS@book> startup

ORACLE instance started.

Total System Global Area  634732544 bytes

Fixed Size                  2255792 bytes

Variable Size             197133392 bytes

Database Buffers          427819008 bytes

Redo Buffers                7524352 bytes

Database mounted.

Database opened.


SYS@book> show parameter commit_point_strength;

NAME                  TYPE    VALUE

--------------------- ------- -----

commit_point_strength integer 100



--//再次模拟上面的过程:

SCOTT@book> alter system disable distributed recovery;

System altered.


SCOTT@book> update deptx         set loc = lower(loc)||'a' where deptno=10;

1 row updated.


SCOTT@book> update depty@test033 set loc = lower(loc) where deptno=20;

1 row updated.


SCOTT@book> @ &r/xid


XIDUSN_XIDSLOT_XIDSQN

------------------------------

10.8.24486


SCOTT@book> COMMIT COMMENT 'ORA-2PC-CRASH-TEST-4';

COMMIT COMMENT 'ORA-2PC-CRASH-TEST-4'

*

ERROR at line 1:

ORA-02050: transaction 10.8.24486 rolled back, some remote DBs may be in-doubt

ORA-02054: transaction 10.47.75654 in-doubt

ORA-02059: ORA-2PC-CRASH-TEST-4 in commit comment

ORA-02063: preceding 2 lines from TEST033



SCOTT@book> @ &r/xid

XIDUSN_XIDSLOT_XIDSQN

------------------------------


no rows selected

--//已经没有事务.


SYS@book> select local_tran_id, GLOBAL_TRAN_ID, state from dba_2pc_pending;

LOCAL_TRAN_ID          GLOBAL_TRAN_ID           STATE

---------------------- ------------------------ ----------------

10.8.24486             BOOK.e6127bf4.10.8.24486 collecting

--//state=collecting,但是不影响本地更新操作.更新deptx表发现可以更新,不再挂起和报错。


SCOTT@book> update deptx         set loc = lower(loc)||'a' where deptno=10;

1 row updated.


SCOTT@book> commit ;

Commit complete.


--//不过作者没测试完成, 在服务端查询:

SCOTT@test> select * from depty;

select * from depty

              *

ERROR at line 1:

ORA-01591: lock held by in-doubt distributed transaction 10.47.75654


SCOTT@test> select local_tran_id, GLOBAL_TRAN_ID, state from dba_2pc_pending;

LOCAL_TRAN_ID          GLOBAL_TRAN_ID           STATE

---------------------- ------------------------ ----------------

10.47.75654            BOOK.e6127bf4.10.8.24486 prepared


--//state=prepared.


SCOTT@test> update depty set loc = lower(loc) where deptno=20;

update depty set loc = lower(loc) where deptno=20

       *

ERROR at line 1:

ORA-01591: lock held by in-doubt distributed transaction 10.47.75654


--//解决方法参考网上解决ora-01591的文档.

--//我这里执行如下:

--//客户端:

SYS@book> alter system enable distributed recovery;

System altered.


SYS@book> select local_tran_id, GLOBAL_TRAN_ID, state from dba_2pc_pending;

no rows selected


SCOTT@book> select * from deptx where deptno=10;

    DEPTNO DNAME          LOC

---------- -------------- -------------

        10 accounting     new yorka


SCOTT@book> select * from depty@test033 where deptno=20;

    DEPTNO DNAME          LOC

---------- -------------- -------------

        20 RESEARCH       DALLAS


--//服务端:

SCOTT@test> select local_tran_id, GLOBAL_TRAN_ID, state from dba_2pc_pending;

no rows selected


--//commit_point_strength=100本地业务没有影响,但是远程存在问题,依旧出现ora-01591.当然我的测试使用alter system disable distributed recovery;

--//并不算真正的模拟.


3.注意收尾工作:

SYS@book> alter system reset commit_point_strength;

System altered.


--//重启数据库,检查参数commit_point_strength设置(略).


来自 “ ITPUB博客 ” ,链接:http://blog.itpub.net/267265/viewspace-2218154/,如需转载,请注明出处,否则将追究法律责任。

转载于:http://blog.itpub.net/267265/viewspace-2218154/

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值