SEMQ处理809-Confirmation出现死锁问题

1.现象

在das测试过程中发现发送方由于时序控制导致频繁出现死锁,牺牲事务的情况.

select min(object_id) from tb_0031 where f009n_0031 = 0 and f023v_0031='100,1361,10600,Lsj2014061300590,0'

导致tb_0031中的记录进入f009n_0031=100的状态非常慢,大量实际已经送达的消息在发送端长时间处于发送中的状态(SS_SEND)。


2.死锁跟踪

用SQL Server Profiler跟踪死锁.
图:



记录死锁信息的文件内容:
<deadlock-list>
 <deadlock victim="process78a868">
  <process-list>
   <process id="process775438" taskpriority="0" logused="496" waitresource="KEY: 25:72057594042253312 (70002fa012a1)" waittime="5007" ownerId="423777181" transactionname="implicit_transaction" lasttranstarted="2014-06-13T17:02:39.137" XDES="0x1018b5800" lockMode="S" schedulerid="2" kpid="3748" status="suspended" spid="68" sbid="0" ecid="0" priority="0" transcount="1" lastbatchstarted="2014-06-13T17:02:39.137" lastbatchcompleted="2014-06-13T17:02:39.137" hostname="PC-200906221632" hostpid="5704" loginname="sa" isolationlevel="read committed (2)" xactid="423777181" currentdb="25" lockTimeout="4294967295" clientoption1="671088672" clientoption2="128058">
    <executionStack>
     <frame procname="adhoc" line="1" sqlhandle="0x0200000091a57c33e82bf15eca2e771204f65770fb0fbae8">
(@1 tinyint,@2 varchar(8000))SELECT MIN([object_id]) FROM [tb_0031] WHERE [f009n_0031]=@1 AND [f023v_0031]=@2     </frame>
     <frame procname="adhoc" line="1" sqlhandle="0x02000000e8928a071ea2fe5e743b11abb26559be24d4513c">
select min(object_id) from tb_0031 where f009n_0031 = 0 and f023v_0031='100,1361,10600,Lsj2014061300582,0'     </frame>
    </executionStack>
    <inputbuf>
select min(object_id) from tb_0031 where f009n_0031 = 0 and f023v_0031='100,1361,10600,Lsj2014061300582,0'    </inputbuf>
   </process>
   <process id="process78a868" taskpriority="0" logused="496" waitresource="KEY: 25:72057594042253312 (6b00528dc380)" waittime="5007" ownerId="423777182" transactionname="implicit_transaction" lasttranstarted="2014-06-13T17:02:39.137" XDES="0xcce41270" lockMode="S" schedulerid="3" kpid="3744" status="suspended" spid="99" sbid="0" ecid="0" priority="0" transcount="1" lastbatchstarted="2014-06-13T17:02:39.137" lastbatchcompleted="2014-06-13T17:02:39.137" hostname="PC-200906221632" hostpid="5704" loginname="sa" isolationlevel="read committed (2)" xactid="423777182" currentdb="25" lockTimeout="4294967295" clientoption1="671088672" clientoption2="128058">
    <executionStack>
     <frame procname="adhoc" line="1" sqlhandle="0x0200000091a57c33e82bf15eca2e771204f65770fb0fbae8">
(@1 tinyint,@2 varchar(8000))SELECT MIN([object_id]) FROM [tb_0031] WHERE [f009n_0031]=@1 AND [f023v_0031]=@2     </frame>
     <frame procname="adhoc" line="1" sqlhandle="0x020000000a97680a5c97acb45ada2d5a9ccd6de08314e059">
select min(object_id) from tb_0031 where f009n_0031 = 0 and f023v_0031='100,1361,10600,Lsj2014061300587,0'     </frame>
    </executionStack>
    <inputbuf>
select min(object_id) from tb_0031 where f009n_0031 = 0 and f023v_0031='100,1361,10600,Lsj2014061300587,0'    </inputbuf>
   </process>
   <process id="process7a0da8" taskpriority="0" logused="496" waitresource="KEY: 25:72057594042253312 (6b00528dc380)" waittime="5007" ownerId="423777185" transactionname="implicit_transaction" lasttranstarted="2014-06-13T17:02:39.137" XDES="0x101629810" lockMode="S" schedulerid="4" kpid="3580" status="suspended" spid="70" sbid="0" ecid="0" priority="0" transcount="1" lastbatchstarted="2014-06-13T17:02:39.137" lastbatchcompleted="2014-06-13T17:02:39.137" hostname="PC-200906221632" hostpid="5704" loginname="sa" isolationlevel="read committed (2)" xactid="423777185" currentdb="25" lockTimeout="4294967295" clientoption1="671088672" clientoption2="128058">
    <executionStack>
     <frame procname="adhoc" line="1" sqlhandle="0x0200000091a57c33e82bf15eca2e771204f65770fb0fbae8">
(@1 tinyint,@2 varchar(8000))SELECT MIN([object_id]) FROM [tb_0031] WHERE [f009n_0031]=@1 AND [f023v_0031]=@2     </frame>
     <frame procname="adhoc" line="1" sqlhandle="0x02000000e750ee148a161bcd440bd202f9a33439da606d09">
select min(object_id) from tb_0031 where f009n_0031 = 0 and f023v_0031='100,1361,10600,Lsj2014061300592,0'     </frame>
    </executionStack>
    <inputbuf>
select min(object_id) from tb_0031 where f009n_0031 = 0 and f023v_0031='100,1361,10600,Lsj2014061300592,0'    </inputbuf>
   </process>
  </process-list>
  <resource-list>
   <keylock hobtid="72057594042253312" dbid="25" objectname="das_ws.dbo.tb_0031" indexname="PK_TB_0031" id="lock96137100" mode="X" associatedObjectId="72057594042253312">
    <owner-list>
     <owner id="process775438" mode="X"/>
    </owner-list>
    <waiter-list>
     <waiter id="process7a0da8" mode="S" requestType="wait"/>
     <waiter id="process78a868" mode="S" requestType="wait"/>
    </waiter-list>
   </keylock>
   <keylock hobtid="72057594042253312" dbid="25" objectname="das_ws.dbo.tb_0031" indexname="PK_TB_0031" id="lockba708a00" mode="X" associatedObjectId="72057594042253312">
    <owner-list>
     <owner id="process78a868" mode="X"/>
    </owner-list>
    <waiter-list>
     <waiter id="process775438" mode="S" requestType="wait"/>
    </waiter-list>
   </keylock>
  </resource-list>
 </deadlock>
</deadlock-list>

3个进程执行的都是相同的语句,这个语句怎么会要求排它锁,共享锁?
猜测:
select min()并不会要求排它锁.事务过程中获取排它锁的并非这里显示的SQL语句.而是,之前update记录状态的语句.


select min(object_id) from tb_0031 where f009n_0031 = 0 and f023v_0031='100,1361,10600,Lsj2014061300590,0'

select count(*) from tb_0031 where f009n_0031 = 0 and f023v_0031='100,1361,10600,Lsj2014061300590,0'
估计的查询计划相同.
图:


3.处理

采用下面的方式对比测试:
(1)屏蔽时序控制
bbox.conf增加是否支持时序控制的配置:
<enable_seq_ctrl>false></eanble_seq_ctrl> <!--是否支持时序控制,默认:true -->

屏蔽时序控制程序没有出现死锁!


(2)改变事务过程操作的顺序,先检查时序控制键,再修改记录状态.

修改后测试程序没有出现死锁!

原流程:
        CSEMQItem qe;
        int iRet = semq_.GetItem(qe, 0,0,ref_object_id);
        if(iRet < 0)
            return -3;
        if(iRet == 1)
        {
            GETDBC_RETURN(pdbor,this->local_dbc_.c_str(),-1);
            pdbor->BeginTrans();
            result = semq_.UpdateStatus(ref_object_id,ISEMQ::SS_ACK2);
            if(result)
                return -2;

            if (semq_.IsEnableSeqCtrl()) {
                if (!qe.sc_key_.empty()) {
                    CQQ_OBJECT_ID next_object_id=qe.object_id_;
                    int ret = semq_.GetNextSeqCtrlRecord(qe.sc_key_.c_str(),next_object_id);
                    GetThisLogger()->log(LO_STDOUT|LO_FILE,SEVERITY_DEBUG,"取下一个相同时序键%s的记录.qe.object_id=%I64d,next_object_id=%I64d,ret=%d.\n",
                        qe.sc_key_.c_str(),qe.object_id_,next_object_id,ret);
                    if (ret==1) {
                        pdbor->SetTXHook(TxHook,&semq_);
                        semq_tss_->v_rec.push_back(new SEMQ_RECORD_DATA_PAIR(next_object_id, qe.sc_key_));        
                    }
                    else if (ret==-1) {
                        pdbor->RollbackTrans();
                        return -3;
                    }
                }
            }
           
修改后流程:
            GETDBC_RETURN(pdbor,this->local_dbc_.c_str(),-1);
            ///< 如果目标是平台(且status==6)或者已送达,该确认的是有时序控制的记录,则取下一条
            if (semq_.IsEnableSeqCtrl()) {
                if ((qe.dest_type_==PT_CENTRAL||status==ISEMQ::SS_ACK2)&&!qe.sc_key_.empty()) {
                    CQQ_OBJECT_ID next_object_id=qe.object_id_;
                    int ret = semq_.GetNextSeqCtrlRecord(qe.sc_key_.c_str(),next_object_id);
                    if (ret==1) {
                        pdbor->SetTXHook(TxHook,&semq_);
                        semq_tss_->v_rec.push_back(new SEMQ_RECORD_DATA_PAIR(next_object_id, qe.sc_key_));
                    }
                    else if (ret==-1) {
                        return -3;
                    }
                }
            }
            pdbor->BeginTrans();
            if (semq_.UpdateStatus(ssr.record_id_,status,ISEMQ::SS_ACK2)) {
                pdbor->RollbackTrans();
                return -2;
            }

          
在UpdateStatus之前GetNextSeqCtrlRecord时没有相同时序键的后续消息待发送,   此时新增了一条消息,由于受时序控制而未发送。

这种情况下就需要依赖后续扫描处理该记录。   

逻辑完整性依赖一个后台定时执行的线程,这也是修改带来的弊端。



  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值