How to Obtain a System State Trace BEFORE the Error ORA-2049 Occurs

In this Document

Goal
Fix
References

Applies to:

Oracle Server - Enterprise Edition - Version 10.1.0.2 to 11.2.0.2 [Release 10.1 to 11.2]
Information in this document applies to any platform.
Oracle Server Enterprise Edition - Version: 10.1.0.2 to 11.2.0.2


Goal

This document is intended to explain under what conditions an ORA-02049 'timeout: distributed transaction waiting for lock' error can be signaled, and to provide atoolto diagnose situations when this occurs. The challenge with troubleshooting these types of errors is that by the time the error is thrown, the contention no longer is present. A type of trace used byOracleSupport called a system state dump is useful for finding the session holding the resources that the session(s) signaling ORA-02049 are waiting on, but it must be obtained before the error is signaled. Therefore the contention cannot be captured even by cascading a system state trace off of an ORA-02049 errorstack trace, which is automatically written when the ORA-02049 error occurs.

An ORA-02049 'timeout: distributed transaction waiting for lock' error occurs when a session that is involved in a distributed transaction waits for another session's TX enqueue (row lock) for longer than the database distributed_lock_timeout. What constitutes a 'distributed transaction' is what generates confusion on this topic. Any operation that uses a database link - even a query - starts a distributed transaction. For example, consider the following:

Session 1:
-----------
SQL> select * from global_name;

GLOBAL_NAME
---------------------------------------------------------------------------
ORCL102A.WORLD

SQL> update scott.dept set loc='NOME' where deptno=10;

1 row updated.


Session 2:
----------
SQL> select * from global_name;

GLOBAL_NAME
---------------------------------------------------------------------------
ORCL102A.WORLD

SQL> select 1 from dual@orcl102b.world;

1
----------
1

SQL> update scott.dept set loc='NOME' where deptno=10;
update scott.dept set loc='NOME' where deptno=10
*
ERROR at line 1:
ORA-02049: timeout: distributed transaction waiting for lock

If a 'commit' is issued in session 2 after the distributed query but before the update operation, no ORA-02049 error is encountered.

Common methods of dealing with this problem are:

  1. Increase the database initialization parameter distributed_lock_timeout. This will enable sessions involved in a distributed transaction to wait longer for a row lock, in hopes that the session holding the row lock will release it in the increased span of time.

  2. Code the application so that the error is a 'try again' exception - if it hits the error, keep trying the action until it succeeds. Eventually the session holding the contended row lock should release it and the waiting session should be able to proceed.

However, it is a common desire for customers to want to identify the session holding the resources. The procedure presented below allows us to do this five seconds before an ORA-02049 will be signaled.

NOTE: "enq: TX - row lock contention" is based on how your application has been designed, coded and being executed by different sessions to cause row level contention. This wait can only be fixed by changing the application code, design or usage of the application.

Fix

The procedure below monitors v$session for any sessions waiting on 'enq: TX - row lock contention' for seconds_in_wait. If seconds_in_wait for a session waiting on 'enq: TX - row lock contention' advances to a user-specified number of seconds before distributed_lock_timeout will expire, and the session is involved in a distributed transaction, it will write a system state trace to background_dump_dest (10g) or diagnostic_dest (11g). It will stop running after either writing three system state traces, or the time specified for the procedure to run expires, which ever occurs first.

The OERR_2049_SSTATE procedure takes two input parameters:

  • run_minutes: The number of minutes the procedure will execute before executing. This should be set high enough to capture the problem on the system.
  • sec_b4_timeout: This specifies the number of seconds before the database distributed_lock_timeout expires that the procedure will dump the systemstate. On busy systems it can takes quite some time to dump a systemstate, so it may be necessary to experiment with this parameter - the higher the value the more likely that the systemstate will capture the problem, but it is also less likely that an ORA-02049 will be signaled since we are dumping the systemstate that much sooner than the database distributed_lock_timeout would expire.

To implement the procedure:

1) To prepare for running the procedure, the schema it will be created in needs privileges to be able to select from v$ views, for creating and running procedures, etc.

CREATE USER oerr IDENTIFIED BY oerr;
GRANT CONNECT, RESOURCE, CREATE PROCEDURE, ALTER SESSION TO oerr;
GRANT SELECT ON SYS.V_$SESSION TO oerr;
GRANT SELECT ON SYS.V_$LOCK TO oerr;
GRANT SELECT ON SYS.V_$PARAMETER TO oerr;
GRANT EXECUTE ON DBMS_LOCK TO oerr;

2) Create the following table in the schema that owns the procedure. It allows the procedure to be terminated at any time. Otherwise the procedure must be killed administratively from another session.

create table on_off (run varchar2(1));
insert into on_off values ('Y');
commit;

To stop the procedure before the time specified for the procedure to run expires, execute:

update on_off set run = 'N';
commit;

3) Create the procedure in the appropriate schema:

CREATE OR REPLACE PROCEDURE OERR_2049_SSTATE (run_minutes number,
sec_b4_timeout number)
IS

v_start_date date := sysdate;
dist_lock_timeout number;
flag integer := 1;
v_wait number;
v_run on_off.run%type := 'Y';

cursor sess_wait is select sid, saddr, seconds_in_wait
from v$session where event = 'enq: TX - row lock contention';

v_seconds_in_wait number;
v_sid number;
v_lock_type varchar2(2);
NODATA BOOLEAN := FALSE;

BEGIN

execute immediate 'update on_off set run = ''Y''';
execute immediate 'commit';

dbms_output.put_line(' ');

execute immediate 'alter session set nls_date_format =
''YYYY-MM-DD HH24:MI:SS''';
dbms_output.put_line('Procedure started at ' || v_start_date);

select to_number(value) into dist_lock_timeout
from v$parameter where name = 'distributed_lock_timeout';

dbms_output.put_line('Value of distributed_lock_timeout =
' || dist_lock_timeout);

IF dist_lock_timeout < 10 THEN
dbms_output.put_line('Value of distributed_lock_timeout is too small,
exiting...');
v_run := 'N';
ELSE
v_wait := dist_lock_timeout - sec_b4_timeout;
END IF;

IF dist_lock_timeout - sec_b4_timeout <= 5 THEN
dbms_output.put_line('Value of sec_b4_timeout is too small, exiting...');
v_run := 'N';
END IF;

WHILE flag < 4 and sysdate - v_start_date < (run_minutes/1440)
and v_run = 'Y' LOOP
FOR r in sess_wait LOOP
v_seconds_in_wait := r.seconds_in_wait;
IF v_seconds_in_wait > v_wait THEN
v_sid := r.sid;
BEGIN
select type into v_lock_type from v$lock where sid = v_sid
and type = 'DX';
dbms_output.put_line('DX lock found...');
EXCEPTION
WHEN NO_DATA_FOUND THEN
NODATA := TRUE;
END;
IF NODATA THEN
NODATA := FALSE;
END IF;
IF v_lock_type = 'DX' THEN
dbms_output.put_line('Dumping systemstate #' || flag || '...');
execute immediate 'alter session set
tracefile_identifier=''SSTATE''';
execute immediate 'alter session set events
''immediate trace name SYSTEMSTATE level 10''';
flag := flag + 1;
dbms_lock.sleep(10);
END IF;
END IF;
END LOOP;
select run into v_run from on_off;
commit;
END LOOP;
dbms_output.put_line('Procedure ended at ' || sysdate);

END;
/

4) To run the procedure, specify the time in minutes for the procedure to run, and the time in seconds to dump the systemstate(s) before the database distributed_lock_timeout expires as input. For example, to run the procedure for 300 minutes (5 hours) and to dump the systemstate 20 seconds before the database distributed_lock_timeout would expire, execute:

exec oerr_2049_sstate(300,20);

The traces will have 'SSTATE' as part of the file name. As stated previously, the output will be written to background_dump_dest (10g) or diagnostic_dest (11g). The procedure will stop running after either writing three system state traces, or the time specified for the procedure to run expires, which ever occurs first.

RAC Support

If the database is running on RAC, In order to generate the SYSTEMSTATE dumps simultaneously on all nodes of the cluster, a script. named oradbg needs to be created in all nodes of the cluster as follows:








#!/bin/sh


ORADBG_LOG_FILE=/home/oracle/log/oradbg.log <= Create this directory or change the name to an existing one


PATH=$ORACLE_HOME/bin:/bin


(


echo "======================================="


echo "`date` - ORADBG called SID=$ORACLE_SID by OS pid $*" \ >> $ORADBG_LOG_FILE


echo " "


echo "Running SQLPLUS to execute debug script"


sqlplus -s '/ as sysdba' << EOF


oradebug setmypid


oradebug -G all dump systemstate 267


EOF


echo " "


)





give execution privileges on the script. at the os level usingm and make sure the scripts belongs to the oracle user:





chmod 700 oradbg





Set the following parameter on all the instances of the RAC:





alter system set "_oradbg_pathname"='/oradbg' scope=spfile sid='*';
Where  is the directory where the oradbg script. is located.





Restart the database instances and verify the script. is working executing:





sqlplus / as sysdba


alter session set events='immediate debug';





This should show in the screen the following messages:





Running SQLPLUS to execute debug script.
Statement processed.


Statement processed.





And when completed, should generate the SYSTEMSTATE dump on the bdump directory of each instance. The trace will belong to the diag background process.





If everything is working then modify the procedure oerr_2049_sstate as follows to execute the diag script. instead of the systemstate dump as follows:





CREATE OR REPLACE PROCEDURE OERR_2049_SSTATE (run_minutes number,


   sec_b4_timeout number)


IS





  v_start_date date := sysdate;


  dist_lock_timeout number;


  flag integer := 1;


  v_wait number;


  v_run on_off.run%type := 'Y';





  cursor sess_wait is select sid, saddr, seconds_in_wait


  from v$session where event = 'enq: TX - row lock contention';





  v_seconds_in_wait number;


  v_sid number;


  v_lock_type varchar2(2);


  NODATA BOOLEAN := FALSE;





BEGIN





  execute immediate 'update on_off set run = ''Y''';


  execute immediate 'commit';





  dbms_output.put_line(' ');





  execute immediate 'alter session set nls_date_format = ''YYYY-MM-DD HH24:MI:SS''';


  dbms_output.put_line('Procedure started at ' || v_start_date);





  select to_number(value) into dist_lock_timeout from v$parameter where name = 'distributed_lock_timeout';





  dbms_output.put_line('Value of distributed_lock_timeout = ' || dist_lock_timeout);





  IF dist_lock_timeout < 10 THEN


    dbms_output.put_line('Value of distributed_lock_timeout is too small, exiting...');


    v_run := 'N';


  ELSE


     v_wait := dist_lock_timeout - sec_b4_timeout;


  END IF;





  IF dist_lock_timeout - sec_b4_timeout <= 5 THEN


    dbms_output.put_line('Value of sec_b4_timeout is too small, exiting...');


    v_run := 'N';


  END IF;





  WHILE flag < 4 and sysdate - v_start_date < (run_minutes/1440) and v_run = 'Y' LOOP


     FOR r in sess_wait LOOP


        v_seconds_in_wait := r.seconds_in_wait;


        IF v_seconds_in_wait > v_wait THEN


           v_sid := r.sid;


           BEGIN


              select type into v_lock_type from v$lock where sid = v_sid and type = 'DX';


              dbms_output.put_line('DX lock found...');


              EXCEPTION


                WHEN NO_DATA_FOUND THEN


                NODATA := TRUE;


           END;


           IF NODATA THEN


              NODATA := FALSE;


           END IF;


           IF v_lock_type = 'DX' THEN


              dbms_output.put_line('Dumping systemstate #' || flag || '...');


              execute immediate 'alter session set events ''immediate debug''';


              flag := flag + 1;


              dbms_lock.sleep(10);


           END IF;


        END IF;


      END LOOP;


   select run into v_run from on_off;


   commit;


   END LOOP;


   dbms_output.put_line('Procedure ended at ' || sysdate);


END;


/

Still have questions ?

To discuss this information further with Oracle experts and industry peers, we encourage you to review, join or start a discussion via My Oracle Support Streams and Distributed Database Community

来自 “ ITPUB博客 ” ,链接:http://blog.itpub.net/22308399/viewspace-749917/,如需转载,请注明出处,否则将追究法律责任。

转载于:http://blog.itpub.net/22308399/viewspace-749917/

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值