Master Note: Troubleshooting Database Transaction Recovery (Doc ID 1494886.1)

In this Document

 Purpose
 Details
 Concepts
 12c Multitnenant Database:
 Modes of Transaction Recovery
 Parallel Transaction Recovery
 Serial Transaction Recovery
 Detection of Transaction Recovery
 Identify Dead Transactions and their Sizes
 Identify undo segments containing dead transactions
 Identify the Time for transaction recovery to complete
 Query to identify the number of parallel Recovery Slaves
 Identify Objects involved in Transaction Recovery
 Data Dictionary Tables
 Dump the UNDO block
 Disable Transaction Recovery
 Suggestions
 Information to collect for Oracle Support
 References

APPLIES TO:

Oracle Database - Enterprise Edition - Version 9.2.0.8 to 12.1.0.1 [Release 9.2 to 12.1]
Information in this document applies to any platform.

PURPOSE

 To discuss the concepts, modes and detection of Database Transaction recovery.

DETAILS

Concepts

Transaction recovery involves rolling back all uncommitted transactions of a failed instance. These are "in-progress" transactions that did not commit and that Oracle needs to undo. It is possible for uncommitted transactions to get saved to disk. In this case, Oracle uses undo data to reverse the effects of any changes that were written to the datafiles but not yet committed.

Oracle applies undo blocks to roll back uncommitted changes in data blocks that were either written before the crash or introduced by redo application during cache recovery. This normally happens during the Roll backward phase when the DB is restarted. Transaction Recovery can be performed by either the Server process which initiated the Transaction or the SMON process (in case the Server process is dead).

SMON process takes over the recovery when

  •  Server process is dead / crashed.
  •  Instance itself is crashed

Undo blocks (whether in rollback segments or automatic undo tablespaces) record database actions that may need to be undone during certain database operations. In database recovery, the undo blocks roll back the effects of uncommitted transactions previously applied by the rolling forward phase.

After the roll forward, any changes that were not committed must be undone. Oracle applies undo blocks to roll back uncommitted changes in data blocks that were either written before the failure or introduced by redo application during cache recovery. This process is called rolling back or transaction recovery.

 

              Description of Figure 15-3 follows

                        Fig 1: Basic Recovery Steps: Rolling Forward and Rolling Back

 More details about Roll forward and Roll backward in "Instance Recovery" section in Doc ID 1505155.1

12c Multitnenant Database:

Please note that : Oracle Database performs crash and instance recovery for the entire multitenant container database (CDB). You cannot recover individual pluggable databases (PDBs).

 

To perform crash and instance recovery for a CDB:

  • Open a SQL client such as SQL*Plus.
  • Connect to the root as a user with the ALTER PLUGGABLE DATABASE system privilege.
  • Follow the procedures in "Performing Closed Database Recovery".
  • If you do not want to recover a particular PDB, take its files offline.


Symptoms


  • High CPU Utilization by SMON process
  • Database may hang during large transaction recovery.
  • If Database is shutdown abort, then the database may hang during consequent startup.
  • Database repeatedly crashes while open.
  • Database internal errors which could result in transaction recovery failure.

 

Modes of Transaction Recovery


Parallel Transaction Recovery

Recovery occurs in parallel mode. Several parallel slave processes will be spawned and will be involved in recovery. This is also termed as Fast Start Parallel Rollback. The background process SMON acts as a coordinator and rolls back a set of transactions in parallel using multiple server processes. To enable Parallel recovery mode, set the parameter FAST_START_PARALLEL_ROLLBACK to LOW / HIGH.

Values:

  • FALSE - Parallel rollback is disabled
  • LOW - Limits the maximum degree of parallelism to 2 * CPU_COUNT
  • HIGH -Limits the maximum degree of parallelism to 4 * CPU_COUNT

If you change the value of this parameter, then transaction recovery will be stopped and restarted with the new implied degree of parallelism. Fast-start parallel rollback is mainly useful when a system has transactions that run a long time before committing, especially parallel INSERT, UPDATE, and DELETE operations. When SMON discovers that the amount of recovery work is above a certain threshold, it automatically begins parallel rollback by dispersing the work among several parallel processes: process 1 rolls back one transaction, process 2 rolls back a second transaction, and so on. The threshold is the point at which parallel recovery becomes cost-effective, in other words, when parallel recovery takes less time than serial recovery.

Oracle® Database Reference 11g Release 2 (11.2)
Part Number E25513-02

http://docs.oracle.com/cd/E11882_01/server.112/e25513/initparams089.htm#REFRN10059

Oracle® Database Concepts 12c Release 1 (12.1)
E17633-21

http://docs.oracle.com/cd/E16655_01/server.121/e17633/startup.htm#CNCPT1301

Serial Transaction Recovery
This mode recovers the transaction sequentially. Many of the times, serial transaction recovery will be faster. Setting the FAST_START_PARALLEL_ROLLBACK parameter to false will enable the serial transaction recovery.

Detection of Transaction Recovery


Identify Dead Transactions and their Sizes

select ktuxeusn  USN, ktuxeslt Slot, ktuxesqn  Seq, ktuxesta State, ktuxesiz Undo 
from x$ktuxe  
where ktuxesta <> 'INACTIVE'  
and ktuxecfl like '%DEAD%' 
order by ktuxesiz  asc;

===========

ktuxeusn –  Undo Segment Number

ktuxeslt   –  Slot number

ktuxesqn –  Sequence

ktuxesta  –  State

ktuxesiz  –  Undo Blocks Remaining

ktuxecfl   –  Flag

=========== 

  USN       SLOT        SEQ         STATE          UNDO

-------    ----------    -------     ----------    ----------

     9            9         335         ACTIVE         10337

 

Identify undo segments containing dead transactions

select  useg.segment_name, useg.segment_id, useg.tablespace_name, useg.status
from dba_rollback_segs useg 
where useg.segment_id  in (select unique ktuxeusn 
                                        from x$ktuxe 
                                        where ktuxesta <> 'INACTIVE' 
                                        and ktuxecfl like '%DEAD%');

SEGMENT_NAME     SEGMENT_ID          TABLESPACE_NAME                STATUS

--------------------   -----------------     ---------------------------          ------------

_SYSSMU9$                     9                      UNDOTBS1                         ONLINE


select  usn,name 
from v$rollname 
where usn in (select unique ktuxeusn 
                    from x$ktuxe 
                    where ktuxesta <> 'INACTIVE' 
                    and ktuxecfl like '%DEAD%');

 

Identify the Time for transaction recovery to complete

select usn, state, undoblockstotal "Total", undoblocksdone "Done", undoblockstotal-undoblocksdone "ToDo", 
         decode(cputime,0,'unknown',sysdate+(((undoblockstotal-undoblocksdone) / (undoblocksdone / cputime)) / 86400))  "Estimated time to complete"  
from v$fast_start_transactions;

 

  USN      STATE               Total       Done      ToDo      Estimated time to Complete

-------- ----------------     --------   --------   --------    ---------------------------------

    5    RECOVERING        16207     14693      1514     11-may-2012 08:05:40

    4    RECOVERED          7453       7453         0        11-may-2012 08:05:16

    9    RECOVERED         10337      10337       0         11-may-2012 08:05:16

 

Query to identify the number of parallel Recovery Slaves

select * from v$fast_start_servers;

STATE                UNDOBLOCKSDONE        PID         XID

-----------          --------------------------    ---------- ----------------

RECOVERING              54                        18       001F005C00001BD6

RECOVERING               0                         20      001F005C00001BD6

RECOVERING               0                         21      001F005C00001BD6

RECOVERING               0                         22      001F005C00001BD6

Column STATE shows the state of the server being IDLE or RECOVERING. If only 1 process is in state RECOVERING while the other processes are in state IDLE, then you should disable Parallel Transaction Recovery. Normally these queries will show the progress of the transaction recovery.

If there is no progress in the number of blocks, then we can consider disabling the parallel recovery and let the recovery continue in serial transaction.

Identify Objects involved in Transaction Recovery


Data Dictionary Tables

select decode(px.qcinst_id,NULL,username,' - '||lower(substr(s.program,length(s.program)-4,4) ) ) "Username",
         decode(px.qcinst_id,NULL, 'QC', '(Slave)') "QC/Slave" ,
         to_char( px.server_set) "Slave Set",
         to_char(s.sid) "SID",
         decode(px.qcinst_id, NULL ,to_char(s.sid) ,px.qcsid) "QC SID", 
         px.req_degree "Requested DOP", 
         px.degree "Actual DOP" 
from gv$px_sessionpx, gv$session s 
where px.sid=s.sid (+) 
and px.serial#=s.serial# 
order by 5 , 1 desc ;

 

ORA User       QC/Slav     Slave Set   SID   QC SID     Requested DOP    Actual DOP

----------      -----------     ----------  ------   --------   -------------------  -----------------

                       QC                          159     159    

 -            p007   (Slave)        1         137     159                 8                        8

 -            p006   (Slave)        1         138     159                 8                        8

 -            p005   (Slave)        1         135     159                 8                        8

 -            p004   (Slave)        1         132     159                 8                        8

 -            p003   (Slave)        1         136     159                 8                        8

 -            p002   (Slave)        1         130     159                 8                        8

 -            p001   (Slave)        1         133     159                 8                        8

 -            p000   (Slave)        1         145     159                 8                        8 

 

select distinct current_obj#
from v$active_session_history 
where SESSION_ID=145 
and SESSION_SERIAL#=15;   --  (This Query to be run multiple times and you should be able to see the Current Object id consistently)

 

 CURRENT_OBJ#

-------------------

       77211

 

select Object_name, object_type 
from dba_objects 
where object_id=77211;

 

OBJECT_NAME  OBJECT_TYPE

---------------  ----------------

EMPLOYEES          TABLE

 

Note: This method helps when the recovery is Parallel Transaction Recovery.

Dump the UNDO block

1.Identify the UNDO name using the undo segment name and transaction details.

select sysdate, b.name useg, a.ktuxeusn xidusn, a.ktuxeslt xidslot, a.ktuxesqn xidsqn 
from x$ktuxe a, undo$ b 
where a.ktuxesta = 'ACTIVE' 
and a.ktuxecfl like '%DEAD%' 
and a.ktuxeusn = b.us#;


2 Dump the Undo Block

Alter system dump undo block ‘<undo seg name>’ xid <xidusn> <xidslot> <xidsqn>;

Eg:

sqlplus / as sysdba
oradebug setmypid
Alter system dump undo block ‘_SYSSMU16$’ xid 16 21 1408186;
oradebug tracefile_name
exit

3 Upload the resultant Trace file to Oracle Support.
   This will help Oracle Support to review the UNDO dump and identify the Object ID.

 

Disable Transaction Recovery


  • Disabling Transaction recovery could temporarily open up the database.
  • SMON will stop performing all the transaction recovery.
  • There is no way to stop only one Transaction recovery / few transaction recoveries.
  • When the transaction recovery is disabled, any data modified by this dead transaction but required by other transactions will be recovered on "as needed" basis.
  • We always suggest to disable the transaction recovery only with the help of Oracle Support. 

NOTE: Oracle does not recommend to Disable the Transaction recovery and open the Database for Business Use.

Suggestions


  • Check the long running Transactions before shutting down the Database
  • DBA and Application team to be aware of any long running Jobs.
  • Plan for the downtime and inform application team accordingly so that they do not schedule business critical / long running jobs.
  • Do not Panic. Transaction recovery is part of Database normal Operations

Information to collect for Oracle Support


1) Complete Database Alert log (at least logs from last 2sucessful startups)

2) SMON Trace file (if available)

3) Undo Block dump Trace file (if taken).

4) Output of following Queries (3 outputs in intervals of at least 5 min)

=============

spool /tmp/transaction_recovery.html
set markup html on
set time on
alter session set nls_date_format='dd-mon-yyyy hh24:mi:ss';
select sysdate from dual;
show parameter fast_start_parallel_rollback
select ktuxeusn USN, ktuxeslt Slot, ktuxesqn Seq, ktuxesta State,ktuxesiz Undo from x$ktuxe where ktuxesta <> 'INACTIVE' and ktuxecfl like '%DEAD%' order by ktuxesiz asc;
select usn, state, undoblockstotal "Total", undoblocksdone "Done", undoblockstotal-undoblocksdone "ToDo", decode(cputime,0,'unknown',sysdate+(((undoblockstotal-undoblocksdone) / (undoblocksdone / cputime)) / 86400)) 
"Estimated time to complete" from v$fast_start_transactions; 
select * from v$fast_start_servers;
select ktuxeusn, to_char(sysdate,'DD-MON-YYYY HH24:MI:SS') "Time", ktuxesiz, ktuxesta from x$ktuxe where ktuxecfl = 'DEAD';
set markup html off
spool off

=============

Additional Resources


Note 238507.1 How to Disable Parallel Transaction Recovery When Parallel Txn Recovery is Active 
Note 144332.1 Parallel Rollback may hang database, Parallel query servers get 100% cpu 
Note 414242.1 Database Hangs Because SMON Is Taking 100% CPU Doing Transaction Recovery
Note 416823.1 Diagnoing BUG 5016142 enqueue ENQ - SS contention 
Note 461480.1 FAQ – Automatic Undo Management (AUM) / System Managed Undo (SMU)
Note 266159.1 SMON: ABOUT TO RECOVER UNDO SEGMENT %s messages in alert log 
Note 1060831.6 Rollback Is Taking A Long Time After Session Killed 
Note 11790175.8 Bug 11790175 - SMON spins in rollback of LOB due to double allocated block like bug 11814891 
Note 8240762.8 Bug 8240762 - Undo corruptions with ORA-600 [4193]/ORA-600 [4194] or ORA-600 [4137] / SMON may spin to recover transaction 
Note 11902008.8 Bug 11902008 - SMON may crash with ORA-00600 [kcbgcur_3] or ORA-600 [kcbnew_3] during Transaction recovery

 

REFERENCES

NOTE:1506115.1  - Master Note: Oracle Transaction Management (Local) Overview
NOTE:1505155.1  - Master Note: Overview of Database Startup and Shutdown
NOTE:68932.1  - Init.ora Parameter "FAST_START_PARALLEL_ROLLBACK" Reference Note
  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值