kill session 后 SMON rollback MMON(AWR, schedual jobs) MMNL介绍

Symptoms

A large transaction was killed.

Transaction Recovery is stuck and the database can appear to be hanging or hung:

select usn, state, undoblockstotal "Total", undoblocksdone "Done", undoblockstotal-undoblocksdone "ToDo",
decode(cputime,0,'unknown',sysdate+(((undoblockstotal-undoblocksdone) / (undoblocksdone / cputime)) / 86400)) "Estimated time to complete"
from v$fast_start_transactions;

      USN STATE                 Total       Done       ToDo Estimated time to complete            
---------- ---------------- ---------- ---------- ---------- -----------------------------         
       90 RECOVERED             15669      15669          0 01-OCT-2012 05:52:35                                               
       15 RECOVERING           174954       8137     166817 17-OCT-2012 12:32:07 <<<<<<<<<<                                                                                                             
GNTOUN35>/

      USN STATE                 Total       Done       ToDo Estimated time to complete         
---------- ---------------- ---------- ---------- ---------- ----------------------------- 
       90 RECOVERED             15669      15669          0 01-OCT-2012 05:52:39                                                   
       15 RECOVERING           174954       8137     166817 17-OCT-2012 12:33:33 <<<<<<<<<<<<<<<                      
GNTOUN35>/

      USN STATE                 Total       Done       ToDo Estimated time to complete    
---------- ---------------- ---------- ---------- ---------- -----------------------------             
       90 RECOVERED             15669      15669          0 01-OCT-2012 05:52:40   
       15 RECOVERING           174954       8137     166817 17-OCT-2012 12:33:54   <<<<< see no movement for this

 
SMON trace file shows:

service name: SYS$BACKGROUND
   Current Wait Stack:
    0: waiting for 'row cache lock'
       cache id=0x3, mode=0x0, request=0x5
       wait_id=44059 seq_num=44060 snap_id=1
       wait times: snap=4 min 31 sec, exc=4 min 31 sec, total=4 min 31 sec
       wait times: max=infinite, heur=4 min 31 sec
       wait counts: calls=92 os=92
       in_wait=1 iflags=0x15a2
   There is at least one session blocking this session.
     Dumping 1 direct blocker(s):
       inst: 1, sid: 1, ser: 1
     Dumping final blocker:
       inst: 1, sid: 1, ser: 1                     <=======
   Wait State:
     fixed_waits=0 flags=0x2a boundary=0x35cd32608/0
   Session Wait History:
       elapsed time of 0.000121 sec since current wait

 

Hanganalyze shows:

Chain 1:
-------------------------------------------------------------------------------
   Oracle session identified by:
   {
               instance: 1 X1
                  os id: 26192
             process id: 14, oracle@XXXX (SMON)
             session id: 1345
       session serial #: 1
   }
   is waiting for 'row cache lock' with wait info:
   {
                     p1: 'cache id'=0x3
                     p2: 'mode'=0x0
                     p3: 'request'=0x5
           time in wait: 51 min 58 sec
          timeout after: never
                wait id: 44188
               blocking: 0 sessions
            current sql: <none>
            short stack: ksedsts()+461<-ksdxfstk()+32<-ksdxcb()+1782<-sspuser()+112<-__restore_rt()<-semtimedop()+10
                         <-skgpwwait()+156<-ksliwat()+1821<-kslwaitctx()+162<-kqrget()+851<-kqrLockAndPinPo()+630
                         <-kqrpre1()+765<-kqrpre()+19<-kturRecoverActiveTxns()+3677<-ktprbeg()+313<-ktmmon()+13593
                         <-ktmSmonMain()+206<-ksbrdp()+923<-opirip()+623<-opidrv()+603<-sou2o()+103<-opimai_real()+266
                         <-ssthrdmain()+214<-main()+201<-__libc_start_main()+244<-_start()+36
           wait history:
             * time between current wait and wait #1: 0.000064 sec
             1.       event: 'db file sequential read'
                time waited: 0.001194 sec
                    wait id: 44187           p1: 'file#'=0x299
                                             p2: 'block#'=0xef544
                                             p3: 'blocks'=0x1
             * time between wait #1 and #2: 0.000138 sec
             2.       event: 'db file sequential read'
                time waited: 0.000609 sec
                    wait id: 44186           p1: 'file#'=0x299
                                             p2: 'block#'=0xef547
                                             p3: 'blocks'=0x1
             * time between wait #2 and #3: 0.000164 sec
             3.       event: 'db file sequential read'
                time waited: 0.003346 sec
                    wait id: 44185           p1: 'file#'=0x299
                                             p2: 'block#'=0xef54a
                                             p3: 'blocks'=0x1
   }
   and is blocked by
=> Oracle session identified by:
   {
               instance: 1 (X1)
                  os id: 26196
             process id: 16, oracle@XXXX (MMON)
             session id: 1                     <===============
       session serial #: 1                     <===============

Changes

Cause

Transaction recovery was blocked by MMON process and the hanganalyze trace file shows the call stack details:

ktucloGetTxTableUndoRec()+879<-ktucloGetMinPrepStartTime()+453<-ktucloUsMinScn()


If ktucloGetTxTableUndoRec is present in the call stack then it indicates that cleanout optimization feature is enabled and going on
 

Solution

Identify the current value of _smu_debug_mode.
Note it down.


Disable the cleanout optimization feature by setting the following parameter:

  _smu_debug_mode=134217728    (which is 0x8000000)


Dynamically you can do this via:

  alter system set "_smu_debug_mode"=134217728;


Then identify the pid of MMON and kill it using:

  kill -9 <pid of mmon>


Note that killing mmon will not terminate the instance and the new mmon process which starts will use _smu_debug_mode=134217728.


Once the transaction recovery is finished, return _smu_debug_mode to the previous value.

Symptoms

  • MMON process died and subsequent AWR reports are therefore not getting generated
  • ps -ef |grep mmon shows that MMON is not running:

    orcl2:/home/oracle>ps -ef |grep mmon
    oracle    2708  2689  0 14:03 pts/0    00:00:00 grep mmon

  • MMON process should automatically be respawned but it is not
  • Alert log shows following messages:

    IPC Send timeout detected. Sender: ospid 5964414 [oracle@rdcgtsdelad03p (MMON)]
    Receiver: inst 2 binc 1478883330 ospid 3212084
    minact-scn: got error during useg scan e:12751 usn:11
    minact-scn: useg scan erroring out with error e:12751
    IPC Send timeout detected. Sender: ospid 5964414 [oracle@rdcgtsdelad03p (MMON)]

Cause

MMON/MMNL not being respawned or recreated is due to following unpublished bug:

Bug 19565533 - MMON/MMNL CAN NOT BE AUTOMATICALLY RESTARTED AFTER KILLED

The issue only occurs on 11.2.0.4 and is a regression of the fix for Bug 14213856
 

Solution

To solve the issue:

  • Apply interim patch 19565533 for 11.2.0.4, if available for your platform.

    To check for conflicting patches, please use the MOS Patch Planner Tool:
     

    Note 1317012.1 - How To Use MOS Patch Planner To Check And Request The Conflict Patches?


    If no patch exists for your version, please contact Oracle Support for a backport request.

    - OR -
     
  • As a workaround, respawn the MMON process by following methods:
     
    • Restart the database instance

      - OR -
       
    • Manageability Monitor Processes

      The Manageability Monitor Processes (MMON and MMNL) perform various background tasks that maintain the different data required to automate manageability functions.

    • The manageability monitor process (MMON) performs tasks related to the Automatic Workload Repository (AWR). This includes gathering statistical information on the database, collecting AWR snapshots, initiating various automated maintenance jobs(包不包含用户定义的jobs) and generating warning messages for tasks that have exceeded their threshold values.
    • The manageability monitor lite process (MMNL) performs tasks related to Active Session History (ASH).
    • For more details of their functions see:

      Oracle Database Online Documentation 12c Release 1 (12.1) / Database Administration
      Database Concepts
      Chapter 15 Process Architecture
      Manageability Monitor Processes (MMON and MMNL)
      Process Architecture

      Both Processes will consume some CPU at various points in the day depending on how much AWR/ASH activity is in progress. The amount is variable based upon the activity being performed but will typically be < 5% of the total CPU.

      Perceived Problems Associated with (but not caused by) MMON/MMNL

      Since they are always running, MMON/MMNL may issue warnings if the detect criteria exceeding certain threshold values. Alert log messages such as the following may be generated that reference or write information to MMON/MMON worker traces:

      minact-scn: useg scan erroring out with error e:12751

      minact-scn: got error during useg scan e:1555 usn:10
      minact-scn: useg scan erroring out with error e:1555

      Or

      Errors in file /home/oracle/admin/diag/rdbms/trace/<SID>_m000_8331.trc:
      ORA-12751: cpu time or run time policy violation

      In the vast majority of cases, these are not usually problems with MMON and are typically an indicator of a problem elsewhere on the system. In these cases, investigate the error message and it’s cause as opposed to MMON itself which is simply relaying the message.

      Additionally, being as MMNL, MMON and it's slaves are background processes, they are susceptible to trigger errors such as "ORA-00445: Background Process "xxxx" Did Not Start After n Seconds" simply because they detect problems rather than initiating them. For approaches to alleviate this sort of problem see:

      Document 1379200.1 Troubleshooting Guide (TSG) - Ksvcreate: Process(xxxx) Creation Failed / ORA-00445: Background Process "xxxx" Did Not Start After n Seconds

      Set the instance to "restricted session" mode, then bring it back to normal mode using following commands as SYSDBA:
        

      alter system enable restricted session;
      alter system disable restricted session;

Applies to:

Oracle Database - Enterprise Edition - Version 11.2.0.4 and later
Information in this document applies to any platform.

Symptoms

- V$ACTIVE_SESSION_HISTORY and DBA_HIST_ACTIVE_SESS_HISTORY are empty.
- V$ASH and X$ASH are empty.
- The parameter statistics_level is set to TYPICAL.
- The parameter _ASH_ENABLE=true and _ASH_SIZE is set to Non-zero.
- MMON and MMNL are running. their trace files
- No errors in database Alert log.

Cause

 The parameter CONTROL_MANAGEMENT_PACK_ACCESS is set to None.

Solution

Set CONTROL_MANAGEMENT_PACK_ACCESS to 'DIAGNOSTIC+TUNING' using below command:

sqlplus / as sysdba
alter system set control_management_pack_access='DIAGNOSTIC+TUNING';

References


NOTE:1529808.1 - Database Not Generating ASH Reports

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值