kill session 后 SMON rollback MMON(AWR， schedual jobs) MMNL介绍

jnrjian

已于 2024-09-11 12:36:10 修改

阅读量109

点赞数 2

文章标签： oracle

于 2024-09-10 17:18:29 首次发布

本文链接：https://blog.csdn.net/jnrjian/article/details/142104085

版权

Symptoms

A large transaction was killed.

Transaction Recovery is stuck and the database can appear to be hanging or hung:

select usn, state, undoblockstotal "Total", undoblocksdone "Done", undoblockstotal-undoblocksdone "ToDo",
decode(cputime,0,'unknown',sysdate+(((undoblockstotal-undoblocksdone) / (undoblocksdone / cputime)) / 86400)) "Estimated time to complete"
from v$fast_start_transactions;

      USN STATE                 Total       Done       ToDo Estimated time to complete
---------- ---------------- ---------- ---------- ---------- -----------------------------
       90 RECOVERED             15669      15669          0 01-OCT-2012 05:52:35
       15 RECOVERING           174954       8137     166817 17-OCT-2012 12:32:07 <<<<<<<<<<
GNTOUN35>/

      USN STATE                 Total       Done       ToDo Estimated time to complete
---------- ---------------- ---------- ---------- ---------- -----------------------------
       90 RECOVERED             15669      15669          0 01-OCT-2012 05:52:39
       15 RECOVERING           174954       8137     166817 17-OCT-2012 12:33:33 <<<<<<<<<<<<<<<
GNTOUN35>/

      USN STATE                 Total       Done       ToDo Estimated time to complete
---------- ---------------- ---------- ---------- ---------- -----------------------------
       90 RECOVERED             15669      15669          0 01-OCT-2012 05:52:40
       15 RECOVERING           174954       8137     166817 17-OCT-2012 12:33:54   <<<<< see no movement for this

SMON trace file shows:

service name: SYS$BACKGROUND
   Current Wait Stack:
    0: waiting for 'row cache lock'
       cache id=0x3, mode=0x0, request=0x5
       wait_id=44059 seq_num=44060 snap_id=1
       wait times: snap=4 min 31 sec, exc=4 min 31 sec, total=4 min 31 sec
       wait times: max=infinite, heur=4 min 31 sec
       wait counts: calls=92 os=92
       in_wait=1 iflags=0x15a2
   There is at least one session blocking this session.
     Dumping 1 direct blocker(s):
       inst: 1, sid: 1, ser: 1
     Dumping final blocker:
       inst: 1, sid: 1, ser: 1                     <=======
   Wait State:
     fixed_waits=0 flags=0x2a boundary=0x35cd32608/0
   Session Wait History:
       elapsed time of 0.000121 sec since current wait

Hanganalyze shows:

Chain 1:
-------------------------------------------------------------------------------
   Oracle session identified by:
   {
               instance: 1 X1
                  os id: 26192
             process id: 14, oracle@XXXX (SMON)
             session id: 1345
       session serial #: 1
   }
   is waiting for 'row cache lock' with wait info:
   {
                     p1: 'cache id'=0x3
                     p2: 'mode'=0x0
                     p3: 'request'=0x5
           time in wait: 51 min 58 sec
          timeout after: never
                wait id: 44188
               blocking: 0 sessions
            current sql: <none>
            short stack: ksedsts()+461<-ksdxfstk()+32<-ksdxcb()+1782<-sspuser()+112<-__restore_rt()<-semtimedop()+10
                         <-skgpwwait()+156<-ksliwat()+1821<-kslwaitctx()+162<-kqrget()+851<-kqrLockAndPinPo()+630
                         <-kqrpre1()+765<-kqrpre()+19<-kturRecoverActiveTxns()+3677<-ktprbeg()+313<-ktmmon()+13593
                       <-ktmSmonMain()+206<-ksbrdp()+923<-opirip()+623<-opidrv()+603<-sou2o()+103<-opimai_real()+266
                       <-ssthrdmain()+214<-main()+201<-__libc_start_main()+244<-_start()+36
           wait history:
             * time between current wait and wait #1: 0.000064 sec
             1.       event: 'db file sequential read'
                time waited: 0.001194 sec
                    wait id: 44187           p1: 'file#'=0x299
                                             p2: 'block#'=0xef544
                                             p3: 'blocks'=0x1
             * time between wait #1 and #2: 0.000138 sec
             2.       event: 'db file sequential read'
                time waited: 0.000609 sec
                    wait id: 44186           p1: 'file#'=0x299
                                             p2: 'block#'=0xef547
                                             p3: 'blocks'=0x1
             * time between wait #2 and #3: 0.000164 sec
             3.       event: 'db file sequential read'
                time waited: 0.003346 sec
                    wait id: 44185           p1: 'file#'=0x299
                                             p2: 'block#'=0xef54a
                                             p3: 'blocks'=0x1
   }
   and is blocked by
=> Oracle session identified by:
   {
               instance: 1 (X1)
                  os id: 26196
             process id: 16, oracle@XXXX (MMON)
             session id: 1                     <===============
       session serial #: 1                     <===============

Changes

Cause

Transaction recovery was blocked by MMON process and the hanganalyze trace file shows the call stack details:

ktucloGetTxTableUndoRec()+879<-ktucloGetMinPrepStartTime()+453<-ktucloUsMinScn()

If ktucloGetTxTableUndoRec is present in the call stack then it indicates that cleanout optimization feature is enabled and going on

Solution

Identify the current value of _smu_debug_mode.
Note it down.

Disable the cleanout optimization feature by setting the following parameter:

_smu_debug_mode=134217728 (which is 0x8000000)

Dynamically you can do this via:

alter system set "_smu_debug_mode"=134217728;

Then identify the pid of MMON and kill it using:

kill -9 <pid of mmon>

Note that killing mmon will not terminate the instance and the new mmon process which starts will use _smu_debug_mode=134217728.

Once the transaction recovery is finished, return _smu_debug_mode to the previous value.

Symptoms

MMON process died and subsequent AWR reports are therefore not getting generated
ps -ef |grep mmon shows that MMON is not running:
orcl2:/home/oracle>ps -ef |grep mmon
oracle 2708 2689 0 14:03 pts/0 00:00:00 grep mmon
MMON process should automatically be respawned but it is not
Alert log shows following messages:
IPC Send timeout detected. Sender: ospid 5964414 [oracle@rdcgtsdelad03p (MMON)]
Receiver: inst 2 binc 1478883330 ospid 3212084
minact-scn: got error during useg scan e:12751 usn:11
minact-scn: useg scan erroring out with error e:12751
IPC Send timeout detected. Sender: ospid 5964414 [oracle@rdcgtsdelad03p (MMON)]

Cause

MMON/MMNL not being respawned or recreated is due to following unpublished bug:

Bug 19565533 - MMON/MMNL CAN NOT BE AUTOMATICALLY RESTARTED AFTER KILLED

The issue only occurs on 11.2.0.4 and is a regression of the fix for Bug 14213856

Solution

To solve the issue:

Apply interim patch 19565533 for 11.2.0.4, if available for your platform.

To check for conflicting patches, please use the MOS Patch Planner Tool:

Note 1317012.1 - How To Use MOS Patch Planner To Check And Request The Conflict Patches?

If no patch exists for your version, please contact Oracle Support for a backport request.

- OR -
As a workaround, respawn the MMON process by following methods:
- Restart the database instance
  
  - OR -
- Manageability Monitor Processes
  
  The Manageability Monitor Processes (MMON and MMNL) perform various background tasks that maintain the different data required to automate manageability functions.
- The manageability monitor process (MMON) performs tasks related to the Automatic Workload Repository (AWR). This includes gathering statistical information on the database, collecting AWR snapshots, initiating various automated maintenance jobs(包不包含用户定义的jobs) and generating warning messages for tasks that have exceeded their threshold values.
- The manageability monitor lite process (MMNL) performs tasks related to Active Session History (ASH).
- For more details of their functions see:
  
  Oracle Database Online Documentation 12c Release 1 (12.1) / Database Administration
  Database Concepts
  Chapter 15 Process Architecture
  Manageability Monitor Processes (MMON and MMNL)
  Process Architecture
  
  Both Processes will consume some CPU at various points in the day depending on how much AWR/ASH activity is in progress. The amount is variable based upon the activity being performed but will typically be < 5% of the total CPU.
  
  Perceived Problems Associated with (but not caused by) MMON/MMNL
  
  Since they are always running, MMON/MMNL may issue warnings if the detect criteria exceeding certain threshold values. Alert log messages such as the following may be generated that reference or write information to MMON/MMON worker traces:
  
  minact-scn: useg scan erroring out with error e:12751
  
  minact-scn: got error during useg scan e:1555 usn:10
  minact-scn: useg scan erroring out with error e:1555
  
  Or
  
  Errors in file /home/oracle/admin/diag/rdbms/trace/<SID>_m000_8331.trc:
  ORA-12751: cpu time or run time policy violation
  
  In the vast majority of cases, these are not usually problems with MMON and are typically an indicator of a problem elsewhere on the system. In these cases, investigate the error message and it’s cause as opposed to MMON itself which is simply relaying the message.
  
  Additionally, being as MMNL, MMON and it's slaves are background processes, they are susceptible to trigger errors such as "ORA-00445: Background Process "xxxx" Did Not Start After n Seconds" simply because they detect problems rather than initiating them. For approaches to alleviate this sort of problem see:
  
  Document 1379200.1 Troubleshooting Guide (TSG) - Ksvcreate: Process(xxxx) Creation Failed / ORA-00445: Background Process "xxxx" Did Not Start After n Seconds
  Set the instance to "restricted session" mode, then bring it back to normal mode using following commands as SYSDBA:
  
  alter system enable restricted session;
  alter system disable restricted session;

Applies to:

Oracle Database - Enterprise Edition - Version 11.2.0.4 and later
Information in this document applies to any platform.

Symptoms

- V$ACTIVE_SESSION_HISTORY and DBA_HIST_ACTIVE_SESS_HISTORY are empty.
- V$ASH and X$ASH are empty.
- The parameter statistics_level is set to TYPICAL.
- The parameter _ASH_ENABLE=true and _ASH_SIZE is set to Non-zero.
- MMON and MMNL are running. their trace files
- No errors in database Alert log.