kill session 后一直在回滚中的处理 os kill 以及后续rollback

jnrjian

已于 2024-07-14 14:18:39 修改

阅读量933

点赞数 20

文章标签： oracle

于 2024-01-04 14:39:42 首次发布

本文链接：https://blog.csdn.net/jnrjian/article/details/135383670

版权

本文讨论了Oracle数据库中处理已杀死会话时可能出现的undosegment问题，以及在遇到资源忙错误时如何找到锁定原因并解决，包括在OS级别和使用ALTERSYSTEMKillSession的差异。

摘要由CSDN通过智能技术生成

APPLIES TO:

Oracle Database - Enterprise Edition - Version 19.17.0.0.0 and later
Information in this document applies to any platform.

SYMPTOMS

USERNAME SID SERIAL# UNDO_BLK USED_UNDO_RECORD SEGMENT_NAME RSSIZE STATUS
-------------------- ----- ------- ---------- ---------------- ------------------------------------------------------------------------------------------ ---------- ---------------------------------------------
TDS 811 37426 1 1 _SYSSMU5_991949268$ 67231744 ONLINE

CHANGES

None

CAUSE

The session remained in its KILLED state as associated OS SPID still present

SOLUTION

By finding the OS spid you can then force the process to be killed at OS level which will clear the session holding undo rollback segment.

Should Sessions be Killed in OS or Using Alter System Kill Session? (Doc ID 161794.1)

编辑To Bottom

APPLIES TO:

Oracle Database - Enterprise Edition - Version 8.1.7.0 to 11.2.0.2 [Release 8.1.7 to 11.2]
Oracle Database Cloud Schema Service - Version N/A and later
Gen 1 Exadata Cloud at Customer (Oracle Exadata Database Cloud Machine) - Version N/A and later
Oracle Database Exadata Express Cloud Service - Version N/A and later
Oracle Cloud Infrastructure - Database Service - Version N/A and later
Information in this document applies to any platform.

GOAL

Should Sessions be killed in OS or using alter system Kill Session?

SOLUTION

fix:

Do not kill the sessions at the OS level.

Use ALTER SYSTEM KILL SESSION 'sid, serial#';   

This acts in the following manner:

1. It terminates a session, rolls back ongoing transactions, releases all session locks, frees all session resources.

2. If the session is performing some activity that must be completed (eg waiting for a reply from a remote database or rolling back a transaction), Oracle waits for this activity to complete, kills the session then returns control.

3).If the wait lasts for 60 seconds then Oracle marks the session to be killed, and returns control with a message that the session is marked to be killed. It then gets killed when the activity is complete.

When trying to drop or truncate a table, or create or alter an index of a table, you receive the following error message:

ORA-00054: resource busy and acquire with NOWAIT specified
Cause: Resource interested is busy
Action: Retry if necessary

SOLUTION

You need to analyze who has a lock on the table being dropped/truncated, or on the table
whose index is being created/alter-ed, and if it is valid. Do the following:

1. Go to svrmgrl and connect internal.

2. select object_id from dba_objects where object_name='<tablename>';

3. select * from v$locked_object where object_id=<id number);
Note the "oracle_username" and "session_id".

4. Or you can query v$access

select sid from v$access where owner='<table owner>' and object='<table name>';
Note the session id number or "sid".

5. select sid, serial#, command, taddr from v$session where sid=<session id number>;

6. Now you have identified the user and what they are doing.

Investigation into the validity of this session needs to be made. Many times it may be a job
that ran before or a hanging query.
If it is determined that this session needs to be terminated, go to step 7, or else wait until
the user has completed the action.
To find out what they are doing, look at the command number in the COMMAND column and match it
with the Command Number Values Table in either the Oracle 7 or 8 Server Reference manual listed
under the entry for "V$SESSION".

7. To terminate the session:
alter system kill session '<sid, serial#>';

8. The session should now be killed and the lock SHOULD release. Rechecking "v$locked_object" will tell you this.
If the lock does not immediately release, there may be a rollback occuring. To check this, goto step nine,
else dropping the table should work now.

9. To check for rollback:
select used_ublk from v$transaction where ADDR=<value from TADDR in v$session>;

If there is a value there, this is the number of undo blocks used by the transaction. Wait one minute and again
select "used_ublk" from "v$transaction" where ADDR=<value from TADDR in v$session>;.
Note the value. If it is decreasing, a rollback is occuring and based on the difference between these values,
you can "guesstimate" the time required to complete the rollback. For example, if the first query returns a value
of 80000 and the second one returns 70000, it took 1 minute to rollback 10000 blocks.
Based on this number, you can guestimate the time to complete the rollback. In this case, it would be 7 minutes.

10. In some cases, you might need to kill the session at the OS level as well. Some 3rd party database monitoring
software and certain unexplained situations will cause the lock to remain despite not showing up in "v$locked_object"
after killing the session.

Explanation
-----------
Dropping or truncating a table requires you to acquire an exclusive lock on the table. A table is a "busy" resource
if there are other sessions modifying or holding a lock on the same table. Furthermore, drop user cascade can hit this error
as it is implicitly dropping objects that might be referenced by other users.

Doing a CREATE INDEX, DROP INDEX and ALTER INDEX REBUILD will lock the table with a Share Table Lock (S). No DML operations
are permitted on the base table. Note, this is true for pre-8.1. For 8.1, if the ONLINE key word is included as part of the
CREATE INDEX or ALTER INDEX....REBUILD commands, the table is locked with a Row Share Table Lock (RS). Other users will be
able to continue to access the table as normal except for any DDL operations. DML operations are permitted.

--------------------操作系统上kill session，释放lock，但是rollback还是要做的------------

Database Hangs Because SMON Is Taking 100% CPU Doing Transaction Recovery (Doc ID 414242.1)

After killing a large running transaction, either by killing the process or aborting the database, the database may appear to hang and/or the SMON background process is taking all the available cpu. Basically SMON is in the progress of rolling back this terminated large transaction, which can be time consuming. As a result of this users often try to restart the database, but because a shutdown immediate will hang, the database is often aborted. Please note that stopping the database will actually make things worse and will not decrease the amount of work SMON needs to perfom to complete the rollback.

The message 'Waiting for smon to disable tx recovery' will be posted in the alert log as well.

You may also see alert log messages:

SMON: about to recover undo segment 12
SMON: mark undo segment 12 as available
SMON: about to recover undo segment 12
*** 2009-11-12 10:08:11.389
SMON: mark undo segment 12 as available

Status column of v$fast_start_transactions is showing as Recovering with long date to complete.

SQL> select * from V$FAST_START_TRANSACTIONS ;

USN SLT SEQ STATE UNDOBLOCKSDONE UNDOBLOCKSTOTAL PID CPUTIME PARENTUSN PARENTSLT PARENTSEQ XID PXID RCVSERVERS CON_ID
---------- ---------- ---------- ---------------- -------------- --------------- ---------- ---------- ---------- ---------- ---------- ---------------- ---------------- ---------- ----------
7 25 40890 RECOVERING 454 107652 941149 07001900BA9F0000 0 0

SOLUTION

First, we need to determine the progress SMON is making. After killing the shadow process or aborting the database the v$transaction entry is lost, so you cannot estimate by examining v$transaction.used_ublk to see how the rollback procedure is proceeding.

The following queries are available to monitor the progress of the transaction recovery:

  set linesize 100 
  alter session set NLS_DATE_FORMAT='DD-MON-YYYY HH24:MI:SS'; 
  select usn, state, undoblockstotal "Total", undoblocksdone "Done", undoblockstotal-undoblocksdone "ToDo",
         decode(cputime,0,'unknown',sysdate+(((undoblockstotal-undoblocksdone) / (undoblocksdone / cputime)) / 86400)) "Estimated time to complete" 
  from v$fast_start_transactions;

Run the above query several times in a row, this will give you a good idea on how SMON is progressing.

- In some versions the cputime does not work (always 0), hence the estimated completion time will not be displayed
- In some cases the v$fast_start_transactions view will not work. If this is the case then you can query the internal data dictionary view x$ktuxe; the 'ktuxesiz' column represents the remaining number of undo blocks required for rollback:

   select ktuxeusn, to_char(sysdate,'DD-MON-YYYY HH24:MI:SS') "Time", ktuxesiz, ktuxesta 
   from x$ktuxe 
   where ktuxecfl = 'DEAD';

The following outlines a few action that can be taken in order to speed up SMON recovery of the large transaction.

1. There are cases where parallel transaction recovery is not as fast as serial transaction recovery, because the pq secondary are interfering with each other. To check the Parallel Recovery processes and their state, run the following query:

     select * from v$fast_start_servers;

Column STATE shows the state of the server being IDLE or RECOVERING, if only 1 process is in
    state RECOVERING while the other processes are in state IDLE, then you should disable
    Parallel Transaction Recovery. How to do this is outlined in the following note:

     note 238507.1: How to Disable Parallel Transaction Recovery When Parallel Txn Recovery is Active

    If all the processes are in state RECOVERING, then you can benefit from adding more processes:

   SQL> alter system set fast_start_parallel_rollback = high;

This will create parallel servers as much as 4 times the number of CPUs.

2. Increase the parameter '_cleanup_rollback_entries' to 400. The default is 100.

This parameter cannot be changed dynamically, so in order to change this the database will need to be restarted. As outlined above this is not advisable in the current situation, but there is basically no alternative.

If your database is using spfile

To change to 400, this step needs database restart
alter system set "_cleanup_rollback_entries"=400 scope=spfile;

To change back to default or unset this parameter, This step also needs database restart
ALTER SYSTEM RESET "_cleanup_rollback_entries" scope=spfile;

Do not use "_cleanup_rollback_entries"=0, this will not unset/reset or it does not change it back to default, 0 is not the default value. Setting 0 may cause database stuck in rollback.

If the database is is going to be restarted, then you best also specify

fast_start_parallel_rollback=false

If your database is using spfile

alter system set fast_start_parallel_rollback=false scope=both;

3. As a last resort, if the above has been verified, we can disable SMON from performing the recovery at this time. This will allow the database to operate as normal, and you can choose to let the recovery take place at a later time when it is more convenient. Because the clean-up of the large transaction is actually put on hold it is important, and I cannot stress this enough, to disable this event again to let SMON finish its work.

  init.ora:

    event="10513 trace name context forever, level 2"

NOTE:
You can also dynamically prevent SMON to perform the transaction recovery, for instance when you want to cancel a long running update without doing a rollback (at that time).
However, please note that the rollback has still to be done at a later time to assure a consistent state of the database.

You can postpone the transaction recovery to a more convenient time as below:

-- Determine the SMON process id:

SQL> select pid, program from v$process where program like '%SMON%';

PID PROGRAM
---------- ------------------------------------------------
6 oracle@stsun7 (SMON)

-- Disable SMON transaction cleanup:

SQL> oradebug setorapid <SMON's Oracle PID>

SQL> oradebug event 10513 trace name context forever, level 2

-- To enable the transaction recovery again:

SQL> oradebug setorapid <SMON's Oracle PID>
SQL> oradebug event 10513 trace name context off