快速定位原因
查看操作系统负载
登上数据库服务器后,第一个就是通过系统命令确认下CPU、内存、I/O是否异常,每个系统的命令不一样,常见的有top、topas、vmstat、iostat。
查看等待事件
第二步就是连到数据库查看活动的等待事件,这是监控、巡检、诊断数据库最基本的手段,通常81%的问题都可以通过等待事件初步定为原因,它是数据库运行情况最直接的体现,如下脚本是查看每个等待事件的个数、等待时长,并排除了一些常见的IDLE等待事件。
--墨天轮 wait_event
col event for a45
SELECT inst_id,EVENT, SUM(DECODE(WAIT_TIME, 0, 0, 1)) "Prev", SUM(DECODE(WAIT_TIME, 0, 1, 0)) "Curr", COUNT(*) "Tot" , sum(SECONDS_IN_WAIT) SECONDS_IN_WAIT
FROM GV$SESSION_WAIT
WHERE event NOT
IN ('smon timer','pmon timer','rdbms ipc message','SQL*Net message from client','gcs remote message')
AND event NOT LIKE '%idle%'
AND event NOT LIKE '%Idle%'
AND event NOT LIKE '%Streams AQ%'
GROUP BY inst_id,EVENT
ORDER BY 1,5 desc
这里就需要掌握一些常见异常等待事件的原因,并形成条件反射,比如library cache lock、read by other session、row cache lock、buffer busy waits、latch:shared pool、gc buffer busy、cursor: pin S on X、direct path read、log file sync、enq: TX - index contention、PX Deq Credit: send blkd、latch free、enq: TX - row lock contention等等,如果异常等待事件的个数和等待时间很长,那么排查原因的入口就在这里。
根据等待事件查会话
得到异常等待事件之后,我们就根据等待事件去查会话详情,也就是查看哪些会话执行哪些SQL在等待,另外还查出来用户名和机器名称,以及是否被阻塞。另外如下脚本可改写成根据用户查会话、根据SQL_ID查会话等等。
--墨天轮 session_by_event
SELECT /*+rule */ sid, s.serial#, spid, event, sql_id, seconds_in_wait ws, row_wait_obj# obj,
s.username, s.machine, BLOCKING_INSTANCE||'.'||blocking_session b_sess
FROM v$session s, v$process p
WHERE event='&event_name' AND s.paddr = p.addr order by 6;
查询某个会话详情
得到会话列表之后,可以根据如下SQL查询某个会话的详细信息,如上次个执行的SQL_ID,登录时间等,该SQL也可改写成多个。
--墨天轮 session_by_sid
SELECT s.sid, s.serial#, spid, event, sql_id, PREV_SQL_ID, seconds_in_wait ws, row_wait_obj# obj,
s.username, s.machine, module,blocking_session b_sess,logon_time
FROM v$session s, v$process p
WHERE sid = '&sid' AND s.paddr = p.addr;
查询对象信息
从前面两个SQL都可以看到会话等待的对象ID,可以通过如下SQL查询对象的详细信息。
--墨天轮 obj_info
col OBJECT_NAME for a30
select owner,object_name,subobject_name,object_type
from dba_objects
where object_id=&oid;
查询SQL语句
根据SQL_ID、HASH_VALUE查询SQL语句。如果v$sqlarea中查不到,可以尝试DBA_HIST_SQLTEXT视图中查询。
--墨天轮 sql_text
select sql_id,SQL_fullTEXT
from v$sqlarea
where (sql_id='&sqlid' or hash_value=to_number('&hashvale') )
and rownum<2;
关于SQL语句的执行计划、对象的统计信息、性能诊断、跟踪SQL等这里就不展开,后面计划出一个类似的系列,敬请关注。
查询会话阻塞情况
通过如下SQL查询某个会话阻塞了多少个会话。
--墨天轮 blocking_sess
select count(*),blocking_session
from v$session
where blocking_session is not null
group by blocking_session;
查询数据库的锁
通过如下SQL查询某个会话的锁,有哪些TM、TX锁,以及会话和锁关联查询的SQL,注意这里指定了ctime大于100秒,30%的情况是人为误操作锁表,导致应用SQL被阻塞,无法运行。
--墨天轮 lock
set linesize 180
col username for a15
col owner for a15
col OBJECT_NAME for a30
col SPID for a10
--查询某个会话的锁
select /*+rule*/SESSION_ID,OBJECT_ID,ORACLE_USERNAME,OS_USER_NAME,PROCESS,LOCKED_MODE
from gv$locked_object where session_id=&sid;
--查询TM、TX锁
select /*+rule*/* from v$lock
where ctime >100 and type in ('TX','TM') order by 3,9;
--查询数据库中的锁
select /*+rule*/s.sid,p.spid,l.type,round(max(l.ctime)/60,0) lock_min,s.sql_id,s.USERNAME,b.owner,b.object_type,b.object_name
from v$session s, v$process p,v$lock l,v$locked_object o,dba_objects b
where o.SESSION_ID=s.sid and s.sid=l.sid and o.OBJECT_ID=b.OBJECT_ID
and s.paddr = p.addr and l.ctime >100 and l.type in ('TX','TM','FB')
group by s.sid,p.spid,l.type,s.sql_id,s.USERNAME,b.owner,b.object_type,b.object_name
order by 9,1,3;
保留现场证据
有的问题可能需要分析很长时间,或者是需要外部人员协助分析,那么保留现场证据就非常重要了,下面脚本是systemstate dump和hanganalyze步骤,如果有sqlplus无法登陆的情况,可以加-prelim参数。
--systemstate dump
sqlplus -prelim / as sysdba
oradebug setmypid
oradebug unlimit;
oradebug dump systemstate 266;
--wait for 1 min
oradebug dump systemstate 266;
--wait for 1 min
oradebug dump systemstate 266;
oradebug tracefile_name;
--hanganalyze
oradebug setmypid
oradebug unlimit;
oradebug dump hanganalyze 3
--wait for 1 min
oradebug dump hanganalyze 3
--wait for 1 min
oradebug dump hanganalyze 3
oradebug tracefile_name
杀会话
通常情况下,初步定为问题后为了快速恢复业务,需要去杀掉某些会话,特别是批量杀会话,有时还会直接kill所有LOCAL=NO的进程,再杀会话时一定要检查确认,更不能在别的节点或者别的服务器上执行。
--墨天轮 kill_sess
set line 199
col event format a35
--杀某个SID会话
SELECT /*+ rule */ sid, s.serial#, 'kill -9 '||spid, event, blocking_session b_sess
FROM v$session s, v$process p WHERE sid='&sid' AND s.paddr = p.addr order by 1;
--根据SQL_ID杀会话
SELECT /*+ rule */ sid, s.serial#, 'kill -9 '||spid, event, blocking_session b_sess
FROM v$session s, v$process p WHERE sql_id='&sql_id' AND s.paddr = p.addr order by 1;
--根据等待事件杀会话
SELECT /*+ rule */ sid, s.serial#, 'kill -9 '||spid, event, blocking_session b_sess
FROM v$session s, v$process p WHERE event='&event' AND s.paddr = p.addr order by 1;
--根据用户杀会话
SELECT /*+ rule */ sid, s.serial#, 'kill -9 '||spid, event, blocking_session b_sess
FROM v$session s, v$process p WHERE username='&username' AND s.paddr = p.addr order by 1;
--kill所有LOCAL=NO进程
ps -ef|grep LOCAL=NO|grep $ORACLE_SID|grep -v grep|awk '{print $2}' |xargs kill -9
重启大法
如需要修改静态参数、内存等问题,需要重启数据库,(不要觉得重启很LOW,在很多情况下为了快速恢复业务经常使用这个从网吧里传出来的绝招),记住千万不要在这个时候死磕问题原因、当作课题研究,我们的首要任务是恢复业务。
--tail -f alert_.log
alter system checkpoint;
alter system switch logfile;
shutdown immediate;
startup
CRT按钮小技巧
另外介绍一个小技巧,就是把常用的脚本整理到SecureCRT的Button Bar中,只需要点一下设置好的button,就相当于直接执行相应的SQL语句,这样就不用每次粘贴复制执行,或者是把脚本上传到每个服务器上。不过不要设置DDL等操作性的button,以免误点。
image.png
以上就是遇到数据库问题用到的一些脚本,特别是应用反应慢、卡的情况,另外建议首先对脚本进行阅读然后再使用,还可以根据自己的环境改写,融会贯通,积累经验。
undo暴增后查询思路
1.查询undo active使用状态
select tablespace_name, status, round( sum( bytes ) / 1048576, 2 ) mb,
count(*) extent_count
from dba_undo_extents
group by tablespace_name, status
order by tablespace_name, status;
UNEXPIRED已经释放,在保留期内。无需手动处理,在undo表空间紧张时,undo会自动调整undo保留期的
select to_char(begin_time, 'hh24:mi:ss') BEGIN_TIME,
to_char(end_time, 'hh24:mi:ss') END_TIME,
maxquerylen,
nospaceerrcnt,
tuned_undoretention
from v$undostat;
最后一列就是undo的保留期,可以看到会根据使用情况自动调整。
2.确认占用的sid与使用大小
select s.sid, substr( s.program, 1, 15 ) program,
s.machine,
t.xidusn || '.' || t.xidslot || '.' || t.xidsqn tx_addr,
t.status, t.start_time, tbs.tablespace_name tbs_name,
round( t.used_ublk * tbs.block_size /1048576, 2 ) undo_size_mb,
t.used_urec
from v$transaction t, v$session s, v$parameter p, dba_tablespaces tbs
where t.ses_addr = s.saddr
and p.name = 'undo_tablespace'
and p.value = tbs.tablespace_name
order by t.used_ublk desc;
3.确认占用sql
SELECT S.SID,
S.USERNAME,
U.NAME,
Q.SQL_TEXT,
Q.HASH_VALUE,
T.UBABLK
FROM V$TRANSACTION T,
V$ROLLSTAT R,
V$ROLLNAME U,
V$SESSION S,
V$SQL Q
WHERE S.TADDR = T.ADDR
AND T.XIDUSN = R.USN
AND R.USN = U.USN
AND Q.HASH_VALUE =
DECODE(S.SQL_HASH_VALUE,
NULL, S.PREV_HASH_VALUE,
S.SQL_HASH_VALUE)
ORDER BY S.USERNAME;
本次模拟对一张10w数据的表进行update。通过以上信息可以看到,sys用户通过本地sqlplus登陆,执行updata语句。
但有时候会发现,明明active的空间已经很大,但是v$transaction却查不到,需要关注死事务的产生。
4. 死事务的查询
http://blog.itpub.net/22034023/viewspace-710505/
死事务出现在异常关闭数据库或者事务进程不正常结束,比如KILL -9,shutdown abort的情况下。即使是DBA也只能等待smon清理完成,没有好的办法。当然禁用smon这种方法,如:
oradebug setorapid ‘SMON’s Oracle PID’;
oradebug event 10513 trace name context forever, level 2
这种在生产中绝对是谨慎谨慎,除非有着非常深入的理解和实践。
当前数据库里的死事务可以通过查询内部表x$ktuxe来获得。
select ADDR,KTUXEUSN,KTUXESLT,KTUXESQN,KTUXESIZ from x$ktuxe where KTUXECFL='DEAD';
KTUXESIZ代表需要回滚的回滚块数。
死事务的回滚进程数可以通过参数fast_start_parallel_rollback来设置。
可以根据以下算法来粗略的估算回滚需要的时间,这里是小时:
declare
l_start number;
l_end number;
begin
select ktuxesiz
into l_start
from x$ktuxe
where KTUXEUSN = 10
and KTUXESLT = 39; ---------这里根据实际数字来填写
dbms_lock.sleep(60); ---------可以缩小这个时间,但是太小,可能会导致误差较大
select ktuxesiz
into l_end
from x$ktuxe
where KTUXEUSN = 10
and KTUXESLT = 39; ---------这里根据实际数字来填写
dbms_output.put_line('time cost Day:' ||
round(l_end / (l_start - l_end) / 60, 2));
end;
/
5.应急处理
•如果占用undo高的是执行不合适的sql导致,那么可以临时kill掉。事后也可以让开发针对sql的提交方式进行优化。
•如果是死事务,只能通过上述方法进行计算时间。无法手工干预。
•如果应急期间定位不到具体问题,那最紧要的就是扩容undo表空间,确保生产事务正常运行。
健康检查
set pages 1000
set line 1000
set feedback off
set serveroutput on
declare
l_start number;
l_end number;
mydate varchar2(200);
est_time varchar2(200);
v_pause_secs number;
begin
--Set the pause time for get 2 times change,default is 300 seconds
v_pause_secs:=600;
select sum(undoblockstotal-undoblocksdone) into l_start from v$fast_start_transactions where state='RECOVERING';
dbms_lock.sleep(v_pause_secs);
select sum(undoblockstotal-undoblocksdone) into l_end from v$fast_start_transactions where state='RECOVERING';
select to_char(sysdate,'yyyy-mm-dd hh24:mi:ss') into mydate from dual;
select
substr(to_char(numtodsinterval((l_end/((l_start -l_end)/v_pause_secs)),'second')),9,instr(to_char(numtodsinterval((l_end/((l_start -l_end)/v_pause_secs)),'second')),' ')-9)||'Days '||
substr(to_char(numtodsinterval((l_end/((l_start -l_end)/v_pause_secs)),'second')),12,instr(to_char(numtodsinterval((l_end/((l_start -l_end)/v_pause_secs)),'second')),':')-12)||'Hours '||
substr(to_char(numtodsinterval((l_end/((l_start -l_end)/v_pause_secs)),'second')),15,instr(to_char(numtodsinterval((l_end/((l_start -l_end)/v_pause_secs)),'second')),':',1,2)-15)||'Mins '||
substr(to_char(numtodsinterval((l_end/((l_start -l_end)/v_pause_secs)),'second')),18,instr(to_char(numtodsinterval((l_end/((l_start -l_end)/v_pause_secs)),'second')),'.',1,1)-18)||'Secs'
into est_time
from dual;
dbms_output.put_line(mydate||'==> Base on '||v_pause_secs||'Secs calculated, Estimate remaining trx rollback time is ......:'|| est_time);
end;
/
set pages 1000
set line 1000
set feedback off
set serveroutput on
declare
l_start number;
l_end number;
mydate varchar2(200);
est_time varchar2(200);
v_pause_secs number;
begin
--Set the pause time for get 2 times change,default is 30 seconds
v_pause_secs:=600;
select sum(ktuxesiz) into l_start from x$ktuxe where KTUXECFL ='DEAD';
dbms_lock.sleep(v_pause_secs);
select sum(ktuxesiz) into l_end from x$ktuxe where KTUXECFL ='DEAD';
select to_char(sysdate,'yyyy-mm-dd hh24:mi:ss') into mydate from dual;
select
substr(to_char(numtodsinterval((l_end/((l_start -l_end)/v_pause_secs)),'second')),9,instr(to_char(numtodsinterval((l_end/((l_start -l_end)/v_pause_secs)),'second')),' ')-9)||'Days '||
substr(to_char(numtodsinterval((l_end/((l_start -l_end)/v_pause_secs)),'second')),12,instr(to_char(numtodsinterval((l_end/((l_start -l_end)/v_pause_secs)),'second')),':')-12)||'Hours '||
substr(to_char(numtodsinterval((l_end/((l_start -l_end)/v_pause_secs)),'second')),15,instr(to_char(numtodsinterval((l_end/((l_start -l_end)/v_pause_secs)),'second')),':',1,2)-15)||'Mins '||
substr(to_char(numtodsinterval((l_end/((l_start -l_end)/v_pause_secs)),'second')),18,instr(to_char(numtodsinterval((l_end/((l_start -l_end)/v_pause_secs)),'second')),'.',1,1)-18)||'Secs'
into est_time
from dual;
dbms_output.put_line(mydate||'==> Base on '||v_pause_secs||'Secs calculated, Estimate remaining trx rollback time is ......:'|| est_time);
end;
/
Oralce undo健康检查脚本(这个脚本是基于进程没在os层面kill -9的情况下):
set pages 1000
set line 1000
set feedback off
set serveroutput on
-- UNDO Check script start
exec dbms_output.put_line('== >>>>>>>>>> UNDO CHECK SCRIPT START <<<<<<<<<< ==');
exec dbms_output.put_line(' ');
-- Check database version
PROMPT Checking database version......
PROMPT ============
select * from v$version;
exec dbms_output.put_line(' ');
exec dbms_output.put_line(' ');
-- Check if flashback on
PROMPT Checking if flashback on......
PROMPT ============
select flashback_on from v$database;
exec dbms_output.put_line(' ');
exec dbms_output.put_line(' ');
-- Check the undo parameter
PROMPT Checking the undo parameter......
PROMPT ============
show parameter undo;
exec dbms_output.put_line(' ');
exec dbms_output.put_line(' ');
-- Check undo tablespace usage
PROMPT Checking undo tablespace usage......
PROMPT ============
SELECT d.status ,
d.tablespace_name ,
d.contents,
d.extent_management,
to_char(nvl(a.bytes / 1024 / 1024, 0), '99999999.999') as total_size_mb,
to_char(nvl(a.bytes - nvl(f.bytes, 0), 0) / 1024 / 1024,
'99999999.999') used_size_mb,
to_char(nvl(nvl(f.bytes, 0), 0) / 1024 / 1024, '99999999.999') free_size_mb,
to_char(nvl((a.bytes - nvl(f.bytes, 0)) / a.bytes * 100, 0),
'990.00') as used_percent
FROM sys.dba_tablespaces d,
(SELECT tablespace_name, SUM(bytes) bytes
FROM dba_data_files
GROUP BY tablespace_name) a,
(SELECT tablespace_name, SUM(bytes) bytes
FROM dba_free_space
GROUP BY tablespace_name) f
WHERE d.tablespace_name = a.tablespace_name(+)
AND d.tablespace_name = f.tablespace_name(+)
AND d.contents='UNDO';
exec dbms_output.put_line(' ');
exec dbms_output.put_line(' ');
-- Check rollback segment status
PROMPT Checking rollback segment status......
PROMPT ============
select tablespace_name,status,sum(bytes)/1024/1024 mb from DBA_UNDO_EXTENTS group by tablespace_name,status;
exec dbms_output.put_line(' ');
exec dbms_output.put_line(' ');
--Check all roll segment online or offline status
PROMPT Checking Check all roll segment online or offline status......
PROMPT ============
select substr(segment_name,1,7) as rollname,status,count(*) from dba_rollback_segs
group by substr(segment_name,1,7),status
order by 1;
exec dbms_output.put_line(' ');
exec dbms_output.put_line(' ');
--Check online roll segment status
PROMPT Checking roll segment online status......
PROMPT ============
SELECT
substr(name,1,7) as rollname,
status,count(*) as cnt
FROM v$rollstat, v$rollname
WHERE v$rollstat.usn=v$rollname.usn
group by
substr(name,1,7),status order by 1;
exec dbms_output.put_line(' ');
exec dbms_output.put_line(' ');
-- Check TOP 20 roll segment extend status
PROMPT Checking TOP 20 roll segment extend status......
PROMPT ============
select * from (
SELECT
ds.segment_name "Seq Name",
ds.bytes "Bytes",
ds.blocks "Blocks",
ds.extents "Extents",
ds.initial_extent "Init Ext",
ds.next_extent "Next Ext",
ds.min_extents "Min Ext",
ds.max_extents "Max Ext"
FROM dba_segments ds
WHERE segment_type in ('ROLLBACK','TYPE2 UNDO') order by extents desc) where rownum<=20;
exec dbms_output.put_line(' ');
exec dbms_output.put_line(' ');
-- Check undo header wait
PROMPT Checking Undo Header Waits......
PROMPT (Note:Need to wait 60Secs)
PROMPT ============
declare
s_num_rbs number;
s_header_wait number;
s_waits_per_rbs number;
e_num_rbs number;
e_header_wait number;
e_waits_per_rbs number;
delta_num_rbs number;
delta_header_wait number;
delta_waits_per_rbs varchar2(200);
s_mydate varchar2(200);
e_mydate varchar2(200);
v_pause_secs number;
begin
--Set the pause time for get 2 times change,default is 30 seconds
v_pause_secs:=60;
select to_char(sysdate,'hh24:mi:ss') into s_mydate from dual;
select COUNT(stat.USN) ,wait.Count,round(wait.Count/COUNT(stat.USN),4) into s_num_rbs,s_header_wait,s_waits_per_rbs from V$WAITSTAT wait, V$ROLLSTAT stat where stat.Status = 'ONLINE' and wait.Class = 'undo header' and stat.USN > 0 group by wait.Count;
dbms_lock.sleep(v_pause_secs);
select to_char(sysdate,'hh24:mi:ss') into e_mydate from dual;
select COUNT(stat.USN),wait.Count,round(wait.Count/COUNT(stat.USN),4) into e_num_rbs,e_header_wait,e_waits_per_rbs from V$WAITSTAT wait, V$ROLLSTAT stat where stat.Status = 'ONLINE' and wait.Class = 'undo header' and stat.USN > 0 group by wait.Count;
select (e_num_rbs-s_num_rbs),(e_header_wait-s_header_wait),to_char(round(nvl(decode((e_waits_per_rbs-s_waits_per_rbs),0,null,(e_waits_per_rbs-s_waits_per_rbs)),0),4),'fm999999990.999999999') into delta_num_rbs,delta_header_wait,delta_waits_per_rbs from dual;
dbms_output.put_line(s_mydate||'==> At start time:');
dbms_output.put_line('Number of Rollback segments:'||s_num_rbs||' ,Number of Undo header waits:'||s_header_wait||' ,Number of header wait per Rollback segment:'||s_waits_per_rbs);
dbms_output.put_line(e_mydate||'==> At end time:');
dbms_output.put_line('Number of Rollback segments:'||e_num_rbs||' ,Number of Undo header waits:'||e_header_wait||' ,Number of header wait per Rollback segment:'||e_waits_per_rbs);
dbms_output.put_line('==== Change during '||v_pause_secs||' Seconds:====');
dbms_output.put_line('Delta of Rollback segments:'||delta_num_rbs||' ,Delta of Undo header waits:'||delta_header_wait||' ,Delta of header wait per Rollback segment:'||delta_waits_per_rbs);
end;
/
exec dbms_output.put_line(' ');
exec dbms_output.put_line(' ');
-- Check latch: undo global data
PROMPT Checking TOP 20 "latch: undo global data"......
PROMPT ============
select * from (
select to_char(end_interval_time,'yyyy-mm-dd hh24:mi') as end_interval_time,
nvl(round((c.time_waited_micro - lag(time_waited_micro)
over(order by c.snap_id)) /
decode((c.total_waits - lag(total_waits) over(order by c.snap_id)),0,null,(c.total_waits - lag(total_waits) over(order by c.snap_id)))/1000,2),0) as avg_wait_time_ms
from dba_hist_system_event c, dba_hist_snapshot dd
where event_name = 'latch: undo global data' and c.instance_number=dd.instance_number
and c.snap_id = dd.snap_id and c.instance_number=1
and c.snap_id >= 1 and end_interval_time>=trunc(sysdate-7)
order by 1 desc)
where avg_wait_time_ms<>0 and rownum<=20;
exec dbms_output.put_line(' ');
exec dbms_output.put_line(' ');
-- Check location of undo latch undo global data
PROMPT Checking location of undo latch undo global data
PROMPT ============
select * from v$latch_misses where SLEEP_COUNT>0 and parent_name like 'undo global data%' order by sleep_count desc;
exec dbms_output.put_line(' ');
exec dbms_output.put_line(' ');
-- Check undo buffer busy wait in ASH
PROMPT Checking most 20 recently undo buffer busy wait in ASH......
PROMPT ============
select * from (
select to_char(sample_time,'yyyy-mm-dd hh24:mi:ss') as sample_time,
sql_id,event,TOP_LEVEL_SQL_ID,p1 as file_id,p2 as block_id, p3 as reason from v$active_session_history
where event='buffer busy waits'
and p1 in (select file_id from DBA_ROLLBACK_SEGS where segment_name<>'SYSTEM')
and sample_time>=trunc(sysdate-7)
order by sample_time desc)
where rownum<=20;
exec dbms_output.put_line(' ');
exec dbms_output.put_line(' ');
-- Check most recent ORA-1555 count
PROMPT Checking most recent ORA-1555 count......
PROMPT ============
select * from (
select to_char(end_time,'yyyy-mm-dd hh24:mi:ss') end_time,
x.unexpiredblks,x.unxpblkrelcnt,x.unxpblkreucnt,
x.expiredblks,x.expblkrelcnt,x.expblkreucnt,
x.ssolderrcnt
from DBA_HIST_UNDOSTAT x
where x.ssolderrcnt>0
order by end_time desc)
where rownum<=20;
exec dbms_output.put_line(' ');
exec dbms_output.put_line(' ');
-- Check the status of TOP 20 session which open transaction
PROMPT Checking the status of TOP 20 session which open transaction
PROMPT ============
PROMPT Check the status of TOP 20 session which open transaction......
select * from (
select b.sid,
b.SERIAL#,
b.USERNAME,
b.status as session_status,
a.STATUS as trx_status,
b.MACHINE,
b.sql_id,
a.START_TIME as trx_start_time,
a.USED_UBLK as used_undo_blks,
a.USED_UREC as used_undo_records,
a.START_UBAFIL as used_undo_file_id,
a.START_UBABLK as used_undo_block_id
from v$transaction a, v$session b
where a.ses_addr = b.saddr order by USED_UBLK desc)
where rownum<=20;
exec dbms_output.put_line(' ');
exec dbms_output.put_line(' ');
-- Check TOP session current wait event
PROMPT Checking TOP session current wait event
PROMPT ============
select * from (
select b.sid,
b.SERIAL#, event,p1text,p1,p2text, p2
from v$transaction a, v$session b
where a.ses_addr = b.saddr order by USED_UBLK desc)
where rownum<=1;
exec dbms_output.put_line(' ');
exec dbms_output.put_line(' ');
-- Check TOP session wait event history
PROMPT Checking TOP session wait event history
PROMPT ============
select * from (
select to_char(sample_time,'hh24:mi:ss') as sample_time,sid,serial#,sql_id,event,p1text,p1,p2text,p2 from v$active_session_history c,
(select * from (select b.sid,b.serial# from
v$transaction a, v$session b
where a.ses_addr = b.saddr
order by USED_UBLK desc) where rownum<=1) x
where c.SESSION_ID=x.sid and c.SESSION_SERIAL#=x.serial#
order by sample_time desc)
where rownum<=20;
exec dbms_output.put_line(' ');
exec dbms_output.put_line(' ');
-- Check TOP session "db file sequential read" average wait time
PROMPT Checking TOP session "db file sequential read" average wait time......
PROMPT (Note:Need to wait 60Secs)
PROMPT ============
declare
v_top_used_ublk_sid number;
v_top_used_ublk_serial# number;
s_chktm varchar2(200);
s_tm_waited_micro number;
s_total_waits number;
s_avg_wait number;
e_chktm varchar2(200);
e_tm_waited_micro number;
e_total_waits number;
e_avg_wait number;
v_delta_avg_wait varchar2(200);
v_pause_secs number;
v_cycle_cnt number;
begin
--Set the pause time for get 2 times change,default is 3 seconds
v_pause_secs:=3;
v_cycle_cnt:=20;
dbms_output.put_line('TOP rolling back session "db file sequential read" avg_wait_time_ms is: ');
select sid,serial# into v_top_used_ublk_sid,v_top_used_ublk_serial# from (select b.sid,b.serial# from v$transaction a, v$session b where a.ses_addr = b.saddr order by USED_UBLK desc) where rownum<=1;
for i in 1 .. v_cycle_cnt loop
select to_char(sysdate,'yyyy-mm-dd hh24:mi:ss'),time_waited_micro,total_waits,a.AVERAGE_WAIT into s_chktm,s_tm_waited_micro,s_total_waits,s_avg_wait from v$session_event a where a.sid=v_top_used_ublk_sid and event='db file sequential read';
dbms_lock.sleep(v_pause_secs);
select to_char(sysdate,'yyyy-mm-dd hh24:mi:ss'),time_waited_micro,total_waits,a.AVERAGE_WAIT into e_chktm,e_tm_waited_micro,e_total_waits,s_avg_wait from v$session_event a where a.sid=v_top_used_ublk_sid and event='db file sequential read';
select to_char(round((e_tm_waited_micro-s_tm_waited_micro)/(e_total_waits-s_total_waits)/1000,2),'fm999999990.999999999') into v_delta_avg_wait from dual;
dbms_output.put_line(e_chktm||': '|| v_delta_avg_wait);
end loop;
end;
/
exec dbms_output.put_line(' ');
exec dbms_output.put_line(' ');
-- Check system level "db file sequential read" average wait time
PROMPT Checking Check system level "db file sequential read" average wait time......
PROMPT ============
select * from (
select 'SystemLevel_AllDatafile:'||event_name as event_name,to_char(end_interval_time,'yyyy-mm-dd hh24:mi') end_interval_time,
round((c.time_waited_micro - lag(time_waited_micro)
over(order by c.snap_id)) /
decode((c.total_waits - lag(total_waits) over(order by c.snap_id)),0,null,(c.total_waits - lag(total_waits) over(order by c.snap_id)))/1000,2) as avg_wait_time_ms
from dba_hist_system_event c, dba_hist_snapshot dd
where event_name = 'db file sequential read' and c.instance_number=dd.instance_number
and c.snap_id = dd.snap_id and c.instance_number=1
and c.snap_id >= 1 and end_interval_time>=trunc(sysdate-7)
order by 2 desc)
where rownum<=20;
exec dbms_output.put_line(' ');
exec dbms_output.put_line(' ');
-- Check the transaction rollback estimate time
PROMPT Checking the transaction rollback estimate time......
PROMPT (Note:Need to wait 60Secs)
PROMPT ============
declare
l_start number;
l_end number;
mydate varchar2(200);
est_time varchar2(200);
v_pause_secs number;
begin
--Set the pause time for get 2 times change,default is 30 seconds
v_pause_secs:=60;
select sum(ktuxesiz) into l_start from x$ktuxe x ,v$transaction b, v$session a where a.taddr=b.addr and x.ktuxeusn=b.XIDUSN and x.ktuxeslt=b.XIDSLOT and ktuxesqn=b.XIDSQN and a.status='KILLED' and b.STATUS='ACTIVE';
dbms_lock.sleep(v_pause_secs);
select sum(ktuxesiz) into l_end from x$ktuxe x ,v$transaction b, v$session a where a.taddr=b.addr and x.ktuxeusn=b.XIDUSN and x.ktuxeslt=b.XIDSLOT and ktuxesqn=b.XIDSQN and a.status='KILLED' and b.STATUS='ACTIVE';
select to_char(sysdate,'yyyy-mm-dd hh24:mi:ss') into mydate from dual;
select
substr(to_char(numtodsinterval((l_end/((l_start -l_end)/v_pause_secs)),'second')),9,instr(to_char(numtodsinterval((l_end/((l_start -l_end)/v_pause_secs)),'second')),' ')-9)||'Days '||
substr(to_char(numtodsinterval((l_end/((l_start -l_end)/v_pause_secs)),'second')),12,instr(to_char(numtodsinterval((l_end/((l_start -l_end)/v_pause_secs)),'second')),':')-12)||'Hours '||
substr(to_char(numtodsinterval((l_end/((l_start -l_end)/v_pause_secs)),'second')),15,instr(to_char(numtodsinterval((l_end/((l_start -l_end)/v_pause_secs)),'second')),':',1,2)-15)||'Mins '||
substr(to_char(numtodsinterval((l_end/((l_start -l_end)/v_pause_secs)),'second')),18,instr(to_char(numtodsinterval((l_end/((l_start -l_end)/v_pause_secs)),'second')),'.',1,1)-18)||'Secs'
into est_time
from dual;
dbms_output.put_line(mydate||'==> Base on '||v_pause_secs||'Secs calculated, Estimate remaining trx rollback time is ......:'|| est_time);
end;
/
exec dbms_output.put_line(' ');
exec dbms_output.put_line(' ');
exec dbms_output.put_line(' ');
exec dbms_output.put_line('== >>>>>>>>>> UNDO CHECK SCRIPT END <<<<<<<<<< ==');
temp表空间不足处理思路
从11g开始,ash视图新增TEMP_SPACE_ALLOCATED列,能看到不同时间段的增长情况。
select to_char(sample_time,'yyyymmdd hh24:mi:ss'),session_id,session_serial#,sql_id,max(TEMP_SPACE_ALLOCATED/1024/1024)
from v$active_session_history where con_id=3
and to_char(sample_time,'yyyymmdd hh24')='20200204 16'
group by to_char(sample_time,'yyyymmdd hh24:mi:ss'),session_id,session_serial#,sql_id order by 5;
测试环境资源有限,最后一列就是占用的临时表空间大小,可以看到不同时间段temp占用情况。实际生产中,可以看到几分钟内,某条sql_id占用急速上升。
查看sql_id内容:
select * from table(dbms_xplan.display_cursor(’&sqlid’));
执行计划中可以看到1300w行的group by
select SID,SERIAL#,USERNAME,MACHINE,PROGRAM from v$session where sid=72;
增加临时表空间和kill session 可以作为应急处理,解决问题可以考虑以下:
1 、设置合理的pga
2 、禁止业务高峰期执行未上线的sql
3 、优化sort sql