目前就职海天起点,服务于电力行业,致力于帮助客户解决生产过程中出现的问题,提高生产效率, 爱好书法,周易!愿结交志同道合之士!共同进步! 微信号:sunyunyi_sun
用户报告UNDO表空间100%
SYS@SID1> @undo
UNDO Extent Use Info:
Undo Tablespace Undo extent Total
Name Status Size(GB)
-------------------- -------------------- ----------
UNDOTBS12 ACTIVE 305.06 --大量活动undo blk
EXPIRED 4.33 --过期很少4G
UNEXPIRED 77.19 --未过期77G
UNDOTBS2 EXPIRED 1.14
UNEXPIRED .86
Who Use More Undo Segment:
Inst Undo exten
id USERNAME SQL_ID XIDUSN XIDSLOT XIDSQN START_TIME Status USE_UNDO_MB
---- ------------ --------------- ---------- ---------- ---------- ------------------ ---------- -----------
1 EXC30 gpv003502at19 85 17 2099886 11/10/17 11:40:06 ACTIVE 36046.7734
事物占用undo:36046.7734,第一次查询50G
SYS@SID1> @sessx
Current Session Info:
USERNAME SID SQL_ID EVENT KPID STATE SEQ# MODULE
------------ ------ --------------- ---------------------------- -------------------- ------------ ---------- ----------
EXC30 884 gpv003502at19 db file sequential read ! kill -9 184816 WAITED SHORT 60298 DBMS_SCHED
SYS@SID1> @tracepro
input sid :
884
Oradebug for dump process
Press ENTER to continue, or CTRL+C to cancel
Oracle pid: 247, Unix process pid: 184816, image: oracle@JChost1 (J003)
/u01/app/oracle/diag/rdbms/TEST/SID1/trace/SID1_j003_184816.trc
! more /u01/app/oracle/diag/rdbms/TEST/SID1/trace/SID1_j003_184816.trc
*** 2017-11-13 22:08:57.983
*** MODULE NAME
DBMS_SCHEDULER) 2017-11-13 22:08:57.983 --JOB
************************************************************
ORA-30036 DIAGNOSTIC
This diagnostic information is dumped to trace file at
most once every 24 hours, it does not indicate any error.
************************************************************
ORA-30036 happens when trying to extend undo segment _SYSSMU85_62393027$ (usn=85) by 8 blocks
Reason: Race with other extends
Current undo tablespace UNDOTBS12 (tsn=24)
undo tablespace current size 56852480 blks, maxsize 56852480 blks, fixed sized
Undo Retention (reactive):28800, Max Query Length:222713
Parameter Undo Retention:28800, Tuned Undo Retention:28800, High threshold Undo Retention:-2 autotune:0
Retention Guarantee FALSE
Current Time is 1510582137
oerr ora 30036
30036, 00000, "unable to extend segment by %s in undo tablespace '%s'"
// *Cause: the specified undo tablespace has no more space available.
// *Action: Add more space to the undo tablespace before retrying
// the operation. An alternative is to wait until active
// transactions to commit.
--没有可用的UNDO 块,导致JOB报错
Segment Information Summary
==============================
Avaialable Segments: 9 Inuse Segments: 435 Offline Segments: 0
Partiallly Available segments 0, Segments needs Recovery 0
Total Active Blocks: 39893984 Total Unexpired Blocks: 16952608 Total Expired Blocks: 4096
Total Unexpired Blocks (mql/user): 16952608 Total Expired Blocks (mql/user): 4096
过期的undo blk 只有 4096 导致事物无可用undo blk
Slave ID: 3, Job ID: 1693948
----------------------------------------
SO: 0x1000ebed20, type: 4, owner: 0x1010c4ae48, flag: INIT/-/-/0x00 if: 0x3 c: 0x3
proc=0x1010c4ae48, name=session, file=ksu.h LINE:12729, pg=0
(session) sid: 884 ser: 18539 trans: 0xfae3243b8, creator: 0x1010c4ae48
flags: (0x310141) USR/- flags_idl: (0x1) BSY/-/-/-/-/-
flags2: (0x40009) -/-/INC
DID: , short-term DID:
txn branch: (nil)
edition#: 100 oct: 2, prv: 0, sql: 0x10556fe950, psql: 0x10556fe950, user: 165/EXC30
ksuxds FALSE at location: 0
service name: SYS$USERS
client details:
O/S info: user: oracle, term: UNKNOWN, ospid: 184816
machine: DB1 program: oracle@JChost1 (J003)
application name: DBMS_SCHEDULER, hash value=2478762354
action name: JL_VERI_CHK102, hash value=2872163425
Current Wait Stack:
0: waiting for 'db file sequential read'
file#=0xad, block#=0x238e97, blocks=0x1
wait_id=276752480 seq_num=63914 snap_id=1
wait times: snap=0.053029 sec, exc=0.053029 sec, total=0.053029 sec
wait times: max=infinite, heur=0.053029 sec
wait counts: calls=0 os=0
in_wait=1 iflags=0x5a0
Wait State:
fixed_waits=0 flags=0x2a boundary=0x1000ec0c08/0
Session Wait History:
elapsed time of 0.000205 sec since current wait
0: waited for 'db file sequential read'
file#=0x2ce, block#=0x277146, blocks=0x1
wait_id=276752479 seq_num=63913 snap_id=1
wait times: snap=0.000486 sec, exc=0.000486 sec, total=0.000486 sec
wait times: max=infinite
wait counts: calls=0 os=0
occurred after 0.000184 sec of elapsed time
1: waited for 'db file sequential read'
file#=0x2ce, block#=0x277186, blocks=0x1
wait_id=276752478 seq_num=63912 snap_id=1
wait times: snap=0.000339 sec, exc=0.000339 sec, total=0.000339 sec
wait times: max=infinite
wait counts: calls=0 os=0
occurred after 0.000169 sec of elapsed time
历史等待大量的db file sequential read 等待事件,估计不合理的索引导致低效的SQL
SYS@SID1> @sess
Current Session Info:
USERNAME SID SQL_ID SQL_TEXT EVENT MODULE STATE
------------ ------ --------------- ---------------------------- ---------------------------- ---------- ------------
EXC30 884 gpv003502at19 INSERT INTO DW_DATA_CHK_ERR_ db file sequential read DBMS_SCHED WAITING
@sqlplan
NOTE: cannot fetch plan for SQL_ID: gpv003502at19, CHILD_NUMBER: 0
Please verify value of SQL_ID and CHILD_NUMBER;
It could also be that the plan is no longer in cursor cache (check v$sql_plan)
取7天前的SQAWR:
SQL ID: gpv003502at19 DB/Inst: TEST/SID1 Snaps: 24979-25108
-> 1st Capture and Last Capture Snap IDs
refer to Snapshot IDs witin the snapshot range
-> INSERT INTO DW_DATA_CHK_ERR_MODIFY_DET( DET_ID,RULE_NO,RECHK_TIME,PK_V...
Plan Hash Total Elapsed 1st Capture Last Capture
# Value Time(ms) Executions Snap ID Snap ID
--- ---------------- ---------------- ------------- ------------- --------------
1 3568901301 510,671,408 586 24980 25073
-------------------------------------------------------------
Plan 1(PHV: 3568901301)
-----------------------
Plan Statistics DB/Inst: TEST/SID1 Snaps: 24979-25108
-> % Total DB Time is the Elapsed Time of the SQL statement divided
into the Total Database Time multiplied by 100
Stat Name Statement Per Execution % Snap
---------------------------------------- ---------- -------------- -------
Elapsed Time (ms) 5.1067E+08 871,452.9 12.3
CPU Time (ms) 4.0520E+08 691,463.1 17.2
Executions 586 N/A N/A
Buffer Gets 1.3780E+10 23,514,954.1 5.1
Disk Reads 1.7388E+07 29,672.0 0.6
Parse Calls 586 1.0 0.0
Rows 3.5749E+08 610,048.6 N/A
User I/O Wait Time (ms) 4.3945E+07 N/A N/A
Cluster Wait Time (ms) 1.8669E+07 N/A N/A
Application Wait Time (ms) 0 N/A N/A
Concurrency Wait Time (ms) 4.9640E+07 N/A N/A
Invalidations 0 N/A N/A
Version Count 94 N/A N/A
Sharable Mem(KB) 7,825 N/A N/A
-------------------------------------------------------------
Execution Plan
------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart|
------------------------------------------------------------------------------------------------------------------------
| 0 | INSERT STATEMENT | | | | 14 (100)| | |
| 1 | LOAD TABLE CONVENTIONAL | | | | | | |
| 2 | SEQUENCE | SEQ_DW_DATA_CHK_ERR_MODIFY_DET | | | | | |
| 3 | VIEW | | 2 | 102 | 14 (0)| 00:00:01 | |
| 4 | UNION-ALL | | | | | | |
| 5 | FILTER | | | | | | |
| 6 | PARTITION HASH ALL | | 1 | 78 | 4 (0)| 00:00:01 | 1 |
| 7 | HASH JOIN | | 1 | 78 | 4 (0)| 00:00:01 | |
| 8 | PARTITION RANGE SINGLE| | 1 | 39 | 2 (0)| 00:00:01 | KEY |
| 9 | TABLE ACCESS FULL | DW_DATA_CHK_ERR_RST_DET | 1 | 39 | 2 (0)| 00:00:01 | KEY |
| 10 | PARTITION RANGE SINGLE| | 1 | 39 | 2 (0)| 00:00:01 | KEY |
| 11 | TABLE ACCESS FULL | DW_DATA_CHK_ERR_RST_VERI_DET | 1 | 39 | 2 (0)| 00:00:01 | KEY |
| 12 | TABLE ACCESS FULL | BH_DATA_CHK_FIX_VERI_SET | 1 | 3 | 3 (0)| 00:00:01 | |
| 13 | FILTER | | | | | | |
| 14 | FILTER | | | | | | |
| 15 | PARTITION RANGE SINGLE | | 1 | 39 | 2 (0)| 00:00:01 | KEY |
| 16 | PARTITION HASH ALL | | 1 | 39 | 2 (0)| 00:00:01 | 1 |
| 17 | TABLE ACCESS FULL | DW_DATA_CHK_ERR_RST_VERI_DET | 1 | 39 | 2 (0)| 00:00:01 | KEY |
| 18 | TABLE ACCESS FULL | BH_DATA_CHK_FIX_VERI_SET | 1 | 3 | 3 (0)| 00:00:01 | |
| 19 | FILTER | | | | | | |
| 20 | PARTITION RANGE SINGLE | | 1 | 39 | 2 (0)| 00:00:01 | KEY |
| 21 | PARTITION HASH SINGLE | | 1 | 39 | 2 (0)| 00:00:01 | KEY |
| 22 | TABLE ACCESS FULL | DW_DATA_CHK_ERR_RST_DET | 1 | 39 | 2 (0)| 00:00:01 | KEY |
------------------------------------------------------------------------------------------------------------------------
报告显示SQL执行效率高,执行计划数据没有问题(统计信息为最新)!
看看绑定变量:
col bind1 for a15
col bind2 for a15
col bind3 for a15
select INSTANCE_NUMBER,snap_id,FETCHES_DELTA,SORTS_DELTA,IOWAIT_DELTA,
dbms_sqltune.extract_bind(bind_data,1).value_string bind1,
dbms_sqltune.extract_bind(bind_data,2).value_string bind2,
dbms_sqltune.extract_bind(bind_data,2).value_string bind3
from dba_hist_sqlstat
where sql_id='&sql_id'
order by snap_id
/
gpv003502at19
INSTANCE_NUMBER SNAP_ID FETCHES_DELTA SORTS_DELTA IOWAIT_DELTA BIND1 BIND2 BIND3
--------------- ---------- ------------- ----------- ------------ --------------- --------------- ---------------
1 25014 0 0 759128939 20171101 2-2-1056 2-2-1056
2 25015 0 0 83413 20171101 2-2-207 2-2-207
1 25015 0 0 708142959 20171101 2-2-1056 2-2-1056
1 25016 0 0 718190767 20171101 2-2-1056 2-2-1056
1 25017 0 0 737617063 20171101 2-2-1056 2-2-1056
1 25018 0 0 726242088 20171101 2-2-1056 2-2-1056
1 25019 0 0 726928366 20171101 2-2-1056 2-2-1056
1 25020 0 0 632079010 20171101 2-2-1056 2-2-1056
1 25021 0 0 463302236 20171101 2-2-1056 2-2-1056
1 25022 0 0 464269331 20171101 2-2-1056 2-2-1056
1 25023 0 0 428744841 20171101 2-2-1056 2-2-1056
1 25024 0 0 466369583 20171101 2-2-1056 2-2-1056
1 25025 0 0 467891412 20171101 2-2-1056 2-2-1056
1 25026 0 0 592077427 20171101 2-2-1056 2-2-1056
1 25027 0 0 541214648 20171101 2-2-1056 2-2-1056
1 25028 0 0 427257352 20171101 2-2-1056 2-2-1056
1 25029 0 0 469078173 20171101 2-2-1056 2-2-1056
1 25030 0 0 563461411 20171101 2-2-1056 2-2-1056
1 25031 0 0 281144016 20171101 2-2-1056 2-2-1056
1 25032 0 0 29596 20171101 2-2-1056 2-2-1056
1 25033 0 0 51745 20171101 2-2-1056 2-2-1056
1 25034 0 0 22780 20171101 2-2-1056 2-2-1056
1 25035 0 0 33318 20171101 2-2-1056 2-2-1056
1 25036 0 0 11368 20171101 2-2-1056 2-2-1056
1 25037 0 0 24782 20171101 2-2-1056 2-2-1056
基本都是查询一个月数据,绑定变量无巨大变化。
看看统计信息:
Table Number Empty Chain Global Sample Date
Name of Rows Blocks Blocks Count Stats Size MM-DD-YYYY
------------------------------ ------------------ -------- ------------ -------- ------ ------------------ ----------
DW_DATA_CHK_ERR_RST_VERI_DET 96,920,216 ######## 0 0 YES 96,920,216 11-11-2017
Column Column Distinct Number Number Global Sample Date
Name Details Values Buckets Nulls Stats Size MM-DD-YYYY
------------------------------ ------------ ------------ ------- ---------- ------ ------------------ ----------
RESULT_DET_ID NUMBER(24,0) 96,920,216 1 0 YES 96,920,216 11-11-2017
RULE_NO VARCHAR2(16) 255 254 0 YES 5,843 11-11-2017
ORG_NO VARCHAR2(16) 823 254 4,233,222 YES 5,560 11-11-2017
APPROVE_TIME VARCHAR2(8) 45 36 0 YES 5,843 11-11-2017
....
PK_VALUE NUMBER(24,0) 14,190,592 1 144 YES 96,920,072 11-11-2017
Index B Tree Distinct Number Cluster Global Sample Date
Name Level Keys of Rows Factor Stats Size MM-DD-YYYY
------------------------- ------ -------------- ------------------ ------------ ------ ------------------ ----------
SYS_C00121514 3 96,524,380 96,524,380 22,542,878 YES 300,853 11-11-2017
IND_COMP_ERR_RST_VERI_DET 3 85,085,692 92,085,663 34,381,590 YES 142,686 11-11-2017
_N
Index Column Col Column
Name Name Pos Details
------------------------- ------------------------------ ---- ------------
IND_COMP_ERR_RST_VERI_DET_N RULE_NO 1 VARCHAR2(16)
APPROVE_TIME 2 VARCHAR2(8)
ORG_NO 3 VARCHAR2(16)
PK_VALUE 4 NUMBER(24,0)
SYS_C00121514 RESULT_DET_ID 1 NUMBER(24,0)
从统计信息看索引选择率太低下,所以实际走的是分区全表扫描,那为什么大量等待db file sequential read 事件?维护索引?
dump 等待的块:
Current Wait Stack:
waiting for 'db file sequential read'
file#=0x2ce, block#=0x277186, blocks=0x1 --历史大量的该文件等待
select to_number('2ce','xxxxx') from dual 718
select to_number('277186','xxxxxxxxxxx') from dual 2584966
alter system dump datafile 718 block 2584966;
0x12
ri[0] sfll=1
0x14
ri[1] sfll=2
0x16
ri[2] sfll=3
0x18
ri[3] sfll=4
0x1a
ri[4] sfll=5
0x1c
ri[5] sfll=6
0x1e
ri[6] sfll=7
确实是索引块
select count(*) from dba_extents where segment_name like 'IND_COMP_ERR_RST_VERI_DET_N%' AND FILE_ID=718;
COUNT(*)
----------
70 --正是索引IND_COMP_ERR_RST_VERI_DET_N
两个表都是按照APPROVE_TIME字段每个月一个分区,每次查询都是一个月数据下面的索引没有实际意思,且选择率很低,
大量数据维护索引成本太高!
建议drop,优化SQL
drop index user.IND_COMP_ERR_RST_DET_N
drop index user.IND_COMP_ERR_RST_VERI_DET_N
处理步骤:
1:停止 job,该JOB已经执行两天。
2:drop 索引
drop index user.IND_COMP_ERR_RST_DET_N
drop index user.IND_COMP_ERR_RST_VERI_DET_N
3:启动JOB
停止 job 后发现大量的wait for a undo record 事件:
USERNAME SID SQL_ID EVENT KPID STATE MODULE
------------ ------ --------------- ---------------------------- -------------------- ------------ ----------
SYS 1654 g283nvkrnga7c SQL*Net message to client ! kill -9 66882 WAITED SHORT sqlplus@YX
1557 wait for a undo record ! kill -9 69011 WAITING
2420 wait for a undo record ! kill -9 68520 WAITING
2789 wait for a undo record ! kill -9 68522 WAITING
....大量
------------
USERNAME SID SQL_ID EVENT KPID STATE SEQ# MODULE
------------ ------ --------------- ---------------------------- -------------------- ------------ ---------- ----------
SYS 1654 g283nvkrnga7c SQL*Net message to client ! kill -9 66882 WAITED SHORT 10754 sqlplus@YX
529 db file parallel write ! kill -9 31728 WAITED SHORT 6972
561 db file parallel write ! kill -9 31736 WAITING 11572
577 db file parallel write ! kill -9 31738 WAITING 32779
593 db file parallel write ! kill -9 31740 WAITING 49633
因为JOB执行的是insert操作,kill job 导致恢复事物操作
oracle 开启了并行事物恢复
sho parameter fast_start_parallel_rollback
fast_start_parallel_rollback = LOW
可以禁用该功能:
fast_start_parallel_rollback = false
观察UNDO:一直存在304G的活动undo
UNDO Extent Use Info:
Undo Tablespace Undo extent Total
Name Status Size(GB)
-------------------- -------------------- ----------
UNDOTBS12 ACTIVE 304.39
EXPIRED 5.56
UNEXPIRED 85.50
UNDOTBS2 EXPIRED 1.20
UNEXPIRED .80
等待事物恢复完成。
UNDO 段信息,发现存在一个300GB的undo 段状态为活动,系统当前没有大事物,如下查询结果,那么该事物就是需要恢复的事物。
Tablespace
Name SEGMENT_NAME STATUS UNDO_SEGMENT_SIZE_MB
-------------------- ------------------------------ ---------- --------------------
UNDOTBS12 _SYSSMU310_2265507800$ ACTIVE 15
UNDOTBS12 _SYSSMU442_1480812537$ ACTIVE 24
UNDOTBS12 _SYSSMU85_62393027$ ACTIVE 311584.375
INST_ID XIDUSN XIDSLOT XIDSQN START_TIME STATUS USE_UNDO_MB
---------- ---------- ---------- ---------- ------------------------------------------------------------ ---------- -----------
1 281 11 11569 11/15/17 16:16:37 ACTIVE 10.53125
1 301 19 10677 11/15/17 16:16:46 ACTIVE .765625
1 467 21 2540754 11/15/17 16:16:47 ACTIVE .265625
1 231 3 10785 11/15/17 16:16:44 ACTIVE 15.921875
1 342 32 10841 11/15/17 16:16:47 ACTIVE .03125
1 280 9 10882 11/15/17 16:16:46 ACTIVE .0078125
1 429 23 12436 11/15/17 16:16:45 ACTIVE .015625
我们dump 该undo段的事物表:
select HEADER_FILE,HEADER_BLOCK,BYTES/1024/1024/1024 SIZEGB from dba_segments where SEGMENT_NAME='_SYSSMU85_62393027$'
HEADER_FILE HEADER_BLOCK SIZEGB
----------- ------------ ----------
165 192 304.281616
alter system dump datafile 165 block 192
or alter system dump undo header '_SYSSMU85_62393027$'
事物表解释:共22个槽号,uel指向下一个槽号
活动事物:这个事物的 uel=0x5427 这是个非法的值没有这个槽号呀!
0x11 10 0x90 0x200aae 0x5427 0x0e16.b37b06be 0x2a5cd40d 0x0000.000.00000000 0x001e3c2a 0x00000000
Version: 0x01
FREE BLOCK POOL::
uba: 0x2b06effc.6e55.02 ext: 0x5426 spc: 0x1f06
uba: 0x00000000.6e4c.02 ext: 0x541d spc: 0x1f06
uba: 0x00000000.1875.0f ext: 0x12 spc: 0x1988
uba: 0x00000000.085c.01 ext: 0x46 spc: 0x12c0
uba: 0x00000000.b1c2.18 ext: 0x9 spc: 0xf52
TRN TBL::
index state cflags wrap# uel scn dba parent-xid nub stmt_num cmt
------------------------------------------------------------------------------------------------
0x00 9 0x00 0x8c146e 0x0005 0x0e16.ef250903 0x2b06eff9 0x0000.000.00000000 0x00000001 0x00000000 1510582137
0x01 9 0x00 0x8c1b9f 0x0000 0x0e16.ef250901 0x2b06eff9 0x0000.000.00000000 0x00000003 0x00000000 1510582137
0x02 9 0x00 0x8c1adb 0x001d 0x0e16.ef250910 0x2b06eff9 0x0000.000.00000000 0x00000001 0x00000000 1510582137
0x03 9 0x00 0x8c08a8 0x000d 0x0e16.ef2508ee 0x2b06eff6 0x0000.000.00000000 0x00000001 0x00000000 1510582137
0x04 9 0x00 0x8c0bec 0x000e 0x0e16.ef2508d5 0x2b06eff6 0x0000.000.00000000 0x00000001 0x00000000 1510582137
0x05 9 0x00 0x8bf699 0x000c 0x0e16.ef250905 0x2b06eff9 0x0000.000.00000000 0x00000001 0x00000000 1510582137
0x06 9 0x00 0x8bd390 0x0019 0x0e16.ef2508f2 0x2b06eff6 0x0000.000.00000000 0x00000001 0x00000000 1510582137
0x07 9 0x00 0x8be999 0x000b 0x0e16.ef250909 0x2b06eff9 0x0000.000.00000000 0x00000001 0x00000000 1510582137
0x08 9 0x00 0x8bed92 0x001b 0x0e16.ef2508db 0x2b06eff6 0x0000.000.00000000 0x00000001 0x00000000 1510582137
0x09 9 0x00 0x8c1bb2 0x0015 0x0e16.ef2508fa 0x2b06eff6 0x0000.000.00000000 0x00000001 0x00000000 1510582137
0x0a 9 0x00 0x8c220c 0x001f 0x0e16.ef250928 0x2b06eff9 0x0000.000.00000000 0x00000001 0x00000000 1510582137
0x0b 9 0x00 0x8c19b6 0x001a 0x0e16.ef25090b 0x2b06eff9 0x0000.000.00000000 0x00000001 0x00000000 1510582137
0x0c 9 0x00 0x8c1c1b 0x0007 0x0e16.ef250907 0x2b06eff9 0x0000.000.00000000 0x00000001 0x00000000 1510582137
0x0d 9 0x00 0x8be230 0x0006 0x0e16.ef2508f0 0x2b06eff6 0x0000.000.00000000 0x00000001 0x00000000 1510582137
0x0e 9 0x00 0x8bfb14 0x001e 0x0e16.ef2508d7 0x2b06eff6 0x0000.000.00000000 0x00000001 0x00000000 1510582137
0x0f 9 0x00 0x8beb11 0x0021 0x0e16.ef2508ea 0x2b06eff6 0x0000.000.00000000 0x00000001 0x00000000 1510582137
0x10 9 0x00 0x8c1fc8 0x0018 0x0e16.ef250930 0x2b06eff9 0x0000.000.00000000 0x00000001 0x00000000 1510582137
0x11 10 0x90 0x200aae 0x5427 0x0e16.b37b06be 0x2a5cd40d 0x0000.000.00000000 0x001e3c2a 0x00000000 0
0x12 9 0x00 0x8c261b 0x0014 0x0e16.ef25092c 0x2b06eff9 0x0000.000.00000000 0x00000001 0x00000000 1510582137
0x13 9 0x00 0x8c2f62 0xffff 0x0e16.ef250935 0x2b06effc 0x0000.000.00000000 0x00000003 0x00000000 1510582137
0x14 9 0x00 0x8bd7a7 0x0010 0x0e16.ef25092e 0x2b06eff9 0x0000.000.00000000 0x00000001 0x00000000 1510582137
0x15 9 0x00 0x8c152d 0x0020 0x0e16.ef2508fc 0x2b06eff6 0x0000.000.00000000 0x00000001 0x00000000 1510582137
0x16 9 0x00 0x8bfad1 0x0009 0x0e16.ef2508f8 0x2b06eff6 0x0000.000.00000000 0x00000001 0x00000000 1510582137
0x17 9 0x00 0x8bfd07 0x000a 0x0e16.ef250923 0x2b06eff9 0x0000.000.00000000 0x00000001 0x00000000 1510582137
0x18 9 0x00 0x8c0b7e 0x0013 0x0e16.ef250932 0x2b06eff9 0x0000.000.00000000 0x00000001 0x00000000 1510582137
0x19 9 0x00 0x8c05ce 0x001c 0x0e16.ef2508f4 0x2b06eff6 0x0000.000.00000000 0x00000001 0x00000000 1510582137
0x1a 9 0x00 0x8c2172 0x0002 0x0e16.ef25090e 0x2b06eff9 0x0000.000.00000000 0x00000001 0x00000000 1510582137
0x1b 9 0x00 0x8bf997 0x000f 0x0e16.ef2508e6 0x2b06eff6 0x0000.000.00000000 0x00000001 0x00000000 1510582137
0x1c 9 0x00 0x8c1247 0x0016 0x0e16.ef2508f6 0x2b06eff6 0x0000.000.00000000 0x00000001 0x00000000 1510582137
0x1d 9 0x00 0x8c0912 0x0017 0x0e16.ef250916 0x2b06eff9 0x0000.000.00000000 0x00000001 0x00000000 1510582137
0x1e 9 0x00 0x8c2837 0x0008 0x0e16.ef2508d9 0x2b06eff6 0x0000.000.00000000 0x00000001 0x00000000 1510582137
0x1f 9 0x00 0x8bea30 0x0012 0x0e16.ef25092a 0x2b06eff9 0x0000.000.00000000 0x00000001 0x00000000 1510582137
0x20 9 0x00 0x8bef67 0x0001 0x0e16.ef2508fe 0x2b06eff6 0x0000.000.00000000 0x00000001 0x00000000 1510582137
0x21 9 0x00 0x8c0270 0x0003 0x0e16.ef2508ec 0x2b06eff6 0x0000.000.00000000 0x00000001 0x00000000 1510582137
EXT TRN CTL::
作为对比我把事物恢复完成后的事物表放在下面:从下面看 UEL 正常了!
index state cflags wrap# uel scn dba parent-xid nub stmt_num cmt
------------------------------------------------------------------------------------------------
0x00 9 0x00 0x8c1494 0x001a 0x0e17.1ec6559b 0x29b2527a 0x0000.000.00000000 0x00000001 0x00000000 1510822733
0x01 9 0x00 0x8c1bc6 0x0012 0x0e17.1ec7ccab 0x29b253c7 0x0000.000.00000000 0x00000147 0x00000000 1510822794
0x02 9 0x00 0x8c1b02 0x001d 0x0e17.1ec7c1b4 0x29b25288 0x0000.000.00000000 0x00000003 0x00000000 1510822791
0x03 9 0x00 0x8c08cf 0x0009 0x0e17.1ec6a5c9 0x29b2527a 0x0000.000.00000000 0x00000001 0x00000000 1510822749
0x04 9 0x00 0x8c0c13 0x0008 0x0e17.1ec62b65 0x29b2527a 0x0000.000.00000000 0x00000001 0x00000000 1510822726
0x05 9 0x00 0x8bf6c0 0x0011 0x0e17.1ec7c192 0x29b25284 0x0000.000.00000000 0x00000001 0x00000000 1510822791
0x06 9 0x00 0x8bd3b6 0x000a 0x0e17.1ec3322e 0x29b25279 0x0000.000.00000000 0x00000001 0x00000000 1510822672
0x07 9 0x00 0x8be9bf 0x000e 0x0e17.1ec29e34 0x29b25279 0x0000.000.00000000 0x00000001 0x00000000 1510822646
0x08 9 0x00 0x8bedb9 0x0000 0x0e17.1ec648ac 0x29b2527a 0x0000.000.00000000 0x00000001 0x00000000 1510822729
0x09 9 0x00 0x8c1bd9 0x000c 0x0e17.1ec6acfe 0x29b2527a 0x0000.000.00000000 0x00000001 0x00000000 1510822749
0x0a 9 0x00 0x8c2232 0x0016 0x0e17.1ec34c08 0x29b2527a 0x0000.000.00000000 0x00000002 0x00000000 1510822675
0x0b 9 0x00 0x8c19dd 0x000d 0x0e17.1ec74b49 0x29b2527a 0x0000.000.00000000 0x00000001 0x00000000 1510822772
0x0c 9 0x00 0x8c1c41 0x0015 0x0e17.1ec6c3d6 0x29b2527a 0x0000.000.00000000 0x00000001 0x00000000 1510822753
0x0d 9 0x00 0x8be257 0x0020 0x0e17.1ec75cc7 0x29b2527a 0x0000.000.00000000 0x00000001 0x00000000 1510822773
0x0e 9 0x00 0x8bfb3a 0x0017 0x0e17.1ec2a108 0x29b25279 0x0000.000.00000000 0x00000001 0x00000000 1510822646
0x0f 9 0x00 0x8beb38 0x0004 0x0e17.1ec540d6 0x29b2527a 0x0000.000.00000000 0x00000001 0x00000000 1510822725
0x10 9 0x00 0x8c1fee 0x0006 0x0e17.1ec2d454 0x29b25279 0x0000.000.00000000 0x00000001 0x00000000 1510822657
0x11 9 0x00 0x200ad4 0x0002 0x0e17.1ec7c1b1 0x29b25287 0x0000.000.00000000 0x00000001 0x00000000 1510822791
0x12 9 0x00 0x8c2641 0x0014 0x0e17.1ec7f99f 0x29b25288 0x0000.000.00000000 0x00000001 0x00000000 1510822799
0x13 9 0x00 0x8c2f88 0xffff 0x0e17.1eca2af0 0x29b25288 0x0000.000.00000000 0x00000001 0x00000000 1510822819
0x14 9 0x00 0x8bd7ce 0x001c 0x0e17.1ec811df 0x29b25288 0x0000.000.00000000 0x00000001 0x00000000 1510822803
0x15 9 0x00 0x8c1554 0x000b 0x0e17.1ec6f69f 0x29b2527a 0x0000.000.00000000 0x00000001 0x00000000 1510822760
0x16 9 0x00 0x8bfaf8 0x0019 0x0e17.1ec35eff 0x29b2527a 0x0000.000.00000000 0x00000001 0x00000000 1510822678
0x17 9 0x00 0x8bfd2b 0x0010 0x0e17.1ec2ac1e 0x29b25279 0x0000.000.00000000 0x00000001 0x00000000 1510822648
0x18 9 0x00 0x8c0ba4 0x000f 0x0e17.1ec4e0c0 0x29b2527a 0x0000.000.00000000 0x00000001 0x00000000 1510822702
0x19 9 0x00 0x8c05f3 0x001b 0x0e17.1ec3888e 0x29b2527a 0x0000.000.00000000 0x00000001 0x00000000 1510822684
0x1a 9 0x00 0x8c2198 0x0003 0x0e17.1ec66c63 0x29b2527a 0x0000.000.00000000 0x00000001 0x00000000 1510822736
0x1b 9 0x00 0x8bf9be 0x0018 0x0e17.1ec4ac90 0x29b2527a 0x0000.000.00000000 0x00000001 0x00000000 1510822695
0x1c 9 0x00 0x8c126e 0x0013 0x0e17.1ec83ce6 0x29b25288 0x0000.000.00000000 0x00000001 0x00000000 1510822811
0x1d 9 0x00 0x8c0939 0x001f 0x0e17.1ec7c1c5 0x29b25288 0x0000.000.00000000 0x00000001 0x00000000 1510822791
0x1e 9 0x00 0x8c285b 0x0007 0x0e17.1ec29a92 0x29b25279 0x0000.000.00000000 0x00000001 0x00000000 1510822645
0x1f 9 0x00 0x8bea57 0x0001 0x0e17.1ec7c1c8 0x29b2528c 0x0000.000.00000000 0x00000003 0x00000000 1510822791
0x20 9 0x00 0x8bef8e 0x0005 0x0e17.1ec772fd 0x29b2527b 0x0000.000.00000000 0x00000001 0x00000000 1510822776
0x21 9 0x00 0x8c0295 0x001e 0x0e17.1ebbeac3 0x29b25279 0x0000.000.00000000 0x00000001 0x00000000 151082263
继续
第二次dump
alter system dump datafile 165 block 192
活动事物:
0x11 10 0x90 0x200aae 0x5427 0x0e16.b37b06be 0x2b1123f9 0x0000.000.00000000 0x00195ce9 0x00000000 0
查看UNDO BLOCK: 0x2b1123f9
select to_number('2b1','xxxxxxx') from dual --689
select to_number('123f9','xxxxxxx') from dual --74745
alter system dump datafile 689 block 74745
Object id on Block? Y
seg/obj: 0xc855d csc: 0xe16.dbbb380b itc: 2 flg: E typ: 2 - INDEX
brn: 0 bdba: 0xac411f81 ver: 0x01 opc: 0
inc: 0 exflg: 0
select to_number('c855d','xxxxxxx') from dual
select owner,object_name,object_type from dba_objects where object_id=820573;
OWNER
--------------------------------------------------------------------------------
OBJECT_NAME
--------------------------------------------------------------------------------
OBJECT_TYPE
---------------------------------------------------------
FDPSHARE
ARC_IND_AECS_MSN
INDEX
并非我们期望的对象,看来被恢复的事物我们找不到了
看看awr报告:
发现13点后没有awr生成!!!
二号节点今天10点后也没有生成!!!
手动生成报告:
exec dbms_workload_repository.create_snapshot
/
很慢 出现 Reliable message 等待,但可以生成,刷新mmon进程发现系统还是不能正常生成AWR报告,好吧等待事物恢复完!
刷新mmon:
alter system set "_swrf_mmon_flush"=false;
alter system set "_swrf_mmon_flush"=true;
如果还不行kill mmon
查看手动生成的awr报告,96% 的wait for a undo record!
Event Waits Total Wait Time (sec) Wait Avg(ms) % DB time Wait Class
wait for a undo record 7,233,333 571.6K 79 95.9 Other
禁用并行恢复,不要给让系统hang住
alter system set fast_start_parallel_rollback = false
继续:
分析session发现801等待seq#几乎没有变化,看来该进程是有问题的,dump 看看:
801 DFS lock handle ! kill -9 31766 WAITED SHORT 41466
------------
SYS> @tracepro
input sid :
801
Oradebug for dump process
Press ENTER to continue, or CTRL+C to cancel
Oracle pid: 50, Unix process pid: 31766, image: oracle@YXJCPTDB1 (SMON)
*** SERVICE NAME
SYS$BACKGROUND) 2017-11-15 10:29:18.513 后天进程SMON在进行事物恢复,找到了恢复事物的session
大量如下信息:
Parallel Transaction recovery caught exception 30319
Parallel Transaction recovery caught error 30319
在我关闭并行恢复后大量如下信息:
Serial Transaction recovery caught exception 601
查询MOS发现许多系统因为Transaction recovery caught exception 实例重启,看来这次还算庆幸呀!
查询ash:
select SQL_ID,event,SESSION_ID,BLOCKING_SESSION,sample_time
from v$active_session_history y
where y.sample_time > sysdate - 0.25/24 and y.WAIT_CLASS!='Idle' and BLOCKING_SESSION is not null
and y.event='DFS lock handle' order by sample_time desc
没有记录,证明没有session阻塞恢复session,安心等待吧,系统目前运行正常。
等待中......
再次检查udno,发现事物恢复完成了:
@undo
UNDO Extent Use Info:
Undo Tablespace Undo extent Total
Name Status Size(GB)
-------------------- -------------------- ----------
UNDOTBS12 ACTIVE .11
EXPIRED 5.11
UNEXPIRED 391.45
UNDOTBS2 EXPIRED .53
UNEXPIRED .83
_SYSSMU85_62393027$ 段已经不活动
Tablespace
Name SEGMENT_NAME STATUS UNDO_SEGMENT_SIZE_MB
-------------------- ------------------------------ ---------- --------------------
UNDOTBS12 _SYSSMU85_62393027$ UNEXPIRED 311615.375
总结:
到此该问题告一段落,事物恢复几乎花费了一天时间,该问题是因为JOB抽大量数据发起大事物占用undo一个段,
该段以前也存在大事物,最终导致该段占用310GB,导致事物报错无可用undo空间,分析JOB发现对象存在无用索引
加之job长时间运行,停止JOB导致系统花费大量时间恢复事物,默认系统开启并行恢复事物操作,系统出现读undo
恢复事物操作,数据库出现96%的wait for a undo record等待,通过禁用并行恢复事物系统压力恢复正常,
事物恢复完成系统正常运行。
所以遇到大事物操作尽量避免异常结束事物,有可能遇到SMON恢复事物遇到不可用
session state 信息导致实例宕机,或者遇到SMON恢复事物的BUG导致实例宕机。