问题描述:
客户反应对表做delete非常缓慢,sql很简单,也有走索引.
但是每次执行要1000多秒,单独的select count(1) from GEMINI_V3.prod_repair_sheet_detail2则非常快。
原因分析:
1、查看表的信息:
表不大,本身也有索引,而且索引都不大,也没有lob字段。
2、monitor分析
语句跑起来,看来绝大部分在CPU上。这个很诡异,表本身不大,物理IO也不高,很快才是。
观察CPU的使用情况,14167就是delete的会话进程,占了一个核的CPU, 使用基本100%。
3、等待事件
直接看下会话的等待情况:
SID S# SPID USERNAME MACHINE SQL_ID PREV_SQL_ID EVENT P123 P123TEXT WT LAST_CALL_ET SECONDS_IN_WAIT BLOCKING
------ ------- ---------- ---------- ---------------------- ------------------ ------------------ ------------------------------ ------------------ ----------------------------------- ---- ------------ --------------- ------------
997 57 14167 SYS database 5x59cnqhb93x3 32j3k52kwag60 Disk file operations I/O 2/3/2 FileOperation/fileno/filetype -1 18 17 :
wait 40s
SID S# SPID USERNAME MACHINE SQL_ID PREV_SQL_ID EVENT P123 P123TEXT WT LAST_CALL_ET SECONDS_IN_WAIT BLOCKING
------ ------- ---------- ---------- ---------------------- ------------------ ------------------ ------------------------------ ------------------ ----------------------------------- ---- ------------ --------------- ------------
997 57 14167 SYS database 5x59cnqhb93x3 32j3k52kwag60 Disk file operations I/O 2/3/2 FileOperation/fileno/filetype -1 59 58 :
Disk file operations I/O等待事件分析,竟然是大量undo表空间的IO。表本身并不大,为什么需要那么多undo?
SELECT distinct filetype_id, filetype_name FROM DBA_HIST_IOSTAT_FILETYPE where FILETYPE_ID=2
FILETYPE_ID FILETYPE_NAME
----------- ------------------------------
2 Data File
SELECT name FROM v$datafile WHERE file#=3;
NAME
------------------------------------------
/home/app/oradata/genvictdb/undotbs01.dbf
4、10046分析
没思路了,做个10046跟踪下。很有意思,有一条“ select /*+ all_rows */ count(1) from "GEMINI_V3"."PROD_REPAIR_TEST_RESULT" where "SHEET_DETAIL_ID" = :1”。
为什么删除A表的数据,要查B表的数据?难不成有外键?!!
*** 2020-10-19 14:52:15.015
WAIT #139665416019160: nam='SQL*Net message from client' ela= 51654897 driver id=1650815232 #bytes=1 p3=0 obj#=-1 tim=1603090335015370
CLOSE #139665416019160:c=0,e=17,dep=0,type=0,tim=1603090335015536
=====================
PARSING IN CURSOR #139665416019160 len=103 dep=0 uid=0 oct=7 lid=0 tim=1603090335018120 hv=2696187811 ad='a37e19770' sqlid='5x59cnqhb93x3'
delete from GEMINI_V3.prod_repair_sheet_detail2 where sheet_id = 'a8c8f190-d078-4cf2-b996-d6848f6b7ba9'
END OF STMT
PARSE #139665416019160:c=1999,e=2502,p=0,cr=0,cu=0,mis=1,r=0,dep=0,og=1,plh=854670735,tim=1603090335018119
WAIT #139665416019160: nam='Disk file operations I/O' ela= 119 FileOperation=2 fileno=3 filetype=2 obj#=-1 tim=1603090335018949
=====================
PARSING IN CURSOR #139665416121272 len=104 dep=1 uid=0 oct=3 lid=0 tim=1603090335020122 hv=3610118026 ad='a47f9e058' sqlid='830yptrbkw1wa'
select /*+ all_rows */ count(1) from "GEMINI_V3"."PROD_REPAIR_TEST_RESULT" where "SHEET_DETAIL_ID" = :1
END OF STMT
PARSE #139665416121272:c=1000,e=658,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=1,plh=0,tim=1603090335020121
BINDS #139665416121272:
Bind#0
oacdty=01 mxl=128(50) mxlc=00 mal=00 scl=00 pre=00
oacflg=12 fl2=0001 frm=01 csi=852 siz=128 off=0
kxsbbbfp=7f066383e2f0 bln=128 avl=32 flg=05
value="2a9411d563b544ab9a4e116bf3784f53"
EXEC #139665416121272:c=1000,e=982,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=1,plh=1814594818,tim=1603090335021254
*** 2020-10-19 14:52:15.194
FETCH #139665416121272:c=172974,e=173229,p=0,cr=11430,cu=0,mis=0,r=1,dep=1,og=1,plh=1814594818,tim=1603090335194518
STAT #139665416121272 id=1 cnt=1 pid=0 pos=1 obj=0 op='SORT AGGREGATE (cr=11430 pr=0 pw=0 time=173232 us)'
STAT #139665416121272 id=2 cnt=0 pid=1 pos=1 obj=88515 op='TABLE ACCESS FULL PROD_REPAIR_TEST_RESULT (cr=11430 pr=0 pw=0 time=173212 us cost=1989 size=33 card=1)'
CLOSE #139665416121272:c=0,e=3,dep=1,type=3,tim=1603090335194819
BINDS #139665416121272:
Bind#0
oacdty=01 mxl=128(50) mxlc=00 mal=00 scl=00 pre=00
oacflg=12 fl2=0001 frm=01 csi=852 siz=128 off=0
kxsbbbfp=7f066383e2f0 bln=128 avl=32 flg=05
value="614ee7ac0a8a462aae931d56d5d7b7ae"
EXEC #139665416121272:c=0,e=121,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,plh=1814594818,tim=1603090335195298
FETCH #139665416121272:c=90986,e=91291,p=0,cr=11429,cu=0,mis=0,r=1,dep=1,og=1,plh=1814594818,tim=1603090335286616
CLOSE #139665416121272:c=0,e=5,dep=1,type=3,tim=1603090335286826
BINDS #139665416121272:
Bind#0
oacdty=01 mxl=128(50) mxlc=00 mal=00 scl=00 pre=00
oacflg=12 fl2=0001 frm=01 csi=852 siz=128 off=0
kxsbbbfp=7f066383e2f0 bln=128 avl=32 flg=05
value="5fdd2fa148cb4c8ea403971ddcfb93e0"
EXEC #139665416121272:c=0,e=153,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,plh=1814594818,tim=1603090335287401
FETCH #139665416121272:c=89986,e=89983,p=0,cr=11429,cu=0,mis=0,r=1,dep=1,og=1,plh=1814594818,tim=1603090335377440
5、外键分析
很显示GEMINI_V3.PROD_REPAIR_TEST_RESULT是子表,GEMINI_V3.PROD_REPAIR_SHEET_DETAIL2是父表。而子表在外键字段SHEET_DETAIL_ID上并没有创建索引。
原因很清晰了,由于子表外键字段没有索引。导致父表每一行的delete,子表都要扫描一次。父表1203150行 * 子表748073行的扫描,当然非常慢!
从AWR中也可以看到这张表的逻辑读非常高。这也造成我前面从执行计划、monitor都看起来正常,分析上走不少弯路。
父表和子表都不大,物理IO一次后读到数据库buffer中,在内存中进行多次扫描,造成了大量逻辑读!
解决方案:
子表的外键字段必须建索引!!
创建索引后,速度有质的飞越
14:52:13 sys@GENVICTD > delete from GEMINI_V3.prod_repair_sheet_detail2 where sheet_id = 'a8c8f190-d078-4cf2-b996-d6848f6b7ba9';
11256 rows deleted.
Elapsed: 00:16:51.89
15:09:06 sys@GENVICTD > rollback;
Rollback complete.
Elapsed: 00:00:00.50
15:16:58 sys@GENVICTD >
15:17:00 sys@GENVICTD >
15:17:00 sys@GENVICTD > delete from GEMINI_V3.prod_repair_sheet_detail2 where sheet_id = 'a8c8f190-d078-4cf2-b996-d6848f6b7ba9';
11256 rows deleted.
Elapsed: 00:00:00.92