文章目录
缘起
早上cpu loading异常,做了一个1小时内的awr,显示top 1事件是sql*net more data to client,一小时内发生325765次之多
查找sql_text
awr 视图也显示造成这个等待事件的sql_id是ajsvu12v742r9
select EVENT,SQL_ID,COUNT(1) times_waited,SUM(TIME_WAITED) time_waited,round(SUM(TIME_WAITED)/COUNT(1),0) avg_time_wait from dba_hist_active_sess_history where to_char(SAMPLE_TIME,'YYYY-MM-DD HH24:MI:SS')
BETWEEN '2024-05-20 08:00:00' and '2024-05-20 09:00:00' and event is not null
GROUP BY EVENT,SQL_ID
order by times_waited desc
输出:
EVENT SQL_ID TIMES_WAITED TIME_WAITED AVG_TIME_WAIT
---------------------------------------------------------------- ------------- ------------ ----------- -------------
SQL*Net more data to client ajsvu12v742r9 60 20074487 334575
log file parallel write 5 53504 10701
db file sequential read 3mnrpwf03fbug 3 8236 2745
db file sequential read 9jcqya57c3vf3 3 65861 21954
db file sequential read 5xrcv782tmnn4 3 49102 16367
但奇怪的是在v$sqlarea内无法找到sql_id对应的sql_text
SQL> select * from v$sqlarea where sql_id='ajsvu12v742r9' ;
no rows selected
awr视图dba_hist_sql_text也是如此
SQL> select * from dba_hist_sqltext where sql_id='ajsvu12v742r9';
no rows selected
查找过程
再次查询v$session,发现sql_id=ajsvu12v742r9 有status=active的session
select username, sid, serial#, status, sql_id, event, seconds_in_wait
from v$session
where sql_id='ajsvu12v742r9'
order by username, sid, serial#;
输出:
USERNAME SID SERIAL# STATUS SQL_ID EVENT SECONDS_IN_WAIT
------------------------------ ---------- ---------- -------- ------------- ---------------------------------------- ---------------
PGDSISSAP 960 4537 ACTIVE ajsvu12v742r9 SQL*Net more data to client 0
google后,发现有一篇文章(Hidden SQL – why can’t I find my SQL Text?)有类似情景。
按照文章指引,果然从v$open_cursor视图内发现线索
SELECT * FROM V$OPEN_CURSOR WHERE SQL_ID LIKE '%ajsvu%'
输出:
SADDR SID USER_NAME ADDRESS HASH_VALUE SQL_ID SQL_TEXT
---------------- ---------- ------------------------------ ---------------- ---------- ------------- ------------------------------------------------------------
000000055F9BA6B8 924 PGDSISSAP 000000053B1FB668 3060927209 ajsvu12v742r9 table_4_9_12f1d5_0_0_0
000000055C9E4B40 961 PGDSISSAP 000000053B1FB668 3060927209 ajsvu12v742r9 table_4_9_12f1d5_0_0_0
000000055F9D2928 960 PGDSISSAP 000000053B1FB668 3060927209 ajsvu12v742r9 table_4_9_12f1d5_0_0_0
000000055CA17B10 1037 PGDSISSAP 000000053B1FB668 3060927209 ajsvu12v742r9 table_4_9_12f1d5_0_0_0
000000055FA0C450 1046 PGDSISSAP 000000053B1FB668 3060927209 ajsvu12v742r9 table_4_9_12f1d5_0_0_0
000000055C9ADB08 879 PGDSISSAP 000000053B1FB668 3060927209 ajsvu12v742r9 table_4_9_12f1d5_0_0_0
000000055CA1FBE0 1049 PGDSISSAP 000000053B1FB668 3060927209 ajsvu12v742r9 table_4_9_12f1d5_0_0_0
上面输出中,有大量sql_id是ajsvu12v742r9的游标,sql_text字段是table_4_9_12f1d5_0_0_0,其中的12fd5是object_id的16进制的表现形式,转换后获得table name是PQM_PIC_UPLOAD
select owner, object_name, object_type
from dba_objects
where object_id = (select to_number(upper('12f1d5'),'XXXXXX') from dual);
输出:
OWNER OBJECT_NAME OBJECT_TYPE
------------------------------ -------------------------------------------------------------------------------------------------------------------------------- -------------------
PGDSISSAP PQM_PIC_UPLOAD TABLE
通过desc,如文章说法一致,其中一个字段是blob
SQL>desc pgdsissap.pqm_pic_upload
desc
名稱 空值 類型
------------ -------- -------------
FACT_NO NOT NULL CHAR(4)
ARTIC_NO NOT NULL VARCHAR2(25)
PIC_BLOB BLOB
PIC_FILENAME VARCHAR2(255)
PIC_FILEEXT CHAR(3)
PIC_LENGTH NUMBER(5)
PIC_WIDTH NUMBER(5)
UPLOAD_MK CHAR(1)
由于前述v$session中有抓到sql_id=ajsvu12v742r9,且status=active(而且连续抓取,交替出现,说明仍然活跃中),根据文章的提示,查找关联的prev_sql_id
note:由于文中使用的sql脚本没有找到,因此使用了最笨的办法,使用下述sql手工连续抓取,直到结果呈现
SELECT se.username,se.sid,se.serial#,se.status,se.sql_id,se.prev_sql_id,event,seconds_in_wait
FROM V$OPEN_CURSOR oc,v$session se
WHERE se.sid=oc.sid
and se.status='ACTIVE'
and oc.SQL_ID LIKE 'ajsvu12v742r9%'
输出:
USERNAME SID SERIAL# STATUS SQL_ID PREV_SQL_ID EVENT SECONDS_IN_WAIT
------------------------------ ---------- ---------- -------- ------------- ------------- ---------------------------------------- ---------------
PGDSISSAP 878 35106 ACTIVE ajsvu12v742r9 g4jpj0rsh78ky SQL*Net more data to client 0
PGDSISSAP 839 11275 ACTIVE ajsvu12v742r9 dqbfpxfj9t31t SQL*Net more data to client 0
功夫不负有心人,顺利抓取到ajsvu12v742r9 前面的sql_id:g4jpj0rsh78ky与dqbfpxfj9t31t
select sql_id,plan_hash_value,executions,disk_reads,buffer_gets,elapsed_time,sql_text from v$sqlarea where sql_id in ('g4jpj0rsh78ky','dqbfpxfj9t31t');```
```bash
SQL_ID PLAN_HASH_VALUE EXECUTIONS DISK_READS BUFFER_GETS ELAPSED_TIME SQL_TEXT
------------- --------------- ---------- ---------- ----------- ------------ --------------------------------------------------------------------------------
dqbfpxfj9t31t 1514936780 30 0 90 7832 select pic_blob from PQM_PIC_UPLOAD where fact_no ='B0CV' and artic_no ='IE9106'
g4jpj0rsh78ky 1514936780 30 0 90 7832 select pic_blob from PQM_PIC_UPLOAD where fact_no ='B0CV' and artic_no ='FZ6346'
重结果来看,这个sql与前面查到的table_name一致
结论:
具体来说,当sql_id ajsvu12v742r9访问 LOB 时,Oracle 会为 LOB 访问部分生成一个新的 SQL_ID。然而,这类语句不会在 V
S
Q
L
和
V
SQL 和 V
SQL和VSQLAREA 视图中,因此不会被自动工作负载库(AWR)捕获。此外,还有其他类型的语句也表现出这种行为。文中提到的嵌套表就是一个例子。
那么,最初生成了缺失sql_text 的 SQL_ID 的语句是什么呢?(等有时间再按照文章的指引重现看看)
最后,文章中对SQL*Net more data TO client的解释也是值得留意,顺便摘录如下:
I was surprised to find that the statement was short - only a few lines long. I was expecting a really long string that would not fit in a single SQLNet packet. After all, we are waiting on the "SQLNet more data FROM client" event, right? Which is supposed to be the time waited for an additional packet from the client when Oracle knows that more is coming. Clearly this was not the case. Rather it is waiting because Oracle is sending data back to the client that takes more than one packet (i.e. the LOB). You would expect that time would be logged under the “SQLNet more data TO client" event in this case. However, it appears that time is logged to that event only until the packet is turned over to the network. It appears that Oracle then starts logging time to the "SQLNet more data FROM client” event while it waits for the next response from the client. Therefore, the actual network transfer time is clocked to the wrong event. You can understand why the developers made this choice, because Oracle has no way of knowing how long it actually takes the packet to get to the client. It only knows how long the round trip takes. So maybe they should have named the event slightly differently, like "SQLNet more data from client including the entire round trip network time" or something. Anyway, Tanel Poder has a post that briefly mentions this quirk of the SQLNet events.