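Background: a customer reported that their Oracle database was very slow. (The read by other session wait event can be tuned together with the db file sequential read wait event.)
1. Check the wait events: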
set linesize 200
col username for a15
col event for a35
col program for a20
col cpu_p for 99.99
select ta.*, round(ta.cpu_time / tb.total_cpu * 100, 1) cpu_usage
  from (select s.username, s.program, s.event, s.sql_id,
               sum(trunc(m.cpu)) cpu_time, count(*) sum
          from v$sessmetric m, v$session s
         where (m.physical_reads > 100 or m.cpu > 100 or m.logical_reads > 100)
           and m.session_id = s.sid
           and m.session_serial_num = s.serial#
           and s.status = 'ACTIVE'
           and username is not null
         group by s.username, s.program, s.event, s.sql_id
         order by 5 desc) ta,
       (select sum(cpu) total_cpu from v$sessmetric) tb
 where rownum < 11;

select event, count(1) from v$session_wait group by event order by 2 desc;
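read by other session turned out to be the top wait event.
2. Find the SQL that is waiting on read by other session. It also helps to pull an AWR report and check its Top SQL section; here both pointed to the same statement.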
select sid,
       s.username,
       s.program,
       s.action,
       logon_time,
       q.sql_text,
       q.SQL_FULLTEXT,
       q.sql_id
  from v$session s
  left join v$sql q on s.sql_hash_value = q.hash_value
 where s.sid in (select sid
                   from v$session_wait
                  where event in ('read by other session'));
3. Run the SQL and look at its execution plan.
The plan showed that the SQL was clearly using the wrong index.
select count(*) as pageno
  from table1
 where targetid = :"SYS_B_0"
   and msgId in
       (select msgId from table2 where userId = :"SYS_B_1")
   and classname not in (:"SYS_B_2", :"SYS_B_3", :"SYS_B_4")
   and dateTime
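4. A bad execution plan is very often caused by inaccurate table statistics. Checking confirmed that the statistics on table2 were indeed inaccurate. Gathering statistics, or alternatively adding a hint, resolved the problem.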
begin
  dbms_stats.gather_table_stats(
    ownname          => 'owner',
    tabname          => 'table2',
    estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE,
    method_opt       => 'FOR ALL COLUMNS SIZE AUTO',
    cascade          => TRUE);
end;
/
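The check that showed table2's statistics to be inaccurate is not given in the note. A minimal sketch of one way to do it is to compare what the optimizer believes with what the table actually contains. The schema name OWNER is only the placeholder from the gather call, and STALE_STATS is populated only when table monitoring is active, so treat this as an illustration rather than the author's exact check.

```sql
-- What the optimizer currently believes about the table.
SELECT table_name, num_rows, blocks, last_analyzed, stale_stats
  FROM dba_tab_statistics
 WHERE owner = 'OWNER'          -- placeholder schema, as in the gather call
   AND table_name = 'TABLE2'
   AND partition_name IS NULL;  -- table-level row only

-- What the table really contains (can be expensive on a large table).
SELECT COUNT(*) AS actual_rows FROM owner.table2;
```

A large gap between num_rows and actual_rows, or an old last_analyzed date on a heavily modified table, points to statistics as the likely reason for the wrong index choice.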
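Notes:
The read by other session wait event is a fairly common symptom of Oracle I/O problems. Session A is reading a data block from disk into memory (the data buffer cache),
and sessions B and C request the same block at the same time. Because session A has not yet finished reading the block into the buffer cache, B and C wait on read by other session. Session A itself is therefore usually waiting on db file sequential read or db file scattered read.
This is also a form of hot-block contention.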
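The relationship between sessions A, B, and C described above can be observed directly, because all three wait events expose the file number in P1 and the block number in P2. The following is a sketch only: it assumes a single instance (on RAC the gv$ views and inst_id would be needed), and since it is a point-in-time snapshot it will often return no rows unless the contention is ongoing.

```sql
-- Pair each 'read by other session' waiter (sessions B, C) with the session
-- that is currently reading the same block from disk (session A).
-- P1 = file#, P2 = (first) block#; for the two 'db file ... read' events,
-- P3 = number of blocks being read.
SELECT w.sid    AS waiter_sid,
       r.sid    AS reader_sid,
       r.event  AS reader_event,
       r.sql_id AS reader_sql_id,
       w.p1     AS file#,
       w.p2     AS block#
  FROM v$session w
  JOIN v$session r
    ON r.p1 = w.p1
   AND w.p2 BETWEEN r.p2 AND r.p2 + r.p3 - 1
 WHERE w.event = 'read by other session'
   AND r.event IN ('db file sequential read', 'db file scattered read')
   AND w.state = 'WAITING'
   AND r.state = 'WAITING';
```

If many waiters map to the same reader_sql_id, that statement is the one whose physical reads need to be reduced.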
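How should this be resolved when it occurs?
The cause is usually a SQL statement, or possibly the disk device.
When the problem appears, the first step is to locate the SQL.
Method 1: generate an ASH report for fine-grained detail and read the SQL off its top SQL statements section.
Method 2: query for it directly with SQL:
1. For a problem that is happening right now: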
select sql_fulltext
  from v$sql a, v$session b
 where a.sql_id = b.sql_id
   and b.event = 'read by other session';
2. For a problem that occurred in the past:
select a.sql_id, sql_fulltext
  from v$sql a, dba_hist_active_sess_history b
 where a.sql_id = b.sql_id
   and b.event = 'read by other session';
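When a full ASH report (Method 1) is more than is needed, the ASH data can also be queried directly and ranked by sample count. This is a sketch under two assumptions: the 30-minute window is arbitrary, and the ASH and AWR views require the Diagnostics Pack license. The second query takes the statement text from dba_hist_sqltext instead of v$sql, so statements that have already aged out of the shared pool are still resolved.

```sql
-- Recent history (in-memory ASH): statements sampled most often while
-- waiting on 'read by other session' in the last 30 minutes.
SELECT sql_id, COUNT(*) AS samples
  FROM v$active_session_history
 WHERE event = 'read by other session'
   AND sample_time > SYSTIMESTAMP - INTERVAL '30' MINUTE
 GROUP BY sql_id
 ORDER BY samples DESC;

-- Older history (AWR): same ranking, with the SQL text taken from AWR.
SELECT h.sql_id, h.samples, t.sql_text
  FROM (SELECT dbid, sql_id, COUNT(*) AS samples
          FROM dba_hist_active_sess_history
         WHERE event = 'read by other session'
         GROUP BY dbid, sql_id) h
  LEFT JOIN dba_hist_sqltext t
    ON t.dbid = h.dbid
   AND t.sql_id = h.sql_id
 ORDER BY h.samples DESC;
```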
read by other session is often accompanied by the db file sequential read event.
You can also look at the object involved, which is identified by p1, p2, and p3:
SELECT p1 "file#", p2 "block#", p3 "class#"
  FROM v$session_wait
 WHERE event = 'read by other session';
Use the file# (p1) and block# (p2) values to find the hot object:
SELECT relative_fno, owner, segment_name, segment_type
  FROM dba_extents
 WHERE file_id = &file
   AND &block BETWEEN block_id AND block_id + blocks - 1;
Alternatively, look at the hot blocks directly, for example to find the SQL statements that reference the hot-block tables:
select sql_text
  from v$sqltext a,
       (select distinct a.owner, a.segment_name, a.segment_type
          from dba_extents a,
               (select dbarfil, dbablk
                  from (select dbarfil, dbablk from x$bh order by tch desc)
                 where rownum < 11) b
         where a.RELATIVE_FNO = b.dbarfil
           and a.BLOCK_ID <= b.dbablk
           and a.block_id + a.blocks > b.dbablk) b
 where a.sql_text like '%' || b.segment_name || '%'
   and b.segment_type = 'TABLE'
 order by a.hash_value, a.address, a.piece;
To list the objects that own the hot blocks:
SELECT E.OWNER, E.SEGMENT_NAME, E.SEGMENT_TYPE
  FROM DBA_EXTENTS E,
       (SELECT *
          FROM (SELECT ADDR, TS#, FILE#, DBARFIL, DBABLK, TCH
                  FROM X$BH
                 ORDER BY TCH DESC)
         WHERE ROWNUM < 11) B
 WHERE E.RELATIVE_FNO = B.DBARFIL
   AND E.BLOCK_ID <= B.DBABLK
   AND E.BLOCK_ID + E.BLOCKS > B.DBABLK;
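Once the SQL has been found, look at its execution plan, identify the problem, and tune it.
1. For a cursor that is still in the shared pool, the execution plan can be displayed with: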
select * from table(dbms_xplan.display_cursor('sql_id', null, 'allstats'));
2. For historical SQL, the plan can be obtained from AWR:
select * from table(dbms_xplan.display_awr('sql_id'));
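The allstats format only shows actual row counts and buffer gets when row-source statistics were collected during execution. A minimal sketch of collecting them for a single test run in SQL*Plus follows. The table2 and userId names come from the statement in step 3; the literal 'some_user' is a made-up value for illustration.

```sql
-- SERVEROUTPUT must be off, otherwise the "last statement" in the session
-- is the internal DBMS_OUTPUT call rather than the test query.
SET SERVEROUTPUT OFF

-- Run the statement once with row-source statistics enabled.
SELECT /*+ gather_plan_statistics */ COUNT(*)
  FROM table2
 WHERE userId = 'some_user';   -- hypothetical test value

-- NULL sql_id = the last statement executed in this session;
-- ALLSTATS LAST = estimated vs. actual rows for that last execution.
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL, NULL, 'ALLSTATS LAST'));
```

Plan steps where E-Rows and A-Rows differ by orders of magnitude are where the optimizer was misled, which fits the stale-statistics diagnosis in step 4.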
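If the storage device is the suspected cause, check disk read/write activity at the operating system level; vmstat 2 200 can be used for this.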
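As a database-side complement to vmstat, the average latency of the read events can be derived from the cumulative wait statistics. This is a sketch: the figures are totals since instance startup, so two snapshots taken a few minutes apart give a better picture of current behavior than a single run.

```sql
-- Average wait per event in milliseconds, since instance startup.
SELECT event,
       total_waits,
       ROUND(time_waited_micro / NULLIF(total_waits, 0) / 1000, 2) AS avg_wait_ms
  FROM v$system_event
 WHERE event IN ('db file sequential read',
                 'db file scattered read',
                 'read by other session')
 ORDER BY time_waited_micro DESC;
```

If single-block read latency is far above what the storage is expected to deliver, the device deserves attention; if latency is normal but the wait counts are very high, the SQL is the more likely cause.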