产生latch free (row cache objects)事件一般存在两种情况,1.latch free的latch(领导)少(该种情况往往是由于大批量的并发导致,相同的进程访问同一个地址,产生争用)。2.资源争用是(latch free 管理的资源(也就是下属少)。这种情况往往是由于资源不够,例sequence cache太小,或online undo数少等 。下面这个案例就是每二种情况。
概述:
概述:
一套系统,经持续观察每晚21:00-22:00出现大批量的latch free等待事件,该事件最初是在9月25日部署完短信监控时发现。发现时已及时向领导汇报,经应用厂家确认该事件属于常态化,具体原因是由于每晚21:00-22:00点信用库要进行出账(灵活账期计算),出账其间需要进行大批量的dml操作。因此产生大量的latch free事件。考虑到出账业务复杂经跟局方沟通后决定在10月份对该部分进行优化。具体分析步骤如下,
问题现象:
1、9月25日问题及出现告警信息如下(告警阀值20,事件最高达86个)
![](http://img.blog.itpub.net/blog/attachment/201411/4/29446986_14151015105hc3.jpg?x-oss-process=style/bb)
21:00-22:00对应的CPU利用率平均在80%以上。
![](http://img.blog.itpub.net/blog/attachment/201411/4/29446986_1415101531yPzn.jpg?x-oss-process=style/bb)
2、该问题除了第天晚上21:00-22:00准时出现外,在10月31日10:00-13:00也大批量出现,对业务造成了性能上的影响。
![](http://img.blog.itpub.net/blog/attachment/201411/4/29446986_1415101552OdWO.jpg?x-oss-process=style/bb)
问题分析:
1.问题出现时查看数据库等待事件,发现latch free占用数据库45.16%的资源。
Top 5 Timed Events
~~~~~~~~~~~~~~~~~~ % Total
Event Waits Time (s) Ela Time
-------------------------------------------- ------------ ----------- --------
latch free 4,642,874 165,659 45.16 ===========》从top5事件来看latch free消耗整个数据库的45.16%时间,消耗比重在第一位。
db file sequential read 24,051,427 62,999 17.18
enqueue 724,972 42,279 11.53
CPU time 35,971 9.81
SQL*Net message from dblink 7,208,131 17,198 4.69
~~~~~~~~~~~~~~~~~~ % Total
Event Waits Time (s) Ela Time
-------------------------------------------- ------------ ----------- --------
latch free 4,642,874 165,659 45.16 ===========》从top5事件来看latch free消耗整个数据库的45.16%时间,消耗比重在第一位。
db file sequential read 24,051,427 62,999 17.18
enqueue 724,972 42,279 11.53
CPU time 35,971 9.81
SQL*Net message from dblink 7,208,131 17,198 4.69
2.查看latch free事件对应的事件p1,p1raw,p2为分别得出该事件类型为row cache objects,且在dc_rollback_segments上产生争用。
SQL>select p1raw,p2 from v$session_wait where event='latch free';
p1raw p2
----------- -------------------------
07000001EA5171B8 147
SQL> SELECT name, 'Child '||child#, gets, misses, sleeps
2 FROM v$latch_children
WHERE addr='&P1RAW'
UNION
SELECT name, null, gets, misses, sleeps
FROM v$latch
WHERE addr='&P1RAW'
; 3 4 5 6 7 8
Enter value for p1raw: 07000001EA5171B8
old 3: WHERE addr='&P1RAW'
new 3: WHERE addr='07000001EA5171B8'
Enter value for p1raw: 07000001EA5171B8
old 7: WHERE addr='&P1RAW'
new 7: WHERE addr='07000001EA5171B8'
NAME 'CHILD'||CHILD# GETS MISSES SLEEPS
---------------------------------------------------------------- ---------------------------------------------- ---------- ---------- ----------
row cache objects Child 8 906315913 1853603668 1031575192
2 FROM v$latch_children
WHERE addr='&P1RAW'
UNION
SELECT name, null, gets, misses, sleeps
FROM v$latch
WHERE addr='&P1RAW'
; 3 4 5 6 7 8
Enter value for p1raw: 07000001EA5171B8
old 3: WHERE addr='&P1RAW'
new 3: WHERE addr='07000001EA5171B8'
Enter value for p1raw: 07000001EA5171B8
old 7: WHERE addr='&P1RAW'
new 7: WHERE addr='07000001EA5171B8'
NAME 'CHILD'||CHILD# GETS MISSES SLEEPS
---------------------------------------------------------------- ---------------------------------------------- ---------- ---------- ----------
row cache objects Child 8 906315913 1853603668 1031575192
3、对当前系统进行SYSTEMDUMP分析,现次验证latch free 子类型为row cache parent object且对应的字典信息为dc_rollback_segment;
SO: 7000001e9e03af8, type: 38, owner: 7000001e696e2a0, flag: INIT/-/-/0x00
(trans) bsn = 41073789, flg = 0x121, flg2 = 0x00, prx = 0x7000001e8f2ab20, ros = 2147483647, spn = 41073789
efd = 18
parent xid: 0x0000.000.00000000
env: (scn: 0x0000.00000000 xid: 0x0000.000.00000000 uba: 0x00000000.0000.00 statement num=0 parent xid: xid: 0x0000.000.00000000 scn: 0x0000.00000000 0sch: scn: 0x0000.00000000)
cev: (spc = 0 arsp = 0 ubk tsn: 0 rdba: 0x00000000 useg tsn: 0 rdba: 0x00000000
hwm uba: 0x00000000.0000.00 col uba: 0x00000000.0000.00
num bl: 0 bk list: 0x0)
(enqueue)
res: 0, prv: 0, sess: 0
xga: 0x0, heap: UGA
----------------------------------------
SO: 700000253335660, type: 48, owner: 7000001e9e03af8, flag: INIT/-/-/0x00
row cache enqueue: count=0 session=7000001cf42f730 object=7000001f3f6fde8, mode=S
savepoint=41073789
row cache parent object: address=7000001f3f6fde8 cid=3(dc_rollback_segments)
hash=e888d1fb typ=9 transaction=0 flags=00000002
own=7000001f3f6fec8[7000001f3f6fec8,7000001f3f6fec8] wat=7000001f3f6fed8[7000001f3f6fed8,7000001f3f6fed8] mode=S
status=VALID/-/-/-/-/-/-/-/-
request=N release=FALSE flags=0
instance lock id=QD 00000097 00000000
data=
00000097 00000004 00000004 00001425 000b5f53 5953534d 55313531 24000000
00000000 00000000 00000000 00000000 00030000 00000001 d7890000 000e4a3c
15e4f248 0b860000 00000002 00000001
----------------------------------------
(trans) bsn = 41073789, flg = 0x121, flg2 = 0x00, prx = 0x7000001e8f2ab20, ros = 2147483647, spn = 41073789
efd = 18
parent xid: 0x0000.000.00000000
env: (scn: 0x0000.00000000 xid: 0x0000.000.00000000 uba: 0x00000000.0000.00 statement num=0 parent xid: xid: 0x0000.000.00000000 scn: 0x0000.00000000 0sch: scn: 0x0000.00000000)
cev: (spc = 0 arsp = 0 ubk tsn: 0 rdba: 0x00000000 useg tsn: 0 rdba: 0x00000000
hwm uba: 0x00000000.0000.00 col uba: 0x00000000.0000.00
num bl: 0 bk list: 0x0)
(enqueue)
res: 0, prv: 0, sess: 0
xga: 0x0, heap: UGA
----------------------------------------
SO: 700000253335660, type: 48, owner: 7000001e9e03af8, flag: INIT/-/-/0x00
row cache enqueue: count=0 session=7000001cf42f730 object=7000001f3f6fde8, mode=S
savepoint=41073789
row cache parent object: address=7000001f3f6fde8 cid=3(dc_rollback_segments)
hash=e888d1fb typ=9 transaction=0 flags=00000002
own=7000001f3f6fec8[7000001f3f6fec8,7000001f3f6fec8] wat=7000001f3f6fed8[7000001f3f6fed8,7000001f3f6fed8] mode=S
status=VALID/-/-/-/-/-/-/-/-
request=N release=FALSE flags=0
instance lock id=QD 00000097 00000000
data=
00000097 00000004 00000004 00001425 000b5f53 5953534d 55313531 24000000
00000000 00000000 00000000 00000000 00030000 00000001 d7890000 000e4a3c
15e4f248 0b860000 00000002 00000001
----------------------------------------
问题解决方案:
出现该问题主要原因是批量对db_rollback_segments字典进行访问,为了减少争用,需要增加row cache objects bucket桶,来缓解对数据字典的争用
。
1.设置_rollback_segment_count为1000(该隐含参数是指数据库中保持online回滚段的个数(非SYSTEM回滚段,sysem回滚段总是online的)。注:online后不自动offline;
alter system set "_
_
rollback_segment_count"=1000 scope=spfile sid='
oxy3a';
alter system set "_
_
rollback_segment_count"=1000 scope=spfile sid='
oxy3b'; ---------->在9i中修改该参数需要重启数据库,10g以后不用重启。
注:
*._smu_debug_mode = 33554432 ====》 在AUM模式下,默认不允许对undo segment进行online/offline/drop操作;需要设置该参数才允许。
*.event = '10511 trace name context forever, level 1'
也可以手动online undo段,设置event=’10511 trace name context forever, level 1’来禁用SMON OFFLINE UNDO SEGS;
注:
*._smu_debug_mode = 33554432 ====》 在AUM模式下,默认不允许对undo segment进行online/offline/drop操作;需要设置该参数才允许。
*.event = '10511 trace name context forever, level 1'
也可以手动online undo段,设置event=’10511 trace name context forever, level 1’来禁用SMON OFFLINE UNDO SEGS;
2.由于需要重启,因此临时手工增加 online段来解决。
set serverout on
exec prc_make_undo_online( 'undo1', 500); ------->该过程可以手动对指定undo表空间进行online回滚段。
exec prc_make_undo_online( 'undo1', 500); ------->该过程可以手动对指定undo表空间进行online回滚段。
3.修改后查看回滚段信息。
select TABLESPACE_NAME, status, count(*)
from
dba_rollback_segs
where tablespace_name <> 'SYSTEM' group by tablespace_name, status;
from
dba_rollback_segs
where tablespace_name <> 'SYSTEM' group by tablespace_name, status;
TABLESPACE_NAME STATUS COUNT(*)
------------------------------ ---------------- ----------
UNDO1 ONLINE 608 ====》online 回滚段数增加到608
UNDO1 OFFLINE 133
UNDO2 ONLINE 580 ======》online 回滚段数据增加到580
UNDO2 OFFLINE 1
------------------------------ ---------------- ----------
UNDO1 ONLINE 608 ====》online 回滚段数增加到608
UNDO1 OFFLINE 133
UNDO2 ONLINE 580 ======》online 回滚段数据增加到580
UNDO2 OFFLINE 1
问题解决后性能对比:
1.优化后21:00-22:00 CPU利用率在60%以下
![](http://img.blog.itpub.net/blog/attachment/201411/4/29446986_14151016208p52.jpg?x-oss-process=style/bb)
2.从10月31号至今latch free事件再未出现,且latch free事件在TOP5中消失。
Top 5 Timed Events
~~~~~~~~~~~~~~~~~~ % Total
Event Waits Time (s) Ela Time
-------------------------------------------- ------------ ----------- --------
db file sequential read 34,173,605 136,931 74.38
SQL*Net message from dblink 10,047,269 11,559 6.28
CPU time 11,137 6.05
global cache cr request 16,193,950 8,017 4.35
single-task message 9,313 3,364 1.83
-------------------------------------------------------------
附录:
----------手动online 回滚段
create or replace procedure prc_make_undo_online(
p_tbs_name in varchar2,
p_segment_count in integer )
is
v_orig_smu_debug_mode integer;
v_sid varchar2(32);
v_sql_text varchar2(512) := '';
cursor cur_offline_rbs( p_tbs_name varchar2, p_seg_count integer ) is
with my_rollback_segs as (
select *
from dba_rollback_segs r
where r.tablespace_name = upper( p_tbs_name )
and r.status = 'OFFLINE'
order by r.segment_id
)
select *
from my_rollback_segs
where rownum <= p_seg_count;
begin
if ( length( p_tbs_name ) = 0 or p_tbs_name is null ) then
dbms_output.put_line( 'The first parameter p_tbs_name is not specified, exit.' );
goto goto_end;
end if;
if ( p_segment_count is null ) then
dbms_output.put_line( 'The second parameter p_segment_count is not specified, exit.' );
goto goto_end;
end if;
select instance_name into v_sid from v$instance;
select c.kspftctxvl into v_orig_smu_debug_mode
from x$ksppi a, x$ksppcv b, x$ksppsv2 c
where a.indx = b.indx and ( a.indx + 1 ) = c.kspftctxpn
and a.ksppinm = '_smu_debug_mode';
v_sql_text := 'alter system set "_smu_debug_mode" = 4 scope = memory sid = ''' || v_sid || '''';
execute immediate v_sql_text;
for rUndoSeg in cur_offline_rbs( p_tbs_name, p_segment_count ) loop
begin
v_sql_text := 'alter rollback segment "' || rUndoSeg.segment_name || '" online';
execute immediate v_sql_text;
exception
when others then
null;
end;
end loop;
v_sql_text := 'alter system set "_smu_debug_mode" = ' || v_orig_smu_debug_mode
|| ' scope = memory sid = ''' || v_sid || '''';
execute immediate v_sql_text;
<< goto_end >>
null;
end prc_make_undo_online;
~~~~~~~~~~~~~~~~~~ % Total
Event Waits Time (s) Ela Time
-------------------------------------------- ------------ ----------- --------
db file sequential read 34,173,605 136,931 74.38
SQL*Net message from dblink 10,047,269 11,559 6.28
CPU time 11,137 6.05
global cache cr request 16,193,950 8,017 4.35
single-task message 9,313 3,364 1.83
-------------------------------------------------------------
附录:
----------手动online 回滚段
create or replace procedure prc_make_undo_online(
p_tbs_name in varchar2,
p_segment_count in integer )
is
v_orig_smu_debug_mode integer;
v_sid varchar2(32);
v_sql_text varchar2(512) := '';
cursor cur_offline_rbs( p_tbs_name varchar2, p_seg_count integer ) is
with my_rollback_segs as (
select *
from dba_rollback_segs r
where r.tablespace_name = upper( p_tbs_name )
and r.status = 'OFFLINE'
order by r.segment_id
)
select *
from my_rollback_segs
where rownum <= p_seg_count;
begin
if ( length( p_tbs_name ) = 0 or p_tbs_name is null ) then
dbms_output.put_line( 'The first parameter p_tbs_name is not specified, exit.' );
goto goto_end;
end if;
if ( p_segment_count is null ) then
dbms_output.put_line( 'The second parameter p_segment_count is not specified, exit.' );
goto goto_end;
end if;
select instance_name into v_sid from v$instance;
select c.kspftctxvl into v_orig_smu_debug_mode
from x$ksppi a, x$ksppcv b, x$ksppsv2 c
where a.indx = b.indx and ( a.indx + 1 ) = c.kspftctxpn
and a.ksppinm = '_smu_debug_mode';
v_sql_text := 'alter system set "_smu_debug_mode" = 4 scope = memory sid = ''' || v_sid || '''';
execute immediate v_sql_text;
for rUndoSeg in cur_offline_rbs( p_tbs_name, p_segment_count ) loop
begin
v_sql_text := 'alter rollback segment "' || rUndoSeg.segment_name || '" online';
execute immediate v_sql_text;
exception
when others then
null;
end;
end loop;
v_sql_text := 'alter system set "_smu_debug_mode" = ' || v_orig_smu_debug_mode
|| ' scope = memory sid = ''' || v_sid || '''';
execute immediate v_sql_text;
<< goto_end >>
null;
end prc_make_undo_online;
来自 “ ITPUB博客 ” ,链接:http://blog.itpub.net/29446986/viewspace-1320485/,如需转载,请注明出处,否则将追究法律责任。
转载于:http://blog.itpub.net/29446986/viewspace-1320485/