Oracle Latches: A First Hands-On Look


Jermaine


Category: Databases and Data Warehouses. Tags: oracle, cache, library, buffer, sql, events


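The latch free wait event has three parameters: latch address, latch number, and number of tries. Keep the following points in mind when dealing with this wait event:
Latches apply only to memory structures in the SGA; they do not apply to database objects. The SGA has many latches that protect its various memory structures.
A waiting session is always waiting for one specific latch.
Up to Oracle9i, the latch free wait event covers all latches; starting in Oracle 10g, the common latches are broken out into wait events of their own.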
---------------------------------------------------------------------------------------------------------------------------
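What is a latch
A latch is a type of lock. It consists of three parts: PID (process ID), memory address, and length. A latch enforces exclusive access to a data structure in order to guarantee the integrity of that structure.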

The figure above shows the difference between locks and latches.
------------------------------------------------------------------------------------------------------------------------------------
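Latch family:
There are three types of latches: parent, child, and solitary. Parent and solitary latches are fixed in the Oracle kernel code; child latches are created when the instance starts.
The V$LATCH_PARENT and V$LATCH_CHILDREN views contain statistics for parent and child latches respectively. V$LATCH contains statistics for solitary latches, as well as aggregated statistics for the parent latches and their children.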

---------------------------------------------------------------
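Latch acquisition:

A process can request a latch in one of two modes: willing-to-wait or no-wait.
No-wait (immediate) requests are counted in the IMMEDIATE_GETS and IMMEDIATE_MISSES columns of the V$LATCH, V$LATCH_PARENT, and V$LATCH_CHILDREN views.
The willing-to-wait mode is used only on the last latch when all no-wait requests against other child latches have failed.
If the latch is available when a process first requests it, the process acquires it. Before modifying the protected data structure, the process writes recovery information into the latch recovery area. If the process dies while holding the latch, PMON can then use this information to clean up and release the latch.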

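If the latch is not available on the first request, the process spins on the CPU and requests the latch again. It repeats this up to the number of times given by the _SPIN_COUNT parameter. If it gets the latch while spinning, both SPIN_GETS and MISSES are incremented by 1. Otherwise, the process posts the latch free wait event (visible in V$SESSION_WAIT), gives up the CPU, and goes to sleep. When it wakes up, it starts another round of up to _SPIN_COUNT attempts.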
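To make the spin-then-sleep sequence concrete, here is a toy Python model of a single willing-to-wait get. It is a sketch under stated assumptions, not Oracle's implementation:

- Latch availability is fed in as a scripted sequence of True/False values instead of real concurrency.
- Sleeping is only counted, not performed.
- The counter names merely echo the V$LATCH columns GETS, MISSES, SPIN_GETS, and SLEEPS.

```python
from dataclasses import dataclass
from typing import Iterator


@dataclass
class LatchStats:
    # Names mirror V$LATCH columns; this is a toy model, not Oracle internals.
    gets: int = 0
    misses: int = 0
    spin_gets: int = 0
    sleeps: int = 0


def get_willing_to_wait(is_free: Iterator[bool], spin_count: int,
                        stats: LatchStats) -> int:
    """Simulate one willing-to-wait latch get; return the attempts used.

    is_free yields, for every successive attempt, whether the latch
    happens to be free at that moment (a scripted stand-in for other
    processes releasing it).
    """
    stats.gets += 1
    attempts = 1
    if next(is_free):                 # latch free on the first request
        return attempts
    stats.misses += 1                 # first request failed
    slept = False
    while True:
        for _ in range(spin_count):   # spin on the CPU and retry
            attempts += 1
            if next(is_free):
                if not slept:
                    stats.spin_gets += 1  # obtained while spinning, no sleep
                return attempts
        # Spin budget exhausted: a real process would post 'latch free',
        # give up the CPU and sleep before the next round of spinning.
        stats.sleeps += 1
        slept = True


stats = LatchStats()
get_willing_to_wait(iter([True]), spin_count=2, stats=stats)
get_willing_to_wait(iter([False, False, True]), spin_count=2, stats=stats)
attempts = get_willing_to_wait(iter([False, False, False, False, True]),
                               spin_count=2, stats=stats)
print(stats, attempts)
```

In the third call the latch stays busy for a whole spin round. One sleep is recorded before the get finally succeeds on the fifth attempt.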

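Every latch has a level number. The levels of solitary and parent latches are fixed in the Oracle kernel. Child latches are created when the instance starts and inherit the level of their parent.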

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
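Latch classes:

Starting in Oracle9i Release 2, latches can be assigned to classes, and each class can have its own _SPIN_COUNT value. In earlier releases, a single value was shared by all latches, which increased CPU consumption. If cache buffers chains latch contention is heavy and CPU resources are plentiful, you can raise the spin count for that latch's class to reduce latch sleeps. The X$KSLLCLASS view contains information about the eight latch classes: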

select indx, spin, yield, waittime
from x$ksllclass;

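Each row in X$KSLLCLASS corresponds to an _LATCH_CLASS_n parameter. To change the spin count for one particular latch, first find its latch number. Then assign the latch to a class and set that class's spin count as follows:
select latch#, name
from v$latchname
where name = 'cache buffers chains';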

LATCH# NAME
---------- -------------------------------
      97 cache buffers chains

# Make these two entries in the INIT.ORA file and bounce the instance.
# This modifies the spin count attribute of class 1 to 10000.
_latch_class_1 = "10000"
# This assigns latch# 97 to class 1.
_latch_classes = "97:1"

select indx, spin, yield, waittime
from x$ksllclass;

      INDX       SPIN      YIELD   WAITTIME
---------- ---------- ---------- ----------
         0      20000          0          1
         1      10000          0          1
         2      20000          0          1
         3      20000          0          1
         4      20000          0          1
         5      20000          0          1
         6      20000          0          1
         7      20000          0          1

8 rows selected.

select a.kslldnam, b.kslltnum, b.class_ksllt
from x$kslld a, x$ksllt b
where  a.kslldadr = b.addr
and b.class_ksllt > 0;

KSLLDNAM                   KSLLTNUM CLASS_KSLLT
------------------------- ---------- -----------
process allocation                3          2
cache buffers chains             97          1
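The class mechanism is a two-step lookup:

1. Latch number to class, via _LATCH_CLASSES.
2. Class to spin count, via _LATCH_CLASS_n (visible in X$KSLLCLASS).

The Python sketch below models only that lookup. Two details are assumptions: several latch:class pairs are taken to be whitespace-separated, and unlisted latches are taken to fall back to class 0. The X$KSLLT output above shows that some latches sit in other classes by default, so check CLASS_KSLLT on a real system.

```python
def parse_latch_classes(value: str) -> dict[int, int]:
    """Turn a string like "97:1" into {latch#: class#}.

    Assumption: several latch:class pairs are separated by whitespace.
    """
    mapping: dict[int, int] = {}
    for pair in value.split():
        latch_no, latch_class = pair.split(":")
        mapping[int(latch_no)] = int(latch_class)
    return mapping


def effective_spin(latch_no: int, classes: dict[int, int],
                   class_spin: dict[int, int], default_class: int = 0) -> int:
    # Latches without an explicit entry fall back to default_class; this is
    # an assumption of the sketch (X$KSLLT.CLASS_KSLLT shows the real class).
    return class_spin[classes.get(latch_no, default_class)]


# SPIN per class, copied from the X$KSLLCLASS output above.
class_spin = {0: 20000, 1: 10000, 2: 20000, 3: 20000,
              4: 20000, 5: 20000, 6: 20000, 7: 20000}
classes = parse_latch_classes("97:1")
print(effective_spin(97, classes, class_spin))   # cache buffers chains
print(effective_spin(98, classes, class_spin))   # some other latch
```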
----------------------------------------------------------------------------------------------------------
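What the latch free wait event tells you
A latch free wait event means that a process failed to get a latch in willing-to-wait mode (after exhausting its _SPIN_COUNT attempts) and went to sleep. If this happens frequently, the spinning that precedes each sleep also consumes a lot of CPU.
TOTAL_WAITS for the latch free event in V$SYSTEM_EVENT counts how many times processes failed to get a latch in willing-to-wait mode. The SLEEPS column of V$LATCH records how many times processes went to sleep.

TOTAL_WAITS can be larger than the sum of SLEEPS, because SLEEPS is updated only after the get operation completes.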

select a.total_waits, b.sum_of_sleeps
from  (select total_waits from v$system_event where event = 'latch free') a,
(select sum(sleeps) sum_of_sleeps from v$latch) b;

TOTAL_WAITS SUM_OF_SLEEPS
----------- -------------
414031680    414031680
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Latch miss location
V$LATCH_MISSES records where in the Oracle kernel code latch misses occurred:
select location,
parent_name,
wtr_slp_count,
sleep_count,
longhold_count
from v$latch_misses
where  sleep_count > 0
order by wtr_slp_count, location;
LOCATION             PARENT_NAME          WTR_SLP_COUNT SLEEP_COUNT LONGHOLD_COUNT
-------------------- -------------------- ------------- ----------- --------------
. . .
kglupc: child        library cache              7879693    11869691              0
kghupr1              shared pool                8062331     5493370              0
kcbrls: kslbegin     cache buffers chains       9776543    14043355              0
kqrpfl: not dirty    row cache objects         15606317    14999100              0
kqrpre: find obj     row cache objects         20359370    20969580              0
kglhdgn: child:      library cache             23782557     9952093              0
kcbgtcr: fast path   cache buffers chains      26729974    23166337              0
kglpnc: child        library cache             27385354     7707204              0
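One parent latch appears under several code locations, so it can help to roll V$LATCH_MISSES up by parent. The Python sketch below does that for the rows shown above by summing WTR_SLP_COUNT per PARENT_NAME. The rows elided with ". . ." are missing here too. In a database you would do the same with a GROUP BY on PARENT_NAME.

```python
from collections import defaultdict

# (location, parent_name, wtr_slp_count, sleep_count) from the sample output
# above; the rows elided there with ". . ." are missing here as well.
rows = [
    ("kglupc: child", "library cache", 7879693, 11869691),
    ("kghupr1", "shared pool", 8062331, 5493370),
    ("kcbrls: kslbegin", "cache buffers chains", 9776543, 14043355),
    ("kqrpfl: not dirty", "row cache objects", 15606317, 14999100),
    ("kqrpre: find obj", "row cache objects", 20359370, 20969580),
    ("kglhdgn: child:", "library cache", 23782557, 9952093),
    ("kcbgtcr: fast path", "cache buffers chains", 26729974, 23166337),
    ("kglpnc: child", "library cache", 27385354, 7707204),
]


def wtr_sleeps_by_parent(rows):
    """Sum WTR_SLP_COUNT per parent latch, largest total first."""
    totals = defaultdict(int)
    for _location, parent, wtr_slp_count, _sleep_count in rows:
        totals[parent] += wtr_slp_count
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


for parent, total in wtr_sleeps_by_parent(rows):
    print(f"{parent:22} {total:>12,}")
```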

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
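Latches in 10g
Before 10g, every latch wait shows up as the latch free wait event. To find the latch name, take the P2 value from V$SESSION_WAIT and look it up in V$LATCH, since P2 holds the latch number. Alternatively, use a 10046 trace. In 10g, the common latches are broken out and each has its own wait event name:
select name
from v$event_name
where  name like 'latch%'
order by 1;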
---------------------------------------------------------------
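Causes, analysis, and remedies for latch contention:
Latch contention means that processes hold latches for too long. It usually degrades performance and is most common in highly concurrent environments. Latch contention can never be eliminated entirely from a database, and latch wait events will always appear in V$SYSTEM_EVENT. What matters is how latch waits compare with other wait events, so focus on TIME_WAITED. If latch wait time is high, query V$LATCH to find out which latches are involved: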

select name, gets, misses, immediate_gets, immediate_misses, sleeps
from v$latch
order by sleeps;
                                             IMMEDIATE IMMEDIATE
NAME                       GETS     MISSES        GETS    MISSES     SLEEPS
-------------------- ---------- ---------- ----------- --------- ----------
enqueue hash chains    42770950       4279           0         0       1964
shared pool             9106650       5400           0         0       2632
row cache objects      69059887      27938         409         0       7517
enqueues               80443314     330167           0         0      13761
library cache          69447172     103349      465827       190      44328
cache buffers chains 1691040252    1166249    61532689      5909     127478
. . .
Then address the problem according to the type of latch involved.
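Two ratios are often derived from these columns:

- the miss ratio, MISSES / GETS
- sleeps per miss, SLEEPS / MISSES

They are common rules of thumb rather than something this article prescribes, and they do not replace TIME_WAITED as the primary measure. The Python sketch below computes them for the sample rows above.

```python
# (name, gets, misses, immediate_gets, immediate_misses, sleeps)
# copied from the V$LATCH sample output above.
rows = [
    ("enqueue hash chains", 42770950, 4279, 0, 0, 1964),
    ("shared pool", 9106650, 5400, 0, 0, 2632),
    ("row cache objects", 69059887, 27938, 409, 0, 7517),
    ("enqueues", 80443314, 330167, 0, 0, 13761),
    ("library cache", 69447172, 103349, 465827, 190, 44328),
    ("cache buffers chains", 1691040252, 1166249, 61532689, 5909, 127478),
]


def latch_ratios(rows):
    """Return (name, miss_ratio, sleeps_per_miss), worst miss ratio first."""
    result = []
    for name, gets, misses, _imm_gets, _imm_misses, sleeps in rows:
        miss_ratio = misses / gets if gets else 0.0
        sleeps_per_miss = sleeps / misses if misses else 0.0
        result.append((name, miss_ratio, sleeps_per_miss))
    return sorted(result, key=lambda r: r[1], reverse=True)


for name, miss_ratio, sleeps_per_miss in latch_ratios(rows):
    print(f"{name:22} miss ratio {miss_ratio:.4%}  "
          f"sleeps/miss {sleeps_per_miss:.2f}")
```

With this sample, enqueues has the worst miss ratio. Cache buffers chains, however, has by far the most sleeps in absolute terms, which is why absolute wait time still has to be checked.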
--------------------------------------------------------------------------------------------------------------------------------------------
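Shared pool and library cache latches:
Oracle's shared pool consists of many kinds of structures. The most prominent are the dictionary cache, the SQL area, and the library cache; the others can be listed from V$SGASTAT. The shared pool latch protects the shared pool structures.
The shared pool latch protects the shared pool structures, and it is taken when allocating and freeing memory heaps. For example, it is taken when allocating space for new SQL statements (hard parsing), PL/SQL procedures, functions, packages, and triggers as well as when it is aging or freeing chunks of space to make room for new objects.
Before Oracle9i, the shared pool memory structures were protected by a single solitary shared pool latch. Starting in 9i, up to 7 child shared pool latches protect them, because 9i can divide the shared pool into multiple subpools. There is roughly one subpool for every four CPUs, provided SHARED_POOL_SIZE is larger than 250 MB. The number of subpools can be changed with the hidden parameter _KGHDSIDX_COUNT. If you increase the number of subpools manually, you must also increase the shared pool size, because each subpool has its own structures (an LRU list and a shared pool latch). Otherwise the instance may fail to start with ORA-04031.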

Checking the number of subpools on a system with 16 CPUs and a 256 MB shared pool:
select a.ksppinm, b.ksppstvl
from x$ksppi a, x$ksppsv b
where  a.indx = b.indx
and a.ksppinm = '_kghdsidx_count';

KSPPINM          KSPPSTVL
------------------ ----------
_kghdsidx_count 2

select addr, kghluidx, kghlufsh, kghluops, kghlurcr, kghlutrn, kghlumxa
from x$kghlu;

ADDR          KGHLUIDX KGHLUFSH KGHLUOPS KGHLURCR KGHLUTRN KGHLUMXA
---------------- -------- ---------- ---------- -------- -------- ----------
80000001001581B8       2 41588416  496096025 14820 17463 2147483647
8000000100157E18       1 46837096 3690967191 11661 19930 2147483647

select addr, name, gets, misses, waiters_woken
from v$latch_children
where name = 'shared pool';

ADDR          NAME                GETS    MISSES WAITERS_WOKEN
---------------- ------------- ----------- ---------- -------------
C00000004C5B06B0 shared pool          0       0          0
C00000004C5B0590 shared pool          0       0          0
C00000004C5B0470 shared pool          0       0          0
C00000004C5B0350 shared pool          0       0          0
C00000004C5B0230 shared pool          0       0          0
C00000004C5B0110 shared pool 1385021389 90748637    12734879
C00000004C5AFFF0 shared pool 2138031345  413319249    44738488

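The library cache mainly holds cursors, SQL statements, execution plans, parse trees, and so on. Oracle requests a library cache latch whenever it modifies, inspects, pins, locks, loads, or executes objects in the library cache structures.
The child library cache latches can be seen in V$LATCH_CHILDREN. Their number is usually somewhat larger than CPU_COUNT and is controlled by the _KGL_LATCH_COUNT parameter. Starting in 9i, V$SQLAREA has a CHILD_LATCH column that shows which child library cache latch covers each cursor.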
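A commonly cited default for the number of child library cache latches is the smallest prime number greater than CPU_COUNT. Treat that rule as an assumption and verify it against V$LATCH_CHILDREN and _KGL_LATCH_COUNT on your release. The Python sketch below simply computes that prime for a few CPU counts.

```python
def is_prime(k: int) -> bool:
    """Trial division; fine for the small numbers involved here."""
    if k < 2:
        return False
    i = 2
    while i * i <= k:
        if k % i == 0:
            return False
        i += 1
    return True


def smallest_prime_greater_than(n: int) -> int:
    # Assumed rule of thumb for the default child library cache latch count.
    candidate = n + 1
    while not is_prime(candidate):
        candidate += 1
    return candidate


for cpus in (1, 4, 8, 16, 32):
    print(cpus, smallest_prime_greater_than(cpus))
```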
---------------------------------------------------------------------------------------------------------------------------------------------------------
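Shared pool and library cache contention: parsing
Hard parsing causes severe contention. A hard parse has to build a new cursor because no existing cursor can be reused. It is a very expensive operation, and the child library cache latch must be held while it runs.
To find the amount of hard parsing:
select a.*, sysdate-b.startup_time days_old
from v$sysstat a, v$instance b
where a.name like 'parse%';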

STATISTIC# NAME                      CLASS      VALUE   DAYS_OLD
---------- ------------------------- ----- ---------- ----------
       230 parse time cpu               64   33371587  4.6433912
       231 parse time elapsed           64   63185919  4.6433912
       232 parse count (total)          64 2137380227  4.6433912
       233 parse count (hard)           64   27006791  4.6433912
       234 parse count (failures)       64      58945  4.6433912
Note   A parse failure is related to the “ORA-00942: table or view does not exist” error or to running out of shared memory.
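V$SYSSTAT counters are cumulative since startup, so the sample above is easier to read as rates. The Python sketch below divides by the DAYS_OLD uptime. The result is an average over the whole uptime and says nothing about peaks.

```python
# Cumulative counters from the V$SYSSTAT sample above; DAYS_OLD is the
# instance uptime in days, so the results are averages since startup.
parse_total = 2137380227
parse_hard = 27006791
days_old = 4.6433912

hard_ratio = parse_hard / parse_total
hard_per_second = parse_hard / (days_old * 86400)

print(f"hard parses: {hard_ratio:.2%} of all parses, "
      f"about {hard_per_second:.1f} per second")
```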
To find the sessions currently doing a lot of hard parsing:
select a.sid, c.username, b.name, a.value,
round((sysdate - c.logon_time)*24) hours_connected
from v$sesstat a, v$statname b, v$session c
where  c.sid       = a.sid
and a.statistic# = b.statistic#
and a.value    > 0
and b.name    = 'parse count (hard)'
order by a.value;

SID USERNAME NAME                   VALUE HOURS_CONNECTED
---- ---------- ------------------ ---------- ---------------
510 SYS       parse count (hard)       12             4
413 PMAPPC    parse count (hard)       317             51
37 PMHCMC    parse count (hard)    27680          111
257 PMAPPC    parse count (hard)    64652             13
432 PMAPPC    parse count (hard)    105505             13
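Raw counts favor sessions that have been connected longest, so it can help to normalize by HOURS_CONNECTED. The Python sketch below ranks the sample sessions by hard parses per connected hour. Treating a rounded 0 hours as 1 is an assumption, made to avoid dividing by zero.

```python
# (sid, username, parse count (hard), hours_connected) from the sample above
sessions = [
    (510, "SYS", 12, 4),
    (413, "PMAPPC", 317, 51),
    (37, "PMHCMC", 27680, 111),
    (257, "PMAPPC", 64652, 13),
    (432, "PMAPPC", 105505, 13),
]


def hard_parses_per_hour(sessions):
    """Return (sid, username, rate) sorted by hard parses per connected hour."""
    rated = []
    for sid, user, hard_parses, hours in sessions:
        # HOURS_CONNECTED is rounded, so a brand-new session can show 0.
        rated.append((sid, user, hard_parses / max(hours, 1)))
    return sorted(rated, key=lambda r: r[2], reverse=True)


for sid, user, rate in hard_parses_per_hour(sessions):
    print(f"SID {sid:>4} {user:8} {rate:10.1f} hard parses/hour")
```

Session 37 has a large total but a modest rate. Sessions 432 and 257, by contrast, generate thousands of hard parses per hour.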

In 10g, the V$SESS_TIME_MODEL view gives more detailed information about hard parsing:
select *
from v$sess_time_model
where  sid = (select max(sid) from v$mystat);

 SID    STAT_ID STAT_NAME                                             VALUE
---- ---------- ------------------------------------------------ ----------
 148 3649082374 DB time                                            11141191
 148 2748282437 DB CPU                                              9530592
 148 4157170894 background elapsed time                                   0
 148 2451517896 background cpu time                                       0
 148 4127043053 sequence load elapsed time                                0
 148 1431595225 parse time elapsed                                  3868898
 148  372226525 hard parse elapsed time                             3484672
 148 2821698184 sql execute elapsed time                            9455020
 148 1990024365 connection management call elapsed time                6726
 148 1824284809 failed parse elapsed time                                 0
 148 4125607023 failed parse (out of shared memory) elapsed time          0
 148 3138706091 hard parse (sharing criteria) elapsed time            11552
 148  268357648 hard parse (bind mismatch) elapsed time                4440
 148 2643905994 PL/SQL execution elapsed time                         70350
 148  290749718 inbound PL/SQL rpc elapsed time                           0
 148 1311180441 PL/SQL compilation elapsed time                      268477
 148  751169994 Java execution elapsed time                               0

The hard parse statistics in the preceding output can be grouped as such:
1. parse time elapsed
   2. hard parse elapsed time
      3. hard parse (sharing criteria) elapsed time
         4. hard parse (bind mismatch) elapsed time
2. failed parse elapsed time
   3. failed parse (out of shared memory) elapsed time
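The time model statistics nest, so the useful numbers are shares of the parent statistic. The Python sketch below computes two of them from the sample (VALUE is in microseconds):

- the share of DB time spent parsing
- the share of parse time spent on hard parses

```python
# STAT_NAME -> VALUE (microseconds) taken from the V$SESS_TIME_MODEL sample.
time_model = {
    "DB time": 11141191,
    "parse time elapsed": 3868898,
    "hard parse elapsed time": 3484672,
    "hard parse (sharing criteria) elapsed time": 11552,
    "hard parse (bind mismatch) elapsed time": 4440,
    "failed parse elapsed time": 0,
}


def share(child: str, parent: str) -> float:
    """Fraction of the parent statistic accounted for by the child."""
    parent_value = time_model[parent]
    return time_model[child] / parent_value if parent_value else 0.0


parse_share = share("parse time elapsed", "DB time")
hard_share = share("hard parse elapsed time", "parse time elapsed")
print(f"parsing takes {parse_share:.1%} of DB time; "
      f"hard parsing is {hard_share:.1%} of that")
```

In this sample roughly a third of DB time goes to parsing, and about 90% of that is hard parsing.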

To find statements that differ only in their literal values, and are therefore candidates for bind variables:
select hash_value, substr(sql_text,1,80)
from v$sqlarea
where  substr(sql_text,1,40) in (select substr(sql_text,1,40)
from v$sqlarea
group by substr(sql_text,1,40)
having count(*) > 4)
order by sql_text;

HASH_VALUE SUBSTR(SQL_TEXT,1,80)
---------- -----------------------------------------------------------------
2915282817 SELECT revenue.customer_id, revenue.orig_sys, revenue.service_typ
2923401936 SELECT revenue.customer_id, revenue.orig_sys, revenue.service_typ
303952184 SELECT revenue.customer_id, revenue.orig_sys, revenue.service_typ
416786153 SELECT revenue.customer_id, revenue.orig_sys, revenue.service_typ
2112631233 SELECT revenue.customer_id, revenue.orig_sys, revenue.service_typ
3373328808 select region_id from person_to_chair where chair_id = 988947
407884945 select region_id from person_to_chair where chair_id = 990165
3022536167 select region_id from person_to_chair where chair_id = 990166
3204873278 select region_id from person_to_chair where chair_id = 990167
643778054 select region_id from person_to_chair where chair_id = 990168
2601269433 select region_id from person_to_chair where chair_id = 990169
3453662597 select region_id from person_to_chair where chair_id = 991393
3621328440 update plan_storage set last_month_plan_id = 780093, pay_code
2852661466 update plan_storage set last_month_plan_id = 780093, pay_code
380292598 update plan_storage set last_month_plan_id = 780093, pay_code
2202959352 update plan_storage set last_month_plan_id = 780093, pay_code
. . .
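The SUBSTR(SQL_TEXT,1,40) query above groups statements by a fixed-length prefix. A rough alternative is to replace literals with a placeholder and count the normalized texts. This is conceptually what CURSOR_SHARING=FORCE does when it substitutes bind variables for literals. The Python sketch below is a simplified, regex-based illustration applied to the person_to_chair statements from the sample. It does not handle comments, quoted identifiers, or every literal form.

```python
import re
from collections import Counter

# Quoted string literals (with '' as an escaped quote) and plain numbers.
_STRING_LITERAL = re.compile(r"'(?:[^']|'')*'")
_NUMBER_LITERAL = re.compile(r"\b\d+(?:\.\d+)?\b")


def normalize(sql: str) -> str:
    """Replace literals with a placeholder and collapse whitespace."""
    sql = _STRING_LITERAL.sub(":b", sql)
    sql = _NUMBER_LITERAL.sub(":b", sql)
    return " ".join(sql.split()).lower()


def literal_only_variants(statements, threshold=4):
    """Normalized texts that occur more than `threshold` times."""
    counts = Counter(normalize(s) for s in statements)
    return {text: n for text, n in counts.items() if n > threshold}


statements = [
    f"select region_id from person_to_chair where chair_id = {chair_id}"
    for chair_id in (988947, 990165, 990166, 990167, 990168, 990169, 991393)
]
print(literal_only_variants(statements))
```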
Hard parsing normally has to be fixed in the application. If it must be addressed on the database side, set the CURSOR_SHARING parameter to FORCE.

Caution   The CURSOR_SHARING feature has bugs in the earlier releases of Oracle8i Database. It is not recommended in environments with materialized views because it may cause prolonged library cache pin waits. Also, setting the CURSOR_SHARING to FORCE may cause the optimizer to generate unexpected execution plans because the optimizer does not know the values of the bind variables. This may positively or negatively impact the database performance. Starting in Oracle9i Database, the optimizer will peek at the bind variable values in the session’s PGA before producing an execution plan. This behavior is controlled by the parameter _OPTIM_PEEK_USER_BINDS. However, this applies to statements that require hard parsing only, which means the execution plan is based on the first value that is bound to the variable.

Whenever a SQL statement arrives, Oracle checks to see if the statement is already in the library cache. If it is, the statement can be executed with little overhead; this process is known as a soft parse. While hard parses are bad, soft parses are not good either. The library cache latch is acquired during a soft parse operation. Oracle still has to check the syntax and semantics of the statement, unless the statement is cached in the session’s cursor cache. You can reduce the library cache latch hold time by properly setting the SESSION_CACHED_CURSORS parameter. (See Oracle Metalink notes #30804.1 and #62143.1 for more information.) However, the best approach is to reduce the number of soft parses, which can only be done through the application. The idea is to parse once, execute many instead of parse once, execute once. You can find the offending statements by querying the V$SQLAREA view for statements with high numbers of PARSE_CALLS.
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
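Shared pool latch contention: oversized shared pool
Since Oracle9i, subpools have largely eliminated the problems of a large shared pool. In earlier releases, however, an oversized shared pool can cause shared pool latch contention. Free memory in the shared pool is classified by chunk size and kept on free lists maintained in a set of buckets. The larger the shared pool, the longer these free lists become, so a process holding the shared pool latch spends more time scanning them. Under high concurrency this triggers severe shared pool latch contention.
You can dump the shared pool heap, including its free lists, with alter session set events 'immediate trace name heapdump level 2'. This writes a trace file to the udump directory; look for the Bucket entries in it. The same dump can be taken with oradebug, as shown below. The Bucket lines can then be turned into a query on the shared pool's free memory chunks with the script that follows; this query applies only to 10g R2.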
SQL> oradebug setmypid
Statement processed.
SQL> oradebug dump heapdump 2
Statement processed.
SQL> oradebug tracefile_name
/u01/admin/webmon/udump/orcl_ora_17550.trc
SQL> exit

$ grep Bucket /u01/admin/webmon/udump/orcl_ora_17550.trc > tmp.lst
$ sed 's/size=/ksmchsiz>=/' tmp.lst > tmp2.lst
$ sed 's/ Bucket //' tmp2.lst | sort -nr > tmp.lst

# Create a shell script based on the following and run it to generate
# the reusable query for the database.
echo 'select ksmchidx, (case'
cat tmp.lst | while read LINE
do
echo $LINE | awk '{print "when " $2 " then " $1}'
done
echo 'end) bucket#,'
echo '    count(*) free_chunks,'
echo '    sum(ksmchsiz) free_space,'
echo '    trunc(avg(ksmchsiz)) avg_chunk_size'
echo 'from x$ksmsp'
echo "where  ksmchcls = 'free'"
echo 'group by ksmchidx, (case';
cat tmp.lst | while read LINE
do
echo $LINE | awk '{print "when " $2 " then " $1}'
done
echo 'end);'
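The generated query assigns each free chunk to the largest bucket threshold that its KSMCHSIZ reaches. That is why the thresholds are sorted in descending order before the CASE branches are written. The Python sketch below reproduces that bucketing and the COUNT/SUM/TRUNC(AVG) aggregation. The bucket thresholds and chunk sizes are hypothetical; real thresholds come from the Bucket lines of your own heapdump.

```python
from bisect import bisect_right
from collections import defaultdict


def bucket_of(size: int, thresholds: list[int]) -> int:
    """Index of the last threshold <= size (same as the CASE ... >= chain)."""
    i = bisect_right(thresholds, size) - 1
    if i < 0:
        raise ValueError(f"chunk size {size} is below the smallest bucket")
    return i


def summarize_free_chunks(sizes, thresholds):
    """bucket -> (free_chunks, free_space, avg_chunk_size), like the query."""
    acc = defaultdict(lambda: [0, 0])
    for size in sizes:
        entry = acc[bucket_of(size, thresholds)]
        entry[0] += 1
        entry[1] += size
    # Integer division matches TRUNC(AVG(...)) for positive sizes.
    return {b: (n, total, total // n) for b, (n, total) in sorted(acc.items())}


# Hypothetical bucket lower bounds (ascending) and free chunk sizes.
thresholds = [16, 32, 64, 128, 256]
print(summarize_free_chunks([16, 20, 40, 64, 300, 1000], thresholds))
```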

You can also use DBMS_SHARED_POOL.KEEP, together with V$DB_OBJECT_CACHE, to find and pin large objects in the shared pool.

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
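Library cache latch contention: statements with high version counts
Oracle uses child cursors to distinguish SQL statements whose text is identical but which cannot be shared, because each child cursor may refer to a different underlying object. For example, if three users each own a table called TEST, select * from test has the same hash value for all three, but each user gets a different child cursor. When a statement with that hash value is parsed, Oracle must match it against the existing child cursors, and it holds the library cache latch while doing so. Using unique object names reduces this latch activity. Query the V$SQLAREA view to find such statements: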
select version_count, sql_text
from v$sqlarea
where  version_count > 20
order by version_count, hash_value;
Note   High version counts may also be caused by a bug related to the SQL execution progression monitoring feature in Oracle8i Database. (See Oracle Metalink note #62143.1) The bug prevents SQL statements from being shared. This feature can be turned off by setting the _SQLEXEC_PROGRESSION_COST parameter to 0, which in turn suppresses all data in the V$SESSION_LONGOPS view.

-- The columns in uppercase are relevant to Oracle9i Database.
-- Oracle Database 10g Release 1 has eight additional columns and it
-- also contains the child cursor address and number.
select a.*, b.hash_value, b.sql_text
from v$sql_shared_cursor a, v$sqltext b, x$kglcursor c
where  a.unbound_cursor       || a.sql_type_mismatch    ||
a.optimizer_mismatch    || a.outline_mismatch    ||
a.stats_row_mismatch    || a.literal_mismatch    ||
a.sec_depth_mismatch    || a.explain_plan_cursor ||
a.buffered_dml_mismatch  || a.pdml_env_mismatch    ||
a.inst_drtld_mismatch || a.slave_qc_mismatch    ||
a.typecheck_mismatch    || a.auth_check_mismatch ||
a.bind_mismatch       || a.describe_mismatch    ||
a.language_mismatch    || a.translation_mismatch  ||
a.row_level_sec_mismatch || a.insuff_privs       ||
a.insuff_privs_rem    || a.remote_trans_mismatch ||
a.LOGMINER_SESSION_MISMATCH || a.INCOMP_LTRL_MISMATCH ||
a.OVERLAP_TIME_MISMATCH    || a.sql_redirect_mismatch ||
a.mv_query_gen_mismatch    || a.USER_BIND_PEEK_MISMATCH ||
a.TYPCHK_DEP_MISMATCH    || a.NO_TRIGGER_MISMATCH    ||
a.FLASHBACK_CURSOR <> ’NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN’
and a.address = c.kglhdadr
and b.hash_value = c.kglnahsh
order by b.hash_value, b.piece;
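The WHERE clause above concatenates the Y/N mismatch columns in a fixed order. If you select that concatenated string instead of a.*, each position identifies one reason why a child cursor could not be shared. The Python helper below decodes such a string using the 31 columns in the order the query lists them. The column set differs between releases, so the tuple must match the version you query.

```python
# Column order copied from the concatenation in the query above.
MISMATCH_COLUMNS = (
    "UNBOUND_CURSOR", "SQL_TYPE_MISMATCH", "OPTIMIZER_MISMATCH",
    "OUTLINE_MISMATCH", "STATS_ROW_MISMATCH", "LITERAL_MISMATCH",
    "SEC_DEPTH_MISMATCH", "EXPLAIN_PLAN_CURSOR", "BUFFERED_DML_MISMATCH",
    "PDML_ENV_MISMATCH", "INST_DRTLD_MISMATCH", "SLAVE_QC_MISMATCH",
    "TYPECHECK_MISMATCH", "AUTH_CHECK_MISMATCH", "BIND_MISMATCH",
    "DESCRIBE_MISMATCH", "LANGUAGE_MISMATCH", "TRANSLATION_MISMATCH",
    "ROW_LEVEL_SEC_MISMATCH", "INSUFF_PRIVS", "INSUFF_PRIVS_REM",
    "REMOTE_TRANS_MISMATCH", "LOGMINER_SESSION_MISMATCH",
    "INCOMP_LTRL_MISMATCH", "OVERLAP_TIME_MISMATCH", "SQL_REDIRECT_MISMATCH",
    "MV_QUERY_GEN_MISMATCH", "USER_BIND_PEEK_MISMATCH", "TYPCHK_DEP_MISMATCH",
    "NO_TRIGGER_MISMATCH", "FLASHBACK_CURSOR",
)


def mismatch_reasons(flags: str) -> list[str]:
    """Names of the columns flagged 'Y' in a concatenated Y/N string."""
    if len(flags) != len(MISMATCH_COLUMNS):
        raise ValueError(f"expected {len(MISMATCH_COLUMNS)} flags, got {len(flags)}")
    return [col for col, flag in zip(MISMATCH_COLUMNS, flags) if flag == "Y"]


print(mismatch_reasons("N" * 14 + "Y" + "N" * 16))
```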
-----------------------------------------------------------------------------------------------------------------------------------------------------

Cache buffer chains latches

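When data blocks are read into the SGA, their buffer headers are placed on hash chains. These structures are protected by a set of child cache buffers chains latches, as shown in the figure above.
Before operating on a block on a hash chain, a process must first acquire the corresponding cache buffers chains latch.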
Note Starting in Oracle9i Database, the cache buffers chains latches can be shared for read-only. This reduces some contention, but will by no means eliminate the contention all together. We have many Oracle9i production databases that are plagued by the cache buffers chains latch contention.
The hash bucket for a given block header is determined by the data block address (DBA) modulo the _DB_BLOCK_HASH_BUCKETS parameter,
that is: hash bucket = MOD(DBA, _DB_BLOCK_HASH_BUCKETS). Buffer header contention can be investigated through V$BH and X$BH.
The buffer headers can be dumped with:
Alter system set events 'immediate trace name buffers level 1';
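The bucket formula can be tried with plain arithmetic. The Python sketch below makes three assumptions:

- It uses the classic smallfile DBA layout: 10 bits of relative file number above 22 bits of block number. In a database, DBMS_UTILITY.MAKE_DATA_BLOCK_ADDRESS builds the same value.
- It applies the simplified MOD rule from the text; Oracle's real hash function may be more involved.
- The bucket count of 131072 and the file and block numbers are hypothetical.

```python
def make_dba(rel_file: int, block: int) -> int:
    """Compose a data block address, ASSUMING the classic smallfile layout:
    upper 10 bits = relative file number, lower 22 bits = block number."""
    if not 0 <= rel_file < (1 << 10) or not 0 <= block < (1 << 22):
        raise ValueError("file# or block# out of range for this layout")
    return (rel_file << 22) | block


def hash_bucket(dba: int, hash_buckets: int) -> int:
    # Simplified model from the text: MOD(DBA, _DB_BLOCK_HASH_BUCKETS).
    return dba % hash_buckets


dba = make_dba(16, 105132)              # hypothetical file 16, block 105132
print(dba, hash_bucket(dba, 131072))    # hypothetical 131072 buckets
```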
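Up to Oracle 8.0, there is one cache buffers chains latch per hash bucket and one chain per bucket. In other words, hash latches, hash buckets, and hash chains are in a 1:1:1 ratio. The default number of hash buckets is one quarter of DB_BLOCK_BUFFERS and can be changed with the _DB_BLOCK_HASH_BUCKETS parameter.
Starting in 8i, the ratio of hash latches to buckets became 1:M, while buckets to chains stayed 1:1, so one latch can protect multiple hash chains. To achieve this, Oracle reduced the default number of hash latches. The default is based on DB_BLOCK_BUFFERS and is usually 1024 when the buffer cache is smaller than 1 GB. It can be changed with the _DB_BLOCK_HASH_LATCHES parameter. Use the following SQL to check the current number of hash latches: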
select count(distinct(hladdr))
from x$bh;

COUNT(DISTINCT(HLADDR))
-----------------------
1024

select count(*)
from v$latch_children
where  name = 'cache buffers chains';

COUNT(*)
----------
1024
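From 8i, the default number of hash buckets is 2 × DB_BLOCK_BUFFERS; it can likewise be set with the _DB_BLOCK_HASH_BUCKETS parameter.
10g uses a different mechanism to set the default number of hash buckets. It is often described as 1/4 of DB_CACHE_SIZE, but in-depth testing showed that the number of hash buckets always turns out larger than that.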

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Contention for Cache Buffers Chains Latches: Inefficient SQL Statements

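This contention is mainly caused by inefficient SQL statements. In highly concurrent systems, the time spent on latch free waits is very costly. A familiar scenario is many sessions executing the same SQL at the same time and reading the same data.
Keep the following points in mind: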
• Every logical read requires a latch get operation and a CPU.
• The only way to get out of the latch get routine is to get the latch.
• Only one process can own a cache buffers chains latch at any one time, and the latch covers many data blocks, some of which may be needed by another process. (Again, as mentioned in a previous note, Oracle9i Database allows the cache buffers chains latches to be shared for read-only.)

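Fewer logical reads mean less latch contention. Identify the SQL statements involved in cache buffers chains latch contention and tune them to reduce logical reads. BUFFER_GETS is the main metric for finding these statements.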
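Ranking by BUFFER_GETS is simple. Buffer gets per execution, however, often tells you whether the problem is one expensive statement or a cheap one executed very often. The Python sketch below illustrates both views on V$SQLAREA-style figures (HASH_VALUE, BUFFER_GETS, EXECUTIONS); the numbers are invented for illustration.

```python
from typing import NamedTuple


class SqlStat(NamedTuple):
    hash_value: int
    buffer_gets: int
    executions: int


def rank_by_buffer_gets(stats, top_n=3):
    """Heaviest statements first, with buffer gets per execution."""
    ranked = sorted(stats, key=lambda s: s.buffer_gets, reverse=True)[:top_n]
    return [(s.hash_value, s.buffer_gets,
             s.buffer_gets / s.executions if s.executions else float("nan"))
            for s in ranked]


# Hypothetical V$SQLAREA-style figures, for illustration only.
stats = [
    SqlStat(101, 50_000_000, 1_000),
    SqlStat(102, 8_000_000, 2_000_000),
    SqlStat(103, 30_000_000, 10),
]
for hash_value, gets, per_exec in rank_by_buffer_gets(stats):
    print(hash_value, gets, per_exec)
```

Statement 103 has fewer total gets than 101 but needs 3 million gets per execution, which points to an inefficient plan rather than execution frequency.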

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
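Contention for Cache Buffers Chains Latches: Hot Blocks
Hot blocks are another common cause of this contention; in this case, increasing the number of latches does little good.
Check the P1RAW column for the latch free wait event: it holds the latch address. If many sessions are waiting on the same latch address, you have hot blocks.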
select sid, p1raw, p2, p3, seconds_in_wait, wait_time, state
from v$session_wait
where  event = 'latch free'
order by p2, p1raw;

SID P1RAW          P2  P3 SECONDS_IN_WAIT  WAIT_TIME STATE
---- ---------------- --- --- --------------- ---------- ------------------
38 00000400837D7800  98 1             1       2 WAITED KNOWN TIME
42 00000400837D7800  98 1             1       2 WAITED KNOWN TIME
44 00000400837D7800  98 3             1       4 WAITED KNOWN TIME
58 00000400837D7800  98 2             1       10 WAITED KNOWN TIME
85 00000400837D7800  98 3             1       12 WAITED KNOWN TIME
214 00000400837D7800  98 1             1       2 WAITED KNOWN TIME
186 00000400837D7800  98 3             1       14 WAITED KNOWN TIME
149 00000400837D7800  98 2             1       3 WAITED KNOWN TIME
132 00000400837D7800  98 2             1       2 WAITED KNOWN TIME
101 00000400837D7800  98 3             1       4 WAITED KNOWN TIME
222 00000400837D7800  98 3             1       12 WAITED KNOWN TIME
229 00000400837D7800  98 3             1       4 WAITED KNOWN TIME
230 00000400837D7800  98 3             1       11 WAITED KNOWN TIME
232 00000400837D7800  98 1             1       20 WAITED KNOWN TIME
257 00000400837D7800  98 3             1       16 WAITED KNOWN TIME
263 00000400837D7800  98 3             1       5 WAITED KNOWN TIME
117 00000400837D7800  98 4             1       4 WAITED KNOWN TIME
102 00000400837D7800  98 3             1       12 WAITED KNOWN TIME
47 00000400837D7800  98 3             1       11 WAITED KNOWN TIME
49 00000400837D7800  98 1             1       2 WAITED KNOWN TIME

99 00000400837D9300  98 1             1       32 WAITED KNOWN TIME

51 00000400837DD200  98 1             1       1 WAITED KNOWN TIME

43 00000400837DE400  98 1             1       2 WAITED KNOWN TIME
130 00000400837DE400  98 1             1       10 WAITED KNOWN TIME
89 00000400837DE400  98 1             1       2 WAITED KNOWN TIME
62 00000400837DE400  98 0             1       -1 WAITED KNOWN TIME
150 00000400837DE400  98 1             1       9 WAITED KNOWN TIME
195 00000400837DE400  98 1             1       3 WAITED KNOWN TIME
67 00000400837DE400  98 1             1       2 WAITED KNOWN TIME
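Grouping the waiters by P1RAW makes the hot latch stand out. The Python sketch below counts the sample sessions per latch address. The threshold of 5 waiters is an arbitrary value chosen for illustration.

```python
from collections import Counter

# P1RAW of each session waiting on 'latch free' in the sample above.
p1raw = (["00000400837D7800"] * 20 + ["00000400837D9300"]
         + ["00000400837DD200"] + ["00000400837DE400"] * 7)


def hot_latch_candidates(addresses, min_waiters=5):
    """Latch addresses with at least `min_waiters` waiting sessions."""
    return [(addr, n) for addr, n in Counter(addresses).most_common()
            if n >= min_waiters]


print(hot_latch_candidates(p1raw))
```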

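The next step is to find which blocks are covered by that latch and, at the same time, to capture the related SQL statements. A cache buffers chains latch covers many blocks, so the SQL is what confirms which block is actually hot. In 8i and later, the following query lists the blocks. Hot blocks usually have a high touch count (TCH). However, when a block is moved from the cold end to the hot end of the LRU list, its touch count is reset to 0.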

select a.hladdr, a.file#, a.dbablk, a.tch, a.obj, b.object_name
from x$bh a, dba_objects b
where  (a.obj = b.object_id  or  a.obj = b.data_object_id)
and a.hladdr = '00000400837D7800'
union
select hladdr, file#, dbablk, tch, obj, null
from x$bh
where  obj in (select obj from x$bh where hladdr = '00000400837D7800'
minus
select object_id from dba_objects
minus
select data_object_id from dba_objects)
and hladdr = '00000400837D7800'
order by 4;

HLADDR          FILE#  DBABLK  TCH       OBJ OBJECT_NAME
---------------- ----- ------- ---- ----------- --------------------
00000400837D7800 16  105132 0    19139 ROUTE_HISTORY
00000400837D7800 16  106156 0    19163 TELCO_ORDERS
00000400837D7800 26 98877 0    23346 T1
00000400837D7800 16 61100 0    19163 TELCO_ORDERS
00000400837D7800 16 26284 0    19059 FP_EQ_TASKS
00000400837D7800    7  144470 0    18892 REPORT_PROCESS_QUEUE
00000400837D7800    8  145781 0    18854 PA_EQUIPMENT_UNION
00000400837D7800 249  244085 0  4294967295
00000400837D7800    7 31823 1    18719 CANDIDATE_EVENTS
00000400837D7800 13  100154 1    19251 EVENT
00000400837D7800    7 25679 1    18730 CANDIDATE_ZONING
00000400837D7800    7 8271 1    18719 CANDIDATE_EVENTS
00000400837D7800    7 32847 2    18719 CANDIDATE_EVENTS
00000400837D7800    8 49518 2    18719 CANDIDATE_EVENTS
00000400837D7800    7 85071 2    18719 CANDIDATE_EVENTS
00000400837D7800 275 76948 2  4294967295
00000400837D7800    7 41039 3    18719 CANDIDATE_EVENTS
00000400837D7800    7 37967 4    18719 CANDIDATE_EVENTS
00000400837D7800    8 67950 4    18719 CANDIDATE_EVENTS
00000400837D7800    7 33871 7    18719 CANDIDATE_EVENTS
00000400837D7800    7 59471 7    18719 CANDIDATE_EVENTS
00000400837D7800    8 8558 24    18719 CANDIDATE_EVENTS

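Hot blocks are an application problem: find out why the application repeatedly reads the same blocks. You can also spread the hot data across more blocks, for example by: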
• Deleting and reinserting some of the rows by ROWID.
• Exporting the table, increasing the PCTFREE significantly, and importing the data. This minimizes the number of rows per block, spreading them over many blocks. Of course, this is at the expense of storage and full table scans operations will be slower.
• Minimizing the number of records per block in the table. This involves dumping a few data blocks to get an idea of the current number of rows per block. Refer to the “Data Block Dump” section in Appendix C for the syntax. The “nrow” in the trace file shows the number of rows per block. Export and truncate the table. Manually insert the number of rows that you determined is appropriate and then issue the ALTER TABLE table_name MINIMIZE RECORDS_PER_BLOCK command. Truncate the table and import the data.
• For indexes, you can rebuild them with higher PCTFREE values, bearing in mind that this may increase the height of the index.
• Consider reducing the block size. Starting in Oracle9i Database, Oracle supports multiple block sizes. If the current block size is 16K, you may move the table or recreate the index in a tablespace with an 8K block size. This too will negatively impact full table scans operations. Also, various block sizes increase management complexity.
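The effect of PCTFREE and block size on rows per block can be estimated with simple arithmetic. The Python sketch below is a rough approximation that ignores block header and row directory overhead unless you pass it in. The 100-byte average row length is hypothetical, and dumping a block and reading its nrow value remains the reliable way to check.

```python
import math


def approx_rows_per_block(block_size: int, pctfree: int, avg_row_len: int,
                          overhead: int = 0) -> int:
    """Rough rows per block: space left after PCTFREE divided by row length.

    overhead stands in for block header/row directory space, which this
    sketch ignores by default.
    """
    usable = (block_size - overhead) * (100 - pctfree) / 100
    return max(1, math.floor(usable / avg_row_len))


print(approx_rows_per_block(8192, 10, 100))   # typical PCTFREE
print(approx_rows_per_block(8192, 90, 100))   # much higher PCTFREE
print(approx_rows_per_block(4096, 90, 100))   # plus a smaller block size
```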
--------------------------------------------------------------------------------------
Contention for Cache Buffers Chains Latches: Long Hash Chains

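A hash bucket can contain multiple data blocks, linked by pointers on the bucket's hash chain. In large databases a single hash chain can hold hundreds of blocks. A process holding the latch must scan the hash chain to find the block it wants; this is called chasing the chain. While the process scans, the latch is held longer, which may prevent other processes from getting it.
Up to 8.0, determining the chain length is easy. Hash latches, hash buckets, and hash chains are in a 1:1:1 ratio, so a chain's length equals the number of blocks protected by its latch: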
select hladdr, count(*)
from x$bh
group by hladdr
order by 2;
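Starting in 8i, the ratio is 1:M, so you can no longer determine the length of a specific hash chain. You can only determine how many blocks a given latch covers, which is what the query above reports. Before concluding that a hash latch is overloaded, take the ratio of hash latches to hash buckets into account.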
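Since 8i, the blocks per latch reported by the query above have to be divided across the buckets that latch protects. The Python sketch below estimates the average chain length, assuming the blocks spread evenly over those buckets. The inputs are:

- DB_BLOCK_BUFFERS = 100000, which is hypothetical
- 1024 latches, matching the count shown earlier
- 22 blocks, the number the X$BH sample listed under latch 00000400837D7800

```python
def default_hash_buckets_8i(db_block_buffers: int) -> int:
    # 8i default stated in the text: 2 x DB_BLOCK_BUFFERS.
    return 2 * db_block_buffers


def avg_chain_length(blocks_on_latch: int, hash_buckets: int,
                     hash_latches: int) -> float:
    """Average chain length under one latch, ASSUMING its blocks are spread
    evenly over the buckets that latch protects."""
    buckets_per_latch = hash_buckets / hash_latches
    return blocks_on_latch / buckets_per_latch


buckets = default_hash_buckets_8i(100_000)   # hypothetical DB_BLOCK_BUFFERS
print(buckets / 1024)                         # buckets covered per latch
print(avg_chain_length(22, buckets, 1024))    # 22 blocks on one latch
```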

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

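Cache buffers lru chain latches:
Besides the hash chains, buffer headers are also linked on other lists, such as LRU, LRUW, and the checkpoint queue (CKPT-Q). LRU and LRUW are the two original lists in Oracle. The LRU list contains buffers in various states, while the LRUW list contains only dirty buffers. LRU and LRUW are mutually exclusive, and together they are called a working set. Each working set is protected by one cache buffers lru chain latch; in other words, the number of cache buffers lru chain latches determines the number of working sets.
X$KCBWDS shows the working sets; its SET_LATCH column is the address of the latch: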

LRU + LRUW = A Working Set
select set_id, set_latch
from x$kcbwds
order by set_id;

SET_ID SET_LATC
---------- --------
1 247E299C
2 247E2E68
3 247E3334
4 247E3800
5 247E3CCC
6 247E4198
7 247E4664
8 247E4B30

select addr
from v$latch_children
where name = 'cache buffers lru chain'
order by addr;

ADDR
--------
247E299C
247E2E68
247E3334
247E3800
247E3CCC
247E4198
247E4664
247E4B30

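Typically, foreground processes scan the LRU list when they look for a free buffer. The DBWR background process scans the LRU list to move dirty buffers onto the LRUW list, and it also removes buffers from the LRUW list. All processes must hold the cache buffers lru chain latch for any operation on a working set.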

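The buffers in the DB_2K_CACHE_SIZE, DB_4K_CACHE_SIZE, DB_8K_CACHE_SIZE, DB_16K_CACHE_SIZE, DB_32K_CACHE_SIZE, DB_CACHE_SIZE, DB_KEEP_CACHE_SIZE, and DB_RECYCLE_CACHE_SIZE pools are distributed among the cache buffers lru chain latches. Each buffer pool should have its own cache buffers lru chain latch; otherwise the working sets become too long. In 9i and 10g, the default number of these latches is 4 × the number of CPUs. The exception is when DB_WRITER_PROCESSES is greater than 4; the value is then derived from DB_WRITER_PROCESSES. When you add CPUs, you can change it with the _DB_BLOCK_LRU_LATCHES parameter.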
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
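Row cache objects latches:

The row cache objects latch protects Oracle's data dictionary cache. Starting in Oracle7, the data dictionary cache became part of the shared pool. Before that, each dictionary cache item had its own DC_* initialization parameter. This means the dictionary cache can no longer be tuned directly, only indirectly through the SHARED_POOL_SIZE parameter. The V$ROWCACHE view records statistics for each data dictionary cache item: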
select cache#, type, parameter, gets, getmisses, modifications mod
from v$rowcache
where  gets > 0
order by gets;

CACHE# TYPE       PARAMETER             GETS  GETMISSES MOD
------ ----------- ------------------ ---------- ---------- ------
7 SUBORDINATE dc_user_grants       1615488       75    0
2 PARENT    dc_sequences       2119254    189754 100
15 PARENT    dc_database_links    2268663       2    0
10 PARENT    dc_usernames       7702353       46    0
8 PARENT    dc_objects          11280602    12719 400
7 PARENT    dc_users          81128420       78    0
16 PARENT    dc_histogram_defs 182648396    51537    0
11 PARENT    dc_object_ids    250841842    3939    75

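There is very little room to tune the row cache; the most effective approach is to reduce data dictionary access. For example, if sequences are the problem, tune the sequences, for instance by increasing their CACHE value. Views that join many tables, or views built on other views, also increase contention. The usual remedy is to increase SHARED_POOL_SIZE.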
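To see which dictionary cache item stands out, compare GETMISSES with GETS. The Python sketch below computes that miss ratio for the sample V$ROWCACHE rows. It is a quick screening aid, not a tuning target.

```python
# (parameter, gets, getmisses) from the V$ROWCACHE sample above.
rows = [
    ("dc_user_grants", 1615488, 75),
    ("dc_sequences", 2119254, 189754),
    ("dc_database_links", 2268663, 2),
    ("dc_usernames", 7702353, 46),
    ("dc_objects", 11280602, 12719),
    ("dc_users", 81128420, 78),
    ("dc_histogram_defs", 182648396, 51537),
    ("dc_object_ids", 250841842, 3939),
]


def getmiss_ratios(rows):
    """(parameter, getmisses / gets), highest ratio first."""
    ratios = [(name, misses / gets) for name, gets, misses in rows if gets]
    return sorted(ratios, key=lambda r: r[1], reverse=True)


for name, ratio in getmiss_ratios(rows):
    print(f"{name:20} {ratio:8.3%}")
```

Here dc_sequences misses on roughly 9% of its gets, far more than any other item. This matches the advice to look at sequences first.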
