Latch Lock

liuzhaomin

Original link: https://blog.csdn.net/liuzhaomin/article/details/83875101

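latch: protects memory structures; waiters compete at random, with no queue

lock: protects transactions; waiters are queued

Data I collected from a database in Xi'an: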

select addr, name, misses from v$latch_children where misses>50 and name='cache buffers chains';

This finds the cache buffers chains (cbc) child latches that have been missed more than 50 times.

ADDR NAME MISSES
---------------- -------------------- ----------
0000000139193ED0 cache buffers chains 68

select dbarfil, dbablk from x$bh where hladdr='0000000139193ED0'

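This returns the file number and block number of the buffers protected by that latch.

select owner, segment_name, segment_type from dba_extents where file_id=4 and &dbablk between block_id and block_id+blocks-1;

Here file_id is the DBARFIL value and &dbablk stands for the DBABLK value returned by the previous query. The result shows exactly which user and which table are behind the latch contention.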

select name, count(*) from v$latch_children group by name;

This groups all child latches by name and counts the children in each group.

select name, count(*) from v$latch_children where name='cache buffers chains' group by name

NAME COUNT(*)
---------------------------------------- ----------
cache buffers chains 4096

The buffer cache is protected by 4096 cache buffers chains child latches in total.

select count(*) from v$bh;

COUNT(*)
--------
60129

There are 60129 buffers (data blocks) in memory in total.
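As a rough worked figure, dividing the two counts above gives the average number of cached buffers behind each cache buffers chains child latch. This is a minimal Python sketch that uses only the numbers from the query output; the average says nothing about how unevenly the buffers are actually spread across the latches.

```python
# Figures taken from the two query results above.
cbc_child_latches = 4096   # count(*) from v$latch_children for 'cache buffers chains'
cached_buffers = 60129     # count(*) from v$bh

# Average number of cached buffers that share one child latch.
buffers_per_latch = cached_buffers / cbc_child_latches
print(f"{buffers_per_latch:.2f} buffers per latch on average")
```

One latch therefore covers several buffers, which is why a single hot block can slow down access to unrelated blocks that happen to sit behind the same latch.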

	Latches	Locks
Purpose	Serve a single purpose: to provide exclusive access to memory structures. (Starting in Oracle9i Database, the cache buffers chains latches are shareable for read-only.)	Serve two purposes: to allow multiple processes to share the same resource when the lock modes are compatible and to enforce exclusive access to the resource when the lock modes are incompatible.
Jurisdiction	Apply only to data structures in the SGA. Protect memory objects, which are temporary. Control access to a memory structure for a single operation. Not transactional.	Protect database objects such as tables, data blocks, and state objects. Application driven and control access to data or metadata in the database. Transactional.
Acquisition	Can be requested in two modes: willing-to-wait or no-wait.	Can be requested in six different modes: null, row share, row exclusive, share, share row exclusive, or exclusive.
Scope	Information is kept in the memory and is only visible to the local instance—latches operate at instance level.	Information is kept in the database and is visible to all instances accessing the database—locks operate at database-level.
Complexity	Implemented using simple instructions, typically, test-and-set, compare-and-swap, or simple CPU instructions. Implementation is port specific because the CPU instructions are machine dependent. Lightweight.	Implemented using a series of instructions with context switches. Heavyweight.
Duration	Held briefly (in microseconds).	Normally held for an extended period of time (transactional duration).
Queue	When a process goes to sleep after failing to acquire a latch, its request is not queued and serviced in order (with a few exceptions—for example, the latch wait list latch has a queue). Latches are fair game and up for grabs.	When a process fails to get a lock, its request is queued and serviced in order, unless the NOWAIT option is specified.
Deadlock	Latches are implemented in such a way that they are not subject to deadlocks.	Locks support queuing and are subject to deadlocks. A trace file is generated each time a deadlock occurs.

Latch of Oracle
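This is probably the most painful article I have ever written: an attempt to explain Oracle's latch mechanism. Kingsoft PowerWord translates "latch" as a door bolt or catch, and the technical term in Chinese is 锁存器. When I first came across it I could not see why Oracle did not simply say "lock". Aren't both of them locks, just translated differently? Only after studying them did I learn how different the two are.
A latch is a lightweight locking resource provided by Oracle. It locks a resource quickly and for a very short time, to stop several concurrent processes from modifying or accessing a shared resource at the same moment, and it works only in memory. Speaking loosely, a lock on a memory structure is called a latch, and a lock on a database object (a table, an index and so on) is called a lock. For example, when a block in the buffer cache is to be read, the process acquires the latch for that block; this article calls that step a pin. If another process wants to modify the same block at that moment, it must pin the block too, so it has to wait until the first process releases the latch before it can pin and then modify the block. When several processes request the latch at once they compete for it. There is no queue: as soon as the holder releases the latch, the waiters all rush in, with no notion of first come, first served. This is the essential difference from a lock. It all happens very quickly, because a latch is fast and short-lived by design. This is only the rough outline; the details are discussed later.
Let's first look at the differences between a latch and a lock:
1. A latch is a mechanism that provides mutually exclusive access to in-memory data structures, whereas a lock is taken on a shared resource in one of several modes, some compatible with each other and some not. It follows that latch access, queries included, is exclusive: at any moment only one process can pin a given block in memory. Fortunately this takes a very short time, otherwise system performance could not be guaranteed. Starting with 9i, several processes are allowed to query the same memory block at the same time, but the performance is not as good as one might expect.
2. A latch acts only in memory and is visible only to the current instance, whereas a lock acts on database objects, and in a RAC configuration locks can be detected and accessed across instances.
3. A latch is acquired and released in an instant, whereas a lock is released only when the transaction ends properly, so how long it is held depends on the size of the transaction.
4. Latch requests are not queued, whereas lock requests are.
5. Latches do not deadlock, whereas locks can (although deadlocks are very rare in Oracle).
The following example lets you feel the presence of latches: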
SQL> CREATE TABLE MYTEST AS SELECT OBJECT_NAME FROM USER_OBJECTS WHERE ROWNUM <= 4;
Table created
SQL> SET TIMING ON
SQL>
DECLARE lv_name MYTEST.OBJECT_NAME%TYPE;
BEGIN
   FOR i IN 1..100000 LOOP
      SELECT OBJECT_NAME INTO lv_name FROM MYTEST WHERE ROWNUM = 1;
   END LOOP;
END;
/
PL/SQL procedure successfully completed
Executed in 3.359 seconds
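This process accesses the same data block of the table over and over. It first reads the block from disk into the buffer cache, then repeatedly acquires the latch for that block in memory. With a single process it runs well enough: 100,000 iterations took a little over 3 seconds. But when I opened four windows and ran the statement in all of them at the same time, the problem appeared. With several processes pinning the same block, each run took about 15 seconds, and they finished one after another; the last one remaining completed in a flash because nobody was competing with it any more. The experiment shows latch contention at work, and it makes me doubt the claim that queries can share latches from 9i onwards.

Now let's look in detail at how a process acquires a latch. At any moment only one process can access a given block in memory (I am ignoring the latch sharing introduced in 9i). If a process cannot get the latch because another process is using the block, it spins on the CPU for a very short time and then tries again. If that fails it spins again, until the number of spins reaches a threshold set by the hidden parameter _spin_count. At that point the process stops spinning and sleeps briefly, and after waking up it repeats the same steps until it finally gets the latch on the block.
The sleep time follows an algorithm as well. It grows as the attempts go on and is measured in centiseconds, for example 1, 1, 2, 2, 4, 4, 8, 8, ... The upper limit is controlled by the hidden parameter _max_exponential_sleep, which defaults to 2 seconds. If the process already holds another latch, its sleep must not be long, because a long sleep would make other processes wait for the latch it holds. In that case the maximum sleep time is set by the hidden parameter _max_sleep_holding_latch, which defaults to 4 centiseconds.
This time-limited sleep is called a short wait. The other case is a long wait, known as latch wait posting. Here the waiting process fails to get the latch and goes to sleep after adding an entry to the latch wait list to register its request. When the holder releases the latch it checks the latch wait list and posts a signal to the requesting process, waking it up. The latch wait list is a list of processes maintained in the SGA, and it needs a latch of its own to work correctly. By default the shared pool latch and the library cache latch use this mechanism; if the hidden parameter _latch_wait_posting is set to 2, all latches use it. This method wakes up a specific waiter fairly precisely, but maintaining the latch wait list costs system resources, and contention for the latch on the list can itself become a bottleneck.
If a process has spent a long time requesting, spinning and sleeping on a latch, it notifies the PMON process, which checks whether the latch holder has terminated abnormally or died. If so, PMON cleans up and releases the latch.
By now the flow of a latch get should be clear: request, spin, sleep, request, spin, sleep, ... acquire. Some will ask why the process spins at all instead of going straight to sleep. The answer lies in what sleeping means: giving up the CPU temporarily and performing a context switch. The CPU has to save the state of the running process, such as its stack and semaphores, load the state of the next process, and later switch back again. In a system with a high transaction rate and many concurrent processes, doing this frequently is very expensive. So the process spins instead: it stays on the CPU executing empty instructions, then requests the latch again, and keeps spinning until it reaches _spin_count. Only then does it give up the CPU for a short sleep before repeating the cycle. That is how Oracle is designed; it is the work of masters and has its reasons, so I will not spend more words on it.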
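The request, spin, sleep cycle can be sketched in a few lines of Python. This is only a simulation of the behaviour described above, not Oracle's implementation: the function names, the `latch_is_free` callback and the rule "double after every second sleep" are assumptions made to reproduce the 1, 1, 2, 2, 4, 4, 8, 8 pattern, and no real waiting takes place.

```python
def latch_sleep_schedule(num_sleeps, max_sleep_cs=200):
    """Return the length, in centiseconds, of each successive sleep.

    The pattern 1, 1, 2, 2, 4, 4, 8, 8, ... doubles after every second
    sleep and is capped at max_sleep_cs. The default of 200 cs stands for
    the 2-second default the text gives for _max_exponential_sleep.
    """
    return [min(2 ** (i // 2), max_sleep_cs) for i in range(num_sleeps)]


def try_get_latch(latch_is_free, spin_count=2000, max_sleeps=8, max_sleep_cs=200):
    """Simulate request -> spin -> sleep -> request ... without real waiting.

    latch_is_free is a hypothetical callable that returns True once the
    latch can be taken. Returns (failed_attempts, sleeps_taken_in_cs).
    """
    failed_attempts = 0
    sleeps = []
    for sleep_cs in latch_sleep_schedule(max_sleeps, max_sleep_cs):
        for _ in range(spin_count):        # spin: stay on the CPU and retry
            if latch_is_free():
                return failed_attempts, sleeps
            failed_attempts += 1
        sleeps.append(sleep_cs)            # give up the CPU for sleep_cs
    return failed_attempts, sleeps


print(latch_sleep_schedule(8))                   # uncapped start of the pattern
print(latch_sleep_schedule(8, max_sleep_cs=4))   # capped, as when holding another latch
```

Passing `max_sleep_cs=4` models the case where the process already holds another latch and the text's _max_sleep_holding_latch limit applies.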

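Some latch waits are unavoidable, because that is how Oracle works. A high number of latch gets does not by itself mean your system needs tuning; sometimes a very high get count hides only a very short wait time. What we tune should be decided by the time consumed, not by a large get count. Of course, when a get count is abnormally high, hundreds of thousands of times larger than the others, it still deserves attention. Oracle has many latch-related waits; the main ones involve the shared pool, the library cache, cache buffers chains and buffer busy waits. Tuning each of them could fill several pages, so I will cover them gradually later.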

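Latch statistics
A latch is a low-level serialization mechanism that prevents concurrent access to memory structures and protects the shared memory structures in the system global area (SGA). It is an in-memory lock that is acquired and released very quickly. If a latch is not available, a latch miss is recorded.

There are two types of latch request: willing-to-wait and immediate (not willing to wait).

For a willing-to-wait request, if the process does not get the latch on its first attempt, it waits and tries again. If it still has not got the latch after _spin_count attempts, it goes to sleep, wakes up one hundredth of a second later and repeats the previous steps in order. In 8i/9i the default is _spin_count=2000. The sleeps get longer and longer.

For an immediate (not-willing-to-wait) request, if the latch cannot be obtained immediately, the process does not wait for it. It goes on to do something else.

Most latch problems come down to one of the following:

Not using bind variables properly (library cache latch and shared pool latch), redo generation problems (redo allocation latch), buffer cache contention (cache buffers LRU chain), and "hot" blocks in the buffer cache (cache buffers chains).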

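Some latch waits are also caused by bugs, so keep an eye on the bug notices and patch releases on Metalink.

When the latch miss ratio is greater than 0.5%, the latch waits need to be investigated.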
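The 0.5% check is simple arithmetic. The sketch below assumes the miss ratio is misses divided by gets, taken from the GETS and MISSES columns of v$latch; the sample rows are made-up numbers for illustration, not measurements from any database.

```python
def latch_miss_ratio(gets, misses):
    """Miss ratio of willing-to-wait requests, as a fraction of gets."""
    return misses / gets if gets else 0.0


THRESHOLD = 0.005  # the 0.5% figure quoted above

# Hypothetical sample rows: (latch name, gets, misses). Not real measurements.
sample = [
    ("cache buffers chains", 1_000_000, 8_000),
    ("library cache", 500_000, 1_000),
    ("redo allocation", 200_000, 0),
]

needs_check = [name for name, gets, misses in sample
               if latch_miss_ratio(gets, misses) > THRESHOLD]
print(needs_check)
```

A ratio above the threshold only says where to look first; as noted earlier, the time actually spent waiting matters more than the raw counts.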

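If the SQL statements cannot be changed, then from version 8.1.6 onwards you can force bind variables on the server side by setting CURSOR_SHARING = force. The parameter can have side effects, such as suboptimal execution plans, and there are related bugs affecting Java programs, so check the Metalink bug notices for your particular application.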

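The following describes several important types of latch wait:

1)    latch free: when 'latch free' shows up among the top wait events in a report, there may be a performance problem. Use this section to work out exactly which latch is being waited on, and then tune accordingly.

2)    cache buffers chains: waits on the cbc latch indicate hot blocks. Why? To see this, you first need to understand what the cbc latch does. Oracle manages the buffer cache with hash chains (Oracle calls them buckets, and the number of buckets is defined by _db_block_hash_buckets). The cbc latch exists to protect the buffer cache. When there are concurrent access requests, the cbc latch serializes them. Once we hold the cbc latch we can start accessing the data. If the data we want is in one of the buckets, we read it straight from memory and then release the cbc latch, which other users can then acquire. Getting and releasing the cbc latch is very fast, so in this case there is normally no waiting. But if the requested data is not in memory, it has to be read from disk, which takes a very long time compared with a latch operation. When the block is found, if other users are accessing it and it has no free ITL slot to accept the request, we must wait. Throughout this time we keep holding the cbc latch because we have not yet obtained the data, so other users cannot get the latch, and cbc latch waits appear. In the end, this kind of wait is caused by hot blocks.
For solutions, see the hot-block remedies under 3) buffer busy wait in the earlier discussion of wait events.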
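To make the link between a hot block and one busy latch concrete, here is a toy Python model of buckets and latches. Everything in it is an assumption for illustration: Oracle's real hash function and bucket-to-latch mapping are internal, and the bucket and latch counts are tiny demo values.

```python
from collections import Counter

NUM_BUCKETS = 16   # stands in for _db_block_hash_buckets (demo value)
NUM_LATCHES = 4    # each child latch covers several buckets (demo value)


def bucket_of(file_no, block_no):
    # Hypothetical hash: Oracle's real hash function is internal.
    return (file_no * 1_000_003 + block_no) % NUM_BUCKETS


def latch_of(bucket):
    # Hypothetical bucket-to-latch mapping.
    return bucket % NUM_LATCHES


def latch_gets(accesses):
    """Count the gets each child latch receives for a list of (file#, block#)."""
    return Counter(latch_of(bucket_of(f, b)) for f, b in accesses)


# 1000 reads of one hot block plus one read each of 100 other blocks.
hot = [(4, 171)] * 1000
others = [(4, b) for b in range(200, 300)]
gets = latch_gets(hot + others)
hot_latch = latch_of(bucket_of(4, 171))
print(gets.most_common(), "hot latch:", hot_latch)
```

In this model almost all gets land on the one latch that covers the hot block, and the unrelated blocks that share that latch have to compete with it.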

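3)    cache buffers lru chain: this latch is used when scanning the LRU lists of buffers. Three situations can cause contention: 1) the buffer cache is too small; 2) the buffer cache is overused, or there are too many cache-based sort operations; 3) DBWR is not keeping up. Solution: find the statements with excessive logical reads, and increase the buffer cache.

4)    Library cache and shared pool contention:
The library cache is a hash table that is accessed through an array of hash buckets (similar to the buffer cache). The library cache latch serializes access to the library cache. When a SQL statement (or a PL/SQL procedure, package, function or trigger) needs to be executed, the process first acquires a library cache latch and then searches the library cache for a statement it can reuse. In 8i there is only one library cache latch. In 9i there are 7 child latches, and the number can be changed with the parameter _KGL_LATCH_COUNT (up to a maximum of 66). When the shared pool is too small or statement reuse is low, contention appears on the 'shared pool', 'library cache pin' or 'library cache' latches. The solution is to increase the shared pool or set CURSOR_SHARING=FORCE|SIMILAR, and of course to tune the SQL statements as well. To reduce contention further, large SQL statements or procedures can be pinned in the shared pool with the DBMS_SHARED_POOL.KEEP procedure.
The shared pool memory structure is similar to the buffer cache and is also managed by hashing. The shared pool has a fixed number of hash buckets, and a fixed number of library cache latches serialize access to this memory. When the database starts, 509 hash buckets and 2*CPU_COUNT library cache latches are allocated. As the database is used and more and more objects accumulate in the shared pool, Oracle increases the number of hash buckets in the following steps: 509, 1021, 2039, 4093, 8191, 16381, 32749, 65521, 131071, 4292967293. The bucket count can be set with the parameter _KGL_BUCKET_COUNT: the default value 0 means 509 buckets, and the maximum value 8 means 131071 buckets.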
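The bucket counts up to 131071 each appear to be the largest prime below a power of two, starting at 2^9. That is an observation about the numbers, not documented Oracle behaviour, but under that assumption the mapping from a _KGL_BUCKET_COUNT setting of 0 to 8 onto a bucket count can be reproduced with a short Python sketch. The function names are illustrative only.

```python
def is_prime(n):
    """Trial division; fine for the small numbers used here."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def largest_prime_below(limit):
    """Largest prime strictly smaller than limit (limit must be > 2)."""
    n = limit - 1
    while not is_prime(n):
        n -= 1
    return n


def kgl_buckets(setting):
    """Bucket count for a _KGL_BUCKET_COUNT setting of 0..8 (assumed rule)."""
    if not 0 <= setting <= 8:
        raise ValueError("the text gives 0..8 as the valid range")
    return largest_prime_below(2 ** (9 + setting))


print([kgl_buckets(s) for s in range(9)])
```

Setting 0 gives 509 and setting 8 gives 131071, matching the two end points stated in the text.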
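We can use x$ksmsp to look at the individual memory chunks in the shared pool. The main columns of interest are:
KSMCHCOM: the type of the memory chunk
ksmchptr: the physical address of the memory chunk
ksmchsiz: the size of the memory chunk
ksmchcls: the class of the memory chunk. recr is a recreatable chunk currently in use that can be a candidate for flushing when the shared pool is low on available memory; freeabl is a chunk currently in use that can be freed; free is a free, unallocated chunk; perm is a permanently allocated chunk that cannot be freed.
To reduce latch contention in the shared pool, the main measures to consider are:
1. Use bind variables
2. Use cursor sharing
3. Set the session_cached_cursors parameter. It moves cursors from the shared pool into the PGA, which reduces contention on the shared pool. A reasonable starting value is 100, adjusted later as needed.
4. Size the shared pool appropriately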

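5)    Redo copy: this latch is used when copying redo records from the PGA into the redo log buffer. The initial number of latches is 2*CPU_COUNT; it can be increased with the parameter _LOG_SIMULTANEOUS_COPIES to reduce contention.

6)    Redo allocation: this latch is used to allocate space in the redo log buffer. There are three ways to reduce this kind of contention:
Increase the redo log buffer
Use the nologging option where appropriate
Avoid unnecessary commits

7)    Row cache objects: contention on this latch usually indicates contention in the data dictionary, and it often also points to excessive parsing that depends on public synonyms. Solutions: 1) increase the shared pool; 2) use locally managed tablespaces, especially for index tablespaces.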

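Latch event	Suggested solution
Library cache	Use bind variables; adjust shared_pool_size.
Shared pool	Use bind variables; adjust shared_pool_size.
Redo allocation	Reduce redo generation; avoid unnecessary commits.
Redo copy	Increase _log_simultaneous_copies.
Row cache objects	Increase shared_pool_size.
Cache buffers chains	Increase _DB_BLOCK_HASH_BUCKETS and make it prime.
Cache buffers LRU chain	Use multiple buffer pools; tune the queries that cause heavy logical reads.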

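Latch contention summary

Latches are a very important concept in Oracle. Each row of the v$latch view holds the statistics for one type of latch, and its columns reflect the activity of the different types of latch request. The request types differ in whether the requesting process carries on when the latch is not immediately available. On that basis there are two types of latch request: willing-to-wait and immediate.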
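The difference between the two request types can be shown with an ordinary mutex. The Python sketch below uses `threading.Lock` purely as an analogy for a latch; the helper names are made up for this example, and Oracle's spin and sleep behaviour is not modelled.

```python
import threading

latch = threading.Lock()   # stand-in for a latch; an analogy only


def immediate_get(lock):
    """No-wait request: return at once, True if the lock was obtained."""
    return lock.acquire(blocking=False)


def willing_to_wait_get(lock, timeout_s):
    """Waiting request: block until the lock is obtained or timeout_s passes."""
    return lock.acquire(timeout=timeout_s)


latch.acquire()                       # another "process" holds the latch
got_immediate = immediate_get(latch)  # fails straight away, caller moves on

# Release the latch shortly afterwards so the waiting request can succeed.
threading.Timer(0.05, latch.release).start()
got_waiting = willing_to_wait_get(latch, timeout_s=2.0)
latch.release()
print(got_immediate, got_waiting)
```

The immediate request reports failure without waiting, while the willing-to-wait request blocks until the holder lets go.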

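latch free should be familiar to everyone: it shows up in v$session_wait and in the Top 5 wait events. When it does, latch contention has occurred and is already affecting system performance.

First, let's list the most common kinds of latch contention:
   1. cache buffers chains
   2. shared pool
   3. library cache

We need to explain and analyse these one at a time, so let's start with cache buffers chains.

An explanation of how this latch contention arises: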

Blocks in the buffer cache are placed on linked lists (cache buffers chains) which hang off a hash table. The hash chain that a block is placed on is based on the DBA and CLASS of the block. Each hash chain is protected by a single child latch. Processes need to get the relevant latch to allow them to scan a hash chain for a buffer so that the linked list does not change underneath them.
Contention for these latches can be caused by:

- Very long buffer chains.
- Very heavy access to the same blocks.

The actual steps I ran against the database:

SQL> select count(*)
  from v$latch_children
where misses > 0
and name = 'cache buffers chains';

COUNT(*)
----------
2048

SQL> select addr, name, misses
  from v$latch_children
where misses > 0
and name = 'cache buffers chains'
order by misses desc;
ADDR NAME MISSES
-------- ---------------------------------------------------------------- ----------
69CC28BC cache buffers chains 1591
69A3CF1C cache buffers chains 1591
69CBDDFC cache buffers chains 1589
69B92DFC cache buffers chains 1586
69C5DEBC cache buffers chains 1585
69AB0354 cache buffers chains 1585
69A70F9C cache buffers chains 1585
69A81F54 cache buffers chains 1585

SQL> select bh.addr, obj.name obj_name, bh.tch touch
  from x$bh bh,
   sys.file$ f,
   v$datafile fl,
   sys.obj$ obj,
   sys.ts$ ts
where fl.file# = f.file#
and bh.file# = fl.file#
and obj.dataobj# = bh.obj
and bh.ts# = ts.ts#
and bh.HLADDR in(
   select addr from v$latch_children where misses>0 and name='cache buffers chains'
)
and bh.tch > 0
order by bh.tch desc;

ADDR OBJ_NAME                         TOUCH
-------- ------------------------------ ----------
B6FD3078 IDX_GCTID_IUID_GM634          24
B6FD3078 REG_LOG                      8
B6FD2F9C AGENT_CARD_TYPE             7
B6FD3078 RESELLER_AGENTCARD_PRICE    6
B6FD3078 RESELLER_LOG                6
B6FD3078 IDX_ACL_AGENTID_LOGTIME       6
B6FD3078 RESELLER_LOG                6

The objects listed above are the ones behind the latch contention.

SQL> select COUNT(*)
  from x$bh bh, sys.file$ f, v$datafile fl, sys.obj$ obj, sys.ts$ ts
where fl.file# = f.file#
and bh.file# = fl.file#
and obj.dataobj# = bh.obj
and bh.ts# = ts.ts#
and bh.HLADDR = [x$bh.addr] -- physical address
and bh.tch > 0;

COUNT(*)
----------
51

An excerpt from a reference document:

Under 8.0, the default was next_prime(db_block_buffers/4), and the
number of _db_block_hash_latches was 1:1 with the number of buckets.
Under 8i, the world changed a lot. The default number of hash buckets
is 2 * db_block_buffers, but the latches work differently. It's
really not necessary to have one latch per hash chain, so, Oracle made
them a pooled resource. When you need to interrogate a hash chain,
you grab a latch from the pool and assign it to a hash chain. That
prevents anyone else from modifying the chain or its contents while
your process is using it. So, in 8i, the size of the latch pool is
dynamic but is set to 1024 for most cases. It's smaller for very
small buffer caches and larger for very large buffer caches. The
formula is:
if (db_block_buffers < 2052) then
db_block_hash_latches = 2^trunc(log(2,db_block_buffers - 4) - 1)
else if(2052 =< db_block_buffers <= 131075) then
db_block_hash_latches = 1024
else if(db_block_buffers > 131075)
db_block_hash_latches = 2^trunc(log(2,db_block_buffers - 4) - 6)
end if

So, under 8i, you probably don't need to touch _db_block_hash_buckets,
as 2 * db_block_buffers is almost certainly more than adequate. And
unless you're dealing with huge numbers of concurrent users and a
relatively small buffer cache, you probably don't need to mess with
_db_block_hash_latches, either.
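The 8i formula quoted above is easy to evaluate by hand, but a small Python sketch makes the three branches explicit. It implements only the formula as quoted; the function name and the lower guard are additions for this example, and integer arithmetic replaces the logarithm so that no rounding error can creep in.

```python
def db_block_hash_latches(db_block_buffers):
    """Latch pool size from the 8i formula quoted above.

    trunc(log(2, x) - k) equals floor(log2(x)) - k for a whole number k
    when the result is not negative, and floor(log2(x)) is
    x.bit_length() - 1, so no floating point is needed.
    """
    if db_block_buffers < 6:
        raise ValueError("formula needs db_block_buffers - 4 >= 2")
    floor_log2 = (db_block_buffers - 4).bit_length() - 1
    if db_block_buffers < 2052:
        return 2 ** (floor_log2 - 1)
    if db_block_buffers <= 131075:
        return 1024
    return 2 ** (floor_log2 - 6)


for buffers in (1028, 2052, 131072, 262144, 1073741824):
    print(buffers, db_block_hash_latches(buffers))
```

The inputs 131072 and 262144 are what a 1 GB cache comes to as a buffer count with an 8 KB or a 4 KB block size; the post does not state its block size, so both are assumptions. The last input is the raw byte value 1073741824, which is what the calculation later in the post feeds into the formula.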

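Increasing _db_block_hash_latches lets blocks be found faster and reduces cache buffers chains waits.
My database is 9i, where DB_CACHE_SIZE takes the place of db_block_buffers, so if I were to adjust this parameter I would use the formula below. Bear in mind that db_block_buffers is a count of buffers whereas DB_CACHE_SIZE is a size in bytes, so plugging the byte value straight into the formula overstates the result.
db_block_hash_latches = 2^trunc(log(2,DB_CACHE_SIZE - 4) - 6)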

SQL> show parameter db_cache_size;

NAME TYPE VALUE
------------------------------------ -------------------------------- ------------------------------
db_cache_size big integer 1073741824

SQL> select name,
   value,
   decode(isdefault, 'TRUE', 'Y', 'N') as "Default",
   decode(ISEM, 'TRUE', 'Y', 'N') as SesMod,
   decode(ISYM, 'IMMEDIATE', 'I', 'DEFERRED', 'D', 'FALSE', 'N') as SysMod,
   decode(IMOD, 'MODIFIED', 'U', 'SYS_MODIFIED', 'S', 'N') as Modified,
   decode(IADJ, 'TRUE', 'Y', 'N') as Adjusted,
   description
  from ( --GV$SYSTEM_PARAMETER
      select x.inst_id as instance,
            x.indx + 1,
            ksppinm as name,
            ksppity,
            ksppstvl as value,
            ksppstdf as isdefault,
            decode(bitand(ksppiflg / 256, 1), 1, 'TRUE', 'FALSE') as ISEM,
            decode(bitand(ksppiflg / 65536, 3),
                     1,
                     'IMMEDIATE',
                     2,
                     'DEFERRED',
                     'FALSE') as ISYM,
            decode(bitand(ksppstvf, 7), 1, 'MODIFIED', 'FALSE') as IMOD,
            decode(bitand(ksppstvf, 2), 2, 'TRUE', 'FALSE') as IADJ,
            ksppdesc as description
      from x$ksppi x, x$ksppsv y
      where x.indx = y.indx
         and substr(ksppinm, 1, 1) = '_'
         and x.inst_id = USERENV('Instance'))
where name = '_db_block_hash_latches'
order by name;

NAME VALUE
------------------------------ ---------------
_db_block_hash_latches 2048

SQL> select power(2,trunc(log(2,1073741824 - 4) - 6)) from dual;

POWER(2,TRUNC(LOG(2,1073741824-4)-6))
-------------------------------------
8388608

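In short, latch contention is best tackled from the application side. Do not expect that adjusting a few parameters will give an immediate effect. Hot blocks, heavy logical and physical reads, and full table scans can all cause latch contention.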