一套AIX上的11.1.0.7系统,应用启动时出现大量row cache lock等待,具体的systemstate dump信息如下:
FILE VERSIONS
-----------------------
Oracle Database 11g Enterprise Edition Release 11.1.0.7.0 - 64bit Production
With the Partitioning, Real Application Clusters, Data Mining and Real Application Testing options
ORACLE_HOME = /oracle/product/11.1.0/database
System name: AIX
Node name: HA5_4ADB01
Release: 3
Version: 5
Machine: 000687C2D900
Instance name: HN4A1


TRACE FILE
---------------------------
Filename=HN4A1_ora_2061038.trc



Chains most likely to have caused the hang:
 [a] Chain 1 Signature: 'row cache lock'<='row cache lock' (cycle)
     Chain 1 Signature Hash: 0x75bdd0c
 [b] Chain 2 Signature: 'row cache lock'<='row cache lock' (cycle)
     Chain 2 Signature Hash: 0x75bdd0c
 [c] Chain 3 Signature: 'row cache lock'<='row cache lock' (cycle)
     Chain 3 Signature Hash: 0x75bdd0c

===============================================================================
Cycles:
-------------------------------------------------------------------------------
Chain 1:
-------------------------------------------------------------------------------
    Oracle session identified by:
    {
                instance: 1 (hn4a.hn4a1)
                   os id: 2114372
              process id: 211, oracle@HA5_4ADB01
              session id: 610
        session serial #: 5
    }
    is waiting for 'row cache lock' with wait info:
    {
                      p1: 'cache id'=0x7
                      p2: 'mode'=0x0
                      p3: 'request'=0x5
            time in wait: 0.491964 sec
      heur. time in wait: 26.859741 sec
           timeout after: 2.508036 sec
                 wait id: 311
                blocking: 1 session
            wait history:
              1.     event: 'row cache lock'
                   wait id: 310             p1: 'cache id'=0x7
               time waited: 2.929713 sec    p2: 'mode'=0x0
                                            p3: 'request'=0x5
              2.     event: 'row cache lock'
                   wait id: 309             p1: 'cache id'=0x7
               time waited: 2.929726 sec    p2: 'mode'=0x0
                                            p3: 'request'=0x5
              3.     event: 'row cache lock'
                   wait id: 308             p1: 'cache id'=0x7
               time waited: 2.929720 sec    p2: 'mode'=0x0
                                            p3: 'request'=0x5
    }
    and is blocked by
 => Oracle session identified by:
    {
                instance: 1 (hn4a.hn4a1)
                   os id: 2118476
              process id: 209, oracle@HA5_4ADB01
              session id: 616
        session serial #: 5
    }
    which is waiting for 'row cache lock' with wait info:
    {
                      p1: 'cache id'=0x7
                      p2: 'mode'=0x0
                      p3: 'request'=0x5
            time in wait: 0.491910 sec
      heur. time in wait: 26.859705 sec
           timeout after: 2.508090 sec
                 wait id: 308
                blocking: 1 session
            wait history:
              1.     event: 'row cache lock'
                   wait id: 307             p1: 'cache id'=0x7
               time waited: 2.929713 sec    p2: 'mode'=0x0
                                            p3: 'request'=0x5
              2.     event: 'row cache lock'
                   wait id: 306             p1: 'cache id'=0x7
               time waited: 2.929729 sec    p2: 'mode'=0x0
                                            p3: 'request'=0x5
              3.     event: 'row cache lock'
                   wait id: 305             p1: 'cache id'=0x7
               time waited: 2.929715 sec    p2: 'mode'=0x0
                                            p3: 'request'=0x5
    }
    and is blocked by the session at the start of the chain.

Chain 1 Signature: 'row cache lock'<='row cache lock' (cycle)
Chain 1 Signature Hash: 0x75bdd0c

...==>many similar chains

PROCESS 211:
  ----------------------------------------
  SO: 0x7000008a1370178, type: 2, owner: 0x0, flag: INIT/-/-/0x00 if: 0x3 c: 0x3
   proc=0x7000008a1370178, name=process, file=ksu.h LINE:10706 ID:, pg=0
  (process) Oracle pid:211, ser:3, calls cur/top: 0x7000008ac61a0e8/0x7000008ac61a0e8
            flags : (0x0) -
            flags2: (0x0),  flags3: (0x0)
            int error: 0, call error: 0, sess error: 0, txn error 0
  ksudlp FALSE at location: 0
  (post info) last post received: 0 0 201
              last post received-location: kqr.h LINE:2181 ID:kqrbgl: compatible mode
              last process to post me: 70000089134b800 1 6
              last post sent: 0 0 0
              last post sent-location: No post
              last process posted by me: none
    (latch info) wait_event=0 bits=0
    Process Group: DEFAULT, pseudo proc: 0x7000008a153a4c8
    O/S info: user: oracle, term: UNKNOWN, ospid: 2114372
    OSD pid info: Unix process pid: 2114372, image: oracle@HA5_4ADB01
    Short stack dump:
0 start with grantee#=:1 and privilege#>0
      hash=8973f7c629f09e3f081962c2a3c59c94 idn=0
众多wait chain的源头均是row cache lock,systemstate转储中该等待事件相关的p1,也就是cache id均为7;cache#为7的字典缓存代表着dc_users,即字典中的用户信息(包括用户名,密码等信息):
SQL> select cache#,parameter from v$rowcache group by cache#,parameter order by cache#;

    CACHE# PARAMETER
---------- --------------------------------
         0 dc_tablespaces
         1 dc_free_extents
         2 dc_segments
         3 dc_rollback_segments
         4 dc_used_extents
         5 dc_tablespace_quotas
         6 dc_files
         7 dc_users
         8 dc_object_grants
         8 dc_objects
         9 dc_qmc_cache_entries

    CACHE# PARAMETER
---------- --------------------------------
        10 dc_usernames
        11 dc_object_ids
        12 dc_constraints
        13 dc_sequences
        14 dc_profiles
        15 dc_database_links
        16 dc_histogram_data
        16 dc_histogram_defs
        17 dc_global_oids
        18 dc_outlines
        19 dc_partition_scns

    CACHE# PARAMETER
---------- --------------------------------
        19 dc_table_scns
        20 rule_info
        21 rule_fast_operators
        21 rule_or_piece
        22 dc_awr_control
        23 dc_qmc_ldap_cache_entries
        24 outstanding_alerts
        25 dc_hintsets
        26 global database name
        27 qmtmrcin_cache_entries
        28 qmtmrctn_cache_entries

    CACHE# PARAMETER
---------- --------------------------------
        29 qmtmrcip_cache_entries
        30 qmtmrctp_cache_entries
        31 qmtmrciq_cache_entries
        32 qmtmrctq_cache_entries
        33 kqlsubheap_object
        34 realm auth
        34 realm cache
        35 Command rule cache
        36 Realm Object cache
        36 Realm Subordinate Cache
        37 event map

    CACHE# PARAMETER
---------- --------------------------------
        38 format
        39 audit collector
        40 Rule Set Cache

47 rows selected.
大量的dc_users字典缓存锁可能引起后续会话登录挂起,用户登录无法成功,乃至整个实例hang住。那么又是什么样的操作可能引起这种row cache lock并发锁等待事件呢? 应用维护人员反馈问题发生时部分应用程序配置文件中的数据库用户密码不正确,同时该应用会在短期内发起大量会话登录数据库,但应用人员观察到这一问题后并没有联想到可能引发进一步的实例上的dc_user row cache lock问题,仅尝试修改密码并重启该部分的应用程序。至此该问题的root cause已经较为明了:由于短期内出现大量连接请求但连接发起端给出的密码均不正确,引起了Server端严重的用户信息字典缓存锁(users dictionary cache