一、案例背景
某项目Oracle 11g rac环境下,修改用户密码时会话被hang住,并且客户端用户在登录时也会被hang住。
现象如下:
SQL> alter user XXX identified by xxxxx;----->在修改时就会hang住,会话一直在等待
SQL> conn XXX/xxxx;----->无法进入用户会话
此时抓取hang Analyze以及做Systemstate Analysis获取trace分析
sqlplus -prelim / as sysdba
oradebug setorapname reco
oradebug unlimit
oradebug -g all hanganalyze 3
--等一分钟后再次执行hanganalyze分析
oradebug -g all hanganalyze 3
oradebug -g all dump systemstate 266
oradebug -g all dump systemstate 266
分析dump文件发现有上百个”library cache lock“等待事件,并且源头都是来自于同一个会话。
附trace log:
*** 2019-02-21 17:33:14.660
===============================================================================
HANG ANALYSIS:
instances (db_name.oracle_sid): zjedb.zjedb1, zjedb.zjedb2
oradebug_node_dump_level: 3
analysis initiated by oradebug
os thread scheduling delay history: (sampling every 1.000000 secs)
0.000000 secs at [ 17:33:14 ]
NOTE: scheduling delay has not been sampled for 0.340580 secs 0.000000 secs from [ 17:33:10 - 17:33:15 ], 5 sec avg
0.000000 secs from [ 17:32:15 - 17:33:15 ], 1 min avg
0.000000 secs from [ 17:28:14 - 17:33:15 ], 5 min avg
vktm time drift history
===============================================================================
Chains most likely to have caused the hang:
[a] Chain 1 Signature: <not in a wait><='library cache lock'
Chain 1 Signature Hash: 0x24734cf
[b] Chain 2 Signature: <not in a wait><='library cache lock'
Chain 2 Signature Hash: 0x24734cf
[c] Chain 3 Signature: <not in a wait><='library cache lock'
Chain 3 Signature Hash: 0x24734cf
===============================================================================
Non-intersecting chains:
-------------------------------------------------------------------------------
Chain 1:
-------------------------------------------------------------------------------
Oracle session identified by:
{
instance: 1 (zjedb.zjedb1)
os id: 8118
process id: 96, oracle@zjerac1
session id: 6
session serial #: 36237
}
is waiting for 'library cache lock' with wait info:
{
p1: 'handle address'=0x89d9bbab0
p2: 'lock address'=0x89e0f7560
p3: '100*mode+namespace'=0x4f0003
time in wait: 6 min 47 sec
timeout after: 8 min 12 sec---->此会话已被阻塞6min以上
wait id: 7
blocking: 0 sessions
current sql: <none>
short stack: ksedsts()+465<-ksdxfstk()+32<-ksdxcb()+1927<-sspuser()+112<-__sighandler()<-semtimedop()+10<-skgpwwait()+160<-ksliwat()+2022<-kslwaitctx()+163<-kjusuc()+3400<-ksipgetctxi()+1759<-kqlmLock()+2679<-kqlmClusterLock()+191<-kgllkal()+2183<-kglLock()+1308<-kglget()+343<-kziasfc()+3126<-kpolnb()+6976<-kpoauth()+872<-opiodr()+917<-ttcpip()+2183<-opitsk()+1710<-opiino()+969<-opiodr()+917<-opidrv()+570<-sou2o()+103<-opimai_real()+133<-ssthrdmain()+265<-main()+201<-__libc_start_main()+253
wait history:
* time between current wait and wait #1: 0.000102 sec
1. event: 'library cache: mutex X'
time waited: 0.010957 sec
wait id: 6 p1: 'idn'=0x76e11533
p2: 'value'=0xc400000000
p3: 'where'=0x50
* time between wait #1 and #2: 0.000045 sec
2. event: 'latch: ges resource hash list'
time waited: 0.003486 sec
wait id: 5 p1: 'address'=0x8723bf798
p2: 'number'=0x70
p3: 'tries'=0x0
* time between wait #2 and #3: 0.000034 sec
3. event: 'library cache lock'
time waited: 5 min 13 sec
wait id: 4 p1: 'handle address'=0x89d9bbab0
p2: 'lock address'=0x89e0f7460
p3: '100*mode+namespace'=0x4f0002
}
and is blocked by---->阻塞的源头
=> Oracle session identified by:
{
instance: 1 (zjedb.zjedb1)
os id: 7964
process id: 140, oracle@zjerac1
session id: 2273 ----->此为阻塞源头的会话sid
session serial #: 52199
}
which is not in a wait:
{
last wait: 0.997701 sec ago
blocking: 388 sessions----->阻塞了300多个会话,等待很严重
current sql: <none>
short stack: ksedsts()+465<-ksdxfstk()+32<-ksdxcb()+1927<-sspuser()+112<-__sighandler()<-__nanosleep()+16<-kpolnb()+6976<-kpoauth()+872<-opiodr()+917<-ttcpip()+2183<-opitsk()+1710<-opiino()+969<-opiodr()+917<-opidrv()+570<-sou2o()+103<-opimai_real()+133<-ssthrdmain()+265<-main()+201<-__libc_start_main()+253
wait history:
1. event: 'library cache lock'
time waited: 6 min 46 sec
wait id: 7 p1: 'handle address'=0x89d9bbab0
p2: 'lock address'=0x88edfd2a8
p3: '100*mode+namespace'=0x4f0003
* time between wait #1 and #2: 0.000314 sec
2. event: 'library cache load lock'
time waited: 0.001645 sec
wait id: 6 p1: 'object address'=0x89d9bbab0
p2: 'lock address'=0x88edfd0a8
p3: '100*mask+namespace'=0x4f0003
* time between wait #2 and #3: 0.000079 sec
3. event: 'latch: ges resource hash list'
time waited: 0.002859 sec
wait id: 5 p1: 'address'=0x8723bf798
p2: 'number'=0x70
p3: 'tries'=0x0
}
Chain 1 Signature: <not in a wait><='library cache lock'
Chain 1 Signature Hash: 0x24734cf
-------------------------------------------------------------------------------
===============================================================================
Intersecting chains:
-------------------------------------------------------------------------------
Chain 2:
-------------------------------------------------------------------------------
Oracle session identified by:
{
instance: 1 (zjedb.zjedb1)
os id: 8149
process id: 160, oracle@zjerac1
session id: 7
session serial #: 24737
}
is waiting for 'library cache lock' with wait info:
{
p1: 'handle address'=0x89d9bbab0
p2: 'lock address'=0x88f8d4ed0
p3: '100*mode+namespace'=0x4f0003
time in wait: 6 min 47 sec
timeout after: 8 min 12 sec
wait id: 6
blocking: 0 sessions
current sql: <none>
short stack: ksedsts()+465<-ksdxfstk()+32<-ksdxcb()+1927<-sspuser()+112<-__sighandler()<-semtimedop()+10<-skgpwwait()+160<-ksliwat()+2022<-kslwaitctx()+163<-kjusuc()+3400<-ksipgetctxi()+1759<-kqlmLock()+2679<-kqlmClusterLock()+191<-kgllkal()+2183<-kglLock()+1308<-kglget()+343<-kziasfc()+3126<-kpolnb()+6976<-kpoauth()+872<-opiodr()+917<-ttcpip()+2183<-opitsk()+1710<-opiino()+969<-opiodr()+917<-opidrv()+570<-sou2o()+103<-opimai_real()+133<-ssthrdmain()+265<-main()+201<-__libc_start_main()+253
wait history:
* time between current wait and wait #1: 0.000130 sec
1. event: 'latch: ges resource hash list'
time waited: 0.005830 sec
wait id: 5 p1: 'address'=0x8723bf798
p2: 'number'=0x70
p3: 'tries'=0x0
* time between wait #1 and #2: 0.000044 sec
2. event: 'library cache lock'
time waited: 5 min 13 sec
wait id: 4 p1: 'handle address'=0x89d9bbab0
p2: 'lock address'=0x88f8d4dd0
p3: '100*mode+namespace'=0x4f0002
* time between wait #2 and #3: 0.000216 sec
3. event: 'latch: ges resource hash list'
time waited: 0.005371 sec
wait id: 3 p1: 'address'=0x8723bf798
p2: 'number'=0x70
p3: 'tries'=0x0
}
and is blocked by 'instance: 1, os id: 7964, session id: 2273',-->其余会话也是被这个sid阻塞
which is a member of 'Chain 1'.
Chain 2 Signature: <not in a wait><='library cache lock'
Chain 2 Signature Hash: 0x24734cf
----------------------------------------
SO: 0x87998d798, type: 4, owner: 0x869081560, flag: INIT/-/-/0x00 if: 0x3 c: 0x3
proc=0x869081560, name=session, file=ksu.h LINE:12729, pg=0
(session) sid: 2273 ser: 52201 trans: (nil), creator: 0x869081560
flags: (0x41) USR/- flags_idl: (0x1) BSY/-/-/-/-/-
flags2: (0x9) -/-/INC
DID: , short-term DID:
txn branch: (nil)
edition#: 0 oct: 0, prv: 0, sql: (nil), psql: (nil), user: 0/SYS
ksuxds FALSE at location: 0
service name: zjedb
client details:---->被阻塞的客户端信息
O/S info: user: WIN-0ELFKQG4A9V$, term: unknown, ospid: 1234
machine: WIN-0ELFKQG4A9V program: JDBC Thin Client
application name: JDBC Thin Client, hash value=2546894660
Current Wait Stack:
0: waiting for 'library cache lock'
handle address=0x89d9bbab0, lock address=0x886dac9b0, 100*mode+namespace=0x4f0002
wait_id=2 seq_num=3 snap_id=1
wait times: snap=1 min 54 sec, exc=1 min 54 sec, total=1 min 54 sec
wait times: max=15 min 0 sec, heur=1 min 54 sec
wait counts: calls=231 os=231
in_wait=1 iflags=0x15a2
There is at least one session blocking this session.
Dumping 1 direct blocker(s):
inst: 1, sid: 1145, ser: 52551
Dumping final blocker:
inst: 1, sid: 1145, ser: 52551
Wait State:
fixed_waits=0 flags=0x22 boundary=(nil)/-1
Session Wait History:
elapsed time of 0.000459 sec since current wait
0: waited for 'SQL*Net message from client'
二、处理方式
通过分析之后已经知道了’library cache lock’都是被同一个会话锁阻塞,那么将其手动kill方可解决故障。
alter system kill session 'sid,serial#' immediate;
三、拓展研究
在11g中存在”密码验证延迟“功能,此特性主要用来防止数据库密码被暴力破解,在极少数场景的生产系统业务修改密码,会导致大量会话在等待登录验证,造成大量’library cache lock’。
附MOS文档
文档 ID 1309738.1
In Oracle 11g Release 2 and higher, in order to disable the wait between login failures the event 28401 needs to be explicitly enabled:
The event can be set as follows:
alter system set event =“28401 TRACE NAME CONTEXT FOREVER, LEVEL 1” scope=spfile;
To unset the event, set as follows:
SQL> Alter system set event= '28401 trace name context off' scope=spfile ;
---->密码验证延迟关闭
For more information see:
Document:7715339.8 Logon failures causes “row cache lock” waits - Allow disable of logon delay
The following functions match per above bug:
nanosleep nanosleep kziasfc kpolnb kpoauth
Note: Care should be taken when setting this event, as this is disabling the sleep time which can leave the system more vulnerable.