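1. On both nodes, queries against v$asm_disk hung, waiting on 'enq: DD - contention'. The root blocker was the RBAL process. RBAL itself was neither blocked nor stuck on any abnormal wait event.
2. The root blocker, RBAL, was running on CPU and was not performing a disk rebalance.
3. In the diag trace (hang analysis), the root blocker of every chain was the RBAL process, and RBAL was not in a wait: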
Chains most likely to have caused the hang:
[a] Chain 1 Signature: <not in a wait><='rdbms ipc reply'<='enq: DD - contention'
Chain 1 Signature Hash: 0x7bd12357
[b] Chain 2 Signature: <not in a wait><='rdbms ipc reply'<='enq: DD - contention'
Chain 2 Signature Hash: 0x7bd12357
[c] Chain 3 Signature: <not in a wait><='rdbms ipc reply'<='enq: DD - contention'
Chain 3 Signature Hash: 0x7bd12357
===============================================================================
Non-intersecting chains:
-------------------------------------------------------------------------------
Chain 1:
-------------------------------------------------------------------------------
Oracle session identified by:
{
instance: 2 (+asm.+asm2)
os id: 89375
process id: 42, oracle@dg91 (TNS V1-V3)
session id: 2605
session serial #: 781
}
is waiting for 'enq: DD - contention' with wait info:
{
p1: 'name|mode'=0x44440006
p2: 'disk group'=0x0
p3: 'type'=0x1
time in wait: 21 min 38 sec
timeout after: never
wait id: 4
blocking: 0 sessions
current sql: select grpnum_kfdsk, number_kfdsk, compound_kfdsk, incarn_kfdsk, mntsts_kfdsk, hdrsts_kfdsk, compat_kfdsk, mode_kfdsk, state_kfdsk, redun_kfdsk, libnam_kfdsk, totmb_kfdsk, usedmb_kfdsk, asmname_kfdsk, failname_kfdsk, label_kfdsk, path_kfdsk, udid_kfdsk, kfkid_kfdsk, crdate_kfdsk, mtdate_kfdsk, timer_kfdsk , dbcompat_k
short stack: ksedsts()+465<-ksdxfstk()+32<-ksdxcb()+1927<-sspuser()+112<-__sighandler()<-semtimedop()+10<-skgpwwait()+160<-ksliwat()+2022<-kslwaitctx()+163<-ksqcmi()+2848<-ksqgtlctx()+3501<-ksqgelctx()+557<-kfgUseDmt()+655<-kfgTableCb()+1718<-kfdDskTableCbInternal()+233<-kfdDskTableCb()+56<-qerfxFetch()+3164<-opifch2()+2766<-kpoal8()+2833<-opiodr()+917<-ttcpip()+2183<-opitsk()+1710<-opiino()+969<-opiodr()+917<-opidrv()+570<-sou2o()+103<-opimai_real()+133<-ssthrdmain()+265<-main()+201<-__libc_start_main()+245
wait history:
* time between current wait and wait #1: 0.000124 sec
1. event: 'SQL*Net message to client'
time waited: 0.000001 sec
wait id: 3 p1: 'driver id'=0x62657100
p2: '#bytes'=0x1
* time between wait #1 and #2: 0.003768 sec
2. event: 'SQL*Net message from client'
time waited: 0.000418 sec
wait id: 2 p1: 'driver id'=0x62657100
p2: '#bytes'=0x1
* time between wait #2 and #3: 0.000015 sec
3. event: 'SQL*Net message to client'
time waited: 0.000002 sec
wait id: 1 p1: 'driver id'=0x62657100
p2: '#bytes'=0x1
}
and is blocked by
=> Oracle session identified by:
{
instance: 2 (+asm.+asm2)
os id: 420752
process id: 27, oracle@dg91 (TNS V1-V3)
session id: 1675
session serial #: 29811
}
which is waiting for 'rdbms ipc reply' with wait info:
{
p1: 'from_process'=0x12
p2: 'timeout'=0x7fec666b
time in wait: 2.078555 sec
timeout after: 0.000000 sec
wait id: 642263
blocking: 11 sessions
current sql: select name_kfgrp, number_kfgrp, incarn_kfgrp, compat_kfgrp, dbcompat_kfgrp, state_kfgrp, flags32_kfgrp, type_kfgrp, refcnt_kfgrp, sector_kfgrp, blksize_kfgrp, ausize_kfgrp , totmb_kfgrp, freemb_kfgrp, coldmb_kfgrp, hotmb_kfgrp, minspc_kfgrp, usable_kfgrp, offline_kfgrp, lflags_kfgrp from x$kfgrp
short stack: ksedsts()+465<-ksdxfstk()+32<-ksdxcb()+1927<-sspuser()+112<-__sighandler()<-semtimedop()+10<-skgpwwait()+160<-ksliwat()+2022<-kslwaitctx()+163<-kslwait()+141<-ksarcr()+219<-ksbwcoex()+35<-kfgbSendWithPin()+442<-kfgbSendShallow()+137<-kfgDiscoverShallow()+268<-kfgGlobalOpen()+264<-kfgDiscoverDeep()+302<-kfgDiscoverGroup()+869<-kfgTableCb()+2339<-kfgGrpTableCbInternal()+4169<-kfgGrpTableCb()+56<-qerfxFetch()+3164<-opifch2()+2766<-kpoal8()+2833<-opiodr()+917<-ttcpip()+2183<-opitsk()+1710<-opiino()+969<-opiodr()+917<-
wait history:
* time between current wait and wait #1: 0.000065 sec
1. event: 'rdbms ipc reply'
time waited: 1.999940 sec
wait id: 642262 p1: 'from_process'=0x12
p2: 'timeout'=0x7fec666d
* time between wait #1 and #2: 0.000064 sec
2. event: 'rdbms ipc reply'
time waited: 1.999885 sec
wait id: 642261 p1: 'from_process'=0x12
p2: 'timeout'=0x7fec666f
* time between wait #2 and #3: 0.000067 sec
3. event: 'rdbms ipc reply'
time waited: 1.999927 sec
wait id: 642260 p1: 'from_process'=0x12
p2: 'timeout'=0x7fec6671
}
and is blocked by
=> Oracle session identified by:
{
instance: 2 (+asm.+asm2)
os id: 70866
process id: 18, oracle@dg91 (RBAL)
session id: 1117
session serial #: 1
}
which is not in a wait:
{
last wait: 21410 min 11 sec ago
blocking: 12 sessions
current sql: <none>
short stack: <none: error encountered - ORA-32515: cannot issue ORADEBUG command 'SHORT_STACK' to process 'Unix process pid: 70866, image: oracle@dg91 (RBAL)'; prior command execution time exceeds 30000 ms>
wait history:
1. event: 'CSS operation: action'
time waited: 0.000003 sec
wait id: 67025744 p1: 'function_id'=0x43
* time between wait #1 and #2: 0.000002 sec
2. event: 'GPnP Termination'
time waited: 0.006598 sec
wait id: 67025743
* time between wait #2 and #3: 0.000002 sec
3. event: 'GPnP Get Item'
time waited: 0.006473 sec
wait id: 67025742
}
Chain 1 Signature: <not in a wait><='rdbms ipc reply'<='enq: DD - contention'
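A few fields in the trace above are packed or easy to misread, and they can be decoded mechanically. The sketch below is illustrative only. The helper names and the LOCK_MODES table are hypothetical and are not part of any Oracle tool. It relies on two conventions:

- An enqueue wait's p1 ('name|mode') packs the two lock-type letters into its top two bytes and the requested mode into the low 16 bits.
- In a hang-analysis chain signature, 'A<=B' means "A blocks B", so the leftmost entry is the root blocker.

```python
import re

# Conventional Oracle lock-mode codes (mode value -> abbreviation).
LOCK_MODES = {0: "none", 1: "N", 2: "SS", 3: "SX", 4: "S", 5: "SSX", 6: "X"}


def decode_enqueue_p1(p1: int) -> tuple[str, int]:
    """Split an enqueue wait's p1 ('name|mode') into (lock type, requested mode)."""
    lock_type = chr((p1 >> 24) & 0xFF) + chr((p1 >> 16) & 0xFF)
    mode = p1 & 0xFFFF
    return lock_type, mode


def parse_chain_signature(signature: str) -> list[str]:
    """Return the states in a chain signature, root blocker first.

    'A<=B' reads as 'A blocks B', so the leftmost entry is the final blocker
    and the rightmost entry is the original waiter.
    """
    return [part.strip().strip("'") for part in signature.split("<=")]


def duration_to_days(text: str) -> float:
    """Convert a hang-analysis duration such as '21410 min 11 sec' to days."""
    match = re.fullmatch(r"\s*(\d+) min (\d+) sec\s*", text)
    if match is None:
        raise ValueError(f"unrecognised duration: {text!r}")
    minutes, seconds = int(match.group(1)), int(match.group(2))
    return (minutes * 60 + seconds) / 86400


if __name__ == "__main__":
    lock_type, mode = decode_enqueue_p1(0x44440006)
    print(lock_type, mode, LOCK_MODES[mode])  # DD 6 X
    print(parse_chain_signature(
        "<not in a wait><='rdbms ipc reply'<='enq: DD - contention'"))
    print(round(duration_to_days("21410 min 11 sec"), 2))  # 14.87
```

Applied to this trace, the helpers give three results:

- p1=0x44440006 decodes to lock type DD requested in mode 6 (exclusive).
- The signature puts RBAL's '<not in a wait>' state at the root of the chain.
- RBAL's "last wait: 21410 min 11 sec ago" works out to roughly 14.9 days without an instrumented wait. That fits point 2 (RBAL running on CPU).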
4. The GPnP log was continuously flooded with the following messages:
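Attempted fixes:
1. Killing the gpnp process had no effect.
2. Restarting the cluster failed: the cluster would not come up because cssd could not start.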
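/var/log/messages reported path (link) errors. Some ASM disks had been unmapped on the storage side and presented to other servers. The disk paths on this server, however, had not been cleaned up: the device paths still existed while the underlying links were gone. This left cssd in an abnormal state when it scanned the disks.
The issue was finally resolved by rebooting the operating system. Once the failed disk paths were released, CSS started normally. The stale paths are therefore suspected of having prevented CSS from starting.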
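In this incident the stale paths were found through /var/log/messages. As a complementary check before restarting the clusterware, one could list SCSI disks that the kernel no longer reports as healthy. The sketch below is an assumption, not what was done here:

- It assumes a Linux host where each SCSI disk exposes /sys/block/<dev>/device/state, which normally reads "running".
- The function name find_unhealthy_scsi_disks is hypothetical.
- It does not inspect multipath maps or ASM disk headers.
- A path whose LUN was unmapped may keep reporting "running" until I/O to it fails, so an empty result does not prove the paths are clean.

```python
from pathlib import Path


def find_unhealthy_scsi_disks(sys_block: str = "/sys/block") -> dict[str, str]:
    """Return {device: state} for block devices whose SCSI state is not 'running'.

    Assumes the Linux sysfs layout /sys/block/<dev>/device/state. Devices
    without that file (e.g. dm-*, loop*) are skipped rather than reported.
    """
    result: dict[str, str] = {}
    root = Path(sys_block)
    if not root.is_dir():
        return result
    for dev in sorted(root.iterdir()):
        state_file = dev / "device" / "state"
        try:
            state = state_file.read_text().strip()
        except OSError:
            continue  # not a SCSI disk, or the state file is unreadable
        if state != "running":
            result[dev.name] = state
    return result


if __name__ == "__main__":
    for name, state in find_unhealthy_scsi_disks().items():
        print(f"{name}: {state}")
```

The `sys_block` parameter exists only so that the function can be pointed at a copy of the sysfs tree for testing. On a real host the default is used.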