测试的时候数据库外键导致死锁_详述一则数据库死锁故障的分析过程

导读:客户的监控告警频繁提示系统xx数据库死锁增长个数高于当前阈值_当前值1.00。下面是详细的故障分析诊断过程,以及详细的解决方案描述。

本文分为三部分:

1.背景概述

2.故障分析

3.根本解决方案及建议

背景概述

客户的监控告警频繁提示系统xx数据库死锁增长个数高于当前阈值_当前值1.00。下面是详细的故障分析诊断过程,以及详细的解决方案描述。

故障分析

2.1 故障现象

登录到系统,从数据库到alert日志可以发现的确存在很多ORA-60的信息,截取部分如下:

2020-04-23T19:32:00.644961+08:00

XXXDB(4):ORA-00060: Global Enqueue Services Deadlock detected. See Note 60.1 at My Oracle for Troubleshooting ORA-60 Errors. More info in file

/oracle/app/oracle/diag/rdbms/z1d1v19c/xxxxxxxx2/trace/xxxxxxxx2_ora_127408.trc.

2020-04-23T19:32:01.000382+08:00

Dumping diagnostic data in directory=[cdmp_20200423193200], requested by (instance=2, osid=127408), summary=[abnormal process termination].

2020-04-23T19:32:54.093147+08:00

XXXDB(4):ORA-00060: Global Enqueue Services Deadlock detected. See Note 60.1 at My Oracle for Troubleshooting ORA-60 Errors. More info in file

/oracle/app/oracle/diag/rdbms/z1d1v19c/xxxxxxxx2/trace/xxxxxxxx2_ora_127383.trc.

2020-04-23T19:32:54.289460+08:00

Dumping diagnostic data in directory=[cdmp_20200423193254], requested by (instance=2, osid=127383), summary=[abnormal process termination].

2020-04-23T19:32:57.576079+08:00

XXXDB(4):ORA-00060: Global Enqueue Services Deadlock detected. See Note 60.1 at My Oracle for Troubleshooting ORA-60 Errors. More info in file

/oracle/app/oracle/diag/rdbms/z1d1v19c/xxxxxxxx2/trace/xxxxxxxx2_ora_124482.trc.

2.2 故障根源

查看trc内容,发现是自身导致的阻塞,几个trc都类似,下面截取其中一个trc跟踪日志的关键内容:

*** 2020-04-23T19:32:00.644695+08:00 (XXXDB(4))

*** SESSION ID:(7989.26294) 2020-04-23T19:32:00.644756+08:00

*** CLIENT ID:() 2020-04-23T19:32:00.644762+08:00

*** SERVICE NAME:(XXXDB) 2020-04-23T19:32:00.644767+08:00

*** MODULE NAME:(oracle@xxxxxxxxdb2) 2020-04-23T19:32:00.644772+08:00

*** ACTION NAME:() 2020-04-23T19:32:00.644777+08:00

*** CLIENT DRIVER:() 2020-04-23T19:32:00.644781+08:00

*** CONTAINER ID:(4) 2020-04-23T19:32:00.644785+08:00

Single resource deadlock: blocking enqueue which blocks itself, f 0

Granted global enqueue 0xaf1cabb50

----------enqueue 0xaf1cabb50------------------------

lock version : 137621

Owner inst : 2

grant_level : KJUSERNL

req_level : KJUSERPW

bast_level : KJUSERNL

notify_func : none

resp : 0x5bf1f1c38

procp : 0xb069785a0

pid : 127405

proc version : 580

oprocp : (nil)

opid : 127405

group lock owner : 0xb5aaec910

possible pid : 127405

xid : C1000-0002-000FC6D3

dd_time : 10.0 secs

dd_count : 0

timeout : 60.0 secs

On_timer_q? : Y

On_dd_q? : Y

sec since mv2grQ : N/A

lock_state : OPENING CONVERTING

ast_flag : 0x0

flags : 0x0

Open Options : KJUSERDEADLOCK

Convert options : KJUSERGETVALUE

History : FREE > KJLALC > REF_RES > LOC_AST > CLOSE > FREE > KJLALC > REF_RES > GR2CVT

Msg_Seq : 0x0

res_seq : 10845078

valblk : 0x00000000000000000000000000000000 .

user session for deadlock lock 0xaf1cabb50

sid: 7867 ser: 47087 audsid: 210608 user: 103/ZJY

pdb: 4/XXXDB

flags: (0x41) USR/- flags2: (0x40009) -/-/INC

flags_idl: (0x1) status: BSY/-/-/- kill: -/-/-/-

pid: 193 O/S info: user: grid, term: UNKNOWN, ospid: 127405

image: oracle@xxxxxxxxdb2

client details:

O/S info: user: , term: , ospid: 1234

machine: ltyglapp01 program: oracle@xxxxxxxxdb2

application name: oracle@xxxxxxxxdb2, hash value=420857658

current SQL:

delete from tx_xxxx_xxxxx_xxxx where 1=1 and FTOOL_CODE='X019541' and T8_SYS_ADTYPE_ID=1 and T8_SYS_PORTFOL_ID=100148 and SETTLE_DATE >='20200423' and isaccount != 1 and account_type=3 and (BOND_MARKET=1 or BOND_MARKET is null )

Requesting global enqueue 0xb1ceb9110

----------enqueue 0xb1ceb9110------------------------

lock version : 534003

Owner inst : 2

grant_level : KJUSERCW

req_level : KJUSERPW

bast_level : KJUSERNL

notify_func : none

resp : 0x5bf1f1c38

procp : 0xb069bd4b0

pid : 127408

proc version : 1615

oprocp : (nil)

opid : 127408

group lock owner : 0xb4aa785e8

possible pid : 127408

xid : C2000-0002-0002D8DD

dd_time : 0.0 secs

dd_count : 0

timeout : -6670697.0 secs

On_timer_q? : N

On_dd_q? : N

sec since mv2grQ : 0

lock_state : CONVERTING

ast_flag : 0x0

flags : 0x1

Open Options : KJUSERDEADLOCK

Convert options : KJUSERGETVALUE

History : REF_RES > LOC_AST > CLOSE > FREE > KJLALC > REF_RES > GR2CVT > LOC_AST_FG > LOC_AST

Msg_Seq : 0x0

res_seq : 10845078

valblk : 0x00000000000000000000000000000000 .

user session for deadlock lock 0xb1ceb9110

sid: 7989 ser: 26294 audsid: 210609 user: 103/ZJY

pdb: 4/XXXDB

flags: (0x41) USR/- flags2: (0x40009) -/-/INC

flags_idl: (0x1) status: BSY/-/-/- kill: -/-/-/-

pid: 194 O/S info: user: grid, term: UNKNOWN, ospid: 127408

image: oracle@xxxxxxxxdb2

client details:

O/S info: user: , term: , ospid: 1234

machine: ltyglapp01 program: oracle@xxxxxxxxdb2

application name: oracle@xxxxxxxxdb2, hash value=420857658

current SQL:

delete from tx_xxxx_xxxxx_xxxx where 1=1 and FTOOL_CODE='112094455' and T8_SYS_ADTYPE_ID=24 and T8_SYS_PORTFOL_ID=100148 and SETTLE_DATE >='20200423' and isaccount != 1 and (BOND_MARKET=3 or BOND_MARKET is null ) and relation_no='100912'

----------resource 0x5bf1f1c38----------------------

resname : [0x11ecf][0x0],[TM][ext 0xfffdbbf8,0x0][domid 0x4]

可以发现都是delete from tx_xxxx_xxxxx_xxxx这张表,死锁现象是Single resource deadlock,而且TYPE是TM,综合分析这种现象,大多就是外键没有索引导致。

c. For single resource deadlock on TM enqueue, missing foreign key index is often the cause, please check case 3 for the solution.

进一步获取建表语句查看约束部分,发现只有主键,没有外键:

select dbms_metadata.get_ddl('TABLE','TX_XXXX_XXXXX_XXXX','ZJY') from dual;

"TEMPACCRUBLN" NUMBER(22,4) DEFAULT 0,

CONSTRAINT "PK_TX_XXXX_XXXXX_XXXX" PRIMARY KEY ("ID")

USING INDEX PCTFREE 10 INITRANS 2 MAXTRANS 255 COMPUTE STATISTICS

STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645

PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1

BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)

TABLESPACE "ZJYDATA" ENABLE

查询ZJY用户下所有存在外键的表:

STAT TABLE_NAME CON IDX

ok T8_ACC_XXXXXXX TX_XXXXX_XXXX_ID TX_XXXXX_XXXX_ID

**** T8_SHIBOR_ACC_XXXXXXX TX_XXXXX_XXXX_ID

**** T8_CUSTOM_ACC_XXXXXXX TX_XXXXX_XXXX_ID

**** T8_IMPAWN_ACC_XXXXXXX TX_XXXXX_XXXX_ID

**** T8_FUND_ACC_XXXXXXX TX_XXXXX_XXXX_ID

**** T8_LENDING_ACC_XXXXXXX TX_XXXXX_XXXX_ID

**** T8_BOND_ACC_XXXXXXX TX_XXXXX_XXXX_ID

**** T8_BUYOUT_ACC_XXXXXXX TX_XXXXX_XXXX_ID

进一步分析,其中T8_BOND_ACC_XXXXXXX的外键就是引用了TX_XXXX_XXXXX_XXXX的主键ID列:

select dbms_metadata.get_ddl('TABLE','T8_BOND_ACC_XXXXXXX','ZJY') from dual;

REFERENCES "ZJY"."TX_XXXX_XXXXX_XXXX" ("ID") ON DELETE CASCADE ENABLE

2.3 故障处置

业务闲时,在T8_BOND_ACC_XXXXXXX表的外键列TX_XXXXX_XXXX_ID上创建索引,再观察告警情况。

create index TX_XXXXX_XXXX_ID on T8_BOND_ACC_XXXXXXX(TX_XXXXX_XXXX_ID) online;

注:上面是示范创建语句,具体创建索引的语句,比如是否采用online,是否使用并行等参照行里运维具体规范。

根本解决方案及建议

虽然本次告警信息未发现其他表的全局死锁问题,但从Oracle的设计层面来看,建议在在所有7个表的外键列上都创建对应的索引,防止后续在其他表大量DML时出现同类问题。

此外,复盘的时候发现,trc跟踪日志的这一部分信息:

----------resource 0x5bf1f1c38----------------------

resname : [0x11ecf][0x0],[TM][ext 0xfffdbbf8,0x0][domid 0x4]

这里的[0x11ecf]转换为10进制,就是对应的外键无索引对象的object_id。

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值