测试环境出现问题
今天 ,收到项目组通知说发生了死锁,让我查一下死锁出现的原因 。
首先 ,登录数据库,查看trace日志所在路径
-
SYS @ cams > show parameter dump ;
-
NAME TYPE VALUE
-
------------------------------------ ----------- ------------------------------
-
background_core_dump string partial
-
background_dump_dest string / u01 / app / oracle / diag / rdbms / cams / cams / trace
-
core_dump_dest string / u01 / app / oracle / diag / rdbms / cams / cams / cdump
-
max_dump_file_size string unlimited
-
shadow_core_dump string PARTIAL
-
user_dump_dest string / u01 / app / oracle / diag / rdbms / cams / cams / trace
查看alert 日志,搜索 00060
-
[ oracle@db trace ] $ cd / u01 / app / oracle / diag / rdbms / cams / cams / trace /
-
[ oracle@db trace ] $ vi alert_cams . log
找到 报错trace日志如下:
-
ORA-00060 : Deadlock detected . More info in file / u01 / app / oracle / diag / rdbms / cams / cams / trace / cams_ora_12850 . trc .
-
Tue Sep 12 13 : 28 : 31 2017
-
ORA-00060 : Deadlock detected . More info in file / u01 / app / oracle / diag / rdbms / cams / cams / trace / cams_ora_12854 . trc .
-
Tue Sep 12 13 : 28 : 36 2017
-
ORA-00060 : Deadlock detected . More info in file / u01 / app / oracle / diag / rdbms / cams / cams / trace / cams_ora_12822 . trc .
-
Tue Sep 12 13 : 31 : 12 2017
打开 其中一个 trace文件,部分内容如下 :
-
Trace file / u01 / app / oracle / diag / rdbms / cams / cams / trace / cams_ora_12850 . trc
-
Oracle Database 11g Enterprise Edition Release 11 . 2 . . 4 . 0 - 64bit Production
-
With the Partitioning , OLAP , Data Mining and Real Application Testing options
-
ORACLE_HOME = / u01 / app / oracle / product / 11 . 2 . / db_1
-
System name : Linux
-
Node name : db
-
Release : 2 . 6 . 32-431 . el6 . x86_64
-
Version : # 1 SMP Sun Nov 10 22 : 19 : 54 EST 2013
-
Machine : x86_64
-
VM name : VMWare Version : 6
-
Instance name : cams
-
Redo thread mounted by this instance : 1
-
Oracle process number : 293
-
Unix process pid : 12850 , image : oracle@db
-
* * * 2017-09-12 13 : 28 : 28 . 401
-
* * * SESSION ID : ( 1139 . 47765 ) 2017-09-12 13 : 28 : 28 . 401
-
* * * CLIENT ID : ( ) 2017-09-12 13 : 28 : 28 . 401
-
* * * SERVICE NAME : ( SYS$USERS ) 2017-09-12 13 : 28 : 28 . 401
-
* * * MODULE NAME : ( JDBC Thin Client ) 2017-09-12 13 : 28 : 28 . 401
-
* * * ACTION NAME : ( ) 2017-09-12 13 : 28 : 28 . 401
-
-
-
* * * 2017-09-12 13 : 28 : 28 . 401
-
DEADLOCK DETECTED ( ORA-00060 )
-
-
[ Transaction Deadlock ]
-
这里指出死锁不是Oracle自身的错误,是用户的设计造成的问题
-
The following deadlock is not an ORACLE error . It is a
-
deadlock due to user error in the design of an application
-
or from issuing incorrect ad-hoc SQL . The following
-
information may aid in determining the deadlock :
-
-
这个死锁图表明两个session各持有一份资源,等待对方持有的资源
-
Deadlock graph :
-
---------Blocker ( s ) - - - - - - - - ---------Waiter ( s ) - - - - - - - - -
-
Resource Name process session holds waits process session holds waits
-
TX-00160018-00000adc 293 1139 X 384 65 X
-
TX-0004000f-000064d6 384 65 X 293 1139 X
-
-
session 1139 : DID 0001-0125-000074A5 session 65 : DID 0001-0180-000037EC
-
session 65 : DID 0001-0180-000037EC session 1139 : DID 0001-0125-000074A5
-
-
Rows waited on :
-
Session 1139 : obj - rowid = 00021C7B - AAAhx7AAHAAARMrAAB
-
( dictionary objn - 138363 , file - 7 , block - 70443 , slot - 1 )
-
Session 65 : obj - rowid = 00021C7B - AAAhx7AAHAAARMtAAD
-
( dictionary objn - 138363 , file - 7 , block - 70445 , slot - 3 )
-
-
----- Information for the OTHER waiting sessions -----
-
Session 65 :
-
sid : 65 ser : 27869 audsid : 3026187 user : 111 / CAMS_CORE
-
flags : ( 0x45 ) USR / - flags_idl : ( 0x1 ) BSY / - / - / - / - / -
-
flags2 : ( 0x40009 ) - / - / INC
-
pid : 384 O / S info : user : oracle , term : UNKNOWN , ospid : 12854
-
image : oracle@db
-
client details :
-
O / S info : user : cams , term : unknown , ospid : 1234
-
machine : yy program : JDBC Thin Client
-
application name : JDBC Thin Client , hash value = 2546894660
-
current SQL :
-
导致死锁的SQL语句1
-
select
-
-
SEQ_NO , ACO_AC , AC_DTE , CCY , BAL , FRZ_AMT ,
-
CRE_TLR , CRE_DTE ,
-
CRE_BR , CRE_BK , UPD_TLR , UPD_DTE , UPD_BR , UPD_BK , TSL_AMT , VPS_BAL
-
-
FROM CD_ACBALANCE
-
WHERE ACO_AC = : 1
-
FOR UPDATE
-
-
----- End of information for the OTHER waiting sessions -----
-
-
Information for THIS session :
-
-
----- Current SQL Statement for this session ( sql_id = cfy88pmyts0fn ) -----
-
导致死锁的SQL语句2
-
select
-
-
SEQ_NO , ACO_AC , AC_DTE , CCY , BAL , FRZ_AMT ,
-
CRE_TLR , CRE_DTE ,
-
CRE_BR , CRE_BK , UPD_TLR , UPD_DTE , UPD_BR , UPD_BK , TSL_AMT , VPS_BAL
-
-
FROM CD_ACBALANCE
-
WHERE ACO_AC = : 1
-
FOR UPDATE
-
= = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
这里可以明确的是 select for update语句造成了死锁。但是考虑到环境为测试环境,没开启归档,不能通过logminer挖掘故障期间的SQL事物,而且开发人员已经通过kill session的方式处理了死锁,所以需要在别的环境中重现该问题,并进行分析。
重现死锁问题
首先连上数据库,启用HR用户
-
[ oracle@prod ~ ] $ sqlplus / as sysdba
-
SQL * Plus : Release 11 . 2 . . 4 . 0 Production on Wed Sep 13 03 : 14 : 17 2017
-
Copyright ( c ) 1982 , 2013 , Oracle . All rights reserved .
-
Connected to an idle instance .
-
SYS@PROD > startup ;
-
ORACLE instance started .
-
Total System Global Area 1202556928 bytes
-
Fixed Size 2252704 bytes
-
Variable Size 771752032 bytes
-
Database Buffers 419430400 bytes
-
Redo Buffers 9121792 bytes
-
Database mounted .
-
Database opened .
-
SYS@PROD > conn hr / hr
-
ERROR :
-
ORA-28000 : the account is locked
-
Warning : You are no longer connected to ORACLE .
-
@ > conn / as sysdba
-
Connected .
-
SYS@PROD > alter user hr identified by hr account unlock ;
-
User altered .
-
SYS@PROD > conn hr / hr
-
Connected .
选中COUNTRIES表作为测试对象
-
HR@PROD > desc countries ;
-
Name Null ? Type
-
----------------------------------------- -------- ----------------------------
-
COUNTRY_ID NOT NULL CHAR ( 2 )
-
COUNTRY_NAME VARCHAR2 ( 40 )
-
REGION_ID NUMBER
-
HR@PROD > select count ( * ) from countries ;
-
COUNT ( * )
-
----------
-
25
-
HR@PROD > select count ( distinct ( COUNTRY_ID ) ) from countries ;
-
COUNT ( DISTINCT ( COUNTRY_ID ) )
-
---------------------------
-
25
经过验证,COUNTRIES表中的COUNTRY_ID字段是唯一的。
用COUNTRIES表新建一个test表,在test表中做测试
-
HR@PROD > create table test as select * from countries ;
-
Table created .
-
HR@PROD > desc test
-
Name Null ? Type
-
----------------------------------------- -------- ----------------------------
-
COUNTRY_ID NOT NULL CHAR ( 2 )
-
COUNTRY_NAME VARCHAR2 ( 40 )
-
REGION_ID NUMBER
-
HR@PROD > select country_id , country_name , region_id from countries ;
-
CO COUNTRY_NAME REGION_ID
-
-- ---------------------------------------- ----------
-
AR Argentina 2
-
AU Australia 3
-
BE Belgium 1
-
BR Brazil 2
-
CA Canada 2
-
CH Switzerland 1
-
CN China 3
-
DE Germany 1
-
DK Denmark 1
-
EG Egypt 4
-
FR France 1
-
CO COUNTRY_NAME REGION_ID
-
-- ---------------------------------------- ----------
-
IL Israel 4
-
IN India 3
-
IT Italy 1
-
JP Japan 3
-
KW Kuwait 4
-
ML Malaysia 3
-
MX Mexico 2
-
NG Nigeria 4
-
NL Netherlands 1
-
SG Singapore 3
-
UK United Kingdom 1
-
CO COUNTRY_NAME REGION_ID
-
-- ---------------------------------------- ----------
-
US United States of America 2
-
ZM Zambia 4
-
ZW Zimbabwe 4
-
25 rows selected .
情形1:
两个select for update语句想要同时锁定一条语句。
Session1:
-
HR@PROD > select country_id , country_name , region_id from test where country_id = 'AU' for update ;
-
CO COUNTRY_NAME REGION_ID
-
-- ---------------------------------------- ----------
-
AU Australia 3
Sessions2:
-
HR@PROD > select country_id , country_name , region_id from test where country_id = 'AU' for update ;
这时,session2会一直处于锁等待状态,而不会出现死锁。当把第一个事物commit或者rollback之后,第二个事物会继续执行。
Session1:
-
HR@PROD > rollback ;
-
Rollback complete .
Session2:
-
HR@PROD > select country_id , country_name , region_id from test where country_id = 'AU' for update ;
-
CO COUNTRY_NAME REGION_ID
-
-- ---------------------------------------- ----------
-
AU Australia
显然,两个select for update语句想要同时锁定一条语句,并不会出现死锁,而会出现锁等待的现象。
情形2:
第一个select for update事物锁定A row后,想要再锁定B row;第二个select for update事物锁定B row后,想要锁定A row。
Session1:
-
HR@PROD > select country_id , country_name , region_id from test where country_id = 'AU' for update ;
-
CO COUNTRY_NAME REGION_ID
-
-- ---------------------------------------- ----------
-
AU Australia 3
Session 2:
-
HR@PROD > select country_id , country_name , region_id from test where country_id = 'UK' for update ;
-
CO COUNTRY_NAME REGION_ID
-
-- ---------------------------------------- ----------
-
UK United Kingdom 1
如果Session1想要继续持有Session2正在持有的row:
Session1:
-
HR@PROD > select country_id , country_name , region_id from test where country_id = 'UK' for update ;
这时,会出现锁等待的现场,和上一场景类似。如果Session2也想持有Session1正在持有的row:
Session2:
-
HR@PROD > select country_id , country_name , region_id from test where country_id = 'AU' for update ;
这时,Session2还处于等待状态,但是Session1出现死锁:
Session1:
-
HR@PROD > select country_id , country_name , region_id from test where country_id = 'UK' for update ;
-
select country_id , country_name , region_id from test where country_id = 'UK' for update
-
*
-
ERROR at line 1 :
-
ORA-00060 : deadlock detected while waiting for resource
这时,死锁的现象已经重现,而且,场景2从头到尾只使用了一个select for update语句,只是换了参数而已。
定位死锁语句
通过检查数据库表,能够检查出是哪一条语句被死锁,产生死锁的机器是哪一台。
1)用dba用户执行以下语句
-
select c . owner , c . object_name , c . object_type , b . sid ,
-
b . serial# , b . lockwait , b . status , b . osuser , b . machine , b . process , b . program
-
from v$locked_object a ,
-
v$ session b ,
-
dba_objects c
-
where b . sid = a . session_id
-
and a . object_id = c . object_id ;
如果有输出的结果,则说明有死锁,且能看到死锁的机器是哪一台。字段说明:
OWNER:死锁语句所用的数据库用户,这里是HR用户。
OBJECT_NAME:产生死锁的对象,这里是TEST表。
OBJECT_TYPE:产生死锁的对象类型,这里是TABLE。
SID:SESSION标识,常用于连接 其它 列 。
SERIAL#: SID有可能会重复,当两个session的SID重复时,SERIAL#用来区别session
LOCKWAIT: 可以通过这个字段查询出当前正在等待的锁的相关信息 ,如果有内容表示被死锁或者有锁等待事件。
STATUS: 用来判断session状态。 ACTIVE :正执行 SQL语句 。 INACTIVE :等待操作。 KILLED :被标注为杀死。
OSUSER:客户端操作系统用户名。
MACHINE:客户端操作系统的机器名。
PROCESS:客户端进程的ID。
PROGRAM:客户端执行的应用程序。
2)用dba用户执行以下语句,可以查看到被死锁的语句。
-
select sql_text from v$sql where ( address , hash_value ) in
-
( select sql_address , sql_hash_value from v$ session where sid in
-
( select session_id from v$locked_object ) ) ;
或者
-
select sql_text from v$sql where ( address , hash_value ) in
-
( select sql_address , sql_hash_value from v$ session where lockwait is not null ) ;
这里查出来出现死锁的语句和之前的测试结果一致。
3) 分析trace日志文件
死锁发生时,可以再trace日志文件中找到如下语句:
-
ORA-00060 : Deadlock detected . More info in file / u01 / app / oracle / diag / rdbms / prod / PROD / trace / PROD_ora_4852 . trc .
打开trace文件,可以找到死锁的语句:
-
----- Information for the OTHER waiting sessions -----
-
Session 31 :
-
sid : 31 ser : 25 audsid : 110280 user : 84 / HR
-
flags : ( 0x45 ) USR / - flags_idl : ( 0x1 ) BSY / - / - / - / - / -
-
flags2 : ( 0x40009 ) - / - / INC
-
pid : 23 O / S info : user : oracle , term : UNKNOWN , ospid : 4853
-
image : oracle@prod ( TNS V1-V3 )
-
client details :
-
O / S info : user : oracle , term : pts / 1 , ospid : 3149
-
machine : prod program : sqlplus@prod ( TNS V1-V3 )
-
application name : SQL * Plus , hash value = 3669949024
-
current SQL :
-
select country_id , country_name , region_id from test where country_id = 'AU' for update
-
-
----- End of information for the OTHER waiting sessions -----
-
-
Information for THIS session :
-
-
----- Current SQL Statement for this session ( sql_id = 6n5kjs2twrwwq ) -----
-
select country_id , country_name , region_id from test where country_id = 'UK' for update
-
= = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
解决死锁问题
方法一(杀进程)
1)查找死锁的进程:
-
select sid , serial# , username , command , lockwait , osuser from v$ session where lockwait is not null ;
2)kill掉这个死锁的进程:
-
alter system kill session 'sid, serial#' ;
这里执行语句为:alter system kill session '29,69';
原先造成死锁的进程被killed。
3) 如果还不能解决,使用杀系统进程的方式处理:
这里为了测试,再次重现了死锁,并使用杀进程方式进行处理。
-
select p . spid from v$ session s , v$process p where s . sid = XXX and s . paddr = p . addr ;
其中,XXX使用第一步查出来的SID参数替代
这里查出来的进程号为:4257
-
[ oracle@prod ~ ] $ ps -ef | grep 4257
-
oracle 4257 3149 0 07 : 08 ? 00 : 00 : 00 oraclePROD ( DESCRIPTION = ( LOCAL = YES ) ( ADDRESS = ( PROTOCOL = beq ) ) )
-
oracle 4321 4000 0 07 : 21 pts / 4 00 : 00 : 00 grep 4257
-
[ oracle@prod ~ ] $ kill -9 4257
-
[ oracle@prod ~ ] $ ps -ef | grep 4257
-
oracle 4327 4000 0 07 : 21 pts / 4 00 : 00 : 00 grep 4257
杀完进程之后,造成死锁的进程被杀死
查不到死锁进程
方法二(重启库)
-
SYS@PROD > shutdown immediate ;
-
Database closed .
-
Database dismounted .
-
ORACLE instance shut down .
-
SYS@PROD > startup ;
-
ORACLE instance started .
-
Total System Global Area 1202556928 bytes
-
Fixed Size 2252704 bytes
-
Variable Size 771752032 bytes
-
Database Buffers 419430400 bytes
-
Redo Buffers 9121792 bytes
-
Database mounted .
-
Database opened .
查看死锁进程:
方法三(commit || rollback)
Session1:
-
HR@PROD > select country_id , country_name , region_id from test where country_id = 'AU' for update ;
-
CO COUNTRY_NAME REGION_ID
-
-- ---------------------------------------- ----------
-
AU Australia 3
Session2:
-
HR@PROD > select country_id , country_name , region_id from test where country_id = 'UK' for update ;
-
CO COUNTRY_NAME REGION_ID
-
-- ---------------------------------------- ----------
-
UK United Kingdom 1
Session1:
-
HR@PROD > select country_id , country_name , region_id from test where country_id = 'UK' for update ;
-
Waiting……
Session2:
-
HR@PROD > select country_id , country_name , region_id from test where country_id = 'AU' for update ;
-
Waiting……
Session1:
-
HR@PROD > select country_id , country_name , region_id from test where country_id = 'UK' for update ;
-
select country_id , country_name , region_id from test where country_id = 'UK' for update
-
*
-
ERROR at line 1 :
-
ORA-00060 : deadlock detected while waiting for resource
这里Session1出现死锁,只要执行commit或者rollback就可以解除死锁,只不过事务中第一个SQL执行成功,第二个SQL执行失败。
Session1:
-
HR@PROD > commit ;
-
Commit complete .
Session2:
-
HR@PROD > select country_id , country_name , region_id from test where country_id = 'AU' for update ;
-
CO COUNTRY_NAME REGION_ID
-
-- ---------------------------------------- ----------
-
AU Australia 3
此时,死锁状态解除:
问题总结
最后,通过与开发人员交流,得出的结论是出现该问题的原因不是程序设计的问题,而是在开发环境中,有人通过debug模式连到服务器上进行代码调试,有人通过客户端的形式访问服务器上的应用,当两者同时调试某一功能时(通过select for update的语句进行一张表中的数据访问),在特殊的情况下,出现了死锁的情况。
当然,在开发环境中比较容易出现这类情况,如果需要避免,就需要使用该开发环境的开发人员与测试人员做好沟通了。对于容易出现锁表的功能,可以要求测试人员在测试环境中测试,除特殊情况,尽量减少在开发环境中测试的次数,以免锁表影响开发测试进度。
来自 “ ITPUB博客 ” ,链接:http://blog.itpub.net/31394774/viewspace-2144941/,如需转载,请注明出处,否则将追究法律责任。
转载于:http://blog.itpub.net/31394774/viewspace-2144941/