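Operating system: Red Hat Enterprise Linux Server release 5.4 i686
Database version: Oracle 10.2.0.1
On a server in the test environment, sqlplus stopped responding. It was not only sqlplus: lsnrctl status hung as well, and neither the alert logs nor the operating system logs showed any errors. The developers reported that this problem had recurred every three to four months since the system went into use.
Since it was impossible to log in to the database with sqlplus (the -prelim option did not help either), I used strace to see what the sqlplus command was doing. Most of the output is omitted here. Running strace sqlplus showed the process calling times() over and over with a NULL argument. The return value was a large negative number that increased slowly, and the process never left the loop.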
times(NULL)= -2146663723
times(NULL)= -2146663723
times(NULL)= -2146663723
times(NULL)= -2146663723
times(NULL)= -2146663723
times(NULL)= -2146663723
times(NULL)= -2146663723
times(NULL)= -2146663718
times(NULL)= -2146663718
times(NULL)= -2146663718
times(NULL)= -2146663718
times(NULL)= -2146663718
times(NULL)= -2146663718
times(NULL)= -2146663718
times(NULL)= -2146663718
times(NULL)= -2146663718
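A trace like this can run to thousands of identical lines, so it can help to summarize it rather than read it. The following is a minimal sketch, not part of the original diagnosis: the function name summarize_times_calls is hypothetical, and it assumes the strace output has been saved as text lines in the format shown above.

```python
import re

# Matches lines such as "times(NULL)= -2146663723" or "times(NULL) = -1825782405".
TIMES_RE = re.compile(r"^times\(NULL\)\s*=\s*(-?\d+)\s*$")

def summarize_times_calls(lines):
    """Return (count, first_value, last_value) for the trailing run of
    consecutive times(NULL) calls; (0, None, None) if the trace does not
    end in such a run."""
    values = []
    for line in lines:
        match = TIMES_RE.match(line.strip())
        if match:
            values.append(int(match.group(1)))
        else:
            values = []  # any other line breaks the run
    if not values:
        return (0, None, None)
    return (len(values), values[0], values[-1])

# A few lines copied from the trace above.
sample = [
    "times(NULL)= -2146663723",
    "times(NULL)= -2146663723",
    "times(NULL)= -2146663718",
    "times(NULL)= -2146663718",
]
count, first, last = summarize_times_calls(sample)
print(count, first, last, last - first)
```

A long trailing run whose values are negative and creep upward by a few ticks at a time is the pattern described above.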
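This looked like a bug around the function call, so I searched MOS for related material:
The note "SQL*Plus 10.2.0.1 Hangs, When System Uptime Is Long Period of Time [ID 338461.1]" matches these symptoms.
I then checked how long the system had been up: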
[oracle@localhost lib]$ date
Mon Feb 13 15:07:21 CST 2012
[oracle@localhost lib]$ uptime
 15:07:29 up 198 days, 23:05,  5 users,  load average: 16.02, 16.11, 15.91
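The 248-day figure in the bug title can be checked with simple arithmetic. The sketch below assumes a signed 32-bit tick counter and 100 ticks per second (the usual sysconf(_SC_CLK_TCK) value on Linux). The start_ticks parameter is hypothetical: it models a counter that does not start at zero at boot, which would be one possible explanation for hangs at uptimes shorter than 248 days, such as the 198 days seen here. Neither function is taken from Oracle or the MOS note.

```python
SECONDS_PER_DAY = 86400

def days_until_signed_wrap(ticks_per_second=100, start_ticks=0, bits=32):
    """Days of uptime before a signed counter of the given width turns negative.
    start_ticks is a hypothetical initial counter value at boot."""
    limit = 2 ** (bits - 1)
    return (limit - start_ticks) / ticks_per_second / SECONDS_PER_DAY

def as_unsigned(value, bits=32):
    """Reinterpret a signed return value as the unsigned tick count it wrapped from."""
    return value % (2 ** bits)

print(round(days_until_signed_wrap(), 2))   # 248.55
print(as_unsigned(-2146663723))             # 2148303573, just past 2**31
```

Read this way, the negative value in the strace output is a tick count that has only recently crossed the signed 32-bit limit, which is consistent with the hang appearing suddenly after months of normal operation.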
Solutions:
(1) Temporary workaround: reboot the machine, after which the problem disappears.
(2) Apply one-off Patch 4612267.
(3) Upgrade the database to 10.2.0.2 or later.
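When several installations need to be checked, the version rule above can be written down as a small helper. This is only a sketch: parse_version and is_affected are hypothetical names, the input is assumed to be a dotted version string such as the one printed in the SQL*Plus banner, and the check cannot tell whether one-off Patch 4612267 has already been applied to a 10.2.0.1 home.

```python
def parse_version(text):
    """Split a dotted version string such as '10.2.0.1.0' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))

def is_affected(version_text):
    """True when the first four components are exactly 10.2.0.1, the only
    release the MOS note lists as affected."""
    return parse_version(version_text)[:4] == (10, 2, 0, 1)

for candidate in ["10.2.0.1.0", "10.2.0.2.0", "10.1.0.4"]:
    print(candidate, is_affected(candidate))
```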
The relevant note is attached below:
SQL*Plus 10.2.0.1 Hangs, When System Uptime Is Long Period of Time [ID 338461.1]
Applies to:
SQL*Plus - Version: 10.2.0.1 to 10.2.0.1 - Release: Oracle10g to Oracle10g
Generic UNIX
Linux x86
HP-UX PA-RISC (64-bit)
IBM AIX on POWER Systems (64-bit)
Linux x86-64
Checked for relevance on 18-Apr-2010
Symptoms
Trying to invoke SQL*Plus hangs when machine uptime has been for a long period of time. There have been cases where problem occurs when uptime reaches 60 days and others as long as 248 days.
Database connection is not relevant as the 'sqlplus' executable itself hangs with or without connection information. It hangs regardless of what parameters are passed in. For example:
sqlplus -V
However, the hang does not reproduce with Instant Client version 10.1.0.4
Generating a stack trace using gdb debugger shows:
#0 0x0048a0dd in times () from /lib/tls/libc.so.6
#1 0x01884599 in sltrgatime64 () from ./libclntsh.so.10.1
#2 0x0137d70f in kghinp () from ./libclntsh.so.10.1
#3 0x00f86b47 in kpuinit0 () from ./libclntsh.so.10.1
#4 0x00f85e7a in kpuenvcr () from ./libclntsh.so.10.1
#5 0x01051b5e in OCIEnvCreate () from ./libclntsh.so.10.1
#6 0x0099787e in afidrv () from ./libsqlplus.so
#7 0x009702b9 in safimTerminate () from ./libsqlplus.so
#8 0x0098c246 in afidrv () from ./libsqlplus.so
#9 0x080486f6 in main ()
Running STRACE tool shows:
$ strace /oracle/home/bin/sqlplus -V 2>&1 |less
......
old_mmap(NULL, 385024, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x41794000
gettimeofday({1122996561, 411035}, NULL) = 0
access("/usr/local/UD/conf/sqlnet.ora", F_OK) = -1 ENOENT (No such file or directory)
access("/usr/local/UD/lib/oracle/network/admin/sqlnet.ora", F_OK) = -1 ENOENT (No such file or directory)
access("/usr/local/UD/conf/sqlnet.ora", F_OK) = -1 ENOENT (No such file or directory)
access("/usr/local/UD/lib/oracle/network/admin/sqlnet.ora", F_OK) = -1 ENOENT (No such file or directory)
fcntl64(-1218313656, F_SETFD, FD_CLOEXEC) = -1 EBADF (Bad file descriptor)
times(NULL) = -1825782405
times(NULL) = -1825782405
times(NULL) = -1825782405
times(NULL) = -1825782405
times(NULL) = -1825782405
times(NULL) = -1825782405
times(NULL) = -1825782405
It is looping on the times() function.
In addition to sqlplus, it has been reported that the netca and dbca tools also hang.
Changes
This may happen with a new installation of Instant Client 10.2.0.1.0 or Oracle 10.2.0.1.0 on UNIX platform, or it can just occur after some period of time with no other changes.
Cause
This is a known, unpublished bug.
BUG 4612267 - OCI CLIENT IS IN AN INFINITE LOOP WHEN MACHINE UPTIME HITS 248 DAYS
Solution
Select one of the following two solutions:
1) Apply one-off patch available for 10.2.0.1.
a. Download one-off patch off Metalink:
Patch 4612267
Description OCI CLIENT IS IN AN INFINITE LOOP WHEN MACHINE UPTIME HITS 248 DAYS
Product CORE
Release Oracle 10.2.0.1
b. To apply patch on Instant Client install, please follow instructions documented in the OCI manual.
You can find this in:
under "Patching Instant Client Shared Libraries on Linux or UNIX".
2) Apply Patchset 10.2.0.2 or higher.
According to unpublished BUG 4612267, this bug is fixed in version 11, and backported to 10.2.0.2 patchset.