Oracle Version: 10.2.0.1.0
SQL*Plus - Version: 10.2.0.1.0
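This bug can cause every connection initiated from the local host to the Oracle database server to hang.
Symptom:
After switching to the oracle user, running sqlplus produces no output and no error; it simply hangs.
[root@localhost~]# su - oracle
[oracle@localhost ~]$ sqlplus    # hangs here; no response no matter how long you wait
Other commands run normally under the oracle user.
sar shows the following while sqlplus is hung: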
Linux 2.6.9-22.ELsmp (economy2)
09:43:13 PM CPU %user %nice %system %iowait %idle
09:43:15 PM all 46.77 0.00 53.23 0.00 0.00
09:43:17 PM all 47.00 0.00 53.00 0.00 0.00
09:43:19 PM all 43.20 0.00 54.37 2.43 0.00
Average: all 45.63 0.00 53.54 0.82 0.00
After killing sqlplus,
sar 2 4 shows the system state:
09:44:15 PM CPU %user %nice %system %iowait %idle
09:44:17 PM all 0.00 0.00 0.00 0.00 0.00
09:44:19 PM all 0.00 0.00 0.00 0.00 0.00
09:44:21 PM all 0.00 0.00 0.00 100.00 0.00
09:44:23 PM all 0.00 0.00 0.00 0.00 0.00
Average: all 0.00 0.00 0.00 100.00 0.00
The symptom appears whenever uptime reaches roughly 200 days:
[oracle@xxxxx ~]$uptime
16:48:26 up 200 days, 2:20, 2 users, load average: 0.00, 0.03, 0.10
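To check uptime from a script instead of reading the uptime banner, the value can be taken from /proc/uptime. This is a minimal sketch: the function name uptime_days is made up for illustration, and it assumes a Linux host where the first field of /proc/uptime is the number of seconds since boot.

```shell
# Print the whole number of days the host has been up.
# $1 is optional: a file in /proc/uptime format (defaults to /proc/uptime).
uptime_days() {
    # First field is seconds since boot, with a fractional part.
    secs=$(cut -d' ' -f1 "${1:-/proc/uptime}")
    # Drop the fraction and convert seconds to days.
    echo $(( ${secs%.*} / 86400 ))
}

# Example: uptime_days    # prints 200 on the host shown above
```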
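The problem has essentially been traced to a bug in the Oracle client software.
Affected version: Oracle 10.2.0.1.0
Once uptime exceeds 50 days, running SQL*Plus may hang; the root cause is a time overflow. In fact, the bug can be triggered whenever the uptime of a Linux x86 host is close to a multiple of 24.8 days (this system had been up 198 days, about 8 x 24.8). The client then calls times() in an endless loop, as the strace output below shows with times(NULL) returning the same negative value over and over, and that loop consumes all the CPU.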
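The overflow can be illustrated with plain integer arithmetic. The sketch below is not taken from Oracle's code; the helper names to_int32 and overflow_period_tenths are hypothetical. It assumes that the 24.8-day figure corresponds to 2^31 milliseconds (the source does not say so explicitly) and shows how a counter above 2^31-1 reads back as a negative number once it is stored in a signed 32-bit type.

```shell
# Reinterpret a non-negative integer as a signed 32-bit value (two's complement).
to_int32() {
    v=$(( $1 & 4294967295 ))        # keep the low 32 bits
    if [ "$v" -ge 2147483648 ]; then
        v=$(( v - 4294967296 ))     # values with the top bit set are negative
    fi
    echo "$v"
}

# 2^31 milliseconds expressed in tenths of a day, using integer division.
overflow_period_tenths() {
    echo $(( 2147483648 * 10 / 86400000 ))
}

# to_int32 2469184891      prints -1825782405, the value seen in the strace output
# overflow_period_tenths   prints 248, i.e. about 24.8 days
```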
Running STRACE tool shows:
$ strace /oracle/home/bin/sqlplus -V 2>&1 |less
......
old_mmap(NULL, 385024, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x41794000
gettimeofday({1122996561, 411035}, NULL) = 0
access("/usr/local/UD/conf/sqlnet.ora", F_OK) = -1 ENOENT (No such file or directory)
access("/usr/local/UD/lib/oracle/network/admin/sqlnet.ora", F_OK) = -1 ENOENT (No such file or directory)
access("/usr/local/UD/conf/sqlnet.ora", F_OK) = -1 ENOENT (No such file or directory)
access("/usr/local/UD/lib/oracle/network/admin/sqlnet.ora", F_OK) = -1 ENOENT (No such file or directory)
fcntl64(-1218313656, F_SETFD, FD_CLOEXEC) = -1 EBADF (Bad file descriptor)
It is looping on the times() function (stuck in an infinite loop):
times(NULL) = -1825782405
times(NULL) = -1825782405
times(NULL) = -1825782405
times(NULL) = -1825782405
times(NULL) = -1825782405
times(NULL) = -1825782405
times(NULL) = -1825782405
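A quick way to confirm the same loop on another host is to save a few seconds of strace output to a file and count the times() calls in it. This is only a sketch: the function name count_times_calls and the trace file path are assumptions, and the trace is expected in the plain format shown above, with one system call per line.

```shell
# Count the times() calls recorded in a saved strace log.
# $1 = path to the trace file (for example one written with strace -o).
count_times_calls() {
    grep -c '^times(' "$1"
}

# Example (hypothetical file name):
#   strace -o /tmp/sqlplus.trace sqlplus -V    # interrupt with Ctrl-C after a few seconds
#   count_times_calls /tmp/sqlplus.trace       # a very large count points at the spin
```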
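Solutions:
1. Reboot the system.
2. Apply the one-off interim patch 4612267 for this bug. Patch download: http://download.csdn.net/detail/payton_liu/8033537
3. Upgrade the Oracle client to 10.2.0.2.0 (Oracle has announced that this version fixes the problem).
4. Downgrade to 9i, which does not have this problem, or downgrade to 10.1.0.4 (not fully tested).
The first option does not really fix the problem, since it comes back later. The third and fourth options take a long time and require the database to be down for a long while, which does not suit the current production environment.
I went with the second option and applied the patch. According to Oracle's official documentation, the problem is already fixed in Oracle 11.
The installation and verification steps for patch 4612267 follow.
First stop the listener, dbconsole, and the database: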
$ lsnrctl stop
$ emctl stop dbconsole
$ sqlplus / as sysdba
SQL> shutdown immediate
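Note: dbconsole only needs to be stopped if Oracle EM is installed; if it is not installed, skip that step.
Install the patch, then verify it with opatch lsinventory: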
$ $ORACLE_HOME/OPatch/opatch lsinventory
Invoking OPatch 10.2.0.1.0
Oracle interim Patch Installer version 10.2.0.1.0
Copyright (c) 2005, Oracle Corporation. All rights reserved..
Oracle Home : /u01/app/oracle/product/10.2.0/db_1
Central Inventory : /u01/app/oracle/oraInventory
from : /u01/app/oracle/product/10.2.0/db_1/oraInst.loc
OPatch version : 10.2.0.1.0
OUI version : 10.2.0.1.0
OUI location : /u01/app/oracle/product/10.2.0/db_1/oui
Log file location : /u01/app/oracle/product/10.2.0/db_1/cfgtoollogs/opatch/opatch-2009_Jan_13_11-06-27-HKT_Tue.log
Lsinventory Output file location : /u01/app/oracle/product/10.2.0/db_1/cfgtoollogs/opatch/lsinv/lsinventory-2009_Jan_13_11-06-27-HKT_Tue.txt
--------------------------------------------------------------------------------
Installed Top-level Products (2):
Oracle Database 10g 10.2.0.1.0
Oracle Database 10g Products 10.2.0.1.0
There are 2 products installed in this Oracle Home.
Interim patches (1) :
Patch 4612267 : applied on Tue Jan 13 11:05:10 HKT 2009
Created on 5 Oct 2005, 13:48:00 hrs US/Pacific
Bugs fixed: 4612267
---------------------------------------------------------------
OPatch succeeded.
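The transcript above shows only the inventory listing. The apply step would normally be opatch apply, run from the unpacked patch directory; the directory used in the comment below is taken from the rollback example and is an assumption about where the patch was unzipped. The helper patch_listed is a hypothetical name; it simply checks saved lsinventory output for a given patch number.

```shell
# Apply step (assumed, run as the oracle user with the database stopped):
#   cd "$ORACLE_BASE/patches/4612267"
#   "$ORACLE_HOME/OPatch/opatch" apply

# Return success if patch number $1 is listed in the lsinventory output saved in file $2.
patch_listed() {
    grep -Eq "^[[:space:]]*Patch[[:space:]]+$1[[:space:]]*:" "$2"
}

# Example:
#   "$ORACLE_HOME/OPatch/opatch" lsinventory > /tmp/lsinv.txt
#   patch_listed 4612267 /tmp/lsinv.txt && echo "patch 4612267 is installed"
```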
Start the database, the listener, and dbconsole:
$ sqlplus / as sysdba
SQL> startup
$ lsnrctl start
$ emctl start dbconsole
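After the restart, it can be worth confirming that sqlplus returns instead of hanging. The sketch below assumes the coreutils timeout command is available; responds_within is a hypothetical helper name. Because the hang only shows up at certain uptimes, a passing check right after patching is a sanity test, not proof that the bug is gone.

```shell
# Run a command with a time limit; succeed unless it had to be killed for running too long.
# $1 = limit in seconds, remaining arguments = the command to run.
responds_within() {
    limit=$1
    shift
    status=0
    timeout "$limit" "$@" >/dev/null 2>&1 || status=$?
    # timeout exits with status 124 when the limit was reached.
    [ "$status" -ne 124 ]
}

# Example:
#   responds_within 10 sqlplus -V || echo "sqlplus did not return within 10 seconds"
```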
If necessary, the patch can also be removed. Stop the database before rolling it back:
$ cd $ORACLE_BASE/patches/4612267
$ $ORACLE_HOME/OPatch/opatch rollback -id 4612267
Invoking OPatch 10.2.0.1.0
...
Please shutdown Oracle instances running out of this ORACLE_HOME on the local system.
(Oracle Home = '/u01/app/oracle/product/10.2.0/db_1')
Is the local system ready for patching?
Do you want to proceed? [y|n] y (enter y here)
User Responded with: Y
...
RollbackSession removing interim patch '4612267' from inventory
The local system has been patched and can be restarted.
OPatch succeeded.
Running the verification command from above (opatch lsinventory) again now shows that the patch has been removed.