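Today I found one of our databases down. The file system holding it was full, and a closer look showed that the bdump directory had filled up with trace files. So why were so many trace files being generated?
The alert log contained a large number of errors like the following: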
Thu Jun 20 19:12:32 2013
Errors in file /soft/oracle/admin/zhs10g/bdump/zhs10g_ora_2972.trc:
ORA-07445: exception encountered: core dump [kslgetl()+120] [SIGSEGV] [Address not mapped to object] [0x000000208] [] []
ORA-00108: failed to set up dispatcher to accept connection asynchronously
Thu Jun 20 19:12:35 2013
found dead dispatcher 'D000', pid = (13, 28)
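This error first appears during database startup, before the database is even mounted. The database still opens normally, but once it is open the error recurs in large numbers, each occurrence writing a trace file, until the file system fills up and the database goes down.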
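To gauge how quickly the dump directory is filling, a rough sketch like the one below counts the `.trc` files and sums their size. The function name `summarize_trace_files` is my own invention, and it only reports; it does not delete anything.

```python
import os


def summarize_trace_files(dump_dir):
    """Return (file_count, total_bytes) for *.trc files directly in dump_dir."""
    count = 0
    total_bytes = 0
    with os.scandir(dump_dir) as entries:
        for entry in entries:
            # Only regular files ending in .trc; subdirectories are ignored.
            if entry.is_file() and entry.name.endswith(".trc"):
                count += 1
                total_bytes += entry.stat().st_size
    return count, total_bytes
```

On this system it would be called as `summarize_trace_files("/soft/oracle/admin/zhs10g/bdump")`, using the bdump path shown in the alert log.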
[oracle@zhs10g bdump]$ more /soft/oracle/admin/zhs10g/bdump/zhs10g_ora_2972.trc
/soft/oracle/admin/zhs10g/bdump/zhs10g_ora_2972.trc
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
ORACLE_HOME = /soft/oracle/product/10.2.0.4/dbh
System name: Linux
Node name: dctest2
Release: 2.6.18-238.el5
Version: #1 SMP Sun Dec 19 14:22:44 EST 2010
Machine: x86_64
Instance name: zhs10g
Redo thread mounted by this instance: 1
Oracle process number: 13
Unix process pid: 2972, image: oracle@dctest2 (D000)
Warning: keltnfy call to ldmInit failed with error 46
*** 2013-06-20 19:12:32.966
network error encountered getting listening address:
NS Primary Error: TNS-12533: TNS:illegal ADDRESS parameters
NS Secondary Error: TNS-12560: TNS:protocol adapter error
NT Generic Error: TNS-00503: Illegal ADDRESS parameters
OPIRIP: Uncaught error 108. Error stack:
ORA-00108: failed to set up dispatcher to accept connection asynchronously
Exception signal: 11 (SIGSEGV), code: 1 (Address not mapped to object), addr: 0x208, PC: [0x7a06b8, kslgetl()+120]
*** 2013-06-20 19:12:32.972
ksedmp: internal or fatal error
ORA-07445: exception encountered: core dump [kslgetl()+120] [SIGSEGV] [Address not mapped to object] [0x000000208] [] []
ORA-00108: failed to set up dispatcher to accept connection asynchronously
Current SQL information unavailable - no session.
----- Call Stack Trace -----
calling              call     entry                argument values in hex
location             type     point                (? means dubious value)
-------------------- -------- -------------------- ----------------------------
ksedst()+31          call     ksedst1()            000000000 ? 000000001 ?
                                                   2B1698BF4D50 ? 2B1698BF4DB0 ?
                                                   2B1698BF4CF0 ? 000000000 ?
ksedmp()+610         call     ksedst()             000000000 ? 000000001 ?
                                                   2B1698BF4D50 ? 2B1698BF4DB0 ?
                                                   2B1698BF4CF0 ? 000000000 ?
ssexhd()+629         call     ksedmp()             000000003 ? 000000001 ?
                                                   2B1698BF4D50 ? 2B1698BF4DB0 ?
                                                   2B1698BF4CF0 ? 000000000 ?
__restore_rt()+0     call     ssexhd()             00000000B ? 2B1698BF5D70 ?
                                                   2B1698BF5C40 ? 2B1698BF4DB0 ?
                                                   2B1698BF4CF0 ? 000000000 ?
kslgetl()+120        signal   __restore_rt()       0600E7720 ? 0000000E8 ?
                                                   071668860 ? 0000009A9 ?
                                                   000003980 ? 071668878 ?
ksfglt()+108         call     kslgetl()            0600E7720 ? 000000001 ?
                                                   071668860 ? 0000009A9 ?
                                                   000003980 ? 071668878 ?
kghfre()+2238        call     ksfglt()             0600E7720 ? 0600E7720 ?
                                                   000000000 ? 071668860 ?
                                                   0000009A9 ? 071668878 ?
kmnsbf()+96          call     kghfre()             0068966E0 ? 060036468 ?
                                                   7FFF98428300 ? 000012000 ?
                                                   005609138 ? 071668878 ?
nsbfr()+311          call     kmnsbf()             0068966E0 ? 060036468 ?
                                                   7FFF98428300 ? 000012000 ?
                                                   005609138 ? 071668878 ?
nsiofrrg()+478       call     nsbfr()              0068966E0 ? 011D65030 ?
                                                   7FFF98428300 ? 000012000 ?
                                                   005609138 ? 071668878 ?
nsiocancel()+198     call     nsiofrrg()           011D64250 ? 011D64700 ?
                                                   000000000 ? 000012000 ?
                                                   005609138 ? 071668878 ?
nsopen_shutitdown()  call     nsiocancel()         011D64250 ? 011D64700 ?
+544                                               000000000 ? 000012000 ?
                                                   005609138 ? 071668878 ?
nsclose()+412        call     nsopen_shutitdown()  011D40BA0 ? 7FFF98428560 ?
                                                   011D64250 ? 011D64700 ?
                                                   7FFF00000000 ? 0000000C0 ?
nsgblclose()+272     call     nsclose()            7FFF98428560 ? 000000000 ?
                                                   0000000C0 ? 011D64700 ?
                                                   7FFF00000000 ? 0000000C0 ?
nsgblTRMHelper()+61  call     nsgblclose()         0000000C0 ? 000000000 ?
                                                   0000000C0 ? 011D64700 ?
                                                   7FFF00000000 ? 0000000C0 ?
nsgblRealTerm()+174  call     nsgblTRMHelper()     011D64250 ? 000000000 ?
                                                   0000000C0 ? 011D64700 ?
                                                   7FFF00000000 ? 0000000C0 ?
nlstdstp()+300       call     nsgblRealTerm()      0068BBA80 ? 000000000 ?
                                                   0000000C0 ? 011D64700 ?
                                                   7FFF00000000 ? 0000000C0 ?
npinlt()+53          call     nlstdstp()           0068BBA80 ? 000000000 ?
                                                   0000000C0 ? 011D64700 ?
                                                   7FFF00000000 ? 0000000C0 ?
ksuabt()+620         call     npinlt()             0068BBA80 ? 000000000 ?
                                                   0000000C0 ? 011D64700 ?
                                                   7FFF00000000 ? 0000000C0 ?
opidrv()+1820        call     ksuabt()             0068BBA80 ? 006896F74 ?
                                                   7FFF98428998 ? 000000001 ?
                                                   6896F7400000000 ? 000000000 ?
sou2o()+114          call     opidrv()             000000032 ? 000000004 ?
                                                   7FFF98428C78 ? 000000001 ?
                                                   6896F7400000000 ? 000000000 ?
opimai_real()+317    call     sou2o()              7FFF98428C50 ? 000000032 ?
                                                   000000004 ? 7FFF98428C78 ?
                                                   6896F7400000000 ? 000000000 ?
main()+116           call     opimai_real()        000000003 ? 7FFF98428CE0 ?
                                                   000000004 ? 7FFF98428C78 ?
                                                   6896F7400000000 ? 000000000 ?
__libc_start_main()  call     main()               000000003 ? 7FFF98428CE0 ?
+244                                               000000004 ? 7FFF98428C78 ?
                                                   6896F7400000000 ? 000000000 ?
_start()+41          call     __libc_start_main()  000723088 ? 000000001 ?
                                                   7FFF98428E38 ? 000000000 ?
                                                   6896F7400000000 ? 000000003 ?
My first move was to search MetaLink for anything useful, and I did find an official note describing a very similar problem:
ORA-07445: [kslgetl()+80] Followed by ORA-108: failed to set up dispatcher to accept connection asynchronously [ID 1298804.1]
Applies to:
Oracle Server - Enterprise Edition - Version 11.1.0.6 to 11.1.0.7 [Release 11.1]
Information in this document applies to any platform.
Symptoms
The following errors are seen in the trace file written by an ORA-7445 [kslgetl]:
network error encountered getting listening address:
NS Primary Error: TNS-12533: TNS:illegal ADDRESS parameters
NS Secondary Error: TNS-12560: TNS:protocol adapter error
NT Generic Error: TNS-00503: Illegal ADDRESS parameters
OPIRIP: Uncaught error 108. Error stack:
ORA-00108: failed to set up dispatcher to accept connection asynchronously
Exception signal: 11 (SIGSEGV), code: 1 (Address not mapped to object), addr: 0x130, PC: [0x82f09dc, kslgetl()+80]
The trace file indicates that there is no session:
Current SQL information unavailable - no session.
The Call Stack Trace in the ORA-7445 trace file contains a function list similar to:
kslgetl <- PGOSF57_ksfglt
<- kghfre <- kmnsbf <- nsbfr <- nsiofrrg <- nsiocancel
<- nsopen_shutitdown <- nsclose <- nsgblclose <- nsgblTRMHelper <- nsgblRealTerm
<- nlstdstp <- npinlt <- ksuabt <- opidrv <- sou2o
<- opimai_real <- main <- libc_start_main
Changes
None.
Cause
The trace file first reports: Warning: keltnfy call to ldmInit failed with error 46
The ORA-7445 is not the starting point here. This exception is just a spin-off from ORA-108 and it is possible that different internal errors may be seen, such as ORA-600 [504], depending on what is happening when the ORA-108 is encountered.
The cause for the ORA-108 is related to the initial message at the beginning of the trace file: "keltnfy call to ldmInit failed with error 46" and this is followed by: "network error encountered getting listening address:"
The error code (here: 46) is the key for solving the issue.
This warning says that ldmInit() returned error 46 which is LDMERR_HOST_NOT_FOUND (host not found).
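Since the warning line is what matters, it can help to pull the error code out of a trace file automatically. The following is a minimal sketch: `find_ldminit_error` and `LDM_ERRORS` are names I made up, and the table maps only code 46, the one value the note explains. Other codes are deliberately left unmapped rather than guessed.

```python
import re

# Only the single code explained in the note is mapped here.
LDM_ERRORS = {46: "LDMERR_HOST_NOT_FOUND (host not found)"}

LDMINIT_WARNING = re.compile(r"keltnfy call to ldmInit failed with error (\d+)")


def find_ldminit_error(trace_text):
    """Return (code, meaning) for the first ldmInit warning, or None."""
    match = LDMINIT_WARNING.search(trace_text)
    if match is None:
        return None
    code = int(match.group(1))
    return code, LDM_ERRORS.get(code, "not mapped in this sketch")
```

Feeding it the contents of zhs10g_ora_2972.trc would report code 46, matching the warning near the top of that file.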
This error is returned if the OS call gethostbyname() fails with an error. So this appears to be a network-specific issue.
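A quick way to see whether the machine's own name resolves is to ask the operating system resolver directly. The sketch below uses Python's `socket.gethostbyname`, which is a reasonable stand-in for the lookup the note describes, although it is not guaranteed to follow exactly the same code path as the Oracle dispatcher. The function name and the injectable `resolver` parameter are my own additions, the latter only so the logic can be exercised without a network.

```python
import socket


def check_hostname_resolution(name=None, resolver=socket.gethostbyname):
    """Return (name, address) on success, or (name, None) if the lookup fails."""
    if name is None:
        # Default to this machine's own hostname, as `hostname` would print it.
        name = socket.gethostname()
    try:
        return name, resolver(name)
    except OSError:
        # socket.gaierror and socket.herror are both subclasses of OSError.
        return name, None
```

Calling `check_hostname_resolution()` with no arguments on the database server and getting `None` as the address would point to the same kind of host-not-found condition.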
Solution
1) Check permission on /etc/hosts
$ ls -l /etc/hosts
-rw-r--r-- 2 root root 194 Oct 17 2006 /etc/hosts
Check if /etc/hosts file is correctly configured
<ip address> <fully qualified hostname> <simple or short hostname> <alias, if applicable> ( all of this on one line ).
2) Check the hostname:
$ hostname
$ ping `hostname`
Make sure you are able to ping the hostname
3) Check if /etc/nodename is correctly configured
If you have DNS setup, ping is not a tool to diagnose DNS problem. A better tool to use is nslookup, dnsquery, or dig.
$ nslookup <shortname>
$ nslookup <long name>
$ nslookup <ip address>
The forward and reverse lookup should succeed and return consistent address/info.
4) Check nsswitch.conf
$ more nsswitch.conf
hosts: files dns
Make sure host lookup is also done through the /etc/hosts file and not just dns. It is recommended that FILES come first before DNS.
Also, check the resolv.conf. This makes sure that the DNS is working properly.
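The nsswitch.conf recommendation in step 4 can be checked mechanically. This is only a sketch under simple assumptions: the helper names are hypothetical, and bracketed action items are skipped only when they are a single token such as `[NOTFOUND=return]`.

```python
def hosts_lookup_order(nsswitch_text):
    """Return the lookup sources on the 'hosts:' line, or [] if there is none."""
    for line in nsswitch_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if line.startswith("hosts:"):
            tokens = line[len("hosts:"):].split()
            # Ignore single-token action items like [NOTFOUND=return].
            return [tok for tok in tokens if not tok.startswith("[")]
    return []


def files_before_dns(nsswitch_text):
    """True if 'files' is consulted and comes before 'dns' (or dns is absent)."""
    order = hosts_lookup_order(nsswitch_text)
    if "files" not in order:
        return False
    return "dns" not in order or order.index("files") < order.index("dns")
```

For the line shown in the note, `hosts: files dns`, `files_before_dns` returns `True`; for `hosts: dns files` it returns `False`.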
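So the problem comes from the host name not resolving. I checked my own hosts file and it was indeed wrong: it had no entry matching the machine's hostname.
After fixing the hosts file I restarted the database. It opened normally, and the ORA- errors no longer appeared.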
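The check that exposed the problem, looking for the hostname in the hosts file, is easy to script. The sketch below is only an illustration: `hosts_entries_for` is a hypothetical helper, and it handles just the plain `<ip address> <name> <alias...>` layout described in the note.

```python
def hosts_entries_for(hosts_text, hostname):
    """Return the IP addresses whose hosts-file line lists hostname."""
    wanted = hostname.lower()
    addresses = []
    for line in hosts_text.splitlines():
        fields = line.split("#", 1)[0].split()  # strip comments, then tokenize
        # fields[0] is the address; the rest are the canonical name and aliases.
        if len(fields) >= 2 and wanted in (f.lower() for f in fields[1:]):
            addresses.append(fields[0])
    return addresses
```

With `import socket`, calling `hosts_entries_for(open("/etc/hosts").read(), socket.gethostname())` and getting back an empty list corresponds to the misconfiguration described here.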