接到一同事的电话说数据库启动后一分钟自动挂掉。
1、查看alert.log文件,重要信息如下:
Fri Nov 30 12:10:55 2012 Errors in file c:\oracle\product\10.1.0\admin\orcl\bdump\orcl_reco_952.trc: ORA-02068: following severe error from DBL_SHCRM ORA-03113: end-of-file on communication channel Fri Nov 30 12:11:28 2012 Errors in file c:\oracle\product\10.1.0\admin\orcl\bdump\orcl_reco_952.trc: ORA-07445: exception encountered: core dump [ACCESS_VIOLATION] [0x612CFBF6] [] [] [] [] Fri Nov 30 12:11:32 2012 Errors in file c:\oracle\product\10.1.0\admin\orcl\bdump\orcl_pmon_2576.trc: ORA-00476: RECO process terminated with error Fri Nov 30 12:11:32 2012 PMON: terminating instance due to error 476 Fri Nov 30 12:11:32 2012 Errors in file c:\oracle\product\10.1.0\admin\orcl\bdump\orcl_ckpt_3156.trc: ORA-00476: RECO process terminated with error Fri Nov 30 12:11:32 2012 Errors in file c:\oracle\product\10.1.0\admin\orcl\bdump\orcl_dbw0_2200.trc: ORA-00476: RECO process terminated with error Fri Nov 30 12:11:32 2012 Errors in file c:\oracle\product\10.1.0\admin\orcl\bdump\orcl_mman_1744.trc: ORA-00476: RECO process terminated with error Fri Nov 30 12:11:32 2012 Errors in file c:\oracle\product\10.1.0\admin\orcl\bdump\orcl_q000_720.trc: ORA-00476: RECO process terminated with error Fri Nov 30 12:11:32 2012 Errors in file c:\oracle\product\10.1.0\admin\orcl\bdump\orcl_j000_1740.trc: ORA-00476: RECO process terminated with error Fri Nov 30 12:11:34 2012 Errors in file c:\oracle\product\10.1.0\admin\orcl\bdump\orcl_lgwr_1748.trc: ORA-00476: RECO process terminated with error Fri Nov 30 12:11:39 2012 Errors in file c:\oracle\product\10.1.0\admin\orcl\bdump\orcl_smon_3752.trc: ORA-00476: RECO process terminated with error Instance terminated by PMON, pid = 2576
2、查询错误ORA-02068:
oracle@p750703a:/home/oracle>oerr ora 02068 02068, 00000, "following severe error from %s%s" // *Cause: A severe error (disconnect, fatal Oracle error) received from // the indicated database link. See following error text. // *Action: Contact the remote system administrator. oracle@p750703a:/home/oracle>结合alert.log中的信息与这个错误的描述是关于db_link的问题。
3、在mos上相关信息描述[ID 336208.1]:
This document is being delivered to you via Oracle Support's Rapid Visibility (RaV) process and therefore has not been subject to an independent technical review. Applies to: Oracle Server - Enterprise Edition - Version: 9.2.0.1 to 10.2.0.2 - Release: 9.2 to 10.2 Information in this document applies to any platform. ***Checked for relevance on 24-Nov-2011*** Symptoms On RAC instance, with db link, select query is failing with ORA-07445: exception encountered: core dump [] [SIGSEGV] [unknown code] [0x000000060] [] [] ORA-02068: following severe error from MBODMRTT ORA-03113: end-of-file on communication channel The complete sequence of messages in the alert log is similar to: Errors in file /apps01/app/oracle/admin/MBODODST/udump/mbods1_reco_28993.trc: ORA-07445: exception encountered: core dump [] [SIGSEGV] [unknown code] [0x000000060] [] [] ORA-02068: following severe error from MBODMRTT ORA-03113: end-of-file on communication channel Mon Sep 26 02:23:58 2005 Trace dumping is performing id=[cdmp_20050926022358] Mon Sep 26 02:23:59 2005 Errors in file /apps01/app/oracle/admin/MBODODST/bdump/mbods1_pmon_28971.trc: ORA-00476: RECO process terminated with error Mon Sep 26 Also, we might see the following errors in alert.log: ORA-02068: following severe error from dblink_name ORA-03113: end-of-file on communication channel ORA-07445: exception encountered: core dump [lxsCpStr()+688] [SIGSEGV] [Address not mapped to object] [0x60] [] [] Cause TAF is not supported for remote database links. TAF is not supported for DML statements. This limitation is now documented in Oracle 10.2 The instance crash is caused by Bug 4496339 RECO PROCESS DIES WITH ORA-7445 [lxsCpStr] (Fixed-Releases: 10.2.0.3 and 11.1.0.6) The fix avoids the instance from crashing. Solution 1/ To implement the solution, do the following while connected as sysdba. Please keep a spool of the output. The procedure you are adopting is to clean up stranded 2pc entries in the database. Stranded entries means RECO process is not able to clean up the entries even after failure due to some reason. For every database where the action plan is applied identify the local_tran_id from dba_2pc_pending and use the procedure to purge it. The below steps would resolve the issue: 1. SQL> select * from dba_2pc_pending; 2. -- check the current value of _smu_debug_mode (default 0): SQL> show parameter debug -- if default 0, it will show no entry -- set it temporarily to 4: 3. SQL> alter system set "_smu_debug_mode" = 4; -- in 9.2x alter session can be used instead. 4. SQL> commit; -- so that the call to DBMS_TRANSACTION.PURGE_LOST_DB_ENTRY is the first step of the transaction 5. SQL> execute DBMS_TRANSACTION.PURGE_LOST_DB_ENTRY('<transaction id>'); 6. SQL> commit; 7. SQL> alter system set "_smu_debug_mode" = <original value>; 8. SQL> commit; 9. SQL> select * from dba_2pc_pending; 2/ Download Patch 5021063 on 10.1.0.4 which includes the fix for this bug OR 3/ Download Patch 4496339 if available on your database version and platform. OR 4/ The bug also lists a workaround: do not use failover for dblinks. OR 5/ The bug is fixed in 10.2.0.3 and 11.1.0.6 . Upgrade to one of these patchsets or higher. References NOTE:100664.1 - Master Note for Troubleshooting Oracle Managed Distributed Transactions NOTE:126069.1 - Manually Resolving In-Doubt Transactions: Different Scenarios NOTE:159377.1 - How to Purge a Distributed Transaction from a Database NOTE:290405.1 - ORA-30019 When Executing Dbms_transaction.Purge_lost_db_entry PATCH:5021063 - MLR ON TOP OF 10.1.0.4 FOR CPUJAN_MERGE BUG:4496339 - RECO PROCESS DIES WITH ORA-7445[LXSCPSTR] 相关内容 -------------------------------------------------------------------------------- 产品 -------------------------------------------------------------------------------- Oracle Database Products > Oracle Database > Oracle Database > Oracle Server - Enterprise Edition 关键字 -------------------------------------------------------------------------------- DATABASE LINK; DBA_2PC_PENDING; DBMS_TRANSACTION.PURGE_LOST_DB_ENTRY; DML; SIGSEGV; TAF 错误 -------------------------------------------------------------------------------- ORA-7445[LXSCPSTR]; ORA-476; ORA-7445; ORA-2068; ORA-3113; ORA-7445[EXCEPTION]
4、如上所述,查看一下对应的DB_LINK对应的配置为,问题在如下蓝色字体部分:
shcrmops12 = (DESCRIPTION = (FAILOVER = ON) (ADDRESS_LIST = #(ADDRESS = (PROTOCOL = TCP)(HOST = shcrmdb1.boss.sh.cmcc)(PORT = 1521)) #(ADDRESS = (PROTOCOL = TCP)(HOST = shcrmdb2.boss.sh.cmcc)(PORT = 1521)) (ADDRESS = (PROTOCOL = TCP)(HOST = 10.10.100.161)(PORT = 1521)) (ADDRESS = (PROTOCOL = TCP)(HOST = 10.10.100.163)(PORT = 1521)) ) (CONNECT_DATA = (Service_name = shcrm) (failover_mode = (type = session) (backup = shcrm2) ) ) )
5、关闭DB_LINK配置中的TAF,作如下修改:
CREATEdatabaselinkDBL_SHCRM
CONNECTTO AIALM_SH IDENTIFIEDBY AIALM_SH
USING'(DESCRIPTION=
(FAILOVER = ON)
(ADDRESS_LIST =
(ADDRESS =(PROTOCOL = TCP)(HOST = 10.10.100.163)(PORT = 1521))
)
(CONNECT_DATA =
(Service_name = shcrm)
)
)';
6、重启实例,再没有此现象。
--------附-----
数据库版本:
Oracle Database 10g Enterprise Edition Release 10.1.0.2.0 - Prod
PL/SQL Release 10.1.0.2.0 - Production
CORE 10.1.0.2.0Production
TNS for 32-bit Windows: Version 10.1.0.2.0 - Production
NLSRTL Version 10.1.0.2.0 - Production
参考文章:ID 1447285.1
References
BUG:4496339 - RECO PROCESS DIES WITH ORA-7445[LXSCPSTR]NOTE:126069.1 - Manually Resolving In-Doubt Transactions: Different Scenarios
NOTE:18455.1 - OERR: ORA 476 RECO process terminated with error
NOTE:336208.1 - Instance Crash While Using Dblink Ora-2068 ORA-07445 in RECO process
NOTE:460982.1 - How To Configure Server Side Transparent Application Failover