Oracle 10.1实例启动后一分钟后自动挂掉的处理方法

最新推荐文章于 2023-03-25 21:17:58 发布

dbcommando

最新推荐文章于 2023-03-25 21:17:58 发布

阅读量3.9k

点赞数

分类专栏： Oracle Trouble Shooting

本文链接：https://blog.csdn.net/dbcommando/article/details/8244770

版权

Oracle Trouble Shooting 专栏收录该内容

3 篇文章 0 订阅

订阅专栏

接到一同事的电话说数据库启动后一分钟自动挂掉。

1、查看alert.log文件,重要信息如下：

Fri Nov 30 12:10:55 2012
Errors in file c:\oracle\product\10.1.0\admin\orcl\bdump\orcl_reco_952.trc:
ORA-02068: following severe error from DBL_SHCRM
ORA-03113: end-of-file on communication channel

Fri Nov 30 12:11:28 2012
Errors in file c:\oracle\product\10.1.0\admin\orcl\bdump\orcl_reco_952.trc:
ORA-07445: exception encountered: core dump [ACCESS_VIOLATION] [0x612CFBF6] [] [] [] []

Fri Nov 30 12:11:32 2012
Errors in file c:\oracle\product\10.1.0\admin\orcl\bdump\orcl_pmon_2576.trc:
ORA-00476: RECO process terminated with error

Fri Nov 30 12:11:32 2012
PMON: terminating instance due to error 476
Fri Nov 30 12:11:32 2012
Errors in file c:\oracle\product\10.1.0\admin\orcl\bdump\orcl_ckpt_3156.trc:
ORA-00476: RECO process terminated with error

Fri Nov 30 12:11:32 2012
Errors in file c:\oracle\product\10.1.0\admin\orcl\bdump\orcl_dbw0_2200.trc:
ORA-00476: RECO process terminated with error

Fri Nov 30 12:11:32 2012
Errors in file c:\oracle\product\10.1.0\admin\orcl\bdump\orcl_mman_1744.trc:
ORA-00476: RECO process terminated with error

Fri Nov 30 12:11:32 2012
Errors in file c:\oracle\product\10.1.0\admin\orcl\bdump\orcl_q000_720.trc:
ORA-00476: RECO process terminated with error

Fri Nov 30 12:11:32 2012
Errors in file c:\oracle\product\10.1.0\admin\orcl\bdump\orcl_j000_1740.trc:
ORA-00476: RECO process terminated with error

Fri Nov 30 12:11:34 2012
Errors in file c:\oracle\product\10.1.0\admin\orcl\bdump\orcl_lgwr_1748.trc:
ORA-00476: RECO process terminated with error

Fri Nov 30 12:11:39 2012
Errors in file c:\oracle\product\10.1.0\admin\orcl\bdump\orcl_smon_3752.trc:
ORA-00476: RECO process terminated with error

Instance terminated by PMON, pid = 2576

2、查询错误ORA-02068：

oracle@p750703a:/home/oracle>oerr ora 02068
02068, 00000, "following severe error from %s%s"
// *Cause: A severe error (disconnect, fatal Oracle error) received from
//         the indicated database link.  See following error text.
// *Action: Contact the remote system administrator.
oracle@p750703a:/home/oracle>

结合alert.log中的信息与这个错误的描述是关于db_link的问题。

3、在mos上相关信息描述[ID 336208.1]：

This document is being delivered to you via Oracle Support's Rapid Visibility (RaV) process and therefore has not been subject to an independent technical review. 

Applies to: 
Oracle Server - Enterprise Edition - Version: 9.2.0.1 to 10.2.0.2 - Release: 9.2 to 10.2
Information in this document applies to any platform.
***Checked for relevance on 24-Nov-2011*** 
Symptoms
On RAC instance, with db link, select query is failing with

ORA-07445: exception encountered: core dump [] [SIGSEGV] [unknown code] [0x000000060] [] []
ORA-02068: following severe error from MBODMRTT
ORA-03113: end-of-file on communication channel

The complete sequence of messages in the alert log is similar to:

Errors in file /apps01/app/oracle/admin/MBODODST/udump/mbods1_reco_28993.trc:
ORA-07445: exception encountered: core dump [] [SIGSEGV] [unknown code] [0x000000060] [] []
ORA-02068: following severe error from MBODMRTT
ORA-03113: end-of-file on communication channel
Mon Sep 26 02:23:58 2005
Trace dumping is performing id=[cdmp_20050926022358]
Mon Sep 26 02:23:59 2005
Errors in file /apps01/app/oracle/admin/MBODODST/bdump/mbods1_pmon_28971.trc:
ORA-00476: RECO process terminated with error
Mon Sep 26 

Also, we might see the following errors in alert.log:

ORA-02068: following severe error from dblink_name
ORA-03113: end-of-file on communication channel
ORA-07445: exception encountered: core dump [lxsCpStr()+688] [SIGSEGV] [Address not mapped to object] [0x60] [] []

Cause
TAF is not supported for remote database links. TAF is not supported for DML statements. This limitation is now documented in Oracle 10.2 

The instance crash is caused by Bug 4496339 RECO PROCESS DIES WITH ORA-7445 [lxsCpStr] (Fixed-Releases: 10.2.0.3 and 11.1.0.6)

The fix avoids the instance from crashing. 

Solution
1/ To implement the solution, do the following while connected as sysdba. Please keep a spool of the output.

The procedure you are adopting is to clean up stranded 2pc entries in the database. Stranded entries means RECO process is not able to clean up the entries even after failure due to some reason. For every database where the action plan is applied identify the local_tran_id from dba_2pc_pending and use the procedure to purge it. The below steps would resolve the issue:

1. SQL> select * from dba_2pc_pending;

2. -- check the current value of _smu_debug_mode (default 0):

SQL> show parameter debug -- if default 0, it will show no entry

-- set it temporarily to 4:

3. SQL> alter system set "_smu_debug_mode" = 4; -- in 9.2x alter session can be used instead.

4. SQL> commit; -- so that the call to DBMS_TRANSACTION.PURGE_LOST_DB_ENTRY is the first step of the transaction 

5. SQL> execute DBMS_TRANSACTION.PURGE_LOST_DB_ENTRY('<transaction id>');

6. SQL> commit;

7. SQL> alter system set "_smu_debug_mode" = <original value>;

8. SQL> commit;

9. SQL> select * from dba_2pc_pending;

2/ Download Patch 5021063  on 10.1.0.4 which includes the fix for this bug

OR 

3/ Download Patch 4496339 if available on your database version and platform. 

OR

4/ The bug also lists a workaround: do not use failover for dblinks.

OR

5/ The bug is fixed in 10.2.0.3 and 11.1.0.6 . Upgrade to one of these patchsets or higher. 

References
NOTE:100664.1 - Master Note for Troubleshooting Oracle Managed Distributed Transactions
NOTE:126069.1 - Manually Resolving In-Doubt Transactions: Different Scenarios
NOTE:159377.1 - How to Purge a Distributed Transaction from a Database
NOTE:290405.1 - ORA-30019 When Executing Dbms_transaction.Purge_lost_db_entry
PATCH:5021063 - MLR ON TOP OF 10.1.0.4 FOR CPUJAN_MERGE
BUG:4496339 - RECO PROCESS DIES WITH ORA-7445[LXSCPSTR]

相关内容

--------------------------------------------------------------------------------
产品 
--------------------------------------------------------------------------------

Oracle Database Products > Oracle Database > Oracle Database > Oracle Server - Enterprise Edition 
关键字 
--------------------------------------------------------------------------------
DATABASE LINK; DBA_2PC_PENDING; DBMS_TRANSACTION.PURGE_LOST_DB_ENTRY; DML; SIGSEGV; TAF 
错误 
--------------------------------------------------------------------------------
ORA-7445[LXSCPSTR]; ORA-476; ORA-7445; ORA-2068; ORA-3113; ORA-7445[EXCEPTION]

4、如上所述，查看一下对应的DB_LINK对应的配置为，问题在如下蓝色字体部分：

shcrmops12  =
  (DESCRIPTION =
    (FAILOVER = ON)
    (ADDRESS_LIST =
      #(ADDRESS = (PROTOCOL = TCP)(HOST = shcrmdb1.boss.sh.cmcc)(PORT = 1521))
      #(ADDRESS = (PROTOCOL = TCP)(HOST = shcrmdb2.boss.sh.cmcc)(PORT = 1521))
      (ADDRESS = (PROTOCOL = TCP)(HOST = 10.10.100.161)(PORT = 1521))
      (ADDRESS = (PROTOCOL = TCP)(HOST = 10.10.100.163)(PORT = 1521))
    )
    (CONNECT_DATA =
      (Service_name = shcrm)
      (failover_mode =
       (type = session)
       (backup = shcrm2)
      )
    )
  )

5、关闭DB_LINK配置中的TAF，作如下修改：

CREATEdatabaselinkDBL_SHCRM
CONNECTTO AIALM_SH IDENTIFIEDBY AIALM_SH
USING'(DESCRIPTION=
    (FAILOVER = ON)
    (ADDRESS_LIST =
      (ADDRESS =(PROTOCOL = TCP)(HOST = 10.10.100.163)(PORT = 1521))
    )
    (CONNECT_DATA =
     (Service_name = shcrm)
    )
  )';

6、重启实例，再没有此现象。

--------附-----

数据库版本：

Oracle Database 10g Enterprise Edition Release 10.1.0.2.0 - Prod
PL/SQL Release 10.1.0.2.0 - Production
CORE 10.1.0.2.0Production
TNS for 32-bit Windows: Version 10.1.0.2.0 - Production
NLSRTL Version 10.1.0.2.0 - Production

参考文章：ID 1447285.1

References

BUG:4496339 - RECO PROCESS DIES WITH ORA-7445[LXSCPSTR]
NOTE:126069.1 - Manually Resolving In-Doubt Transactions: Different Scenarios
NOTE:18455.1 - OERR: ORA 476 RECO process terminated with error
NOTE:336208.1 - Instance Crash While Using Dblink Ora-2068 ORA-07445 in RECO process
NOTE:460982.1 - How To Configure Server Side Transparent Application Failover