Oracle 启动失败,报错 ORA-03113
ORA-03113: end-of-file on communication channel
SQL> conn / as sysdba
Connected to an idle instance.
SQL> startup
ORACLE instance started.
Total System Global Area 6.0801E+10 bytes
Fixed Size 7660704 bytes
Variable Size 8724155232 bytes
Database Buffers 5.1942E+10 bytes
Redo Buffers 126554112 bytes
Database mounted.
ORA-03113: end-of-file on communication channel
Process ID: 7512
Session ID: 406 Serial number: 27398
定位:
1.先找到Oracle告警日志位置,然后查看告警日志。
告警日志文件是一类特殊的跟踪文件(trace file),告警日志文件命名格式一般为:alert_<SID>.log,其中SID为ORACLE数据库实例名称。
数据库告警日志是按时间顺序记录message和错误信息。
文件路径可以通过命令:show parameter background_dump_dest 查看。
由于我本机startup失败无法查看参数,我通过其他的相同配置安装的数据库查看参数来定位日志文件路径:
SQL> startup mount
SQL> show parameter background_dump_dest; //或者使用:select value from v$parameter where name='background_dump_dest';
NAME TYPE VALUE
--------------------- ------------- ---------------------------------------
background_dump_dest string /opt/oracle/diag/rdbms/orcl/orcl/trace
相关查询命令:
查询 trace file 路径
SQL> select name, value from v$diag_info where name like '%Default%';
NAME VALUE
--------------------------------------------------------------------------------
Default Trace File /opt/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_23843.trc
或者
SQL> select name, value from v$diag_info where name like '%Trace%';
NAME VALUE
--------------------------------------------------------------------------------
Diag Trace /opt/oracle/diag/rdbms/orcl/orcl/trace
Default Trace File /opt/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_23843.trc
查询 user_dump_dest 路径 && 查询 background_dump_dest 路径:
SQL> show parameter user_dump_dest;
NAME TYPE VALUE
------------------------------------------------------------------------------
user_dump_dest string /opt/oracle/product/12.1.0.2.0/dbhome_1/rdbms/log
SQL> show parameter background_dump_dest;
NAME TYPE VALUE
----------------------------------------------------------------------------------
background_dump_dest string /opt/oracle/diag/rdbms/orcl/orcl/trace
或者
SQL> select name, value from v$parameter where name like '%dump%';
NAME VALUE
------------------------------------------------------------------------
shadow_core_dump partial
background_core_dump partial
background_dump_dest /opt/oracle/diag/rdbms/orcl/orcl/trace
NAME VALUE
------------------------------------------------------------------------
user_dump_dest /opt/oracle/product/12.1.0.2.0/dbhome_1/rdbms/log
core_dump_dest /opt/oracle/diag/rdbms/orcl/orcl/cdump
max_dump_file_size unlimited
如果实在没法查看到路径,那就直接搜索文件名:
[oracle@localhost opt]$ find $ORACLE_BASE -name alert_orcl.log
/opt/oracle/diag/rdbms/orcl/orcl/trace/alert_orcl.log
$ cd /opt/oracle/diag/rdbms/orcl/orcl/trace/
$ ls -alcr | grep alert
-rw-r----- 1 oracle dba 17055042508 Mar 28 10:16 alert_orcl.log
#查看日志寻找错误信息
$ tail -n -100 alert_orcl.log
Starting background process ARC3
Thu Mar 28 10:33:56 2019
ARC2 started with pid=30, OS id=7521
ARC1: Archival started
Thu Mar 28 10:33:56 2019
ARC3 started with pid=31, OS id=7523
ARC2: Archival started
Thu Mar 28 10:33:56 2019
ARC2: Becoming the 'no FAL' ARCH
ARC2: Becoming the 'no SRL' ARCH
Thu Mar 28 10:33:56 2019
ARC1: Becoming the heartbeat ARCH
Thu Mar 28 10:33:56 2019
ARC3: Archival started
ARC0: STARTING ARCH PROCESSES COMPLETE
Thu Mar 28 10:33:56 2019
Errors in file /opt/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_7512.trc:
ORA-19815: WARNING: db_recovery_file_dest_size of 274877906944 bytes is 100.00% used, and has 0 remaining bytes available.
Thu Mar 28 10:33:56 2019
************************************************************************
You have following choices to free up space from recovery area:
1. Consider changing RMAN RETENTION POLICY. If you are using Data Guard,
then consider changing RMAN ARCHIVELOG DELETION POLICY.
2. Back up files to tertiary device such as tape using RMAN
BACKUP RECOVERY AREA command.
3. Add disk space and increase db_recovery_file_dest_size parameter to
reflect the new space.
4. Delete unnecessary files using RMAN DELETE command. If an operating
system command was used to delete files, then use RMAN CROSSCHECK and
DELETE EXPIRED commands.
************************************************************************
Thu Mar 28 10:33:56 2019
Errors in file /opt/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_7512.trc:
ORA-19809: limit exceeded for recovery files
ORA-19804: cannot reclaim 935042560 bytes disk space from 274877906944 limit
ARCH: Error 19809 Creating archive log file to '/opt/oracle/fast_recovery_area/ORCL/archivelog/2019_03_28/o1_mf_1_717_%u_.arc'
Thu Mar 28 10:33:56 2019
Errors in file /opt/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_7512.trc:
ORA-16038: log 102 sequence# 717 cannot be archived
ORA-19809: limit exceeded for recovery files
ORA-00312: online log 102 thread 1: '/opt/oracle/oradata/orcl/redo102.log'
ORA-00312: online log 102 thread 1: '/opt/oracle/oradata/orcl/redo102_2.log'
USER (ospid: 7512): terminating the instance due to error 16038
Thu Mar 28 10:33:56 2019
System state dump requested by (instance=1, osid=7512), summary=[abnormal instance termination].
System State dumped to trace file /opt/oracle/diag/rdbms/orcl/orcl/trace/orcl_diag_7443_20190328103356.trc
Thu Mar 28 10:33:56 2019
Dumping diagnostic data in directory=[cdmp_20190328103356], requested by (instance=1, osid=7512), summary=[abnormal instance termination].
Thu Mar 28 10:33:56 2019
Instance terminated by USER, pid = 7512
通过日志,找到了关键的错误信息:
Errors in file /opt/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_7512.trc:
ORA-16038: log 102 sequence# 717 cannot be archived
ORA-19809: limit exceeded for recovery files ========================》超出了恢复文件数的限制
ORA-00312: online log 102 thread 1: '/opt/oracle/oradata/orcl/redo102.log'
ORA-00312: online log 102 thread 1: '/opt/oracle/oradata/orcl/redo102_2.log'
USER (ospid: 7512): terminating the instance due to error 16038
官方问题说明:
ORA-19809: limit exceeded for recovery files
Cause:The limit for recovery files specified by the DB_RECOVERY_FILE_DEST_SIZE was exceeded.
Action:There are five possible solutions:
1) Take frequent backup of recovery area using RMAN.
2) Consider changing RMAN retention policy.
3) Consider changing RMAN archived log deletion policy.
4) Add disk space and increase DB_RECOVERY_FILE_DEST_SIZE.
5) Delete files from recovery area using RMAN.
查看 db_recovery_file_dest_size 参数设置的闪回空间最大限制:
$ sqlplus / as sysdba
SQL> startup mount
SQL> show parameter db_recovery_file_dest_size;
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
db_recovery_file_dest_size big integer 3882M
SQL> show parameter db_recovery_file_dest; //查看闪回空间文件路径,du -sh 可看目录大小
SQL命令查看闪回空间当前使用情况:
$ sqlplus / as sysdba
SQL> startup mount
SQL> set linesize 180
SQL> show linesize
SQL> column FILE_TYPE format a32
SQL> select * from v$recovery_area_usage;
如果显示归档日志占满(ARCHIVED LOG的PERCENT_SPACE_USED接近100%),则查看归档日志存储路径:
SQL> archive log list;
Database log mode Archive Mode
Automatic archival Enabled
Archive destination USE_DB_RECOVERY_FILE_DEST
Oldest online log sequence 696
Next log sequence to archive 696
Current log sequence 699
其中 USE_DB_RECOVERY_FILE_DEST 则代表使用的是闪回空间,归档日志存储在闪回空间路径下。归档日志满导致闪回空间使用占满,需要清理归档日志、或者单独配置归档日志路径、或者配置调大闪回空间参数限制。
报错的原因:数据插入导致闪回空间不足。
解决办法:常用的办法有两种,删除不需要的归档日志文件(如果是归档日志太大导致磁盘写满从而导致闪回日志无法创建或写入),或者增加闪回空间大小(db_recovery_file_dest_size,闪回空间设置不够导致闪回空间写满、达到闪回空间限制)。
根据实际情况选择合理的清除操作。可以直接进入RMAN删除归档日志腾出空间(也可以物理删除归档日志文件 -> 然后进入RMAN -> crosscheck archivelog all;->delete noprompt expired archivelog all;)
我的处理步骤如下:
$ sqlplus / as sysdba
SQL> startup mount
SQL> archive log list; //查看归档状态及路径,若Archive destination为USE_DB_RECOVERY_FILE_DEST,则执行
SQL> show parameter db_recovery_file_dest_size; //查看闪回空间大小
SQL> show parameter db_recovery_file_dest; //查看闪回空间文件路径,du -sh 可看目录大小
SQL> alter system set db_recovery_file_dest_size=10G; //修改闪回空间大小
进入RMAN,查看过期的归档日志、删除不需要的归档日志
SQL> exit
$ rman target /
RMAN> list archivelog all; #查看所有归档日志文件
RMAN> crosscheck archivelog all; #检查过期归档日志(检查控制文件和实际物理文件的差别)
RMAN> list expired archivelog all; #crosscheck执行后就可以查看有哪些过期的日志文件
RMAN> delete expired archivelog all; #删除所有过期归档日志(同步删除控制文件的信息和实际物理文件)
上述操作后,如果没有过期归档日志或者删除过期日志后磁盘还是没有回收多少、磁盘占用仍然很高的话,确认风险(可能导致丢数据的风险)可控后,可以执行下面删除命令来强制截断归档日志:
RMAN> delete archivelog all completed before 'sysdate - 15'; #删除15天前的日志,如果不要提示则加参数noprompt(该方法会在删除控制文件记录的同时、也会同步将物理文件删除)
RMAN> exit
$ sqlplus / as sysdba
$ alter database open
启动成功
相关命令介绍:
清除过期归档日志:
RMAN> crosscheck archivelog all; //检查控制文件和实际物理文件的差别
RMAN> delete noprompt expired archivelog all; //删除所有过期的日志
根据时间删除归档日志:
RMAN> delete noprompt archivelog until time 'sysdate-7'; //加noprompt无确认提示。删除系统时间7天以前的归档日志。
RMAN> delete archivelog all completed before 'sysdate-7'; //删除系统时间7天以前的归档日志,不会删除闪回区有效的归档日志,删除时有提示确认
RMAN> delete archivelog all completed before 'sysdate-1'; //同上,删除1天以前的归档日志
RMAN> delete archivelog all completed before 'sysdate'; //删除当前所有的归档日志
RMAM> delete noprompt archivelog all; //同上一命令
查看过期的归档日志:
RMAN> list expired archivelog all;
说明:
在control file中记录着每一个archive log的相关信息。
当我们在OS下把这些物理文件delete掉或异常变动后,在control file中仍然记录着这些archive log的信息;
当我们手工清除archive目录下的文件后,这些记录并没有被我们从controlfile中清除掉、Oracle并不知道这些文件已经不存在了。
因此,我们才要做手工的清除操作。
如果只是单独执行crosscheck而没有执行delete,那么备份还是会失败,因为那些控制文件的信息和实际的信息还是不一致。
补充说明:
1.备份集有两种状态:
A:Available,RMAN认为该项存在于备份介质上;
X:Expired,备份存在于控制文件或恢复目录中、但并没有物理存在于备份介质上。
2.crosscheck的目的是检查RMAN 的目录以及物理文件,如果物理文件不存在于介质上,将标记为Expired。如果物理文件存在,将维持Available。
如果原先标记为Expired的备份集再次存在于备份介质上(如恢复了损坏的磁盘),crosscheck将把状态重新从Expired标记回Available。
3.crosscheck 输出分两部分:第一部分列出确定存在于备份介质上的所有备份集片;第二部分列出不存在于备份介质上的备份集片,并将其标记为Expired。当设置备份保存策略后,一个备份过期,crosscheck之后标记为丢弃的备份状态依旧为availabel,要删除丢弃备份delete obsolete。
例如:Oracle数据库在迁移过程中一个归档文件丢失了,rman备份归档日志时提示某个归档日志找不到了。
则可以通过下面的操作,在不停止Oracle数据库情况下截断归档日志,重新开始新的归档:
RMAN> crosscheck archivelog all;
RMAN> delete archivelog all;
清除过程中遇到的错误及解决办法:
错误1:
RMAN> crosscheck archivelog all;
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of crosscheck command at 03/28/2019 12:59:23
RMAN-12010: automatic channel allocation initialization failed
RMAN-06403: could not obtain a fully authorized session
ORA-01034: ORACLE not available
ORA-27101: shared memory realm does not exist
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3640
Additional information: -2000252533
RMAN> exit
SQL> shutdown immediate
SQL> startup mount //启动数据库实例、但不打开数据库
Oracle启动参数说明,参考:Oracle数据库启动参数_sunny05296的博客-CSDN博客