故障内容:
公司的某个web页面突然无法打开,显示:
Load xx version failed, please check the database user or password is corrent, or column cversion in table tk_table is ok(select cversion from tk_table ),System start failed!
排查问题
排查的时候从故障看,第一眼就看出是数据库连接出的问题,登录数据库服务器后,首先查看了服务器的连接数是不是满了,看看数据库进程是不是在。很显然这2种情况很快排除不是该原因造成的。
登录oracle
登录oracle确认数据库是可以登录的。
1、是不是密码过期?
[oracle@s0086-gz ~]$ sqlplus /nolog SQL*Plus: Release 11.2.0.1.0 Production on Wed Aug 31 14:35:42 2016 Copyright (c) 1982, 2009, Oracle. All rights reserved. SQL> conn xx as sysdba Enter password: Connected. SQL> select * from dba_profiles s where s.profile='DEFAULT' and resource_name='PASSWORD_LIFE_TIME'; PROFILE RESOURCE_NAMERESOURCE ------------------------------ -------------------------------- -------- LIMIT ---------------------------------------- DEFAULT PASSWORD_LIFE_TIMEPASSWORD UNLIMITED |
密码设置的是不过期的,排除这个问题。
2、提示表的列不存在?
column cversion in table tk_table is ok(select cversion from tk_table
切换到该web使用的用户,比如eclogoy
登录省略... SQL> select TABLE_NAME from user_tables where TABLE_NAME='TK_TABLE'; TABLE_NAME ------------------------------ TK_TABLE SQL> desc tk_table ; Name Null? Type ----------------------------------------- -------- ---------------------------- COMPANYNAME VARCHAR2(100) LICENSE VARCHAR2(100) .... |
表没有问题。
到这里根据错误提示,好像都是错误提示的那样有问题,排查了都是正常,那会是什么原因导致的呢?
或许是被错误提示给带进了直接排查oracle的问题,那有没有可能是程序本身出问题了?
查看程序错误日志:
错误ORA-19815: db_recovery_file_dest_size 100%used
归档日志把闪回恢复区使用完了(注意:这里和服务器的磁盘空间不一样,服务器磁盘并没有满)
查看闪回恢复区的大小:
SQL> show parameter db_recovery_file_dest_size; NAME TYPE VALUE ------------------------------------ ----------- ------------------------------ db_recovery_file_dest_size big integer 50G |
SQL> select * from v$flash_recovery_area_usage;
FILE_TYPE PERCENT_SPACE_USED PERCENT_SPACE_RECLAIMABLE NUMBER_OF_FILES
-------------------- ------------------ -----------------------------------------------------------------------------------------------
CONTROL FILE 0 0 0
REDO LOG 0 0 0
ARCHIVED LOG 2.6 0 3
....
查看已经占用的空间大小:
SQL> select sum(percent_space_used)*3/100 from v$flash_recovery_area_usage;
SUM(PERCENT_SPACE_USED)*3/100
-----------------------------
.078
已经确认了是闪回区db_recovery_file_dest_size 被用完的问题了,那我们就清理下过期的归档日志:
退出后切换到 RMAN (Recovery Manager)模式:
[oracle@s0086-gz ~]$ rman target / Recovery Manager: Release 11.2.0.1.0 - Production on Wed Aug 31 15:00:54 2016 Copyright (c) 1982, 2009, Oracle and/or its affiliates. All rights reserved. connected to target database: ECOLOGY (DBID=4086534050) RMAN> crosscheck archivelog all; #查看所有的归档日志 using target database control file instead of recovery catalog allocated channel: ORA_DISK_1 channel ORA_DISK_1: SID=2000 device type=DISK validation succeeded for archived log archived log file name=/your's path/archivelog/2016_08_31/o1_mf_1_122_cwdm1qv9_.arc RECID=120 STAMP=921323835 validation succeeded for archived log archived log file name=/your's path/archivelog/2016_08_31/o1_mf_1_123_cwdm1v9l_.arc RECID=122 STAMP=921323839 validation succeeded for archived log archived log file name=/your's path/archivelog/2016_08_31/o1_mf_1_124_cwdm1v9t_.arc RECID=121 STAMP=921323837 Crosschecked 3 objects RMAN> delete expired archivelog all; #删除无效的archive log released channel: ORA_DISK_1 allocated channel: ORA_DISK_1 channel ORA_DISK_1: SID=2000 device type=DISK specification does not match any archived log in the repository RMAN> DELETE ARCHIVELOG ALL COMPLETED BEFORE 'SYSDATE-7';#删除7天前的archive log(这里也可以用:delete archivelog until time 'SYSDATE-7') released channel: ORA_DISK_1 allocated channel: ORA_DISK_1 channel ORA_DISK_1: SID=2000 device type=DISK |
清理后,web页面恢复访问。
疑问:
是否可以设置参数定期清理过期归档日志?