由于归档路径设置不当，系统无法响应的问题

最新推荐文章于 2024-08-27 15:26:31 发布

ctest011501

最新推荐文章于 2024-08-27 15:26:31 发布

阅读量130

点赞数

由于归档路径设置不当，系统无法响应的问题
Author: Kevin Zou
Date: 2009-2-17

环境： DB 10.2.0.
OS: AIX 5.3
系统采用DBCA的方式来安装，安装完毕后没有更改配置。
一次在做大量数据刷新操作时，发现新提交到系统的SQL无法响应，整个系统都HANG住。
查看系统资源,iostat ,topas发现系统只有DISK的资源比较繁忙，其余都很空闲：
AIX h3cap61 3 5 0059362D4C00 02/18/09

System configuration: lcpu=4 drives=7 mode=Capped

16:19:19 device %busy avque r+w/s Kbs/s avwait avserv

16:19:21     hdisk0      0      0.0        0        3      0.0      0.2
             hdisk5     14      0.0      337     5392      0.0      0.0
             hdisk4      0      0.0        0        0      0.0      0.0
                cd0      0      0.0        0        0      0.0      0.0
             hdisk2      0      0.0        0        0      0.0      0.0
             hdisk3      0      0.0        0        0      0.0      0.0
             hdisk1      0      0.0        0        1      0.0     10.5

16:19:23     hdisk0      0      0.0        0        3      0.0      0.2
             hdisk5     15      0.0      358     5740      0.0      0.0
             hdisk4      0      0.0        0        0      0.0      0.0
                cd0      0      0.0        0        0      0.0      0.0
             hdisk2      0      0.0        0        0      0.0      0.0
             hdisk3      0      0.0        0        0      0.0      0.0
             hdisk1      0      0.0        0        0      0.0      0.0

Average      hdisk0      0      0.0        0        3      0.0      0.2
             hdisk5     14      0.0      347     5566      0.0      0.0
             hdisk4      0      0.0        0        0      0.0      0.0
                cd0      0      0.0        0        0      0.0      0.0
             hdisk2      0      0.0        0        0      0.0      0.0
             hdisk3      0      0.0        0        0      0.0      0.0
             hdisk1      0      0.0        0        0      0.0      5.3
查看等待事件：
SID EVENT
---------- ----------------------------------------------------------------
       128 library cache lock
       129 library cache lock
       130 switch logfile command
       131 library cache lock
       132 buffer busy waits
       134 log file switch (checkpoint incomplete)
       135 log file switch (checkpoint incomplete)
       140 Streams AQ: qmn slave idle wait
       142 log file switch (checkpoint incomplete)
       146 log file switch (checkpoint incomplete)
       149 Streams AQ: qmn coordinator idle wait

       SID EVENT
---------- ----------------------------------------------------------------
       150 log file switch (checkpoint incomplete)
       154 class slave wait
       156 ASM background timer
       159 log file switch (checkpoint incomplete)
       161 log file switch (checkpoint incomplete)
       164 buffer busy waits
       170 pmon timer
系统有大量的log file switch 的等待事件。
难道是归档目录满的原因导致系统无法进行新的事物：
SQL> archive log list
Database log mode              Archive Mode
Automatic archival             Enabled
Archive destination            USE_DB_RECOVERY_FILE_DEST
Oldest online log sequence     113
Next log sequence to archive   113
Current log sequence           115
SQL> show parameter USE_DB_RECOVERY_FILE_DEST

这里Archive destination 启用了新的参数USE_DB_RECOVERY_FILE_DEST 。在9i没有使用过，当时没有仔细去思考，而是把目光注意到library cache lock 的等待事件。
查询之后，发现library cache lock这些等待事件对应的session也没有运行任何的sql。
通过查看最耗cpu资源的进程，发现都是归档进程。
# ps -ef|grep 438720
    root 315398 496036   0 16:31:33 pts/4 0:00 grep 438720
oracle 438720      1   2   Feb 13      - 0:54 ora_arc1_kong
# ps -ef|grep 229542
oracle 229542      1   1   Feb 13      - 0:58 ora_arc0_kong
    root 401408 496036   0 16:31:43 pts/4 0:00 grep 229542

最后还是把目光转移到了归档进程上。想到刚开始出现的“陌生”参数：USE_DB_RECOVERY_FILE_DEST
找到了说明，原来是需要看db_recovery_file_dest 参数，而不是USE_DB_RECOVERY_FILE_DEST。NND!!
查看其配置：
SQL> show parameter db_recovery_file_dest

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
db_recovery_file_dest                string      /oracle/product/flash_recovery
                                                 _area
db_recovery_file_dest_size           big integer 2G
在OS上查看其空间余额：
/oracle/product/flash_recovery_area/KONG
$ cd ..
$ du -sg .
2.00    .

设置为其他目录，再次执行日志切换：
1* alter system switch logfile
SQL> /

System altered.
一切都正常了。
查看等待事件
SID EVENT
---------- ----------------------------------------------------------------
       132 jobq slave wait
       140 Streams AQ: qmn slave idle wait
       142 Streams AQ: waiting for time management or cleanup tasks
       149 Streams AQ: qmn coordinator idle wait
       154 class slave wait
       156 ASM background timer
       164 smon timer
       170 pmon timer
都是非重点关注的对象了。

在alert文件也有归档空间满的报错信息：
ARC0: Failed to archive thread 1 sequence 113 (19809)
Wed Feb 18 16:38:00 2009
Errors in file /oracle/product/admin/kong/bdump/kong_arc1_438720.trc:
ORA-19815: WARNING: db_recovery_file_dest_size of 2147483648 bytes is 100.00% used, and has 0 remaining bytes available.
Wed Feb 18 16:38:00 2009
************************************************************************
You have following choices to free up space from flash recovery area:
1. Consider changing RMAN RETENTION POLICY. If you are using Data Guard,
   then consider changing RMAN ARCHIVELOG DELETION POLICY.
2. Back up files to tertiary device such as tape using RMAN
   BACKUP RECOVERY AREA command.
3. Add disk space and increase db_recovery_file_dest_size parameter to
   reflect the new space.
4. Delete unnecessary files using RMAN DELETE command. If an operating
   system command was used to delete files, then use RMAN CROSSCHECK and
   DELETE EXPIRED commands.

再次证明alert日志很重要，有事没事拜读ALERT！！
不熟悉10g的特性，虚惊一场！
谢谢ITPUB上棉花的帮助

来自 “ ITPUB博客 ” ，链接：http://blog.itpub.net/40239/viewspace-553604/，如需转载，请注明出处，否则将追究法律责任。

转载于:http://blog.itpub.net/40239/viewspace-553604/