Diagnosing an NBU Backup Error

During a routine system check, I found that the daily backup had failed.


The error message was:

RMAN> backup incremental level 0 database;

Starting backup at 10-MAR-08
using target database controlfile instead of recovery catalog
allocated channel: ORA_SBT_TAPE_1
channel ORA_SBT_TAPE_1: sid=120 devtype=SBT_TAPE
channel ORA_SBT_TAPE_1: VERITAS NetBackup for Oracle - Release 5.0GA (2003103006)
channel ORA_SBT_TAPE_1: starting incremental level 0 datafile backupset
channel ORA_SBT_TAPE_1: specifying datafile(s) in backupset
input datafile fno=00001 name=/dev/vx/rdsk/maindbdg/lv_main00
input datafile fno=00008 name=/opt/oracle/oradata/oradata/bjdb01/users01.dbf
input datafile fno=00039 name=/opt/oracle/oradata/oradata/bjdb01/xdb02.dbf
input datafile fno=00009 name=/opt/oracle/oradata/oradata/bjdb01/xdb01.dbf
input datafile fno=00003 name=/opt/oracle/oradata/oradata/bjdb01/cwmlite01.dbf
input datafile fno=00004 name=/opt/oracle/oradata/oradata/bjdb01/drsys01.dbf
input datafile fno=00006 name=/opt/oracle/oradata/oradata/bjdb01/odm01.dbf
input datafile fno=00007 name=/opt/oracle/oradata/oradata/bjdb01/tools01.dbf
channel ORA_SBT_TAPE_1: starting piece 1 at 10-MAR-08
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03009: failure of backup command on ORA_SBT_TAPE_1 channel at 03/10/2008 11:31:12
ORA-19506: failed to create sequential file, name="tpjatl1b_1_1", parms=""
ORA-27028: skgfqcre: sbtbackup returned error
ORA-19511: Error received from media manager layer, error text:
VxBSACreateObject: Failed with error:
Server Status: unable to allocate new media for backup, storage unit has none available

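At first glance this error looks like a shortage of space (no media available). However, subsequent backups failed with a different error: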

RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03009: failure of backup command on ch00 channel at 03/10/2008 05:14:15
ORA-19502: write error on file "bk_26552_1_648968690", blockno 664577 (blocksize=512)
ORA-27030: skgfwrt: sbtwrite2 returned error
ORA-19511: Error received from media manager layer, error text:
VxBSASendData: Failed with error:
Server Status: Communication with the server has not been iniatated or the server status has not been retrieved from the server.

Judging from this error, the problem is more than just a lack of space.
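To compare the two failures side by side, it helps to reduce each error stack to its error codes, the media manager call that failed, and the server status text. The following is a minimal sketch in Python; the function name `summarize` and the regular expressions are illustrative assumptions, and the sample text is copied from the two stacks above.

```python
import re

# Sample text copied from the two error stacks shown above.
FIRST = """\
RMAN-03009: failure of backup command on ORA_SBT_TAPE_1 channel at 03/10/2008 11:31:12
ORA-19506: failed to create sequential file, name="tpjatl1b_1_1", parms=""
ORA-27028: skgfqcre: sbtbackup returned error
ORA-19511: Error received from media manager layer, error text:
VxBSACreateObject: Failed with error:
Server Status: unable to allocate new media for backup, storage unit has none available
"""

SECOND = """\
RMAN-03009: failure of backup command on ch00 channel at 03/10/2008 05:14:15
ORA-19502: write error on file "bk_26552_1_648968690", blockno 664577 (blocksize=512)
ORA-27030: skgfwrt: sbtwrite2 returned error
ORA-19511: Error received from media manager layer, error text:
VxBSASendData: Failed with error:
Server Status: Communication with the server has not been iniatated or the server status has not been retrieved from the server.
"""

CODE_RE = re.compile(r"^((?:RMAN|ORA)-\d{5}):", re.MULTILINE)
CALL_RE = re.compile(r"^(VxBSA\w+): Failed with error:", re.MULTILINE)
STATUS_RE = re.compile(r"^Server Status:\s*(.+)$", re.MULTILINE)


def summarize(stack):
    """Return the error codes, failing media manager call and server status."""
    call = CALL_RE.search(stack)
    status = STATUS_RE.search(stack)
    return {
        "codes": CODE_RE.findall(stack),
        "call": call.group(1) if call else None,
        "status": status.group(1).strip() if status else None,
    }


if __name__ == "__main__":
    for name, stack in (("first", FIRST), ("second", SECOND)):
        print(name, summarize(stack))
```

The first stack fails while creating the backup piece (`VxBSACreateObject`), the second while sending data (`VxBSASendData`), which is why the two messages point in different directions.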

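In the jnbSA graphical interface, many administration options responded very slowly after being clicked and mostly never returned a result. I therefore switched to bpadm on the command line and found the following in the problems report (REPORTPROBLEM):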

03/11/2008 01:45:04 backupcenter240 bpexpdate Could not build host list: client hostname could not be found
03/11/2008 02:13:34 backupcenter240 bjdb01 cannot write image to media id 000013, drive index 0, I/O error
03/11/2008 02:13:48 backupcenter240 bjdb01 backup by oracle on client bjdb01 using policy oracle: media write error
03/11/2008 02:14:04 backupcenter240 bjdb01 backup of client bjdb01 exited with status 6 (the backup failed to back up the requested files)
03/11/2008 02:22:58 backupcenter240 bjdb01 cannot write image to media id 000013, drive index 0, I/O error
03/11/2008 02:23:12 backupcenter240 bjdb01 backup by oracle on client bjdb01 using policy oracle: media write error
03/11/2008 02:23:19 backupcenter240 bjdb01 suspending further backup attempts for client bjdb01, policy oracle, schedule Cumulative-Inc because it has exceeded the configured number of tries
03/11/2008 02:23:19 backupcenter240 bjdb01 backup of client bjdb01 exited with status 6 (the backup failed to back up the requested files)
03/11/2008 02:23:20 backupcenter240 - scheduler exiting - the backup failed to back up the requested files (6)
03/11/2008 09:32:42 backupcenter240 data03 cannot write image to media id 000016, drive index 0, I/O error
03/11/2008 09:32:53 backupcenter240 data03 DOWN'ing drive index 0, it has had at least 3 errors in last 12 hour(s)
03/11/2008 09:32:55 backupcenter240 data03 backup by oracle on client data03 using policy bjdb03-ora: media write error
03/11/2008 09:33:02 backupcenter240 data03 backup of client data03 exited with status 6 (the backup failed to back up the requested files)
03/11/2008 10:48:34 backupcenter240 data03 media manager terminated during mount of media id 000016, possible media mount timeout
03/11/2008 10:48:36 backupcenter240 data03 media manager terminated by parent process
03/11/2008 10:48:37 backupcenter240 data03 backup by oracle on client data03 using policy bjdb03-ora: the backup failed to back up the requested files
03/11/2008 10:48:38 backupcenter240 data03 suspending further backup attempts for client data03, policy bjdb03-ora, schedule diff because it has exceeded the configured number of tries
03/11/2008 10:48:38 backupcenter240 data03 backup of client data03 exited with status 6 (the backup failed to back up the requested files)
03/11/2008 13:55:03 backupcenter240 bpexpdate Could not build host list: client hostname could not be found
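The report says drive index 0 was taken down because "it has had at least 3 errors in last 12 hour(s)". The sketch below replays the three write-error timestamps from the report against that rule. The threshold and window come from the log message itself; how NetBackup actually counts errors internally is not shown in the source, so the trailing-window logic and the name `first_trip` are assumptions made for illustration.

```python
from datetime import datetime, timedelta

# Timestamps of the "cannot write image ... drive index 0" lines in the report above.
WRITE_ERRORS = [
    "03/11/2008 02:13:34",
    "03/11/2008 02:22:58",
    "03/11/2008 09:32:42",
]


def first_trip(timestamps, threshold=3, window_hours=12):
    """Return the time of the error that brings the count inside the
    trailing window up to the threshold, or None if that never happens."""
    window = timedelta(hours=window_hours)
    seen = []
    for text in timestamps:
        now = datetime.strptime(text, "%m/%d/%Y %H:%M:%S")
        seen.append(now)
        # Keep only the errors that still fall inside the trailing window.
        seen = [t for t in seen if now - t <= window]
        if len(seen) >= threshold:
            return now
    return None


if __name__ == "__main__":
    print(first_trip(WRITE_ERRORS))  # 2008-03-11 09:32:42
```

Under these assumptions the threshold is reached at 09:32:42, and the report shows the drive being marked down at 09:32:53. The three errors involve two different media ids (000013 and 000016) but the same drive index.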

Looking further into the detailed logs revealed a large number of errors:

03/11/2008 18:23:59 backupcenter240 - cleaning job DB
03/11/2008 18:23:59 backupcenter240 - all drives are down for the specified robot number = 0, robot type = TLD and density = hcart
03/11/2008 18:23:59 backupcenter240 - no drives up on storage unit <backupcenter240-hcart-robot-tld-0>
03/11/2008 18:24:00 bjdb01 - all drives are down for the specified robot number = 0, robot type = TLD and density = hcart
03/11/2008 18:24:00 backupcenter240 - no drives up on storage unit <bjdb01-hcart-robot-tld-0>
03/11/2008 18:24:31 backupcenter240 - all drives are down for the specified robot number = 0, robot type = TLD and density = hcart
03/11/2008 18:24:31 backupcenter240 - no drives up on storage unit <unit_99>
03/11/2008 18:24:32 backupcenter240 - all drives are down for the specified robot number = 0, robot type = TLD and density = hcart
03/11/2008 18:24:32 backupcenter240 - no drives up on storage unit <unit_data>
03/11/2008 18:24:32 backupcenter240 data03 skipping backup of client data03, policy bjdb03-ora, schedule diff because it has exceeded the configured number of tries

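Judging from these messages, the robot (the library's robotic arm) appears to be at fault. If that is the case, it would also explain why the two backup errors differ. When a tape filled up, the robot tried to load a new tape and failed; the backup that was running at that moment could no longer write and reported that no media was available. Subsequent backups had no writable tape at all because of the robot fault, and therefore reported that communication with the NetBackup server had not been initiated.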
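One way to check whether the failures share a single component is to list which storage units report "no drives up" and which robot they are attributed to. This is a small sketch over lines copied from the detailed log above; the helper names `affected_units` and `affected_robots` are made up for the example.

```python
import re

# Lines copied from the detailed log above.
DETAIL_LOG = """\
03/11/2008 18:23:59 backupcenter240 - all drives are down for the specified robot number = 0, robot type = TLD and density = hcart
03/11/2008 18:23:59 backupcenter240 - no drives up on storage unit <backupcenter240-hcart-robot-tld-0>
03/11/2008 18:24:00 bjdb01 - all drives are down for the specified robot number = 0, robot type = TLD and density = hcart
03/11/2008 18:24:00 backupcenter240 - no drives up on storage unit <bjdb01-hcart-robot-tld-0>
03/11/2008 18:24:31 backupcenter240 - all drives are down for the specified robot number = 0, robot type = TLD and density = hcart
03/11/2008 18:24:31 backupcenter240 - no drives up on storage unit <unit_99>
03/11/2008 18:24:32 backupcenter240 - all drives are down for the specified robot number = 0, robot type = TLD and density = hcart
03/11/2008 18:24:32 backupcenter240 - no drives up on storage unit <unit_data>
"""

UNIT_RE = re.compile(r"no drives up on storage unit <([^>]+)>")
ROBOT_RE = re.compile(
    r"all drives are down for the specified robot number = (\d+), "
    r"robot type = (\w+) and density = (\w+)"
)


def affected_units(log):
    """Storage units that reported having no drive in the UP state."""
    return sorted(set(UNIT_RE.findall(log)))


def affected_robots(log):
    """Distinct (robot number, robot type, density) triples reported as down."""
    return sorted(set(ROBOT_RE.findall(log)))


if __name__ == "__main__":
    print(affected_units(DETAIL_LOG))
    print(affected_robots(DETAIL_LOG))
```

All four storage units resolve to the same robot, TLD number 0 with density hcart, which fits the idea of a single shared hardware fault rather than four separate configuration problems.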

Next I checked the media reports. The summary showed:

Number of ACTIVE media that, as of now:
There are no ACTIVE media present in the media database

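This further supports the diagnosis above: because of the robot fault, usable tapes cannot be loaded into the drive, so the system has no available media.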

Checking the robot and drive status with tpconfig:

Index  DriveName        DrivePath      Type   Shared  Status
*****  *********        **********     ****   ******  ******
0      IBMULTRIUM-TD10  /dev/rmt/1cbn  hcart  Yes     DOWN
       TLD(0) Definition  DRIVE=1

Currently defined robotics are:
TLD(0) robotic path = /dev/sg/c2t4l1,
volume database host = backupcenter240

The drive under robot TLD(0) is in the DOWN state, so the problem seems to be largely pinned down.
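When this check has to be repeated, the drive table can be parsed instead of read by eye. The sketch below works on the text pasted above and assumes every drive row has exactly six whitespace-separated fields and that drive names contain no spaces; `parse_drives` and `down_drives` are hypothetical helper names, not part of NetBackup.

```python
# Drive table copied from the tpconfig output above.
TPCONFIG_OUTPUT = """\
Index DriveName DrivePath Type Shared Status
***** ********* ********** **** ****** ******
0 IBMULTRIUM-TD10 /dev/rmt/1cbn hcart Yes DOWN
TLD(0) Definition DRIVE=1
"""


def parse_drives(text):
    """Parse drive rows: six fields, the first one a numeric drive index."""
    drives = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 6 and fields[0].isdigit():
            index, name, path, dtype, shared, status = fields
            drives.append({
                "index": int(index),
                "name": name,
                "path": path,
                "type": dtype,
                "shared": shared,
                "status": status,
            })
    return drives


def down_drives(text):
    """Return the drives whose status column is anything other than UP."""
    return [d for d in parse_drives(text) if d["status"] != "UP"]


if __name__ == "__main__":
    for drive in down_drives(TPCONFIG_OUTPUT):
        print(drive["index"], drive["name"], drive["path"], drive["status"])
```

Header, separator, and robot definition lines are skipped because they either have a different field count or do not start with a numeric index.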

Next, I tried testing the robot with robtest:

bash-2.03# robtest
Configured robots with local control supporting test utilities:
TLD(0) robotic path = /dev/sg/c2t4l1

Robot Selection
---------------
1) TLD 0
2) none/quit
Enter choice: 1

Robot selected: TLD(0) robotic path = /dev/sg/c2t4l1

Invoking robotic test utility:
/usr/openv/volmgr/bin/tldtest -r /dev/sg/c2t4l1 -d1 /dev/rmt/1cbn

Opening /dev/sg/c2t4l1
MODE_SENSE complete
Enter tld commands (? returns help information)
?

To exit the utility, type q or Q.

init - Initialize element status
initrange <d#|s#|p#|t> [#]- Init element status range
allow - Allow media removal
prevent - Prevent media removal
extend - Extend media access port
retract - Retract media access port
mode - Mode sense
m <from> <to> - Move medium
pos <to> - Position to drive or slot
s [d|p|t|s [n]] [raw] - Read element status
inquiry - Display vendor and product ID
rezero - Rezero unit
inport - Ready inport (media access port)
debug - Toggle debug mode for this utility
test_ready - Send a TEST UNIT READY to the device

<from> <to> specifies drive (d#), slot (s#), media access port (p#),
or transport (t#)
<d#|s#|p#|t#> is drive #, slot #, media access port #, or transport #
[#] is number of elements for d, s, p, or t
NOTE - drive # is 1 - Number of drives
slot # is 1 - Number of slots
media access port # is 1 - Number of media access port elements
transport # is 1 - Number of transports
<type> = (d)rive, (s)lot, media access (p)ort, or (t)ransport

unload <drive> - Issue SCSI unload
<drive> = d1 or 1, d2 or 2, d3 or 3 ... d648 or 648

inquiry
Inquiry_data: STK L40 0213
test_ready
Unit is ready
q

Robot Selection
---------------
1) TLD 0
2) none/quit
Enter choice:

After issuing the test_ready command and waiting a while, the status returned to normal:

Index  DriveName        DrivePath      Type   Shared  Status
*****  *********        **********     ****   ******  ******
0      IBMULTRIUM-TD10  /dev/rmt/1cbn  hcart  Yes     UP
       TLD(0) Definition  DRIVE=1

Currently defined robotics are:
TLD(0) robotic path = /dev/sg/c2t4l1,
volume database host = backupcenter240

Now to try a backup:

$ rman target /

Recovery Manager: Release 9.2.0.4.0 - 64bit Production

Copyright (c) 1995, 2002, Oracle Corporation. All rights reserved.

connected to target database: BJDB01 (DBID=3255963758)

RMAN> backup current controlfile;

Starting backup at 11-MAR-08
using target database controlfile instead of recovery catalog
allocated channel: ORA_SBT_TAPE_1
channel ORA_SBT_TAPE_1: sid=19 devtype=SBT_TAPE
channel ORA_SBT_TAPE_1: VERITAS NetBackup for Oracle - Release 5.0GA (2003103006)
channel ORA_SBT_TAPE_1: starting full datafile backupset
channel ORA_SBT_TAPE_1: specifying datafile(s) in backupset
including current controlfile in backupset
channel ORA_SBT_TAPE_1: starting piece 1 at 11-MAR-08
channel ORA_SBT_TAPE_1: finished piece 1 at 11-MAR-08
piece handle=ttjb17ur_1_1 comment=API Version 2.0,MMS Version 5.0.0.0
channel ORA_SBT_TAPE_1: backup set complete, elapsed time: 00:04:56
Finished backup at 11-MAR-08

Starting Control File Autobackup at 11-MAR-08
piece handle=c-3255963758-20080311-00 comment=API Version 2.0,MMS Version 5.0.0.0
Finished Control File Autobackup at 11-MAR-08

This backup attempt finally succeeded.

Unfortunately, only small backups seemed to work. As soon as the backup was larger, the earlier error came back:

RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03009: failure of backup command on ch00 channel at 03/10/2008 05:14:15
ORA-19502: write error on file "bk_26552_1_648968690", blockno 664577 (blocksize=512)
ORA-27030: skgfwrt: sbtwrite2 returned error
ORA-19511: Error received from media manager layer, error text:
VxBSASendData: Failed with error:
Server Status: Communication with the server has not been iniatated or the server status has not been retrieved from the server.
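The ORA-19502 line gives a rough idea of how far the write got before it failed. The sketch below multiplies the block number by the block size, assuming that `blockno` counts `blocksize`-byte blocks from the start of the backup piece; that reading is an interpretation, not something the source states, and `approx_offset` is a name chosen for the example.

```python
import re

# Line copied from the error stack above.
WRITE_ERROR = (
    'ORA-19502: write error on file "bk_26552_1_648968690", '
    "blockno 664577 (blocksize=512)"
)

PATTERN = re.compile(
    r'ORA-19502: write error on file "([^"]+)", '
    r"blockno (\d+) \(blocksize=(\d+)\)"
)


def approx_offset(line):
    """Return (piece name, approximate byte offset of the failed write),
    or None if the line is not an ORA-19502 write error."""
    match = PATTERN.search(line)
    if match is None:
        return None
    piece = match.group(1)
    blockno = int(match.group(2))
    blocksize = int(match.group(3))
    # Assumption: blocks are counted from the start of the piece.
    return piece, blockno * blocksize


if __name__ == "__main__":
    piece, offset = approx_offset(WRITE_ERROR)
    print(f"{piece}: about {offset / 2**20:.1f} MiB written before the failure")
```

Under that assumption the write failed roughly 324.5 MiB into the piece, which is consistent with small backups completing while larger ones fail partway through.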

In addition, the server-side log showed a large number of I/O errors:

03/12/2008 09:42:51 backupcenter240 bjdb01 cannot write image to media id 000016, drive index 0, I/O error
03/12/2008 09:42:51 backupcenter240 bjdb01 FREEZING media id 000016, it has had at least 3 errors in the last 12 hour(s)
03/12/2008 09:43:08 backupcenter240 bjdb01 CLIENT bjdb01 POLICY oracle SCHED Default-Application-Backup EXIT STATUS 84 (media write error)
03/12/2008 09:43:08 backupcenter240 bjdb01 backup by oracle on client bjdb01: media write error

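So this was no longer just a software problem. The vendor eventually confirmed that the read/write head in the tape library was faulty, and replacing the part resolved the issue.

Reposted from: https://www.cnblogs.com/myitworld/archive/2008/04/22/2214883.html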
