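This article describes how to create a STANDBY database for a RAC environment.

Because of length limits, and because I ran into a large number of bugs, the write-up had to be split into several articles.

There were too many errors to record in a single article, so this installment continues with the problems encountered while creating the STANDBY database.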



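I had just run into plenty of problems while performing DUPLICATE DATABASE in a RAC environment. Since the two commands are quite similar, I assumed this attempt would go more smoothly. Instead, it produced roughly twice as many problems as the plain DUPLICATE, and almost none of them had appeared during that earlier operation.

Running DUPLICATE DATABASE FOR STANDBY produced the following error: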

bash-3.00$ rman target sys/test@rac11g auxiliary sys/test@rac11g1_s


Recovery Manager: Release 11.1.0.6.0 - Production on Tue Sep 9 16:28:54 2008


Copyright (c) 1982, 2007, Oracle.  All rights reserved.


connected to target database: RAC11G (DBID=1712482917)


connected to auxiliary database: RAC11G (not mounted)


RMAN> run


2> {


3> allocate channel c1 device type disk connect sys/test@rac11g1;


4> allocate channel c2 device type disk connect sys/test@rac11g2;


5> allocate auxiliary channel ac1 device type disk;


6> allocate auxiliary channel ac2 device type disk;


7> duplicate target database for standby


8> db_file_name_convert '/dev/vx/rdsk/datavg', '+DATA/RAC11G'


9> dorecover


10> from active database


11> spfile


12> parameter_value_convert '/dev/vx/rdsk/datavg', '+DATA/RAC11G'


13> set log_file_name_convert '/dev/vx/rdsk/datavg', '+DATA/RAC11G'


14> set fal_client='RAC11G_S'


15> set fal_server='RAC11G'


16> set log_archive_dest_1='LOCATION=+DATA/RAC11G VALID_FOR=(ALL_LOGFILES,ALL_ROLES) DB_UNIQUE_NAME=rac11g_s'


17> set log_archive_dest_2='SERVICE=rac11g LGWR ASYNC VALID_FOR=(ONLINE_LOGFILES,PRIMARY_ROLE) DB_UNIQUE_NAME=rac11g'


18> set standby_archive_dest='+DATA/RAC11G'


19> set db_unique_name='rac11g_s';


20> }


using target database control file instead of recovery catalog


allocated channel: c1


channel c1: SID=619 instance=rac11g1 device type=DISK


allocated channel: c2


channel c2: SID=119 instance=rac11g2 device type=DISK


allocated channel: ac1


channel ac1: SID=112 device type=DISK


allocated channel: ac2


channel ac2: SID=39 device type=DISK


Starting Duplicate Db at 09-SEP-08


contents of Memory Script.:


{


  backup as copy reuse


  file  '/data/oracle/product/11.1/database/dbs/orapwrac11g2' auxiliary format


'/data/oracle/product/11.1/database/dbs/orapwrac11g1'   file


'/dev/vx/rdsk/datavg/rac11g_spfile' auxiliary format


'+DATA/rac11g/spfilerac11g.ora'   ;


  sql clone "alter system set spfile= ''+DATA/rac11g/spfilerac11g.ora''";


}


executing Memory Script


Starting backup at 09-SEP-08


RMAN-03009: failure of backup command on c1 channel at 09/09/2008 16:29:06


ORA-19505: failed to identify file "/data/oracle/product/11.1/database/dbs/orapwrac11g2"


ORA-27037: unable to obtain file status


SVR4 Error: 2: No such file or directory


Additional information: 3


continuing other job steps, job failed will not be re-run


released channel: c1


released channel: c2


released channel: ac1


released channel: ac2


RMAN-00571: ===========================================================


RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============


RMAN-00571: ===========================================================


RMAN-03002: failure of Duplicate Db command at 09/09/2008 16:29:08


RMAN-03015: error occurred in stored script. Memory Script


RMAN-03009: failure of backup command on c1 channel at 09/09/2008 16:29:06


ORA-19505: failed to identify file "/data/oracle/product/11.1/database/dbs/orapwrac11g2"


ORA-27037: unable to obtain file status


SVR4 Error: 2: No such file or directory


Additional information: 3


RMAN> exit



Recovery Manager complete.


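This error occurs because the service name used to connect to the source database is the cluster-wide RAC service, so the session may well land on instance 2 for the restore, and RMAN then looks for that instance's password file (orapwrac11g2 in the message above). There are two ways to fix it. One is to make sure that both password files, orapwrac11g1 and orapwrac11g2, exist on whichever node the connection reaches. The other is simpler: connect to the source database through a name that resolves to exactly one instance, for example by replacing RAC11G with RAC11G1 here.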
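The first option can be checked before starting RMAN. The following is only a minimal sketch: the function name `missing_password_files` is hypothetical, and it assumes the usual Unix naming convention of `orapw<SID>` under `$ORACLE_HOME/dbs`, which matches the paths in the error message.

```shell
#!/bin/sh
# Print the name of every expected password file that is missing
# from a dbs directory.
# Usage: missing_password_files DBS_DIR SID [SID...]
# (hypothetical helper, not part of any Oracle tool)
missing_password_files() {
    dbs_dir=$1
    shift
    for sid in "$@"; do
        # -e follows symbolic links, so a dangling link counts as missing.
        if [ ! -e "$dbs_dir/orapw$sid" ]; then
            echo "orapw$sid"
        fi
    done
}
```

Run on each source node as `missing_password_files "$ORACLE_HOME/dbs" rac11g1 rac11g2`; any name it prints is a file that RMAN could fail on if the session lands on that instance.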

bash-3.00$ rman target sys/test@rac11g auxiliary sys/test@rac11g1_s


Recovery Manager: Release 11.1.0.6.0 - Production on Tue Sep 9 16:31:00 2008


Copyright (c) 1982, 2007, Oracle.  All rights reserved.


connected to target database: RAC11G (DBID=1712482917)


connected to auxiliary database: RAC11G (not mounted)


RMAN> run


2> {


3> allocate channel c1 device type disk connect sys/test@rac11g1;


4> allocate channel c2 device type disk connect sys/test@rac11g2;


5> allocate auxiliary channel ac1 device type disk;


6> allocate auxiliary channel ac2 device type disk;


7> duplicate target database for standby


8> db_file_name_convert '/dev/vx/rdsk/datavg', '+DATA/RAC11G'


9> dorecover


10> from active database


11> spfile


12> parameter_value_convert '/dev/vx/rdsk/datavg', '+DATA/RAC11G'


13> set log_file_name_convert '/dev/vx/rdsk/datavg', '+DATA/RAC11G'


14> set fal_client='RAC11G_S'


15> set fal_server='RAC11G'


16> set log_archive_dest_1='LOCATION=+DATA/RAC11G VALID_FOR=(ALL_LOGFILES,ALL_ROLES) DB_UNIQUE_NAME=rac11g_s'


17> set log_archive_dest_2='SERVICE=rac11g LGWR ASYNC VALID_FOR=(ONLINE_LOGFILES,PRIMARY_ROLE) DB_UNIQUE_NAME=rac11g'


18> set standby_archive_dest='+DATA/RAC11G'


19> set db_unique_name='rac11g_s';


20> }


using target database control file instead of recovery catalog


allocated channel: c1


channel c1: SID=619 instance=rac11g1 device type=DISK


allocated channel: c2


channel c2: SID=119 instance=rac11g2 device type=DISK


allocated channel: ac1


channel ac1: SID=113 device type=DISK


allocated channel: ac2


channel ac2: SID=39 device type=DISK


Starting Duplicate Db at 09-SEP-08


contents of Memory Script.:


{


  backup as copy reuse


  file  '/data/oracle/product/11.1/database/dbs/orapwrac11g2' auxiliary format


'/data/oracle/product/11.1/database/dbs/orapwrac11g1'   file


'/dev/vx/rdsk/datavg/rac11g_spfile' auxiliary format


'+DATA/rac11g/spfilerac11g.ora'   ;


  sql clone "alter system set spfile= ''+DATA/rac11g/spfilerac11g.ora''";


}


executing Memory Script


Starting backup at 09-SEP-08


Finished backup at 09-SEP-08


sql statement: alter system set spfile= ''+DATA/rac11g/spfilerac11g.ora''


contents of Memory Script.:


{


  sql clone "alter system set  log_file_name_convert =


''/dev/vx/rdsk/datavg'', ''+DATA/RAC11G'' comment=


'''' scope=spfile";


  sql clone "alter system set  fal_client =


''RAC11G_S'' comment=


'''' scope=spfile";


  sql clone "alter system set  fal_server =


''RAC11G'' comment=


'''' scope=spfile";


  sql clone "alter system set  log_archive_dest_1 =


''LOCATION=+DATA/RAC11G VALID_FOR=(ALL_LOGFILES,ALL_ROLES) DB_UNIQUE_NAME=rac11g_s'' comment=


'''' scope=spfile";


  sql clone "alter system set  log_archive_dest_2 =


''SERVICE=rac11g LGWR ASYNC VALID_FOR=(ONLINE_LOGFILES,PRIMARY_ROLE) DB_UNIQUE_NAME=rac11g'' comment=


'''' scope=spfile";


  sql clone "alter system set  standby_archive_dest =


''+DATA/RAC11G'' comment=


'''' scope=spfile";


  sql clone "alter system set  db_unique_name =


''rac11g_s'' comment=


'''' scope=spfile";


  shutdown clone immediate;


  startup clone nomount ;


}


executing Memory Script


sql statement: alter system set  log_file_name_convert =  ''/dev/vx/rdsk/datavg'', ''+DATA/RAC11G'' comment= '''' scope=spfile


released channel: c1


released channel: c2


released channel: ac1


released channel: ac2


RMAN-00571: ===========================================================


RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============


RMAN-00571: ===========================================================


RMAN-03002: failure of Duplicate Db command at 09/09/2008 16:31:11


RMAN-03015: error occurred in stored script. Memory Script


RMAN-03009: failure of sql command on clone_default channel at 09/09/2008 16:31:11


RMAN-11003: failure during parse/execution of SQL statement: alter system set  log_file_name_convert =  '/dev/vx/rdsk/datavg', '+DATA/RAC11G' comment= '' scope=spfile


ORA-17510: Attempt to do i/o beyond file size


ORA-17510: Attempt to do i/o beyond file size


RMAN> exit



Recovery Manager complete.


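After some investigation, this looks like it is caused by an ASM bug. There are a number of reports about ORA-17510, but none that resemble this situation.

Since I could not find a fix, the only option was to work around the bug. From this point on I created an SPFILE locally in advance, with every required parameter already changed, so that no SPFILE parameters need to be set in the DUPLICATE command. That way Oracle does not execute any ALTER SYSTEM statements and therefore never has to extend the SPFILE dynamically.

I then used SQLPLUS to create the local SPFILE and started the instance in NOMOUNT state, but connecting through RMAN failed: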
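As a rough illustration of this workaround, the settings that the failed run passed through SET clauses can be written into a parameter file up front. This is a sketch under assumptions, not the exact file used here: the function name `write_standby_overrides` and the file name in the comments are hypothetical, the values are taken from the DUPLICATE command above, and the remaining parameters a real instance needs (db_name, control_files and so on) are deliberately left out.

```shell
#!/bin/sh
# Append the standby-specific settings to a text parameter file.
# Usage: write_standby_overrides PFILE
# (hypothetical helper; values copied from the DUPLICATE command above)
write_standby_overrides() {
    cat >> "$1" <<'EOF'
*.db_unique_name='rac11g_s'
*.db_file_name_convert='/dev/vx/rdsk/datavg','+DATA/RAC11G'
*.log_file_name_convert='/dev/vx/rdsk/datavg','+DATA/RAC11G'
*.fal_client='RAC11G_S'
*.fal_server='RAC11G'
*.log_archive_dest_1='LOCATION=+DATA/RAC11G VALID_FOR=(ALL_LOGFILES,ALL_ROLES) DB_UNIQUE_NAME=rac11g_s'
*.log_archive_dest_2='SERVICE=rac11g LGWR ASYNC VALID_FOR=(ONLINE_LOGFILES,PRIMARY_ROLE) DB_UNIQUE_NAME=rac11g'
*.standby_archive_dest='+DATA/RAC11G'
EOF
}

# Afterwards, in SQL*Plus connected as SYSDBA on the standby host
# (the path is an assumption for illustration):
#   create spfile from pfile='/tmp/initstandby.ora';
#   startup nomount
```

With the parameters already in place, the DUPLICATE command can drop the SPFILE clause and all of its SET lines.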

bash-3.00$ rman target sys/test@rac11g auxiliary sys/test@rac11g1_s


Recovery Manager: Release 11.1.0.6.0 - Production on Tue Sep 9 16:44:09 2008


Copyright (c) 1982, 2007, Oracle.  All rights reserved.


connected to target database: RAC11G (DBID=1712482917)


RMAN-00571: ===========================================================


RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============


RMAN-00571: ===========================================================


RMAN-00554: initialization of internal recovery manager package failed


RMAN-04006: error from auxiliary database: ORA-01031: insufficient privileges


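This happened because the earlier RMAN attempt had overwritten the password file, so it has to be recreated. After copying the password file from the source database to the local host again, I reran the command: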

bash-3.00$ rman target sys/test@rac11g auxiliary sys/test@rac11g1_s


Recovery Manager: Release 11.1.0.6.0 - Production on Tue Sep 9 17:25:24 2008


Copyright (c) 1982, 2007, Oracle.  All rights reserved.


connected to target database: RAC11G (DBID=1712482917)


connected to auxiliary database: RAC11GS (not mounted)


RMAN> run


2> {


3> allocate channel c1 device type disk connect sys/test@rac11g1;


4> allocate channel c2 device type disk connect sys/test@rac11g2;


5> allocate auxiliary channel ac1 device type disk;


6> allocate auxiliary channel ac2 device type disk;


7> duplicate target database for standby


8> dorecover


9> from active database;


10> }


using target database control file instead of recovery catalog


allocated channel: c1


channel c1: SID=621 instance=rac11g1 device type=DISK


allocated channel: c2


channel c2: SID=628 instance=rac11g2 device type=DISK


allocated channel: ac1


channel ac1: SID=333 instance=rac11g1 device type=DISK


allocated channel: ac2


channel ac2: SID=306 instance=rac11g1 device type=DISK


Starting Duplicate Db at 09-SEP-08


contents of Memory Script.:


{


  backup as copy reuse


  file  '/data/oracle/product/11.1/database/dbs/orapwrac11g1' auxiliary format


'/data/oracle/product/11.1/database/dbs/orapwrac11g1'   ;


}


executing Memory Script


Starting backup at 09-SEP-08


Finished backup at 09-SEP-08


contents of Memory Script.:


{


  backup as copy current controlfile for standby auxiliary format  '+DATA/rac11g/rac11g_control_1';


  restore clone controlfile to  '+DATA/rac11g/rac11g_control_2' from


'+DATA/rac11g/rac11g_control_1';


  restore clone controlfile to  '+DATA/rac11g/rac11g_control_3' from


'+DATA/rac11g/rac11g_control_1';


  sql clone 'alter database mount standby database';


}


executing Memory Script


Starting backup at 09-SEP-08


channel c1: starting datafile copy


copying standby control file


released channel: c1


released channel: c2


released channel: ac1


released channel: ac2


RMAN-00571: ===========================================================


RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============


RMAN-00571: ===========================================================


RMAN-03002: failure of Duplicate Db command at 09/09/2008 17:25:35


RMAN-03015: error occurred in stored script. Memory Script


RMAN-03009: failure of backup command on c1 channel at 09/09/2008 17:25:35


ORA-17629: Cannot connect to the remote database server


ORA-17627: ORA-01031: insufficient privileges


ORA-17629: Cannot connect to the remote database server


RMAN> exit



Recovery Manager complete.


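The cause of this problem turned out to be fairly involved. Testing the connection locally showed nothing wrong, and connecting to the auxiliary instance as sys from the remote side showed nothing abnormal either.

Yet the error appeared every time the DUPLICATE command ran.

With no other information to go on, I could only reason from the error messages. The failure occurs on CHANNEL c1, and it is a failure to connect to a remote database server. That means the error is raised when the source database tries to connect to the auxiliary instance during execution, which suggests a problem with the password file.

But command-line tests showed no problem at all. And the earlier attempt using the SPFILE clause went through a similar step, so why did that one not fail?

Looking at the script Oracle executed revealed the problem. Oracle performed the following operation: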

contents of Memory Script.:


{


  backup as copy reuse


  file  '/data/oracle/product/11.1/database/dbs/orapwrac11g1' auxiliary format


'/data/oracle/product/11.1/database/dbs/orapwrac11g1'   ;


}


executing Memory Script


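This operation did not take place in the earlier SPFILE-based run, which is why that run did not report the error while this one does.

But this raises another question. The purpose of this operation is to synchronize the remote password file to the local server so that later SYS connections can log in normally. In itself the operation is harmless, and in fact I had done essentially the same thing at the operating-system level earlier when I copied the source database's password file to the local host. So why does the source database fail to connect to the auxiliary instance after Oracle's rman performs it?

Checking the configuration of the source database revealed the cause. The source database uses raw devices, and the password file in its $ORACLE_HOME/dbs directory is merely a symbolic link.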

$ ls -l


total 38404


-rw-rw----   1 oracle   oinstall    1552 Sep  5 16:09 hc_rac11g1.dat


-rw-r--r--   1 oracle   oinstall    8385 Sep 11  1998 init.ora


-rw-r--r--   1 oracle   oinstall   12920 May  3  2001 initdw.ora


-rw-r-----   1 oracle   oinstall      43 Jul 16 15:48 initrac11g1.ora


lrwxrwxrwx   1 oracle   oinstall      34 Jul 16 14:39 orapwrac11g1 -> /dev/vx/rdsk/datavg/rac11g_pwdfile


-rw-r-----   1 oracle   oinstall 19611648 Sep  8 18:27 snapcf_rac11g1.f


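Most likely Oracle went wrong when backing up this file: instead of reading it as a raw device, it copied it as an ordinary operating-system file, so the password file restored on the auxiliary instance was unusable.

So I first removed the link and copied the password file out of the raw device into the local directory, so that RMAN could copy the file normally: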
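This condition is easy to detect ahead of time. The sketch below is a hypothetical helper (the name `password_file_link_target` is not from any Oracle tool) that reports where a password file really lives when it is a symbolic link. It parses `ls -ld` output rather than calling `readlink`, on the assumption that `readlink` may not be installed on an older Solaris host like this one.

```shell
#!/bin/sh
# Print the target of a password file that is a symbolic link.
# Prints nothing and returns 1 for a regular file or a missing path.
# Usage: password_file_link_target FILE
password_file_link_target() {
    if [ -L "$1" ]; then
        # ls -ld shows "name -> target"; keep only the part after the arrow.
        ls -ld "$1" | sed 's/.* -> //'
    else
        return 1
    fi
}
```

Running `password_file_link_target "$ORACLE_HOME/dbs/orapwrac11g1"` against the listing above would print `/dev/vx/rdsk/datavg/rac11g_pwdfile`, which is the sign that the file should be turned into a regular copy before RMAN tries to ship it.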

$ dd if=/dev/vx/rdsk/datavg/rac11g_pwdfile of=/data/rac11g_pwfile


204800+0 records in


204800+0 records out


bash-3.00$ rm orapwrac11g1


bash-3.00$ rm orapwrac11g2


bash-3.00$ cp /data/rac11g_pwfile orapwrac11g1


Running DUPLICATE again, the insufficient-privileges problem was resolved.
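The manual dd, rm and cp steps can be folded into one small function. This is a sketch only: the name `materialize_password_file` is hypothetical, and it makes the same assumption as the commands above, namely that reading the whole link target with dd yields a usable password file.

```shell
#!/bin/sh
# Replace a symbolic-link password file with a regular file that
# holds the same contents.
# Usage: materialize_password_file FILE
# (hypothetical helper mirroring the manual dd / rm / cp steps)
materialize_password_file() {
    pw_file=$1
    # Nothing to do unless the path is a symbolic link.
    [ -L "$pw_file" ] || return 0
    pw_copy="$pw_file.copy.$$"
    # dd opens the link target, so this reads the raw device contents.
    dd if="$pw_file" of="$pw_copy" 2>/dev/null || return 1
    # Remove the link itself (not its target) and move the copy into place.
    rm "$pw_file" && mv "$pw_copy" "$pw_file"
}
```

Note that dd reads the raw device to its end, so the resulting file is as large as the device (204800 records in the transcript above), not just the size of a normal password file.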


