oracle adg不传输归档,Oracle 11g Active DataGuard(ADG) 优化及 duplicate hang 诊断案例

前段时间因为硬件原因,standby DB crash,主库删除了日志且无备份,后期需要重建,该环境是主备两套11.2.0.3 2-NODES RAC ON EXADATA Machine ,存储在ASM, 数据库70TB 大小, 闲时每天1T归档日志, 忙时每天可达4T+归档, 之前朋友重建DG一直使用主机上一个shell 放后台,我查看了该shell 确实不算复杂,但是duplicate 一直hang在开始。下面我只是简单记录一下这个过程,及后期的DATAGUARD 相关的一些优化细节。因为数据库太大,这里使用了duplicate for standby from active DATAbase, 另个每天日志产生量大,所以在空间有限的情况下要格外注意速度。

附一下shell的调用

1,create pfile (not using spfile)

DB_NAME=orarpt

DB_UNIQUE_NAME=rptstby

DB_BLOCK_SIZE=16384

2, shell adg_setup.sh

export ORACLE_SID=orarpt2

rman msglog /home/oracle/work/adg_setup.log << EOF

connect target sys/xxxxxx@rptpri2

connect auxiliary sys/xxxxx@rptstby

run {

allocate channel ch001 type disk;

allocate channel ch002 type disk;

allocate channel ch003 type disk;

allocate channel ch004 type disk;

allocate channel ch006 type disk;

allocate channel ch007 type disk;

allocate channel ch008 type disk;

allocate auxiliary channel ch005 type disk;

duplicate target DATAbase for standby from active database NOFILENAMECHECK

spfile

set db_unique_name 'rptstby'

set db_create_file_dest '+DATA'

set standby_file_management 'AUTO'

set remote_listener 'anbob-scan:1521'

set log_archive_dest_2 ''

set log_archive_dest_state_2 'DEFER'

set fal_server 'rptpri1','rptpri2'

set fal_client 'rptstby'

set control_files='+DATA/RPTSTBY/CONTROLFILE/ctrl1','+DATA/RPTSTBY/CONTROLFILE/ctrl2'

set audit_file_dest '/oracle/app/oracle/admin/rptstby/adump'

;

release channel ch001;

release channel ch002;

release channel ch003;

release channel ch004;

release channel ch005;

release channel ch006;

release channel ch007;

release channel ch008;

}

EOF

Tip:

调用很简单nohup 放到后台,但是查看日志, 它睡的好沉…

...

allocated channel: c8

channel c8: SID=834 device type=DISK

Starting Duplicate Db at 04-FEB-16

contents of Memory Script:

{

backup as copy reuse

targetfile '/oracle/app/oracle/product/11.2.0.3/dbhome_1/dbs/orapworarpt2' auxiliary format

'/oracle/app/oracle/product/11.2.0.3/dbhome_1/dbs/orapworarpt2' targetfile

'+DATA/orarpt/spfileorarpt.ora' auxiliary format

'+DATA/rptstby/parameterfile/spfile.1123.886959551' ;

sql clone "alter system set spfile= ''+DATA/rptstby/parameterfile/spfile.1123.886959551''";

}

executing Memory Script

Starting backup at 04-FEB-16 --hang here

note:

正常情况下早应该set newname, copy datafile了, 咨询了下其他DBA说真有几个小时不动的, 下班回家,第二天发现日志还在那里等着我,那就有点不正常了。

[oracle@anbob02 (orarpt2)work]$ ps -ef|grep adg

oracle 20574 16423 0 Feb03 pts/0 00:00:00 sh adg_setup.sh

oracle 20578 20574 0 Feb03 pts/0 00:00:02 rman msglog /home/oracle/work/adg_setup.log

[oracle@anbob02 (orarpt2)work]$ strace -fp 20578 -o rman.trc -ttt

Process 20578 attached - interrupt to quit

[oracle@anbob02 (orarpt2)work]$ tail -f rman.trc

20578 1454551745.092009 nanosleep({10, 0}, NULL) = 0

20578 1454551755.092238 write(12, "\0)\0\0\6\0\0\0\0\0\3N\221\2\0\0\0\1\0\0\0\340\0\0\0\0\0\0\0\7\2\301"..., 41) = 41

20578 1454551755.092309 read(12, "\0\334\0\0\6\0\0\0\0\0\6\1\2\0\1\0\0\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0"..., 8208) = 220

20578 1454551755.093573 nanosleep({10, 0}, NULL) = 0

20578 1454551765.093786 write(12, "\0)\0\0\6\0\0\0\0\0\3N\222\2\0\0\0\1\0\0\0\340\0\0\0\0\0\0\0\7\2\301"..., 41) = 41

20578 1454551765.093858 read(12, "\0\334\0\0\6\0\0\0\0\0\6\1\2\0\1\0\0\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0"..., 8208) = 220

20578 1454551765.094551 nanosleep({10, 0}, NULL) = 0

20578 1454551775.094778 write(12, "\0)\0\0\6\0\0\0\0\0\3N\223\2\0\0\0\1\0\0\0\340\0\0\0\0\0\0\0\7\2\301"..., 41) = 41

20578 1454551775.094842 read(12, "\0\334\0\0\6\0\0\0\0\0\6\1\2\0\1\0\0\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0"..., 8208) = 220

20578 1454551775.095816 nanosleep({10, 0}, NULL) = 0

20578 1454551785.096035 write(12, "\0)\0\0\6\0\0\0\0\0\3N\224\2\0\0\0\1\0\0\0\340\0\0\0\0\0\0\0\7\2\301"..., 41) = 41

20578 1454551785.096106 read(12, "\0\334\0\0\6\0\0\0\0\0\6\1\2\0\1\0\0\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0"..., 8208) = 220

20578 1454551785.096849 nanosleep({10, 0}, NULL) = 0

20578 1454551795.097062 write(12, "\0)\0\0\6\0\0\0\0\0\3N\225\2\0\0\0\1\0\0\0\340\0\0\0\0\0\0\0\7\2\301"..., 41) = 41

20578 1454551795.097139 read(12, "\0\334\0\0\6\0\0\0\0\0\6\1\2\0\1\0\0\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0"..., 8208) = 220

20578 1454551795.097882 nanosleep({10, 0}, NULL) = 0

20578 1454551805.098073 write(12, "\0)\0\0\6\0\0\0\0\0\3N\226\2\0\0\0\1\0\0\0\340\0\0\0\0\0\0\0\7\2\301"..., 41) = 41

20578 1454551805.098141 read(12, "\0\334\0\0\6\0\0\0\0\0\6\1\2\0\1\0\0\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0"..., 8208) = 220

20578 1454551805.098974 nanosleep({10, 0}, NULL) = 0

20578 1454551815.099187 write(12, "\0)\0\0\6\0\0\0\0\0\3N\227\2\0\0\0\1\0\0\0\340\0\0\0\0\0\0\0\7\2\301"..., 41) = 41

20578 1454551815.099260 read(12, "\0\334\0\0\6\0\0\0\0\0\6\1\2\0\1\0\0\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0"..., 8208) = 220

20578 1454551815.100527 nanosleep({10, 0}, NULL) = 0

20578 1454551825.100707 write(12, "\0)\0\0\6\0\0\0\0\0\3N\230\2\0\0\0\1\0\0\0\340\0\0\0\0\0\0\0\7\2\301"..., 41) = 41

20578 1454551825.100783 read(12, "\0\334\0\0\6\0\0\0\0\0\6\1\2\0\1\0\0\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0"..., 8208) = 220

20578 1454551825.101540 nanosleep({10, 0}, NULL) = 0

20578 1454551835.101744 write(12, "\0)\0\0\6\0\0\0\0\0\3N\231\2\0\0\0\1\0\0\0\340\0\0\0\0\0\0\0\7\2\301"..., 41) = 41

20578 1454551835.101813 read(12, "\0\334\0\0\6\0\0\0\0\0\6\1\2\0\1\0\0\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0"..., 8208) = 220

20578 1454551835.102565 nanosleep({10, 0}, NULL) = 0

20578 1454551845.102764 write(12, "\0\35\0\0\6\0\0\0\0\0\3\2\232\376\377\377\377\377\377\377\377\0\0\0\0\0\0\0\0", 29) = 29

20578 1454551845.102836 read(12, "\0\26\0\0\6\0\0\0\0\0\10\6\0\0\0\t\1\0\0\0\230!", 8208) = 22

20578 1454551845.103462 write(12, "\1B\0\0\6\0\0\0\0\0\3\3\233\6\0\0\0\0\0\0\0\376\377\377\377\377\377\377\377\26\1\0"..., 322) = 322

20578 1454551845.103521 read(12, "\0\222\0\0\6\0\0\0\0\0\4\1\0\0\0\231!\1\1\0\0\0\0\0\0\0\0\0\6\0\0\0"..., 8208) = 146

20578 1454551845.104163 write(12, "\1\305\0\0\6\0\0\0\0\0\3G\234\370\200\0\0\6\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 453) = 453

20578 1454551845.104215 read(12, "\0\254\0\0\6\0\0\0\0\0\4\1\0\0\0\232!\1\0\0\0\0{\5\0\0\0\0\6\0\0\0"..., 8208) = 172

20578 1454551845.108028 write(12, "\0*\0\0\6\0\0\0\0\0\3N\235\6\0\0\0\1\0\0\0\340\0\0\0\0\0\0\0\7\375\1"..., 42) = 42

20578 1454551845.108079 read(12, "\0\254\0\0\6\0\0\0\0\0\4\1\0\0\0\233!\1\0\0\0\0{\5\0\0\0\0\6\0\0\0"..., 8208) = 172

20578 1454551845.111745 write(12, "\0\21\0\0\6\0\0\0\0\0\3\10\236\6\0\0\0", 17) = 17

20578 1454551845.111793 read(12, "\0\21\0\0\6\0\0\0\0\0\t\1\0\0\0\234!", 8208) = 17

20578 1454551845.112397 write(12, "\0)\0\0\6\0\0\0\0\0\3N\237\2\0\0\0\1\0\0\0\340\0\0\0\0\0\0\0\7\2\301"..., 41) = 41

20578 1454551845.112475 read(12, "\0\334\0\0\6\0\0\0\0\0\6\1\2\0\1\0\0\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0"..., 8208) = 220

20578 1454551845.113188 nanosleep({10, 0}, NULL) = 0

20578 1454551855.113399 write(12, "\0)\0\0\6\0\0\0\0\0\3N\240\2\0\0\0\1\0\0\0\340\0\0\0\0\0\0\0\7\2\301"..., 41) = 41

20578 1454551855.113477 read(12, "\0\334\0\0\6\0\0\0\0\0\6\1\2\0\1\0\0\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0"..., 8208) = 220

20578 1454551855.114217 nanosleep({10, 0}, NULL) = 0

...

note:

strace 看进程好像在一直循环nanosleep, 查看了句柄41 是socket. 下面查看rman 是否有堵塞和等待。

SQL> select S.SID, S.SERIAL#, S.INST_ID , P.SPID SPID, S.USERNAME, S.STATUS, S.OSUSER || '@' || S.MACHINE OSINFO, SUBSTR(S.PROGRAM,0,20) PROGRAM, S.EVENT, S.SECONDS_IN_WAIT SIW

from gv$session s, gv$process p

where s.type <> 'BACKGROUND'

and s.inst_id = p.inst_id

and s.paddr = p.addr and s.program like 'rman%'

SID SERIAL# INST_ID SPID USERNAME STATUS OSINFO PROGRAM EVENT SIW

---- ---------- ---------- -------- -- ----------- - -------- -------------- --------------- ------------------------------ ----------

147 61685 2 16192 SYS INACTIVE oracle@anbob02 rman@anbob02 SQL*Net message from client 4930

1014 26077 2 13700 SYS INACTIVE oracle@anbob02 rman@anbob02 SQL*Net message from client 4909

1833 40727 2 12810 SYS INACTIVE oracle@anbob02 rman@anbob02 SQL*Net message from client 4910

299 60835 2 15130 SYS INACTIVE oracle@anbob02 rman@anbob02 SQL*Net message from client 4933

2653 1403 2 13628 SYS INACTIVE oracle@anbob02 rman@anbob02 SQL*Net message from client 5

196 61379 2 16980 SYS INACTIVE oracle@anbob02 rman@anbob02 SQL*Net message from client 4928

576 45345 2 14183 SYS ACTIVE oracle@anbob02 rman@anbob02 remote db operation 4910

1016 27113 2 14593 SYS INACTIVE oracle@anbob02 rman@anbob02 SQL*Net message from client 4935

55 21301 2 17624 SYS INACTIVE oracle@anbob02 rman@anbob02 SQL*Net message from client 4927

2021 49333 2 15689 SYS INACTIVE oracle@anbob02 rman@anbob02 SQL*Net message from client 4932

Note:

以DG 和wait event 为关键字在MOS中发现RMAN “duplicate From Active Database” Hangs Between 2 Datacenters using Cisco Firewall ( Doc ID 2007835.1 ) ,但是网络确认近期无网络变动,且不存在firewall。 开了个SR也倾向于网络原因, 启用一些EVNET 排查,因使用duplicateauxiliary instance 要以pfile启动, pfile 中增加下面event:

*.event='10294 trace name context forever, level 1: 10298 trace name context forever,level 3:logon trace name krb_trace level 15'

同时启用rman 启用debug

rman debug trace=/tmp/rman_debug msglog=/home/oracle/work/adg_setup_weejar.log << EOF

...

EOF

并且查看了duplicate 进程的 call stack

SQL> oradebug short_stack

ksedsts()+461 select * from v$iostat_network ;

select * from v$iostat_network ;

--无网络IO

# 从上面的event 和debug 中trace 发现了一些,其它没找到有价值的信息

ksfdwtio:count=4294967295 aioflags=0x20 timeout=2147483647

ksfd_osmwat:count=4294967295 intr=0 posted=(nil) tout=2147483647

ksfdwat_internal:count=4294967295 timeout=32 aioflags=2147483647

ksfdwtio:count=4294967295 aioflags=0x20 timeout=2147483647

ksfd_osmwat:count=4294967295 intr=0 posted=(nil) tout=2147483647

ksfdwat_internal:count=4294967295 timeout=32 aioflags=2147483647

ksfdwtio:count=4294967295 aioflags=0x20 timeout=2147483647

ksfd_osmwat:count=4294967295 intr=0 posted=(nil) tout=2147483647

ksfdwat_internal:count=4294967295 timeout=32 aioflags=2147483647

ksfdwtio:count=4294967295 aioflags=0x20 timeout=2147483647

Number of resource hash buckets is 15115

Large Pages allocation failed (free: 22981 required: 192)

2016-02-19 13:30:28.626: [ default]failed to initialize skgp context

2016-02-19 13:30:28.626: [ default]slos op : sslssreghdlr

2016-02-19 13:30:28.626: [ default]slos dep : Error 0 (0)

2016-02-19 13:30:28.626: [ default]slos loc : sskgpinit1

2016-02-19 13:30:28.626: [ default]slos info:

[ CLWAL]clsw_Initialize: OLR initlevel [30000]

2016-02-19 13:30:28.627: [ default]a_init: Unable to get log name. Retval:[-4]

后来尝试在pfile逐渐增加了些参数,还有些是因为错误加了其它项 最终

*.audit_file_dest='/oracle/app/oracle/admin/rptstby/adump'

*.compatible='11.2.0.3.0'

*.control_files='+DATA/rptstby/controlfile/control01.ctl','+DATA/rptstby/controlfile/control02.ctl'

*.db_block_size=16384

*.db_create_file_dest='+DATA'

*.db_domain=''

*.db_file_multiblock_read_count=32

*.db_file_name_convert='+data/orarpt/datafile/','+data/rptstby/datafile/','+data/ORARPT/tempfile/','+data/rptstby/tempfile/','+flashts/ORARPT/datafile/','+data/rptstby/datafile/','+reco/ORARPT/datafile/','+reco/rptstby/datafile/','+DBFS_DG/orarpt/datafile/','+DBFS_DG/rptstby/datafile/'

*.db_lost_write_protect='TYPICAL'

*.db_name='orarpt'

*.db_unique_name='rptstby'

*.diagnostic_dest='/oracle/app/oracle'

*.log_archive_dest_1='location=+reco'

*.log_archive_dest_2=''

*.log_archive_dest_state_2='DEFER'

*.log_archive_format='arch_%t_%s_%r.arc'

*.log_file_name_convert='+data/orarpt/onlinelog/','+data/rptstby/onlinelog/'

*.standby_file_management='AUTO'

*.utl_file_dir='*'

*.cluster_database=false

orarpt2.thread=2

orarpt1.thread=1

orarpt1.instance_number=1

orarpt2.instance_number=2

*.sga_max_size=40G

*.sga_target=0

*.shared_pool_size=6G

*.db_files=2000

*.pga_aggregate_target=10120855552

*.processes=2000

再启动rman duplicate 错误变成了

RMAN-03009: failure of backup command on d1 channel at 03/14/2016 13:50:07

ORA-00245: control file backup failed; target is likely on a local file system

这个问题的解决方法

RMAN> show all;

..

CONFIGURESNAPSHOT CONTROLFILE NAME TO '/oracle/app/oracle/product/11.2.0.3/dbhome_1/dbs/snapcf_orarpt1.f'; # default

修改到共享磁盘中

RMAN> CONFIGURE SNAPSHOT CONTROLFILE NAME TO '+data/snapcf_orarpt1.f';

经过上面修改了pfile和修改自动备份controlfile 目录到ASM,  rman终于顺利开始copy datafile了,因网络限制用了4天3夜把终于把数据copy完了, 剩下工作不在详细描述的如

create spfile in ASM from pfile,

modify clustr_database=true,

create pfile on other nodes with spfile in ASM.

$GIRD_HOME/bin/srvctl add database -d rptstby -o $ORACLE_HOME -p "+DATA/rptstby/parameterfile/spfileorarpt2.ora" -n orarpt -r physical_standby -s mount

srvctl add instance -d rptstby -i orarpt1 -n qdexa1db01

srvctl add instance -d rptstby -i orarpt2 -n qdexa1db02

srvctl modify database -d rptstby -a

srvctl start database -d rptstby

–start real-time redo apply

alter database recover managed standby database using current logfile disconnect from session;

--stop redo apply

alter database recover managed standby database cancel;

NOTE:

刚开始日志相差太多,如果启用ADG, open standby read-only时可能不会成功。

下面这个有些优化技术可以加速日志传速及日志应用。

1, 日志启用压缩传输 — on primary DB

如 LOG_ARCHIVE_DEST_2=’SERVICE=xxxx COMPRESSION=ENABLE ….’

2, 增加归档进程

增加 log_archive_max_processes — on primary DB

增加 log_archive_max_processes = log_archive_max_processes on primary* number of threads — on Standby DB

3, 调整内存参数

增加 standby DB 的 buffer cache 值,要大于primary DB

4, 启用并行日志恢复, 并行度 #CPUs * 2

如: alter database recover managed standby database PARALLEL 48 using current logfile disconnect from session;

# 查询

— 查看primary 库上的日志进度

select thread#,max(sequence#) from v$archived_log group by thread#;

select * from v$log;

— 查看standby 库,日志传速进程,应用进程

select t.*,arched-applied gap from (select thread#,max(sequence#) arched, max(decode(applied,'YES',sequence#,1)) applied from v$archived_log group by thread#) t;

select to_char(START_TIME,'dd.mm.yyyy hh24:mi:ss') "Recover_start",to_char(item)||' = '||to_char(sofar)||' '||to_char(units)||' '|| to_char(TIMESTAMP,'dd.mm.yyyy hh24:mi') "Values" from v$recovery_progress where start_time=(select max(start_time) from v$recovery_progress);

打赏

8732971891f4ba05583674ca6b8145ac.png微信扫一扫,打赏作者吧~

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值