ERROR OGG-01028

 
  最近客户1套OGG系统,抽取进程经常报 OGG-01028错误,该错误一般是由于OGG Bug 10356426引起的,日志如下:

 ERROR   OGG-01028  Oracle GoldenGate Capture for Oracle, extoa.prm:  encountered commit SCN 2561.1344492794 (11000755737850) that is not greater than the highest SCN already processed 2561.1344493093 (11000755738149) Redo Thread 1 (1) xid 78.31.31562 (0x004e.001f.00007b4a), starting seq.rba 12835.396542992, scn 2561.1344492791 (11000755737847), commit seq.rba 12835.396544448 commit timestamp 2013-03-28 22:26:20.000000.
2013-03-28 22:26:22  ERROR   OGG-01668  Oracle GoldenGate Capture for Oracle, extoa.prm:  PROCESS ABENDING.
2013-03-28 22:27:22  WARNING OGG-01475  Oracle GoldenGate Manager for Oracle, mgr.prm:  Cannot automatically restart EXTRACT EXTOA, which abended due to an out of order transaction. Issue ETROLLOVER to advance the output trail sequence past the current trail sequence and restart.  Then, use ALTER EXTSEQNO on the subsequent pump EXTRACT, or REPLICAT, process group to start reading from the new trail file created by ALTER ETROLLOVER; the downstream process will not automatically switch to the new trail file.
2013-03-28 22:28:22  WARNING OGG-01475  Oracle GoldenGate Manager for Oracle, mgr.prm:  Cannot automatically restart EXTRACT EXTOA, which abended due to an out of order transaction. Issue ETROLLOVER to advance the output trail sequence past the current trail sequence and restart.  Then, use ALTER EXTSEQNO on the subsequent pump EXTRACT, or REPLICAT, process group to start reading from the new trail file created by ALTER ETROLLOVER; the downstream process will not automatically switch to the new trail file.
2013-03-28 22:29:13  INFO    OGG-01021  Oracle GoldenGate Capture for Oracle, dpeoa.prm:  Command received from GGSCI: STATS  daily totalsonly *.
参考  MOS ID 957112.1文章
 
处理步骤如下
1.检查2个节点的SCN 情况
select thread#,LAST_REDO_CHANGE# from v$thread;
2.检查2个节点的OS TIME 是否一致
SQL> alter session set nls_date_format='YYYY-MM-DD hh24:mi:ss';
Session altered.
SQL> select sysdate,dbtimezone from dual;
node1$ date
node2$ date

3.检查2个节点的redo genration 是否有很大差距
select  inst_id,round(sum(blocks*block_size/(1024*1024*1024)))
from gv$archived_log where completion_time> sysdate-7
group by inst_id;
4.通过2个节点的AWR报告 或SQL查看7天内的log file parallel write及log file sync延迟时间
--在node1 运行
select 1 inst_id,avg(avg_ms)
from (
select
      round((time_ms_end-time_ms_beg)/nullif(count_end-count_beg,0),1) avg_ms
from (
select
       s.dbid,
       to_char(s.BEGIN_INTERVAL_TIME,'DD-MON-YY HH24:MI')  btime,
       total_waits count_end,
       time_waited_micro/1000 time_ms_end,
       Lag (e.time_waited_micro/1000)
              OVER( PARTITION BY e.event_name ORDER BY s.snap_id) time_ms_beg,
       Lag (e.total_waits)
              OVER( PARTITION BY e.event_name ORDER BY s.snap_id) count_beg
from
       DBA_HIST_SYSTEM_EVENT e,
       DBA_HIST_SNAPSHOT s
where
       s.BEGIN_INTERVAL_TIME    and    s.snap_id=e.snap_id
   and e.event_name in (
                  'log file sync',
                  'log file parallel write'
                )
   and  s.dbid=e.dbid
)
order by btime
)
/
--在node2 再运行一次
select 2 inst_id,avg(avg_ms)
();
 
5.修改extract 参数文件
THREADOPTIONS MAXCOMMITPROPAGATIONDELAY 90000 IOLATENCY 180000
--MAXCOMMITPROPAGATIONDELAY范围为0-90000 默认为3000ms (3s)
--IOLATENCY最大为180000 默认为1500ms (1.5s)
通过以上5个步骤,确认排除故障隐患。
6.具体操作Solution Details:
6.1. Do an ETROLLOVER on Extract, and take note of the new sequence number of the trail file.
操作之前在源端查看最新的trail file号
ls -lt /goldengate/dirdat/oaxxxx,那么这里需要记录的应该是 xxxx+1,即ALTER EXTRACT extoa, ETROLLOVER后生成的。
ALTER EXTRACT extoa, ETROLLOVER
info extoa detail
info dpeoa detail
6.2. Start extract
   START EXTRACT extoa
6.3. Send PUMP, LOGEND, to see if it's at the end of the previous trail.
  SEND EXTRACT DPEOA, LOGEND
6.4. Once it is at the end of the trail file, You must stop the pump, and do an ETROLLOVER for it too. Take note of the new trail file sequence number that is created from this step.
操作之前在再次在源端查看最新的trail file号/goldengate/dirdat/oaxxxx,并在操作以下步骤后记录最新的sequence number
STOP EXTRACT DPEOA
ALTER EXTRACT DPEOA, ETROLLOVER
6.5. Alter the pump to SEQNO to the new trail file created from step #1.
这里的#####取6.1 记录的xxxx+1
  ALTER EXTRACT DPEOA, EXTSEQNO ##### EXTRBA 0
6.6. Restart pump
 START EXTRACT DPEOA
6.7. Send Replicat, LOGEND to make sure it has processed all the remaining data, and stop Replicat.
SEND REPLICAT  REPOA, LOGEND    ----提示yes时,执行下一步
STOP REPLICAT  REPOA
6.8. If replicat is not at end of trail, generate a report and forcestop replicat
SEND REPLICAT  REPOA, REPORT
STOP REPLICAT  REPOA!
6.9. Add the following parameters to replicat parameter file to allow replicat to process each trail record as a single transaction, or set them to 1 if you have any of these parameters:
GROUPTRANSOPS 1
MAXTRANSOPS 1
6.10. Restart replicat
START REPLICAT  REPOA
6.11. Once replicat has completely processed the trail, stop the replicat
STOP REPLICAT  REPOA
6.12. Edit the replicat parameter file:
     - Add parameter HANDLECOLLISIONS to Replicat parameter file
     - Remove or comment out GROUPTRANSOPS and MAXTRANSOPS or revert them back to their original values.
Note: There are pre-conditions for using  HANDLECOLLISIONS . Ther emust be either
a. no pkupdates
or
b. extract  has "FETCHOPTIONS FETCHPKUPDATECOLS"
Also all the tables should have Primary key or unique index on the table to avoid data integrity issues when using handlecollisions.
要确保所有的同步的表均有主键或者唯一索引才能避免数据完整性
6.13. ALTER REPLICAT, SEQNO to the new trail file created in step #4.
ALTER REPLICAT  REPOA, EXTSEQNO ###### EXTRBA 0
6.14. Start Replicat
START REPLICAT  REPOA
6.15. Once Replicat has processed the out of order SCN operations, disable HANDLECOLLISIONS. You could also look for the CSN and wait for Replicat to checkpoint past it.
一但replicat处理完无序的SCN操作后就可以取消掉这个参数了。
SEND REPLICAT  REPOA, NOHANDLECOLLISIONS.

6.16.Edit the replicat parameter and comment out the HANDLECOLLISIONS parameter. You do not need to stop/restart replicat. This will ensure that on any subsequent replicat restarts the parameter is disabled.
Note:
If the out of order transactions happen to the SAME record on the SAME table, then the Replicat will probably abend in step 7 or step 10. If that is the case, look at the discard file find out that particular record (the primary key value). Then add handlecollisions, continue with the steps below. Later, once the Replicat caught up, that particular record needs to be manually synced.
However the chance of an out of order transactions happening to the SAME record on SAME table is rare. Given an example, this will mean that the user application insert record A from node 1, then immediately it updated this record A on node 2. Out of order transactions usually happens to different tables or different records on same table.
In other words, if you do see collisions been handled in this recovery procedure, that means you have to re-sync ONLY those particular records that have collisions.

来自 “ ITPUB博客 ” ,链接:http://blog.itpub.net/27571661/viewspace-761121/,如需转载,请注明出处,否则将追究法律责任。

转载于:http://blog.itpub.net/27571661/viewspace-761121/

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值