ERROR OGG-01028

最新推荐文章于 2023-12-27 07:59:27 发布

ckhwddy577380

最新推荐文章于 2023-12-27 07:59:27 发布

阅读量562

点赞数

文章标签：数据库

最近客户1套OGG系统，抽取进程经常报 OGG-01028错误，该错误一般是由于OGG Bug 10356426引起的，日志如下：

ERROR OGG-01028 Oracle GoldenGate Capture for Oracle, extoa.prm: encountered commit SCN 2561.1344492794 (11000755737850) that is not greater than the highest SCN already processed 2561.1344493093 (11000755738149) Redo Thread 1 (1) xid 78.31.31562 (0x004e.001f.00007b4a), starting seq.rba 12835.396542992, scn 2561.1344492791 (11000755737847), commit seq.rba 12835.396544448 commit timestamp 2013-03-28 22:26:20.000000.
2013-03-28 22:26:22 ERROR OGG-01668 Oracle GoldenGate Capture for Oracle, extoa.prm: PROCESS ABENDING.

2013-03-28 22:27:22 WARNING OGG-01475 Oracle GoldenGate Manager for Oracle, mgr.prm: Cannot automatically restart EXTRACT EXTOA, which abended due to an out of order transaction. Issue ETROLLOVER to advance the output trail sequence past the current trail sequence and restart. Then, use ALTER EXTSEQNO on the subsequent pump EXTRACT, or REPLICAT, process group to start reading from the new trail file created by ALTER ETROLLOVER; the downstream process will not automatically switch to the new trail file.
2013-03-28 22:28:22 WARNING OGG-01475 Oracle GoldenGate Manager for Oracle, mgr.prm: Cannot automatically restart EXTRACT EXTOA, which abended due to an out of order transaction. Issue ETROLLOVER to advance the output trail sequence past the current trail sequence and restart. Then, use ALTER EXTSEQNO on the subsequent pump EXTRACT, or REPLICAT, process group to start reading from the new trail file created by ALTER ETROLLOVER; the downstream process will not automatically switch to the new trail file.
2013-03-28 22:29:13 INFO OGG-01021 Oracle GoldenGate Capture for Oracle, dpeoa.prm: Command received from GGSCI: STATS daily totalsonly *.
参考 MOS ID 957112.1文章

处理步骤如下
1.检查2个节点的SCN 情况
select thread#,LAST_REDO_CHANGE# from v$thread;

2.检查2个节点的OS TIME 是否一致
SQL> alter session set nls_date_format='YYYY-MM-DD hh24:mi:ss';

Session altered.

SQL> select sysdate,dbtimezone from dual;
node1$ date
node2$ date

3.检查2个节点的redo genration 是否有很大差距
select inst_id,round(sum(blocks*block_size/(1024*1024*1024)))
from gv$archived_log where completion_time> sysdate-7
group by inst_id;

4.通过2个节点的AWR报告或SQL查看7天内的log file parallel write及log file sync延迟时间
--在node1 运行
select 1 inst_id,avg(avg_ms)
from (
select
      round((time_ms_end-time_ms_beg)/nullif(count_end-count_beg,0),1) avg_ms
from (
select
       s.dbid,
       to_char(s.BEGIN_INTERVAL_TIME,'DD-MON-YY HH24:MI') btime,
       total_waits count_end,
       time_waited_micro/1000 time_ms_end,
       Lag (e.time_waited_micro/1000)
              OVER( PARTITION BY e.event_name ORDER BY s.snap_id) time_ms_beg,
       Lag (e.total_waits)
              OVER( PARTITION BY e.event_name ORDER BY s.snap_id) count_beg
from
       DBA_HIST_SYSTEM_EVENT e,
       DBA_HIST_SNAPSHOT s
where
       s.BEGIN_INTERVAL_TIME    and    s.snap_id=e.snap_id
   and e.event_name in (
                  'log file sync',
                  'log file parallel write'
                )
   and s.dbid=e.dbid
)
order by btime
)
/
--在node2 再运行一次
select 2 inst_id,avg(avg_ms)
();

5.修改extract 参数文件

THREADOPTIONS MAXCOMMITPROPAGATIONDELAY 90000 IOLATENCY 180000
--MAXCOMMITPROPAGATIONDELAY范围为0-90000 默认为3000ms (3s)
--IOLATENCY最大为180000 默认为1500ms (1.5s)

通过以上5个步骤，确认排除故障隐患。

6.具体操作Solution Details:
6.1. Do an ETROLLOVER on Extract, and take note of the new sequence number of the trail file.
操作之前在源端查看最新的trail file号
ls -lt /goldengate/dirdat/oaxxxx，那么这里需要记录的应该是 xxxx+1，即ALTER EXTRACT extoa, ETROLLOVER后生成的。

ALTER EXTRACT extoa, ETROLLOVER
info extoa detail
info dpeoa detail

6.2. Start extract

START EXTRACT extoa

6.3. Send PUMP, LOGEND, to see if it's at the end of the previous trail.

SEND EXTRACT DPEOA, LOGEND

6.4. Once it is at the end of the trail file, You must stop the pump, and do an ETROLLOVER for it too. Take note of the new trail file sequence number that is created from this step.
操作之前在再次在源端查看最新的trail file号/goldengate/dirdat/oaxxxx，并在操作以下步骤后记录最新的sequence number
STOP EXTRACT DPEOA
ALTER EXTRACT DPEOA, ETROLLOVER

6.5. Alter the pump to SEQNO to the new trail file created from step #1.
这里的#####取6.1 记录的xxxx+1

ALTER EXTRACT DPEOA, EXTSEQNO ##### EXTRBA 0

6.6. Restart pump

START EXTRACT DPEOA

6.7. Send Replicat, LOGEND to make sure it has processed all the remaining data, and stop Replicat.
SEND REPLICAT REPOA, LOGEND ----提示yes时，执行下一步
STOP REPLICAT REPOA

6.8. If replicat is not at end of trail, generate a report and forcestop replicat
SEND REPLICAT REPOA, REPORT
STOP REPLICAT REPOA!

6.9. Add the following parameters to replicat parameter file to allow replicat to process each trail record as a single transaction, or set them to 1 if you have any of these parameters:
GROUPTRANSOPS 1
MAXTRANSOPS 1

6.10. Restart replicat

START REPLICAT REPOA

6.11. Once replicat has completely processed the trail, stop the replicat
STOP REPLICAT REPOA

6.12. Edit the replicat parameter file:
- Add parameter HANDLECOLLISIONS to Replicat parameter file
- Remove or comment out GROUPTRANSOPS and MAXTRANSOPS or revert them back to their original values.

Note: There are pre-conditions for using HANDLECOLLISIONS . Ther emust be either

a. no pkupdates

or
b. extract has "FETCHOPTIONS FETCHPKUPDATECOLS"

Also all the tables should have Primary key or unique index on the table to avoid data integrity issues when using handlecollisions.
要确保所有的同步的表均有主键或者唯一索引才能避免数据完整性

6.13. ALTER REPLICAT, SEQNO to the new trail file created in step #4.

ALTER REPLICAT REPOA, EXTSEQNO ###### EXTRBA 0

6.14. Start Replicat

START REPLICAT REPOA

6.15. Once Replicat has processed the out of order SCN operations, disable HANDLECOLLISIONS. You could also look for the CSN and wait for Replicat to checkpoint past it.
一但replicat处理完无序的SCN操作后就可以取消掉这个参数了。
SEND REPLICAT REPOA, NOHANDLECOLLISIONS.

6.16.Edit the replicat parameter and comment out the HANDLECOLLISIONS parameter. You do not need to stop/restart replicat. This will ensure that on any subsequent replicat restarts the parameter is disabled.

Note:
If the out of order transactions happen to the SAME record on the SAME table, then the Replicat will probably abend in step 7 or step 10. If that is the case, look at the discard file find out that particular record (the primary key value). Then add handlecollisions, continue with the steps below. Later, once the Replicat caught up, that particular record needs to be manually synced.
However the chance of an out of order transactions happening to the SAME record on SAME table is rare. Given an example, this will mean that the user application insert record A from node 1, then immediately it updated this record A on node 2. Out of order transactions usually happens to different tables or different records on same table.
In other words, if you do see collisions been handled in this recovery procedure, that means you have to re-sync ONLY those particular records that have collisions.

来自 “ ITPUB博客 ” ，链接：http://blog.itpub.net/27571661/viewspace-761121/，如需转载，请注明出处，否则将追究法律责任。

转载于:http://blog.itpub.net/27571661/viewspace-761121/

ckhwddy577380

关注

0
点赞
踩
0

收藏

觉得还不错? 一键收藏
0
评论
ERROR OGG-01028

最近客户1套OGG系统，抽取进程经常报OGG-01028错误，该错误一般是由于OGG Bug 10356426引起的，日志如下：ERROR OGG-01028 Oracle GoldenGate...
复制链接

扫一扫