GOLDENGATE:

Database version: 11.2.0.2 (RAC, ASM)
GoldenGate version: 11.2.1.0.1

GGSCI (DB2-PAN) 17> info all


Program     Status      Group       Lag at Chkpt  Time Since Chkpt

MANAGER     RUNNING
EXTRACT     RUNNING     EXT_D6      00:00:00      00:00:05
EXTRACT     RUNNING     EXT_E6      00:00:00      00:48:56
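
In the listing above, EXT_E6 reports RUNNING, yet its Time Since Chkpt is 00:48:56 while EXT_D6 shows 00:00:05. A check for this pattern could be scripted roughly as follows. This is only a sketch: the function name stale_groups and the 600-second default threshold are assumptions made for illustration, and a long Time Since Chkpt is a hint that a process may be hung, not proof.

```shell
# Print groups that report RUNNING but whose "Time Since Chkpt" exceeds a limit.
# Reads `info all` output on stdin. The limit is in seconds; the default of 600
# is an arbitrary value chosen for this sketch, not a GoldenGate setting.
stale_groups() {
    limit="${1:-600}"
    awk -v limit="$limit" '
        ($1 == "EXTRACT" || $1 == "REPLICAT") && $2 == "RUNNING" && NF >= 5 {
            # Column 5 is Time Since Chkpt in HH:MM:SS form
            n = split($5, t, ":")
            if (n != 3) next
            secs = t[1] * 3600 + t[2] * 60 + t[3]
            if (secs > limit) print $3, secs
        }'
}

# Possible usage from the GoldenGate home directory (path is an assumption):
#   echo "info all" | ./ggsci | stale_groups 600
```

Fed the rows shown above, the function would report only EXT_E6, with the checkpoint age converted to seconds.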

APPLIES TO:

   Oracle GoldenGate - Version 11.1.1.0.6 and later
   Information in this document applies to any platform.
   ***Checked for relevance on 02-January-2014***

SYMPTOMS:

When running Oracle GoldenGate 11.1.1.0.6 or later, Extract abends every 4 hours on the hour. This matches the default Bounded Recovery interval, which is also 4 hours.
Extract can be restarted and continues to work, but it fails again after 4 hours with the same errors, as shown below.
ERROR

--------------------------------------------------

2014-01-02 18:34:56  INFO    OGG-01478  Oracle GoldenGate Capture for Oracle, ext_e6.prm:  Output file ./dirdat/la is using format RELEASE 11.2.

2014-01-02 18:34:56  INFO    OGG-01026  Oracle GoldenGate Capture for Oracle, ext_e6.prm:  Rolling over remote file ./dirdat/la000000.

2014-01-02 18:34:56  INFO    OGG-01053  Oracle GoldenGate Capture for Oracle, ext_e6.prm:  Recovery completed for target file ./dirdat/la000001, at RBA 1072.

2014-01-02 18:34:56  INFO    OGG-01057  Oracle GoldenGate Capture for Oracle, ext_e6.prm:  Recovery completed for all targets.

2014-01-02 18:34:56  INFO    OGG-01517  Oracle GoldenGate Capture for Oracle, ext_e6.prm:  Position of first record processed for Thread 2, Sequence 4, RBA 51202064, SCN 0.1425644, 2014-1-2 下午06:34:32.

2014-01-02 22:34:59  INFO    OGG-01738  Oracle GoldenGate Capture for Oracle, ext_e6.prm:  BOUNDED RECOVERY: CHECKPOINT: for object pool 1: p16681_Redo Thread 1: start=SeqNo: 41, RBA: 66879504, SCN: 0.1559643 (1559643), Timestamp: 2014-01-02 22:34:57.000000, Thread: 1, end=SeqNo: 41, RBA: 66880000, SCN: 0.1559643 (1559643), Timestamp: 2014-01-02 22:34:57.000000, Thread: 1.

2014-01-02 22:34:59  INFO    OGG-01738  Oracle GoldenGate Capture for Oracle, ext_e6.prm:  BOUNDED RECOVERY: CHECKPOINT: for object pool 2: p16681_Redo Thread 2: start=SeqNo: 5, RBA: 51252240, SCN: 0.1559642 (1559642), Timestamp: 2014-01-02 22:34:56.000000, Thread: 2, end=SeqNo: 5, RBA: 51252736, SCN: 0.1559642 (1559642), Timestamp: 2014-01-02 22:34:56.000000, Thread: 2.
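
The timestamps in this log are consistent with the 4-hour pattern: the extract started writing its trail at 18:34:56 and the Bounded Recovery checkpoint messages appear at 22:34:59. The arithmetic can be checked with a small helper. The name elapsed_secs is hypothetical, and the sketch assumes both times fall on the same day.

```shell
# Seconds elapsed between two HH:MM:SS times on the same day.
elapsed_secs() {
    printf '%s %s\n' "$1" "$2" | awk '{
        split($1, a, ":"); split($2, b, ":")
        print (b[1] * 3600 + b[2] * 60 + b[3]) - (a[1] * 3600 + a[2] * 60 + a[3])
    }'
}

# Startup time vs. first Bounded Recovery checkpoint in the log above
elapsed_secs 18:34:56 22:34:59   # prints 14403, i.e. 4 hours and 3 seconds
```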


CAUSE:

Under these conditions, the likely cause is the Bounded Recovery checkpoint file, which is probably corrupted.

SOLUTION:

The solution is to reset the Bounded Recovery checkpoint file when restarting the Extract, for example:
GGSCI> start <extract> BRRESET


=====================================

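Summary: During the routine morning check, a first-line colleague reported that the OGG primary and standby data were inconsistent. Logging in to the server showed that the OGG extract process on the primary database had status RUNNING, but its checkpoint had not been updated for 48 minutes, so the process was effectively hung. Following the MOS note above, start ext_e6 brreset did not bring the process up normally. In the end I located the process manually with ps -ef | grep ext_e6, ran kill -9 on the OS process ID of ext_e6, and then start ext_e6 started the process successfully.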
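
The manual steps in the summary can be sketched as a small shell helper. This is an illustration, not a tested procedure: the function name find_extract_pids is hypothetical, and it assumes the extract's command line in ps -ef contains both the word "extract" and the group name (for example via its ext_e6.prm parameter file), which may differ on other installations. Note that kill -9 terminates the process without giving it a chance to clean up, so it is a last resort, as it was here.

```shell
# Print the PID (column 2 of `ps -ef`) of every extract process whose command
# line mentions the given group name. Reads ps-style lines on stdin so the
# filter can be checked against sample text before being used for real.
find_extract_pids() {
    awk -v g="$1" '
        BEGIN { g = tolower(g) }
        {
            line = tolower($0)
            # Require the extract binary and the group name; skip the grep itself
            if (line ~ /extract/ && index(line, g) > 0 && line !~ /grep/) print $2
        }'
}

# Possible usage, run as the GoldenGate OS user from the GoldenGate home
# (the ./ggsci path is an assumption about the installation layout):
#   pids=$(ps -ef | find_extract_pids ext_e6)
#   [ -n "$pids" ] && kill -9 $pids
#   echo "start ext_e6" | ./ggsci
```

Reviewing the PID list before killing anything is a sensible precaution, since a name match on the command line can also catch unrelated processes.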


        ----thank you & best regards