oracle goldengate extract抽取进程异常终止,报错OGG-01973 The redo record indicates data loss on object
问题描述:
最近一段时间发现goldengate经常出现异常,extract进程终止,无法启动,检查日志报错如下:
可以看到错误ERROR OGG-01973 The redo record indicates data loss on object 94,785.
问题处理
通过查找oracle mos文档,找到一遍关于这个错误的介绍
OGG v11.2 Extract abends with "OGG-01965 The redo record indicates data loss on object " or "OGG-01973 The redo record indicates data loss on object " (Doc ID 1483431.1)
APPLIES TO:
Oracle GoldenGate - Version 11.2.1.0.1 and later
Information in this document applies to any platform.
SYMPTOMS
2012-08-02 09:10:09 ERROR OGG-01965 The redo record indicates data loss on object xxx,xxx.
or
2012-18-02 05:14:03 ERROR OGG-01973 The redo record indicates data loss on object xxx,xxx
CHANGES
CAUSE
Redo log records indicate data missing for the object in question and thus the error
SOLUTION
Prior to the code change in bug 14158453 Extract would abend with errors similar to following as described in KM note 1465720.1 for
tables/partitions with NOLOGGING regardless of whether the tablespace/ database is in logging/force logging mode.
ERROR OGG-01960 Failed to validate table <OWNER>.<TABLENAME>. The table is created with the NOLOGGING option, which is not supported.
Extract may not be able to capture data from it.
And DBOPTIONS ALLOWNOLOGGING can be added in the extract to work-around the error with known limitations as described in 1465720.1
However post the code change in bug 14158453 the parameter DBOPTIONS ALLOWNOLOGGING is no longer valid
Extract no longer relies on the table metadata (logging/ nologging) and instead uses special records in the redo log to detect
missing data and when detected it would error as below indicating the object id of the table in question (same type of error is signaled when table data has been loaded via SQL*Loader with UNRECOVERABLE parameter):
2012-08-02 09:10:09 ERROR OGG-01965 The redo record indicates data loss on object xxx,xxx.
The table can be excluded from the extract, enable logging at the table/ partition/ tablespace level (or if table data is loaded via SQL*Loader do not use UNRECOVERABLE) and include it back in the extract
if it needs to be replicated across
If using DDL extraction following can be run to see the offending object name
SELECT metadata_text, fragmentNo, optime FROM <owner>.GGS_DDL_HIST WHERE seqno = (SELECT MIN(seqno) FROM <owner>.GGS_DDL_HIST
WHERE objectid = (SELECT objectId FROM <owner>.GGS_DDL_HIST_ALT WHERE altObjectId = <object id reported> AND rownum=1));
eg.,
SELECT metadata_text, fragmentNo, optime FROM ggs_owner.GGS_DDL_HIST WHERE seqno = (SELECT MIN(seqno) FROM ggs_owner.GGS_DDL_HIST
WHERE objectid = (SELECT objectId FROM ggs_owner.GGS_DDL_HIST_ALT WHERE altObjectId = xxxxxx AND rownum=1));
Also following note can be referred to prevent these nologging operations
The Gains and Pains of Nologging Operations (Doc ID 290161.1)
The error can also be seen for Compression Advisor related tables (DBMS_TABCOMP* in 11.2.0.3 & CMP<x>$<object_id> in 11.2.0.4) and can be safely excluded from replication
REFERENCES
BUG:14405239 - ALLOWNOLOGGING IS NOT WORKING IN LATEST OGG V11.2.1.0.1 MLR VERSIONS
这是一个ogg11.2版本的bug,我本机使用的ogg版本是12.2,应该不是这个原因造成的,根据报错的object_id,找到是oracle2019用户下面的zs_ac06表
查看表是loging模式
检查附加日志没有启动,将附加日志开启,之后再启动ogg进程,发现还是不行报错。
实在是找不到原因了,网上有一些贴子说过滤掉这张表的同步,但是该表是业务需要使用的到没法过滤,最后不死心在mos上看到另外一遍文章Classic Extract Fails With Error “The redo record indicates data loss on object nnnnn” (Doc ID 2316712.1)
APPLIES TO:
Oracle GoldenGate - Version 12.2.0.1.0 and later
Information in this document applies to any platform.
SYMPTOMS
Classic extract on 12.2 fails with ERROR OGG-01973 ...The redo record indicates data loss on object nnnnn
Note that, Table with same object id exists on database
SQL> select object_name,owner, object_type from dba_objects where object_id=nnnn;
CHANGES
none
CAUSE
Redo corruption, Activity logs may show messages like
2017-10-11 13:56:54.136 INFO |er.redo.ora.thread |Redo_Thread_2| 5875 oracle/redoorardr.c | DETECT ZERO LENGTH RECORD1 at 93185656!
2017-10-11 13:56:54.136 INFO |er.redo.ora.thread |Redo_Thread_2| 6867 oracle/redoorardr.c | Completed pdata buffering in sequence 1288, with nb = 1, tb = 1, lb = 182003.
2017-10-11 13:56:54.136 INFO |er.redo.ora.thread |Redo_Thread_2| 262 oracle/redoorapl.c | PL_get: len=104, vld=4, subscn=1, SCN = 2501.2438258045 (10744151465341), seqno=1288, rba=93185552
2017-10-11 13:56:54.136 ERROR|er.redo.ora.thread |Redo_Thread_2| 6993 oracle/redoorardr.c | Detected an SCN base increase greater than 1, last_record_scn = 2501.2438257951 (10744151465247) current_rechdr.scn = 2501.2438258045 (10744151465341)
2017-10-11 13:56:54.136 INFO |gglog.std.application |Redo_Thread_4| 69 ggapp/ggPrintLn.c | [rpt]:
2017-10-11 13:56:54 ERROR OGG-01973 The redo record indicates data loss on object 437,723.
SOLUTION
Tranlogoptions allowdataloss will work for 12.2 OGG versions. This is made internal parameter on 12.2 onwards
If customer is on OGG 12.2, use _allowdataloss and restart the extract.
Tranlogoptions _allowdataloss
版本和问题描述和我现在的问题一致,根据文档的提示,在抽取进程上面加上Tranlogoptions _allowdataloss参数,之后再进行启动,等了大概半个小时,总算看到抽取日志有变动没有报错了,问题解决大功告成。