公司生产环境ogg异常中断,该ogg为实时同步,目标端有业务调用
源端hp-ux Oracle 10.2.0.5
目标端 Linux Oracle 11.2.0.4
ogg11.2.1.0
GGSCI (node1) 1> view report EXT_E
***********************************************************************
Oracle GoldenGate Capture for Oracle
Version 11.2.1.0.20 18227972 OGGCORE_11.2.1.0.0OGGBP_PLATFORMS_140304.2209
HP/UX, IA64, 64bit (optimized), Oracle 10g on Mar 5 2014 01:53:55
Copyright (C) 1995, 2014, Oracle and/or its affiliates. All rights reserved.
Starting at 2019-04-10 17:20:58
***********************************************************************
Operating System Version:
HP-UX
Version U, Release B.11.31
Node: node1
Machine: ia64
soft limit hard limit
Address Space Size : unlimited unlimited
Heap Size : 2147483648 2147483648
File Size : unlimited unlimited
CPU Time : unlimited unlimited
Process id: 949
Description:
***********************************************************************
** Running with the following parameters **
***********************************************************************
2019-04-10 17:20:58 INFO OGG-03035 Operating system character set identified as hp-roman8. Locale: en_US_POSIX, LC_ALL:.
2019-04-10 17:20:58 INFO OGG-02696 NON-ANSI SQL parameter syntax is used for parameter parsing.
extract EXT_E
setenv (NLS_LANG=AMERICAN_AMERICA.ZHS16GBK)
2019-04-10 17:20:58 INFO OGG-02095 Successfully set environment variable NLS_LANG=AMERICAN_AMERICA.ZHS16GBK.
SETENV (ORACLE_HOME = /oracle/product)
2019-04-10 17:20:58 INFO OGG-02095 Successfully set environment variable ORACLE_HOME=/oracle/product.
userid ogg, password ***
exttrail ./dirdat/bk
DYNAMICRESOLUTION
REPORTCOUNT EVERY 1 MINUTES, RATE
DISCARDFILE ./dirrpt/bk.dsc,APPEND,MEGABYTES 1024
WARNLONGTRANS 5h,CHECKINTERVAL 30m
tranlogoptions altarchivelogdest instance orcl1 /arch1
tranlogoptions altarchivelogdest instance orcl2 /arch2
FETCHOPTIONS NOUSESNAPSHOT
table BJ.TAB_XX;
2019-04-10 17:20:58 INFO OGG-01815 Virtual Memory Facilities for: BR
anon alloc: mmap(MAP_ANON) anon free: munmap
file alloc: mmap(MAP_SHARED) file free: munmap
target directories:
/ogg/BR/EXT_E.
Bounded Recovery Parameter:
BRINTERVAL = 4HOURS
BRDIR = /ogg
2019-04-10 17:20:58 INFO OGG-01815 Virtual Memory Facilities for: COM
anon alloc: mmap(MAP_ANON) anon free: munmap
file alloc: mmap(MAP_SHARED) file free: munmap
target directories:
/ogg/dirtmp.
CACHEMGR virtual memory values (may have been adjusted)
CACHESIZE: 26.69G
CACHEPAGEOUTSIZE (normal): 8M
PROCESS VM AVAIL FROM OS (min): 34.69G
CACHESIZEMAX (strict force to disk): 30.69G
2019-04-10 17:20:59 WARNING OGG-01842 CACHESIZE PER DYNAMIC DETERMINATION (26.69G) LESS THAN RECOMMENDED: 64G (64bit system)
vm found: 34.69G
Check swap space. Recommended swap/extract: 128G (64bit system).
2019-04-10 17:20:59 INFO OGG-01639 BOUNDED RECOVERY: ACTIVE: for object pool 1: p5225_Redo Thread 1.
2019-04-10 17:20:59 INFO OGG-01640 BOUNDED RECOVERY: recovery start XID: 88.42.16180743.
2019-04-10 17:20:59 INFO OGG-01641 BOUNDED RECOVERY: recovery start position: SeqNo: 296008, RBA: 129491984, SCN: 25.1682570949 (
109056753349), Timestamp: 2019-04-09 06:38:41.000000.
2019-04-10 17:20:59 INFO OGG-01642 BOUNDED RECOVERY: recovery end position: SeqNo: 296008, RBA: 130515968, SCN: 25.1682571078 (10
9056753478), Timestamp: 2019-04-09 06:38:41.000000, Thread: 1.
2019-04-10 17:20:59 INFO OGG-01643 BOUNDED RECOVERY: CANCELED: for object pool 1: p5225_Redo Thread 1.
2019-04-10 17:20:59 INFO OGG-01639 BOUNDED RECOVERY: ACTIVE: for object pool 2: p5225_Redo Thread 2.
2019-04-10 17:20:59 INFO OGG-01640 BOUNDED RECOVERY: recovery start XID: 0.0.0.
2019-04-10 17:20:59 INFO OGG-01641 BOUNDED RECOVERY: recovery start position: SeqNo: 321771, RBA: 464402448, SCN: 25.1682570888 (
109056753288), Timestamp: 2019-04-09 06:40:58.000000, Thread: 2.
2019-04-10 17:20:59 INFO OGG-01642 BOUNDED RECOVERY: recovery end position: SeqNo: 321771, RBA: 464403456, SCN: 25.1682570888 (10
9056753288), Timestamp: 2019-04-09 06:40:58.000000, Thread: 2.
2019-04-10 17:20:59 INFO OGG-01643 BOUNDED RECOVERY: CANCELED: for object pool 2: p5225_Redo Thread 2.
2019-04-10 17:20:59 INFO OGG-01579 BOUNDED RECOVERY: VALID BCP: CP.EXT_E.000000687.
2019-04-10 17:20:59 INFO OGG-01629 BOUNDED RECOVERY: PERSISTED OBJECTS RECOVERED: <<NONE TO RECOVER>>.
Database Version:
Oracle Database 10g Enterprise Edition Release 10.2.0.5.0 - 64bi
PL/SQL Release 10.2.0.5.0 - Production
CORE 10.2.0.5.0 Production
TNS for HPUX: Version 10.2.0.5.0 - Production
NLSRTL Version 10.2.0.5.0 - Production
Database Language and Character Set:
NLS_LANG = "AMERICAN_AMERICA.ZHS16GBK"
NLS_LANGUAGE = "SIMPLIFIED CHINESE"
NLS_TERRITORY = "CHINA"
NLS_CHARACTERSET = "ZHS16GBK"
2019-04-10 17:21:02 INFO OGG-00546 Default thread stack size: 4194304.
2019-04-10 17:21:02 INFO OGG-01513 Positioning to (Thread 1) Sequence 296012, RBA 479196176, SCN 25.1683917031.
2019-04-10 17:21:04 INFO OGG-01516 Positioned to (Thread 1) Sequence 296012, RBA 479196176, SCN 25.1683917031, Apr 9, 2019 9:22:3
5 AM.
2019-04-10 17:21:04 INFO OGG-01513 Positioning to (Thread 2) Sequence 321776, RBA 584531984, SCN 25.1684390878.
2019-04-10 17:21:05 INFO OGG-01516 Positioned to (Thread 2) Sequence 321776, RBA 584531984, SCN 25.1684390878, Apr 9, 2019 9:28:5
0 AM.
2019-04-10 17:21:05 INFO OGG-01517 Position of first record processed for Thread 1, Sequence 296012, RBA 479196176, SCN 25.168391
7016, Apr 9, 2019 9:22:35 AM.
2019-04-10 17:21:05 INFO OGG-01056 Recovery initialization completed for target file ./dirdat/bk000103, at RBA 2673, CSN 10906116
7989.
2019-04-10 17:21:05 INFO OGG-01478 Output file ./dirdat/bk is using format RELEASE 11.2.
2019-04-10 17:21:05 INFO OGG-01517 Position of first record processed for Thread 2, Sequence 321776, RBA 584564752, SCN 25.168439
0783, Apr 9, 2019 9:28:50 AM.
2019-04-10 17:21:05 INFO OGG-01026 Rolling over remote file ./dirdat/bk000103.
***********************************************************************
** Run Time Messages **
***********************************************************************
TABLE resolved (entry BJ.TAB_XX):
table "BJ"."TAB_XX";
Using the following key columns for source table BJ.TAB_XX: ID.
2019-04-10 17:22:08 INFO OGG-01054 Recovery completed for target file ./dirdat/bk000104, at RBA 1326, CSN 109061167989.
2019-04-10 17:22:08 INFO OGG-01057 Recovery completed for all targets.
ogg extract 无明显报错,查看ogg源端目标端无更改,检查dump 和replicate进程均正常,尝试start EXT_E,出现同样报错,陷入沉思,其他进程正常,抽取进程进程一直回滚和进行事物恢复,尝试跳过当前时间点启动。
GGSCI (node1) 2> info EXT_E
Recovery Checkpoint (position of oldest unprocessed transaction in the data source):
Thread #: 1
Sequence #: 296012
RBA: 479196176
Timestamp: 2019-04-09 09:22:35.000000
SCN: 25.1683917031 (109058099431)
Redo File: /arch1/log296012_1_672419717
Recovery Checkpoint (position of oldest unprocessed transaction in the data source):
Thread #: 2
Sequence #: 321776
RBA: 584531984
Timestamp: 2019-04-09 09:28:50.000000
SCN: 25.1684390878 (109058573278)
Redo File: /arch2/log321776_2_672419717
尝试从当前时间点启动--失败
alter extract EXT_E,tranlog , thread 1 ,begin 2019-04-09 09:22:35
alter extract EXT_E,tranlog , thread 2 ,begin 2019-04-09 09:28:50
后手残跳过一小时事物之后,ogg extract进程正常
alter extract EXT_E,tranlog , thread 1 ,begin 2019-04-09 11:07:06
alter extract EXT_E,tranlog , thread 2 ,begin 2019-04-09 11:09:26
start EXT_E
........................
第二天,发现extract正常,dump正常,replicate 进程down了,这里复制进程肯定会有问题,因为extract跳过了事物,而ogg又是实时业务。
由于目标端在另一台计算机上登录这里没有保留日志.....
报错大概意思为,update BJ.TAB_XX表 not found data ,和违反唯一约束,这里尝试跳过rba拉起进程。
alter REP_E, extseqno <sequence >, extrba <RBA>
报错为:
ERROR OGG - 01028 Incompatible record (101) in ./dirdat/bk000109, rba 678565467 (getting header)
查询资料无果,源端和目标端存在数据不一致,考虑补数。
对此故障处理思路:
1.skip
2.对目标端不一致数据手动更改
3.重建
显然1 3 不合适,重建代价太大还会全面影响目标端程序业务。
最终决定补数:
源端该表存在lob字段,目标端没有同步,这里目标端该表要比源端小的多,导出目标端表,此处用exp。数据泵在源端(10.2.0.5)导入报ora- 00600错误,此处没有细研究
exp xxx/xxx file=tab_xx_4_11.dmp log=tab_xx_4_11.log tables=bj.tab_xx buffer=655360 direct=y
此处需要该dmpheader
nohup imp text/text file=tab_xx_4_11.dmp log=tab_xx_4_11.log fromuser=xxx touser=text buffer=655360 &
源端查看表的行数与目标端比较
源端表
select /*+ parallel(a 10)*/ count(*) from BJ.TAB_XX as of timestamp to_timestamp('2019-04-12 11:50:00','YYYY-MM-DD HH24:MI:SS') --24531055
目标端表,此时ogg已经停止同步,不用指定timestamp
select /*+ parallel(a 5) */ count(*) from TAB_XX --24516887
相差14186行
---
创建临时表(注:此处timestamp 时间应为info EXT_E进程同步时间截取)
create TABLE TEXT as
select /*+ parallel(a 3)*/ ID,...(字段省略)....
from BJ.TAB_XX as of timestamp to_timestamp('2019-04-12 11:50:00','YYYY-MM-DD HH24:MI:SS')
minus
select /*+ parallel(b 3)*/ ID,...(字段省略)....
from text.TAB_XX ;
commit;
临时表大小为180339(有些为源端表update)
select count(*) from text
count(*)
---------
180339
导出临时表text到ogg目标端进行补数
expdp text/text DIRECTORY=DPUMP DUMPFILE=text_412_1.dmp LOGFILE=text_412_1.log TABLES=TEXT VERSION=11.2.0.4.0 parallel=8
impdp xxx/xxx directory=dpump dumpfile=text_412_1.dmp LOGFILE=text_412_1.log remap_schema=text:xxx
+++++++++++++++++++++++++++++
--源端和目标端minus
select count(*) from text --180339
--目标端已经存在的数据,此数据ogg同步时报违反唯一约束
select /*+ parallel(a 5) */ count(a.ID) from text a join tab_xx b on (a.ID=b.ID) --166171
--将目标端存在的数据备份临时表
create table text_temp as
select /*+ parallel(a 5) */ b.* from text a join tab_xx b on (a.ID=b.ID)
select count(*) from text_temp --166171
--验证数据(差值-重复值=缺值)
此处得出的结果正好和源端目标端 count(*)结果差一样
SELECT 180339-166171 FROM DUAL --14168
-创建索引
create index text_ind_ID on text_temp(ID)
select * from dba_indexes where table_name='TEXT_TEMP'
--删除ogg目标端表重复数据
select 'delete /*+ parallel(a 5) */ from tab_xx where ID='''||ID||''';' from text_temp;
--删除完毕检查数据
select /*+ parallel(a 5) */ count(*) from tab_XX ---24350716 -24350716
impdp xxx/xxx directory=dpump dumpfile=text_412_1.dmp LOGFILE=text_412_1.log remap_schema=text:xxx remap_table=text:tab_xx TABLE_EXISTS_ACTION=append remap_tablespace=USERS:DATA
--这里导入时 报一个触发器log字段不够,查发现此次ogg异常为trigger log字段引起的,导致ogg同步异常 extract 事物回滚恢复,由于不了解业务逻辑,咨询开发后加长报错字段导入成功
--impdp导入后检查数据
select /*+ parallel(a 5) */ count(*) from tab_XX ---24531055 -24531055
start REPLICAT rep_e ,begin 2019-04-12 11:50:00
启动正常,回复完毕
------------
本次出现插曲,原因是生成临时表timestamp时间没有选择extract的时间,是我自己指定的时间(引以为鉴)启动报错,insert 数据已经存在,原因是选定该时间点可能会有事物没有传输过去,导致目标端已经有该数据,ogg在此同步出错
参数文件加入
handlecollisions
start rep_e skiptransaction
send rep_e nohandlecollisions
正常
OCI Error ORA-00001: unique constraint (XXX.UK_TAB_XX$) violated (status = 1). INSERT /*+ RESTRICT_ALL_REF_CONS */ INTO "XXX"."
TAB_XX" ("ID","....省略..."
Aborting transaction on ./dirdat/bk beginning at seqno 109 rba 94382793
error at seqno 109 rba 94382793
Problem replicating BJ.TAB_XX to xxx.TAB_XX
Mapping problem with insert record (target format)...
*
ID = 89897878632
........省略
--------end