一起业务逻辑导致的ogg故障

公司生产环境ogg异常中断,该ogg为实时同步,目标端有业务调用


源端hp-ux   Oracle 10.2.0.5
目标端 Linux Oracle 11.2.0.4
ogg11.2.1.0

GGSCI (node1) 1> view report EXT_E


***********************************************************************
                 Oracle GoldenGate Capture for Oracle
 Version 11.2.1.0.20 18227972 OGGCORE_11.2.1.0.0OGGBP_PLATFORMS_140304.2209
  HP/UX, IA64, 64bit (optimized), Oracle 10g on Mar  5 2014 01:53:55
 
Copyright (C) 1995, 2014, Oracle and/or its affiliates. All rights reserved.


                    Starting at 2019-04-10 17:20:58
***********************************************************************

Operating System Version:
HP-UX
Version U, Release B.11.31
Node: node1
Machine: ia64
                         soft limit   hard limit
Address Space Size   :    unlimited    unlimited
Heap Size            :   2147483648   2147483648
File Size            :    unlimited    unlimited
CPU Time             :    unlimited    unlimited

Process id: 949

Description: 

***********************************************************************
**            Running with the following parameters                  **
***********************************************************************

2019-04-10 17:20:58  INFO    OGG-03035  Operating system character set identified as hp-roman8. Locale: en_US_POSIX, LC_ALL:.

2019-04-10 17:20:58  INFO    OGG-02696  NON-ANSI SQL parameter syntax is used for parameter parsing.
extract EXT_E
setenv (NLS_LANG=AMERICAN_AMERICA.ZHS16GBK)

2019-04-10 17:20:58  INFO    OGG-02095  Successfully set environment variable NLS_LANG=AMERICAN_AMERICA.ZHS16GBK.
SETENV (ORACLE_HOME = /oracle/product)

2019-04-10 17:20:58  INFO    OGG-02095  Successfully set environment variable ORACLE_HOME=/oracle/product.
userid ogg, password ***
exttrail ./dirdat/bk
DYNAMICRESOLUTION
REPORTCOUNT EVERY 1 MINUTES, RATE
DISCARDFILE ./dirrpt/bk.dsc,APPEND,MEGABYTES 1024
WARNLONGTRANS 5h,CHECKINTERVAL 30m
tranlogoptions altarchivelogdest instance orcl1 /arch1
tranlogoptions altarchivelogdest instance orcl2 /arch2
FETCHOPTIONS NOUSESNAPSHOT
table BJ.TAB_XX;

2019-04-10 17:20:58  INFO    OGG-01815  Virtual Memory Facilities for: BR
    anon alloc: mmap(MAP_ANON)  anon free: munmap
    file alloc: mmap(MAP_SHARED)  file free: munmap
    target directories:
    /ogg/BR/EXT_E.

Bounded Recovery Parameter:
BRINTERVAL = 4HOURS
BRDIR      = /ogg

2019-04-10 17:20:58  INFO    OGG-01815  Virtual Memory Facilities for: COM
    anon alloc: mmap(MAP_ANON)  anon free: munmap
    file alloc: mmap(MAP_SHARED)  file free: munmap
    target directories:
    /ogg/dirtmp.

CACHEMGR virtual memory values (may have been adjusted)
CACHESIZE:                            26.69G
CACHEPAGEOUTSIZE (normal):                8M
PROCESS VM AVAIL FROM OS (min):       34.69G
CACHESIZEMAX (strict force to disk):  30.69G

2019-04-10 17:20:59  WARNING OGG-01842  CACHESIZE PER DYNAMIC DETERMINATION (26.69G) LESS THAN RECOMMENDED: 64G (64bit system)
vm found: 34.69G
Check swap space. Recommended swap/extract: 128G (64bit system).

2019-04-10 17:20:59  INFO    OGG-01639  BOUNDED RECOVERY:  ACTIVE: for object pool 1: p5225_Redo Thread 1.

2019-04-10 17:20:59  INFO    OGG-01640  BOUNDED RECOVERY: recovery start XID: 88.42.16180743.

2019-04-10 17:20:59  INFO    OGG-01641  BOUNDED RECOVERY: recovery start position: SeqNo: 296008, RBA: 129491984, SCN: 25.1682570949 (
109056753349), Timestamp: 2019-04-09 06:38:41.000000.

2019-04-10 17:20:59  INFO    OGG-01642  BOUNDED RECOVERY: recovery end position: SeqNo: 296008, RBA: 130515968, SCN: 25.1682571078 (10
9056753478), Timestamp: 2019-04-09 06:38:41.000000, Thread: 1.

2019-04-10 17:20:59  INFO    OGG-01643  BOUNDED RECOVERY: CANCELED: for object pool 1: p5225_Redo Thread 1.

2019-04-10 17:20:59  INFO    OGG-01639  BOUNDED RECOVERY:  ACTIVE: for object pool 2: p5225_Redo Thread 2.

2019-04-10 17:20:59  INFO    OGG-01640  BOUNDED RECOVERY: recovery start XID: 0.0.0.

2019-04-10 17:20:59  INFO    OGG-01641  BOUNDED RECOVERY: recovery start position: SeqNo: 321771, RBA: 464402448, SCN: 25.1682570888 (
109056753288), Timestamp: 2019-04-09 06:40:58.000000, Thread: 2.

2019-04-10 17:20:59  INFO    OGG-01642  BOUNDED RECOVERY: recovery end position: SeqNo: 321771, RBA: 464403456, SCN: 25.1682570888 (10
9056753288), Timestamp: 2019-04-09 06:40:58.000000, Thread: 2.

2019-04-10 17:20:59  INFO    OGG-01643  BOUNDED RECOVERY: CANCELED: for object pool 2: p5225_Redo Thread 2.

2019-04-10 17:20:59  INFO    OGG-01579  BOUNDED RECOVERY: VALID BCP: CP.EXT_E.000000687.

2019-04-10 17:20:59  INFO    OGG-01629  BOUNDED RECOVERY: PERSISTED OBJECTS RECOVERED: <<NONE TO RECOVER>>.

Database Version:
Oracle Database 10g Enterprise Edition Release 10.2.0.5.0 - 64bi
PL/SQL Release 10.2.0.5.0 - Production
CORE    10.2.0.5.0      Production
TNS for HPUX: Version 10.2.0.5.0 - Production
NLSRTL Version 10.2.0.5.0 - Production

Database Language and Character Set:
NLS_LANG         = "AMERICAN_AMERICA.ZHS16GBK" 
NLS_LANGUAGE     = "SIMPLIFIED CHINESE" 
NLS_TERRITORY    = "CHINA" 
NLS_CHARACTERSET = "ZHS16GBK" 

2019-04-10 17:21:02  INFO    OGG-00546  Default thread stack size: 4194304.

2019-04-10 17:21:02  INFO    OGG-01513  Positioning to (Thread 1) Sequence 296012, RBA 479196176, SCN 25.1683917031.

2019-04-10 17:21:04  INFO    OGG-01516  Positioned to (Thread 1) Sequence 296012, RBA 479196176, SCN 25.1683917031, Apr 9, 2019 9:22:3
5 AM.

2019-04-10 17:21:04  INFO    OGG-01513  Positioning to (Thread 2) Sequence 321776, RBA 584531984, SCN 25.1684390878.

2019-04-10 17:21:05  INFO    OGG-01516  Positioned to (Thread 2) Sequence 321776, RBA 584531984, SCN 25.1684390878, Apr 9, 2019 9:28:5
0 AM.

2019-04-10 17:21:05  INFO    OGG-01517  Position of first record processed for Thread 1, Sequence 296012, RBA 479196176, SCN 25.168391
7016, Apr 9, 2019 9:22:35 AM.

2019-04-10 17:21:05  INFO    OGG-01056  Recovery initialization completed for target file ./dirdat/bk000103, at RBA 2673, CSN 10906116
7989.

2019-04-10 17:21:05  INFO    OGG-01478  Output file ./dirdat/bk is using format RELEASE 11.2.

2019-04-10 17:21:05  INFO    OGG-01517  Position of first record processed for Thread 2, Sequence 321776, RBA 584564752, SCN 25.168439
0783, Apr 9, 2019 9:28:50 AM.

2019-04-10 17:21:05  INFO    OGG-01026  Rolling over remote file ./dirdat/bk000103.

***********************************************************************
**                     Run Time Messages                             **
***********************************************************************

TABLE resolved (entry BJ.TAB_XX):
  table "BJ"."TAB_XX";
Using the following key columns for source table BJ.TAB_XX: ID.


2019-04-10 17:22:08  INFO    OGG-01054  Recovery completed for target file ./dirdat/bk000104, at RBA 1326, CSN 109061167989.

2019-04-10 17:22:08  INFO    OGG-01057  Recovery completed for all targets.


ogg extract 无明显报错,查看ogg源端目标端无更改,检查dump 和replicate进程均正常,尝试start EXT_E,出现同样报错,陷入沉思,其他进程正常,抽取进程进程一直回滚和进行事物恢复,尝试跳过当前时间点启动。

GGSCI (node1) 2> info EXT_E
 Recovery Checkpoint (position of oldest unprocessed transaction in the data source):
    Thread #: 1
    Sequence #: 296012
    RBA: 479196176
    Timestamp: 2019-04-09 09:22:35.000000
    SCN: 25.1683917031 (109058099431)
    Redo File: /arch1/log296012_1_672419717

  Recovery Checkpoint (position of oldest unprocessed transaction in the data source):
    Thread #: 2
    Sequence #: 321776
    RBA: 584531984
    Timestamp: 2019-04-09 09:28:50.000000
    SCN: 25.1684390878 (109058573278)
    Redo File: /arch2/log321776_2_672419717


尝试从当前时间点启动--失败
alter extract EXT_E,tranlog , thread 1 ,begin 2019-04-09 09:22:35
alter extract EXT_E,tranlog , thread 2 ,begin 2019-04-09 09:28:50

后手残跳过一小时事物之后,ogg extract进程正常


alter extract EXT_E,tranlog , thread 1 ,begin 2019-04-09 11:07:06
alter extract EXT_E,tranlog , thread 2 ,begin 2019-04-09 11:09:26

start  EXT_E

........................

第二天,发现extract正常,dump正常,replicate 进程down了,这里复制进程肯定会有问题,因为extract跳过了事物,而ogg又是实时业务。
由于目标端在另一台计算机上登录这里没有保留日志.....
报错大概意思为,update BJ.TAB_XX表 not found data ,和违反唯一约束,这里尝试跳过rba拉起进程。

alter REP_E, extseqno <sequence >, extrba <RBA>
报错为:
 ERROR  OGG - 01028 Incompatible record (101) in ./dirdat/bk000109, rba 678565467 (getting header)
查询资料无果,源端和目标端存在数据不一致,考虑补数。

对此故障处理思路:
1.skip 
2.对目标端不一致数据手动更改
3.重建

显然1 3 不合适,重建代价太大还会全面影响目标端程序业务。

最终决定补数:


源端该表存在lob字段,目标端没有同步,这里目标端该表要比源端小的多,导出目标端表,此处用exp。数据泵在源端(10.2.0.5)导入报ora- 00600错误,此处没有细研究

exp xxx/xxx file=tab_xx_4_11.dmp  log=tab_xx_4_11.log tables=bj.tab_xx buffer=655360 direct=y

此处需要该dmpheader
nohup imp text/text file=tab_xx_4_11.dmp  log=tab_xx_4_11.log  fromuser=xxx touser=text buffer=655360  &


源端查看表的行数与目标端比较

源端表  
select /*+ parallel(a 10)*/ count(*) from BJ.TAB_XX    as of timestamp to_timestamp('2019-04-12 11:50:00','YYYY-MM-DD HH24:MI:SS')  --24531055

目标端表,此时ogg已经停止同步,不用指定timestamp
select /*+ parallel(a 5) */ count(*) from TAB_XX  --24516887

相差14186行
---
创建临时表(注:此处timestamp 时间应为info EXT_E进程同步时间截取)

create TABLE TEXT as 
select /*+ parallel(a 3)*/ ID,...(字段省略)....
 from BJ.TAB_XX     as of timestamp to_timestamp('2019-04-12 11:50:00','YYYY-MM-DD HH24:MI:SS')
minus
select /*+ parallel(b 3)*/ ID,...(字段省略)....
  from text.TAB_XX  ; 
commit;

临时表大小为180339(有些为源端表update)
select count(*) from text
count(*) 
---------
180339

导出临时表text到ogg目标端进行补数


expdp  text/text DIRECTORY=DPUMP  DUMPFILE=text_412_1.dmp  LOGFILE=text_412_1.log TABLES=TEXT     VERSION=11.2.0.4.0  parallel=8
impdp  xxx/xxx directory=dpump dumpfile=text_412_1.dmp  LOGFILE=text_412_1.log  remap_schema=text:xxx

+++++++++++++++++++++++++++++

--源端和目标端minus 
select count(*)  from text  --180339
--目标端已经存在的数据,此数据ogg同步时报违反唯一约束
select /*+ parallel(a 5) */ count(a.ID)  from text  a join tab_xx b on (a.ID=b.ID) --166171
--将目标端存在的数据备份临时表
create table text_temp as 
select /*+ parallel(a 5) */ b.*  from text  a join tab_xx b on (a.ID=b.ID)

select count(*) from text_temp  --166171
--验证数据(差值-重复值=缺值)
此处得出的结果正好和源端目标端 count(*)结果差一样
SELECT 180339-166171 FROM DUAL  --14168 
-创建索引
create index text_ind_ID on text_temp(ID)

select * from dba_indexes where table_name='TEXT_TEMP'

--删除ogg目标端表重复数据

select 'delete /*+ parallel(a 5) */ from tab_xx where ID='''||ID||''';' from text_temp;


--删除完毕检查数据
select /*+ parallel(a 5) */ count(*) from tab_XX ---24350716 -24350716

impdp xxx/xxx directory=dpump dumpfile=text_412_1.dmp  LOGFILE=text_412_1.log  remap_schema=text:xxx remap_table=text:tab_xx TABLE_EXISTS_ACTION=append remap_tablespace=USERS:DATA 
--这里导入时 报一个触发器log字段不够,查发现此次ogg异常为trigger log字段引起的,导致ogg同步异常 extract 事物回滚恢复,由于不了解业务逻辑,咨询开发后加长报错字段导入成功

--impdp导入后检查数据 
select /*+ parallel(a 5) */ count(*) from tab_XX ---24531055 -24531055

start REPLICAT rep_e ,begin 2019-04-12 11:50:00
启动正常,回复完毕

------------
本次出现插曲,原因是生成临时表timestamp时间没有选择extract的时间,是我自己指定的时间(引以为鉴)启动报错,insert 数据已经存在,原因是选定该时间点可能会有事物没有传输过去,导致目标端已经有该数据,ogg在此同步出错
参数文件加入
handlecollisions

start rep_e skiptransaction

send rep_e nohandlecollisions

正常


OCI Error ORA-00001: unique constraint (XXX.UK_TAB_XX$) violated (status = 1). INSERT /*+ RESTRICT_ALL_REF_CONS */ INTO "XXX"."
TAB_XX" ("ID","....省略..."
Aborting transaction on ./dirdat/bk beginning at seqno 109 rba 94382793
                         error at seqno 109 rba 94382793
Problem replicating BJ.TAB_XX to xxx.TAB_XX
Mapping problem with insert record (target format)...
*
ID = 89897878632
........省略


--------end
 

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值