linux6.9+oracle11.2.0.4 rac两节点+ogg12.2.0.1
1.故障现象:
开发对大表增加字段alter table XXX.TABX add IS_TROUBLE_POINT VARCHAR2(10) default '是';
出现卡住20分钟。
等字段加上后,发现ogg抽取进程异常,一直卡住不动,显示的时间是添加字段期间。
GGSCI (racj1) 17> info gdcq
EXTRACT GDCQ Last Started 2020-09-16 22:52 Status RUNNING
Checkpoint Lag 00:48:48 (updated 00:00:06 ago)
Process ID 118131
Log Read Checkpoint Oracle Redo Logs
2020-09-16 22:06:46 Thread 1, Seqno 26376, RBA 559309504
SCN 3956.660955621 (16991551578597)
Log Read Checkpoint Oracle Redo Logs
2020-09-16 22:06:52 Thread 2, Seqno 35339, RBA 5733392
SCN 3956.660966768 (16991551589744)
GGSCI (racj1) 18> info all
Program Status Group Lag at Chkpt Time Since Chkpt
MANAGER RUNNING
EXTRACT RUNNING GDCQ 00:49:09 00:00:01
EXTRACT RUNNING T180 00:00:00 00:00:00
EXTRACT RUNNING T245 00:00:00 00:00:09
EXTRACT RUNNING T247 00:00:00 00:00:00
GGSCI (racj1) 19> view report gdcq
***********************************************************************
Oracle GoldenGate Capture for Oracle
Version 12.2.0.1.170919 27731916_FBO
Linux, x64, 64bit (optimized), Oracle 11g on Sep 4 2018 20:28:34
Copyright (C) 1995, 2017, Oracle and/or its affiliates. All rights reserved.
Starting at 2020-09-16 22:52:04
***********************************************************************
Operating System Version:
Linux
Version #1 SMP Sun Nov 10 22:19:54 EST 2013, Release 2.6.32-431.el6.x86_64
Node: racj1
Machine: x86_64
soft limit hard limit
Address Space Size : unlimited unlimited
Heap Size : unlimited unlimited
File Size : unlimited unlimited
CPU Time : unlimited unlimited
Process id: 118131
。。。。。。。。。。。。。。。。
2020-09-16 22:52:07 INFO OGG-01517 Position of first record processed for Thread 2, Sequence 35336, RBA 8674320, SCN 3956.658990278 (16991549613254), 2020 M09 16 21:12:08.
2020-09-16 22:52:07 INFO OGG-01053 Recovery completed for target file ./dirdat/gd014349, at RBA 1399.
2020-09-16 22:52:07 INFO OGG-01057 Recovery completed for all targets.
***********************************************************************
** Run Time Messages **
***********************************************************************
2020-09-16 22:52:25 INFO OGG-01644 BOUNDED RECOVERY: COMPLETE: for object pool 2: p187438_Redo_Thread_2 at SeqNo: 35337, RBA: 1555749904, SCN: 3956.659865963 (16991550488939), Thread: 2.
2020-09-16 22:52:41 WARNING OGG-00723 Record with class# 854, slt# 26, at seqno 35340, rba 638777872 SCN 3956.661637609 (16991552260585) has secondary transaction ID that is duplicate of existing open uncommitted
transaction.
2020-09-16 22:52:41 WARNING OGG-00715 [Thread #2] Purging transaction (transaction id: 419.26.15318211, start time: 2020-09-16 22:23:22, start seqno: 35340, start RBA: 631464976).
2020-09-16 22:52:41 WARNING OGG-00712 Updating I/O checkpoint after purging orphaned transactions on thread 2 with current position (Seq#: 35340, RBA: 638782168).
2020-09-16 22:52:46 INFO OGG-01644 BOUNDED RECOVERY: COMPLETE: for object pool 1: p187438_Redo_Thread_1 at SeqNo: 26373, RBA: 62238224, SCN: 3956.659866612 (16991550489588), Thread: 1.
2.分析原因:
GG Extract Report Shows "Purging Transaction", WARNING OGG-00723, OGG-00715, Updating I/O Checkpoint after Purging Orphaned Transactions, Has Secondary Transaction ID That Is Duplicate of Existing Open Uncommitted Transaction (Doc ID 1458472.1) |
In this Document
APPLIES TO: Oracle GoldenGate - Version 11.1.1.0.0 and later Extract report shows following messages: 2012-03-05 09:14:54 WARNING OGG-00723 Oracle GoldenGate Capture for Oracle, <group>.prm: Record with class# 34, slt# 4, at seqno xxxxx, 2012-03-05 09:14:54 WARNING OGG-00715 Oracle GoldenGate Capture for Oracle, <group>.prm: [Thread #1] Purging transaction 2012-03-05 09:14:54 WARNING OGG-00712 Oracle GoldenGate Capture for Oracle, <group>.prm: Updating I/O checkpoint after purging
1) This is due to the presence of XA (distributed transactions). This can be verified by running the following query: sql> spool output.txt Extract/ Classic Extract encountered a shutdown of an instance, and is purging any transactions that are open on that instance. 2) This can also be caused due to streamlined dblogreader code which was added in OGG version 11.2.1.0.7 and it was disabled in OGG version 11.2.1.0.32 SOLUTION 1) Upgrade to OGG v11.2 and configure "Integrated Extract" which supports XA transactions. How To Upgrade From Goldengate Classic Extract To Integrated Extract (Doc ID 1484313.1) For environments using Classic Extract, the following can be used as a work-around _clusterwide_global_transactions=false 1. What is the impact of setting "_clusterwide_global_transactions"=false This will ensure that ALL branches of XA goes to the same Oracle instance. However if the instance fails the whole XA transaction will roll back. 2. For existing connections will it be transparent ? It is not transparent. We still have to setup services to accept XA connections. From the Oracle 10.2 doc: For incoming transactions we will have to set up the services as explained above. Then mark the service for distributed transaction processing by Additional details on XA transactions are available in following document http://www.oracle.com/technetwork/products/clustering/overview/distributed-transactions-and-xa-163941.pdf 2) If using OGG versions 11.2.1.0.7 through 11.2.1.0.31 and 12.1.2.0.0 through 12.1.2.1.4 add TRANLOGOPTIONS _DISABLESTREAMLINEDDBLOGREADER to the extract to fix the issue with streamlined Note: The parameter _DISABLESTREAMLINEDDBLOGREADER is available only from v11.2.1.0.26 onwards. OGG Backport is needed for previous versions.
EFERENCES BUG:13837744 - OGG V11.2.1.0.0 CLASSIC EXTRACT SHOWING "PURGING TRANSACTION" AND DUPLICATE XID |
3.处理过程:
3.1尝试清除分布式事务,重启ogg抽取进程,无效:
SQL> execute DBMS_TRANSACTION.PURGE_LOST_DB_ENTRY('134.19.1059753');
commit;
rollback force '158.31.753501';
execute DBMS_TRANSACTION.PURGE_LOST_DB_ENTRY('158.31.753501');
commit;
PL/SQL procedure successfully completed.
SQL>
Commit complete.
根据建议检查存在XA事务:
SQL> spool output.txt
SQL>
SQL> set headin off
SQL> select /*+ ORDERED */
2 '-------- Found Global XID -----------------------'||'
3 Current Time : '|| substr(to_char(sysdate,'dd-Mon-YYYY HH24.MI.SS'),1,22) ||'
4 '||'Ascii GTXID='||g.K2GTITID_ORA
5 from x$k2gte g;
-------- Found Global XID -----------------------
Current Time : 16-Sep-2020 23.04.42
Ascii GTXID=ZYDB.786fbf59.140.14.96846
-------- Found Global XID -----------------------
Current Time : 16-Sep-2020 23.04.42
Ascii GTXID=ZYDB.786fbf59.52.25.4831875
-------- Found Global XID -----------------------
Current Time : 16-Sep-2020 23.04.42
Ascii GTXID=ZYDB.786fbf59.66.7.4564469
-------- Found Global XID -----------------------
Current Time : 16-Sep-2020 23.04.42
Ascii GTXID=GDDB.6d36e46c.388.6.148806
-------- Found Global XID -----------------------
Current Time : 16-Sep-2020 23.04.42
Ascii GTXID=GDDB.6d36e46c.44.16.14369948
-------- Found Global XID -----------------------
Current Time : 16-Sep-2020 23.04.42
Ascii GTXID=ZYDB.786fbf59.58.32.5537388
-------- Found Global XID -----------------------
Current Time : 16-Sep-2020 23.04.42
Ascii GTXID=ZYDB.786fbf59.410.11.2891826
-------- Found Global XID -----------------------
Current Time : 16-Sep-2020 23.04.42
Ascii GTXID=ZYDB.786fbf59.414.2.7155947
-------- Found Global XID -----------------------
setCurrent Time : 16-Sep-2020 23.04.42
hea
din off
Ascii GTXID=ZYDB.786fbf59.128.21.36647
-------- Found Global XID -----------------------
Current Time : 16-Sep-2020 23.04.42
Ascii GTXID=GDDB.6d36e46c.47.5.20547367
-------- Found Global XID -----------------------
Current Time : 16-Sep-2020 23.04.42
Ascii GTXID=ZYDB.786fbf59.120.17.54075
-------- Found Global XID -----------------------
Current Time : 16-Sep-2020 23.04.42
Ascii GTXID=ZYDB.786fbf59.120.17.54075
-------- Found Global XID -----------------------
Current Time : 16-Sep-2020 23.04.42
Ascii GTXID=GDDB.6d36e46c.229.32.173004
SQL> SQL> select /*+ ORDERED */
2 '----------------------------------------'||'
3 Curent Time : '|| substr(to_char(sysdate,'dd-Mon-YYYY HH24.MI.SS'),1,22) ||'
4 '||'GTXID='||substr(g.K2GTITID_EXT,1,10) ||'
5 '||'Ascii GTXID='||g.K2GTITID_ORA ||'
6 '||'Branch= '||g.K2GTIBID ||'
7 Local TX Id ='||substr(t.KXIDUSN||'.'||t.kXIDSLT||'.'||t.kXIDSQN,1,10) ||'
8 '||'KTUXESTA='|| x.KTUXESTA ||'
9 '||'KTUXEDFL='|| x.KTUXECFL
10 from x$k2gte g, x$ktcxb t, X$KTUXE x
11 where g.K2GTDXCB =t.ktcxbxba and
12 x.KTUXEUSN = t.KXIDUSN(+) and
13 x.KTUXESLT = t.kXIDSLT(+) and
14 x.KTUXESQN =t.kXIDSQN(+);
set headin on
----------------------------------------
Curent Time : 16-Sep-2020 23.04.42
GTXID=474444422E
Ascii GTXID=GDDB.6d36e46c.47.5.20547367
Branch= 0000
Local TX Id =47.5.20547
KTUXESTA=ACTIVE
KTUXEDFL=NONE
----------------------------------------
Curent Time : 16-Sep-2020 23.04.42
GTXID=474444422E
Ascii GTXID=GDDB.6d36e46c.388.6.148806
Branch= 0000
Local TX Id =388.6.1488
KTUXESTA=ACTIVE
KTUXEDFL=NONE
----------------------------------------
Curent Time : 16-Sep-2020 23.04.42
GTXID=474444422E
Ascii GTXID=GDDB.6d36e46c.229.32.173004
Branch= 0000
Local TX Id =229.32.173
KTUXESTA=ACTIVE
KTUXEDFL=NONE
----------------------------------------
Curent Time : 16-Sep-2020 23.04.42
GTXID=474444422E
Ascii GTXID=GDDB.6d36e46c.44.16.14369948
Branch= 0000
Local TX Id =44.16.1436
KTUXESTA=ACTIVE
KTUXEDFL=NONE
----------------------------------------
Curent Time : 16-Sep-2020 23.04.42
GTXID=474444422E
Ascii GTXID=GDDB.6d36e46c.277.19.187305
Branch= 0000
Local TX Id =277.19.187
KTUXESTA=ACTIVE
KTUXEDFL=NONE
----------------------------------------
Curent Time : 16-Sep-2020 23.04.42
GTXID=474444422E
Ascii GTXID=GDDB.6d36e46c.143.9.1082723
Branch= 0000
Local TX Id =143.9.1082
KTUXESTA=ACTIVE
KTUXEDFL=NONE
----------------------------------------
Curent Time : 16-Sep-2020 23.04.42
GTXID=474444422E
Ascii GTXID=GDDB.6d36e46c.157.8.746628
Branch= 0000
Local TX Id =157.8.7466
KTUXESTA=ACTIVE
KTUXEDFL=NONE
----------------------------------------
Curent Time : 16-Sep-2020 23.04.42
GTXID=474444422E
Ascii GTXID=GDDB.6d36e46c.377.11.149473
Branch= 0000
Local TX Id =377.11.149
KTUXESTA=ACTIVE
KTUXEDFL=NONE
----------------------------------------
Curent Time : 16-Sep-2020 23.04.42
GTXID=474444422E
Ascii GTXID=GDDB.6d36e46c.382.20.149560
Branch= 0000
Local TX Id =382.20.149
KTUXESTA=ACTIVE
KTUXEDFL=NONE
----------------------------------------
Curent Time : 16-Sep-2020 23.04.42
GTXID=474444422E
Ascii GTXID=GDDB.6d36e46c.91.13.2401791
Branch= 0000
Local TX Id =91.13.2401
KTUXESTA=ACTIVE
KTUXEDFL=NONE
----------------------------------------
Curent Time : 16-Sep-2020 23.04.42
GTXID=474444422E
Ascii GTXID=GDDB.6d36e46c.82.11.2324289
Branch= 0000
Local TX Id =82.11.2324
KTUXESTA=ACTIVE
KTUXEDFL=NONE
----------------------------------------
Curent Time : 16-Sep-2020 23.04.42
GTXID=474444422E
Ascii GTXID=GDDB.6d36e46c.67.12.20910544
Branch= 0000
Local TX Id =67.12.2091
KTUXESTA=ACTIVE
KTUXEDFL=NONE
----------------------------------------
Curent Time : 16-Sep-2020 23.04.42
GTXID=474444422E
Ascii GTXID=GDDB.6d36e46c.130.4.1080328
Branch= 0000
Local TX Id =130.4.1080
KTUXESTA=ACTIVE
KTUXEDFL=NONE
----------------------------------------
Curent Time : 16-Sep-2020 23.04.42
GTXID=474444422E
Ascii GTXID=GDDB.6d36e46c.77.28.8544656
Branch= 0000
Local TX Id =77.28.8544
KTUXESTA=ACTIVE
KTUXEDFL=NONE
----------------------------------------
Curent Time : 16-Sep-2020 23.04.42
GTXID=474444422E
Ascii GTXID=GDDB.6d36e46c.263.0.196156
Branch= 0000
Local TX Id =263.0.1961
KTUXESTA=ACTIVE
KTUXEDFL=NONE
----------------------------------------
Curent Time : 16-Sep-2020 23.04.42
GTXID=474444422E
Ascii GTXID=GDDB.6d36e46c.71.12.16407664
Branch= 5354415444425B31312E33312E31333539323439345D5B312E335D
Local TX Id =71.12.1640
KTUXESTA=ACTIVE
KTUXEDFL=NONE
----------------------------------------
Curent Time : 16-Sep-2020 23.04.42
GTXID=474444422E
Ascii GTXID=GDDB.6d36e46c.71.12.16407664
Branch= 0000
Local TX Id =71.12.1640
KTUXESTA=ACTIVE
KTUXEDFL=NONE
----------------------------------------
Curent Time : 16-Sep-2020 23.04.42
GTXID=474444422E
Ascii GTXID=GDDB.6d36e46c.342.4.261192
Branch= 0000
Local TX Id =342.4.2611
KTUXESTA=ACTIVE
KTUXEDFL=NONE
----------------------------------------
Curent Time : 16-Sep-2020 23.04.42
GTXID=474444422E
Ascii GTXID=GDDB.6d36e46c.107.4.731768
Branch= 0000
Local TX Id =107.4.7317
KTUXESTA=ACTIVE
KTUXEDFL=NONE
19 rows selected.
SQL> SQL> SQL>
SQL>
SQL>
SQL> spool off
3.2尝试添加隐含参数,重启数据库无效:
SQL> alter system set "_clusterwide_global_transactions"=false scope=spfile sid='*';
Does Oracle Commerce Require DTP Or _clusterwide_global_transactions Flags To Be Set For Oracle RAC (Doc ID 1945707.1) |
In this Document
APPLIES TO: Oracle Commerce Platform - Version 11.1 and later Should Distributed Transaction Processing (DTP) or Clusterwide XA transactions management (global_txn_processes) be used for RAC configuration when using Oracle Commerce?
Generally when XA is configured, either DTP or _clusterwide_global_transactions is set. However, when XA transactions use SUSPEND/RESUME operations the Clusterwide XA transactions management (global_txn_processes) introduced with 11g cannot be used. This limitation is documented in this document: ORA-24798 WHEN XA RESUMES JOIN AN EXISTING TRANSACTION BRANCH (Doc ID 1106793.1) Since Oracle Commerce transactions use SUSPEND/RESUME operations, _clusterwide_global_transactions will need to be set to "FALSE". Since Clusterwide XA transactions management cannot be used, per Oracle documentation it is recommended that DTP be used. For further details, look at Overview of XA Transactions and DTP Services section of the Real Application Clusters Administration and Deployment Guide. An additional reason to set _clusterwide_global_transactions to "FALSE" is discussed in NOTE:1361615.1 - High rdbms ipc reply and DFS lock handle in 11gR2 RAC With XA Enabled Application |
3.3抽取进程添加参数,无效:
STATOPTIONS RESETREPORTSTATS
EXTTRAIL ./dirdat/gd
TRANLOGOPTIONS DBLOGREADER
TRANLOGOPTIONS _DISABLESTREAMLINEDDBLOGREADER
3.4尝试删除XA进程,无效:
SQL> Select c.spid from v$process c where c.addr in (select b.paddr from v$session b where b.sid in(Select /*+ ORDERED */
w.sid
from x$k2gte g, x$ktcxb t, x$ksuse s, v$session_wait w, v$session s2
where g.K2GTDXCB =t.ktcxbxba
and g.K2GTDSES=t.ktcxbses
and s.addr=g.K2GTDSES
and w.sid=s.indx
and s2.sid = w.sid));
SPID
------------------------
104381
104384
137346
151232
121256
121420
121422
159793
8 rows selected.
SQL> !ps -ef|grep 104381
oracle 104381 1 94 Sep16 ? 00:28:50 oraclegddb1 (LOCAL=NO)
oracle 192341 129389 0 00:25 pts/3 00:00:00 /bin/bash -c ps -ef|grep 104381
oracle 192343 192341 0 00:25 pts/3 00:00:00 grep 104381
SQL> !ps -ef|grep 104384
oracle 104384 1 96 Sep16 ? 00:29:48 oraclegddb1 (LOCAL=NO)
oracle 192975 129389 0 00:25 pts/3 00:00:00 /bin/bash -c ps -ef|grep 104384
oracle 192977 192975 0 00:25 pts/3 00:00:00 grep 104384
SQL> !kill -9 104381
3.5尝试修改抽取rba起始位置,有变化:
GGSCI (racj1) 9> alter extract gdcq,tranlog,extseqno 35340,extrba 638782168,thread 2
ERROR: EXTRACT GDCQ is running and cannot be altered (1,2,No such file or directory).
GGSCI (racj1) 10> info all
Program Status Group Lag at Chkpt Time Since Chkpt
MANAGER RUNNING
EXTRACT RUNNING GDCQ 02:43:47 00:00:09
EXTRACT RUNNING T180 00:00:00 00:00:08
EXTRACT RUNNING T245 00:00:00 00:00:07
EXTRACT RUNNING T247 00:00:00 00:00:08
GGSCI (racj1) 11> stop gdcq
Sending STOP request to EXTRACT GDCQ ...
Request processed.
GGSCI (racj1) 12> info all
Program Status Group Lag at Chkpt Time Since Chkpt
MANAGER RUNNING
EXTRACT STOPPED GDCQ 02:44:01 00:00:04
EXTRACT RUNNING T180 00:00:00 00:00:07
EXTRACT RUNNING T245 00:00:00 00:00:06
EXTRACT RUNNING T247 00:00:00 00:00:07
GGSCI (racj1) 13> alter extract gdcq,tranlog,extseqno 35340,extrba 638782168,thread 2
EXTRACT altered.
GGSCI (racj1) 14> info all
Program Status Group Lag at Chkpt Time Since Chkpt
MANAGER RUNNING
EXTRACT STOPPED GDCQ 02:44:19 00:00:02
EXTRACT RUNNING T180 00:00:00 00:00:04
EXTRACT RUNNING T245 00:00:00 00:00:03
EXTRACT RUNNING T247 00:00:00 00:00:04
GGSCI (racj1) 15> start gdcq
Sending START request to MANAGER ...
EXTRACT GDCQ starting
GGSCI (racj1) 16> info all
Program Status Group Lag at Chkpt Time Since Chkpt
MANAGER RUNNING
EXTRACT STARTING GDCQ 02:44:19 00:00:06
EXTRACT RUNNING T180 00:00:00 00:00:07
EXTRACT RUNNING T245 00:00:00 00:00:06
EXTRACT RUNNING T247 00:00:00 00:00:07
GGSCI (racj1) 17> info all
Program Status Group Lag at Chkpt Time Since Chkpt
MANAGER RUNNING
EXTRACT RUNNING GDCQ 02:44:19 00:00:08
EXTRACT RUNNING T180 00:00:00 00:00:09
EXTRACT RUNNING T245 00:00:00 00:00:08
EXTRACT RUNNING T247 00:00:00 00:00:09
GGSCI (racj1) 21> info all
Program Status Group Lag at Chkpt Time Since Chkpt
MANAGER RUNNING
EXTRACT RUNNING GDCQ 02:44:19 00:00:30
EXTRACT RUNNING T180 00:00:00 00:00:02
EXTRACT RUNNING T245 00:00:00 00:00:01
EXTRACT RUNNING T247 00:00:00 00:00:02
GGSCI (racj1) 22> info all
Program Status Group Lag at Chkpt Time Since Chkpt
MANAGER RUNNING
EXTRACT RUNNING GDCQ 02:44:19 00:00:33
EXTRACT RUNNING T180 00:00:00 00:00:04
EXTRACT RUNNING T245 00:00:00 00:00:03
EXTRACT RUNNING T247 00:00:00 00:00:04
GGSCI (racj1) 26> info gdcq
EXTRACT GDCQ Initialized 2020-09-17 00:51 Status STOPPED
Checkpoint Lag 02:44:19 (updated 00:01:30 ago)
Log Read Checkpoint Oracle Redo Logs
2020-09-16 22:06:46 Thread 1, Seqno 26376, RBA 559309504
SCN 3956.660955621 (16991551578597)
Log Read Checkpoint Oracle Redo Logs
2020-09-16 22:06:52 Thread 2, Seqno 35340, RBA 638782168
SCN 0.0 (0)
GGSCI (racj1) 27> info all
Program Status Group Lag at Chkpt Time Since Chkpt
MANAGER RUNNING
EXTRACT STOPPED GDCQ 02:44:19 00:01:34
EXTRACT RUNNING T180 00:00:00 00:00:06
EXTRACT RUNNING T245 00:00:00 00:00:05
EXTRACT RUNNING T247 00:00:00 00:00:06
GGSCI (racj1) 41> view report gdcq
***********************************************************************
Oracle GoldenGate Capture for Oracle
Version 12.2.0.1.170919 27731916_FBO
Linux, x64, 64bit (optimized), Oracle 11g on Sep 4 2018 20:28:34
Copyright (C) 1995, 2017, Oracle and/or its affiliates. All rights reserved.
Starting at 2020-09-17 00:54:51
***********************************************************************
Operating System Version:
Linux
Version #1 SMP Sun Nov 10 22:19:54 EST 2013, Release 2.6.32-431.el6.x86_64
Node: racj1
Machine: x86_64
soft limit hard limit
Address Space Size : unlimited unlimited
Heap Size : unlimited unlimited
File Size : unlimited unlimited
CPU Time : unlimited unlimited
Process id: 55218
。。。。。。。。。。。
2020-09-17 00:54:53 INFO OGG-01513 Positioning to (Thread 1) Sequence 26274, RBA 971472400, SCN 3955.971333569 (16987566989249).
Source Context :
SourceModule : [gglib.ggdatasource.DataSource]
SourceID : [/scratch/jkrath/view_storage/jkrath_soncustvew/oggcore/OpenSys/src/gglib/ggdatasource/DataSource.cpp]
SourceFunction : [establishStartPoint]
SourceLine : [175]
ThreadBacktrace : [11] elements
: [/ogg/12c/libgglog.so(CMessageContext::AddThreadContext()+0x1b) [0x7f9471c7820b]]
: [/ogg/12c/libgglog.so(CMessageFactory::CreateMessage(CSourceContext*, unsigned int, ...)+0x134) [0x7f9471c72d74]]
: [/ogg/12c/libgglog.so(_MSG_ERR_CHECKPOINT_GENERIC(CSourceContext*, char const*, CMessageFactory::MessageDisposition)+0x30) [0x7f9471c48634]]
: [/ogg/12c/extract(ggs::gglib::ggdatasource::DataSource::establishStartPoint(time_elt_def&, time_elt_def&)+0xef) [0x87fe4f]]
: [/ogg/12c/extract() [0x6ef0e1]]
: [/ogg/12c/extract(ggs::gglib::MultiThreading::MainThread::ExecMain()+0x60) [0x752c80]]
: [/ogg/12c/extract(ggs::gglib::MultiThreading::Thread::RunThread(ggs::gglib::MultiThreading::Thread::ThreadArgs*)+0x14d) [0x753d5d]]
: [/ogg/12c/extract(ggs::gglib::MultiThreading::MainThread::Run(int, char**)+0xb1) [0x753e41]]
: [/ogg/12c/extract(main+0x3b) [0x6eff1b]]
: [/lib64/libc.so.6(__libc_start_main+0xfd) [0x3396a1ed1d]]
: [/ogg/12c/extract() [0x69aed1]]
2020-09-17 00:56:07 ERROR OGG-00446 Opening file +ARCHDG/1_26274_986573398.dbf in DBLOGREADER mode: (308) ORA-00308: cannot open archived log '+ARCHDG/1_26274_986573398.dbf'
ORA-17503: ksfdopn:2 Failed to open file +ARCHDG/1_26274_986573398.dbf
ORA-15173: entry '1_26274_986573398.dbf' does not exist in directory '/'
Not able to establish initial position for sequence 26274, rba 971472400.
2020-09-17 00:56:07 ERROR OGG-01668 PROCESS ABENDING.
这里发现报错信息发生变化,说明可以继续往前抽取日志了。此处由于数据库每4小时会自动备份归档,然后删除,导致归档日志文件不存在,所以只能考虑从带库中恢复归档日志。
通过grid用户asmcmd查看归档日志存在的情况,确定需要抽取哪些,从26274开始的归档日志。
[oracle@racj2 netbackup]$ more bp.conf
SERVER = nbujxq
SERVER = racj2
SERVER = nbu1
SERVER = nbu2
SERVER = 5240-jxq
CLIENT_NAME = racj2
CONNECT_OPTIONS = localhost 1 0 2
USE_VXSS = PROHIBITED
VXSS_SERVICE_TYPE = INTEGRITYANDCONFIDENTIALITY
EMMSERVER = nbujxq
HOST_CACHE_TTL = 3600
TELEMETRY_UPLOAD = YES
[oracle@racj2 netbackup]$ pwd
/usr/openv/netbackup
[oracle@racj2 netbackup]$ rman target /
Recovery Manager: Release 11.2.0.4.0 - Production on Thu Sep 17 01:02:02 2020
Copyright (c) 1982, 2011, Oracle and/or its affiliates. All rights reserved.
connected to target database: GDDB (DBID=3327026134)
RMAN> RUN {
allocate channel D1 type 'sbt_tape' parms 'SBT_LIBRARY=/usr/openv/netbackup/bin/libobk.so64';
allocate channel D2 type 'sbt_tape' parms 'SBT_LIBRARY=/usr/openv/netbackup/bin/libobk.so64';
send 'NB_ORA_SERV=nbujxq,NB_ORA_CLIENT=racj2';
restore archivelog from logseq 26274 until logseq 26340 thread 1;
RELEASE CHANNEL D1;
RELEASE CHANNEL D2;
}
提取上述归档花了40分钟。恢复后,ogg抽取进程可以继续了。
继续发现出现abend错误:
2020-09-17 01:24:03 WARNING OGG-01027 Detect inconsistency in pdata after 0 retries, expecting tb 2013265920, nf 64007, got tb -1145512490, nf 24390, bc 33620754, nb -1462937004, on record with seqno 35340, rba 7
09685800 for data starts at rba 656100585.
Source Context :
SourceModule : [er.redo.ora.rtc]
SourceID : [/scratch/jkrath/view_storage/jkrath_soncustvew/oggcore/OpenSys/src/app/er/redo/oracle/rtc.c]
SourceFunction : [RTC_producer]
SourceLine : [2207]
ThreadBacktrace : [6] elements
: [/ogg/12c/libgglog.so(CMessageContext::AddThreadContext()+0x1b) [0x7f7a4e5fc20b]]
: [/ogg/12c/libgglog.so(CMessageFactory::CreateMessage(CSourceContext*, unsigned int, ...)+0x134) [0x7f7a4e5f6d74]]
: [/ogg/12c/libgglog.so(_MSG_ERR_ER_GENERIC_FAILURE(CSourceContext*, char const*, CMessageFactory::MessageDisposition)+0x30) [0x7f7a4e5c446a]]
: [/ogg/12c/extract(RTC_producer+0x97b) [0x82b1fb]]
: [/lib64/libpthread.so.0() [0x3396e079d1]]
: [/lib64/libc.so.6(clone+0x6d) [0x3396ae8b6d]]
2020-09-17 01:24:12 ERROR OGG-01028 Detect inconsistency in pdata after 6 retries, expecting tb 2013265920, nf 64007, got tb -1145512490, nf 24390, bc 33620754, nb -1462937004, on record with seqno 35340, rba 7
09685800 for data starts at rba 656100585.
GGSCI (racj1) 66> info all
Program Status Group Lag at Chkpt Time Since Chkpt
MANAGER RUNNING
EXTRACT ABENDED GDCQ 03:17:11 00:00:27
EXTRACT RUNNING T180 00:00:00 00:00:09
EXTRACT RUNNING T245 00:00:00 00:00:08
EXTRACT RUNNING T247 00:00:00 00:00:09
GGSCI (racj1) 67> view report gdcq
2020-09-17 01:24:12 ERROR OGG-01028 Detect inconsistency in pdata after 6 retries, expecting tb 2013265920, nf 64007, got tb -1145512490, nf 24390, bc 33620754, nb -1462937004, on record with seqno 35340, rba 7
09685800 for data starts at rba 656100585.
GGSCI (racj1) 68> info all
Program Status Group Lag at Chkpt Time Since Chkpt
MANAGER RUNNING
EXTRACT ABENDED GDCQ 03:17:11 00:00:58
EXTRACT RUNNING T180 00:00:00 00:00:09
EXTRACT RUNNING T245 00:00:00 00:00:08
EXTRACT RUNNING T247 00:00:00 00:00:09
GGSCI (racj1) 69> info gdcq
EXTRACT GDCQ Last Started 2020-09-17 01:24 Status ABENDED
Checkpoint Lag 03:17:11 (updated 00:01:00 ago)
Log Read Checkpoint Oracle Redo Logs
2020-09-16 22:06:46 Thread 1, Seqno 26376, RBA 559309504
SCN 3956.660955621 (16991551578597)
Log Read Checkpoint Oracle Redo Logs
2020-09-16 22:06:52 Thread 2, Seqno 35340, RBA 638782168
SCN 0.0 (0)
继续尝试修改rba,无效:
GGSCI (racj1) 70> alter extract gdcq,tranlog,extseqno 35340,extrba 656100585,thread 2
EXTRACT altered.
GGSCI (racj1) 71> info all
Program Status Group Lag at Chkpt Time Since Chkpt
MANAGER RUNNING
EXTRACT STOPPED GDCQ 03:19:10 00:00:02
EXTRACT RUNNING T180 00:00:00 00:00:03
EXTRACT RUNNING T245 00:00:00 00:00:02
EXTRACT RUNNING T247 00:00:00 00:00:03
GGSCI (racj1) 72> start gdcq
Sending START request to MANAGER ...
EXTRACT GDCQ starting
GGSCI (racj1) 73> info all
Program Status Group Lag at Chkpt Time Since Chkpt
MANAGER RUNNING
EXTRACT RUNNING GDCQ 03:19:10 00:00:08
EXTRACT RUNNING T180 00:00:00 00:00:08
EXTRACT RUNNING T245 00:00:00 00:00:07
EXTRACT RUNNING T247 00:00:00 00:00:08
GGSCI (racj1) 74> info gdcq
EXTRACT GDCQ Last Started 2020-09-17 01:26 Status RUNNING
Checkpoint Lag 03:19:21 (updated 00:00:01 ago)
Process ID 114289
Log Read Checkpoint Oracle Redo Logs
2020-09-16 22:06:46 Thread 1, Seqno 26376, RBA 559309504
SCN 3956.660955621 (16991551578597)
Log Read Checkpoint Oracle Redo Logs
2020-09-16 22:06:52 Thread 2, Seqno 35340, RBA 656100585
SCN 0.0 (0)
GGSCI (racj1) 75> info all
Program Status Group Lag at Chkpt Time Since Chkpt
MANAGER RUNNING
EXTRACT RUNNING GDCQ 03:19:21 00:00:07
EXTRACT RUNNING T180 00:00:00 00:00:08
EXTRACT RUNNING T245 00:00:00 00:00:07
EXTRACT RUNNING T247 00:00:00 00:00:08
GGSCI (racj1) 76> info gdcq
EXTRACT GDCQ Last Started 2020-09-17 01:26 Status ABENDED
Checkpoint Lag 03:19:21 (updated 00:00:13 ago)
Log Read Checkpoint Oracle Redo Logs
2020-09-16 22:06:46 Thread 1, Seqno 26376, RBA 559309504
SCN 3956.660955621 (16991551578597)
Log Read Checkpoint Oracle Redo Logs
2020-09-16 22:06:52 Thread 2, Seqno 35340, RBA 656100585
SCN 0.0 (0)
GGSCI (racj1) 77> view report gdcq
2020-09-17 01:26:11 WARNING OGG-01027 Detect inconsistency in pdata after 0 retries, expecting tb 2013265920, nf 64007, got tb -1145512490, nf 24390, bc 33620754, nb -1462937004, on record with seqno 35340, rba 7
09685800 for data starts at rba 656100585.
Source Context :
SourceModule : [er.redo.ora.rtc]
SourceID : [/scratch/jkrath/view_storage/jkrath_soncustvew/oggcore/OpenSys/src/app/er/redo/oracle/rtc.c]
SourceFunction : [RTC_producer]
SourceLine : [2207]
ThreadBacktrace : [6] elements
: [/ogg/12c/libgglog.so(CMessageContext::AddThreadContext()+0x1b) [0x7fb7fd6f620b]]
: [/ogg/12c/libgglog.so(CMessageFactory::CreateMessage(CSourceContext*, unsigned int, ...)+0x134) [0x7fb7fd6f0d74]]
: [/ogg/12c/libgglog.so(_MSG_ERR_ER_GENERIC_FAILURE(CSourceContext*, char const*, CMessageFactory::MessageDisposition)+0x30) [0x7fb7fd6be46a]]
: [/ogg/12c/extract(RTC_producer+0x97b) [0x82b1fb]]
: [/lib64/libpthread.so.0() [0x3396e079d1]]
: [/lib64/libc.so.6(clone+0x6d) [0x3396ae8b6d]]
2020-09-17 01:26:20 ERROR OGG-01028 Detect inconsistency in pdata after 6 retries, expecting tb 2013265920, nf 64007, got tb -1145512490, nf 24390, bc 33620754, nb -1462937004, on record with seqno 35340, rba 7
09685800 for data starts at rba 656100585.
GGSCI (racj1) 80> alter extract gdcq,tranlog,extseqno 35340,extrba 709685800,thread 2
EXTRACT altered.
依然无效。
2020-09-17 01:27:37 ERROR OGG-01028 Detect inconsistency in pdata after 6 retries, expecting tb -1145512490, nf 24390, got tb 2187451, nf 19, bc -1462937004, nb 9733386, on record with seqno 35340, rba 90884808
8 for data starts at rba 709685800.
GGSCI (racj1) 88> info all
Program Status Group Lag at Chkpt Time Since Chkpt
MANAGER RUNNING
EXTRACT ABENDED GDCQ 03:20:43 00:01:22
EXTRACT RUNNING T180 00:00:00 00:00:04
EXTRACT RUNNING T245 00:00:00 00:00:03
EXTRACT RUNNING T247 00:00:00 00:00:04
GGSCI (racj1) 89> info gdcq
EXTRACT GDCQ Last Started 2020-09-17 01:27 Status ABENDED
Checkpoint Lag 03:20:43 (updated 00:01:25 ago)
Log Read Checkpoint Oracle Redo Logs
2020-09-16 22:06:46 Thread 1, Seqno 26376, RBA 559309504
SCN 3956.660955621 (16991551578597)
Log Read Checkpoint Oracle Redo Logs
2020-09-16 22:06:52 Thread 2, Seqno 35340, RBA 709685800
SCN 0.0 (0)
3.6最终解决:
参考类似问题:http://blog.itpub.net/13885898/viewspace-1651350/
通过手工分别修改Extract的两个进程的TIMESTAMP才行,而不是修改rba.
GGSCI (racj1) 90> alter extract gdcq, thread 1, BEGIN 2020-09-16 22:06:46
EXTRACT altered.
GGSCI (racj1) 91> alter extract gdcq, thread 2, BEGIN 2020-09-16 22:06:46
EXTRACT altered.
GGSCI (racj1) 92> info all
Program Status Group Lag at Chkpt Time Since Chkpt
MANAGER RUNNING
EXTRACT STOPPED GDCQ 03:23:41 00:00:03
EXTRACT RUNNING T180 00:00:00 00:00:08
EXTRACT RUNNING T245 00:00:00 00:00:07
EXTRACT RUNNING T247 00:00:00 00:00:08
GGSCI (racj1) 93> start gdcq
Sending START request to MANAGER ...
EXTRACT GDCQ starting
GGSCI (racj1) 94> info all
Program Status Group Lag at Chkpt Time Since Chkpt
MANAGER RUNNING
EXTRACT RUNNING GDCQ 03:23:41 00:00:08
EXTRACT RUNNING T180 00:00:00 00:00:03
EXTRACT RUNNING T245 00:00:00 00:00:02
EXTRACT RUNNING T247 00:00:00 00:00:03
GGSCI (racj1) 99> info all
Program Status Group Lag at Chkpt Time Since Chkpt
MANAGER RUNNING
EXTRACT RUNNING GDCQ 03:21:50 00:00:02
EXTRACT RUNNING T180 00:00:00 00:00:07
EXTRACT RUNNING T245 00:00:00 00:00:06
EXTRACT RUNNING T247 00:00:00 00:00:07
GGSCI (racj1) 100> info gdcq
EXTRACT GDCQ Last Started 2020-09-17 01:30 Status RUNNING
Checkpoint Lag 03:21:50 (updated 00:00:06 ago)
Process ID 122710
Log Read Checkpoint Oracle Redo Logs
2020-09-16 22:08:58 Thread 1, Seqno 26377, RBA 906540116
SCN 3956.661169975 (16991551792951)
Log Read Checkpoint Oracle Redo Logs
2020-09-16 22:08:50 Thread 2, Seqno 35339, RBA 402930484
SCN 3956.661167621 (16991551790597)
GGSCI (racj1) 112> info gdcq
EXTRACT GDCQ Last Started 2020-09-17 01:30 Status RUNNING
Checkpoint Lag 00:00:01 (updated 00:00:02 ago)
Process ID 122710
Log Read Checkpoint Oracle Redo Logs
2020-09-17 01:33:44 Thread 1, Seqno 26390, RBA 1137118736
SCN 3956.676788414 (16991567411390)
Log Read Checkpoint Oracle Redo Logs
2020-09-17 01:33:43 Thread 2, Seqno 35349, RBA 1160039424
SCN 3956.676788056 (16991567411032)
GGSCI (racj1) 113> info all
Program Status Group Lag at Chkpt Time Since Chkpt
MANAGER RUNNING
EXTRACT RUNNING GDCQ 00:00:01 00:00:03
EXTRACT RUNNING T180 00:00:00 00:00:06
EXTRACT RUNNING T245 00:00:00 00:00:05
EXTRACT RUNNING T247 00:00:00 00:00:06
GGSCI (racj1) 114> info gdcq,showch
EXTRACT GDCQ Last Started 2020-09-17 01:30 Status RUNNING
Checkpoint Lag 00:00:02 (updated 00:00:02 ago)
Process ID 122710
Log Read Checkpoint Oracle Redo Logs
2020-09-17 01:34:22 Thread 1, Seqno 26390, RBA 1535739492
SCN 3956.676822001 (16991567444977)
Log Read Checkpoint Oracle Redo Logs
2020-09-17 01:34:23 Thread 2, Seqno 35349, RBA 1162095616
SCN 3956.676822412 (16991567445388)
Current Checkpoint Detail:
Read Checkpoint #1
Oracle Threaded Redo Log
Startup Checkpoint (starting position in the data source):
Thread #: 1
Sequence #: 26376
RBA: 557482496
Timestamp: 2020-09-16 22:06:46.000000
SCN: Not available
Redo File:
Recovery Checkpoint (position of oldest unprocessed transaction in the data source):
Thread #: 1
Sequence #: 26388
RBA: 734050320
Timestamp: 2020-09-17 00:33:25.000000
SCN: 3956.668546962 (16991559169938)
Redo File: Not Available
Current Checkpoint (position of last record read in the data source):
Thread #: 1
Sequence #: 26390
RBA: 1535739492
Timestamp: 2020-09-17 01:34:22.000000
SCN: 3956.676822001 (16991567444977)
Redo File: Not Available
Read Checkpoint #2
Oracle Threaded Redo Log
Startup Checkpoint (starting position in the data source):
Thread #: 2
Sequence #: 35339
RBA: 5629952
Timestamp: 2020-09-16 22:06:46.000000
SCN: Not available
Redo File:
Recovery Checkpoint (position of oldest unprocessed transaction in the data source):
Thread #: 2
Sequence #: 35348
RBA: 1830900240
Timestamp: 2020-09-17 01:00:37.000000
SCN: 3956.669812891 (16991560435867)
Redo File: Not Available
Current Checkpoint (position of last record read in the data source):
Thread #: 2
Sequence #: 35349
RBA: 1162095616
Timestamp: 2020-09-17 01:34:23.000000
SCN: 3956.676822412 (16991567445388)
Redo File: +DATADG/gddb/onlinelog/group_7.276.986638853
Write Checkpoint #1
GGS Log Trail
Current Checkpoint (current write position):
Sequence #: 14366
RBA: 9783655
Timestamp: 2020-09-17 01:34:25.885388
Extract Trail: ./dirdat/gd
Seqno Length: 6
Flip Seqno Length: No
Trail Type: EXTTRAIL
Header:
Version = 2
Record Source = A
Type = 11
# Input Checkpoints = 2
# Output Checkpoints = 1
File Information:
Block Size = 2048
Max Blocks = 100
Record Length = 4096
Current Offset = 0
Configuration:
Data Source = 3
Transaction Integrity = 1
Task Type = 0
Status:
Start Time = 2020-09-17 01:30:38
Last Update Time = 2020-09-17 01:34:25
Stop Status = A
Last Result = 0