oracle stuck archive,Master Note for Troubleshooting Streams capture 'WAITING For REDO' or INITIALIZ...

最新推荐文章于 2021-12-29 12:45:15 发布

Dark Nexus

最新推荐文章于 2021-12-29 12:45:15 发布

阅读量564

点赞数

文章标签： oracle stuck archive

[Release:

11.2 to 11.2]

Information in this document applies to any platform.

Oracle Server Enterprise Edition - Version: 10.1.0.3 to

10.1.0.5

Oracle Server Enterprise Edition - Version: 10.2.0.1 to

10.2.0.4

Oracle Server Enterprise Edition - Version: 11.1.0.6 to 11.1.0.7

Purpose

Streams capture process can exist in different states. The

different states of the capture process is explained in the

following note.

Note 471713.1 Different States of Capture &

Apply Process

This article provides troubleshooting steps when capture is

stuck in INITIALIZING/DICTIONARY INITIALIZATION, WAITING FOR

DICTIONARY REDO: FIRST SCN ;

WAITING FOR REDO: LAST SCN MINED

states.

Last Review

Date

March 5, 2012

Instructions for the Reader

Troubleshooting Guide is provided to assist in debugging a specific

issue. When possible, diagnostic tools are included in the document

to assist in troubleshooting.

Troubleshooting Details

Capture process does not advance

and appears stuck in one of the following states

select * from v$streams_capture shows :

1. INITALIZING / DICTIONARY INITIALIZATION (the state

alternates between these states), or

2. WAITING FOR DICTIONARY REDO: FIRST SCN , or

3. WAITING FOR REDO: LAST SCN MINED , or

4. WAITING FOR DICTIONARY REDO: FILE

The nature of the problem could be :

- a missing logfile; or

- a logfile is not registered; or

- a logfile is corrupted; or

- Capture is verifying/prechecking logfiles

The process will remain in this state until the log is located, it

is registered, the corruption is resolved in which case it will

also be necessary to reregister the log or the Capture process has

checked the logfiles on disk

CAUSE

Common issues which can cause Capture to stop in this state are

1.Incorrect use of RMAN backup command

or RMAN Issue

Only certain RMAN commands are Streams aware.

There are also a number of RMAN related issues where logfiles can

be deleted when Streams may still need to access these.

Refer to Note

421176.1 for more details.

2.Logs Stored in FRA - Flash Recovery

Area

It is recommended that the FRA should be avoided with

Streams.

Archive log files can be deleted and ORA-1291 reported when archive

logs are located in a Flash Recovery Area (FRA). More details can

be found with respect to FRA in Note 305648.1. This is a recognised behaviour and for this

reason it is recommended that archive logs not be located in a FRA

where a Capture process will mine from this location.

This is documented in the Oracle Streams Concepts and

Administration 10g Release 2 (10.2) guide under the section

Troublshooting Capture Problems subsection : Are Required Redo Log

Files Missing.

It should also be pointed out that archive logs may be deleted from

the FRA when the instance is shutdown and restarted. Those archive

logfiles deleted logs should be reported in the alert.log file.

3.Logs not registered

in dba_registered_archived_log

This issue is more likely at a downstream database. This could

be due to transport issues from upstream to downstream

database.

4.Logs marked as

corrupted in system.logmnr_log$

Manual intervention will likely be necessary to correct

this.

5.required_checkpoint_scn

issues

Logs may need to be manually deleted which can then affect

Capture restart if the required_checkpoint_scn (dba_capture view)

is not advancing. This is a key value for Capture as discussed

below.

6.Capture Prevalidation of Logs in

dba_registered_archived_log

The Capture process has to access all the logs in

dba_registered_archived_log before it can start mining changes.

Thus, in the case of message :WAITING FOR

DICTIONARY REDO: FILE, it may be that logs are being accessed but

the state of the capture process has not been updated to indicate

that it is prevalidating logs.

7.Logs deleted in error or damaged due

to OS/Storage issues

The following is useful background which should be used to help

understand what logs are relevant.

In a non RAC environment, all logs from the logfile containing the

required_checkpoint_scn have to be present on disk.

In a RAC environment, all logs from all threads have to be present

with respect to the required_checkpoint_scn; this is also

applicable for downstream Capture environments where the primary /

upstream database is RAC related but the downstream database may be

single instance.

These logfiles are also added to the dba_registered_archived_log

view.

Before Capture starts mining changes , all the logs in the

dba_registered_archived_log have to be located. On downstream

Capture environments, this can present confusion as the Capture

process may appear to be stuck / not moving (due to large numbers

of logs in the view) whilst in fact it is actually accessing log

files. This is more of an issue for Downstream Capture environments

where a Capture process may be down for some time but logs are

still being registered by the RFS processes associated with the

primary database.

The logfiles that a Capture process requires on restart as

specified by the required_checkpoint_scn can be located from the

Streams Healthcheck report output - Note 273674.1.

Refer to the section headed :

'++ Minimum Archive Log Necessary to Restart Capture

++'

Note: the same information can also be obtain

from

8.Capture can stuck due to missing

archive logs

Logs are registered in dba_registered_archived_log.

Logs does not exist in the archive location or damaged.

SOLUTION

In order to determine the problem

logfile(s) and understand what is going on please

provide to Oracle Suport Services the Streams Healthcheck output

(Note

273674.1)) and the alert.log file(s) from Upstream (if

relevant) and downstream databases.

Note: please provide details from all nodes.

Additionally, please collect the following:

1.SCN reported in the

Capture state related message - determine logs the

SCN relates to

connect / as sysdba

set pagesize 1000

col first_scn format 999999999999999999

col next_scn format 999999999999999999

alter session set nls_date_format='dd-mon-yyyy hh24:mi:ss';

select

source_database,thread#,sequence#,name,modified_time,first_scn,next_scn,dictionary_begin,dictionary_end

from dba_registered_archived_log where

reported> between first_scn and

next_scn;

Note : If the reported is the corresponds to

the next_scn SCN value for a thread then it is likely the problem

is with the next log in that thread; it may be missing or not

registered.

2.Check Capture

tracefile to understand if mising

logfile detailed

If no trace file is present,

identify the Capture and Reader processes from the following :

connect / as sysdba

set pagesize 1000

-- Capture

define capture_name=''

select p.spid Spid, 'C00'||c.capture#||' '||upper(lp.role) "Capture

Process ", c.capture_name "Capture Name", p.pga_used_mem "PGA

Memory Used", p.pga_alloc_mem "PGA Memory Allocated", p.pga_max_mem

"PGA Maximum Memory"

from v$streams_capture c, v$logmnr_process lp, v$session s,

v$process p

where c.logminer_id = lp.session_id

and lp.role in ('reader','preparer','builder')

and lp.sid = s.sid

and lp.serial# = s.serial#

and s.paddr = p.addr

and c.capture_name = '&capture_name'

union

select p.spid, 'C00'||c.capture#||' Coordinator', c.capture_name,

p.pga_used_mem, p.pga_alloc_mem, p.pga_max_mem

from v$streams_capture c, v$session s, v$process p

where c.sid = s.sid

and c.serial# = s.serial#

and s.paddr = p.addr

and c.capture_name = '&capture_name'

order by 6,5;

Use the SPID (OS Process or thread id) for both the Capture process

and Reader process as follows :

Note: it may be that a reader process does not

exist at the point a particular state is being reported.

connect / as sysdba

-- Capture BOTH Process and Reader Process :

oradebug setospid

-- event 1349 / level 32768 decimal

oradebug event 1349 trace name context forever, level 32768

oradebug setospid -- event 1349 / level 32768 decimal oradebug event 1349 trace name context forever, level 32768 -- wait 5 minutes, disable trace :

oradebug event 1349 trace name context off

oradebug setospid oradebug event 1349 trace name context off

Check the background_dump_dest location for trace files containing

the process ids (spids) from the above. The trace

files should identify the thread / sequence number and the

problematic logfile which Capture is looking for.

Note: ensure that you have the latest

patchset installed otherwise relevant trace information may not be

present.

If it is still not obvious what the file being accessed is specify

more complete event trace :

oradebug event 1349 trace name

context forever, level 26148863

-- wait 5 minutes, disable trace :

oradebug event 1349 trace name context off

For example, in a Capture trace file you may see something like the

following :

krvxalfs: Error 308 raised while opening log

/bugmnt21/em/celclnx8/tar20093554.6/app/oracle/product/10.2.0.3/dbs/arch1_35_677241985.dbf.

i.e ora-308 indicates : missing logfile

3.Determine if there are logfiles

unregistered

If there is a gap in the registered logfiles, the

Capture process will stop.

Specify the name of your capture process below as well as thread

number, i.e repeat for each thread used.

Note : The plsql block will need be be run

more than once for RAC configurations.

connect / as sysdba

set serveroutput on

declare

-- amend the following as required.

user_thread number := 1; -- change as necessary

user_capture varchar2(100) := '';

--change as necessary

rcscn number;

minseq number;

maxseq number;

rseq number;

cnt number;

cursor mlsns is select user_thread thread, sequence# from (

select (lvl + minseq) sequence# from (select * from (select level

lvl from dual connect by level <= maxseq -

minseq))

minus

select sequence# from dba_registered_archived_log where

thread#=user_thread)

order by 1;

begin

select required_checkpoint_scn into rcscn from dba_capture where

capture_name = user_capture;

select min(sequence#) into rseq from dba_registered_archived_log

where thread# = user_thread and consumer_name = user_capture and

rcscn between first_scn and next_scn;

select min(sequence#) into minseq from dba_registered_archived_log

where thread# = user_thread and consumer_name = user_capture and

sequence# >= rseq;

select max(sequence#) into maxseq from dba_registered_archived_log

where thread# = user_thread and consumer_name = user_capture;

select count(*) into cnt from (

select (lvl + minseq) sequence# from (select * from (select level

lvl from dual connect by level <= maxseq -

minseq))

minus

select sequence# from dba_registered_archived_log where

thread#=user_thread) ;

if (cnt is not null) then

for rec in mlsns loop

dbms_output.put_line('Log not registered ' ||rec.sequence#);

end loop;

else

dbms_output.put_line('There are no gaps in registered

logs.');

end if;

end;

For any logfile which are not registered, ensure that these are on

disk and register these using :

alter database register or replace

logical logfile '&fqfn' for

'&capture_name';

4.Determine if there

are logs marked as corrupt

A corrupt archive log file is indicated by the contents field of

system.logmnr_log$ having value 16.

connect / as sysdba

set pagesize 1000

select * from system.logmnr_log$ where contents = 16 order by

sequence#;

If any logs as marked as corrupted , proceed as follows :

- dump out the logfile; the dump pf the logfile will appear in the

user_dump_dest location.

connect / as sysdba

oradebug setmypid

oradebug unlimit

alter session set events '1354 trace name context forever, level

32768';

alter session set events '1348 trace name context forever, level

1032';

alter system dump logfile '&fqfn';

It may be that logfile will need to be restored or copied once

again from the primary.

One the above has been done the logfile can be reregistered as

follows :

- take a backup/copy of system.logmnr_log$ :

connect / as sysdba

create table system.logmnr_log$_bak as select * from

system.logmnr_log$;

- delete the entry in system.logmnr_log$ which is marked as

corrupted :

delete from system.logmnr_log$

where contents = 16 and thread# = and sequence# = ;

commit;

- reregister the logfile once again as detailed

above.

5.RMAN - Recovery Manager can be used to

identify archivelog files which may be missing

Note: this can only be used on upstream

environments - not downstream databases. This is due to the fact

that no controlfile entry is created for a registered logfile in

the downstream database.

For Oracle 10.2, logfiles being mined by a Capture process are

also logged to the alert.log. This information together with the

Capture restart information from the Streams Healthcheck as well as

logfile validation information reported by RMAN can be useful in

identifying those logfile(s) that are missing.

The RMAN command 'crosscheck archivelog' command can check the

physical presence of an archive log file.

Note: you do not have to be familiar with RMAN

for backup and restore or have any experience of RMAN to use the

commands detailed below.

The approach when using RMAN is as follows :

- First of all determine where a Capture process will start from it

were restarted. This would be a convenient point to specify where

to check the existence of logs from.

- Record the restart SCN , the Thread number and Log Sequence

number.

Note: The Thread number and the Log Sequence

number should typically be visible in the logfile name. This may

not be the case however and will depends on the setting of the

parameter : log_archive_format.

Use either the restart Capture SCN (YYYYYYYYY) or the Log Sequence

(ZZZZ) number from which to start the crosscheck validation. The

thread number can also be specified on some commands (X); in order

to understand if RAC or multiple threads are present issue :

select * from

gv$instance;

Having performed the above, start RMAN

Note :

1. If you specify a logfile you will receive no RMAN command output

to the screen. Therefore, whilst getting to grips with RMAN you

would be advised not to specify a logfile. Please also use set echo

on so that reevant commands can be seen in log file.

2. for problems relating to the running of RMAN , refer to the

Backup and Recovery Advanced User's Guide.

-- start rman without specifying a

logfile.

rman target /

-- start rman and specify a logfile to record command output

rman target / log /tmp/crosscheck.log

Useful commands to allow logs on disk to be crosschecked

on disk are as follows :

-- ensure command used is echoed to log file.

set echo on

- crosscheck archivelog all;

(crosscheck all logs on disk)

- crosscheck archivelog from sequence ZZZZ thread X;

(* crosscheck all the logs on disk from the sequence number)

- crosscheck archivelog from scn = YYYYYYYYYY;

(* crosscheck all the logs on disk from the scn)

- crosscheck archivelog like '%%';

(crosscheck all the logs on disk from a particular location)

- crosscheck archivelog like '%%';

(crosscheck that a specific logfile exists)

* above allow the starting point of a the crosscheck command to

be specified. Therefore , these should be used together with

restart information detailed in the Streams healthcheck.

Consider the following example where the Streams Healthcheck

reports that the capture process will restart from :

Capture will restart from SCN

2381906 in the following file:

/la2/1_228_657120968.dbf

(date/time)

/la1/1_227_657120968.dbf

(date/time)

The following commands can be used in order to determine whether

logs

are available on disk or otherwise.

rman target /

set echo on

- crosscheck archivelog all;

- crosscheck archivelog from sequence 227 thread 1;

- crosscheck archivelog from scn = 2381906;

- crosscheck archivelog like '%la1%';

- crosscheck archivelog like '%1_227_657120968.dbf%';

Checking RMAN output

If a logfile is not found, RMAN will report : 'validation failed

for archived log' error.

Note: This is really only significant if a

log is missing after the required_checkpoint_scn where a Capture

process may restart from. If a logfile is found, RMAN will report

'validation succeeded for archived log'

....

archive log filename=

/arch1_127_657120968.dbf recid=127 stamp=673799334

validation failed for archived log

archive log filename=

/arch1_128_657120968.dbf recid=128 stamp=673806193

validation succeeded for archived log

...

To exit RMAN, type :

exit

Search through the logfile for 'failed' keyword. This will

hopefully identify the problematic logfile.

i.e (on Unix) grep -i failed /tmp/crosscheck.log

If all logfiles are validated and RMAN performed the crosscheck

validation from the location that Capture would restart from , then

the logs missing may likely come after the last logs reported by

RMAN. Further inspection may be required.

6.Restore the archive

logs

Get all the archive logs required for the capture process to get

restarted :

A) When MISSING archive logs CAN be

restored.

Refer : Doc ID 290143.1 "Minimum Archived Log Necessary to Restart 10g and 11g

Streams Capture Process"

The output of the above query will give you

the name of the archive logs which are required for the capture

process.

Make sure all the archives are available to

the capture process from this archive sequence number.

Restore the archive logs from this sequence and register them for

the capture process to progress.

SQL>alter database register or replace logical

logfile '&fqfn' for

'&capture_name';

B) When MISSING archive logs CANNOT be

restored.

In case of restoration of the missing archive logs is not possible

due to unavailability of the backup of the old archive logs, Then

we need to recreate the capture process.

NOTE: There is no way that the capture process can skip these

missing archive logs and jump to the next available archive log.

Streams gets stuck when the archive logs are not available.

Refer Doc ID 471695.1 to recreate the capture process.

Still have questions ?

To discuss this information further with

Oracle experts and industry peers, we encourage you to review, join

or start a discussion via My Oracle Support Streams and Distributed Database Community

Enjoy a short Video about Oracle´s Support Communities - to quickly

understand it´s benefits for you right now (http://bcove.me/tlygjitz)

References

BUG:5770059

- REQUIRED_CHECKPOINT_SCN NOT MOVING FORWARD

BUG:6154377 - NO LOGMNR CHECKPOINTS ARE BEING EXECUTED

NOTE:290143.1 - Minimum Archived Log Necessary to Restart 10g

and 11g Streams Capture Process

NOTE:471695.1 - Required Steps to Recreate a Capture

Process

oracle stuck archive,Master Note for Troubleshooting Streams capture 'WAITING For REDO' or INITIALIZ...

“相关推荐”对你有帮助么？