Considerations For Choosing To Use A Datapump [ID 966106.1]

Modified: 24-MAY-2011    Type: HOWTO    Status: PUBLISHED

In this Document
Solution


Applies to:

Oracle GoldenGate - Version: 4.0.0 and later [Release: 4.0.0 and later]
Information in this document applies to any platform.

Solution

Issue:
Choosing Whether to Use Datapumps or Not

Solution Summary:
GoldenGate offers two types of Extracts (among others) for moving data extracted from TMF to a node. These are generally referred to as a TMF Extract and a Datapump, also called a Datapump Extract (hereafter Datapump).

Solution Details:
The difference is that a TMF Extract (hereafter Extract) reads data via Audserv from the TMF trails and writes a GoldenGate trail, whereas a Datapump reads any trail created by a GoldenGate process and writes a copy of that trail elsewhere. Both processes are capable of writing trails to either the local node or a remote node. The question becomes: which is better, to use the TMF Extract to write to the remote node directly, or to write locally with the Extract and let a Datapump read that trail and copy it to the remote node? Although there are secondary issues of operational overhead and performance, the question boils down to a balance between network (TCP/IP) stability and storage space.

The TMF Extract is a necessary part of an audited file/table configuration and cannot be eliminated. It will capture and write to its trails an amount of data up to about 50% of the TMF trail size; typically the amount is much less. Thus, if TMF is writing 1 GB of data an hour to its trails, the Extract can be expected to write somewhat less than 0.5 GB per hour to its trails. This amount of space is not in itself a large burden because the trails can be configured to be purged as they are replicated. If replication is keeping up, the amount of disk space in use for GoldenGate is a small subset of that 0.5 GB. This is generally not an issue whether the data is stored on the local node or a remote node.

The Extract can write its trails locally or directly to the remote node. If it writes locally, a Datapump must be used to read those trails and copy them to the remote node. Here again, if replication is keeping up, the burden on the disks is small, no matter where they are located. The problem, and the question to be considered for configuration, is whether replication (capture/transport/delivery) continues end to end. Replication volume at the target end is configurable, and keeping up from a process standpoint should not be an issue. The real problem is an interruption of replication at the network level.

A network issue stops the transport of data from the source node to the target node. If the Extract is configured to write to a remote trail across the network and connectivity is lost, the Extract abends. It cannot be restarted until connectivity is restored. If the network loss is extended, TMF trails may roll off or become permanently unavailable. At that point, the TMF trails must be restored and processed, or an initial load of all affected files must be done. Both procedures are time-consuming and operationally intensive. No data is lost if the TMF trails are recoverable or if the underlying source tables have not been damaged. Operationally, it may be less trouble to do an initial load to recover rather than to restore and process TMF trails. If the network loss is brief, the Extract can be restarted and there is little impact, except that Audserv uses more CPU cycles until it catches up in the TMF trails.
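For a brief outage with a direct-write configuration, recovery is simply a restart. As a sketch, assuming an Extract named XTMF as in the configuration example later in this document, the GGSCI session might look like:

ggsci> info extract XTMF, detail
ggsci> start extract XTMF

INFO shows the abended status and the last checkpoint; once restarted, the Extract resumes from that checkpoint and Audserv rereads the TMF trails until it catches up.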

If, on the other hand, the Extract writes its trails to the local node's disk, a network interruption does not affect the Extract. It does stop delivery of data to the target node by the Datapump, but the Extract continues to run. This approach requires two things. First, a Datapump must be configured to copy the local trails to the remote node. The Datapump will die if the network is lost, but the Extract continues uninterrupted, which is the more important consideration. When the network is restored, the Datapump can be restarted and replication resumes on the target. The Datapump is a very efficient process: very fast and consuming few resources.

Second, using a Datapump requires a local store of disk space sufficient for the Extract to write to until the network comes back up and the Datapump can resume copying the data to the remote trails and purging the local trails. How big a store of data? Four days' worth is recommended if the disk is available. That way, if Operations comes back after a three-day weekend and discovers a problem, there is still time to correct it before data space runs out. If there is minimal space, written trails may be PAKed and stored elsewhere for processing later, extending the useful amount of data space. Again, at some point it becomes easier to reload the target tables with an initial load than to process the saved trails. Customers have generally found this tipping point to be somewhere between three and seven days' worth of data.
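As a rough sizing sketch, using the volumes from the example configuration below (100 megabytes of TMF trail data per hour, with the Extract writing at most about half of that):

~50 MB/hour of GoldenGate trail data
50 MB/hour x 24 hours x 4 days = ~4.8 GB of local store
4.8 GB / 100 MB per trail file = ~48 trail files

This is the arithmetic behind the "megabytes 100, maxfiles 48" local trail sizing used in the Datapump example below.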

Summary.

The Extract is required; the Datapump is not. If there is no local storage on the source node, the Extract may write its trails directly to the remote node, but network outages can then cause severe delays and operational problems. If local storage space is available, it is preferred that the Extract write the data to the local store and that a Datapump be used to copy the local trails to the remote node. This does mean trail data is stored on two nodes, although in normal usage this should not be burdensome; the target trail space may be much smaller than the local store. In anticipation of possible network outages, it is recommended that the local store be sized at up to four days' worth of GoldenGate trails.

Example configurations.
Consider a source NSK node called Source and a target node running UNIX called TargetNode. The target name can be used only if it is DNS-resolvable to the target's IP address. Assume the target has an IP address of 192.168.100.100 and a GoldenGate manager running on port 7809.
On Source, TMF generates 100 megabytes of TMF trail data per hour. The GoldenGate trail directory on TargetNode is /home/ggs/dirdat.

The manager parameter file /home/ggs/dirprm/mgr.prm on the target has the entry:

PORT 7809
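The manager on TargetNode must be running and listening on that port before an Extract or Datapump can create and write the remote trail there. A quick check from GGSCI on the target (standard GGSCI commands, shown as a sketch):

ggsci> start manager
ggsci> info manager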

If we have an Extract with no Datapump, we configure it to write data directly to the target. Here is what an Extract named XTMF looks like.

Its parameter file resides in $data.ggsparm.
The Extract's edit-type parameter file is named XTMF, the same as the Extract name. The parameters are:

Extract XTMF
rmthost TargetNode, mgrport 7809 << the target DNS name and manager port
or
rmthost 192.168.100.100, mgrport 7809 << the explicit IP address
-- configure the arbitrarily named trail rt on the target with one hour's data per file in the trail
rmttrail /home/ggs/dirdat/rt, megabytes 50
Table $data.datavol.sometabl;

In ggsci, we add the extract checkpoints and the trail:
ggsci> add extract XTMF, begin now, CPU 0, pri 150
ggsci> add rmttrail /home/ggs/dirdat/rt, extract XTMF
ggsci> start extract XTMF
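Once started, the Extract and its remote trail can be verified from GGSCI on Source; for example (a sketch using the names above):

ggsci> info extract XTMF, detail
ggsci> info rmttrail *

INFO EXTRACT reports the run status and checkpoints; INFO RMTTRAIL shows the remote trail definition the Extract writes to.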

Now, if we want to use a Datapump, we first configure the TMF Extract differently, then add a Datapump and two trails. We size the local trails larger, to accommodate four days' worth of data.

Extract XTMF
exttrail $data04.dirdatsv.lt, megabytes 100, maxfiles 48
Table $data.datavol.sometabl;

In ggsci, we add the extract checkpoints and the trail:

ggsci> add extract XTMF, begin now, CPU 0, pri 150
ggsci> add exttrail $data04.dirdatsv.lt, extract XTMF
ggsci> start extract XTMF
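To confirm the local trail is being written, the trail and the Extract writing to it can be checked from GGSCI (a sketch):

ggsci> info exttrail $data04.dirdatsv.lt
ggsci> info extract XTMF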

This begins storing the data in the local store trails $data04.dirdatsv.lt*.
A Datapump must now be configured to copy the data to the target.

Extract XDP
rmthost TargetNode, mgrport 7809 << the target DNS name and manager port

or

rmthost 192.168.100.100, mgrport 7809 << the explicit IP address

-- configure the arbitrarily named trail rt on the target with one hour's data per file in the trail

rmttrail /home/ggs/dirdat/rt, megabytes 50
Table $*.*.*; << this wildcard matches the only file, $data.datavol.sometabl, in the original Extract

We must now add the Datapump, associate it with the XTMF trails, and create the remote trail:

ggsci> add extract XDP, exttrailsource $data04.dirdatsv.lt, CPU 1, pri 150
ggsci> add rmttrail /home/ggs/dirdat/rt, extract XDP
ggsci> start extract XDP
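Once both processes are running, the pump's progress against the local trails can be monitored; for example (a sketch):

ggsci> info extract XDP, detail
ggsci> lag extract XDP

LAG here shows how far the Datapump has fallen behind the Extract's writes to the local trail, which is the figure that grows during a network outage.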

Ideally, the purging of the XTMF trails on the source is handled by the manager. The manager parameters on Source would then include:

PURGEOLDEXTRACTS $data04.dirdatsv.lt, usecheckpoints

Additionally, the purging of processed trails on TargetNode should also be done by the manager there, which requires the parameter:

PURGEOLDEXTRACTS /home/ggs/dirdat/rt, usecheckpoints
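If the four-day local retention discussed above should be enforced on Source even while the Datapump is keeping up, the PURGEOLDEXTRACTS rule there can be extended with a minimum-keep option; a sketch, assuming the MINKEEPDAYS option is available on the platform:

PURGEOLDEXTRACTS $data04.dirdatsv.lt, usecheckpoints, minkeepdays 4

USECHECKPOINTS prevents purging any trail file the Datapump has not yet processed; MINKEEPDAYS additionally keeps processed files for the stated number of days.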
***Checked for relevance on 24-May-2011***


Products
  • Middleware > Data Integration > GoldenGate > Oracle GoldenGate
Keywords
AUDIT; DATA EXTRACT; DATAPUMP; GOLDENGATE; INITIAL LOAD
