impdp ordering, and the 250 MB limit

APPLIES TO:

Oracle Database - Enterprise Edition - Version 11.2.0.2 and later
Oracle Database Cloud Schema Service - Version N/A and later
Gen 1 Exadata Cloud at Customer (Oracle Exadata Database Cloud Machine) - Version N/A and later
Oracle Cloud Infrastructure - Database Service - Version N/A and later
Oracle Database Exadata Express Cloud Service - Version N/A and later
Information in this document applies to any platform.

GOAL

How does Data Pump import indexes? Is there an optimal way to import the index data?
The goal is to minimize the time the database is unavailable.


1) What order are the indexes built in?
2) Is there a way to parallelize the index creation?

SOLUTION


1)     What order are the indexes built in:

        Normally index creation in a schema level import will follow this order:

            1.    Metadata import (user, roles & system privileges)
            2.    Objects like type, sequences and related grants
            3.    Tables, table data, table grants
            4.    Indexes

        Data Pump Import processes the database objects in the following order if you use the PARALLEL parameter:

            1. The first worker begins to load all the metadata: the tablespaces, schemas, etc., until all the tables are created.
            2. Once the tables are created, the first worker starts loading data instead of metadata and the rest of the workers start loading data too.
            3. Once the table data is loaded, the first worker returns to loading metadata again. The rest of the workers are idle until the first worker
            loads all the metadata up to package bodies.
            4. Multiple workers load package bodies in parallel.
            5. One worker loads metadata up to and including secondary tables.
            6. Multiple workers load secondary table data.
            7. One worker loads the remaining metadata.

        Here, one worker creates all the indexes, but it uses PX processes up to the PARALLEL value, so the indexes are created faster.
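
        To see exactly which index DDL Data Pump will run, you can extract it from the dump file without importing anything. A minimal sketch, assuming a dump file expdat.dmp in directory DUMP_DIR (hypothetical names):

            # SQLFILE writes the DDL to idx_ddl.sql instead of executing it
            impdp system/<password> directory=DUMP_DIR dumpfile=expdat.dmp include=index sqlfile=idx_ddl.sql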

2)     Is there a way to parallelize the index creation:

        There is a known Bug 8604502 in releases below 11.2.0.2; if you are on such a release, ensure the fix is applied in your environment (refer to <Doc ID 1081069.1> and <Doc ID 8604502.8>).
        To find the optimal method, test the following action plans and adopt the one best suited to your environment:

        Action plan 1:

        - Import everything with the PARALLEL parameter


        Action plan 2:

        - Pre-create users, roles, privileges
        - Generate the DDL for all objects by using the SQLFILE option and edit it to exclude indexes (the index DDL can go into a separate SQL file). (How do you edit that script in practice? Comment out every CREATE INDEX by hand? See the sketch after this list.)
        - Execute the edited script to create all the objects in the target database/schema, and cross-check the result.
        - Import with CONTENT=DATA_ONLY, TABLE_EXISTS_ACTION and the PARALLEL parameter (note that with CONTENT=DATA_ONLY the default for TABLE_EXISTS_ACTION is APPEND, not SKIP)
        - Create indexes parallel
        - Run statistics collection
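
        A minimal sketch of action plan 2, assuming a schema-level dump file expdat.dmp in directory DUMP_DIR for schema SCOTT (all hypothetical names). INCLUDE=INDEX with SQLFILE captures only the index DDL, so there is no need to comment out CREATE INDEX statements one by one:

            # extract all DDL, and the index DDL separately (nothing is executed)
            impdp system/<password> directory=DUMP_DIR dumpfile=expdat.dmp sqlfile=all_ddl.sql
            impdp system/<password> directory=DUMP_DIR dumpfile=expdat.dmp include=index sqlfile=create_idx.sql
            # pre-create the objects by running all_ddl.sql minus its index statements, then load the data
            impdp system/<password> directory=DUMP_DIR dumpfile=expdat.dmp content=data_only table_exists_action=append parallel=4
            # finally, edit create_idx.sql to add a PARALLEL clause to each CREATE INDEX and run it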


        Action plan 3:

        - Create index DDL using SQLFILE option
        - Import by excluding index, statistics
        - Create indexes parallel
        - Run statistics collection
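
        Action plan 3 as commands, again with hypothetical names; EXCLUDE accepts a comma-separated list of object types, and statistics are regathered afterwards with DBMS_STATS:

            impdp system/<password> directory=DUMP_DIR dumpfile=expdat.dmp include=index sqlfile=create_idx.sql
            impdp system/<password> directory=DUMP_DIR dumpfile=expdat.dmp exclude=index,statistics parallel=4
            # run the (edited, parallelized) create_idx.sql, then in SQL*Plus:
            exec dbms_stats.gather_schema_stats(ownname => 'SCOTT', degree => 4)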

What about importing with EXCLUDE=INDEX first, and then running a second import with INCLUDE=INDEX?
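
A sketch of that two-pass approach (hypothetical names). Note that, per the ordering described above, the INCLUDE=INDEX pass is still driven by a single worker that fans out to PX processes, so it is not obviously better than action plans 2 or 3:

    impdp system/<password> directory=DUMP_DIR dumpfile=expdat.dmp exclude=index parallel=4
    impdp system/<password> directory=DUMP_DIR dumpfile=expdat.dmp include=index parallel=4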

Table exports are also subject to a 250 MB threshold!!

Oracle Data Pump (expdp/impdp) is a server-side tool for logical export and import, commonly used by DBAs for data migration: from database A to database B, from user A to user B, and so on.

This raises a question:

On Oracle 11.2.0.4, when exporting a single table with the PARALLEL parameter, is the work genuinely parallelized? With PARALLEL=2, are there really two processes, each exporting half of the data?

1. Test: export two different tables with PARALLEL=2


$ expdp scott/tiger directory=dump dumpfile=D%U.dmp parallel=2 cluster=n tables=dept,emp
Total estimation using BLOCKS method: 128 KB
. . exported "SCOTT"."DEPT"                              5.929 KB       4 rows
. . exported "SCOTT"."EMP"                               8.562 KB      14 rows
Dump file set for SCOTT.SYS_EXPORT_TABLE_01 is:
  /home/oracle/D01.dmp
  /home/oracle/D02.dmp
  
One of the two files is large and the other small; they are not equal halves at all. And since dump files are binary, you cannot see what is stored inside them, so reading the dump files directly is a dead end.
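
Only the dump file header contains readable strings (for example the client command); the row data itself stays binary. A quick, non-authoritative peek, assuming the files above:

$ strings D01.dmp | head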


2. Reading the MOS documentation


Parallel Capabilities of Oracle Data Pump (Doc ID 365459.1)    
  
For every export operation, Data Pump estimates how much disk space each table data object in the export job will consume (in bytes). This is done whether or not the user uses the ESTIMATE parameter. The estimate is printed in the log file and displayed on the client's standard output device. The estimate is for table row data only; it does not include metadata. This estimate is used to determine how many PX processes will be applied to the table data object, if any.
  (Summary: for every exported table, Data Pump estimates the size of the row data only, and uses that estimate to decide how many PX processes, if any, to apply.)
  
The columns of the tables are examined to determine if direct path, external tables, or both methods can be used. For direct path, the parallel number for the table data object is always one since direct path does not support parallel unload of a table data object. PX processes are only used with external tables.
  (Summary: there are two unload methods, direct path and external tables; direct path always runs with a degree of one, and only external tables allow a parallel unload.)
  
If the external tables method is chosen, Data Pump will determine the maximum number of PX processes that can work on a table data object. It does this by dividing the estimated size of the table data object by 250 MB and rounding the result down. If the result is zero or one, then PX processes are not used to unload the table.
(Summary: with the external tables method, Data Pump divides the estimated size of the table data object by 250 MB and rounds the result down; if the result is zero or one, no PX processes are used to unload the table.)
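
Note the rounding is a floor, not round-to-nearest; for the 824 MB table used in the next test:

SQL> select floor(824/250) as px_processes from dual;

PX_PROCESSES
------------
           3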

If a job is not big enough to make use of the maximum parallel number, then the user will not see the maximum number of active workers and Parallel Execution Processes. For example, if there is one 800 MB table, and it has been determined that external tables will be used, there will be one worker for the metadata, one worker for the data, and three PX processes. As mentioned above, the worker process for the data acts as the coordinator for the PX processes and does not count toward the parallel total. So, if a user specifies PARALLEL = 10, the degree of parallelism is actually four. The user will only see one active worker in the STATUS display. Data Pump is working optimally; the job is too small for the specified degree of parallelism.

This last paragraph is not entirely clear to me, so let's test it.


3. Test: create an 800 MB non-partitioned table and export it with PARALLEL=10. What happens?

Question: does the export estimate the size from the table's optimizer statistics, or from DBA_SEGMENTS?


SQL> select sum(bytes)/1024/1024 from dba_segments where owner='SCOTT' and segment_name='A';
SUM(BYTES)/1024/1024
--------------------
                 824

SQL> select owner,table_name,NUM_ROWS,BLOCKS*8/1024,SAMPLE_SIZE from dba_tables where owner='SCOTT' and table_name='A';

OWNER      TABLE_NAME                       NUM_ROWS BLOCKS*8/1024 SAMPLE_SIZE
---------- ------------------------------ ---------- ------------- -----------
SCOTT      A

(The statistics columns are empty: no statistics were ever gathered on table A.)


Export:


$ expdp scott/tiger directory=dump dumpfile=D%U.dmp parallel=10 tables=a
Estimate in progress using BLOCKS method...
Processing object type TABLE_EXPORT/TABLE/TABLE_DATA
Total estimation using BLOCKS method: 824 MB
Processing object type TABLE_EXPORT/TABLE/TABLE
. . exported "SCOTT"."A"                                 708.3 MB 7315680 rows
******************************************************
Dump file set for SCOTT.SYS_EXPORT_TABLE_01 is:
  /u01/dump/D01.dmp
  /u01/dump/D02.dmp
  /u01/dump/D03.dmp
Job "SCOTT"."SYS_EXPORT_TABLE_01" successfully completed at Wed Aug 5 06:03:58 2020 elapsed 0 00:00:11
$ ls -lrt D*.dmp
-rw------- 1 oracle oinstall 245936128 Aug  5 06:03 D03.dmp
-rw------- 1 oracle oinstall 248098816 Aug  5 06:03 D02.dmp
-rw------- 1 oracle oinstall 248860672 Aug  5 06:03 D01.dmp


$expdp \'/ as sysdba\' attach=SYS_EXPORT_TABLE_01 
Job: SYS_EXPORT_TABLE_01
  Owner: SCOTT                          
  Operation: EXPORT                         
  Creator Privs: TRUE                           
  GUID: AC156DF4AC940F1DE053453CA8C0F1FA
  Start Time: Wednesday, 05 August, 2020 6:03:48
  Mode: TABLE                          
  Instance: tt
  Max Parallelism: 10
  EXPORT Job Parameters:
  Parameter Name      Parameter Value:
     CLIENT_COMMAND        scott/******** directory=dump dumpfile=D%U.dmp parallel=10 tables=a 
  State: EXECUTING                      
  Bytes Processed: 0
  Current Parallelism: 10
  Job Error Count: 0
  Dump File: /u01/dump/D01.dmp
    bytes written: 4,096
  Dump File: /u01/dump/D%u.dmp
  
Worker 1 Status:
  Process Name: DW00
  State: EXECUTING                      
  Object Schema: SCOTT
  Object Name: A
  Object Type: TABLE_EXPORT/TABLE/TABLE_DATA
  Completed Objects: 1
  Total Objects: 1
  Worker Parallelism: 1
  
Worker 2 Status:
  Process Name: DW01
  State: EXECUTING                      
  Object Schema: SCOTT
  Object Name: A
  Object Type: TABLE_EXPORT/TABLE/TABLE_DATA
  Completed Objects: 1
  Total Objects: 1
  Worker Parallelism: 3

Self-answer: although Data Pump calls this value an estimate, it actually comes straight from the segment size in DBA_SEGMENTS.
Total estimation using BLOCKS method: 824 MB
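
The log says "BLOCKS method" because expdp's ESTIMATE parameter defaults to BLOCKS, i.e. the segment blocks that DBA_SEGMENTS also reports. A statistics-based estimate can be requested instead; a sketch reusing the command above (with no statistics gathered on A, this estimate would be unreliable):

$ expdp scott/tiger directory=dump dumpfile=E%U.dmp tables=a estimate=statistics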

Combining this with the last paragraph of the MOS note: PARALLEL=10 was specified, but effectively only two worker processes were used. One worker exported the metadata and, once done, helped write some data; the other worker did the bulk of the data unload and, per the formula, coordinated 3 PX helper processes to speed things up: floor(824/250) = 3 (rounded down, not to the nearest integer).
This also confirms that for a non-partitioned table, a parallel export does not spawn several extra workers.
Next question: for a single-table export, does the PARALLEL parameter actually make it faster, or slower?
With PARALLEL=10 the job ran in 11 seconds. Now run it without PARALLEL:
$ expdp scott/tiger directory=dump dumpfile=D1%U.dmp tables=a

******************************************************************************
Dump file set for SCOTT.SYS_EXPORT_TABLE_01 is:
/u01/dump/D101.dmp
Job "SCOTT"."SYS_EXPORT_TABLE_01" successfully completed at Wed Aug 5 06:12:40 2020 elapsed 0 00:00:15

15 seconds versus 11 seconds is not much of a difference. For a single non-partitioned table, parallelism does not meaningfully speed up the export.


4. Test: a partitioned table with 800 MB of data. How long does the export take with the PARALLEL parameter?


create table RANGE_PART_TAB(
    id number,
    deal_date date,
    area_code number,
    contents varchar2(4000))
partition by range(deal_date)
(
 partition p1  values less than (to_date('2020-02-01','yyyy-mm-dd')),
 partition p2  values less than (to_date('2020-03-01','yyyy-mm-dd')),
 partition p3  values less than (to_date('2020-04-01','yyyy-mm-dd')),
 partition p4  values less than (to_date('2020-05-01','yyyy-mm-dd')),
 partition p5  values less than (to_date('2020-06-01','yyyy-mm-dd')),
 partition p6  values less than (to_date('2020-07-01','yyyy-mm-dd')),
 partition p7  values less than (to_date('2020-08-01','yyyy-mm-dd')),
 partition p8  values less than (to_date('2020-09-01','yyyy-mm-dd')),
 partition p9  values less than (to_date('2020-10-01','yyyy-mm-dd')),
 partition p10 values less than (to_date('2020-11-01','yyyy-mm-dd')));

insert into range_part_tab (id, deal_date, area_code, contents)
select rownum,
       to_date(to_char(to_date('20200101','yyyymmdd'),'J') + trunc(dbms_random.value(0,300)),'J'),
       ceil(dbms_random.value(590,599)),
       rpad('*',400,'*')
from dual
connect by rownum <= 100000;

SQL> r    -- repeat several times to double the data by self-insert
  1* insert into range_part_tab select * from range_part_tab
800000 rows created.
SQL> commit;
 SQL> select sum(bytes)/1024/1024 from dba_segments where owner='SCOTT' and segment_name='RANGE_PART_TAB';
SUM(BYTES)/1024/1024
--------------------
                 792
SQL>  select PARTITION_NAME,sum(bytes)/1024/1024 from dba_segments where owner='SCOTT' and segment_name='RANGE_PART_TAB' group by
 PARTITION_NAME order by 1;

PARTITION_NAME                 SUM(BYTES)/1024/1024
------------------------------ --------------------
P1                                               80
P10                                              72
P2                                               80
P3                                               80
P4                                               80
P5                                               80
P6                                               80
P7                                               80
P8                                               80
P9                                               80
10 rows selected.
[oracle@test dump]$ expdp scott/tiger directory=dump dumpfile=D2%U.dmp tables=RANGE_PART_TAB parallel=10


  
Worker 1 Status:
  Process Name: DW00
  State: EXECUTING                      
  Object Schema: SCOTT
  Object Name: RANGE_PART_TAB
  Object Type: TABLE_EXPORT/TABLE/TABLE_DATA
  Completed Objects: 10
  Total Objects: 10
  Completed Rows: 138,592
  Completed Bytes: 58,754,176
  Percent Done: 77
  Worker Parallelism: 1
  
Worker 2 Status:
  Process Name: DW01
  State: WORK WAITING                   
******************************************************************************
Dump file set for SCOTT.SYS_EXPORT_TABLE_01 is:
  /u01/dump/D201.dmp
  /u01/dump/D202.dmp
  /u01/dump/D203.dmp
  /u01/dump/D204.dmp
  /u01/dump/D205.dmp
  /u01/dump/D206.dmp
  /u01/dump/D207.dmp
  /u01/dump/D208.dmp
  /u01/dump/D209.dmp
Job "SCOTT"."SYS_EXPORT_TABLE_01" successfully completed at Wed Aug 5 06:26:04 2020 elapsed 0 00:00:22

[oracle@test dump]$ expdp scott/tiger directory=dump dumpfile=D3%U.dmp tables=RANGE_PART_TAB 
Job "SCOTT"."SYS_EXPORT_TABLE_01" successfully completed at Wed Aug 5 06:28:14 2020 elapsed 0 00:00:20 !!!

The test proved nothing of what I imagined: instead of ten workers each handling its own partition, there was still a single worker exporting the partitions one by one, serially! So naive of me.

Could it be the 250 MB threshold, applied per segment?

Let's grow the test partitioned table several times over.

Sure enough, it is the 250 MB threshold: once each partition segment exceeds 250 MB, parallelism truly kicks in, and the export runs with 10 worker processes.


SQL> select PARTITION_NAME,sum(bytes)/1024/1024 from dba_segments where owner='SCOTT' and segment_name='RANGE_PART_TAB' group by PARTITION_NAME;

PARTITION_NAME                 SUM(BYTES)/1024/1024
------------------------------ --------------------
P7                                              309
P4                                              304
P1                                              312
P2                                              288
P3                                              304
P6                                              296
P10                                             264
P9                                              300
P8                                              312
P5                                              312

10 rows selected.
[oracle@test dump]$ expdp scott/tiger directory=dump dumpfile=D4%U.dmp tables=RANGE_PART_TAB parallel=10
Worker 1 Status:
Process Name: DW00
State: EXECUTING
Object Schema: SCOTT
Object Name: RANGE_PART_TAB
Object Type: TABLE_EXPORT/TABLE/TABLE_DATA
Completed Objects: 1
Total Objects: 10
Worker Parallelism: 1

Worker 2 Status:
Process Name: DW01
State: EXECUTING
Object Schema: SCOTT
Object Name: RANGE_PART_TAB
Object Type: TABLE_EXPORT/TABLE/TABLE_DATA
Completed Objects: 1
Total Objects: 10
Worker Parallelism: 1

Worker 3 Status:
Process Name: DW02
State: EXECUTING
Object Schema: SCOTT
Object Name: RANGE_PART_TAB
Object Type: TABLE_EXPORT/TABLE/TABLE_DATA
Completed Objects: 1
Total Objects: 10
Worker Parallelism: 1

Worker 4 Status:
Process Name: DW03
State: EXECUTING
Object Schema: SCOTT
Object Name: RANGE_PART_TAB
Object Type: TABLE_EXPORT/TABLE/TABLE_DATA
Completed Objects: 1
Total Objects: 10
Worker Parallelism: 1

Worker 5 Status:
Process Name: DW04
State: EXECUTING
Object Schema: SCOTT
Object Name: RANGE_PART_TAB
Object Type: TABLE_EXPORT/TABLE/TABLE_DATA
Completed Objects: 1
Total Objects: 10
Worker Parallelism: 1

Worker 6 Status:
Process Name: DW05
State: EXECUTING
Object Schema: SCOTT
Object Name: RANGE_PART_TAB
Object Type: TABLE_EXPORT/TABLE/TABLE_DATA
Completed Objects: 1
Total Objects: 10
Worker Parallelism: 1

Worker 7 Status:
Process Name: DW06
State: EXECUTING
Object Schema: SCOTT
Object Name: RANGE_PART_TAB
Object Type: TABLE_EXPORT/TABLE/TABLE_DATA
Completed Objects: 1
Total Objects: 10
Worker Parallelism: 1

Worker 8 Status:
Process Name: DW07
State: EXECUTING
Object Schema: SCOTT
Object Name: RANGE_PART_TAB
Object Type: TABLE_EXPORT/TABLE/TABLE_DATA
Completed Objects: 1
Total Objects: 10
Worker Parallelism: 1

Worker 9 Status:
Process Name: DW08
State: EXECUTING
Object Schema: SCOTT
Object Name: RANGE_PART_TAB
Object Type: TABLE_EXPORT/TABLE/TABLE_DATA
Completed Objects: 1
Total Objects: 10
Worker Parallelism: 1

Worker 10 Status:
Process Name: DW09
State: EXECUTING
Object Schema: SCOTT
Object Name: RANGE_PART_TAB
Object Type: TABLE_EXPORT/TABLE/TABLE_DATA
Completed Objects: 1
Total Objects: 10
Worker Parallelism: 1
******************************************************************************
Dump file set for SCOTT.SYS_EXPORT_TABLE_01 is:
/u01/dump/D401.dmp
/u01/dump/D402.dmp
/u01/dump/D403.dmp
/u01/dump/D404.dmp
/u01/dump/D405.dmp
/u01/dump/D406.dmp
/u01/dump/D407.dmp
/u01/dump/D408.dmp
/u01/dump/D409.dmp
/u01/dump/D410.dmp
Job "SCOTT"."SYS_EXPORT_TABLE_01" successfully completed at Wed Aug 5 06:37:20 2020 elapsed 0 00:00:40
[oracle@test dump]$ expdp scott/tiger directory=dump dumpfile=D5%U.dmp tables=RANGE_PART_TAB

Now compare the elapsed time without PARALLEL.

Job "SCOTT"."SYS_EXPORT_TABLE_01" successfully completed at Wed Aug 5 06:39:14 2020 elapsed 0 00:01:21

PARALLEL=10 versus serial: 40 s vs 81 s, roughly 1:2. A large saving, and the bigger the data, the more pronounced the gain.


Summary:

    1. For a single table under 250 MB, PARALLEL is meaningless; the effective degree is always 1.

    2. For a single table with PARALLEL=2, the first worker exports the metadata and then helps write some data while the second exports the data, so there is essentially no parallel effect.

    3. For a large non-partitioned table, a high PARALLEL setting only adds a few PX helper processes to the data worker; the benefit was not apparent in this initial comparison.

    4. For a partitioned table, PARALLEL only pays off when individual partitions exceed 250 MB; if all partitions are small it is effectively a serial export. The larger the data volume, the more real the parallelism (see the check below).
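
Putting the 250 MB rule to work before an export: a hypothetical pre-check, based on Doc ID 365459.1 and the tests above, that counts how many partitions are large enough to get their own worker:

SQL> select count(*) as partitions_over_250mb
     from dba_segments
     where owner = 'SCOTT'
       and segment_name = 'RANGE_PART_TAB'
       and bytes > 250*1024*1024;

If the count is 0, a high PARALLEL value will buy little; the effective degree for the table data will stay close to 1.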

The following MOS note explains the access-method mechanics behind this behavior.

APPLIES TO:

Oracle Database - Enterprise Edition - Version 10.1.0.2 to 12.2.0.1 [Release 10.1 to 12.2]
Oracle Database Cloud Schema Service - Version N/A and later
Oracle Database Exadata Express Cloud Service - Version N/A and later
Gen 1 Exadata Cloud at Customer (Oracle Exadata Database Cloud Machine) - Version N/A and later
Oracle Cloud Infrastructure - Database Service - Version N/A and later
Information in this document applies to any platform.

GOAL

Starting with Oracle 10g, Oracle Data Pump can be used to move data in and out of a database. Data Pump can make use of different methods to move the data, and will automatically choose the fastest method. It is possible, though, to manually enforce a specific method. This document demonstrates how to specify the method with which data will be loaded or unloaded with Data Pump.

SOLUTION

1. Introduction.

Data Pump can use four mechanisms to move data in and out of a database:

  • Data file copying;
  • Direct path;
  • External tables;
  • Network link import.

The two most commonly used methods to move data in and out of databases with Data Pump are the "Direct Path" method and the "External Tables" method.

1.1. Direct Path mode.
After data file copying, direct path is the fastest method of moving data. In this method, the SQL layer of the database is bypassed and rows are moved to and from the dump file with only minimal interpretation. Data Pump automatically uses the direct path method for loading and unloading data when the structure of a table allows it.

1.2. External Tables mode.
If data cannot be moved in direct path mode, or if there is a situation where parallel SQL can be used to speed up the data move even more, then the external tables mode is used. The external table mechanism creates an external table that maps the dump file data for the database table. The SQL engine is then used to move the data. If possible, the APPEND hint is used on import to speed the copying of the data into the database.
Note: When the Export NETWORK_LINK parameter is used to specify a network link for an export operation, a variant of the external tables method is used. In this case, data is selected from across the specified network link and inserted into the dump file using an external table.

1.3. Data File Copying mode.
This mode is used when a transport tablespace job is started, i.e.: the TRANSPORT_TABLESPACES parameter is specified for an Export Data Pump job. This is the fastest method of moving data because the data is not interpreted nor altered during the job, and Export Data Pump is used to unload only structural information (metadata) into the dump file.

1.4. Network Link Import mode.
This mode is used when the NETWORK_LINK parameter is specified during an Import Data Pump job. This is the slowest of the four access methods because this method makes use of an INSERT SELECT statement to move the data over a database link, and reading over a network is generally slower than reading from a disk.

The "Data File Copying" and "Network Link Import" methods to move data in and out of databases are outside the scope of this article, and therefore not discussed any further.

For details about the access methods of the classic export client (exp), see:
Note:155477.1 "Parameter DIRECT: Conventional Path Export Versus Direct Path Export"

2. Export Data Pump: unloading data in "Direct Path" mode.

Export Data Pump will use the "Direct Path" mode to unload data in the following situations:

EXPDP will use DIRECT_PATH mode if:

2.1. The structure of a table allows a Direct Path unload, i.e.:
     - The table does not have fine-grained access control enabled for SELECT.
     - The table is not a queue table.
     - The table does not contain one or more columns of type BFILE or opaque, or an object type containing opaque columns.
     - The table does not contain encrypted columns.
     - The table does not contain a column of an evolved type that needs upgrading.
     - If the table has a column of datatype LONG or LONG RAW, then this column is the last column.

2.2. The QUERY, SAMPLE, and REMAP_DATA parameters were not used for the specified table in the Export Data Pump job.

2.3. The table or partition is relatively small (up to 250 MB), or the table or partition is larger, but the job cannot run in parallel because the parameter PARALLEL was not specified (or was set to 1).

Note that with an unload of data in Direct Path mode, parallel I/O execution processes (PX processes) cannot be used to unload the data in parallel (parallel unload is not supported in Direct Path mode).

3. Export Data Pump: unloading data in "External Tables" mode.

Export Data Pump will use the "External Tables" mode to unload data in the following situations:

EXPDP will use EXTERNAL_TABLE mode if:

3.1. Data cannot be unloaded in Direct Path mode, because of the structure of the table, i.e.: 
     - Fine-grained access control for SELECT is enabled for the table.
     - The table is a queue table.
     - The table contains one or more columns of type BFILE or opaque, or an object type containing opaque columns.
     - The table contains encrypted columns.
     - The table contains a column of an evolved type that needs upgrading.
     - The table contains a column of type LONG or LONG RAW that is not last.

3.2. Data could also have been unloaded in "Direct Path" mode, but the parameters QUERY, SAMPLE, or REMAP_DATA were used for the specified table in the Export Data Pump job.

3.3. Data could also have been unloaded in "Direct Path" mode, but the table or partition is relatively large (> 250 MB) and parallel SQL can be used to speed up the unload even more.

Note that with an unload of data in External Tables mode, parallel I/O execution processes (PX processes) can be used to unload the data in parallel. In that case the Data Pump Worker process acts as the coordinator for the PX processes. However, this does not apply when the table has a LOB column: in that case the table parallelism will always be 1. This restriction has been lifted in Oracle 12c (12.1.0.1). See also:
Bug:5943346 "PRODUCT ENHANCEMENT: PARALLELISM OF DATAPUMP JOB ON TABLE WITH LOB COLUMN"
Bug:12866324 "34708: DATA PUMP PREVENTS PARALLEL OPTS AND DPAPI LOADS THAT SHOULD BE ALLOWED" (not a public bug), fixed in 12.1.0.1 (cannot be backported to earlier releases)

4. Import Data Pump: loading data in "Direct Path" mode.

Import Data Pump will use the "Direct Path" mode to load data in the following situations:

IMPDP will use DIRECT_PATH if:

4.1. The structure of a table allows a Direct Path load, i.e.:
     - A global index does not exist on a multipartition table during a single-partition load. This includes object tables that are partitioned.
     - A domain index does not exist for a LOB column.
     - The table is not in a cluster.
     - The table does not have BFILE columns or columns of opaque types.
     - The table does not have VARRAY columns with an embedded opaque type.
     - The table does not have encrypted columns.
     - Supplemental logging is not enabled or supplemental logging is enabled and the table does not have a LOB column.
     - The table into which data is being imported is a pre-existing table and:
        – There is not an active trigger, and:
        – The table is not partitioned, and:
        – Fine-grained access control for INSERT mode is not enabled, and:
        – A constraint other than table check does not exist, and:
        – A unique index does not exist.

4.2. The QUERY and REMAP_DATA parameters were not used for the specified table in the Import Data Pump job.

4.3. The table or partition is relatively small (up to 250 MB), or the table or partition is larger, but the job cannot run in parallel because the parameter PARALLEL was not specified (or was set to 1).

5. Import Data Pump: loading data in "External Tables" mode.

Import Data Pump will use the "External Tables" mode to load data in the following situations:

IMPDP will use EXTERNAL_TABLE if:

5.1. Data cannot be loaded in Direct Path mode, because at least one of the following conditions exists:
     - A global index on multipartition tables exists during a single-partition load. This includes object tables that are partitioned.
     - A domain index exists for a LOB column.
     - A table is in a cluster.
     - A table has BFILE columns or columns of opaque types.
     - A table has VARRAY columns with an embedded opaque type.
     - The table has encrypted columns.
     - Supplemental logging is enabled and the table has at least one LOB column.
     - The table into which data is being imported is a preexisting table and at least one of the following conditions exists:
       - There is an active trigger
       - The table is partitioned
       - Fine-grained access control is in insert mode
       - A referential integrity constraint exists
       - A unique index exists

5.2. Data could also have been loaded in "Direct Path" mode, but the parameters QUERY, or REMAP_DATA were used for the specified table in the Import Data Pump job.

5.3. Data could also have been loaded in "Direct Path" mode, but the table or partition is relatively large (> 250 MB) and parallel SQL can be used to speed up the load even more.

Note that with a load of data in External Tables mode, parallel I/O execution processes (PX processes) can be used to load the data in parallel. In that case the Data Pump Worker process acts as the coordinator for the PX processes. However, this does not apply when the table has a LOB column: in that case the table parallelism will always be 1. This restriction has been lifted in Oracle 12c (12.1.0.1). See also:
Bug:5943346 "PRODUCT ENHANCEMENT: PARALLELISM OF DATAPUMP JOB ON TABLE WITH LOB COLUMN"
Bug:12866324 "34708: DATA PUMP PREVENTS PARALLEL OPTS AND DPAPI LOADS THAT SHOULD BE ALLOWED" (not a public bug), fixed in 12.1.0.1 (cannot be backported to earlier releases)

6. How to enforce a specific load/unload method ?

In very specific situations, the parameter ACCESS_METHOD can be used to enforce a specific method to unload or load the data. Example:

%expdp system/<password> ... ACCESS_METHOD=DIRECT_PATH 
%expdp system/<password> ... ACCESS_METHOD=EXTERNAL_TABLE 

or:

%impdp system/<password> ... ACCESS_METHOD=DIRECT_PATH 
%impdp system/<password> ... ACCESS_METHOD=EXTERNAL_TABLE 

Important Need-To-Know's when the parameter ACCESS_METHOD is specified for a job:

  • In Oracle 11gR1 and before, the parameter ACCESS_METHOD should only be used when requested by Oracle Support.
    Note:
    With Oracle 11gR2, the parameter ACCESS_METHOD has been documented.
      Please refer to Oracle® Database Utilities
      11g Release 2 (11.2)
      E22490-05
  • If the parameter is not specified, then Data Pump will automatically choose the best method to load or unload the data.
  • If import Data Pump cannot choose due to conflicting restrictions, an error will be reported:
    ORA-31696: unable to export/import TABLE_DATA:"<SCHEMA_NAME>"."<TABLE_NAME>" using client specified AUTOMATIC method
  • The parameter can only be specified when the Data Pump job is initially started (i.e. the parameter cannot be specified when the job is restarted).
  • If the parameter is specified, the method of loading or unloading the data is enforced on all tables that need to be loaded or unloaded with the job.
  • Enforcing a specific method may result in slower overall performance of the Data Pump job, or in errors such as:

...
Processing object type TABLE_EXPORT/TABLE/TABLE_DATA
ORA-31696: unable to export/import TABLE_DATA:"<SCHEMA_NAME>"."<TABLE_NAME>" using client specified DIRECT_PATH method
...

  • To determine which access method is used, a Worker trace file can be created, e.g.:

%expdp system/<PASSWORD> DIRECTORY=my_dir \
DUMPFILE=expdp_s.dmp LOGFILE=expdp_s.log \
TABLES="<SCHEMA_NAME>"."<TABLE_NAME>" TRACE=400300

The Worker trace file shows the method with which the data was loaded (or unloaded for Import Data Pump):

...
KUPW:14:57:14.289: 1: object: TABLE_DATA:"<SCHEMA_NAME>"."<TABLE_NAME>"
KUPW:14:57:14.289: 1: TABLE_DATA:"<SCHEMA_NAME>"."<TABLE_NAME>" external table, parallel: 1
...

For details, see also:
Note:286496.1 " Export/Import DataPump Parameter TRACE - How to Diagnose Oracle Data Pump"

7. Known issues.

7.1. Bug 4722517 - Materialized view log not updated after import into existing table
Defect:  Bug:4722517 "MATERIALIZED VIEW LOG NOT UPDATED AFTER IMPORT DATAPUMP JOB INTO EXISTING TABLE"
Symptoms:  a materialized view is created with FAST REFRESH on a master table; if data is imported into this master table, then these changes (inserts) do not show up in the materialized view log
Releases:  10.1.0.2.0 and higher
Fixed in:  not applicable, closed as not-a-bug
Patched files:  not applicable 
Workaround:  if possible, import into a temporary holding table, then copy the data into the master table with a conventional "insert as select" (a sketch follows this note)
Cause:  a fast refresh does not apply changes that result from bulk load operations on masters, such as an INSERT with the APPEND hint used by Import Data Pump
Trace:  not applicable, changes are not propagated
Remarks:  see also Note:340789.1 "Import Datapump (Direct Path) Does Not Update Materialized View Logs "
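
A sketch of that workaround with hypothetical names (master table MASTER_T, holding table MASTER_STAGE), assuming Oracle 11g or later for the REMAP_TABLE parameter; the conventional INSERT ... SELECT is recorded in the materialized view log, unlike the direct-path APPEND insert used by Data Pump:

%impdp scott/<password> DIRECTORY=my_dir DUMPFILE=expdp_s.dmp TABLES=scott.master_t REMAP_TABLE=scott.master_t:master_stage
SQL> insert into scott.master_t select * from scott.master_stage;
SQL> commit;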

7.2. Bug 5599947 - Export Data Pump is slow when table has a LOB column
Defect:  Bug:5599947 "DATAPUMP EXPORT VERY SLOW"
Symptoms:  Export Data Pump has low performance when exporting table with LOB column
Releases:  11.1.0.6 and below
Fixed in:  not applicable, closed as not feasible to fix
Patched files:  not applicable
Workaround:  if possible re-organize the large table with LOB column and make it partitioned
Cause:  if a table has a LOB column, and the unload or load takes place in "External Tables" mode, then we cannot make use of parallel I/O execution Processes (PX processes)
Trace:  not applicable
Remarks:  see also Bug:5943346 "PRODUCT ENHANCEMENT: PARALLELISM OF DATAPUMP JOB ON TABLE WITH LOB COLUMN"

7.3. Bug 5941030 - Corrupt blocks after Import Data Pump when table has LONG / LONG RAW column
Defect:  Bug:5941030 "Datapump import can produce corrupt blocks when there is a LONG / LONG RAW"
Symptoms:  Direct Path import of a LONG / LONG RAW column can create corrupt blocks in the database. If DB_BLOCK_CHECKING is enabled then an ORA-600 [6917] error can be signalled. If not then the corrupt block can cause subsequent problems, like ORA-1498 (block check failure) on an analyze of the table.
Releases:  11.1.0.6 and below
Fixed in:  10.2.0.5.0 and 11.1.0.7.0 and higher; for some platforms a fix on top of 10.2.0.2.0 and on top of 10.2.0.3.0 is available with Patch:5941030
Patched files:  kdbl.o
Workaround:  if possible use the classic export and import clients to transfer this table
Cause:  internal issue with column count when loading table with LONG/LONG RAW column in Direct Path mode
Trace:  not applicable
Remarks:  see also Note:457128.1 "Logical Corruption Encountered After Importing Table With Long Column Using DataPump"



NOTE:469439.1 - DataPump Import (IMPDP) Can Fail With Error ORA-31696 If ACCESS_METHOD=DIRECT_PATH Is Manually Specified
NOTE:340789.1 - Import Datapump (Direct Path) Does Not Update Materialized View Logs
NOTE:365459.1 - Parallel Capabilities of Oracle Data Pump
NOTE:453895.1 - Checklist For Slow Performance Of DataPump Export (expdp) And Import (impdp)
NOTE:286496.1 - Export/Import DataPump Parameter TRACE - How to Diagnose Oracle Data Pump
NOTE:457128.1 - Logical Corruption Encountered After Importing Table With Long Column Using DataPump
NOTE:155477.1 - Parameter DIRECT: Conventional Path Export Versus Direct Path Export
BUG:4722517 - MATERIALIZED VIEW LOG NOT UPDATED AFTER IMPORT DATAPUMP JOB INTO EXISTING TABLE
BUG:5599947 - DATAPUMP EXPORT VERY SLOW
BUG:5941030 - DATAPUMP IMPORT CAN CORRUPT DATA WHEN THERE IS A LONG / LONG RAW
