The big pitfall of a slow impdp

Although the article below does not rebuild unusable indexes, it got a speedup simply by changing the parallelism of the indexes.

Consider altering the parallelism of the tables and indexes before the import and see if it helps.
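
A minimal sketch of that idea, assuming the CCFMP schema from the post below; the table and index names are hypothetical:

alter table CCFMP.BIG_TAB parallel 8;      -- hypothetical table
alter index CCFMP.BIG_TAB_IDX parallel 8;  -- hypothetical index
-- ... run the import ...
alter table CCFMP.BIG_TAB noparallel;      -- set the degree back afterwards
alter index CCFMP.BIG_TAB_IDX noparallel;  -- to avoid PX contention later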

Handling index parallelism not taking effect during an Oracle migration

The case below is a problem hit during an on-premises Oracle 12c -> 19c migration, together with its solution:

To improve data import efficiency and keep the migration window as short as possible, the approach is summarized below.
Database migration command using a network link:
impdp system/******** directory=LOG_DIR parallel=8 network_link=sourcedb schemas=CCFMP transform=disable_archive_logging:y logtime=all logfile=LOG_DIR:impdp_bdi20201211.log exclude=statistics
    Although impdp was given parallel=8, index creation during the import does not run in parallel; the parallel parameter has no effect on it. The Data Pump log showed *00G of index data migrated in * hours, roughly *G of index data per hour. To speed up index creation, we generated the index-creation script by hand with a parallel clause, as shown below, which raised throughput to roughly **0G of index data per hour, about 2.6x faster than letting Data Pump do it.
Run on the source database:
su - oracle 
sqlplus / as sysdba 

set pagesize 0 
set long 900000 
set feedback off 
set echo off 
set linesize 255 
spool "/tmp/create_index.sql"; 
select dbms_metadata.get_ddl('INDEX',u.index_name,u.owner) ||' parallel'|| ';' from dba_indexes u where u.owner in ('CCFMP');
spool off; 

Note: CCFMP is the schema being migrated; change it to match your environment.
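
For reference, a generated line looks roughly like the following (the index, table, and column names are made up for illustration); note the leading space added before parallel, without which the keyword would fuse onto the end of the generated DDL:

CREATE INDEX "CCFMP"."IDX_ORDERS_01" ON "CCFMP"."ORDERS" ("ORDER_DATE") TABLESPACE "USERS" parallel;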
Upload the index-creation script to the target database (e.g. via ftp) and run it:
su - oracle 
sqlplus / as sysdba 
SQL>@create_index.sql 
When impdp later encounters an index that already exists, it reports that the object exists and skips it, which speeds up the migration.
 

To avoid performance contention caused by leaving index parallelism enabled, disable it once the index migration completes.
Run on the source database:
su - oracle 
sqlplus / as sysdba 
set pagesize 0
set long 900000
set feedback off
set echo off
set linesize 255
spool "/tmp/alter_index.sql";
select 'alter index ' || u.owner || '.' || u.index_name || ' noparallel;' from dba_indexes u where u.owner in ('CCFMP');
spool off;
Upload the index-alter script to the target database (e.g. via ftp) and run it:
su - oracle 
sqlplus / as sysdba

SQL>@alter_index.sql

Original post: https://blog.csdn.net/danny0798/article/details/114584768

----- LOB columns: splitting the data by rowid ---------------

Overview
Oracle backups fall into three main categories: Data Pump export, hot backup, and cold backup. Data Pump export/import is a logical backup; hot and cold backups are physical backups. Oracle 10g introduced Data Pump (expdp/impdp), which supports a parallel option and is therefore more efficient than the legacy exp utility.

This section covers EXPDP/IMPDP conventions and a few tricks.

1. Using parallelism for speed
To speed up EXPDP/IMPDP, people reach for parallelism. Some say it is just a matter of adding parallel=n, but how do you use that parameter so that it actually delivers a speedup?

1) parallel=n has no effect when exporting or importing a single table.
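
(A related pitfall, independent of the single-table limitation: parallel only pays off when Data Pump can write to multiple dump files. A hedged sketch using the %U wildcard; the directory and schema names are illustrative:)

expdp system/******** directory=DMP dumpfile=scott_%U.dmp logfile=scott_exp.log schemas=SCOTT parallel=8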

2) Index creation during import (IMPDP) is serial. Even though the execution plan shows the CREATE INDEX statements carrying a parallel clause, it is better to use the sqlfile parameter to generate the index DDL and build the indexes yourself; otherwise, if the structures are created first and the data imported afterwards, the import becomes extremely slow. The rough steps:

# Step 1: write the export parfile, then run the export in the background.
# (The parfile body was lost in the original post's formatting; the entries
# below are a reconstruction -- adjust names and paths to your environment.)
cat > expdp_testdump.par <<EOF
directory=DMP
dumpfile=expdp_testdump_%U.dmp
logfile=expdp_testdump.log
tables=TESTDUMP
parallel=8
EOF
nohup expdp system/******** parfile=expdp_testdump.par > expdp_testdump.par.out &

# Step 2: generate the index-creation statements with the sqlfile parameter.
cat > impdp_testdump_idx.par <<EOF
directory=DMP
dumpfile=expdp_testdump_%U.dmp
include=INDEX
sqlfile=impdp_testdump_idx.sql
EOF
nohup impdp system/******** parfile=impdp_testdump_idx.par > impdp_testdump_idx.par.out &

# Step 3: raise the parallel degree in the generated DDL; keep the degree at
# no more than about 1.5x the CPU core count.
# On Linux:
sed -i 's/PARALLEL 1/PARALLEL 16/g' impdp_testdump_idx.sql
# AIX sed has no -i option; use one of the following instead:
perl -pi -e 's/PARALLEL 1/PARALLEL 16/g' impdp_testdump_idx.sql
# or:
vi impdp_testdump_idx.sql <<EOF
:%s/PARALLEL 1/PARALLEL 16/g
:wq
EOF

# Step 4: after the data import finishes, run the index-creation SQL in the
# background. create_testdump_index.sh contains:
#   . ~/.profile
#   sqlplus / as sysdba @impdp_testdump_idx.sql
nohup sh create_testdump_index.sh > create_testdump_index.sh.out &
The above turns a seemingly simple task into several steps, something one parameter ought to solve (in practice it cannot). These steps have been validated many times and do solve the problem of indexes not being built in parallel.

3) Adjusting parallelism online

What if, after the export or import job has already started, you find the parallelism can be increased, or needs to be reduced?

expdp '/ as sysdba' attach=SYS_EXPORT_SCHEMA_01

Then issue PARALLEL=n at the prompt to adjust the degree.
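
A hedged sketch of such an interactive session (the job name is whatever dba_datapump_jobs reports):

expdp '/ as sysdba' attach=SYS_EXPORT_SCHEMA_01
Export> parallel=16
Export> status
Export> continue_client

Here parallel=16 raises the degree of the running job, status confirms that "Current Parallelism" has changed, and continue_client returns to logging mode.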

2. Tips for exporting large tables with LOBs
When exporting a large, non-partitioned table with LOB columns that has grown past a terabyte, the export frequently fails with ORA-01555: snapshot too old, driven by the undo tablespace size and the undo_retention setting. How do you avoid that?
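
Before splitting the export, it is worth checking the undo configuration that bounds how long a consistent read can survive. A minimal sketch:

show parameter undo_retention

select tablespace_name, round(sum(bytes)/1024/1024/1024, 1) gb
  from dba_data_files
 where tablespace_name like 'UNDO%'
 group by tablespace_name;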

Export in batches split by rowid or by primary key (rowid is recommended):

-- Create a helper table recording the rowids of the table to export,
-- split into batches (50 batches here):
create table scott.exp_rowid as
  select mod(rownum, 50) sou_seq, rowid sou_rowid from scott.lobtest;

# Generate one parfile per batch (the parfile body was lost in the original
# post's formatting; the query clause below is a reconstruction):
cat > expdp_lobtest_seq0.par <<EOF
directory=DMP
dumpfile=expdp_lobtest_seq0.dmp
logfile=expdp_lobtest_seq0.log
tables=scott.lobtest
query=scott.lobtest:"where rowid in (select sou_rowid from scott.exp_rowid where sou_seq=0)"
EOF

# Launch the 50 batches; the per-batch parfiles can be generated in a loop:
nohup expdp system/******** parfile=expdp_lobtest_seq0.par > expdp_lobtest_seq0.par.out &
.....
nohup expdp system/******** parfile=expdp_lobtest_20181219_seq50.par > expdp_lobtest_seq50.par.out &
Or use a script like the following:

chunk=10
for ((i=0; i<=9; i++)); do
  expdp \'/ as sysdba\' TABLES=LOBTEST \
    QUERY=LOBTEST:\"where mod\(dbms_rowid.rowid_block_number\(rowid\), ${chunk}\) = ${i}\" \
    directory=DMP dumpfile=lobtest_${i}.dmp logfile=lobtest_${i}.log &
  echo $i
done
3. Checking progress and monitoring
When the boss asks how far along the export is, will you scramble with nowhere to look? A responsible DBA must be able to report export/import progress in real time.

1) Check the status of Data Pump jobs

select owner_name owr, job_name jbn, operation ope, job_mode jbm, state, degree,
       attached_sessions atts, datapump_sessions dats
  from dba_datapump_jobs;

select sid, serial#, session_type
  from v$session s, dba_datapump_sessions d
 where s.saddr = d.saddr;
2) Monitor the progress of the Data Pump logical backup

SELECT sid, serial#, context, sofar, totalwork,
       ROUND(sofar/totalwork*100, 2) "%_COMPLETE"
  FROM v$session_longops
 WHERE totalwork != 0 AND sofar <> totalwork;

  SID SERIAL# CONTEXT SOFAR TOTALWORK %_COMPLETE
  103      89       0    54         7      83.33
3) View the detailed progress of a Data Pump job

impdp '/ as sysdba' attach=SYS_IMPORT_TABLE_01

Import> status

Job: SYS_IMPORT_TABLE_01
  Operation: IMPORT
  Mode: TABLE
  State: EXECUTING
  Bytes Processed: 1,364,791,288
  Percent Done: 99
  Current Parallelism: 2
  Job Error Count: 0
  Dump File: /opt/datadump/expdp_testdump.dmp

-----------------------------------
impdp index creation in parallel / importing indexes with impdp
https://blog.51cto.com/u_56701/9760371

While migrating part of a data warehouse database to a 12c database via export/import, we hit a problem: importing a partitioned table (3000+ partitions, 30000+ subpartitions) was very slow. The data volume was not particularly large, just over 600 GB, yet a single subpartition took 1-2 hours to finish importing. Attaching to the impdp job to check progress showed only one worker active, even though parallelism was set to 16. A search on MetaLink turned up a matching note: the partitioned table structure must not be created in advance. The structure and the data have to be imported together for the parallelism to be used.

Slow DataPump Import (Impdp) Performance using Parallel Option with Pre-created Partitioned Tables (Doc ID 2098931.1)

SYMPTOMS

Partitioned tables have been created prior to the Import with IMPDP and the PARALLEL option is being used but only one of the workers is active and the others are idle resulting in a slow import compared to the export time.

The tables involved will show in the trace:

INSERT /*+ APPEND PARALLEL */


Using "data_options=disable_append_hint" does not change the "Append" hint being used.
 

CAUSE

IMPDP is working as designed: In order to use Parallel, let IMPDP create the Partitioned tables itself.
 
According to the Oracle documentation for PARTITION_OPTIONS:

"Parallel processing during import of partitioned tables is subject to the following:

  If a partitioned table is imported into an existing partitioned table, then Data Pump only processes one partition or subpartition at a time, regardless of any value that might be specified with the PARALLEL parameter.

  If the table into which you are importing does not already exist and Data Pump has to create it, then the import runs in parallel up to the parallelism specified on the PARALLEL parameter when the import is started."

SOLUTION

Let IMPDP create the Partitioned Tables (rather than creating them before the import) if you wish to use the Parallel option with IMPDP to speed up the Import process.
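
A hedged sketch of that solution; the table, directory, and file names are hypothetical:

-- On the target, drop the pre-created partitioned table so impdp creates it:
SQL> drop table CCFMP.BIGPART_TAB purge;

# Then import structure and data together; partitions can now load in parallel:
impdp system/******** directory=DMP dumpfile=bigpart_%U.dmp tables=CCFMP.BIGPART_TAB parallel=16 logfile=imp_bigpart.log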

-----------------

This is an impdp performance issue in which an import using the external-table access method does not run the job in parallel, even though the PQ slaves start correctly.
The issue is only hit when running the job as SYSDBA. The root cause was determined to be Bug 19520061 - IMPDP: EXTREMELY SLOW IMPORT FOR A PARTITIONED TABLE.

SOLUTION

1. Upgrade the database to 12.2, when available, where the issue is fixed.

- OR -

2. For earlier database releases, check Patch 19520061, if available for your platform and RDBMS version.

- OR -

3. Run the DataPump import job with a user other than SYS.
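
For example, a hedged sketch of running the job as a dedicated user instead of SYS (the user name is made up; the password is masked in the document's style):

SQL> create user impuser identified by ********;
SQL> grant datapump_imp_full_database to impuser;

impdp impuser/******** directory=DMP dumpfile=bigpart_%U.dmp parallel=16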

-----

SYMPTOMS

When running IMPDP against an existing partitioned table with the TABLE_EXISTS_ACTION command line parameter set to TRUNCATE or APPEND, and also using parallelism, the worker processes seem to be idle, waiting for the following event:

enq: TM - contention


The problem does not occur when the table does not yet exist in the database, or when the IMPDP operation is run in serial mode.


CAUSE

The cause of this problem has been identified in Bug 7197735, closed as a duplicate of unpublished Bug 8448327. It is caused by multiple Data Pump worker processes being assigned to a single partition, so that one worker process is able to write to the partition while the others wait for an exclusive lock on that same partition.

SOLUTION

This issue has been fixed in the Oracle 12c release and in the 11.2.0.2 patchset.

Workarounds for the other Oracle versions are:

  1. apply Patch 8448327 (for Generic platform) if available for your Oracle version

    - OR -
     
  2. run IMPDP in serial mode (so not using the PARALLEL command line parameter)

    - OR -
     
  3. drop the table prior to running IMPDP

---------------------

When importing partitioned tables into existing partitioned tables with PARALLEL > 1 there will be Worker processes waiting for an exclusive table lock which is indicated by event "enq: TM - contention".

To confirm the issue, execute the following queries while the Impdp client session reports:

Processing object type TABLE_EXPORT/TABLE/TABLE_DATA

-- DW process which is holding the exclusive lock
select a.sid, a.type, a.id1, b.program
from   v$lock a, v$session b
where  a.sid = b.sid and
       a.type = 'TM' and
       a.lmode = 6 and
       b.program like '%DW%';

-- DW process(es) requesting an exclusive lock
select a.sid, a.type, a.id1, b.program, b.event
from   v$lock a, v$session b
where  a.sid = b.sid and
       a.type = 'TM' and
       a.request = 6 and
       b.program like '%DW%';

-- Affected table
select c.owner, c.object_name, b.program, b.event
from   v$lock a, v$session b, dba_objects c
where  a.sid = b.sid and
       a.type = 'TM' and
       a.id1 = c.object_id and
       a.request = 6 and
       b.program like '%DW%';

NOTE: In the testcase content below, the user information and metadata used represents fictitious data. Any similarity to actual persons, living or dead, is purely coincidental and not intended in any manner.

The following example will illustrate the scenario:

impdp tc1/****** DIRECTORY=test_dp DUMPFILE=test%U.dmp tables=tc1.t1 TABLE_EXISTS_ACTION=APPEND PARALLEL=10

Or:

impdp tc1/****** DIRECTORY=test_dp DUMPFILE=test%U.dmp tables=tc1.t1 TABLE_EXISTS_ACTION=TRUNCATE PARALLEL=10

-- There will be a single DW process which is holding an exclusive lock on the table

-- DW process which is holding the exclusive lock
select a.sid, a.type, a.id1, b.program
from   v$lock a, v$session b
where  a.sid = b.sid and
       a.type = 'TM' and
       a.lmode = 6 and
       b.program like '%DW%';

  SID TY ID1        PROGRAM
----- -- ---------- -------------------------
  115 TM     282361 ORACLE.EXE (DW02)

-- Several other worker processes are waiting until the exclusive lock is released.

-- DW process(es) requesting an exclusive lock
select a.sid, a.type, a.id1, b.program, b.event
from   v$lock a, v$session b
where  a.sid = b.sid and
       a.type = 'TM' and
       a.request = 6 and
       b.program like '%DW%';

  SID TY ID1        PROGRAM                EVENT
----- -- ---------- ---------------------- ------------------------
  116 TM     282361 ORACLE.EXE (DW07)      enq: TM - contention
  136 TM     282361 ORACLE.EXE (DW03)      enq: TM - contention
  138 TM     282361 ORACLE.EXE (DW09)      enq: TM - contention
  152 TM     282361 ORACLE.EXE (DW04)      enq: TM - contention

-- The query will report the affected table
select c.owner, c.object_name, b.program, b.event
from   v$lock a, v$session b, dba_objects c
where  a.sid = b.sid and
       a.type = 'TM' and
       a.id1 = c.object_id and
       a.request = 6 and
       b.program like '%DW%';

OWNER OBJECT_NAME          PROGRAM                 EVENT
----- -------------------- ----------------------- ---------------------------
TC1   T1                   ORACLE.EXE (DW04)       enq: TM - contention
TC1   T1                   ORACLE.EXE (DW09)       enq: TM - contention
TC1   T1                   ORACLE.EXE (DW03)       enq: TM - contention
TC1   T1                   ORACLE.EXE (DW07)       enq: TM - contention

Note: A TM table lock is usually acquired during the execution of a transaction when referencing a table with a DML statement to prevent the object from being dropped or altered during the execution of the transaction.

CHANGES

This issue occurs if the following is true:

- Imported table is a partitioned table
- Data Pump Import (Impdp) uses parameter PARALLEL > 1 and TABLE_EXISTS_ACTION=APPEND or TRUNCATE

CAUSE

This is produced by unpublished Bug 8448327.

SOLUTION

The issue is fixed in release 11.2.0.2 and 12.1. There is no workaround.

Please check if patch 8448327 is available for your RDBMS version and platform.

Important: read the README file delivered with the patch. There are post-install steps to be completed; otherwise the fix won't work.

---------------------------

You try to import (impdp) a partitioned table into an empty schema using PARALLEL > 1. DataPump worker processes wait for an exclusive table lock, indicated by the event 'enq: TM - contention'. The table import started with the command line:

#> impdp system/<password> directory=<DIRECTORY_NAME> dumpfile=<DUMP_NAME>_%u.dmp tables="<SCHEMA_NAME>"."<TABLE_NAME>" parallel=5

is slow. The table is defined as:

create table <TABLE_NAME>
(
  <COL1> varchar2(36 byte) not null,
  <COL2> date not null,
  <COL3> varchar2(2000)  
)
partition by list (<COL1>)
(
  partition <PART_NAME1> values ('aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa') compress for oltp,
  partition <PART_NAME2> values ('bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb') compress for oltp,
  partition <PART_NAME3> values ('cccccccccccccccccccccccccccccccccccc') compress for oltp,
  partition <PART_NAME4> values ('dddddddddddddddddddddddddddddddddddd') compress for oltp,
  partition <PART_NAME5> values ('eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee') compress for oltp
);

DataPump import (impdp) internally uses a statement (obtained with event 10046) similar to:

INSERT /*+ APPEND PARALLEL("TAB_LP",1)+*/ INTO RELATIONAL("<SCHEMA_NAME>"."<TABLE_NAME>" NOT XMLTYPE) ("<COL1>", "<COL2>", "<COL3>")
   SELECT "<COL1>", "<COL2>", "<COL3>"
    FROM "ET$0095000D0001" KU$

and the following select statement, run while the impdp job is active:

select program, event
from   v$session
where  program like '%DW%';

returns:

PROGRAM        EVENT
-------------- ---------------------
oracle (DW00)  direct path sync
oracle (DW02)  enq: TM - contention
oracle (DW04)  enq: TM - contention
oracle (DW01)  enq: TM - contention
oracle (DW03)  enq: TM - contention


CAUSE

Check whether an NLS (character set) conversion takes place between the source and the target database.

If the target database has a different character set, the import cannot rely on the current partition (the NLS conversion may require repartitioning of the inserted values), so the worker process nulls out the partition or subpartition name in the insert query. The import then inserts the data into the table as a whole instead of into a specific (sub)partition. Because the external-table insert query carries the APPEND hint, it acquires an exclusive lock on the table, and that is why the other worker processes hang.
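
A quick way to compare the character sets, run on both source and target (a minimal sketch):

select property_name, property_value
  from database_properties
 where property_name in ('NLS_CHARACTERSET', 'NLS_NCHAR_CHARACTERSET');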

SOLUTION

This is expected behavior.

If no NLS conversion takes place (source and target databases use the same character set), then the same import internally uses a statement similar to:

INSERT /*+ APPEND PARALLEL("TAB_LP",1)+*/ INTO RELATIONAL("<SCHEMA_NAME>"."<TABLE_NAME>" NOT XMLTYPE) PARTITION ( "<PART_NAME1>" ) ("<COL1>", "<COL2>", "<COL3>")
  SELECT "<COL1>", "<COL2>", "<COL3>"
   FROM "ET$008B2DE10001" KU$

and the select:

select program, event
from   v$session
where  program like '%DW%';

returns:

PROGRAM        EVENT
-------------- -------------------
oracle (DW03)  direct path write
oracle (DW01)  direct path write
oracle (DW04)  direct path write
oracle (DW00)  direct path write
oracle (DW02)  direct path write

Note:
- A way to force real partition-level load in parallel (INSERT /*+ APPEND PARALLEL("<TABLE_NAME>",4)+*/ INTO RELATIONAL("<SCHEMA_NAME>"."<TABLE_NAME>" ) PARTITION(XXX) ... SELECT ... FROM ET$) was introduced in 12.2.0.1 via ER Bug 9590164 : TABLE_EXISTS_ACTION=TRUNCATE|APPEND ON PARTITIONED TABLE =>  SERIAL IMPORT.
- This enhancement was implemented in 12.2.0.1, with the parameter DATA_OPTIONS=TRUST_EXISTING_TABLE_PARTITIONS.
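
A hedged sketch of using that 12.2 parameter (directory, dump, and table names are made up):

impdp system/******** directory=DMP dumpfile=part_tab_%U.dmp tables=SCOTT.PART_TAB table_exists_action=truncate parallel=8 data_options=trust_existing_table_partitions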

-------------------

SYMPTOMS

DataPump is slow with partitions.

For example, you divided the export of a 900 GB partitioned table into four expdp dumps split by partition. Exporting the whole table takes about 12 hours; running the partition dumps in parallel takes about 5 hours, saving 7 hours.

Importing the whole table with parallel 32 takes about 3 hours, but the parallel import of the four partition dumps never finishes (canceled after 12 hours). Only one worker in each import job is actually running (the other 31 are waiting).


CAUSE

This is normal behavior. When DataPump imports into an existing table, it assumes that data that existed in one partition may end up in different partitions in the import database. The repartitioning may happen for multiple reasons, for example:

  • The partitioning isn't the same on the import and export database
  • The partition names (especially system generated names) are different
  • Character sets for the import database are different than the character sets for the export database
  • The timezone is different and one of the partition fields contains a timezone

By default, DataPump uses the APPEND hint to force direct loads of the data, since direct loads are faster than conventional loads. One of the reasons direct load is faster is that it locks the table against updates from other sessions. In our case, there are multiple import jobs trying to load data into the same table. When one job gets the lock to do a direct load into a table, all other import jobs are locked out from loading that table until the current job has finished its current load. So, even though you are executing multiple import jobs at the same time, only one is ever actually loading data. This is a feature of direct load and not something DataPump can change.

SOLUTION

1.  Export all of the data in one DataPump job using PARALLEL. The DataPump import will still load one partition at a time, but if the partitions are large enough, they will be loaded in parallel.

 - OR -

2.  Use the DISABLE_APPEND_HINT option on import. This causes the loads to run as conventional loads instead of direct loads, but it allows multiple import jobs to load the same table simultaneously; a sketch follows below.
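
A hedged sketch of option 2 (directory, dump, and table names are made up):

impdp system/******** directory=DMP dumpfile=part1_%U.dmp tables=SCOTT.BIGTAB table_exists_action=append parallel=8 data_options=disable_append_hint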

Note:

There is an enhancement in Data Pump in version 12.2.0.1 that provides a way to force a real partition-level load in parallel.

(See the following note:   DataPump Import (IMPDP) In Parallel Is Slow When Importing Partitioned Tables (Doc ID 1681444.1) )
