Conventional and Direct Path Loads

Conventional Path Load

Conventional path load (the default) uses the SQL INSERT statement and a bind array buffer to load data into database tables.

During conventional path loads, the input records are parsed according to the field specifications, and each data field is copied to its corresponding bind array (an area in memory where SQL*Loader stores data to be loaded).When the bind array is full (or no more data is left to read), an array insert operation is performed.

When SQL*Loader performs a conventional path load, it competes equally with(竞争) all other processes for buffer resources. This can slow the load significantly. Extra overhead is added as SQL statements are generated, passed to Oracle Database, and executed.

Oracle Database looks for partially filled blocks and attempts to fill them on each insert. Although appropriate during normal use, this can slow bulk loads dramatically.

  1. Conventional Path Load of a Single Partition

During a conventional path load of a single partition, SQL*Loader uses the partition-extended syntax of the INSERT statement, which has the following form:

INSERT INTO TABLE T PARTITION (P) VALUES ...

The SQL layer of the Oracle kernel determines if the row being inserted maps to the specified partition. If the row does not map to the partition, then the row is rejected, and the SQL*Loader log file records an appropriate error message.

  1. When to Use a Conventional Path Load

Direct path不能用才用conventional path

  1. When accessing an indexed table concurrently with the load, or when applying inserts or updates to a nonindexed table concurrently with the load

SQL * Loader对表具有独占写访问权,并且对任何索引都具有独占读/写访问权。在导入时同时有并发DML操作时只能用conventional path load

To use a direct path load (except for parallel loads), SQL*Loader must have exclusive write access to the table and exclusive read/write access to any indexes.

  1. When loading data into a clustered table. A direct path load does not support loading of clustered tables.
  2. When loading a relatively small number of rows into a large indexed table

During a direct path load, the existing index is copied when it is merged with the new index keys. If the existing index is very large and the number of new keys is very small, then the index copy time can offset the time saved by a direct path load.

  1. When loading a relatively small number of rows into a large table with referential and column-check integrity constraints

Because these constraints cannot be applied to rows loaded on the direct path, they are disabled for the duration of the load. Then they are applied to the whole table when the load completes. The costs could outweigh the savings for a very large table and a small number of new rows.

  1. When loading records and you want to ensure that a record is rejected under any of the following circumstances:

If the record, upon insertion, causes an Oracle error

If the record is formatted incorrectly, so that SQL*Loader cannot find field boundaries

If the record violates a constraint or tries to make a unique index non-unique

  • Direct Path Load

Direct path使用direct path API解析记录,然后把字段数据转化为表列的数据类型,并把它发送到direct path load engine, load engine创建column array,并把它格式化为oracle blocks并创建index key. 使用多个buffer来缓存格式化的blocks,在一个或多个buffer写满后使用异步IO写入数据文件

Instead of filling a bind array buffer and passing it to the Oracle database with a SQL INSERT statement, a direct path load uses the direct path API to pass the data to be loaded to the load engine in the server. The load engine builds a column array structure from the data passed to it(A direct path load parses the input records according to the field specifications, converts the input field data to the column data type, and builds a column array.

The direct path load engine uses the column array structure to format Oracle data blocks and build index keys.Internally, multiple buffers are used for the formatted blocks. While one buffer is being filled, one or more buffers are being written if asynchronous I/O is available on the host platform. Overlapping computation with I/O increases load performance.

A direct load does not compete with other users for database resources, so it can usually load data at near disk speed.

  1. Data Conversion During Direct Path Loads

During a direct path load, data conversion occurs on the client side rather than on the server side. This means that NLS parameters in the initialization parameter file (server-side language handle) will not be used. To override this behavior, you can specify a format mask in the SQL*Loader control file that is equivalent to the setting of the NLS parameter in the initialization parameter file, or set the appropriate environment variable.

For example, to specify a date format for a field, you can either set the date format in the SQL*Loader control file or set an NLS_DATE_FORMAT environment variable

  1. Direct Path Load of a Partitioned or Subpartitioned Table   --导入分区表

导入整个分区会对全局和局部索引维护,但只导入一个分区或子分区不会对全局索引维护

When loading a partitioned or subpartitioned table, SQL*Loader partitions the rows and maintains indexes (which can also be partitioned). 

导入整个分区表会产生严重的资源争用

Note that a direct path load of a partitioned or subpartitioned table can be quite resource-intensive for tables with many partitions or subpartitions.

  1. Direct Path Load of a Single Partition or Subpartition   --导入一个分区或子分区

Local index partitions that correspond to the data partition or subpartition being loaded are maintained by SQL*Loader. Global indexes are not maintained on single partition or subpartition direct path loads. During a direct path load of a single partition, SQL*Loader uses the partition-extended syntax of the LOAD statement, which has either of the following forms:

LOAD INTO TABLE T PARTITION (P) VALUES ...

LOAD INTO TABLE T SUBPARTITION (P) VALUES ...

除导入分区外的分区可以DML或执行direct path load

While you are loading a partition of a partitioned or subpartitioned table, you are also allowed to perform DML operations on, and direct path loads of, other partitions in the table.

  1. Advantages of a Direct Path Load

A direct path load is faster than the conventional path for the following reasons:

  1. Partial blocks are not used, so no reads are needed to find them, and fewer writes are performed.
  2. SQL*Loader need not execute any SQL INSERT statements; therefore, the processing load on the Oracle database is reduced.
  3. A direct path load calls on Oracle to lock tables and indexes at the start of the load and releases them when the load is finished. A conventional path load calls Oracle once for each array of rows to process a SQL INSERT statement.
  4. A direct path load uses multiblock asynchronous I/O for writes to the database files.
  5. During a direct path load, processes perform their own write I/O, instead of using Oracle's buffer cache. This minimizes contention with other Oracle users.
  6. The sorted indexes option available during direct path loads enables you to presort data using high-performance sort routines that are native to your system or installation.
  7. When a table to be loaded is empty, the presorting option eliminates the sort and merge phases of index-building. The index is filled in as data arrives.
  8. Protection against instance failure does not require redo log file entries during direct path loads. Therefore, no time is required to log the load when:

The Oracle database has the SQL NOARCHIVELOG parameter enabled

The SQL*Loader UNRECOVERABLE clause is enabled

The object being loaded has the SQL NOLOGGING parameter set

  1. Restrictions on Using Direct Path Loads
  1. Tables to be loaded cannot be clustered.
  2. Tables to be loaded cannot have Oracle Virtual Private Database (VPD) policies active on INSERT.
  3. Segments to be loaded cannot have any active transactions pending.

To check for this condition, use the Oracle Enterprise Manager command MONITOR TABLE to find the object ID for the tables you want to load. Then use the command MONITOR LOCK to see if there are any locks on the tables.

  1. Beginning with Oracle9i, you can perform a SQL*Loader direct path load when the client and server are different releases. However, both releases must be at least release 9.0.1 and the client release must be the same as or lower than the server release. For example, you can perform a direct path load from a release 9.0.1 database into a release 9.2 database. However, you cannot use direct path load to load data from a release 10.0.0 database into a release 9.2 database.
  2. The following features are not available with direct path load:

Loading BFILE columns

Use of CREATE SEQUENCE during the load. This is because in direct path loads there is no SQL being generated to fetch the next value since direct path does not generate INSERT statements.

  1. Restrictions on a Direct Path Load of a Single Partition

In addition to the previously listed restrictions, loading a single partition has the following restrictions:

  1. The table that the partition is a member of cannot have any global indexes defined on it.
  2. Enabled referential and check constraints on the table that the partition is a member of are not allowed.
  3. Enabled triggers are not allowed.
  1. Integrity Constraints

All integrity constraints are enforced during direct path loads, although not necessarily at the same time. 

NOT NULL constraints are enforced during the load. Records that fail these constraints are rejected.

UNIQUE constraints are enforced both during and after the load. A record that violates a UNIQUE constraint is not rejected (the record is not available in memory when the constraint violation is detected).

Integrity constraints that depend on other rows or tables, such as referential constraints, are disabled before the direct path load and must be reenabled afterwards. If REENABLE is specified, then SQL*Loader can reenable them automatically at the end of the load. When the constraints are reenabled, the entire table is checked. Any rows that fail this check are reported in the specified error log. See "Direct Path Loads_ Integrity Constraints_ and Triggers".

  1. Field Defaults on the Direct Path

Default column specifications defined in the database are not available when you use direct path loading. Fields for which default values are desired must be specified with the DEFAULTIF clause. If a DEFAULTIF clause is not specified and the field is NULL, then a null value is inserted into the database.

  1. Loading into Synonyms

You can load data into a synonym for a table during a direct path load, but the synonym must point directly to either a table or a view on a simple table. Note the following restrictions:

Direct path mode cannot be used if the view is on a table that has user-defined types or XML data.

In direct path mode, a view cannot be loaded using a SQL*Loader control file that contains SQL expressions.

  • Using Direct Path Load
  1. 使用前需要在服务器执行catldr.sql来创建相关视图
  2. 命令行参数中要指定DIRECT=TRUE
  3. 关于维护索引使用的临时段

You can improve performance of direct path loads by using temporary storage. After each block is formatted, the new index keys are put in a sort (temporary) segment. The old index and the new keys are merged at load finish time to create the new index. The old index, sort (temporary) segment, and new index segment all require storage until the merge is complete. Then the old index and temporary segment are removed.

如果有多个索引则顺序进行维护

When multiple indexes are built, the temporary segments corresponding to each index exist simultaneously, in addition to the old indexes. The new keys are then merged with the old indexes, one index at a time. As each new index is created, the old index and the corresponding temporary segment are removed.

如果导入数据已被排序好并有导入表索引为空则不会使用临时段

If, during a direct load, you have specified that the data is to be presorted and the existing index is empty, then a temporary segment is not required, and no merge occurs—the keys are put directly into the index.

During a conventional path load, every time a row is inserted the index is updated. This method does not require temporary storage space, but it does add processing time.

如果内存有限制可以使用SINGLEROW 选项来提高性能

可以使用1.3*key_storage来评估需要的临时段空间大小(如果导入数据完全与索引是相反的顺序,这是最坏的情况则需要2倍的大小,而如果导入数据与索引顺序一致,则1倍即可), key_storage可用(number_of_rows) * ( 10 + sum_of_column_sizes + number_of_columns )来评估

  1. 索引的unusable状态

在导入数据时会将索引标为unusable,这使表数据比索引数据更新,这时任何使用过些的SQL都会报错,下面的情况会导致导入结果索引(包含分区索引)仍为unusable状态:

  1. SQL*Loader runs out of space for the index and cannot update the index.
  2. The data is not in the order specified by the SORTED INDEXES clause.
  3. There is an instance failure, or the Oracle shadow process fails while building the index.
  4. There are duplicate keys in a unique index.
  5. Data savepoints are being used, and the load fails or is terminated by a keyboard interrupt after a data savepoint occurred.

查看索引是否unusable:

SELECT INDEX_NAME, STATUS FROM DBA_INDEXES WHERE TABLE_NAME = 'tablename';

SELECT INDEX_NAME, PARTITION_NAME, STATUS FROM DBA_IND_PARTITIONS WHERE STATUS != 'VALID';

  1. Using Data Saves to Protect Against Data Loss

使用data save可以保证导入数据提交,因为索引最后导入所以data save不能保证索引数据,继续导入后需要重建

You can use data saves to protect against loss of data due to instance failure. All data loaded up to the last savepoint is protected against instance failure. To continue the load after an instance failure, determine how many rows from the input file were processed before the failure, then use the SKIP parameter to skip those processed rows.

If there are any indexes on the table, drop them before continuing the load, and then re-create them after the load.

Using the ROWS Parameter

使用rows选项来使用data save, Rows表示导入多少行后进行data save,rows指定的行数要最少15分钟才完成,将ROWS的值设置较小会对性能和数据块空间利用率产生不利影响

During a data save, loading stops until all of SQL*Loader's buffers are successfully written. You should select the largest value for ROWS that is consistent with safety.

A data save is an expensive operation. The value for ROWS should be set high enough so that a data save occurs once every 15 minutes or longer. The intent is to provide an upper boundary (high-water mark) on the amount of work that is lost when an instance failure occurs during a long-running direct path load. Setting the value of ROWS to a small number adversely affects performance and data block space utilization.

注:在conventional load中,rows选项表示导入多少行后进行commit, 这里data save与commit稍有区别,即data save不对索引处理,即索引仍为unusable,它们的相似点:

  1. A data save will make the rows visible to other users.
  2. Rows cannot be rolled back after a data save.

  • Direct Path Loads, Integrity Constraints, and Triggers

With the conventional path load method, arrays of rows are inserted with standard SQL INSERT statements—integrity constraints and insert triggers are automatically applied.

  1. Integrity Constraints

During a direct path load, some integrity constraints are automatically disabled. Others are not.

Enabled Constraints :NOT NULL, UNIQUE, PRIMARY KEY (unique-constraints on not-null columns)

NOT NULL constraints are checked at column array build time. Any row that violates the NOT NULL constraint is rejected.

Even though UNIQUE constraints remain enabled during direct path loads, any rows that violate those constraints are loaded anyway (this is different than in conventional path in which such rows would be rejected). When indexes are rebuilt at the end of the direct path load, UNIQUE constraints are verified and if a violation is detected, then the index will be left in an Index Unusable state.

Disabled constraints: CHECK constraints, Referential constraints (FOREIGN KEY)

You can override the automatic disabling of CHECK constraints by specifying the EVALUATE CHECK_CONSTRAINTS clause. SQL*Loader will then evaluate CHECK constraints during a direct path load. Any row that violates the CHECK constraint is rejected.

The following example shows the use of the EVALUATE CHECK_CONSTRAINTS clause in a SQL*Loader control file:

LOAD DATA

INFILE *

APPEND

INTO TABLE emp

EVALUATE CHECK_CONSTRAINTS

FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'

(c1 CHAR(10) ,c2)

BEGINDATA

Jones,10

Smith,20

Brown,30

Taylor,40

总的来说下面语法没什么用,不如手动reenable来的简单:

When the load completes, the integrity constraints will be reenabled automatically if the REENABLE clause is specified.

The syntax for the REENABLE clause is as follows:


 

The SQL*Loader log file describes the constraints that were disabled, the ones that were reenabled, and what error, if any, prevented reenabling or validating of each constraint. It also contains the name of the exceptions table specified for each loaded table.

If the REENABLE clause is not used, then the constraints must be reenabled manually, at which time all rows in the table are verified. If the Oracle database finds any errors in the new data, then error messages are produced. The names of violated constraints and the ROWIDs of the bad data are placed in an exceptions table, if one is specified.

If the REENABLE clause is used, then SQL*Loader automatically reenables the constraint and verifies all new rows. If no errors are found in the new data, then SQL*Loader automatically marks the constraint as validated. If any errors are found in the new data, then error messages are written to the log file and SQL*Loader marks the status of the constraint as ENABLE NOVALIDATE. The names of violated constraints and the ROWIDs of the bad data are placed in an exceptions table, if one is specified.

  1. Database Insert Triggers

Table insert triggers are also disabled when a direct path load begins. After the rows are loaded and indexes rebuilt, any triggers that were disabled are automatically reenabled. The log file lists all triggers that were disabled for the load. There should not be any errors reenabling triggers.

Unlike integrity constraints, insert triggers are not reapplied to the whole table when they are enabled. As a result, insert triggers do not fire for any rows loaded on the direct path. When using the direct path, the application must ensure that any behavior associated with insert triggers is carried out for the new rows.

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值