Using the SQL*Loader Utility

The SQL*Loader utility, which comes with the Oracle database server, is commonly used by DBAs to load external data into an Oracle database. SQL*Loader is an immensely powerful tool that's capable of performing more than just a data load from text files. Here's a quick list of the SQL*Loader utility's capabilities:

  • You can use SQL*Loader to transform data before it's loaded into the database or during the data load itself (limited capabilities).

  • You can load data from multiple sources: disk, tape, and named pipes. You can also use multiple datafiles in the same loading session.

  • You can load data across a network.

  • You can selectively load from the input file based on conditions.

  • You can load all or part of a table. You can also load data into several tables simultaneously.

  • You can perform simultaneous data loads.

  • You can automate the load process, so it runs at scheduled times.

  • You can load complex object-relational data.

You can use the SQL*Loader utility to perform several types of data loading:

  • Conventional data loading: Under conventional data loading, SQL*Loader reads multiple rows at a time and stores them in a bind array. SQL*Loader subsequently inserts this whole array at once into the database and commits the operation.

  • Direct-path loading: The direct-path loading method doesn't use the SQL INSERT statement to load the data into Oracle tables. Column array structures are built from the data to be loaded, and these structures are used to format Oracle data blocks that are then written directly to the database tables.

  • External data loading: The new external tables feature of Oracle relies on the functionality of SQL*Loader to access data in external files as if it were part of the database tables. When you use the ORACLE_LOADER access driver to create an external table, you are basically using the SQL*Loader's functionality. In Oracle Database 11g, you can also use the new ORACLE_DATAPUMP access driver, which provides the ability to write to external tables.

The conventional and direct-path loading methods offer their own benefits and drawbacks. Because the direct-path loading method bypasses the Oracle SQL mechanism, it is much faster than the conventional loading method. However, when it comes to the data transformation capabilities, the conventional loading method is much more powerful than direct-path loading, because it allows a full range of functions to be applied to the table columns during the load. The direct-path loading method supports a far more limited number of transformations during the load. Oracle recommends that you use the conventional loading method for small data loads and the direct-path loading method for larger loads. You'll learn the specifics of direct-path loading after examining the main SQL*Loader features and using the conventional loading method. External data loading is covered in the "Using External Tables to Load Data" section later in this chapter.

Loading data using the SQL*Loader utility involves two main steps:

  1. Select the datafile that contains the data you want to load. This file usually ends with the extension .dat, and the data in it could be in several formats.

  2. Create a control file. The control file tells SQL*Loader how to map the data fields to an Oracle table and specifies whether the data needs to be transformed in some way. The control file usually ends with the extension .ctl.

The control file will provide the mapping of the table columns to the data fields in the input file. There is no requirement that you have a separate datafile for the load. If you wish, you can include the data in the control file itself, after you specify the load control information such as the field specification and so on. The data can be supplied in fixed-length fields or in free format, separated by a character such as a comma (,) or a pipe (|). Let's begin by studying the all-important SQL*Loader control file.

Exploring the SQL*Loader Control File

The SQL*Loader control file is a simple text file in which you specify important details about the data load job, such as the location of the source datafile. The control file is also the place where you map the datafiles to the table columns. You can also specify any transformation during the load process within the control file. The control file contains the names of the log files for the load and files for catching bad and rejected data. The control file instructs SQL*Loader regarding the following aspects of the SQL*Loader session:

  • The source of the data to be loaded into the database

  • The column specification of the target table

  • The nature of the input file formatting

  • The mapping of the input file fields to the table columns

  • Data transformation rules (applying SQL functions)

  • The locations for the log files and error files

Listing 13-1 shows a typical SQL*Loader control file. SQL*Loader considers data rows in the source datafiles to be records, and you can specify the record formats in the control file. Note that you can also use a separate file for the data. In this example, you see the control information followed by in-line data, as shown by the use of the INFILE * specification in the control file. This specification indicates that the data for the load follows the control information. If you are doing a one-time data load, it is probably better to keep things simple and place the data in the control file itself. The keyword BEGINDATA tells SQL*Loader where the data portion of the control file starts.

Listing 13-1: A Typical SQL*Loader Control File

LOAD DATA
INFILE *
BADFILE test.bad
DISCARDFILE test.dsc

INSERT
INTO TABLE tablename
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
(column1  POSITION (1:2) CHAR,
column2   POSITION (3:9) INTEGER EXTERNAL,
column3   POSITION (10:15) INTEGER EXTERNAL,
column4   POSITION (16:16) CHAR
)
BEGINDATA
AY3456789111111Y
/*   Rest of the data here . . .*/

The portion of the control file that describes the data fields is called the field list. In the control file in Listing 13-1, the field list is the following section:

(column1   POSITION (1:2) CHAR,
 column2   POSITION (3:9) INTEGER EXTERNAL,
 column3   POSITION (10:15) INTEGER EXTERNAL,
 column4   POSITION (16:16) CHAR
)

The field list shows the field names, position, data type, delimiters, and any applicable conditions.

You can specify numerous variables in the control file, and you can informally sort them into the following groups:

  • Loading-related clauses

  • Datafile-related clauses

  • Table- and field-mapping clauses

  • Command-line parameters in the control file

The following sections describe the parameters you can specify in the control file to configure your data loads.

 Tip 

If you aren't sure which parameters you can use for your SQL*Loader run, just type sqlldr at the operating system prompt to view all the available options. You will see a complete list of all the parameters and their operating system-specific default values (if any exist).

Loading-Related Clauses

The keywords LOAD DATA start off a control file. This simply means that the data is to be loaded from the input datafile to the Oracle tables using the SQL*Loader utility.

The INTO TABLE clause indicates into which table the data will be loaded. If you're loading into multiple tables simultaneously, you'll need an INTO TABLE statement for each table. The keywords INSERT, REPLACE, and APPEND instruct the database how the load will be done. If it is an INSERT, the table is assumed to be empty; otherwise, the loading process will generate an error and stop. The REPLACE clause will truncate the table and start loading new data. You'll often see that a load job using the REPLACE option seems to hang initially. This is because Oracle is busy truncating the table before it starts the load process. The APPEND clause will add the new rows to existing table data.
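Putting these loading-related clauses together, here is a minimal control-file sketch that appends rows to an existing table; the table name, file path, and column names are hypothetical:

```
LOAD DATA
INFILE '/u01/app/oracle/load/emp.dat'
APPEND
INTO TABLE employees
FIELDS TERMINATED BY ','
(employee_id, employee_name, city)
```

With APPEND here, existing rows in the employees table are preserved; replacing APPEND with REPLACE would truncate the table first, and INSERT would require the table to be empty.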

Datafile-Related Clauses

You can use several clauses to specify the locations and other characteristics of the datafile(s) from which you're going to load data using SQL*Loader. The following sections cover the important datafile-related clauses.

Datafile Specification

You specify the name and location of the input datafile by using the INFILE parameter:

INFILE='/a01/app/oracle/oradata/load/consumer.dat'

If you don't want to use the INFILE specification, you can include the data in the control file itself. When you include the data in the control file instead of a separate input file, you omit the file location and use the * notation, as follows:

INFILE = *

If you choose to have the data in the control file itself, you must use the BEGINDATA clause before your data starts:

BEGINDATA
Nicholas Alapati,243 New Highway,Irving,TX,75078
. . .

Physical and Logical Records

Every physical record in the source datafile is equivalent to a logical record by default, but the control file can specify that more than one physical record be combined into a single logical record. For example, in the following input file, three physical records are also considered three logical records:

Nicholas Alapati,243 New Highway,Irving,TX,75078
Shannon Wilson,1234 Elm Street,Fort Worth,TX,98765
Nina Alapati,2629 Skinner Drive,Flower Mound,TX,75028

You can transform these three physical records by using either of two parameters in the control file: the CONCATENATE clause or the CONTINUEIF clause.

If your input is in the fixed format, you can specify the number of rows of data to be read for each logical record in the following way:

CONCATENATE 4

This CONCATENATE clause will combine four rows of data. If each row of data has 80 characters, then the total number of characters in the new logical record that is created will be 320. Therefore, when you use the CONCATENATE clause, you should also specify a record length (RECLEN) clause along with it. In this case, the record length clause is as follows:

RECLEN 320
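Combining the two clauses, a control file that builds each logical record from four 80-character physical records might begin as follows; the file path, table name, and column specification are hypothetical:

```
LOAD DATA
INFILE '/u01/app/oracle/load/fixed.dat'
CONCATENATE 4
RECLEN 320
INTO TABLE test_data
(record_text POSITION(1:320) CHAR)
```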

The CONTINUEIF clause lets you combine physical records into logical records by specifying one or more characters in a specified location. Here's an example:

CONTINUEIF THIS (1:4) = 'next'

In this line, the CONTINUEIF clause means that if SQL*Loader finds the four characters next at the beginning of a line, it should treat the data that follows as a continuation of the previous line (both the position (1:4) and the string next are arbitrary; continuation indicators can be any characters).

If you are using fixed-format data, the CONTINUEIF character may be placed in the very last column, as shown in the following example:

CONTINUEIF LAST = '&'

This line means that if SQL*Loader encounters the ampersand (&) character at the end of a line, it will treat the following line as a continuation of the preceding line.
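As an illustration, with CONTINUEIF LAST = '&' in effect, the following two physical records (sample data) would be combined into the single logical record Nicholas Alapati,243 New Highway,Irving,TX,75078:

```
Nicholas Alapati,243 New Highway,&
Irving,TX,75078
```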

 Note 

Using either CONTINUEIF or CONCATENATE will slow down SQL*Loader, so keep physical and logical records mapped one to one where possible. When SQL*Loader must join more than one physical record to make a single logical record, it performs additional scanning of the input data, which takes more time.

Record Format

You may specify a record format in one of three ways:

  • Stream record format: This is the most common record format, which uses a record terminator to indicate the end of a record. When SQL*Loader scans the input file, it knows it has reached the end of a record when it encounters the terminator string. If no terminator string is specified, the terminator defaults to a newline character (a carriage return followed by a linefeed on Windows). The set of three records in the previous example uses this record format.

  • Variable record format: In this format, you explicitly specify the length at the beginning of each record, as shown in the following example:

    INFILE 'example1.dat' "var 2"
    06sammyy12johnson,1234
    

    This line contains two records: the first with six characters (sammyy) and the second with twelve characters (johnson,1234). var 2 indicates that the data records are of variable size, with record size indicators specified as a field of length 2, before every new record.

  • Fixed record format: In this format, you specify that all records are a specific fixed size. Here's an example, which specifies that every record is 12 bytes long:

    INFILE 'example1.dat' "fix 12"
    sammyy,1234, johnso,1234
    

    Although at first glance the record in this example seems to include the entire line (sammyy,1234, johnso,1234), the fix 12 specification means that there are actually two 12-byte records in this line. Thus, when you use the fixed record format, you may have several loader records on each line in your source datafile.
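For completeness, here is what a stream-format specification with an explicit terminator string might look like; the filename and the pipe (|) terminator are just illustrative choices:

```
INFILE 'example1.dat' "str '|'"
sammyy,1234|johnson,5678|
```

Here the str specification names the terminator, so the sample line contains two records, sammyy,1234 and johnson,5678, each ended by a pipe character.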

Table- and Field-Mapping Clauses

During a load session, SQL*Loader takes the data fields in the data records and converts them into table columns. The table- and field-mapping clauses pertain to the mapping process between data fields and table columns. The control file provides details about fields, including the column name, position, input record data types, delimiters, and data transformation parameters.

Table Column Name

Each column in the table is specified clearly, with the position and data type of the matching field value in the input file. You don't need to load all the columns in the table. If you omit any columns in the control file, they're set to null.

Position

SQL*Loader must have a way of knowing the location of the various fields in the input file. Oracle calls the individual items in the datafile fields, and there is no direct correspondence between these fields and the columns in the table in which you are loading the data. The process of mapping fields in the input datafile to the table columns in the database is called field setting, and it is the biggest contributor to CPU time taken during the load. The POSITION clause specifies exactly where in the data record the various fields are. You have two ways to specify the location of the fields: relative and absolute.

Relative position implies that you specify the position of a field with respect to the position of the preceding field, as shown in the following example:

employee_id  POSITION(*) INTEGER EXTERNAL(6)
employee_name  POSITION(*) CHAR(30)

In this example, the load starts with the first field, employee_id. SQL*Loader then expects employee_name to start in position 7 and continue for 30 characters. It will look for the next field starting at position 37, and so on.

When you use the POSITION clause in an absolute position sense, you just specify the position at which each field starts and ends, as follows:

employee_id POSITION(1:6) INTEGER EXTERNAL
employee_name POSITION(7:36) CHAR

Data Types

The data types used in the control file refer to the input records only and aren't the same as the column data types within the database tables. The following are the main data types used in SQL*Loader control files:

  • INTEGER(n)—binary integer, where n can be 1, 2, 4, or 8

  • SMALLINT

  • CHAR

  • INTEGER EXTERNAL

  • FLOAT EXTERNAL

  • DECIMAL EXTERNAL
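A field list that mixes several of these data types might look like the following sketch; the column names and positions are hypothetical:

```
(employee_id   POSITION(1:6)   INTEGER EXTERNAL,
 salary        POSITION(7:15)  FLOAT EXTERNAL,
 bonus         POSITION(16:24) DECIMAL EXTERNAL,
 employee_name POSITION(25:54) CHAR)
```

The EXTERNAL data types read human-readable numbers from the text file, whereas INTEGER(n) and SMALLINT describe binary fields and are rarely appropriate for plain text datafiles.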

Delimiters

After each column's data type is specified, you can specify a delimiter, which indicates how the field should be delimited. You can delimit data by using one of the following two clauses: TERMINATED BY or ENCLOSED BY.

TERMINATED BY limits the field to the character specified and denotes the end of a field. Here are a couple of examples:

TERMINATED BY WHITESPACE
TERMINATED BY ","

The first example indicates that the field is terminated by the first blank that is encountered. The second example simply indicates that the fields are separated by commas.

The ENCLOSED BY '"' delimiter specifies that the field is enclosed by a pair of double quotation marks. Here is an example:

FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'


 Tip 

Oracle recommends that you avoid delimited fields and choose positional fields (using the POSITION parameter) where possible. Choosing positional fields means that the database avoids scanning the datafile to find the delimiters you chose, thus reducing processing time.

Data Transformation Parameters

You can apply SQL functions to the field data before loading it into table columns. Only SQL functions that return single values can be used for transforming field values in general. The field should be denoted inside the SQL string as :field_name. You specify the SQL function(s) after you specify the data type for the field, and you enclose the SQL string in double quotation marks, as shown in the following examples:

field_name  CHAR TERMINATED BY "," "SUBSTR(:field_name, 1, 10)"
employee_name POSITION(32:62) CHAR  "UPPER(:employee_name)"
salary POSITION(75) CHAR "TO_NUMBER(:salary,'$99,999.99')"
commission INTEGER EXTERNAL ":commission * 100"

As you can see, the application of SQL operators and functions to field values before they are loaded into tables helps you transform the data at the same time you are loading it. This is a handy feature.

Command-Line Parameters in the Control File

SQL*Loader allows you to specify a number of runtime parameters at the command line when you invoke the SQL*Loader executable. Usually, you specify in the parameter file those parameters whose values remain the same across jobs. You can then use the command line to start the SQL*Loader job, either interactively or as part of a scheduled batch job. On the command line, you specify runtime-specific parameters, along with the control filename and location.

As an alternative, you may use the OPTIONS clause of the control file to specify runtime parameters inside the control file itself. You can always specify a number of runtime parameters while invoking SQL*Loader, but you're better off using the OPTIONS clause to specify them in the control file, if those parameters are something you'll repeat often. Using the OPTIONS clause comes in handy particularly if your SQL*Loader command-line specification is so long that it exceeds your operating system's maximum command-line size.
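For example, a control file might begin with an OPTIONS clause such as the following sketch; the parameter values are illustrative, and any of them can still be overridden at the command line:

```
OPTIONS (DIRECT=true, ERRORS=25, SKIP=235550)
LOAD DATA
INFILE '/u01/app/oracle/load/finance.dat'
-- rest of the load specifications follow
```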

 Note 

Specifying a parameter on the command line will override the parameter's values inside a control file.

The following sections cover some of the important parameters you can control using the OPTIONS clause in the control file.

USERID

The USERID parameter specifies both the username and the password of the user in the database who has the privileges for the data load:

USERID = samalapati/sammyy1

CONTROL

The CONTROL parameter specifies the name of the control file for the SQL*Loader session. The control file may include the specifications for all the load parameters. Of course, you can load data using manual commands, but using a control file gives you more flexibility and enables the automation of the load process.

CONTROL = '/test01/app/oracle/oradata/load/finance.ctl'

DATA

The DATA parameter simply refers to the input datafile. The default filename extension is .dat. Note that the data doesn't necessarily need to be inside a separate datafile. If you wish, you can include the data at the end of the control file load specifications.

DATA = '/test02/app/oracle/oradata/load/finance.dat'

BINDSIZE and ROWS

You can use the two parameters BINDSIZE and ROWS to specify a conventional path bind array. SQL*Loader in the conventional path mode doesn't insert data into the table row by row. Rather, it inserts a set of rows at a time, and that set of rows, called the bind array, is sized based on either the BINDSIZE or ROWS parameter.

The BINDSIZE parameter sets the bind array size in bytes. On my system, the default bind size is 256,000 bytes.

BINDSIZE = 512000

The ROWS parameter does not set a limit on the number of bytes in the bind array. It imposes a limit on the number of rows in each bind array, and SQL*Loader multiplies the ROWS value by its estimate of the size of each row in the table. The default number of rows under the conventional method on my system is 64.

ROWS = 64000

 Note 

If you specify both the BINDSIZE and ROWS parameters, SQL*Loader uses the smaller of the two values for the bind array.

DIRECT

If you specify DIRECT=true, SQL*Loader loads using the direct-path method instead of the conventional method. The default for this parameter is DIRECT=false, meaning the conventional method is the default method used.

ERRORS

The ERRORS parameter specifies the maximum number of errors that can occur before the SQL*Loader job is terminated. The default on most systems is 50. If you don't want to tolerate any errors, set this parameter to 0:

ERRORS = 0

LOAD

Using the LOAD parameter, you can set the maximum number of logical records to be loaded into the table. The default is to load all the records in the input datafile.

LOAD = 10000

LOG

The LOG parameter specifies the name of the log file. The SQL*Loader log file, as you'll see shortly, provides a lot of information about your SQL*Loader session.

LOG = '/u01/app/oracle/admin/finance/logs/financeload.log'

BAD

The BAD parameter specifies the name and location of the bad file. If any records are rejected due to data formatting errors, SQL*Loader will write the record to the bad file. For example, a field could exceed its specified length and be rejected by SQL*Loader. Note that besides the records rejected by SQL*Loader, other records may be rejected by the database. For example, if you try to insert rows with duplicate primary key values, the database will reject the insert. These records will be part of the bad file as well. If you don't explicitly name a bad file, Oracle will create one and use a default name with the control filename as a prefix.

BAD = '/u01/app/oracle/load/financeload.bad'

SILENT

By default, SQL*Loader displays feedback messages on the screen showing the load job's progress. You can turn off the display with the SILENT parameter. You can use several options with the SILENT parameter. For example, you can turn off all types of messages with the ALL option, as shown here:

SILENT = ALL

DISCARD and DISCARDMAX

The discard file contains all records rejected during the load because they didn't meet the record selection criteria you specified in the control file. The default is to not have a discard file. Oracle will create this file only if there are discarded records, and, even then, only if you explicitly specify the discard file in the control file. You use the DISCARD parameter to specify the name and location of the discard file.

DISCARD = '/test01/app/oracle/oradata/load/finance.dsc'

By default, SQL*Loader doesn't impose any limit on the number of records; therefore, all the logical records can be discarded. Using the DISCARDMAX parameter, you can set a limit on the number of records that can be discarded.
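Discarded records arise from record selection criteria such as a WHEN clause. The following control-file sketch sends any record failing the WHEN condition to a named discard file; the table name, file paths, and condition are hypothetical, and the discard limit could be set with the DISCARDMAX parameter:

```
LOAD DATA
INFILE '/u01/app/oracle/load/activity.dat'
DISCARDFILE '/u01/app/oracle/load/activity.dsc'
INTO TABLE activities
WHEN activity_type != 'H'
FIELDS TERMINATED BY ','
(activity_id, activity_type, activity_date)
```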

 Tip 

Both the bad and discard files contain records in the original format. Therefore, it's easy, especially during large loads, to just edit these files and use them for loading the data that was left out during the first load run.

PARALLEL

The PARALLEL parameter specifies whether SQL*Loader can run multiple sessions when you're employing the direct-path loading method.

sqlldr USERID=salapati/sammyy1 CONTROL=load1.ctl DIRECT=true PARALLEL=true

RESUMABLE

Using the RESUMABLE parameter, you can turn on Oracle's Resumable Space Allocation feature. This way, if a job encounters a space problem while loading the data, the load job is merely suspended. You can arrange for a notification about the job suspension and allocate more space so the job can continue without failing. The Resumable Space Allocation feature is discussed in Chapter 8. The default for the RESUMABLE parameter is false, meaning Resumable Space Allocation is disabled. Set RESUMABLE=true to enable this feature.

RESUMABLE_NAME

The RESUMABLE_NAME parameter enables you to identify a specific resumable load job when you use the Resumable Space Allocation feature. The default name is the combination of the username, session ID, and instance ID.

RESUMABLE_NAME = finance1_load

RESUMABLE_TIMEOUT

The RESUMABLE_TIMEOUT parameter can be set only when the RESUMABLE parameter is set to true. The timeout is the maximum length of time for which an operation can be suspended when it runs into a space-related problem. If the space-related problem is not fixed within this interval, the operation will be aborted. The default is 7,200 seconds.

RESUMABLE_TIMEOUT = 3600

SKIP

The SKIP parameter is very useful in situations where your SQL*Loader job fails to complete its run due to some errors, but it has already committed some rows. The SKIP parameter lets you skip a specified number of records in the input file when you run the SQL*Loader job the second time. The alternative is to truncate the table and restart the SQL*Loader job from the beginning—not a great idea if a large number of rows have already been loaded into the database tables.

SKIP = 235550

This example assumes the first job failed after loading 235,549 records successfully. You can find out this information from the log file for the load, or query the table directly.

Generating Data During the Load

The SQL*Loader utility enables you to generate data to load columns. This means that you can do a load without ever using a datafile. More commonly, however, you generate data for one or more columns while loading the rest of the data from a datafile. The following types of data can be generated by SQL*Loader:

  • Constant value: You can set a column to a constant value by using the CONSTANT specification. For example, with the following specification, all the rows populated during this run will have the value sysadm in the loaded_by column:

    loaded_by     CONSTANT  "sysadm"
    
  • Expression value: You can set a column to the value specified by a SQL operator or a PL/SQL function. You specify the expression value using the EXPRESSION parameter, as shown here:

    column_name EXPRESSION "SQL string"
    
  • Datafile record number: You can set a column's value to the record number that loaded that row by using the RECNUM column specification:

    record_num RECNUM
    
    
  • System date: You can use the sysdate variable to set a column to the date on which you're loading the data:

    loaded_date     sysdate
    
  • Sequence: You can generate unique values to load a column by using the SEQUENCE function. In the following example, the current maximum value of the loadseq sequence is incremented by one each time a row is inserted:

    loadseq SEQUENCE(max,1)
    

Invoking SQL*Loader

You can invoke the SQL*Loader utility in a number of ways. The standard syntax for invoking the SQL*Loader is as follows:

SQLLDR keyword=value [,keyword=value, ...]

Here's an example showing how to invoke the SQL*Loader:

$ sqlldr USERID=nicholas/nicholas1 CONTROL=/u01/app/oracle/finance/finance.ctl \
DATA=/u01/app/oracle/oradata/load/finance.dat \
LOG=/u01/app/oracle/finance/log/finance.log \
ERRORS=0 DIRECT=true SKIP=235550 RESUMABLE=true RESUMABLE_TIMEOUT=7200

 Note 

In the command-line invocation of the SQL*Loader utility, the backslash (\) at the end of each line simply indicates that the command continues on the next line. You can specify a command-line parameter by name or by position. For example, the username/password specification always comes first, immediately after the sqlldr keyword. If you omit a parameter, Oracle will use the default value for that parameter. You can optionally use a comma after each parameter.

As you can see, the more parameters you want to use, the more information you need to provide at the command line. This approach presents two problems. First, if you make typing or other errors, you'll have a mess on your hands. Second, there may be a limit on some operating systems regarding how many characters you can input at the command prompt. Fortunately, you can run the same SQL*Loader job with the following command, which is a lot less complicated:

$ sqlldr PARFILE=/u01/app/oracle/admin/finance/load/finance.par

The command-line parameter PARFILE stands for parameter file, which is a file in which you can specify values for all your command parameters. For example, for the load specifications shown in this chapter, the parameter file looks like this:

USERID=nicholas/nicholas1
CONTROL='/u01/app/oracle/admin/finance/finance.ctl'
DATA='/app/oracle/oradata/load/finance.dat'
LOG='/u01/app/oracle/admin/finance/log/finance.log'
ERRORS=0
DIRECT=true
SKIP=235550
RESUMABLE=true
RESUMABLE_TIMEOUT=7200

Using the parameter file is more elegant than typing all the parameters at the command line, and it is a logical approach when you need to regularly run jobs that use the same options. Any option that you specify at the command line will override the value specified for that parameter inside a parameter file.

If you want to use the command line, but you don't want to type the password where someone can easily see it, you can invoke SQL*Loader in the following manner:

$ sqlldr CONTROL=control.ctl

SQL*Loader will then prompt you for your username/password combination.

Exploring the Loader Log File

The SQL*Loader log file offers a host of information regarding a SQL*Loader run. It tells you how many records were supposed to be loaded and how many actually were loaded. It tells you which records failed to get loaded and why. It also describes the field columns provided in the SQL*Loader control file. Listing 13-2 shows a typical SQL*Loader log file.

Listing 13-2: A Typical SQL*Loader Log File

SQL*Loader: Release 11.1.0.0.0 - Production on Sun Aug 24 14:04:26 2008
Control File:   /u01/app/oracle/admin/fnfactsp/load/test.ctl
 Data File:      /u01/app/oracle/admin/fnfactsp/load/test.ctl
 Bad File:     /u01/app/oracle/admin/fnfactsp/load/test.bad
Discard File:  none specified
  (Allow all discards)
Number to load: ALL
Number to skip: 0
Errors allowed: 0
Bind array:     64 rows, maximum of 65536 bytes
Continuation:    none specified
Path used:      Conventional
Table TBLSTAGE1, loaded when ACTIVITY_TYPE != 0X48(character 'H')
                  and ACTIVITY_TYPE != 0X54(character 'T')
Insert option in effect for this table: APPEND
TRAILING NULLCOLS option in effect
   Column Name               Position   Len  Term Encl Datatype
------------------------------ ---------- ----- ---- ---- -----
COUNCIL_NUMBER                  FIRST     *   ,       CHARACTER
COMPANY                          NEXT     *   ,       CHARACTER
ACTIVITY_TYPE                    NEXT     *   ,       CHARACTER
RECORD_NUMBER                    NEXT     *   ,       CHARACTER
FUND_NUMBER                      NEXT     *   ,       CHARACTER
BASE_ACCOUNT_NUMBER              NEXT     *   ,       CHARACTER
FUNCTIONAL_CODE                  NEXT     *   ,       CHARACTER
DEFERRED_STATUS                  NEXT     *   ,       CHARACTER
CLASS                            NEXT     *   ,       CHARACTER
UPDATE_DATE                                             SYSDATE
UPDATED_BY                                             CONSTANT
    Value is 'sysadm'
BATCH_LOADED_BY                                        CONSTANT
    Value is 'sysadm'
/*Discarded Records Section: Gives the complete list of discarded
records, including reasons why they were discarded.*/

Record 1: Discarded - failed all WHEN clauses.
Record 1527: Discarded - failed all WHEN clauses.
Table TBLSTAGE1:
/*Number of Rows: Gives the number of rows
 successfully loaded and the number of rows not
 loaded due to errors or because they failed the
 WHEN conditions, if any. Here, two records failed the WHEN condition*/
  1525 Rows successfully loaded.
  0 Rows not loaded due to data errors.
  2 Rows not loaded because all WHEN clauses were failed.
  0 Rows not loaded because all fields were null.

/* Memory Section: Gives the bind array size chosen for the data load*/
Space allocated for bind array:                  99072 bytes(64 rows)
Read   buffer bytes: 1048576
/* Logical Records Section: Gives the total records, number of rejected
 and discarded records.*/
Total logical records skipped:          0
Total logical records read:          1527
Total logical records rejected:         0
Total logical records discarded:      2
/*Date Section: Gives the day and date of the data load.*/
Run began on Sun Mar 06 14:04:26 2009
Run ended on Sun Mar 06 14:04:27 2009
/*Time section: Gives the time taken for completing the data load.*/
Elapsed time was:     00:00:01.01
CPU time was:         00:00:00.27

When you examine the log file, focus on the total logical records read and the records that are skipped, rejected, and discarded. When you encounter difficulty running a job, the log file is the first place you should look to see whether or not the data records are being loaded.

Using Return Codes

The log file provides a wealth of information about the load, but Oracle also allows you to trap the exit code after each load run. This enables you to check the results of the load when you run it through a cron job or a shell script. For a Windows server, you may use the at command to schedule the load job. Here are the key exit codes for the UNIX/Linux operating systems:

  • EX_SUCC 0 indicates that all the rows were loaded successfully.

  • EX_FAIL 1 indicates that there were command-line or syntax errors.

  • EX_WARN 2 indicates that some or all rows were rejected.

  • EX_FTL 3 indicates operating system errors.
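In a wrapper script, these exit codes can be mapped to readable outcomes before deciding what to do next. The sketch below is a minimal illustration; the helper name describe_rc is an assumption, and in a real job you would feed it $? captured immediately after the sqlldr call:

```shell
# Map SQL*Loader exit codes (UNIX/Linux) to the outcomes listed above.
# describe_rc is a hypothetical helper; pass it $? right after sqlldr runs.
describe_rc() {
  case $1 in
    0) echo "EX_SUCC: all rows loaded successfully" ;;
    1) echo "EX_FAIL: command-line or syntax errors" ;;
    2) echo "EX_WARN: some or all rows rejected" ;;
    3) echo "EX_FTL: operating system errors" ;;
    *) echo "unknown exit code: $1" ;;
  esac
}

# Example: sqlldr PARFILE=test.par; describe_rc $?
describe_rc 2
```

A cron job can then branch on the code (or on the helper's output) to decide whether to archive the input file, alert the DBA, or continue processing.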

Using the Direct-Path Loading Method

So far, you have looked at the SQL*Loader utility from the point of view of a conventional load. As you recall, the conventional loading method uses SQL INSERT statements to insert data into tables one bind array size at a time. The direct-path loading option doesn't use the SQL INSERT statement to put data into tables; rather, it formats Oracle data blocks and writes them directly to the database files. This direct-write process eliminates much of the overhead involved in executing SQL statements to load tables. Since the direct-path loading method doesn't contend for database resources, it will load data much faster than a conventional data load. For larger data loads, the direct-path loading method works best, and it may be the only viable method of loading data into tables for the simple reason that a conventional load may require more time than is available.

Besides the obvious advantages of a shorter load time, direct loading also helps you rebuild indexes and presort table data. Using the direct-path loading method as opposed to the conventional loading method has the following advantages:

  • The load is much faster than in the conventional loading method because you aren't using SQL INSERT statements for the load.

  • The direct load uses multiblock asynchronous I/O for database writes, so the writing is fast.

  • You have the option of presorting data using efficient sorting routines with the direct load.

  • By specifying the UNRECOVERABLE clause in the control file, you can avoid the writing of any redo data during a direct load.

  • By using temporary storage, you can build indexes more efficiently during a direct load than when you're using the conventional load method.

 Note 

A conventional load will always generate redo entries, whereas the direct-path loading method will generate such entries only under specific conditions. A direct load also won't fire any insert triggers, unlike the conventional load, which fires the triggers during the load. Users can't make any changes when a table is being loaded using a direct load, unlike in a conventional load.

However, direct-path loads have some serious limitations. You can't use this method under the following conditions:

  • You're using clustered tables.

  • You're loading parent and child tables together.

  • You're loading VARRAY or BFILE columns.

  • You're loading across heterogeneous platforms using Oracle Net.

  • You want to apply SQL functions during the load.

 Note 

In a direct load, you can't use any SQL functions. If you need to perform a large data load and also transform the data during the load, you have a problem. The conventional data load will let you use SQL functions to transform data, but the method is very slow compared to the direct load. Thus, for large data loads, you may want to consider using one of the newer load/transform techniques, such as external tables or table functions.

Direct Load Options

Several SQL*Loader options are intended especially for use with the direct load option or are more significant for direct loads than conventional loads. The following options are relevant to the direct-path loading method:

  • DIRECT: The DIRECT clause must be set to true in order for you to use the direct-path loading method (DIRECT=true).

  • DATE_CACHE: The DATE_CACHE parameter comes in handy if you're loading the same date or timestamp values several times during a direct load. SQL*Loader has to convert date and timestamp data each time it encounters it. If you have duplicate date and timestamp values in your data, you can reduce unnecessary data conversions, and thus processing time, by specifying the DATE_CACHE parameter. By default, the date cache holds 1,000 values. If you have few or no duplicate date and timestamp values in your data, you can disable the date cache by setting it to zero (DATE_CACHE=0).

  • ROWS: The ROWS parameter is crucial because you can use it to specify how many rows SQL*Loader will read from the input datafile before saving the insertions to the tables. You use the ROWS parameter to set the ceiling on the amount of data lost if the instance fails during a long SQL*Loader run. When SQL*Loader reads the number of rows specified in the ROWS parameter, it will stop loading data until all of the data buffers are successfully written to the datafiles. This process is called a data save. For example, if SQL*Loader can load about 10,000 rows per minute, setting ROWS=150000 saves the data every 15 minutes.
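The data-save interval implied by a given ROWS setting follows directly from your observed load rate. A trivial sketch of that arithmetic (the rate of 10,000 rows per minute is the assumed figure from the example above, not a measured value):

```shell
# Estimate the minutes between data saves for a given ROWS setting.
rows_setting=150000     # value of the ROWS parameter
rows_per_minute=10000   # assumed load rate for this sketch
echo "data save roughly every $((rows_setting / rows_per_minute)) minutes"
```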

  • UNRECOVERABLE: If you want to minimize the use of the redo log, you can do so by using the UNRECOVERABLE parameter during a direct load (specify UNRECOVERABLE in the control file).

  • SKIP_INDEX_MAINTENANCE: The SKIP_INDEX_MAINTENANCE parameter, when turned on (SKIP_INDEX_MAINTENANCE=true), instructs SQL*Loader not to bother maintaining the indexes during the load. The default for SKIP_INDEX_MAINTENANCE is false.

  • SKIP_UNUSABLE_INDEXES: Setting a value of true for the SKIP_UNUSABLE_INDEXES parameter will ensure that SQL*Loader will load tables with indexes in an unusable state. These indexes won't be maintained by SQL*Loader, however. The default for this parameter is based on the setting for the SKIP_UNUSABLE_INDEXES initialization parameter, whose default value is true.

  • SORTED_INDEXES: The SORTED_INDEXES parameter signals SQL*Loader that data is sorted on a specified index, which helps improve load performance.

  • COLUMNARRAYROWS: This parameter determines the number of column array rows SQL*Loader buffers before building a stream buffer; for example, COLUMNARRAYROWS=100000 buffers 100,000 rows at a time. The size of the direct-path column array is thus determined by this parameter. The default value for this parameter on my UNIX server is 5,000.

  • STREAMSIZE: The STREAMSIZE parameter lets you set the size of the stream buffer. The default on my server, for example, is 256,000, and I can increase it using the STREAMSIZE parameter; for example, STREAMSIZE=512000.

  • MULTITHREADING: Under MULTITHREADING, the conversion of column arrays to stream buffers and stream buffer loading are performed in parallel. On machines with multiple CPUs, by default, multithreading is turned on (true). If you wish, you can turn it off by setting MULTITHREADING=false.
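In practice, several of these options are gathered into a parameter file rather than typed on the command line each time. A hypothetical parfile for a direct load might look like the following (all file names and values here are illustrative assumptions, not recommendations):

```
userid=scott/tiger
control=load_stage.ctl
log=load_stage.log
direct=true
rows=150000
date_cache=5000
skip_index_maintenance=true
streamsize=512000
```

You would then invoke the load with sqlldr PARFILE=load_stage.par.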

Direct Loads and Constraint/Trigger Management

The direct-path loading method inserts data directly into the datafiles by formatting the data blocks. By bypassing the INSERT statement mechanism, the table constraints and triggers aren't systematically applied during a direct load. All triggers are disabled, as are several integrity constraints. SQL*Loader automatically disables all foreign keys and check constraints, but the not null, unique, and primary key constraints are still maintained by SQL*Loader. Upon completion of the SQL*Loader run, the disabled constraints are automatically enabled by SQL*Loader if the REENABLE clause has been specified. Otherwise, the disabled constraints must be manually reenabled. The disabled triggers are automatically enabled after the load is completed.

Some Useful SQL*Loader Data-Loading Techniques

Using SQL*Loader is efficient, but it's not without its share of headaches. This section describes how to perform some special types of operations during data loads.

Using the WHEN Clause During Loads

You can use WHEN clauses during data loads to limit the load to only those rows that match certain conditions. For example, in a datafile, you can pick up only those records that have a field matching certain criteria. Here's an example that shows how to use the WHEN clause in a SQL*Loader control file:

LOAD DATA
INFILE *
INTO TABLE stagetbl
APPEND
 WHEN (activity_type <>'H') and (activity_type <>'T')
FIELDS TERMINATED BY ','

TRAILING NULLCOLS
/* Table columns here . . .*/
BEGINDATA
/* Data here . . .*/

The WHEN condition causes SQL*Loader to discard every record whose field matching the activity_type column in the stagetbl table is either H or T; only the remaining records are loaded.

Loading the Username into a Table

You can use the user pseudo-variable to load the username into a table during the load. The following example illustrates the use of this variable. Note that the target table stagetbl should have a column called loaded_by so SQL*Loader can insert the username into that column.

LOAD DATA
INFILE *
INTO TABLE stagetbl
INSERT
(loaded_by   "USER")
/* Table columns and the data follow . . .*/
Loading Large Data Fields into a Table

If you try to load any field larger than 255 bytes into a table, even if the table column is defined as VARCHAR2(2000) or a CLOB, SQL*Loader won't be able to load the data. You'll get an error informing you that the "Field in datafile exceeds maximum length." To manage the load of the large field, you need to specify the size of the table column in the control file when you're matching table columns to the data fields, as in this example (for a table column called text):

LOAD DATA
INFILE '/u01/app/oracle/oradata/load/testload.txt'
INSERT INTO TABLE test123
FIELDS TERMINATED BY ','
(text CHAR(2000))
Loading a Sequence Number into a Table

Suppose you have a sequence named test_seq, and you want this sequence to be incremented each time you load a data record into your table. Here's how to do it:

LOAD DATA
INFILE '/u01/app/oracle/oradata/load/testload.txt'
INSERT INTO TABLE test123
   (test_seq.nextval,. . .)
Loading Data from a Table into an ASCII File

You may sometimes want to get data out of database tables into flat files; for example, to use this data later to load tables in a different Oracle database. You can write complex scripts to do the job if there are many tables, but if there are only a few tables to unload, you can use the following simple method of extracting data with SQL*Plus commands:

SET TERMOUT OFF
SET PAGESIZE 0
SET ECHO OFF
SET FEED OFF

SET HEAD OFF
SET LINESIZE 100
COLUMN  customer_id FORMAT 999,999
COLUMN first_name FORMAT a15
COLUMN last_name FORMAT a25
SPOOL test.txt
SELECT customer_id,first_name,last_name FROM customer;
SPOOL OFF

You may also use the UTL_FILE package to load data into text files.

Dropping Indexes Before Bulk Data Loads

There are two major reasons why you should seriously consider dropping indexes on a large table before performing a direct-path load using the NOLOGGING option. First, the load may take considerably longer if the indexes must be maintained along with the table data. Second, if you leave the indexes in place, redo records will be generated for the changes made to the index structures during the load.

 Tip 

Even if you choose to load data using the NOLOGGING option, there will be considerable redo generated to mark the changes being made to the indexes. In addition, there will always be some redo to support the data dictionary, even during a NOLOGGING data load operation. The best strategy here is to drop the indexes and rebuild them after the load is complete.

While you're performing a direct load, the instance may fail halfway through, SQL*Loader may run out of space that it needs to update the index, or SQL*Loader may encounter duplicate values for the index keys. This situation is referred to as the indexes left unusable condition, as the indexes will be unusable upon instance recovery. In such cases, it may be better to create the indexes after the load is complete.

Loading into Multiple Tables

You can use the same SQL*Loader run to load into multiple tables. Here's an example that shows how to load data into two tables simultaneously:

LOAD DATA
INFILE *
INSERT
INTO TABLE dept
 WHEN recid = 1
 (recid  FILLER POSITION(1:1) INTEGER EXTERNAL,
 deptno POSITION(3:4) INTEGER EXTERNAL,
 dname  POSITION(8:21) CHAR)
INTO TABLE emp
 WHEN recid <> 1
 (recid  FILLER POSITION(1:1) INTEGER EXTERNAL,
 empno  POSITION(3:6) INTEGER EXTERNAL,
 ename  POSITION(8:17) CHAR,
 deptno POSITION(19:20) INTEGER EXTERNAL)

In the preceding example, data from the same datafile is simultaneously loaded into two tables, dept and emp, based on whether or not the recid field value is 1.
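For the control file above, the input records might look like this (hypothetical data; the layout follows the POSITION clauses, with the recid discriminator in column 1):

```
1 10   ACCOUNTING
2 7839 KING       10
2 7698 BLAKE      30
```

The first record (recid = 1) goes to the dept table, while the remaining records are loaded into emp.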

Trapping Error Codes from SQL*Loader

Here's a simple example of how you can trap the process error codes issued by SQL*Loader:

sqlldr PARFILE=test.par
retcode=$?
if [[ $retcode != 2 ]]
then
  mv ${ImpDir}/${Fil} ${InvalidLoadDir}/.${Dstamp}.${Fil}
  writeLog $func "Load Error" "load error:${retcode} on file ${Fil}"
else
  sqlplus / <<EOF
  /* You can place any SQL statements to process the successfully loaded data */
EOF
fi
Loading XML Data into an Oracle XML Database

SQL*Loader supports the XML data type for columns. If a column is of this type, you can use SQL*Loader to load the XML data into a table. SQL*Loader treats the XML columns as CLOBs. Oracle also lets you load the XML data either from a primary datafile or from an external LOB file. You can use fixed-length fields or delimited fields. The contents of the entire file could also be read into a single LOB field.
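As a sketch, a control file that loads each XML document from a separate external LOB file might look like this (the table name, column names, and file layout are all hypothetical):

```
LOAD DATA
INFILE 'xml_list.dat'
INTO TABLE xml_docs
FIELDS TERMINATED BY ','
(doc_id      INTEGER EXTERNAL,
 xml_name    FILLER CHAR(120),
 xml_doc     LOBFILE(xml_name) TERMINATED BY EOF)
```

Here each record in xml_list.dat supplies a document ID and the name of an XML file, and the LOBFILE clause reads the entire named file into the xml_doc column.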
