Maximizing SQL*Loader Performance
· Use Direct Path Loads - The conventional path loader essentially loads the data by using standard INSERT statements. The direct path loader (direct=true) loads directly into the Oracle data files and creates blocks in Oracle database block format. There are certain cases, however, in which direct path loads cannot be used (for example, clustered tables). To prepare the database for direct path loads, the script $ORACLE_HOME/rdbms/admin/catldr.sql must be executed.
· Disable Indexes and Constraints. For conventional data loads only, the disabling of indexes and constraints can greatly enhance the performance.
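As an illustrative sketch (the object names are hypothetical), constraints can be disabled and indexes marked unusable before a conventional load, then restored afterwards:

ALTER TABLE emp DISABLE CONSTRAINT emp_dept_fk;
ALTER INDEX emp_ename_idx UNUSABLE;
-- run the SQL*Loader job here
ALTER INDEX emp_ename_idx REBUILD;
ALTER TABLE emp ENABLE CONSTRAINT emp_dept_fk;

Depending on the Oracle version, SKIP_UNUSABLE_INDEXES may also need to be set to TRUE so the conventional load does not error on the unusable index.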
· Use a Larger Bind Array. For conventional data loads only, larger bind arrays limit the number of calls to the database and increase performance. The size of the bind array is specified using the bindsize parameter. The bind array's size is equivalent to the number of rows it contains (rows=) times the maximum length of each row.
· Use ROWS=n to Commit Less Frequently. For conventional data loads only, the rows parameter specifies the number of rows per commit. Issuing fewer commits will enhance performance.
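For example, both parameters can be supplied on the command line (the connect string and file name here are illustrative):

$ sqlldr scott/tiger control=emp.ctl bindsize=512000 rows=10000

With these settings, SQL*Loader builds a bind array of up to 512,000 bytes and commits after every 10,000 rows, or whenever the bind array fills, whichever comes first.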
· Use Parallel Loads. Available with direct path data loads only, this option allows multiple SQL*Loader jobs to execute concurrently.
$ sqlldr control=first.ctl parallel=true direct=true
$ sqlldr control=second.ctl parallel=true direct=true
· Use Fixed Width Data. Fixed width data format saves Oracle some processing when parsing the data. The savings can be tremendous.
· Disable Archiving During Load. While this may not be feasible in certain environments, disabling database archiving can increase performance considerably.
· Use unrecoverable. The unrecoverable option (unrecoverable load data) disables the writing of the data to the redo logs. This option is available for direct path loads only.
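A direct path control file using this option might begin as follows (file, table, and column names are illustrative):

UNRECOVERABLE
LOAD DATA
INFILE 'emp.dat'
APPEND
INTO TABLE emp
FIELDS TERMINATED BY ','
(empno, ename, sal)

invoked with, for example, $ sqlldr scott/tiger control=emp.ctl direct=true. Because the load is not redo-logged, a backup should be taken after it completes.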
Using the table table_with_one_million_rows, the following benchmark tests were performed with the various SQL*Loader options. The table was truncated after each test.
SQL*Loader Option | Elapsed Time (Seconds) | Time Reduction
direct=false rows=64 | 135 | -
direct=false bindsize=512000 rows=10000 | 92 | 32%
direct=false bindsize=512000 rows=10000, database in NOARCHIVELOG mode | 85 | 37%
direct=true | 47 | 65%
direct=true unrecoverable | 41 | 70%
direct=true unrecoverable, fixed-width data | 41 | 70%
The results above indicate that conventional path loads take the longest. However, the bindsize and rows parameters can improve performance for these loads. Even so, the conventional path tests did not come close to the direct path load with the unrecoverable option specified.
It is also worth noting that the fastest import time achieved for this table (earlier) was 67 seconds, compared to 41 for SQL*Loader direct path – a 39% reduction in execution time. This proves that SQL*Loader can load the same data faster than import. These tests did not compensate for indexes. All database load operations will execute faster when indexes are disabled.
SQL*Loader Control File
The control file is a text file written in a language that SQL*Loader understands. The control file describes the task that the SQL*Loader is to carry out. The control file tells SQL*Loader where to find the data, how to parse and interpret the data, where to insert the data, and more. See Chapter 4, "SQL*Loader Case Studies" for example control files.
Although not precisely defined, a control file can be said to have three sections:
- The first section contains session-wide information, for example:
- global options such as bindsize, rows, records to skip, etc.
- INFILE clauses to specify where the input data is located
- data character set specification
- The second section consists of one or more "INTO TABLE" blocks. Each of these blocks contains information about the table into which the data is to be loaded, such as the table name and the columns of the table.
- The third section is optional and, if present, contains input data.
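A skeletal control file showing all three sections might look like this (table and column names are illustrative):

-- Section 1: session-wide information
OPTIONS (SKIP=0, ERRORS=50)
LOAD DATA
INFILE *
-- Section 2: INTO TABLE block(s)
INTO TABLE dept
FIELDS TERMINATED BY ','
(deptno, dname, loc)
-- Section 3: optional input data
BEGINDATA
10,ACCOUNTING,NEW YORK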
Case 1: Loading Variable-Length Data
Loads stream format records in which the fields are delimited by commas and may be enclosed by quotation marks. The data is found at the end of the control file.
Case 2: Loading Fixed-Format Fields
Loads a datafile with fixed-length fields, stream-format records, all records the same length.
Case 3: Loading a Delimited, Free-Format File
Loads data from stream format records with delimited fields and sequence numbers. The data is found at the end of the control file.
Case 4: Loading Combined Physical Records
Combines multiple physical records into one logical record corresponding to one database row
Case 5: Loading Data into Multiple Tables
Loads data into multiple tables in one run
Case 6: Loading Using the Direct Path Load Method
Loads data using the direct path load method
Case 7: Extracting Data from a Formatted Report
Extracts data from a formatted report
Case 8: Loading Partitioned Tables
Loads partitioned tables.
Case 9: Loading LOBFILEs (CLOBs)
Adds a CLOB column called RESUME to the table emp, uses a FILLER field (RES_FILE), and loads multiple LOBFILEs into the emp table.
Case 10: How to use TRIM, TO_NUMBER, TO_CHAR, User Defined Functions with SQL*Loader
How to use the functions TRIM, TO_CHAR/TO_NUMBER, and user defined functions in connection with SQL*Loader
Case 11: Calling Stored Functions
How to call a Function from SQL*Loader
OPTIONS Clause
Continue Interrupted Load
Identifying Data Files
Loading into Non-Empty Tables
Loading into Multiple Tables
Case 1: Loading Variable-Length Data
- A simple control file identifying one table and three columns to be loaded.
- Including data to be loaded from the control file itself, so there is no separate datafile.
- Loading data in stream format, with both types of delimited fields -- terminated and enclosed.
- The LOAD DATA statement is required at the beginning of the control file.
- INFILE * specifies that the data is found in the control file and not in an external file.
- The INTO TABLE statement is required to identify the table (DEPT) to be loaded into. By default, SQL*Loader requires the table to be empty before it inserts any records.
- FIELDS TERMINATED BY specifies that the data is terminated by commas, but may also be enclosed by quotation marks. Datatypes for all fields default to CHAR.
- Specifies that the names of columns to load are enclosed in parentheses. Since no datatype is specified, the default is CHAR with a maximum length of 255.
BEGINDATA specifies the beginning of the data.
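Putting these pieces together, a Case 1 control file might look like the following sketch (the data rows are illustrative):

LOAD DATA
INFILE *
INTO TABLE dept
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
(deptno, dname, loc)
BEGINDATA
12,RESEARCH,"SARATOGA"
10,"ACCOUNTING",CLEVELAND
11,"ART",SALEM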
Case 2: Loading Fixed-Format Fields
- A separate datafile.
- Data conversions.
6) modifieddate "SYSDATE",
7) customerid constant "0"
)
- The LOAD DATA statement is required at the beginning of the control file.
- The name of the file containing data follows the keyword INFILE.
- The INTO TABLE statement is required to identify the table to be loaded into.
- Lines 4 and 5 identify a column name and the location of the data in the datafile to be loaded into that column. EMPNO, ENAME, JOB, and so on are names of columns in table EMP. The datatypes (INTEGER EXTERNAL, CHAR, DECIMAL EXTERNAL) identify the datatype of data fields in the file, not of corresponding columns in the EMP table.
- Note that the set of column specifications is enclosed in parentheses.
- This clause loads the current SYSDATE into the MODIFIEDDATE column.
- This clause loads a constant value ("0") into the CUSTOMERID column.
- Loading data (enclosed and terminated) in stream format
- Loading dates using the datatype DATE
- Using SEQUENCE numbers to generate unique keys for loaded data
- Using APPEND to indicate that the table need not be empty before inserting new records
- Using Comments in the control file set off by double dashes
- Overriding general specifications with declarations for individual fields
- Comments may appear anywhere in the command lines of the file, but they should not appear in data. They are preceded with a double dash that may appear anywhere on a line.
- INFILE * specifies that the data is found at the end of the control file.
- Specifies that the data can be loaded even if the table already contains rows. That is, the table need not be empty.
- The default terminator for the data fields is a comma, and some fields may be enclosed by double quotation marks (").
- The data to be loaded into column HIREDATE appears in the format DD-Month-YYYY. The length of the date field is a maximum of 20. If a length is not specified, then the length depends on the length of the date mask.
- The SEQUENCE function generates a unique value in the column LOADSEQ. This function finds the current maximum value in column LOADSEQ and adds the increment (1) to it to obtain the value for LOADSEQ for each row inserted.
- BEGINDATA specifies the end of the control information and the beginning of the data.
- Although each physical record equals one logical record, the fields vary in length so that some records are longer than others. Note also that several rows have null values for COMM.
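A Case 3 control file along these lines might be sketched as follows (the column list and data rows are illustrative):

-- Case 3: delimited, free-format data with APPEND and SEQUENCE
LOAD DATA
INFILE *
APPEND
INTO TABLE emp
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
(empno, ename, job, mgr,
 hiredate DATE(20) "DD-Month-YYYY",
 sal, comm,
 loadseq SEQUENCE(MAX,1))
BEGINDATA
7782,"Clark","Manager",7839,09-June-1981,2572.50,
7839,"King","President",,17-November-1981,5500.00,

Note that the trailing comma leaves COMM null for both rows, and LOADSEQ is generated rather than read from the data.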
- Combining multiple physical records to form one logical record with CONTINUEIF
- Inserting negative numbers.
- Indicating with REPLACE that the table should be emptied before the new data is inserted
- Specifying a discard file in the control file using DISCARDFILE
- Specifying a maximum number of discards using DISCARDMAX
- Rejecting records due to duplicate values in a unique index or due to invalid data values
- DISCARDFILE specifies a discard file named ULCASE4.DSC.
- DISCARDMAX specifies a maximum of 999 discards allowed before terminating the run (for all practical purposes, this allows all discards).
- REPLACE specifies that if there is data in the table being loaded, then SQL*Loader should delete that data before loading new data.
- CONTINUEIF THIS specifies that if an asterisk is found in column 1 of the current record, then the next physical record should be appended to it to form the logical record. Note that column 1 in each physical record should then contain either an asterisk or a non-data value.
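A Case 4 control file combining these clauses might be sketched as follows (the discard file name follows the text above; the data file name and column layout are illustrative):

LOAD DATA
INFILE 'ulcase4.dat'
DISCARDFILE 'ulcase4.dsc'
DISCARDMAX 999
REPLACE
CONTINUEIF THIS (1) = '*'
INTO TABLE emp
(empno POSITION(1:4) INTEGER EXTERNAL,
 ename POSITION(6:15) CHAR,
 sal POSITION(51:57) INTEGER EXTERNAL)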
- Loading multiple tables
- Using SQL*Loader to break down repeating groups in a flat file and load the data into normalized tables -- one file record may generate multiple database rows
- Deriving multiple logical records from each physical record
- Using a WHEN clause
- Loading the same field (EMPNO) into multiple tables
- REPLACE specifies that if there is data in the tables to be loaded (EMP and PROJ), SQL*loader should delete the data before loading new rows.
- Multiple INTO clauses load two tables, EMP and PROJ. The same set of records is processed three times, using different combinations of columns each time to load table PROJ.
- WHEN loads only rows with non-blank project numbers. When PROJNO is defined as columns 25...27, rows are inserted into PROJ only if there is a value in those columns.
- When PROJNO is defined as columns 29...31, rows are inserted into PROJ only if there is a value in those columns.
- When PROJNO is defined as columns 33...35, rows are inserted into PROJ only if there is a value in those columns.
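These clauses might be combined as follows (table names and the PROJNO positions come from the text above; the remaining column layout is illustrative):

LOAD DATA
INFILE 'ulcase5.dat'
REPLACE
INTO TABLE emp
 (empno POSITION(1:4) INTEGER EXTERNAL,
  ename POSITION(6:15) CHAR)
INTO TABLE proj
 WHEN projno != '   '
 (empno POSITION(1:4) INTEGER EXTERNAL,
  projno POSITION(25:27) INTEGER EXTERNAL)
INTO TABLE proj
 WHEN projno != '   '
 (empno POSITION(1:4) INTEGER EXTERNAL,
  projno POSITION(29:31) INTEGER EXTERNAL)
INTO TABLE proj
 WHEN projno != '   '
 (empno POSITION(1:4) INTEGER EXTERNAL,
  projno POSITION(33:35) INTEGER EXTERNAL)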
Case 6: Loading Using the Direct Path Load Method
- Use of the direct path load method to load and index data
- How to specify the indexes for which the data is pre-sorted.
- Loading all-blank numeric fields as null
- The NULLIF clause
- Note: Specify the name of the table into which you want to load data; otherwise, you will see LDR-927. Specifying DIRECT=TRUE as a command-line parameter is not an option when loading into a synonym for a table.
- The SORTED INDEXES clause identifies the indexes on which the data is presorted. Here it indicates that the datafile is sorted on the columns in the EMPIX index, which allows SQL*Loader to optimize index creation by eliminating the sort phase for this data when using the direct path load method.
- The NULLIF...BLANKS clause specifies that the column should be loaded as NULL if the field in the datafile consists of all blanks
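A sketch of such a direct path control file, using the EMPIX index named above (the data file name and column positions are illustrative):

LOAD DATA
INFILE 'ulcase6.dat'
REPLACE
INTO TABLE emp
SORTED INDEXES (empix)
(empno POSITION(1:4) INTEGER EXTERNAL NULLIF empno=BLANKS,
 ename POSITION(6:15) CHAR,
 sal POSITION(51:57) INTEGER EXTERNAL NULLIF sal=BLANKS)

$ sqlldr scott/tiger control=ulcase6.ctl direct=true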
- Using SQL*Loader with an INSERT trigger
- Use of the SQL string to manipulate data
- Different initial and trailing delimiters
- Use of SYSDATE
- Use of the TRAILING NULLCOLS clause
- Ambiguous field length warnings
- The decimal point in column 57 (the salary field) identifies a line with data on it. All other lines in the report are discarded.
- The TRAILING NULLCOLS clause causes SQL*Loader to treat any fields that are missing at the end of a record as null. Because the commission field is not present for every record, this clause says to load a null commission instead of rejecting the record when only six fields are found instead of the expected seven.
- Employee's hire date is filled in using the current system date.
- This specification generates a warning message because the specified length does not agree with the length determined by the field's position. The specified length (3) is used.
- Because the report only shows department number, job, and manager when the value changes, these fields may be blank. This control file causes them to be loaded as null, and an RDBMS insert trigger fills in the last valid value.
- The SQL string changes the job name to uppercase letters.
- It is necessary to specify starting position here. If the job field and the manager field were both blank, then the job field's TERMINATED BY BLANKS clause would cause SQL*Loader to scan forward to the employee name field. Without the POSITION clause, the employee name field would be mistakenly interpreted as the manager field.
- Here, the SQL string translates the field from a formatted character string into a number. The numeric value takes less space and can be printed with a variety of formatting options.
- In this case, different initial and trailing delimiters pick the numeric value out of a formatted field. The SQL string then converts the value to its stored form.
- Partitioning of data
- Explicitly defined field positions and datatypes.
- Loading using the fixed record length option
- Specifies that each record in the datafile is of fixed length (129 characters in this example). See Input Data and Datafiles.
- Identifies the column name and location of the data in the datafile to be loaded into each column.
Case 9: Loading LOB Files (CLOBs)
- Adding a CLOB column called RESUME to the table emp.
- Using a FILLER field (RES_FILE).
- Loading multiple LOBFILEs into the emp table.
- This is a filler field. The filler field is assigned values from the datafield to which it is mapped.
- RESUME is loaded as a CLOB. The LOBFILE clause names the field (RES_FILE) that contains the name of the file holding the data for the LOB column.
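A control file fragment along these lines might look as follows (the data file name and layout are illustrative):

LOAD DATA
INFILE 'ulcase9.dat'
INTO TABLE emp
REPLACE
FIELDS TERMINATED BY ','
(empno INTEGER EXTERNAL,
 ename CHAR,
 res_file FILLER CHAR,
 "RESUME" LOBFILE (res_file) TERMINATED BY EOF NULLIF res_file = 'NONE')

Each data record then carries the employee columns plus the name of the file holding that employee's resume.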
-- Create the table
DROP TABLE TEST;
CREATE TABLE TEST
(
ID NUMBER PRIMARY KEY,
FIRST_NAME VARCHAR2(20),
LAST_NAME VARCHAR2(20) NOT NULL,
DEPARTMENT VARCHAR2(20) NOT NULL,
SALARY NUMBER(10,2) NOT NULL,
BONUS NUMBER(10,2),
DESCRIPTION VARCHAR2(50),
TOTAL NUMBER(10,2)
);
-- Create the user defined functions (used by SQL*Loader)
CREATE OR REPLACE FUNCTION GET_BONUS (dept IN VARCHAR2)
RETURN NUMBER AS
retval NUMBER(10,2);
BEGIN
retval := NULL;
if upper (dept) in ('CLERK', 'SALESMAN') then
if to_char (sysdate, 'YYYY') = '2002' then
retval := 9.2;
else
retval := 7.88;
end if;
elsif upper (dept) = 'ANALYST' then
retval := 18.7;
elsif upper (dept) = 'MANAGER' then
retval := -5.92;
end if;
return (retval);
END;
/
CREATE OR REPLACE FUNCTION CALC_SAL (sal IN NUMBER, bon IN NUMBER)
RETURN NUMBER AS
retval NUMBER(10,2);
BEGIN
if bon is null then
retval := sal;
else
retval := round (sal + (sal * bon / 100), 2);
end if;
return (retval);
END;
/
The table TEST will be loaded with this control file:
LOAD DATA
INFILE *
INTO TABLE TEST
TRUNCATE
FIELDS TERMINATED BY ';' OPTIONALLY ENCLOSED BY '"'
TRAILING NULLCOLS
(
ID SEQUENCE,
FIRST_NAME "TRIM (:FIRST_NAME)", -- 1)
LAST_NAME "TRIM ('x' FROM :LAST_NAME)", -- 2)
DEPARTMENT,
SALARY "TO_NUMBER (:SALARY, '99999D99', -- 3)
'NLS_NUMERIC_CHARACTERS='',.''')",
BONUS "GET_BONUS (:DEPARTMENT)", -- 4)
DESCRIPTION "'Salary: '||:SALARY||' -> '|| -- 5)
DECODE (GET_BONUS (:DEPARTMENT), NULL, 'No bonus',
'Bonus: '||TO_CHAR (GET_BONUS (:DEPARTMENT), 'S09D90'))",
TOTAL "CALC_SAL (TO_NUMBER (:SALARY, '99999D99', -- 6)
'NLS_NUMERIC_CHARACTERS='',.'''), GET_BONUS (:DEPARTMENT))"
)
BEGINDATA
" Martin ";"xxxxxxxSmithxxx";CLERK;2459,25
" David ";"xxxxxxxAllenxxx";SALESMAN;4563,9
" Brad ";"xxxxxxxWardxxxx";SALESMAN;4815,81
" Marvin ";"xxxxxxxJonesxxx";MANAGER;9765,33
" Dean ";"xxxxxxxMartinxx";SALESMAN;4214,56
" John ";"xxxxxxxBlakexxx";MANAGER;10333,87
" Clark ";"xxxxxxxGablexxx";MANAGER;11011,11
" Scott ";"xxxxxxxTigerxxx";ANALYST;6865,88
" Ralph ";"xxxxxxxKingxxxx";PRESIDENT;18955,45
" Tina ";"xxxxxxxTurnerxx";SALESMAN;5324,44
" Bryan ";"xxxxxxxAdamsxxx";CLERK;1899,48
" Jesse ";"xxxxxxxJamesxxx";CLERK;2288,99
" John ";"xxxxxxxFordxxxx";ANALYST;7564,83
" John ";"xxxxxxxMillerxx";CLERK;1865,93
1) TRIM deletes the leading/trailing blanks in the column FIRST_NAME (i.e. " Martin " becomes "Martin")
2) TRIM deletes the leading/trailing 'x' characters in the column LAST_NAME (i.e. "xxxxxxxSmithxxx" becomes "Smith")
3) TO_NUMBER specifies the format of the numbers in the column SALARY: 99999D99, that is, at most five integer digits and two decimal digits, with ',' as the decimal separator. If the format is not specified, the records are rejected (ORA-01722: invalid number, when NLS_NUMERIC_CHARACTERS = '.,').
4) The column BONUS is calculated with the user-defined function GET_BONUS. The function expects one input parameter, DEPARTMENT (VARCHAR2), and returns a NUMBER(10,2) bonus percentage.
5) The column DESCRIPTION is a composition of the information from the previous columns. The DECODE function checks whether a bonus is available for the department; if not, the message 'No bonus' is stored. The TO_CHAR function formats the BONUS using the mask 'S09D90': sign, two integer digits with leading zeros, the decimal separator, and two decimal digits with trailing zeros.
6) The column TOTAL is calculated with the user defined function CALC_SAL (the BONUS, if available, is applied to the SALARY)
The result after the loading procedure looks like this in the table TEST:
SQL> select * from test;
ID FIRST_NAME LAST_NAME DEPARTMENT SALARY
---------- -------------------- -------------------- -------------------- ----------
1 Martin Smith CLERK 2459.25
2 David Allen SALESMAN 4563.9
3 Brad Ward SALESMAN 4815.81
4 Marvin Jones MANAGER 9765.33
5 Dean Martin SALESMAN 4214.56
6 John Blake MANAGER 10333.87
7 Clark Gable MANAGER 11011.11
8 Scott Tiger ANALYST 6865.88
9 Ralph King PRESIDENT 18955.45
10 Tina Turner SALESMAN 5324.44
11 Bryan Adams CLERK 1899.48
12 Jesse James CLERK 2288.99
13 John Ford ANALYST 7564.83
14 John Miller CLERK 1865.93
The syntax for such calls is quite simple:
LOAD DATA
INFILE *
APPEND
INTO TABLE dept (
deptno POSITION(01:02) INTEGER EXTERNAL,
dname POSITION(03:16) CHAR "LDR_PCK.NAME(:dname, :deptno)",
loc POSITION(17:29) CHAR
)
BEGINDATA
21Dep Loc
22Dep Loc
Here LDR_PCK.NAME is the name of the package and function, and :dname and :deptno are the parameters.
When calling user-defined functions, remember that only the conventional path may be used. An attempt to use the direct path will cause an error:
SQL*Loader-00417 SQL String (on column column_name) not allowed in direct path.
The OPTIONS Clause
The OPTIONS clause is useful when you usually invoke a control file with the same set of options, or when the command line and all its arguments becomes very long. This clause allows you to specify runtime arguments in the control file rather than on the command line.
SKIP = n      -- Number of logical records to skip (DEFAULT 0)
LOAD = n      -- Number of logical records to load (DEFAULT all)
ERRORS = n    -- Number of errors to allow (DEFAULT 50)
ROWS = n      -- Number of rows in conventional path bind array (DEFAULT 64)
BINDSIZE = n  -- Size of conventional path bind array in bytes
SILENT = {HEADER | FEEDBACK | ERROR | DISCARDS | ALL}
              -- Suppress messages during run
For example:
OPTIONS (BINDSIZE=10000, SILENT=(ERRORS, FEEDBACK) )
Values specified on the command line override values specified in the control file. With this precedence, the OPTIONS keyword in the control file establishes default values that are easily changed from the command line.
Continuing Interrupted Loads
If SQL*Loader runs out of space for data rows or index entries, the load is discontinued. (For example, the table might reach its maximum number of extents.) Discontinued loads can be continued after more space is made available.
When a load is discontinued, any data already loaded remains in the tables, and the tables are left in a valid state. SQL*Loader's log file tells you the state of the tables and indexes and the number of logical records already read from the input data file. Use this information to resume the load where it left off.
For example:
SQLLOAD / CONTROL=FAST1.CTL SKIP=345
The CONTINUE_LOAD DATA statement is used to continue a discontinued direct path load involving multiple tables with a varying number of records to skip. For more information on this command, see chapter 6 of the "ORACLE7 Server Utilities User's Guide".
Identifying Data Files
To specify the file containing the data to be loaded, use the INFILE or INDDN keyword, followed by the filename. A filename specified on the command line overrides the first INFILE or INDDN statement in the control file. If no filename is specified, the filename defaults to the control filename with an extension or filetype of DAT.
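An INFILE clause naming the datafile, together with optional bad and discard files, might look like this (the file names are illustrative):

INFILE 'employees.dat'
BADFILE 'employees.bad'
DISCARDFILE 'employees.dsc'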
Loading into Non-Empty Database Tables
SQL*Loader does not update existing records, even if they have null columns. If the tables you are loading already contain data, you have three choices for how SQL*Loader should proceed:
INSERT - This is the default option. It requires the table to be empty before loading; SQL*Loader terminates with an error if the table contains rows.
APPEND - If data already exists in the table, SQL*Loader appends the new rows to it; if data doesn't already exist, the new rows are simply loaded.
REPLACE - All rows in the table are deleted and the new data is loaded. This option requires DELETE privileges on the table.
You can create one logical record from multiple physical records using CONCATENATE and CONTINUEIF. See chapter 6 of the "ORACLE7 Server Utilities User's Guide".
Loading Logical Records into Tables
The INTO TABLE clause allows you to tell which table you want to load data into. To load multiple tables, you would include one INTO TABLE clause for each table you wish to load.
The INTO TABLE clause may continue with some options for loading that table. For example, you may specify different options (INSERT, APPEND, REPLACE) for each table in order to tell SQL*Loader what to do if data already exists in the table.
The WHEN clause appears after the table name and is followed by one or more field conditions. For example, the following clause indicates that any record with the value 'q' in the fifth column position should be loaded:
WHEN (5) = 'q'
A WHEN clause can contain several comparisons as long as each is preceded by AND. Parentheses are optional but should be used for clarity with multiple comparisons joined by AND. For example:
WHEN (DEPTNO = '10') AND (JOB = 'SALES')
To evaluate the WHEN clause, SQL*Loader first determines the values of all the fields in the record. Then the WHEN clause is evaluated. A row is inserted into the table only if the WHEN clause is true.
When the control file specifies more fields for a record than are present in the record, SQL*Loader must determine whether the remaining (specified) columns should be treated as null or whether an error should be generated. The TRAILING NULLCOLS clause tells SQL*Loader to treat any relatively positioned columns that are not present in the record as null columns. For example, if the following data
10 Accounting
is read with the following control file
INTO TABLE dept
TRAILING NULLCOLS
( deptno CHAR TERMINATED BY " ",
  dname CHAR TERMINATED BY WHITESPACE,
  loc CHAR TERMINATED BY WHITESPACE
)
and the record ends after DNAME, then the remaining LOC field is set to null. Without the TRAILING NULLCOLS clause, an error would be generated, due to missing data.