SQL*Loader Field List Reference

SQLLDR只有两个保留字CONSTANT与ZONE:

Note:The keywords CONSTANT and ZONE have special meaning to SQL*Loader and are therefore reserved. To avoid potential conflicts, Oracle recommends that you do not use either CONSTANT or ZONE as a name for any tables or columns.

The fields are position, data type, conditions, and delimiters.

(  hiredate  SYSDATE,

     deptno  POSITION(1:2)  INTEGER EXTERNAL(2)  NULLIF deptno=BLANKS,

       job   POSITION(7:14)  CHAR  TERMINATED BY WHITESPACE

              NULLIF job=BLANKS  "UPPER(:job)",

       mgr    POSITION(28:31) INTEGER EXTERNAL

              TERMINATED BY WHITESPACE, NULLIF mgr=BLANKS,

       ename  POSITION(34:41) CHAR

              TERMINATED BY WHITESPACE  "UPPER(:ename)",

       empno  POSITION(45) INTEGER EXTERNAL

              TERMINATED BY WHITESPACE,

       sal    POSITION(51) CHAR  TERMINATED BY WHITESPACE

              "TO_NUMBER(:sal,'$99,999.99')",

      comm   INTEGER EXTERNAL  ENCLOSED BY '(' AND '%'

              ":comm * 100"

)

Specifying the Position of a Data Field

The position may either be stated explicitly or relative to the preceding field.

The start, end, and integer values are always in bytes, even if character-length semantics are used for a data file.


start: starting column of the data field in the logical record. The first byte position in a logical record is 1.

End: The ending position of the data field in the logical record. Either start-end or start:end is acceptable. If you omit end, then the length of the field is derived from the data type in the data file. 

Note that CHAR data specified without start or end, and without a length specification (CHAR(n)), is assumed to have a length of 1. If it is impossible to derive a length from the data type, then an error message is issued.

*: Specifies that the data field follows immediately after the previous field. If you use * for the first data field in the control file, then that field is assumed to be at the beginning of the logical record. When you use * to specify position, the length of the field is derived from the data type.

+integer: You can use an offset, specified as +integer, to offset the current field from the next position after the end of the previous field. A number of bytes, as specified by +integer, are skipped before reading the value for the current field

注:You may omit POSITION entirely. If you do, then the position specification for the data field is the same as if POSITION(*) had been used.

如果在Multiple Table INTO TABLE 要注意:

When you specify POSITION(*) for the first column of the first table, the position is calculated relative to the beginning of the logical record.

之后的导入表的postion(*)为上个表的最后一列之后

When you specify POSITION(*) for the first column of subsequent tables, the position is calculated relative to the last column of the last table loaded.

Example:

ename  POSITION (1:20)  CHAR

empno  POSITION (22-26) INTEGER EXTERNAL

allow  POSITION (*+2)   INTEGER EXTERNAL TERMINATED BY "/"

Column allow is offset from the next position (27) after the end of empno by +2, so it starts in column 29 and continues until a slash is encountered.

position的长度单位永远是byte,而数据类型的长度要根据character-length semantics看是byte还是char为单位; 如果同时指定了postition(start:end)+数据类型(length),长度以数据类型定义为准

Specifying Columns and Fields

列名必须与导入表的列名一致(除了FILLER属性的列,FILLER单列了文档)

Each column name (unless it is marked FILLER) must correspond to a column of the table named in the INTO TABLE clause. A column name must be enclosed in quotation marks if it is a SQL or SQL*Loader reserved word, contains special characters, or is case sensitive.

  1. If the value is to be generated by SQL*Loader, then the specification includes the RECNUMSEQUENCE, or CONSTANT parameter. 
  2. If the column's value is read from the data file, then the data field that contains the column's value is specified. In this case, the column specification includes a column name that identifies a column in the database table, and a field specification that describes a field in a data record. The field specification includes position, data type, null restrictions, and defaults.
  3. It is not necessary to specify all attributes when loading column objects. Any missing attributes will be set to NULL.

Specifying the Data Type of a Data Field

The data type specification of a field tells SQL*Loader how to interpret the data in the field.

Only one data type can be specified for each field; if a data type is not specified, then CHAR is assumed. Before you specify the data type, you must specify the position of the field.

The field specifications are contained in the control file. The control file CHAR specification is not the same as the database CHAR specification. A data field defined as CHAR in the control file simply tells SQL*Loader how to create the row insert. The data could then be inserted into a CHAR, VARCHAR2, NCHAR, NVARCHAR2, or even a NUMBER or DATE column in the database, with the Oracle database handling any necessary conversions.

By default, SQL*Loader removes trailing spaces from CHAR data before passing it to the database. So, both Field 1 and Field 2 are passed to the database as 3-byte fields. However, when the data is inserted into the table, there is a difference.
Column 1 is defined in the database as a fixed-length CHAR column of length 5. So the data (aaa) is left-justified in that column, which remains 5 bytes wide. The extra space on the right is padded with blanks. Column 2, however, is defined as a varying-length field with a maximum length of 5 bytes. The data for that column (bbb) is left-justified as well, but the length remains 3 bytes.

  • SQL*Loader Data Types

SQLLDR数据类型根据是否为平台依赖platform dependent分为两组:可移植和不可移植

在这两组中又细分为值数据类型(value data types)和长度数据类型(length-value data type): 前者表示数据字段只存值,后者表示字段分两部分,分别用于存值及存字段的长度(就是固定长类型和变长类型的区别)

SQL*Loader data types can be grouped into portable and nonportable data types. Within each of these two groups, the data types are subgrouped into value data types and length-value data types.

Portable versus nonportable refers to whether the data type is platform dependent. Platform dependency can exist for several reasons, including differences in the byte ordering schemes of different platforms (big-endian versus little-endian), differences in the number of bits in a platform (16-bit, 32-bit, 64-bit), differences in signed number representation schemes (2's complement versus 1's complement), and so on. In some cases, such as with byte ordering schemes and platform word length, SQL*Loader provides mechanisms to help overcome platform dependencies. These mechanisms are discussed in the descriptions of the appropriate data types.

Both portable and nonportable data types can be values or length-values. Value data types assume that a data field has a single part. Length-value data types require that the data field consist of two subfields where the length subfield specifies how long the value subfield can be.

Note: As of Oracle Database 12c Release 1 (12.1), the maximum size of the Oracle Database VARCHAR2NVARCHAR2, and RAW data types has been increased to 32 KB when the COMPATIBLE initialization parameter is set to 12.0 or later and the MAX_STRING_SIZE initialization parameter is set to EXTENDED. SQL*Loader supports this new maximum size.

  1. Nonportable Data Types

The nonportable value data types:INTEGER(n), SMALLINT, FLOAT, DOUBLE, BYTEINT, ZONED, and (packed) DECIMAL.

The nonportable length-value data types:VARGRAPHIC, VARCHAR, VARRAW, and LONG VARRAW.

1.1 INTEGER(n)

The data is a full-word binary integer, where n is an optionally supplied length of 1, 2, 4, or 8. If no length specification is given, then the length, in bytes, is based on the size of a LONG INT in the C programming language on your particular platform.

INTEGERs are not portable because their byte size, their byte order, and the representation of signed values may be different between systems. However, if the representation of signed values is the same between systems, then SQL*Loader may be able to access INTEGER data with correct results. If INTEGER is specified with a length specification (n), and the appropriate technique is used (if necessary) to indicate the byte order of the data, then SQL*Loader can access the data with correct results between systems.

If INTEGER is specified without a length specification, then SQL*Loader can access the data with correct results only if the size of a LONG INT in the C programming language is the same length in bytes on both systems. In that case, the appropriate technique must still be used (if necessary) to indicate the byte order of the data.

Specifying an explicit length for binary integers is useful in situations where the input data was created on a platform whose word length differs from that on which SQL*Loader is running. For instance, input data containing binary integers might be created on a 64-bit platform and loaded into a database using SQL*Loader on a 32-bit platform. In this case, use INTEGER(8) to instruct SQL*Loader to process the integers as 8-byte quantities, not as 4-byte quantities.

By default, INTEGER is treated as a SIGNED quantity. If you want SQL*Loader to treat it as an unsigned quantity, then specify UNSIGNED. To return to the default behavior, specify SIGNED.

1.2 SMALLINT

The data is a half-word binary integer. The length of the field is the length of a half-word integer on your system. By default, it is treated as a SIGNED quantity. If you want SQL*Loader to treat it as an unsigned quantity, then specify UNSIGNED. To return to the default behavior, specify SIGNED.

SMALLINT can be loaded with correct results only between systems where a SHORT INT has the same length in bytes. If the byte order is different between the systems, then use the appropriate technique to indicate the byte order of the data. See "Byte Ordering".

Note:

This is the SHORT INT data type in the C programming language. One way to determine its length is to make a small control file with no data and look at the resulting log file. This length cannot be overridden in the control file.

1.3 FLOAT

The data is a single-precision, floating-point, binary number. If you specify end in the POSITION clause, then end is ignored. The length of the field is the length of a single-precision, floating-point binary number on your system. (The data type is FLOAT in C.) This length cannot be overridden in the control file.

FLOAT can be loaded with correct results only between systems where the representation of FLOAT is compatible and of the same length. If the byte order is different between the two systems, then use the appropriate technique to indicate the byte order of the data. See "Byte Ordering".

1.4 DOUBLE

The data is a double-precision, floating-point binary number. If you specify end in the POSITION clause, then end is ignored. The length of the field is the length of a double-precision, floating-point binary number on your system. (The data type is DOUBLE or LONG FLOAT in C.) This length cannot be overridden in the control file.

DOUBLE can be loaded with correct results only between systems where the representation of DOUBLE is compatible and of the same length. If the byte order is different between the two systems, then use the appropriate technique to indicate the byte order of the data. See "Byte Ordering".

1.5  BYTEINT

The decimal value of the binary representation of the byte is loaded. For example, the input character x"1C" is loaded as 28. The length of a BYTEINT field is always 1 byte. If POSITION(start:end) is specified, then end is ignored. (The data type is UNSIGNED CHAR in C.)

An example of the syntax for this data type is:

(column1 position(1) BYTEINT,

column2 BYTEINT,

...

)

1.6 ZONED

ZONED data is in zoned decimal format: a string of decimal digits, one per byte, with the sign included in the last byte. (In COBOL, this is a SIGN TRAILING field.) The length of this field equals the precision (number of digits) that you specify.


In this syntax, precision is the number of digits in the number, and scale (if given) is the number of digits to the right of the (implied) decimal point. The following example specifies an 8-digit integer starting at position 32:

sal  POSITION(32)  ZONED(8),

The Oracle database uses the VAX/VMS zoned decimal format when the zoned data is generated on an ASCII-based platform. It is also possible to load zoned decimal data that is generated on an EBCDIC-based platform. In this case, Oracle uses the IBM format as specified in the ESA/390 Principles of Operations, version 8.1 manual. The format that is used depends on the character set encoding of the input data file.

1.7 DECIMAL

DECIMAL data is in packed decimal format: two digits per byte, except for the last byte, which contains a digit and sign. DECIMAL fields allow the specification of an implied decimal point, so fractional values can be represented.


The precision parameter is the number of digits in a value. The length of the field in bytes, as computed from digits, is (N+1)/2 rounded up.

The scale parameter is the scaling factor, or number of digits to the right of the decimal point. The default is zero (indicating an integer). The scaling factor can be greater than the number of digits but cannot be negative.

An example is:

sal DECIMAL (7,2)

This example would load a number equivalent to +12345.67. In the data record, this field would take up 4 bytes. (The byte length of a DECIMAL field is equivalent to (N+1)/2, rounded up, where N is the number of digits in the value, and 1 is added for the sign.)

1.8 VARGRAPHIC

The data is a varying-length, double-byte character set (DBCS). It consists of a length subfield followed by a string of double-byte characters. The Oracle database does not support double-byte character sets; however, SQL*Loader reads them as single bytes and loads them as RAW data. Like RAW data, VARGRAPHIC fields are stored without modification in whichever column you specify.

Note: The size of the length subfield is the size of the SQL*Loader SMALLINT data type on your system (C type SHORT INT). See "SMALLINT" for more information.

VARGRAPHIC data can be loaded with correct results only between systems where a SHORT INT has the same length in bytes. If the byte order is different between the systems, then use the appropriate technique to indicate the byte order of the length subfield. See "Byte Ordering".

The syntax for the VARGRAPHIC data type is:


The length of the current field is given in the first 2 bytes. A maximum length specified for the VARGRAPHIC data type does not include the size of the length subfield. The maximum length specifies the number of graphic (double-byte) characters. It is multiplied by 2 to determine the maximum length of the field in bytes.

The default maximum field length is 2 KB graphic characters, or 4 KB (2 times 2KB). To minimize memory requirements, specify a maximum length for such fields whenever possible.

If a position specification is specified (using pos_spec) before the VARGRAPHIC statement, then it provides the location of the length subfield, not of the first graphic character. If you specify pos_spec(start:end), then the end location determines a maximum length for the field. Both startand end identify single-character (byte) positions in the file. Start is subtracted from (end + 1) to give the length of the field in bytes. If a maximum length is specified, then it overrides any maximum length calculated from the position specification.

If a VARGRAPHIC field is truncated by the end of the logical record before its full length is read, then a warning is issued. Because the length of a VARGRAPHIC field is embedded in every occurrence of the input data for that field, it is assumed to be accurate.

VARGRAPHIC data cannot be delimited.

1.9 VARCHAR

A VARCHAR field is a length-value data type. It consists of a binary length subfield followed by a character string of the specified length. The length is in bytes unless character-length semantics are used for the data file. In that case, the length is in characters.

VARCHAR fields can be loaded with correct results only between systems where a SHORT data field INT has the same length in bytes. If the byte order is different between the systems, or if the VARCHAR field contains data in the UTF16 character set, then use the appropriate technique to indicate the byte order of the length subfield and of the data. The byte order of the data is only an issue for the UTF16 character set. See "Byte Ordering".

Note: The size of the length subfield is the size of the SQL*Loader SMALLINT data type on your system (C type SHORT INT).

The syntax for the VARCHAR data type is:


如果不指定长度则使用最大长度,如果不指定长度且使用了POSITION则长度以POSITION为准,如果同时指定了长度和POSITION且POSITION长度小时,以指定的长度为准

A maximum length specified in the control file does not include the size of the length subfield. If you specify the optional maximum length for a VARCHAR data type, then a buffer of that size, in bytes, is allocated for these fields. However, if character-length semantics are used for the data file, then the buffer size in bytes is the max_length times the size in bytes of the largest possible character in the character set.

The default maximum size is 4 KB. Specifying the smallest maximum length that is needed to load your data can minimize SQL*Loader's memory requirements, especially if you have many VARCHAR fields.

The POSITION clause, if used, gives the location, in bytes, of the length subfield, not of the first text character. If you specify POSITION(start:end), then the end location determines a maximum length for the field. Start is subtracted from (end + 1) to give the length of the field in bytes. If a maximum length is specified, then it overrides any length calculated from POSITION.

If a VARCHAR field is truncated by the end of the logical record before its full length is read, then a warning is issued. Because the length of a VARCHAR field is embedded in every occurrence of the input data for that field, it is assumed to be accurate.

VARCHAR data cannot be delimited.

1.10 VARRAW

VARRAW is made up of a 2-byte binary length subfield followed by a RAW string value subfield.

VARRAW results in a VARRAW with a 2-byte length subfield and a maximum size of 4 KB (that is, the default). VARRAW(65000) results in a VARRAW with a length subfield of 2 bytes and a maximum size of 65000 bytes.

VARRAW fields can be loaded between systems with different byte orders if the appropriate technique is used to indicate the byte order of the length subfield. See "Byte Ordering".

1.11 LONG VARRAW

LONG VARRAW is a VARRAW with a 4-byte length subfield instead of a 2-byte length subfield.

LONG VARRAW results in a VARRAW with 4-byte length subfield and a maximum size of 4 KB (that is, the default). LONG VARRAW(300000) results in a VARRAW with a length subfield of 4 bytes and a maximum size of 300000 bytes.

LONG VARRAW fields can be loaded between systems with different byte orders if the appropriate technique is used to indicate the byte order of the length subfield. See "Byte Ordering".

  1. Portable Data Types

The portable value data types are CHAR, Datetime and Interval, GRAPHIC, GRAPHIC EXTERNAL, Numeric EXTERNAL (INTEGER, FLOAT, DECIMAL, ZONE), and RAW.

The portable length-value data types are VARCHARC and VARRAWC.

2.1 CHAR

The data field contains character data. The length, which is optional, is a maximum length. Note the following regarding length:

If a length is not specified, then it is derived from the POSITION specification.

If a length is specified, then it overrides the length in the POSITION specification.

If no length is given and there is no POSITION specification, then CHAR data is assumed to have a length of 1, unless the field is delimited:

For a delimited CHAR field, if a length is specified, then that length is used as a maximum.

For a delimited CHAR field for which no length is specified, the default is 255 bytes.

For a delimited CHAR field that is greater than 255 bytes, you must specify a maximum length. Otherwise you will receive an error stating that the field in the data file exceeds maximum length.

2.2 Datetime and Interval Data Types

Both datetimes and intervals are made up of fields. The values of these fields determine the value of the data type.

The datetime data types are:DATE, TIME, TIME WITH TIME ZONE, TIMESTAMP, TIMESTAMP WITH TIME ZONE, TIMESTAMP WITH LOCAL TIME ZONE

Values of datetime data types are sometimes called datetimes. In the following descriptions of the datetime data types you will see that, except for DATE, you are allowed to optionally specify a value for fractional_second_precision. The fractional_second_precision specifies the number of digits stored in the fractional part of the SECOND datetime field. When you create a column of this data type, the value can be a number in the range 0 to 9. The default is 6.

The interval data types are:INTERVAL YEAR TO MONTH,INTERVAL DAY TO SECOND

Values of interval data types are sometimes called intervals. The INTERVAL YEAR TO MONTH data type lets you optionally specify a value for year_precision. The year_precision value is the number of digits in the YEAR datetime field. The default value is 2.

The INTERVAL DAY TO SECOND data type lets you optionally specify values for day_precision and fractional_second_precision. The day_precision is the number of digits in the DAY datetime field. Accepted values are 0 to 9. The default is 2. The fractional_second_precision specifies the number of digits stored in the fractional part of the SECOND datetime field. When you create a column of this data type, the value can be a number in the range 0 to 9. The default is 6.

2.2.1 DATE

The DATE field contains character data that should be converted to an Oracle date using the specified date mask. The syntax for the DATE field is:

LOAD DATA

INTO TABLE dates (col_a POSITION (1:15) DATE "DD-Mon-YYYY")

BEGINDATA

1-Jan-2012

1-Apr-2012 28-Feb-2012

Whitespace is ignored and dates are parsed from left to right unless delimiters are present. (A DATE field that consists entirely of whitespace is loaded as a NULL field.)

The length specification is optional, unless a varying-length date mask is specified. The length is in bytes unless character-length semantics are used for the data file. In that case, the length is in characters.

In the preceding example, the date mask, "DD-Mon-YYYY" contains 11 bytes, with byte-length semantics. Therefore, SQL*Loader expects a maximum of 11 bytes in the field, so the specification works properly. But, suppose a specification such as the following is given:

DATE "Month dd, YYYY"

In this case, the date mask contains 14 bytes. If a value with a length longer than 14 bytes is specified, such as "September 30, 2012", then a length must be specified.

If an explicit length is not specified, then it can be derived from the POSITION clause. It is a good idea to specify the length whenever you use a mask, unless you are absolutely sure that the length of the data is less than, or equal to, the length of the mask.

An explicit length specification, if present, overrides the length in the POSITION clause. Either of these overrides the length derived from the mask. The mask may be any valid Oracle date mask. If you omit the mask, then the default Oracle date mask of "dd-mon-yy" is used.

The length must be enclosed in parentheses and the mask in quotation marks.

A field of data type DATE may also be specified with delimiters.

2.2.2 TIME

The TIME data type stores hour, minute, and second values. It is specified as follows:

TIME [(fractional_second_precision)]

2.2.3 TIME WITH TIME ZONE

The TIME WITH TIME ZONE data type is a variant of TIME that includes a time zone displacement in its value. The time zone displacement is the difference (in hours and minutes) between local time and UTC (coordinated universal time, formerly Greenwich mean time). It is specified as follows:

TIME [(fractional_second_precision)] WITH [LOCAL] TIME ZONE

If the LOCAL option is specified, then data stored in the database is normalized to the database time zone, and time zone displacement is not stored as part of the column data. When the data is retrieved, it is returned in the user's local session time zone.

2.2.4 TIMESTAMP

The TIMESTAMP data type is an extension of the DATE data type. It stores the year, month, and day of the DATE data type, plus the hour, minute, and second values of the TIME data type. It is specified as follows:

TIMESTAMP [(fractional_second_precision)]

If you specify a date value without a time component, then the default time is 12:00:00 a.m. (midnight).

2.2.5 TIMESTAMP WITH TIME ZONE

The TIMESTAMP WITH TIME ZONE data type is a variant of TIMESTAMP that includes a time zone displacement in its value. The time zone displacement is the difference (in hours and minutes) between local time and UTC (coordinated universal time, formerly Greenwich mean time). It is specified as follows:

TIMESTAMP [(fractional_second_precision)] WITH TIME ZONE

2.2.6 TIMESTAMP WITH LOCAL TIME ZONE

The TIMESTAMP WITH LOCAL TIME ZONE data type is another variant of TIMESTAMP that includes a time zone offset in its value. Data stored in the database is normalized to the database time zone, and time zone displacement is not stored as part of the column data. When the data is retrieved, it is returned in the user's local session time zone. It is specified as follows:

TIMESTAMP [(fractional_second_precision)] WITH LOCAL TIME ZONE

2.2.7 INTERVAL YEAR TO MONTH

The INTERVAL YEAR TO MONTH data type stores a period of time using the YEAR and MONTH datetime fields. It is specified as follows:

INTERVAL YEAR [(year_precision)] TO MONTH

2.2.8 INTERVAL DAY TO SECOND

The INTERVAL DAY TO SECOND data type stores a period of time using the DAY and SECOND datetime fields. It is specified as follows:

INTERVAL DAY [(day_precision)] TO SECOND [(fractional_second_precision)]

2.3 GRAPHIC

The data is in the form of a double-byte character set (DBCS). The Oracle database does not support double-byte character sets; however, SQL*Loader reads them as single bytes. Like RAW data, GRAPHIC fields are stored without modification in whichever column you specify.


For GRAPHIC and GRAPHIC EXTERNAL, specifying POSITION(start:end) gives the exact location of the field in the logical record.

If you specify a length for the GRAPHIC (EXTERNAL) data type, however, then you give the number of double-byte graphic characters. That value is multiplied by 2 to find the length of the field in bytes. If the number of graphic characters is specified, then any length derived from POSITION is ignored. No delimited data field specification is allowed with GRAPHIC data type specification.

2.4 GRAPHIC EXTERNAL

If the DBCS field is surrounded by shift-in and shift-out characters, then use GRAPHIC EXTERNAL. This is identical to GRAPHIC, except that the first and last characters (the shift-in and shift-out) are not loaded.


GRAPHIC indicates that the data is double-byte characters. EXTERNAL indicates that the first and last characters are ignored. The graphic_char_length value specifies the length in DBCS (see "GRAPHIC").

For example, let [ ] represent shift-in and shift-out characters, and let # represent any double-byte character.

To describe ####, use POSITION(1:4) GRAPHIC or POSITION(1) GRAPHIC(2).

To describe [####], use POSITION(1:6) GRAPHIC EXTERNAL or POSITION(1) GRAPHIC EXTERNAL(2).

2.5 Numeric EXTERNAL

The numeric EXTERNAL data types are the numeric data types (INTEGER, FLOAT, DECIMAL, and ZONED) specified as EXTERNAL, with optional length and delimiter specifications. The length is in bytes unless character-length semantics are used for the data file. In that case, the length is in characters.

These data types are the human-readable, character form of numeric data. The same rules that apply to CHAR data regarding length, position, and delimiters apply to numeric EXTERNAL data. See "CHAR" for a complete description of these rules.

The syntax for the numeric EXTERNAL data types is shown as part of "datatype_spec".

Note: The data is a number in character form, not binary representation. Therefore, these data types are identical to CHAR and are treated identically, except for the use of DEFAULTIF. If you want the default to be null, then use CHAR; if you want it to be zero, then use EXTERNAL. See "Using the WHEN_ NULLIF_ and DEFAULTIF Clauses".

FLOAT EXTERNAL data can be given in either scientific or regular notation. Both "5.33" and "533E-2" are valid representations of the same value.

2.6 RAW

When raw, binary data is loaded "as is" into a RAW database column, it is not converted by the Oracle database. If it is loaded into a CHAR column, then the Oracle database converts it to hexadecimal. It cannot be loaded into a DATE or number column.


The length of this field is the number of bytes specified in the control file. This length is limited only by the length of the target column in the database and by memory resources. The length is always in bytes, even if character-length semantics are used for the data file. RAW data fields cannot be delimited.

2.7 VARCHARC

The data type VARCHARC consists of a character length subfield followed by a character string value-subfield.

The declaration for VARCHARC specifies the length of the length subfield, optionally followed by the maximum size of any string. If byte-length semantics are in use for the data file, then the length and the maximum size are both in bytes. If character-length semantics are in use for the data file, then the length and maximum size are in characters. If a maximum size is not specified, then 4 KB is the default regardless of whether byte-length semantics or character-length semantics are in use.

For example:

  1. VARCHARC results in an error because you must at least specify a value for the length subfield.
  2. VARCHARC(7) results in a VARCHARC whose length subfield is 7 bytes long and whose maximum size is 4 KB (the default) if byte-length semantics are used for the data file. If character-length semantics are used, then it results in a VARCHARC with a length subfield that is 7 characters long and a maximum size of 4 KB (the default). Remember that when a maximum size is not specified, the default of 4 KB is always used, regardless of whether byte-length or character-length semantics are in use.
  3. VARCHARC(3,500) results in a VARCHARC whose length subfield is 3 bytes long and whose maximum size is 500 bytes if byte-length semantics are used for the data file. If character-length semantics are used, then it results in a VARCHARC with a length subfield that is 3 characters long and a maximum size of 500 characters.

2.8 VARRAWC

The data type VARRAWC consists of a RAW string value subfield.

For example:

VARRAWC results in an error.

VARRAWC(7) results in a VARRAWC whose length subfield is 7 bytes long and whose maximum size is 4 KB (that is, the default).

VARRAWC(3,500) results in a VARRAWC whose length subfield is 3 bytes long and whose maximum size is 500 bytes.

2.9 Conflicting Native Data Type Field Lengths

There are several ways to specify a length for a field. If multiple lengths are specified and they conflict, then one of the lengths takes precedence. A warning is issued when a conflict exists. The following rules determine which field length is used:

  1. The size of SMALLINT, FLOAT, and DOUBLE data is fixed, regardless of the number of bytes specified in the POSITION clause.
  2. If the length (or precision) specified for a DECIMAL, INTEGER, ZONED, GRAPHIC, GRAPHIC EXTERNAL, or RAW field conflicts with the size calculated from a POSITION(start:end) specification, then the specified length (or precision) is used.
  3. If the maximum size specified for a character or VARGRAPHIC field conflicts with the size calculated from a POSITION(start:end)specification, then the specified maximum is used.

For example, assume that the native data type INTEGER is 4 bytes long and the following field specification is given:

column1 POSITION(1:6) INTEGER

In this case, a warning is issued, and the proper length (4) is used. The log file shows the actual length used under the heading "Len" in the column table:

Column Name             Position   Len  Term Encl Data Type

----------------------- --------- ----- ---- ---- ---------

COLUMN1                       1:6     4             INTEGER

2.10 Field Lengths for Length-Value Data Types

A control file can specify a maximum length for the following length-value data types: VARCHAR, VARCHARC, VARGRAPHIC, VARRAW, and VARRAWC. The specified maximum length is in bytes if byte-length semantics are used for the field, and in characters if character-length semantics are used for the field. If no length is specified, then the maximum length defaults to 4096 bytes. If the length of the field exceeds the maximum length, then the record is rejected with the following error:

Variable length field exceed maximum length

  1. Data Type Conversions

控制文件中数据类型用于定义如何解释数据文件数据,数据库定义表列的数据类型,数据文件列与表列是通过控制方件的列名联系到一起的。

SQL*Loader extracts data from a field in the input file, guided by the data type specification in the control file. SQL*Loader then sends the field to the server to be stored in the appropriate column (as part of an array of row inserts).

SQL*Loader or the server does any necessary data conversion to store the data in the proper internal format. This includes converting data from the data file character set to the database character set when they differ.

Note: When you use SQL*Loader conventional path to load character data from the data file into a LONG RAW column, the character data is interpreted has a HEX string. SQL converts the HEX string into its binary representation. Be aware that any string longer than 4000 bytes exceeds the byte limit for the SQL HEXTORAW conversion operator. An error will be returned. SQL*Loader will reject that row and continue loading.

The data type of the data in the file does not need to be the same as the data type of the column in the Oracle table. The Oracle database automatically performs conversions, but you need to ensure that the conversion makes sense and does not generate errors. For instance, when a data file field with data type CHAR is loaded into a database column with data type NUMBER, you must ensure that the contents of the character field represent a valid number.

  1. Data Type Conversions for Datetime and Interval Data Types

Table 10-2 shows which conversions between Oracle database data types and SQL*Loader control file datetime and interval data types are supported and which are not.

In the table, the abbreviations for the Oracle Database data types are as follows:

N = NUMBER

C = CHAR or VARCHAR2

D = DATE

T = TIME and TIME WITH TIME ZONE

TS = TIMESTAMP and TIMESTAMP WITH TIME ZONE

YM = INTERVAL YEAR TO MONTH

DS = INTERVAL DAY TO SECOND

Table 10-2 Data Type Conversions for Datetime and Interval Data Types

SQL*Loader Data Type

Oracle Database Data Type (Conversion Support)

N

N (Yes), C (Yes), D (No), T (No), TS (No), YM (No), DS (No)

C

N (Yes), C (Yes), D (Yes), T (Yes), TS (Yes), YM (Yes), DS (Yes)

D

N (No), C (Yes), D (Yes), T (No), TS (Yes), YM (No), DS (No)

T

N (No), C (Yes), D (No), T (Yes), TS (Yes), YM (No), DS (No)

TS

N (No), C (Yes), D (Yes), T (Yes), TS (Yes), YM (No), DS (No)

YM

N (No), C (Yes), D (No), T (No), TS (No), YM (Yes), DS (No)

DS

N (No), C (Yes), D (No), T (No), TS (No), YM (No), DS (Yes)

For the SQL*Loader data types, the definitions for the abbreviations in the table are the same for D, T, TS, YM, and DS. However, as noted in the previous section, SQL*Loader does not contain data type specifications for Oracle internal data types such as NUMBER,CHAR, and VARCHAR2. However, any data that the Oracle database can convert can be loaded into these or other database columns.

For an example of how to read this table, look at the row for the SQL*Loader data type DATE (abbreviated as D). Reading across the row, you can see that data type conversion is supported for the Oracle database data types of CHAR, VARCHAR2, DATE, TIMESTAMP, and TIMESTAMP WITH TIME ZONE data types. However, conversion is not supported for the Oracle database data types NUMBER, TIME, TIME WITH TIME ZONE, INTERVAL YEARTO MONTH, or INTERVAL DAY TO SECOND data types.

  1. Specifying Delimiters

The boundaries of CHAR, datetime, interval, or numeric EXTERNAL fields can also be marked by delimiter characters contained in the input data record.

The delimiter characters are specified using various combinations of the TERMINATED BYENCLOSED BY, and OPTIONALLY ENCLOSED BY clauses (the TERMINATED BY clause, if used, must come first). The delimiter specification comes after the data type specification.

The RAW data type can also be marked by delimiters, but only if it is in an input LOBFILE, and only if the delimiter is TERMINATED BY EOF (end of file).

5.1 Syntax for Termination and Enclosure Specification



TERMINATED:表示字段以什么结束Data is read until the first occurrence of a delimiter.

WHITESPACEDelimiter is any whitespace character including spaces, tabs, blanks, line feeds, form feeds, or carriage returns. (Only used with TERMINATED, not with ENCLOSED.)

OPTIONALLY:Data can be enclosed by the specified character. If SQL*Loader finds a first occurrence of the character, then it reads the data value until it finds the second occurrence. If the data is not enclosed, then the data is read as a terminated field. If you specify an optional enclosure, then you must specify a TERMINATED BY clause (either locally in the field definition or globally in the FIELDS clause).

ENCLOSED:表示字段以哪两字符包围The data will be found between two delimiters.

String:The delimiter is a string.

X'hexstr':The delimiter is a string that has the value specified by X'hexstr' in the character encoding scheme, such as X'1F' (equivalent to 31 decimal). "X" can be either lowercase or uppercase.

AND:Specifies a trailing enclosure delimiter that may be different from the initial enclosure delimiter. If AND is not present, then the initial and trailing delimiters are assumed to be the same.

EOF:Indicates that the entire file has been loaded into the LOB. This is valid only when data is loaded from a LOB file. Fields terminated by EOF cannot be enclosed.

Examples

TERMINATED BY ','                      a data string,

ENCLOSED BY '"'                        "a data string"

TERMINATED BY ',' ENCLOSED BY '"'        "a data string",

ENCLOSED BY '(' AND ')'                  (a data string)

5.2 Delimiter Marks in the Data

有时分隔符同时出现在数据字段中作数据,为了实现两个相邻的分隔符表示字段包含一个分隔符作数据

Sometimes the punctuation mark that is a delimiter must also be included in the data.

To make that possible, two adjacent delimiter characters are interpreted as a single occurrence of the character, and this character is included in the data. For example, this data:

下面数据中右括号用了两个,表示为数据,最后一个右括号生效

(The delimiters are left parentheses, (, and right parentheses, )).)

with this field specification:

ENCLOSED BY "(" AND ")"

puts the following string into the database:

The delimiters are left parentheses, (, and right parentheses, ).

但这样设置也可能出现以下问题

For this reason, problems can arise when adjacent fields use the same delimiters. For example, with the following specification:

field1 TERMINATED BY "/"

field2 ENCLOSED by "/"

the following data will be interpreted properly:

This is the first string/      /This is the second string/

But if field1 and field2 were adjacent, then the results would be incorrect, because

This is the first string//This is the second string/

would be interpreted as a single character string with a "/" in the middle, and that string would belong to field1.

5.3 Maximum Length of Delimited Data

字段长度默认最大为255bytes

The default maximum length of delimited data is 255 bytes. Therefore, delimited fields can require significant amounts of storage for the bind array. A good policy is to specify the smallest possible maximum value if the fields are shorter than 255 bytes. If the fields are longer than 255 bytes, then you must specify a maximum length for the field, either with a length specifier or with the POSITION clause.

For example, if you have a string literal that is longer than 255 bytes, then in addition to using SUBSTR(), use CHAR() to specify the longest string in any record for the field. An example of how this would look is as follows, assuming that 600 bytes is the longest string in any record for field1:

field1 CHAR(600) SUBSTR(:field, 1, 240)

5.4 Loading Trailing Blanks with Delimiters

如果不设置分隔符或PRESERVE BLANKS, 空格是不会导入的

Trailing blanks are not loaded with nondelimited data types unless you specify PRESERVE BLANKS.

If a data field is 9 characters long and contains the value DANIELbbb, where bbb is three blanks, then it is loaded into the Oracle database as "DANIEL" if declared as CHAR(9).

To include the trailing blanks, declare it as CHAR(9) TERMINATED BY ':', and add a colon to the data file so that the field is DANIELbbb:. The field is loaded as "DANIEL ", with the trailing blanks included. The same results are possible if you specify PRESERVE BLANKS without the TERMINATED BY clause.

  1. How Delimited Data Is Processed

6.1 Fields Using Only TERMINATED BY

If TERMINATED BY is specified for a field without ENCLOSED BY, then the data for the field is read from the starting position of the field up to, but not including, the first occurrence of the TERMINATED BY delimiter. If the terminator delimiter is found in the first column position of a field, then the field is null. If the end of the record is found before the TERMINATED BY delimiter, then all data up to the end of the record is considered part of the field.

如果whitespace作字段分隔符,行首空格会忽略,如果非whitespace作分隔符且行首有空格+分隔符,会把第一个field当成null

If TERMINATED BY WHITESPACE is specified, then data is read until the first occurrence of a whitespace character (spaces, tabs, blanks, line feeds, form feeds, or carriage returns). Then the current position is advanced until no more adjacent whitespace characters are found. This allows field values to be delimited by varying amounts of whitespace. However, unlike non-whitespace terminators, if the first column position of a field is known and a whitespace terminator is found there, then the field is not treated as null and can result in record rejection or fields loaded into incorrect columns.

6.2 Fields Using ENCLOSED BY Without TERMINATED BY

The following steps take place when a field uses an ENCLOSED BY clause without also using a TERMINATED BY clause.

  1. Any whitespace at the beginning of the field is skipped.
  2. The first non-whitespace character found must be the start of a string that matches the first ENCLOSED BY delimiter. If it is not, then the row is rejected.
  3. If the first ENCLOSED BY delimiter is found, then the search for the second ENCLOSED BY delimiter begins.
  4. If two of the second ENCLOSED BY delimiters are found adjacent to each other, then they are interpreted as a single occurrence of the delimiter and included as part of the data for the field. The search then continues for another instance of the second ENCLOSED BY delimiter.
  5. If the end of the record is found before the second ENCLOSED BY delimiter is found, then the row is rejected.

6.3 Fields Using ENCLOSED BY With TERMINATED BY

The following steps take place when a field uses an ENCLOSED BY clause and also uses a TERMINATED BY clause.

  1. Any whitespace at the beginning of the field is skipped.
  2. The first non-whitespace character found must be the start of a string that matches the first ENCLOSED BY delimiter. If it is not, then the row is rejected.
  3. If the first ENCLOSED BY delimiter is found, then the search for the second ENCLOSED BY delimiter begins.
  4. If two of the second ENCLOSED BY delimiters are found adjacent to each other, then they are interpreted as a single occurrence of the delimiter and included as part of the data for the field. The search then continues for the second instance of the ENCLOSED BY delimiter.
  5. If the end of the record is found before the second ENCLOSED BY delimiter is found, then the row is rejected.
  6. If the second ENCLOSED BY delimiter is found, then the parser looks for the TERMINATED BY delimiter. If the TERMINATED BY delimiter is anything other than WHITESPACE, then whitespace found between the end of the second ENCLOSED BY delimiter and the TERMINATED BY delimiter is skipped over.

Note: Only WHITESPACE is allowed between the second ENCLOSED BY delimiter and the TERMINATED BY delimiter. Any other characters will cause an error.

  1. The row is not rejected if the end of the record is found before the TERMINATED BY delimiter is found.

6.4 Fields Using OPTIONALLY ENCLOSED BY With TERMINATED BY

The following steps take place when a field uses an OPTIONALLY ENCLOSED BY clause and a TERMINATED BY clause.

  1. Any whitespace at the beginning of the field is skipped.
  2. The parser checks to see if the first non-whitespace character found is the start of a string that matches the first OPTIONALLY ENCLOSED BYdelimiter. If it is not, and the OPTIONALLY ENCLOSED BY delimiters are not present in the data, then the data for the field is read from the current position of the field up to, but not including, the first occurrence of the TERMINATED BY delimiter. If the TERMINATED BY delimiter is found in the first column position, then the field is null. If the end of the record is found before the TERMINATED BY delimiter, then all data up to the end of the record is considered part of the field.
  3. If the first OPTIONALLY ENCLOSED BY delimiter is found, then the search for the second OPTIONALLY ENCLOSED BY delimiter begins.
  4. If two of the second OPTIONALLY ENCLOSED BY delimiters are found adjacent to each other, then they are interpreted as a single occurrence of the delimiter and included as part of the data for the field. The search then continues for the second OPTIONALLY ENCLOSED BY delimiter.
  5. If the end of the record is found before the second OPTIONALLY ENCLOSED BY delimiter is found, then the row is rejected.
  6. If the OPTIONALLY ENCLOSED BY delimiter is present in the data, then the parser looks for the TERMINATED BY delimiter. If the TERMINATED BY delimiter is anything other than WHITESPACE, then whitespace found between the end of the second OPTIONALLY ENCLOSED BYdelimiter and the TERMINATED BY delimiter is skipped over.
  7. The row is not rejected if the end of record is found before the TERMINATED BY delimiter is found.

Be careful when you specify whitespace characters as the TERMINATED BY delimiter and are also using OPTIONALLY ENCLOSED BY. SQL*Loader strips off leading whitespace when looking for an OPTIONALLY ENCLOSED BY delimiter. If the data contains two adjacent TERMINATED BY delimiters in the middle of a record (usually done to set a field in the record to NULL), then the whitespace for the first TERMINATED BY delimiter will be used to terminate a field, but the remaining whitespace will be considered as leading whitespace for the next field rather than the TERMINATED BY delimiter for the next field. To load a NULL value, you must include the ENCLOSED BY delimiters in the data.

  1.  Conflicting Field Lengths for Character Data Types

A control file can specify multiple lengths for the character-data fields CHAR, DATE, and numeric EXTERNAL. If conflicting lengths are specified, then one of the lengths takes precedence. A warning is also issued when a conflict exists. This section explains which length is used.

7.1 Predetermined Size Fields

If you specify a starting position and ending position for one of these fields, then the length of the field is determined by these specifications. If you specify a length as part of the data type and do not give an ending position, the field has the given length. If starting position, ending position, and length are all specified, and the lengths differ, then the length given as part of the data type specification is used for the length of the field, as follows:

POSITION(1:10) CHAR(15)

In this example, the length of the field is 15.

7.2 Delimited Fields

If a delimited field is specified with a length, or if a length can be calculated from the starting and ending positions, then that length is the maximum length of the field. The specified maximum length is in bytes if byte-length semantics are used for the field, and in characters if character-length semantics are used for the field. If no length is specified or can be calculated from the start and end positions, then the maximum length defaults to 255 bytes. The actual length can vary up to that maximum, based on the presence of the delimiter.

If delimiters and also starting and ending positions are specified for the field, then only the position specification has any effect. Any enclosure or termination delimiters are ignored.

If the expected delimiter is absent, then the end of record terminates the field. If TRAILING NULLCOLS is specified, then remaining fields are null. If either the delimiter or the end of record produces a field that is longer than the maximum, then SQL*Loader rejects the record and returns an error.

7.3 Date Field Masks

The length of a date field depends on the mask, if a mask is specified.

The mask provides a format pattern, telling SQL*Loader how to interpret the data in the record. For example, assume the mask is specified as follows:

"Month dd, yyyy"

Then "May 3, 2012" would occupy 11 bytes in the record (with byte-length semantics), while "January 31, 2012" would occupy 16.

If starting and ending positions are specified, however, then the length calculated from the position specification overrides a length derived from the mask. A specified length such as DATE(12) overrides either of those. If the date field is also specified with terminating or enclosing delimiters, then the length specified in the control file is interpreted as a maximum length for the field.

  • Specifying Field Conditions

下面为pos_spec语法:

pos_spec:Specifies the starting and ending position of the comparison field in the logical record.

If you omit an ending position, then the length of the field is determined by the length of the comparison string. If the lengths are different, then the shorter field is padded. Character strings are padded with blanks, hexadecimal strings with zeros.

full_fieldname:full_fieldname is the full name of a field specified using dot notation. If the field col2 is an attribute of a column object col1, then when referring to col2 in one of the directives, you must use the notation col1.col2. The column name and the field name referencing or naming the same entity can be different, because the column name never includes the full name of the entity (no dot notation).

Operator:A comparison operator for either equal or not equal.

char_string:A string of characters enclosed in single or double quotation marks that is compared to the comparison field. If the comparison is true, then the current record is inserted into the table.

X'hex_string':A string of hexadecimal digits, where each pair of digits corresponds to one byte in the field. It is enclosed in single or double quotation marks. If the comparison is true, then the current record is inserted into the table.

BLANKS:Enables you to test a field to see if it consists entirely of blanks. BLANKS is required when you are loading delimited data and you cannot predict the length of the field, or when you use a multibyte character set that has multiple blanks.

  1. Comparing Fields to BLANKS

The BLANKS parameter makes it possible to determine if a field of unknown length is blank.

For example, use the following clause to load a blank field as null:

full_fieldname ... NULLIF column_name=BLANKS

The BLANKS parameter recognizes only blanks, not tabs. It can be used in place of a literal string in any field comparison. The condition is true whenever the column is entirely blank.

The BLANKS parameter also works for fixed-length fields. Using it is the same as specifying an appropriately sized literal string of blanks. For example, the following specifications are equivalent:

fixed_field CHAR(2) NULLIF fixed_field=BLANKS

fixed_field CHAR(2) NULLIF fixed_field="  "

There can be more than one blank in a multibyte character set. It is a good idea to use the BLANKS parameter with these character sets instead of specifying a string of blank characters.

  1. Comparing Fields to Literals

When a data field is compared to a literal string that is shorter than the data field, the string is padded. Character strings are padded with blanks, for example:

NULLIF (1:4)=" "

This example compares the data in position 1:4 with 4 blanks. If position 1:4 contains 4 blanks, then the clause evaluates as true.

Hexadecimal strings are padded with hexadecimal zeros, as in the following clause:

NULLIF (1:4)=X'FF'

This clause compares position 1:4 to hexadecimal 'FF000000'.

  • Using the WHEN, NULLIF, and DEFAULTIF Clauses

The results of a WHEN, NULLIF, or DEFAULTIF clause can be different depending on whether the clause specifies a field name or a position.

  1. If the WHEN, NULLIF, or DEFAULTIF clause specifies a field name, then SQL*Loader compares the clause to the evaluated value of the field. The evaluated value takes trimmed whitespace into consideration.
  2. If the WHENNULLIF, or DEFAULTIF clause specifies a position, then SQL*Loader compares the clause to the original logical record in the data file. No whitespace trimming is done on the logical record in that case.

Different results are more likely if the field has whitespace that is trimmed, or if the WHEN, NULLIF, or DEFAULTIF clause contains blanks or tabs or uses the BLANKS parameter. If you require the same results for a field specified by name and for the same field specified by position, then use the PRESERVE BLANKS option. The PRESERVE BLANKS option instructs SQL*Loader not to trim whitespace when it evaluates the values of the fields.

The results of a WHEN, NULLIF, or DEFAULTIF clause are also affected by the order in which SQL*Loader operates, as described in the following steps. SQL*Loader performs these steps in order, but it does not always perform all of them. Once a field is set, any remaining steps in the process are ignored. For example, if the field is set in Step 5, then SQL*Loader does not move on to Step 6.

  1. SQL*Loader evaluates the value of each field for the input record and trims any whitespace that should be trimmed (according to existing guidelines for trimming blanks and tabs).
  2. For each record, SQL*Loader evaluates any WHEN clauses for the table.
  3. If the record satisfies the WHEN clauses for the table, or no WHEN clauses are specified, then SQL*Loader checks each field for a NULLIF clause.
  4. If a NULLIF clause exists, then SQL*Loader evaluates it.
  5. If the NULLIF clause is satisfied, then SQL*Loader sets the field to NULL.
  6. If the NULLIF clause is not satisfied, or if there is no NULLIF clause, then SQL*Loader checks the length of the field from field evaluation. If the field has a length of 0 from field evaluation (for example, it was a null field, or whitespace trimming resulted in a null field), then SQL*Loader sets the field to NULL. In this case, any DEFAULTIF clause specified for the field is not evaluated.
  7. If any specified NULLIF clause is false or there is no NULLIF clause, and if the field does not have a length of 0 from field evaluation, then SQL*Loader checks the field for a DEFAULTIF clause.
  8. If a DEFAULTIF clause exists, then SQL*Loader evaluates it.
  9. If the DEFAULTIF clause is satisfied, then the field is set to 0 if the field in the data file is a numeric field. It is set to NULL if the field is not a numeric field. The following fields are numeric fields and will be set to 0 if they satisfy the DEFAULTIF clause:

BYTEINT,SMALLINT,INTEGER,FLOAT,DOUBLE,ZONED,(packed) DECIMAL, Numeric EXTERNAL (INTEGERFLOATDECIMAL, and ZONED)

  1. If the DEFAULTIF clause is not satisfied, or if there is no DEFAULTIF clause, then SQL*Loader sets the field with the evaluated value from Step 1.

The order in which SQL*Loader operates could cause results that you do not expect. For example, the DEFAULTIF clause may look like it is setting a numeric field to NULL rather than to 0.

Note: As demonstrated in these steps, the presence of NULLIF and DEFAULTIF clauses results in extra processing that SQL*Loader must perform. This can affect performance. Note that during Step 1, SQL*Loader will set a field to NULL if its evaluated length is zero. To improve performance, consider whether it might be possible for you to change your data to take advantage of this. The detection of NULLs as part of Step 1 occurs much more quickly than the processing of a NULLIF or DEFAULTIF clause.

For example, a CHAR(5) will have zero length if it falls off the end of the logical record or if it contains all blanks and blank trimming is in effect. A delimited field will have zero length if there are no characters between the start of the field and the terminator.

Also, for character fields, NULLIF is usually faster to process than DEFAULTIF (the default for character fields is NULL).

Examples

In the examples, a blank or space is indicated with a period (.).

Assume that col1 and col2 are VARCHAR2(5) columns in the database.

Example1 DEFAULTIF Clause Is Not Evaluated

Control file:

(col1 POSITION (1:5),

 col2 POSITION (6:8) CHAR INTEGER EXTERNAL DEFAULTIF col1 = 'aname')

Data file

aname...

In this example, col1 for the row evaluates to aname. col2 evaluates to NULL with a length of 0 (it is ... but the trailing blanks are trimmed for a positional field).

When SQL*Loader determines the final loaded value for col2, it finds no WHEN clause and no NULLIF clause. It then checks the length of the field, which is 0 from field evaluation. Therefore, SQL*Loader sets the final value for col2 to NULL. The DEFAULTIF clause is not evaluated, and the row is loaded as aname for col1 and NULL for col2.

Example2 DEFAULTIF Clause Is Evaluated

The control file specifies:

.

.

.

PRESERVE BLANKS

.

.

.

(col1 POSITION (1:5),

 col2 POSITION (6:8) INTEGER EXTERNAL DEFAULTIF col1 = 'aname'

The data file contains:

aname...

In this example, col1 for the row again evaluates to aname. col2 evaluates to '...' because trailing blanks are not trimmed when PRESERVE BLANKS is specified.

When SQL*Loader determines the final loaded value for col2, it finds no WHEN clause and no NULLIF clause. It then checks the length of the field from field evaluation, which is 3, not 0.

Then SQL*Loader evaluates the DEFAULTIF clause, which evaluates to true because col1 is aname, which is the same as aname.

Because col2 is a numeric field, SQL*Loader sets the final value for col2 to 0. The row is loaded as aname for col1 and as 0 for col2.

Example3 DEFAULTIF Clause Specifies a Position

The control file specifies:

(col1 POSITION (1:5),

 col2 POSITION (6:8) INTEGER EXTERNAL DEFAULTIF (1:5) = BLANKS)

The data file contains:

.....123

In this example, col1 for the row evaluates to NULL with a length of 0 (it is ..... but the trailing blanks are trimmed). col2 evaluates to 123.

When SQL*Loader sets the final loaded value for col2, it finds no WHEN clause and no NULLIF clause. It then checks the length of the field from field evaluation, which is 3, not 0.

Then SQL*Loader evaluates the DEFAULTIF clause. It compares (1:5) which is ..... to BLANKS, which evaluates to true. Therefore, because col2 is a numeric field (integer EXTERNAL is numeric), SQL*Loader sets the final value for col2 to 0. The row is loaded as NULL for col1 and 0 for col2.

Example4 DEFAULTIF Clause Specifies a Field Name

The control file specifies:

(col1 POSITION (1:5),

 col2 POSITION(6:8) INTEGER EXTERNAL DEFAULTIF col1 = BLANKS)

The data file contains:

.....123

In this example, col1 for the row evaluates to NULL with a length of 0 (it is ..... but the trailing blanks are trimmed). col2 evaluates to 123.

When SQL*Loader determines the final value for col2, it finds no WHEN clause and no NULLIF clause. It then checks the length of the field from field evaluation, which is 3, not 0.

Then SQL*Loader evaluates the DEFAULTIF clause. As part of the evaluation, it checks to see that col1 is NULL from field evaluation. It is NULL, so the DEFAULTIF clause evaluates to false. Therefore, SQL*Loader sets the final value for col2 to 123, its original value from field evaluation. The row is loaded as NULL for col1 and 123 for col2.

  • Loading All-Blank Fields

Fields that are totally blank cause the record to be rejected. To load one of these fields as NULL, use the NULLIF clause with the BLANKS parameter.

If an all-blank CHAR field is surrounded by enclosure delimiters, then the blanks within the enclosures are loaded. Otherwise, the field is loaded as NULL.

A DATE or numeric field that consists entirely of blanks is loaded as a NULL field.

  • Trimming Whitespace

Blanks, tabs, and other nonprinting characters (such as carriage returns回车 and line feeds换行) constitute whitespace.

Leading whitespace occurs at the beginning of a field. Trailing whitespace occurs at the end of a field. Depending on how the field is specified, whitespace may or may not be included when the field is inserted into the database. 

  1. Data Types for Which Whitespace Can Be Trimmed

只有以下字符类的数据类型才有可能trim:

  1. CHAR data type
  2. Datetime and interval data types
  3. Numeric EXTERNAL data types:

INTEGER EXTERNAL

FLOAT EXTERNAL

(packed) DECIMAL EXTERNAL

ZONED (decimal) EXTERNAL

Note:Although VARCHAR and VARCHARC fields also contain character data, these fields are never trimmed. These fields include all whitespace that is part of the field in the data file.

  1. Specifying Field Length for Data Types for Which Whitespace Can Be Trimmed

There are two ways to specify field length ---它们均会被trim

  1. If a field has a constant length that is defined in the control file with a position specification or the data type and length, then it has a predetermined size.

loc POSITION(19:31)

loc CHAR(14)

  1. If a field's length is not known in advance, but depends on indicators in the record, then the field is delimited, using either enclosure or termination delimiters.

loc TERMINATED BY "." OPTIONALLY ENCLOSED BY '|'

Note: If a position specification with start and end values is defined for a field that also has enclosure or termination delimiters defined, then only the position specification has any effect. The enclosure and termination delimiters are ignored.

  1. Relative Positioning of Fields
  1. No Start Position Specified for a Field

When a starting position is not specified for a field, it begins immediately after the end of the previous field. 

field1使用数据类型长度(不会去Leading whitespace),field2未指定起始点,这时不会trim

  1. Previous Field Terminated by a Delimiter

If the previous field (Field 1) is terminated by a delimiter, then the next field begins immediately after the delimiter. 

结果与上面相同

  1. Previous Field Has Both Enclosure and Termination Delimiters

When a field is specified with both enclosure delimiters and a termination delimiter, then the next field starts after the termination delimiter. 

这里注意是去了Leading whitespace

  1. Leading Whitespace

Fields do not include leading whitespace in the following cases:

  1. Previous Field Terminated by Whitespace 

If the previous field is TERMINATED BY WHITESPACE, then all whitespace after the field acts as the delimiter. The next field starts at the next nonwhitespace character. 

Leading whitespace不会被trim,但之后字段会trim

  1. Optional Enclosure Delimiters

Leading whitespace不会被trim,但之后字段会trim

Leading whitespace is also removed from a field when optional enclosure delimiters are specified but not present.

Whenever optional enclosure delimiters are specified, SQL*Loader scans forward, looking for the first enclosure delimiter. If an enclosure delimiter is not found, then SQL*Loader skips over whitespace, eliminating it from the field. The first nonwhitespace character signals the start of the field.

Unlike the case when the previous field is TERMINATED BY WHITESPACE, this specification removes leading whitespace even when a starting position is specified for the current field.

  1. Trimming Trailing Whitespace

Trailing whitespace is always trimmed from character-data fields that have a predetermined size.These are the only fields for which trailing whitespace is always trimmed.

  1. Trimming Enclosed Fields

If a field is enclosed, or terminated and enclosed, then any whitespace outside the enclosure delimiters is not part of the field. Any whitespace between the enclosure delimiters belongs to the field, whether it is leading or trailing whitespace.

  1. How the PRESERVE BLANKS Option Affects Whitespace Trimming

To prevent whitespace trimming in all CHARDATE, and numeric EXTERNAL fields, you specify PRESERVE BLANKS as part of the LOAD statement in the control file.

PRESERVE BLANKS可以在全局设置,也可以在某个字段上设置,如下面为字段级设置

c1 INTEGER EXTERNAL(10) PRESERVE BLANKS DEFAULTIF c1=BLANKS

你也可以在全局指定PRESERVE BLANKS然后在字段级设置为no:

c1 INTEGER EXTERNAL(10) NO PRESERVE BLANKS

  1. How [NO] PRESERVE BLANKS Works with Delimiter Clauses

Delimiter clauses affect PRESERVE BLANKS in the following cases:

  1. Leading whitespace is left intact when optional enclosure delimiters are not present
  2. Trailing whitespace is left intact when fields are specified with a predetermined size

下面数据是用_来代替空格表示的:

__aa__,

Suppose this field is loaded with the following delimiter clause:

TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'

In such a case, if PRESERVE BLANKS is specified, then both the leading whitespace and the trailing whitespace are retained. If PRESERVE BLANKS is not specified, then the leading whitespace is trimmed.

Now suppose the field is loaded with the following clause:

TERMINATED BY WHITESPACE

In such a case, if PRESERVE BLANKS is specified, then it does not retain the space at the beginning of the next field, unless that field is specified with a POSITION clause that includes some of the whitespace. Otherwise, SQL*Loader scans past all whitespace at the end of the previous field until it finds a nonblank, nontab character.

  • Applying SQL Operators to Fields

A wide variety of SQL operators can be applied to field data with the SQL string. This string can contain any combination of SQL expressions that are recognized by the Oracle database as valid for the VALUES clause of an INSERT statement.

In general, any SQL function that returns a single value that is compatible with the target column's data type can be used. SQL strings can be applied to simple scalar column types and also to user-defined complex types such as column objects and collections.

SQL expression的bind variable名要与对应列名相同(注是对应列而不当前列),如果列名未使用双引号,大小写可以不同,下面例子中:first, :FIRST, :\"FIRST\"均可以。外层要用双引号,内层如有双引号要转义

The column name and the name of the column in a SQL string bind variable must, with the interpretation of SQL identifier rules, correspond to the same column. But the two names do not necessarily have to be written exactly the same way, as in the following example:

LOAD DATA

INFILE *

APPEND INTO TABLE XXX

( "Last"   position(1:7)     char   "UPPER(:\"Last\")"

  first   position(8:15)    char   "UPPER(:first || :FIRST || :\"FIRST\")"

)

BEGINDATA

Grant  Phil

Taylor Jason

Note the following when you are using SQL strings:

  1. The execution of SQL strings is not considered to be part of field setting. Rather, when the SQL string is executed it uses the result of any field setting and NULLIF or DEFAULTIF clauses. So, the evaluation order is as follows (steps 1 and 2 are a summary of the steps described in Using the WHEN_ NULLIF_ and DEFAULTIF Clauses):
  1. Field setting is done.
  2. Any NULLIF or DEFAULTIF clauses are applied (and that may change the field setting results for the fields that have such clauses). When NULLIF and DEFAULTIF clauses are used with a SQL expression, they affect the field setting results, not the final column results.
  3. Any SQL expressions are evaluated using the field results obtained after completion of Steps 1 and 2. The results are assigned to the corresponding columns that have the SQL expressions. (If there is no SQL expression present, then the result obtained from Steps 1 and 2 is assigned to the column.)

  1. If your control file specifies character input that has an associated SQL string, then SQL*Loader makes no attempt to modify the data. This is because SQL*Loader assumes that character input data that is modified using a SQL operator will yield results that are correct for database insertion.  转化数据发生在数据库端的insert语句中
  2. The SQL string must appear after any other specifications for a given column.
  3. The SQL string must be enclosed in double quotation marks.
  4. If a SQL string contains a column name that references a column object attribute, then the full object attribute name must be used in the bind variable. Each attribute name in the full name is an individual identifier. Each identifier is subject to the SQL identifier quoting rules, independent of the other identifiers in the full name. For example, suppose you have a column object named CHILD with an attribute name of "HEIGHT_%TILE". (Note that the attribute name is in double quotation marks.) To use the full object attribute name in a bind variable, any one of the following formats would work

:CHILD.\"HEIGHT_%TILE\"

:child.\"HEIGHT_%TILE\"

Enclosing the full name (:\"CHILD.HEIGHT_%TILE\") generates a warning message that the quoting rule on an object attribute name used in a bind variable has changed. The warning is only to suggest that the bind variable be written correctly; it will not cause the load to abort. The quoting rule was changed because enclosing the full name in quotation marks would have caused SQL to interpret the name as one identifier rather than a full column object attribute name consisting of multiple identifiers.

  1. The SQL string is evaluated after any NULLIF or DEFAULTIF clauses, but before a date mask.

注:这里是评估,SQL string还是要写在date mask之后的

  1. If the Oracle database does not recognize the string, then the load terminates in error. If the string is recognized, but causes a database error, then the row that caused the error is rejected.
  2. SQL strings are required when using the EXPRESSION parameter in a field specification. 如果字段指定了EXPRESSION 选项则必须使用SQL表达式
  3. The SQL string cannot reference fields that are loaded using OIDSIDREF, or BFILE. Also, it cannot reference filler fields or other fields which use SQL strings.
  4. In direct path mode, a SQL string cannot reference a VARRAY, nested table, or LOB column. This also includes a VARRAY, nested table, or LOB column that is an attribute of a column object.
  5. The SQL string cannot be used on RECNUMSEQUENCECONSTANT, or SYSDATE fields.
  6. The SQL string cannot be used on LOBs, BFILEs, XML columns, or a file that is an element of a collection.
  7. In direct path mode, the final result that is returned after evaluation of the expression in the SQL string must be a scalar data type. That is, the expression may not return an object or collection data type when performing a direct path load.

  1. Referencing Fields

LOAD DATA

INFILE *

APPEND INTO TABLE YYY

(

 field1  POSITION(1:6) CHAR "LOWER(:field1)"

 field2  CHAR TERMINATED BY ','

         NULLIF ((1) = 'a') DEFAULTIF ((1)= 'b')

         "RTRIM(:field2)",

 field3  CHAR(7) "TRANSLATE(:field3, ':field1', ':1')",

 field4  COLUMN OBJECT

 (

  attr1  CHAR(3) NULLIF field4.attr2='ZZ' "UPPER(:field4.attr3)",

  attr2  CHAR(2),

  attr3  CHAR(3)  ":field4.attr1 + 1"

 ),

 field5  EXPRESSION "MYFUNC(:FIELD4, SYSDATE)"

)

BEGINDATA

ABCDEF1234511  ,:field1500YYabc

abcDEF67890    ,:field2600ZZghl

Note that bind variables enclosed in single quotation marks are treated as text literals, not as bind variables.In the following line, :field1 is not enclosed in single quotation marks and is therefore interpreted as a bind variable:

field1 POSITION(1:6) CHAR "LOWER(:field1)"

In the following line, ':field1' and ':1' are enclosed in single quotation marks and are therefore treated as text literals and passed unchanged to the TRANSLATE function:

field3 CHAR(7) "TRANSLATE(:field3, ':field1', ':1')"

For each input record read, the value of the field referenced by the bind variable will be substituted for the bind variable. For example, the value ABCDEF in the first record is mapped to the first field :field1. This value is then passed as an argument to the LOWER function.

A bind variable in a SQL string need not reference the current field. In the preceding example, the bind variable in the SQL string for the field4.attr1 field references the field4.attr3 field. The field4.attr1 field is still mapped to the values 500 and NULL (because the NULLIF field4.attr2='ZZ' clause is TRUE for the second record) in the input records, but the final values stored in its corresponding columns are ABC and GHL.

The field4.attr3 field is mapped to the values ABC and GHL in the input records, but the final values stored in its corresponding columns are 500 + 1 = 501 and NULL because the SQL expression references field4.attr1. (Adding 1 to a NULL field still results in a NULL field.)

  1. Combinations of SQL Operators

Multiple operators can also be combined, as in the following examples:

field1 POSITION(*+3) INTEGER EXTERNAL "TRUNC(RPAD(:field1,6,'0'), -2)"

field1 POSITION(1:8) INTEGER EXTERNAL "TRANSLATE(RTRIM(:field1),'N/A', '0')"

field1 CHAR(10) "NVL( LTRIM(RTRIM(:field1)), 'unknown' )"

  1. Using SQL Strings with a Date Mask

When a SQL string is used with a date mask, the date mask is evaluated after the SQL string.

field1 DATE "dd-mon-yy" "RTRIM(:field1)"

SQL*Loader internally generates and inserts the following:

TO_DATE(RTRIM(<field1_value>), 'dd-mon-yyyy')

Note that when using the DATE field data type with a SQL string, a date mask is required. This is because SQL*Loader assumes that the first quoted string it finds after the DATE parameter is a date mask. For instance, the following field specification would result in an error (ORA-01821: date format not recognized):

field1 DATE "RTRIM(TO_DATE(:field1, 'dd-mon-yyyy'))"

In this case, a simple workaround is to use the CHAR data type.

  1. Interpreting Formatted Fields

It is possible to use the TO_CHAR operator to store formatted dates and numbers.

field1 ... "TO_CHAR(:field1, '$09999.99')"

This example could store numeric input data in formatted form, where field1 is a character column in the database. This field would be stored with the formatting characters (dollar sign, period, and so on) already in place.

You have even more flexibility, however, if you store such values as numeric quantities or dates. You can then apply arithmetic functions to the values in the database, and still select formatted values for your reports.

  1. Using SQL Strings to Load the ANYDATA Database Type

The ANYDATA database type can contain data of different types.

To load the ANYDATA type using SQL*loader, it must be explicitly constructed by using a function call. The function is called using support for SQL strings as has been described in this section.

For example, suppose you have a table with a column named miscellaneous which is of type ANYDATA. You could load the column by doing the following, which would create an ANYDATA type containing a number.

LOAD DATA

INFILE *

APPEND INTO TABLE  ORDERS

(

miscellaneous CHAR "SYS.ANYDATA.CONVERTNUMBER(:miscellaneous)"

)

BEGINDATA

4

There can also be more complex situations in which you create an ANYDATA type that contains a different type depending upon the values in the record. To do this, you could write your own PL/SQL function that would determine what type should be in the ANYDATA type, based on the value in the record, and then call the appropriate ANYDATA.Convert*() function to create it.

  • Using SQL*Loader to Generate Data for Input

使用sqlldr生成数据用于导入

  1. Loading Data Without Files

It is possible to use SQL*Loader to generate data by specifying only sequences, record numbers, system dates, constants, and SQL string expressions as field specifications.

SQL*Loader inserts as many records as are specified by the LOAD statement. The SKIP parameter is not permitted in this situation.

只有在使用了when时才会读数据文件进行判断

SQL*Loader is optimized for this case. Whenever SQL*Loader detects that only generated specifications are used, it ignores any specified data file—no read I/O is performed.

In addition, no memory is required for a bind array. If there are any WHEN clauses in the control file, then SQL*Loader assumes that data evaluation is necessary, and input records are read.

  1. Setting a Column to a Constant Value

To set a column to a constant value, use CONSTANT followed by a value:

CONSTANT  value

CONSTANT data is interpreted by SQL*Loader as character input. It is converted, as necessary, to the database column type.

You may enclose the value within quotation marks, and you must do so if it contains whitespace or reserved words. Be sure to specify a legal value for the target column. If the value is bad, then every record is rejected.

Numeric values larger than 2^32 - 1 (4,294,967,295) must be enclosed in quotation marks.

Note: Do not use the CONSTANT parameter to set a column to null. To set a column to null, do not specify that column at all. Oracle automatically sets that column to null when loading the record. The combination of CONSTANT and a value is a complete column specification.

  1. Setting a Column to an Expression Value

Use the EXPRESSION parameter after a column name to set that column to the value returned by a SQL operator or specially written PL/SQL function.

column_name EXPRESSION "SQL string"

In both conventional path mode and direct path mode, the EXPRESSION parameter can be used to load the default value into column_name:

column_name EXPRESSION "DEFAULT"

Note that if DEFAULT is used and the mode is direct path, then use of a sequence as a default will not work.

  1. Setting a Column to the Data File Record Number

Use the RECNUM parameter after a column name to set that column to the number of the logical record from which that record was loaded. 用于生成记录号

Records are counted sequentially from the beginning of the first data file, starting with record 1. RECNUM is incremented as each logical record is assembled. Thus it increments for records that are discarded, skipped, rejected, or loaded. If you use the option SKIP=10, then the first record loaded has a RECNUM of 11.

column_name RECNUM

  1. Setting a Column to the Current Date

A column specified with SYSDATE gets the current system date, as defined by the SQL language SYSDATE parameter.

column_name SYSDATE

The database column must be of type CHAR or DATE. If the column is of type CHAR, then the date is loaded in the form 'dd-mon-yy.' After the load, it can be loaded only in that form. If the system date is loaded into a DATE column, then it can be loaded in a variety of forms that include the time and the date.

A new system date/time is used for each array of records inserted in a conventional path load and for each block of records loaded during a direct path load.

  1. Setting a Column to a Unique Sequence Number

The SEQUENCE parameter ensures a unique value for a particular column. SEQUENCE increments for each record that is loaded or rejected.

It does not increment for records that are discarded or skipped.


COUNT:The sequence starts with the number of records already in the table plus the increment.

MAX:The sequence starts with the current maximum value for the column plus the increment.

Integer:Specifies the specific sequence number to begin with.

Incr:The value that the sequence number is to increment after a record is loaded or rejected. This is optional. The default is 1.

如果有序号插入有问题,会把有问题的序号空出来

If a record is rejected (that is, it has a format error or causes an Oracle error), then the generated sequence numbers are not reshuffled to mask this. If four rows are assigned sequence numbers 10, 12, 14, and 16 in a particular column, and the row with 12 is rejected, then the three rows inserted are numbered 10, 14, and 16, not 10, 12, and 14. This allows the sequence of inserts to be preserved despite data errors. When you correct the rejected data and reinsert it, you can manually set the columns to agree with the sequence.

  1. Generating Sequence Numbers for Multiple Tables

Because a unique sequence number is generated for each logical input record, rather than for each table insert, the same sequence number can be used when inserting data into multiple tables.

This is frequently useful.

Sometimes, however, you might want to generate different sequence numbers for each INTO TABLE clause. For example, your data format might define three logical records in every input record. In that case, you can use three INTO TABLE clauses, each of which inserts a different part of the record into the same table. When you use SEQUENCE(MAX), SQL*Loader will use the maximum from each table, which can lead to inconsistencies in sequence numbers.

To generate sequence numbers for these records, you must generate unique numbers for each of the three inserts. Use the number of table-inserts per record as the sequence increment, and start the sequence numbers for each insert with successive numbers.

Example: Generating Different Sequence Numbers for Each Insert

Suppose you want to load the following department names into the dept table. Each input record contains three department names, and you want to generate the department numbers automatically.

Accounting     Personnel      Manufacturing

Shipping       Purchasing     Maintenance

...

You could use the following control file entries to generate unique department numbers:

INTO TABLE dept

(deptno  SEQUENCE(1, 3),

 dname   POSITION(1:14) CHAR)

INTO TABLE dept

(deptno  SEQUENCE(2, 3),

 dname   POSITION(16:29) CHAR)

INTO TABLE dept

(deptno  SEQUENCE(3, 3),

 dname   POSITION(31:44) CHAR)

The first INTO TABLE clause generates department number 1, the second number 2, and the third number 3. They all use 3 as the sequence increment (the number of department names in each record). This control file loads Accounting as department number 1, Personnel as 2, and Manufacturing as 3.

The sequence numbers are then incremented for the next record, so Shipping loads as 4, Purchasing as 5, and so on.

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值