ORA-01401 / ORA-12899 when importing data into a UTF8 or other multibyte character set database

Personal summary


Oracle software is, after all, not something we built ourselves, and it is not open source, so having a Metalink account is very useful for learning and troubleshooting. The documents Oracle writes really are detailed.
Off work now; to be continued.

2011-10-08, continuing from above.
On solving the errors raised when importing from a single-byte character set database into a multibyte character set database (my test environment is only 10.2.0.1): the Metalink document's approach A is to use csscan to find which column of which table causes the error; approach B is to modify the source table columns, or to create tables on the target database that can hold the imported data.

ORA-01401 / ORA-12899 While Importing Data In An AL32UTF8 / UTF8 (Unicode) Or Other Multibyte NLS_CHARACTERSET Database. [ID 1297961.1]

Modified: 26-SEP-2011    Type: PROBLEM    Status: PUBLISHED

In this Document
  Symptoms
  Changes
  Cause
  Solution
     A) Use Csscan to check for "Truncation" data on the source side
     B) Solving the ORA-01401 / ORA-12899 error :
  References


Applies to:

Oracle Server - Enterprise Edition - Version: 8.1.7.4 to 11.2.0.2 - Release: 8.1.7 to 11.2
Information in this document applies to any platform.

Symptoms


Import (both the old imp tool and DataPump) gives errors like:
IMP-00019: row rejected due to ORACLE error 1401
IMP-00003: ORACLE error 1401 encountered
ORA-01401: inserted value too large for column
or
IMP-00058: ORACLE error 1461 encountered
ORA-01461: can bind a LONG value only for insert into a LONG column
or from 10g onwards:
IMP-00019: row rejected due to ORACLE error 12899
IMP-00003: ORACLE error 12899 encountered
ORA-12899: value too large for column "SCHEMA_NAME"."TABLE_NAME"."COLUMN_NAME" (actual: , maximum: )
or
ORA-02374: conversion error loading table "SCHEMA_NAME"."TABLE_NAME"
ORA-12899: value too large for column COLUMN_NAME (actual: , maximum: )
or
ORA-02375: conversion error loading table "SCHEMA_NAME.TABLE_NAME" partition "PARTITION_NAME"
ORA-12899: value too large for column COLUMN_NAME (actual: , maximum: )
ORA-02372: data for row:
or
ORA-31693: Table data object "SCHEMA_NAME"."TABLE_NAME" failed to load/unload and is being skipped due to error:
ORA-12899: value too large for column "SCHEMA_NAME"."TABLE_NAME"."COLUMN_NAME" (actual: , maximum: )

Note that with CHAR semantics on the column, the ORA-12899 error reports the "actual" value in characters. Hence one can see an at first sight contradictory error like ORA-12899: value too large for column COMMENT (actual: 2456, maximum: 4000) - here 2456 is expressed in characters, not bytes.

Please note:

* If you want to change a WHOLE database to UTF8 or AL32UTF8 then please follow Note 260192.1 Changing the NLS_CHARACTERSET to AL32UTF8/UTF8 (Unicode).
This note is intended for people who want to export/import only a subset of a database, a certain user or a collection of tables.

* When using UTF8 / AL32UTF8 or other multibyte character sets it might be good to read Note 788156.1 AL32UTF8 / UTF8 (Unicode) Database Character Set Implications.

* Do NOT use Expdp/Impdp when going to (AL32)UTF8 or another multibyte characterset on ALL 10g versions lower than 10.2.0.4 (including 10.1.0.5). 11.1.0.6 is also affected.
It will provoke data corruption unless Patch 5874989 is applied on the Impdp side. Expdp is not affected, hence the data in the dump file is correct. The "old" exp/imp tools are also not affected.
This problem is fixed in the 10.2.0.4 and 11.1.0.7 patch sets, and in 11.2.0.1 and up.
For Windows the fix is included in:
10.1.0.5.0 Patch 20 (10.1.0.5.20P) or later, see Note 276548.1
10.2.0.3.0 Patch 11 (10.2.0.3.11P) or later, see Note 342443.1

Changes


Importing data from a US7ASCII or 8-bit (WE8MSWIN1252, WE8ISO8859P1, etc.) character set database into a UTF8, AL32UTF8 or other multibyte character set database.

The NLS_CHARACTERSET of a database can be found using this select:
SQL> select value from NLS_DATABASE_PARAMETERS where parameter='NLS_CHARACTERSET';

VALUE
----------------------------------------
AL32UTF8

Cause

UTF8/AL32UTF8 is a varying width characterset, which means that the code for 1 character can be 1, 2, 3 or 4 bytes long. This is a big difference with character sets like WE8ISO8859P1 or WE8MSWIN1252, where 1 character is always 1 byte.

US7ASCII characters (A-Z, a-z, 0-9 and ./?,*# etc.) are always 1 byte in AL32UTF8, so for most West European languages the impact is rather limited, as only "special" characters such as accented letters will use more bytes than in an 8-bit characterset.
When converting a Cyrillic or Arabic system to AL32UTF8, all the Cyrillic or Arabic data will take considerably more bytes to store.

More info on how UTF8/AL32UTF8 encoding works can be found in Note 69518.1 Storing and Checking Character Codepoints in a UTF8/AL32UTF8 (Unicode) database.

This also means that the columns need to be big enough to store the additional bytes. By default the column size is defined in BYTES and not in CHARACTERS, so a column defined as VARCHAR2(2000) can by default store 2000 bytes.
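
As a quick illustration (added here, not part of the original note; the table name BYTE_SEM_DEMO is hypothetical), you can see which semantics a column uses in the CHAR_USED column of the dictionary views:

-- With default (BYTE) semantics this column holds 2000 bytes, not 2000 characters.
-- CHAR_USED is 'B' for BYTE semantics and 'C' for CHAR semantics.
CREATE TABLE byte_sem_demo (txt VARCHAR2(2000));

SELECT column_name, data_type, data_length, char_length, char_used
  FROM user_tab_columns
 WHERE table_name = 'BYTE_SEM_DEMO';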

If the import gives errors like ORA-01401: inserted value too large for column
or, from 10g onwards, ORA-12899: value too large for column COMMENT (actual: , maximum: ), this indicates that the column definition is not large enough to handle the growth in bytes of the data during the conversion to the new NLS_CHARACTERSET.

While not often seen, it is also possible to see:
IMP-00058: ORACLE error 1461 encountered
ORA-01461: can bind a LONG value only for insert into a LONG column
The cause is however the same as for an ORA-01401 / ORA-12899 error.
VARCHAR2 can only be 4000 bytes - if during the conversion to AL32UTF8 the data expands beyond 4000 BYTES, Oracle is forced to bind the value internally as a LONG, and this cannot be inserted into a non-LONG column.
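
To illustrate the growth in bytes (an example added for clarity, not taken from the original note), compare the character length and the byte length of a string containing accented characters on an AL32UTF8 database:

-- Run on an AL32UTF8 database (assuming the literal is stored correctly):
-- LENGTH counts characters, LENGTHB counts bytes; each é takes 2 bytes in AL32UTF8.
SELECT LENGTH('résumé') AS char_len, LENGTHB('résumé') AS byte_len FROM dual;
-- expected result: char_len = 6, byte_len = 8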

Solution


The only way to know which data will give this error, and what the length of the data in the target characterset will be, is to run csscan on the SOURCE side before taking the export and then, based on the csscan result, take appropriate action.
All data giving ORA-01401 / ORA-12899 is logged as "Truncation" in the csscan result.

A) Use Csscan to check for "Truncation" data on the source side

Install Csscan on the SOURCE database (the database that is NOT AL32UTF8):
Note 458122.1 Installing and configuring CSSCAN in 8i and 9i
Note 745809.1 Installing and configuring CSSCAN in 10g and 11g

To have an overview of the Csscan output and what it means please see Note 444701.1 Csscan output explained

Then run csscan, depending on what you are exporting: a list of tables or a certain user.

Note that the examples use TOCHAR=AL32UTF8, assuming the target database has an AL32UTF8 NLS_CHARACTERSET; if your target characterset is different, please adapt.

This will scan the user SCOTT (you can specify only one user):
$ csscan \"sys/@ as sysdba\" LOG=/tmp/expcheck USER=scott CAPTURE=Y TOCHAR=AL32UTF8 ARRAY=1024000 PROCESS=3

This will scan the tables SCOTT.DEPT and SCOTT.EMP on most Unix shells (note the ' ' around the TABLE parameter value; otherwise you might see errors like 0403-057 Syntax error at line 21 : `(' is not expected. ):
$ csscan \"sys/@ as sysdba\" LOG=/tmp/expcheck TABLE='(SCOTT.DEPT,SCOTT.EMP)' CAPTURE=Y TOCHAR=AL32UTF8 ARRAY=1024000 PROCESS=2

This will scan the tables SCOTT.DEPT and SCOTT.EMP on Windows (note there are NO ' ' around the TABLE parameter value):
c:\>csscan \"sys/@ as sysdba\" LOG=c:\temp\expcheck TABLE=(SCOTT.DEPT,SCOTT.EMP) CAPTURE=Y TOCHAR=AL32UTF8 ARRAY=1024000 PROCESS=2

If you have a large number of tables, or to avoid problems with command-line escaping differences between operating systems, it is useful to create a parameter file and use that.

Example contents of csscan.par:
LOG=/tmp/expcheck
TABLE=(SCOTT.DEPT,SCOTT.EMP)
FULL=N
CAPTURE=Y
TOCHAR=AL32UTF8
PROCESS=6
ARRAY=1024000

And then run csscan with this parameter file:
$ csscan \"sys/@ as sysdba\" PARFILE=/tmp/csscan.par
or
C:\>csscan \"sys/@ as sysdba\" PARFILE=c:\temp\csscan.par

Csscan will create 3 files:
expcheck.out - a log of the output of csscan
expcheck.txt - the Database Scan Summary Report
expcheck.err - contains the rowids of the Convertible, Truncation and Lossy rows reported in expcheck.txt

You can specify only one user at a time; if you are exporting the majority of a database then use FULL=Y as documented in Note 260192.1:

$ csscan \"sys/@ as sysdba\" FULL=Y TOCHAR=AL32UTF8 LOG=expcheck CAPTURE=Y ARRAY=1000000 PROCESS=2

* Always run Csscan connecting with a 'sysdba' connection/user, do not use the "system" or "csmig" user.

* The PROCESS= parameter influences the load on your system: the higher it is (6 or 8 for example), the faster Csscan will finish; the lower it is, the less impact it has on your system. Adapt if needed.

* Do not specify the TONCHAR or FROMNCHAR csscan parameters; those are for changing the NLS_NCHAR_CHARACTERSET. They are not needed here and should not be specified.

* The csscan SUPPRESS parameter limits the size of the .err file by limiting the amount of information logged per table. Using SUPPRESS=1000 will log at most 1000 rows for each table in the .err file. It will not affect the information in the .txt file, but it WILL affect the data logged in the .err file. This is mainly useful for the first scan of big databases: if you have no idea how much "Convertible" or "Lossy" data there is in a database, this avoids the .err file growing to hundreds of MB, and it also limits the space used by the csscan tables under the CSMIG schema.
Note that to get correct results from the following selects you should NOT use the Csscan SUPPRESS option.

Once Csscan has been run, you need to check the .txt file.

If there is any "Lossy" data - data that CANNOT be converted to the new NLS_CHARACTERSET - you need to check further and see why this data is "Lossy". Discussing "Lossy" is outside the scope of this note; if you have Lossy data then see Note 444701.1 Csscan output explained (the section about "Lossy") and Note 260192.1 Changing the NLS_CHARACTERSET to AL32UTF8/UTF8 (Unicode).

This select will give all the lossy objects found in the last Csscan run:
conn / AS sysdba
SELECT DISTINCT z.owner_name
       || '.'
       || z.table_name
       || '('
       || z.column_name
       || ') - '
       || z.column_type
       || ' ' LossyColumns
  FROM csmig.csmv$errors z
 WHERE z.error_type = 'DATA_LOSS'
 ORDER BY LossyColumns
/


For solving the ORA-01401 / ORA-12899 error you need to know which columns are logged as "Truncation" and what the new size of the data will be after import.

You can find that in the expcheck.err file as "Max Post Conversion Data Size".
For example, check in the expcheck.txt file which table has "Truncation"; let's assume you have a row there that says:

-- snip from expcheck.txt
[Distribution of Convertible, Truncated and Lossy Data by Table]

USER.TABLE                 Convertible       Truncation            Lossy
---------------------  ----------------  ----------------  ----------------
...
SCOTT.TESTUTF8                       69                 6                 0
...

then look in the expcheck.err file for "TESTUTF8" until the "Max Post Conversion Data Size" is bigger than the column size for that table.

-- snip from expcheck.err
User : SCOTT
Table : TESTUTF8
Column: ITEM_NAME
Type : VARCHAR2(80)
Number of Exceptions : 6
Max Post Conversion Data Size: 81
The max size after going to AL32UTF8 will be 81 bytes for this column.

Or you can use this select to have a list of the columns that have "Truncation" and the new size in bytes that is minimally needed:

conn / AS sysdba
SET serveroutput ON
DECLARE
  newmaxsz NUMBER;
BEGIN
  FOR rec IN
    ( SELECT DISTINCT u.owner_name,
                      u.table_name,
                      u.column_name,
                      u.column_type,
                      u.owner_id,
                      u.table_id,
                      u.column_id,
                      u.column_intid
        FROM csmv$errors u
       WHERE u.error_type = 'EXCEED_SIZE'
       ORDER BY u.owner_name,
                u.table_name,
                u.column_name
    )
  LOOP
    SELECT MAX(cnvsize)
      INTO newmaxsz
      FROM csm$errors
     WHERE usr#    = rec.owner_id
       AND obj#    = rec.table_id
       AND col#    = rec.column_id
       AND intcol# = rec.column_intid;

    DBMS_OUTPUT.PUT_LINE(rec.owner_name ||'.'|| rec.table_name ||' ('|| rec.column_name ||') - '|| rec.column_type ||' - '|| newmaxsz || ' Bytes');
  END LOOP;
END;
/

B) Solving the ORA-01401 / ORA-12899 error :

There are 2 main options:

B1) shorten the data before export

A script that gives an idea of how to do this, based on the csscan result of the LAST run (Csscan only keeps ONE result set), can be found in Note 1231922.1 Script to Solve Truncation in Csscan output by Shortening Data Before Conversion.
Note of course that this will provoke data loss, so it is often not a good alternative.
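
For illustration only, a minimal sketch of the shortening idea (this is NOT the script from Note 1231922.1) could look as follows. It assumes a source table SCOTT.TESTUTF8 with a column ITEM_NAME VARCHAR2(80 BYTE), as in the csscan example above, and a target characterset of AL32UTF8:

-- Sketch: trim characters from the end of ITEM_NAME until the value fits in
-- 80 bytes after conversion to AL32UTF8. This discards data from the affected values!
DECLARE
  v VARCHAR2(4000);
BEGIN
  FOR r IN (SELECT rowid AS rid, item_name
              FROM scott.testutf8
             WHERE LENGTHB(CONVERT(item_name, 'AL32UTF8')) > 80)
  LOOP
    v := r.item_name;
    WHILE LENGTHB(CONVERT(v, 'AL32UTF8')) > 80
    LOOP
      v := SUBSTR(v, 1, LENGTH(v) - 1);
    END LOOP;
    UPDATE scott.testutf8 SET item_name = v WHERE rowid = r.rid;
  END LOOP;
  COMMIT;
END;
/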

B2) adapt the columns to fit the expansion of the data

For some columns it might be necessary to change the datatype:

* For CHAR data that becomes more than 2000 bytes after conversion to AL32UTF8 you need to:
change the CHAR column to VARCHAR2 before export on the source side
OR
pre-create the CHAR column as VARCHAR2 on the import (AL32UTF8) side.

* For VARCHAR2 data that becomes more than 4000 bytes after conversion to AL32UTF8 you need to:
change the VARCHAR2 column to CLOB before export on the source side
OR
pre-create the VARCHAR2 column as CLOB on the import (AL32UTF8) side. Pre-creating the column as CLOB will only work when using DataPump; the "old" exp/imp tools will fail with IMP-00058: ORACLE error 1461 encountered, ORA-01461: can bind a LONG value only for insert into a LONG column.

Note: the need for a datatype change is not often seen, but it is possible. If the expansion in BYTES is bigger than the maximum data length of the datatype, which is 2000 bytes for CHAR and 4000 bytes for VARCHAR2, then using CHAR semantics will not help.
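
As a hedged illustration of such a datatype change (the column name ITEM_CODE is hypothetical), a CHAR column whose data expands beyond 2000 bytes could be changed to VARCHAR2 on the source side before the export; changing VARCHAR2 to CLOB usually means adding a new CLOB column, copying the data over and renaming, and is not shown here:

-- Hypothetical example: ITEM_CODE is CHAR(2000) and its data grows past 2000 bytes
-- in AL32UTF8. Note that the trailing blanks of the CHAR values are kept
-- (see the remark on BLANK_TRIMMING further down in this note).
ALTER TABLE scott.testutf8 MODIFY (item_code VARCHAR2(4000));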

For most columns, changing the column "size" will be enough. You again have 2 choices:

a) Adapt the column size in BYTES, for data that becomes NOT more than 2000 bytes (for CHAR) / 4000 bytes (for VARCHAR2) after conversion to AL32UTF8.
You can:
- change the column(s) size in BYTES to the minimum needed size before export on the source side
OR
- pre-create the table with enlarged column(s) size in BYTES on the import (AL32UTF8) side
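
For example, a possible statement for the SCOTT.TESTUTF8.ITEM_NAME column from the csscan output above (VARCHAR2(80), "Max Post Conversion Data Size" 81 bytes):

-- Enlarge the column in BYTES to at least the reported post-conversion size
-- (81 bytes here, rounded up to 100 for some headroom).
ALTER TABLE scott.testutf8 MODIFY (item_name VARCHAR2(100 BYTE));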


b) Change the columns to CHAR semantics, for data that becomes NOT more than 2000 bytes (for CHAR) / 4000 bytes (for VARCHAR2) after conversion to AL32UTF8.

The import utility uses the exact semantics defined in the source table definition from the export dump file, not the source or target database init.ora settings. If the source tables are defined with BYTE semantics then they will be created with BYTE semantics, even when importing into a database that has NLS_LENGTH_SEMANTICS set to CHAR in the init.ora/pfile/spfile.
In other words, setting NLS_LENGTH_SEMANTICS=CHAR in your init.ora will NOT change any table to CHAR semantics during import if those tables (!!!) are using BYTE on the source side.

You can:
- change the column(s) to CHAR semantics before export on the source side
OR
- pre-create the table using CHAR semantics on the import (AL32UTF8) side
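
For example, again using the SCOTT.TESTUTF8.ITEM_NAME column from the csscan output above:

-- Switch the column to CHAR semantics so it holds 80 CHARACTERS regardless of
-- how many bytes each character needs in AL32UTF8 (the physical limit of
-- 4000 bytes for VARCHAR2 still applies).
ALTER TABLE scott.testutf8 MODIFY (item_name VARCHAR2(80 CHAR));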

How to use CHAR semantics is discussed in note 144808.1 Examples and limits of BYTE and CHAR semantics usage (NLS_LENGTH_SEMANTICS)

For the import to succeed, you will need to adapt/address at least the reported "Truncation" columns.

When using CHAR semantics, however, it is in general better to change all columns to CHAR semantics to avoid confusion later on.
As linked in Note 144808.1, you can use a script that automates the change to CHAR semantics: Note 313175.1 Changing columns to CHAR length semantics.

One known issue is documented in Note 779526.1 CSSCAN does not detect data truncation for CHAR datatype - ORA-12899 when importing
Set the parameter BLANK_TRIMMING=TRUE on the importing side to avoid the problem documented in that note.
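
For example (an illustration added here; BLANK_TRIMMING is a static parameter, so when using an spfile it is set as below and the instance is then restarted):

-- On the importing (target) database:
ALTER SYSTEM SET blank_trimming = TRUE SCOPE = SPFILE;
-- restart the instance for the new value to take effect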

If you adapt the column definitions on the SOURCE side, then run csscan again to make sure there is no "Truncation" (or "Lossy") any more.
"Convertible" data is normal and not a problem for export/import.

Note: sometimes there is also a list of tables at the end of expcheck.txt under the header [Truncation Due To Character Semantics]; this is not the same as the truncation this note discusses.
When using (full) export/import, the objects under the [Truncation Due To Character Semantics] header can simply be ignored (to be clear: only the objects under this header).

If you pre-create the tables using the longer columns or using CHAR semantics on the TARGET side, then use the IGNORE=Y parameter for imp or the TABLE_EXISTS_ACTION=TRUNCATE option for Impdp to import the data into the pre-created tables.
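
For example, a pre-created version of the SCOTT.TESTUTF8 table on the AL32UTF8 target could look like this (only two illustrative columns are shown; the real definition must match the source table):

-- Pre-create with CHAR semantics (or with enlarged BYTE sizes), then import
-- with imp IGNORE=Y or impdp TABLE_EXISTS_ACTION=TRUNCATE.
CREATE TABLE scott.testutf8
( item_id   NUMBER,
  item_name VARCHAR2(80 CHAR)
);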

Import the exported data into the new AL32UTF8 database. When using the "old" exp/imp tools, the NLS_LANG setting is simply AMERICAN_AMERICA.<NLS_CHARACTERSET of the source database> or AMERICAN_AMERICA.AL32UTF8; both are correct.
Expdp/Impdp does not use NLS_LANG for data conversion.

Known issues:

* Bug 6167249 - imp-00058: ora-01461: can bind a long value only for insert
Fixed versions: 10.2.0.5, 11.1.0.7, 11.2.0.1 and up
Problem: If a table containing a CLOB is pre-created during an import with ROWS=N, and subsequently one performs a ROWS=Y import of the same table from the same dump file, the last import operation will fail with the following errors:
IMP-00058: ORACLE error 1461 encountered
ORA-01461: can bind a LONG value only for insert into a LONG column
IMP-00028: partial import of previous table rolled back: 19070 rows rolled back
If the table is imported directly with ROWS=Y the errors do not occur.
Workaround: use expdp/impdp

References

NOTE:144808.1 - Examples and limits of BYTE and CHAR semantics usage (NLS_LENGTH_SEMANTICS)
NOTE:444701.1 - Csscan output explained
NOTE:458122.1 - Installing and Configuring Csscan in 8i and 9i (Database Character Set Scanner)
NOTE:727997.1 - ORA-12899 Using Datapump Import With Transporting Tablespaces (TTS) Having Multiple Schemas or Tablespaces
NOTE:745809.1 - Installing and configuring Csscan in 10g and 11g (Database Character Set Scanner)
NOTE:779526.1 - CSSCAN does not detect data truncation for CHAR datatype - ORA-12899 when importing
NOTE:788156.1 - AL32UTF8 / UTF8 (Unicode) Database Character Set Implications


From the ITPUB blog: http://blog.itpub.net/11780477/viewspace-708545/. If reposting, please credit the source; otherwise legal liability may be pursued.

