字符集问题的初步探讨(五)

原文链接:

http://www.eygle.com/special/NLS_CHARACTER_SET_05.htm

原文发表于itpub技术丛书《Oracle数据库DBA专题技术精粹》,未经许可,严禁转载本文.

我们知道在导出文件中,记录着导出使用的字符集id,通过查看导出文件头的第2、3个字节,我们可以找到16进制表示的字符集ID,在Windows上,
我们可以使用UltraEdit等工具打开dmp文件,查看其导出字符集::

在Unix上我们可以通过以下命令来查看:


cat expdat.dmp | od -x | head
                      

Oracle提供标准函数,对字符集名称及ID进行转换:

 

 
SQL> select nls_charset_id('ZHS16GBK') from dual;

NLS_CHARSET_ID('ZHS16GBK')
--------------------------
                       852

1 row selected.

SQL> select nls_charset_name(852) from dual;

NLS_CHAR
--------
ZHS16GBK

1 row selected.

十进制转换十六进制:

SQL> select to_char('852','xxxx') from dual;

TO_CH
-----
  354

1 row selected.
					  

对应上面的图中第2、3字节,我们知道该导出文件字符集为ZHS16GBk.

查询数据库中有效的字符集可以使用以下脚本:

 

 
col nls_charset_id for 9999
col nls_charset_name for a30
col hex_id for a20
select 
nls_charset_id(value) nls_charset_id,  
value nls_charset_name,
to_char(nls_charset_id(value),'xxxx') hex_id
from  v$nls_valid_values
where parameter = 'CHARACTERSET'
order by nls_charset_id(value)
/
						

输出样例如下:

 

 
NLS_CHARSET_ID NLS_CHARSET_NAME               HEX_ID
-------------- ------------------------------ -------------
1 US7ASCII 1
2 WE8DEC 2
3 WE8HP 3
4 US8PC437 4
5 WE8EBCDIC37 5
6 WE8EBCDIC500 6
7 WE8EBCDIC1140 7
8 WE8EBCDIC285 8
...................
850 ZHS16CGB231280 352
851 ZHS16MACCGB231280 353
852 ZHS16GBK 354
853 ZHS16DBCS 355
860 ZHT32EUC 35c
861 ZHT32SOPS 35d
862 ZHT16DBT 35e
863 ZHT32TRIS 35f
864 ZHT16DBCS 360
865 ZHT16BIG5 361
866 ZHT16CCDC 362
867 ZHT16MSWIN950 363
868 ZHT16HKSCS 364
870 AL24UTFFSS 366
871 UTF8 367
872 UTFE 368

..................................


在很多时候,当我们进行导入操作的时候,已经离开了源数据库,这时如果目标数据库的字符集和导出文件不一致,很多时候就需要进行特殊处理,
以下介绍几种方法,主要以US7ASCII和ZHS16GBK为例

1. 源数据库字符集为US7ASCII,导出文件字符集为US7ASCII或ZHS16GBK,目标数据库字符集为ZHS16GBK
在Oracle92中,我们发现对于这种情况,不论怎样处理,这个导出文件都无法正确导入到Oracle9i数据库中,这可能是因为Oracle9i的编码方案发生了较大改变。

以下是我们所做的简单测试,其中导出文件命名规则为:


S-Server ,后跟Server字符集
C-client , 后跟导出操作时客户端字符集


导入时客户端字符集设置在命令行完成,限于篇幅,我们省略了部分测试过程。
对于Oracle9iR2,我们的测试结果是US7ASCII字符集,不管怎样转换,都无法正确导入ZHS16GBK字符集的数据库中。

在进行导入操作时,如果字符不能正常转换,Oracle数据库会自动用一个”?”代替,也就是编码63。

 

 
E:/nls2>set NLS_LANG=AMERICAN_AMERICA.US7ASCII

E:/nls2>imp eygle/eygle file=Sus7ascii-Cus7ascii.dmp fromuser=eygle touser=eygle tables=test

Import: Release 9.2.0.4.0 - Production on Mon Nov 3 17:14:39 2003

Copyright (c) 1982, 2002, Oracle Corporation.  All rights reserved.


Connected to: Oracle9i Enterprise Edition Release 9.2.0.4.0 - Production
With the Partitioning, Oracle Label Security, OLAP and Oracle Data Mining options
JServer Release 9.2.0.4.0 - Production

Export file created by EXPORT:V09.02.00 via conventional path
import done in US7ASCII character set and AL16UTF16 NCHAR character set
import server uses ZHS16GBK character set (possible charset conversion)
. . importing table                         "TEST"          2 rows imported
Import terminated successfully without warnings.

E:/nls2>sqlplus eygle/eygle

SQL*Plus: Release 9.2.0.4.0 - Production on Mon Nov 3 17:14:50 2003

Copyright (c) 1982, 2002, Oracle Corporation.  All rights reserved.


Connected to:
Oracle9i Enterprise Edition Release 9.2.0.4.0 - Production
With the Partitioning, Oracle Label Security, OLAP and Oracle Data Mining options
JServer Release 9.2.0.4.0 - Production

SQL> select name,dump(name) from test;

NAME	DUMP(NAME)
-----------------------------
????	Typ=1 Len=4: 63,63,63,63
test	Typ=1 Len=4: 116,101,115,116

2 rows selected.

SQL> exit
Disconnected from Oracle9i Enterprise Edition Release 9.2.0.4.0 - Production
With the Partitioning, Oracle Label Security, OLAP and Oracle Data Mining options
JServer Release 9.2.0.4.0 - Production

E:/nls2>set NLS_LANG=AMERICAN_AMERICA.ZHS16GBK

E:/nls2>imp eygle/eygle file=Sus7ascii-Cus7ascii.dmp fromuser=eygle touser=eygle tables=test ignore=y

Import: Release 9.2.0.4.0 - Production on Mon Nov 3 17:15:28 2003

Copyright (c) 1982, 2002, Oracle Corporation.  All rights reserved.


Connected to: Oracle9i Enterprise Edition Release 9.2.0.4.0 - Production
With the Partitioning, Oracle Label Security, OLAP and Oracle Data Mining options
JServer Release 9.2.0.4.0 - Production

Export file created by EXPORT:V09.02.00 via conventional path
import done in ZHS16GBK character set and AL16UTF16 NCHAR character set
export client uses US7ASCII character set (possible charset conversion)
. . importing table                         "TEST"          2 rows imported
Import terminated successfully without warnings.

E:/nls2>sqlplus eygle/eygle

SQL*Plus: Release 9.2.0.4.0 - Production on Mon Nov 3 17:15:34 2003

Copyright (c) 1982, 2002, Oracle Corporation.  All rights reserved.


Connected to:
Oracle9i Enterprise Edition Release 9.2.0.4.0 - Production
With the Partitioning, Oracle Label Security, OLAP and Oracle Data Mining options
JServer Release 9.2.0.4.0 - Production

SQL> select name,dump(name) from test;

NAME		DUMP(NAME)
--------------------------------------------------------------------------------
????		Typ=1 Len=4: 63,63,63,63
test		Typ=1 Len=4: 116,101,115,116
????		Typ=1 Len=4: 63,63,63,63
test		Typ=1 Len=4: 116,101,115,116


4 rows selected.

SQL> drop table test;

Table dropped.

SQL> exit
Disconnected from Oracle9i Enterprise Edition Release 9.2.0.4.0 - Production
With the Partitioning, Oracle Label Security, OLAP and Oracle Data Mining options
JServer Release 9.2.0.4.0 - Production

E:/nls2>set NLS_LANG=AMERICAN_AMERICA.ZHS16GBK

E:/nls2>imp eygle/eygle file=Sus7ascii-Czhs16gbk.dmp fromuser=eygle touser=eygle tables=test ignore=y

Import: Release 9.2.0.4.0 - Production on Mon Nov 3 17:17:21 2003

Copyright (c) 1982, 2002, Oracle Corporation.  All rights reserved.


Connected to: Oracle9i Enterprise Edition Release 9.2.0.4.0 - Production
With the Partitioning, Oracle Label Security, OLAP and Oracle Data Mining options
JServer Release 9.2.0.4.0 - Production

Export file created by EXPORT:V09.02.00 via conventional path
import done in ZHS16GBK character set and AL16UTF16 NCHAR character set
. . importing table                         "TEST"          2 rows imported
Import terminated successfully without warnings.

E:/nls2>sqlplus eygle/eygle

SQL*Plus: Release 9.2.0.4.0 - Production on Mon Nov 3 17:17:30 2003

Copyright (c) 1982, 2002, Oracle Corporation.  All rights reserved.


Connected to:
Oracle9i Enterprise Edition Release 9.2.0.4.0 - Production
With the Partitioning, Oracle Label Security, OLAP and Oracle Data Mining options
JServer Release 9.2.0.4.0 - Production

SQL> select name,dump(name) from test;

NAME	DUMP(NAME)
----------------------------------------------
????	Typ=1 Len=4: 63,63,63,63
test	Typ=1 Len=4: 116,101,115,116

2 rows selected.

SQL> exit
Disconnected from Oracle9i Enterprise Edition Release 9.2.0.4.0 - Production
With the Partitioning, Oracle Label Security, OLAP and Oracle Data Mining options
JServer Release 9.2.0.4.0 - Production

E:/nls2>set NLS_LANG=AMERICAN_AMERICA.US7ASCII

E:/nls2>imp eygle/eygle file=Sus7ascii-Czhs16gbk.dmp fromuser=eygle touser=eygle tables=test ignore=y

Import: Release 9.2.0.4.0 - Production on Mon Nov 3 17:18:00 2003

Copyright (c) 1982, 2002, Oracle Corporation.  All rights reserved.


Connected to: Oracle9i Enterprise Edition Release 9.2.0.4.0 - Production
With the Partitioning, Oracle Label Security, OLAP and Oracle Data Mining options
JServer Release 9.2.0.4.0 - Production

Export file created by EXPORT:V09.02.00 via conventional path
import done in US7ASCII character set and AL16UTF16 NCHAR character set
import server uses ZHS16GBK character set (possible charset conversion)
export client uses ZHS16GBK character set (possible charset conversion)
. . importing table                         "TEST"          2 rows imported
Import terminated successfully without warnings.

E:/nls2>sqlplus eygle/eygle

SQL*Plus: Release 9.2.0.4.0 - Production on Mon Nov 3 17:18:08 2003

Copyright (c) 1982, 2002, Oracle Corporation.  All rights reserved.


Connected to:
Oracle9i Enterprise Edition Release 9.2.0.4.0 - Production
With the Partitioning, Oracle Label Security, OLAP and Oracle Data Mining options
JServer Release 9.2.0.4.0 - Production

SQL> select name,dump(name) from test;

NAME	DUMP(NAME)
----------------------------------------
????	Typ=1 Len=4: 63,63,63,63
test	Typ=1 Len=4: 116,101,115,116
????	Typ=1 Len=4: 63,63,63,63
test	Typ=1 Len=4: 116,101,115,116

4 rows selected.

SQL>

对于这种情况,我们可以通过使用Oracle8i的导出工具,设置导出字符集为US7ASCII,导出后修改第二、三字符,修改 0001 为
0354,这样就可以将US7ASCII字符集的数据正确导入到ZHS16GBK的数据库中。

修改导出文件:

导入修改后的导出文件:

 

 
E:/nls2>set NLS_LANG=AMERICAN_AMERICA.ZHS16GBK

E:/nls2>imp eygle/eygle file=Sus7ascii-Cus7ascii-exp817.dmp fromuser=eygle touser=eygle tables=test

Import: Release 9.2.0.4.0 - Production on Mon Nov 3 17:37:17 2003

Copyright (c) 1982, 2002, Oracle Corporation.  All rights reserved.


Connected to: Oracle9i Enterprise Edition Release 9.2.0.4.0 - Production
With the Partitioning, Oracle Label Security, OLAP and Oracle Data Mining options
JServer Release 9.2.0.4.0 - Production

Export file created by EXPORT:V08.01.07 via conventional path
import done in ZHS16GBK character set and AL16UTF16 NCHAR character set
export server uses UTF8 NCHAR character set (possible ncharset conversion)
. . importing table                         "TEST"          2 rows imported
Import terminated successfully without warnings.

E:/nls2>sqlplus eygle/eygle

SQL*Plus: Release 9.2.0.4.0 - Production on Mon Nov 3 17:37:23 2003

Copyright (c) 1982, 2002, Oracle Corporation.  All rights reserved.


Connected to:
Oracle9i Enterprise Edition Release 9.2.0.4.0 - Production
With the Partitioning, Oracle Label Security, OLAP and Oracle Data Mining options
JServer Release 9.2.0.4.0 - Production

SQL> select name,dump(name) from test;

NAME     DUMP(NAME)
--------------------------------------------------------------------------------
测试       Typ=1 Len=4: 178,226,202,212
Test        Typ=1 Len=4: 116,101,115,116

2 rows selected.

SQL>

2. 使用create database的方法
如果导出文件使用的字符集是US7ASCII,目标数据库的字符集是ZHS16GBK,我们可以使用create database的方法来修改,具体如下:

 

 
SQL> col parameter for a30
SQL> col value for a30
SQL> select * from v$nls_parameters;

PARAMETER                      VALUE
------------------------------ ------------------------------
NLS_LANGUAGE                   AMERICAN
NLS_TERRITORY                  AMERICA
NLS_CURRENCY                   $
NLS_ISO_CURRENCY               AMERICA
NLS_NUMERIC_CHARACTERS         .,
NLS_CALENDAR                   GREGORIAN
NLS_DATE_FORMAT                DD-MON-RR
NLS_DATE_LANGUAGE              AMERICAN
NLS_CHARACTERSET               ZHS16GBK
NLS_SORT                       BINARY
……………….

19 rows selected.

SQL> create database character set us7ascii;
create database character set us7ascii
*
ERROR at line 1:
ORA-01031: insufficient privileges


SQL> select * from v$nls_parameters;

PARAMETER                      VALUE
------------------------------ ------------------------------
NLS_LANGUAGE                   AMERICAN
NLS_TERRITORY                  AMERICA
NLS_CURRENCY                   $
NLS_ISO_CURRENCY               AMERICA
NLS_NUMERIC_CHARACTERS         .,
NLS_CALENDAR                   GREGORIAN
NLS_DATE_FORMAT                DD-MON-RR
NLS_DATE_LANGUAGE              AMERICAN
NLS_CHARACTERSET               US7ASCII
NLS_SORT                       BINARY
…………..

19 rows selected.

SQL> exit
Disconnected from Oracle9i Enterprise Edition Release 9.2.0.4.0 - Production
With the Partitioning, Oracle Label Security, OLAP and Oracle Data Mining options
JServer Release 9.2.0.4.0 - Production

E:/nls2>set nls_lang=AMERICAN_AMERICA.US7ASCII

E:/nls2>imp eygle/eygle file=Sus7ascii-Cus7ascii.dmp fromuser=eygle touser=eygle

Import: Release 9.2.0.4.0 - Production on Sun Nov 2 14:53:26 2003

Copyright (c) 1982, 2002, Oracle Corporation.  All rights reserved.


Connected to: Oracle9i Enterprise Edition Release 9.2.0.4.0 - Production
With the Partitioning, Oracle Label Security, OLAP and Oracle Data Mining options
JServer Release 9.2.0.4.0 - Production

Export file created by EXPORT:V09.02.00 via conventional path
import done in US7ASCII character set and AL16UTF16 NCHAR character set
import server uses ZHS16GBK character set (possible charset conversion)
. . importing table                         "TEST"          2 rows imported
Import terminated successfully without warnings.

E:/nls2>sqlplus eygle/eygle

SQL*Plus: Release 9.2.0.4.0 - Production on Sun Nov 2 14:53:35 2003

Copyright (c) 1982, 2002, Oracle Corporation.  All rights reserved.


Connected to:
Oracle9i Enterprise Edition Release 9.2.0.4.0 - Production
With the Partitioning, Oracle Label Security, OLAP and Oracle Data Mining options
JServer Release 9.2.0.4.0 - Production

SQL> select * from test;

NAME
----------
测试
test

2 rows selected.

我们看到,当发出create database character set us7ascii命令时,数据库v$nls_parameters中的字符集设置随之更改,该参数影响导入进程,
更改后可以正确导入数据,重起数据库后,该设置恢复。

提示:v$nls_paraemters来源于x$nls_parameters,该动态性能视图影响导入操作;而nls_database_parameters来源于props$数据表,影响数据存储。

3. Oracle提供的字符扫描工具csscan

我们说以上的方法只是应该在不得已的情况下使用,其本质是欺骗数据库,强制导入数据,可能损失元数据。
如果要确保数据的完整性,应该使用csscan扫描数据库,找出所有不兼容的字符,然后通过编写相应的脚本及代码,在转换之后进行更新,确保数据的正确性。
我们简单看一下csscan的使用。

要使用csscan之前,需要以sys用户身份创建相应数据字典对象:

 

 
E:/nls2>sqlplus "/ as sysdba"

SQL*Plus: Release 9.2.0.4.0 - Production on Sun Nov 2 19:42:07 2003

Copyright (c) 1982, 2002, Oracle Corporation.  All rights reserved.


Connected to:
Oracle9i Enterprise Edition Release 9.2.0.4.0 - Production
With the Partitioning, Oracle Label Security, OLAP and Oracle Data Mining options
JServer Release 9.2.0.4.0 - Production

SQL> select instance_name from v$intance;
select instance_name from v$intance
                          *
ERROR at line 1:
ORA-00942: table or view does not exist


SQL> select instance_name from v$instance;

INSTANCE_NAME
----------------
penny

1 row selected.

SQL> @?/rdbms/admin/csminst.sql

User created.


Grant succeeded.

………..

                

这个脚本创建相应用户(csmig)及数据字典对象,扫描信息会记录在相应的数据字典表里。

我们可以在命令行调用这个工具对数据库进行扫描:

 
E:/nls2>csscan FULL=Y FROMCHAR=ZHS16GBK TOCHAR=US7ASCII LOG=US7check.log CAPTURE=Y ARRAY=1000000 PROCESS=2


Character Set Scanner v1.1 : Release 9.2.0.1.0 - Production on Sun Nov 2 20:24:45 2003

Copyright (c) 1982, 2002, Oracle Corporation.  All rights reserved.


Username: eygle/eygle

Connected to:
Oracle9i Enterprise Edition Release 9.2.0.4.0 - Production
With the Partitioning, Oracle Label Security, OLAP and Oracle Data Mining options
JServer Release 9.2.0.4.0 - Production

Enumerating tables to scan...

. process 1 scanning SYS.SOURCE$[AAAABHAABAAAAIRAAA]
. process 2 scanning SYS.ATTRIBUTE$[AAAAEoAABAAAAhZAAA]
. process 2 scanning SYS.PARAMETER$[AAAAEoAABAAAAhZAAA]
. process 2 scanning SYS.METHOD$[AAAAEoAABAAAAhZAAA]
……..
. process 2 scanning SYSTEM.DEF$_AQERROR[AAAA8fAABAAACWJAAA]
. process 1 scanning WMSYS.WM$ENV_VARS[AAABeWAABAAAFMZAAA]
………………….
. process 2 scanning SYS.UGROUP$[AAAAA5AABAAAAGpAAA]
. process 2 scanning SYS.CON$[AAAAAcAABAAAACpAAA]
. process 1 scanning SYS.FILE$[AAAAARAABAAAABxAAA]

Creating Database Scan Summary Report...

Creating Individual Exception Report...

Scanner terminated successfully.
						

然后我们可以检查输出的日志来查看数据库扫描情况:

 
Database Scan Individual Exception Report


[Database Scan Parameters]

Parameter                      Value                                           
------------------------------ ------------------------------------------------
Scan type                      Full database                                   
Scan CHAR data?                YES                                             
Current database character set ZHS16GBK                                        
New database character set     US7ASCII                                        
Scan NCHAR data?               NO                                              
Array fetch buffer size        1000000                                         
Number of processes            2                                               
Capture convertible data?      YES                                             
------------------------------ ------------------------------------------------

[Data Dictionary individual exceptions]


[Application data individual exceptions]

User  : EYGLE
Table : TEST
Column: NAME
Type  : VARCHAR2(10)
Number of Exceptions         : 1         
Max Post Conversion Data Size: 4         

ROWID              Exception Type      Size Cell Data(first 30 bytes)     
------------------ ------------------ ----- ------------------------------
AAABpIAADAAAAAMAAA lossy conversion         测试                          
------------------ ------------------ ----- ------------------------------

 

不能转换的数据将会被记录下来,我们可以根据这些信息在转换之后,对数据进行相应的更新,确保转换无误。

 

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值