Oracle character set

Character Encoding Schemes

Single-Byte 7-Bit Encoding Schemes

Single-byte 7-bit encoding schemes can define up to 128 characters, and normally support just one language. The only alphabetic characters defined in 7-bit ASCII are the 26 Latin letters (in upper and lower case); the remaining codes are digits, punctuation, and control characters. Various other 7-bit schemes replace certain characters of 7-bit ASCII (normally punctuation) with additional alphanumeric characters required for a specific language.

 

Single-Byte 8-Bit Encoding Schemes

Single-byte 8-bit encoding schemes can define up to 256 characters, and normally support a group of languages. For example, ISO 8859-1 supports many West European languages.
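The difference between a 7-bit and an 8-bit single-byte scheme can be seen with a short sketch. It uses Python's built-in "ascii" and "iso-8859-1" codecs as stand-ins for the Oracle character sets US7ASCII and WE8ISO8859P1; the sample word is an arbitrary choice.

```python
# Compare a 7-bit and an 8-bit single-byte encoding using Python's built-in codecs.
text = "caf\u00e9"  # "café", with a precomposed e-acute

# ISO 8859-1 is an 8-bit scheme: every character it supports is exactly one byte.
latin1_bytes = text.encode("iso-8859-1")
print(len(latin1_bytes), latin1_bytes)  # 4 b'caf\xe9'

# 7-bit ASCII has no code for the accented letter, so encoding fails.
try:
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False
print(ascii_ok)  # False
```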

 

Multi-Byte Encoding Schemes

Multi-byte encoding schemes are needed for Asian languages because these languages use thousands of characters. A double-byte encoding scheme can support up to 65536 characters. Some multi-byte encoding schemes use the value of the most significant bit to indicate if a byte represents a single-byte character or is the first or second byte of a double-byte character. In other schemes, control codes differentiate single-byte from double-byte characters. A shift-out code indicates that the following bytes are double-byte characters until a shift-in code is encountered.
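As a rough illustration of the most-significant-bit approach, the sketch below scans GB2312-encoded bytes, in which ASCII characters are single bytes and every other character is a pair of bytes with the high bit set. The helper name count_characters is hypothetical, and Python's "gb2312" codec is used only as a convenient example of such a scheme.

```python
def count_characters(data: bytes) -> int:
    """Count characters in GB2312-style data by inspecting the high bit."""
    count = 0
    i = 0
    while i < len(data):
        # High bit set: this byte starts a double-byte character.
        # High bit clear: this byte is a complete single-byte character.
        i += 2 if data[i] & 0x80 else 1
        count += 1
    return count

encoded = "Oracle数据库".encode("gb2312")
print(len(encoded))               # 12 bytes: 6 single-byte + 3 double-byte characters
print(count_characters(encoded))  # 9 characters
```

Shift-sensitive schemes cannot be scanned this way, because the meaning of a byte depends on the most recent shift-out or shift-in code rather than on the byte itself.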

There are two general groups of encoding schemes, those based on 7-bit ASCII and those based on IBM EBCDIC. Within each group, all schemes normally use the same encoding for the 26 Latin characters (A to Z), but use different encoding for other characters used in languages other than English. ASCII and EBCDIC use different encodings, even for the Latin characters.
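The point that ASCII and EBCDIC disagree even on the Latin letters can be checked directly. This sketch assumes Python's "cp037" codec (EBCDIC Code Page 37) as a representative EBCDIC scheme.

```python
# Code for the letter A in each family.
ascii_a = "A".encode("ascii")[0]
ebcdic_a = "A".encode("cp037")[0]
print(hex(ascii_a), hex(ebcdic_a))  # 0x41 0xc1

# In ASCII the letters A..Z are contiguous; in EBCDIC there is a gap after I.
ascii_gap = "J".encode("ascii")[0] - "I".encode("ascii")[0]
ebcdic_gap = "J".encode("cp037")[0] - "I".encode("cp037")[0]
print(ascii_gap, ebcdic_gap)  # 1 8
```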

--------------------------

Specifying Language-Dependent Behavior

This section discusses the parameters that specify language-dependent operation. You can set language-dependent behavior defaults for the server, and set language-dependent behavior for the client that overrides these defaults.

Most NLS parameters can be used in three ways:

  • As initialization parameters to specify language-dependent behavior defaults for the server. For example, in your INIT.ORA file, include
		NLS_TERRITORY = FRANCE

  • As environment variables on client machines to specify language-dependent behavior defaults for a session. These defaults override the defaults set for the server. For example, on a UNIX system
		setenv NLS_TERRITORY FRANCE

  • As ALTER SESSION parameters to change language-dependent behavior during a session. For example:
		ALTER SESSION SET NLS_TERRITORY = FRANCE
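The layering of these three sources can be modeled as a chained lookup. This is only a sketch of the precedence, not of how Oracle resolves parameters internally: the dictionaries and the AMERICA, AMERICAN, and GERMANY values are hypothetical, and treating ALTER SESSION as the highest-priority layer is an assumption based on it being a per-session setting.

```python
from collections import ChainMap

# Hypothetical values for each layer.
server_defaults = {"NLS_TERRITORY": "AMERICA", "NLS_LANGUAGE": "AMERICAN"}  # INIT.ORA
client_environment = {"NLS_TERRITORY": "FRANCE"}                            # setenv ...
session_settings = {}                                                       # ALTER SESSION ...

# Lookups try the session first, then the client environment, then the server.
effective = ChainMap(session_settings, client_environment, server_defaults)
print(effective["NLS_TERRITORY"])  # FRANCE (client overrides server)

session_settings["NLS_TERRITORY"] = "GERMANY"  # models ALTER SESSION SET NLS_TERRITORY
print(effective["NLS_TERRITORY"])  # GERMANY (session overrides client)
print(effective["NLS_LANGUAGE"])   # AMERICAN (falls through to the server default)
```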
 
 

NLS Parameters

The NLS_LANGUAGE and NLS_TERRITORY parameters implicitly specify several aspects of language-dependent operation. Additional NLS parameters provide explicit control over these operations. The parameters listed below can be specified in the initialization file, or for each session with the ALTER SESSION command.

Parameter                 Description
NLS_CALENDAR              Calendar system
NLS_CURRENCY              Local currency symbol
NLS_DATE_FORMAT           Default date format
NLS_DATE_LANGUAGE         Default language for dates
NLS_ISO_CURRENCY          ISO international currency symbol
NLS_LANGUAGE              Default language
NLS_NUMERIC_CHARACTERS    Decimal character and group separator
NLS_SORT                  Character sort sequence
NLS_SPECIAL_CHARS
NLS_TERRITORY             Default territory

For a complete description of ALTER SESSION, see Oracle7 Server SQL Reference.

NLS_CALENDAR

Many different calendar systems are in use throughout the world. NLS_CALENDAR specifies which calendar system Oracle uses.

NLS_CALENDAR can have one of the following values:

 

  • Arabic Hijrah
  • Gregorian
  • Japanese Imperial
  • Persian
  • ROC Official
  • Thai Buddha

For example, if NLS_CALENDAR is set to "Japanese Imperial", the date format is "YY-MM-DD", and the date is February 17, 1995 (year 7 of the Heisei era), then SYSDATE is displayed as follows:

 

SELECT SYSDATE FROM DUAL;
SYSDATE
--------
07-02-17 
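The arithmetic behind this output can be sketched as follows, assuming the era in question is Heisei, which began on 8 January 1989 and ended on 30 April 2019. Under that assumption a displayed year of 07 corresponds to the Gregorian year 1995. The function name heisei_yy_mm_dd is hypothetical and only mimics the "YY-MM-DD" mask for this one era.

```python
from datetime import date

HEISEI_START = date(1989, 1, 8)   # first day of the Heisei era
HEISEI_END = date(2019, 4, 30)    # last day of the Heisei era

def heisei_yy_mm_dd(d: date) -> str:
    """Format a Gregorian date as era-year, month, day within the Heisei era."""
    if not (HEISEI_START <= d <= HEISEI_END):
        raise ValueError("date is outside the Heisei era")
    era_year = d.year - 1988  # 1989 is Heisei year 1
    return f"{era_year:02d}-{d.month:02d}-{d.day:02d}"

print(heisei_yy_mm_dd(date(1995, 2, 17)))  # 07-02-17
```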
 

NLS_CURRENCY

This parameter specifies the character string returned by the number format mask L, the local currency symbol, overriding that defined implicitly by NLS_TERRITORY. For example, to set the local currency symbol to "Dfl" (including a space), the parameter should be set as follows:

 

NLS_CURRENCY = "Dfl "

In this case, the query

 

SELECT TO_CHAR(TOTAL, 'L099G999D99') "TOTAL"
   FROM ORDERS WHERE CUSTNO = 586

would return

 

TOTAL
-------------
Dfl 12.673,49

You can alter the default value of NLS_CURRENCY by changing its value in the initialization file and then restarting the instance, and you can alter its value during a session using an ALTER SESSION SET NLS_CURRENCY command.


 

NLS_DATE_FORMAT

Defines the default date format to use with the TO_CHAR and TO_DATE functions. The default value of this parameter is determined by NLS_TERRITORY. The value of this parameter can be any valid date format mask, and the value must be surrounded by double quotes. For example:

 

NLS_DATE_FORMAT = "MM/DD/YYYY"

As another example, to set the default date format to display Roman numerals for months, you would include the following line in your initialization file:

 

NLS_DATE_FORMAT = "DD RM YY"

With such a default date format, the following SELECT statement would return the month using Roman numerals (assuming today's date is February 13, 1991):

 

SELECT TO_CHAR(SYSDATE) CURRDATE
   FROM DUAL;
CURRDATE
---------
13 II 91

The value of this parameter is stored in the tokenized internal date format. Each format element occupies two bytes, and each string occupies the number of bytes in the string plus a terminator byte. Also, the entire format mask has a two-byte terminator. For example, "MM/DD/YY" occupies 12 bytes internally because there are three format elements, two one-byte strings (the two slashes), and the two-byte terminator for the format mask. The tokenized format for the value of this parameter cannot exceed 24 bytes.
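The size calculation described above can be written out as a small helper. It only reproduces the arithmetic in this paragraph; the function name tokenized_size is hypothetical, the mask is split into elements and literal strings by hand, and nothing here reflects how Oracle actually parses a format mask.

```python
def tokenized_size(format_elements, literal_strings):
    """Bytes used by a date format mask under the rules in the text above."""
    element_bytes = 2 * len(format_elements)            # two bytes per format element
    literal_bytes = sum(len(s.encode("ascii")) + 1      # string bytes + terminator byte
                        for s in literal_strings)
    mask_terminator = 2                                 # two-byte terminator for the mask
    return element_bytes + literal_bytes + mask_terminator

# "MM/DD/YY": three elements and two one-byte strings (the slashes).
size = tokenized_size(["MM", "DD", "YY"], ["/", "/"])
print(size)        # 12
print(size <= 24)  # True: within the 24-byte limit

# "DD-MON-YYYY HH24:MI:SS": six elements and five one-byte strings.
long_size = tokenized_size(["DD", "MON", "YYYY", "HH24", "MI", "SS"],
                           ["-", "-", " ", ":", ":"])
print(long_size)   # 24: exactly at the limit
```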

Note: The applications you design may need to allow for a variable-length default date format. Also, the parameter value must be surrounded by double quotes: single quotes are interpreted as part of the format mask.

You can alter the default value of NLS_DATE_FORMAT by changing its value in the initialization file and then restarting the instance, and you can alter the value during a session using an ALTER SESSION SET NLS_DATE_FORMAT command.


 

NLS_DATE_LANGUAGE

This parameter specifies the language for the spelling of day and month names by the functions TO_CHAR and TO_DATE, overriding that specified implicitly by NLS_LANGUAGE. NLS_DATE_LANGUAGE has the same syntax as the NLS_LANGUAGE parameter, and all supported languages are valid values. For example, to specify the date language as French, the parameter should be set as follows:

 

NLS_DATE_LANGUAGE = FRENCH

In this case, the query

 

SELECT TO_CHAR(SYSDATE, 'Day:Dd Month yyyy')
   FROM DUAL;

would return

 

Mercredi:13 Février 1991

Month and day name abbreviations are also in the language specified, for example:

 

Me:13 Fév 1991

The default date format also uses the language-specific month name abbreviations. For example, if the default date format is DD-MON-YYYY, the above date would be inserted using:

 

INSERT INTO tablename VALUES ('13-Fév-1991');

The abbreviations for AM, PM, AD, and BC are also returned in the language specified by NLS_DATE_LANGUAGE. Note that numbers spelled using the TO_CHAR function always use English spellings; for example:

 

SELECT TO_CHAR(TO_DATE('27-Fév-91'),'Day: ddspth Month')
   FROM DUAL;

would return:

 

Mercredi: twenty-seventh Février

You can alter the default value of NLS_DATE_LANGUAGE by changing its value in the initialization file and then restarting the instance, and you can alter the value during a session using an ALTER SESSION SET NLS_DATE_LANGUAGE command.


 

NLS_ISO_CURRENCY

This parameter specifies the character string returned by the number format mask C, the ISO currency symbol, overriding that defined implicitly by NLS_TERRITORY.

Local currency symbols can be ambiguous; for example, a dollar sign ($) can refer to US dollars or Australian dollars. ISO Specification 4217 1987-07-15 defines unique "international" currency symbols for the currencies of specific territories (or countries).

For example, the ISO currency symbol for the US Dollar is USD, for the Australian Dollar AUD. To specify the ISO currency symbol, the corresponding territory name is used.

NLS_ISO_CURRENCY has the same syntax as the NLS_TERRITORY parameter, and all supported territories are valid values. For example, to specify the ISO currency symbol for France, the parameter should be set as follows:

 

NLS_ISO_CURRENCY = FRANCE

In this case, the query

 

SELECT TO_CHAR(TOTAL, 'C099G999D99') "TOTAL"
   FROM ORDERS WHERE CUSTNO = 586

would return

 

TOTAL
-------------
 FRF12.673,49

You can alter the default value of NLS_ISO_CURRENCY by changing its value in the initialization file and then restarting the instance, and you can alter its value during a session using an ALTER SESSION SET NLS_ISO_CURRENCY command.


 

NLS_NUMERIC_CHARACTERS

This parameter specifies the decimal character and grouping separator, overriding those defined implicitly by NLS_TERRITORY. The decimal character separates the integer and decimal parts of a number. The grouping separator is the character returned by the number format mask G. For example, to set the decimal character to a comma and the grouping separator to a period, the parameter should be set as follows:

 

NLS_NUMERIC_CHARACTERS = ",."

Both characters are single byte and must be different. Either can be a space.

Note: When the decimal character is not a period (.) or when a group separator is used, numbers appearing in SQL statements must be enclosed in quotes. For example:

 

        INSERT INTO SIZES (ITEMID, WIDTH, QUANTITY)
          VALUES (618, '45,5', TO_NUMBER('1.234','9G999'));
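To make the roles of the two characters concrete, the sketch below formats a number with a configurable decimal character and grouping separator, given in the same order as the parameter value (decimal character first). The function format_number is hypothetical and is not an Oracle API; it fixes two decimal places and groups by thousands for simplicity.

```python
def format_number(value: float, numeric_characters: str = ",.") -> str:
    """Format value using a two-character string: decimal character, then group separator."""
    decimal_char, group_char = numeric_characters  # must be exactly two characters
    if decimal_char == group_char:
        raise ValueError("decimal character and group separator must differ")
    text = f"{value:,.2f}"  # Python's default form, e.g. '12,673.49'
    # Swap separators through a placeholder so the two replacements do not collide.
    return text.replace(",", "\0").replace(".", decimal_char).replace("\0", group_char)

print(format_number(12673.49))        # 12.673,49
print(format_number(12673.49, ".,"))  # 12,673.49
```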

You can alter the default value of NLS_NUMERIC_CHARACTERS by changing its value in the initialization file and then restarting the instance, and you can alter its value during a session using an ALTER SESSION SET NLS_NUMERIC_CHARACTERS command.


 

NLS_SORT

This parameter specifies the type of sort for character data, overriding that defined implicitly by NLS_LANGUAGE.

The syntax of NLS_SORT is:

 

NLS_SORT = { BINARY | name }

BINARY specifies a binary sort and name specifies a particular linguistic sort sequence. For example, to specify the linguistic sort sequence called German, the parameter should be set as follows:

 

NLS_SORT = German

The name given to a linguistic sort sequence has no direct connection to language names. Usually, however, each supported language will have an appropriate linguistic sort sequence defined that uses the same name.
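The difference between a binary sort and a linguistic sort can be illustrated with a few German words. The binary order below compares the ISO 8859-1 byte values; the linguistic order is only a rough stand-in that ignores accents and case before comparing, which is an assumption made for this sketch and not Oracle's German sort definition.

```python
import unicodedata

words = ["Zebra", "apfel", "\u00c4pfel", "Apfel"]  # "\u00c4" is A with diaeresis

# Binary sort: compare the encoded byte values.
binary_order = sorted(words, key=lambda w: w.encode("iso-8859-1"))

def linguistic_key(word):
    """Approximate a linguistic sort: strip accents, ignore case, then break ties."""
    decomposed = unicodedata.normalize("NFD", word)
    base = "".join(c for c in decomposed if not unicodedata.combining(c))
    return (base.casefold(), word)

linguistic_order = sorted(words, key=linguistic_key)

print(binary_order)      # ['Apfel', 'Zebra', 'apfel', 'Äpfel']
print(linguistic_order)  # ['Apfel', 'apfel', 'Äpfel', 'Zebra']
```

In the binary order, every uppercase letter sorts before every lowercase letter and the accented word comes last, which is rarely what a reader of the language expects.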

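Note: Setting the NLS_SORT initialization parameter to anything other than BINARY causes a sort to use a full table scan, regardless of the path the optimizer chooses. BINARY is the exception because indexes are built according to a binary order of keys.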

You can alter the default value of NLS_SORT by changing its value in the initialization file and then restarting the instance, and you can alter its value during a session using an ALTER SESSION SET NLS_SORT command.


A complete list of linguistic definitions is provided in the "Linguistic Definitions" table.

 

_______________________________________________________________________

NLS Data

This section lists supported languages, territories, storage character sets, Arabic/Hebrew display character sets, linguistic definitions, and calendars.

 

 

 

 

Table C-2 Oracle Character Sets for Operating System Locales

Operating System Locale     Character Set
Arabic                      AR8ASMO8X
Catalan                     WE8PC850
Chinese (PRC)               ZHS16GBK
Chinese (Taiwan)            ZHT16MSWIN950
Czech                       EE8PC852
Danish                      WE8PC850
Dutch                       WE8PC850
English (United Kingdom)    WE8PC850
English (United States)     US8PC437
Finnish                     WE8PC850
French                      WE8PC850
German                      WE8PC850
Greek                       EL8PC737
Hungarian                   EE8PC852
Italian                     WE8PC850
Japanese                    JA16SJIS
Korean                      KO16MSWIN949
Norwegian                   WE8PC850
Polish                      EE8PC852
Portuguese                  WE8PC850
Romanian                    EE8PC852
Russian                     RU8PC866
Slovak                      EE8PC852
Slovenian                   EE8PC852
Spanish                     WE8PC850
Swedish                     WE8PC850
Turkish                     TR8PC857

 

 

 

Storage Character Sets

The following storage character sets are supported in Oracle Server release 7.3:

Name              Description
US7ASCII          ASCII 7-bit American
WE8DEC            DEC 8-bit West European
WE8HP             HP LaserJet 8-bit West European
US8PC437          IBM-PC Code Page 437 8-bit American
WE8EBCDIC37       EBCDIC Code Page 37 8-bit West European
WE8EBCDIC500      EBCDIC Code Page 500 8-bit West European
WE8PC850          IBM-PC Code Page 850 8-bit West European
D7DEC             DEC VT100 7-bit German
F7DEC             DEC VT100 7-bit French
S7DEC             DEC VT100 7-bit Swedish
E7DEC             DEC VT100 7-bit Spanish
SF7ASCII          ASCII 7-bit Finnish
NDK7DEC           DEC VT100 7-bit Norwegian/Danish
I7DEC             DEC VT100 7-bit Italian
NL7DEC            DEC VT100 7-bit Dutch
CH7DEC            DEC VT100 7-bit Swiss (German/French)
YUG7ASCII         ASCII 7-bit Yugoslavian
SF7DEC            DEC VT100 7-bit Finnish
TR7DEC            DEC VT100 7-bit Turkish
WE8ISO8859P1      ISO 8859-1 West European
EE8ISO8859P2      ISO 8859-2 East European
SE8ISO8859P3      ISO 8859-3 South European
NEE8ISO8859P4     ISO 8859-4 North and North-East European
CL8ISO8859P5      ISO 8859-5 Latin/Cyrillic
AR8ISO8859P6      ISO 8859-6 Latin/Arabic
EL8ISO8859P7      ISO 8859-7 Latin/Greek
IW8ISO8859P8      ISO 8859-8 Latin/Hebrew
WE8ISO8859P9      ISO 8859-9 West European & Turkish
NE8ISO8859P10     ISO 8859-10 North European
TH8TISASCII       Thai Industrial Standard 620-2533 - ASCII 8-bit
TH8TISEBCDIC      Thai Industrial Standard 620-2533 - EBCDIC 8-bit
AR8EBCDICX        EBCDIC XBASIC 8-bit Latin/Arabic
EL8DEC            DEC 8-bit Latin/Greek
TR8DEC            DEC 8-bit Turkish
WE8EBCDIC37C      EBCDIC Code Page 37 8-bit Oracle/c
RU8PC866          IBM-PC Code Page 866 8-bit Latin/Cyrillic
WE8EBCDIC500C     EBCDIC Code Page 500 8-bit Oracle/c
EEC8EUROPA3       EEC EUROPA3 8-bit West European/Greek
EE8PC852          IBM-PC Code Page 852 8-bit East European
RU8BESTA          BESTA 8-bit Latin/Cyrillic
RU8PC855          IBM-PC Code Page 855 8-bit Latin/Cyrillic
TR8PC857          IBM-PC Code Page 857 8-bit Turkish
CL8MACCYRILLIC    Mac Client 8-bit Latin/Cyrillic
CL8MACCYRILLICS   Mac Server 8-bit Latin/Cyrillic
WE8PC860          IBM-PC Code Page 860 8-bit West European
IS8PC861          IBM-PC Code Page 861 8-bit Icelandic
EE8MACCES         Mac Server 8-bit Central European
EE8MACCROATIANS   Mac Server 8-bit Croatian
TR8MACTURKISHS    Mac Server 8-bit Turkish
IS8MACICELANDICS  Mac Server 8-bit Icelandic
EL8MACGREEKS      Mac Server 8-bit Greek
EE8MSWIN1250      MS Windows Code Page 1250 8-bit East European
CL8MSWIN1251      MS Windows Code Page 1251 8-bit Latin/Cyrillic
F8EBCDIC297       EBCDIC Code Page 297 8-bit French
BG8MSWIN          MS Windows 8-bit Bulgarian Cyrillic
EL8MSWIN1253      MS Windows Code Page 1253 8-bit Latin/Greek
D8EBCDIC273       EBCDIC Code Page 273/1 8-bit Austrian German
I8EBCDIC280       EBCDIC Code Page 280/1 8-bit Italian
DK8EBCDIC277      EBCDIC Code Page 277/1 8-bit Danish
S8EBCDIC278       EBCDIC Code Page 278/1 8-bit Swedish
EE8EBCDIC870      EBCDIC Code Page 870 8-bit East European
CL8EBCDIC1025     EBCDIC Code Page 1025 8-bit Cyrillic
N8PC865           IBM-PC Code Page 865 8-bit Norwegian
F7SIEMENS9780X    Siemens 97801/97808 7-bit French
E7SIEMENS9780X    Siemens 97801/97808 7-bit Spanish
S7SIEMENS9780X    Siemens 97801/97808 7-bit Swedish
DK7SIEMENS9780X   Siemens 97801/97808 7-bit Danish
N7SIEMENS9780X    Siemens 97801/97808 7-bit Norwegian
I7SIEMENS9780X    Siemens 97801/97808 7-bit Italian
D7SIEMENS9780X    Siemens 97801/97808 7-bit German
WE8GCOS7          Bull EBCDIC GCOS7 8-bit West European
US8BS2000         Siemens 9750-62 EBCDIC 8-bit American
D8BS2000          Siemens 9750-62 EBCDIC 8-bit German
F8BS2000          Siemens 9750-62 EBCDIC 8-bit French
E8BS2000          Siemens 9750-62 EBCDIC 8-bit Spanish
DK8BS2000         Siemens 9750-62 EBCDIC 8-bit Danish
WE8BS2000         Siemens EBCDIC.DF.04 8-bit West European
CL8BS2000         Siemens EBCDIC.EHC.LC 8-bit Cyrillic
WE8BS2000L5       Siemens EBCDIC.DF.04.L5 8-bit West European/Turkish
WE8DG             DG 8-bit West European
WE8NCR4970        NCR 4970 8-bit West European
WE8ROMAN8         HP Roman8 8-bit West European
EE8MACCE          Mac Client 8-bit Central European
EE8MACCROATIAN    Mac Client 8-bit Croatian
TR8MACTURKISH     Mac Client 8-bit Turkish
IS8MACICELANDIC   Mac Client 8-bit Icelandic
EL8MACGREEK       Mac Client 8-bit Greek
US8ICL            ICL EBCDIC 8-bit American
WE8ICL            ICL EBCDIC 8-bit West European
WE8MACROMAN8      Mac Client 8-bit Extended Roman8 West European
WE8MACROMAN8S     Mac Server 8-bit Extended Roman8 West European
TH8MACTHAI        Mac Client 8-bit Latin/Thai
TH8MACTHAIS       Mac Server 8-bit Latin/Thai
HU8CWI2           Hungarian 8-bit CWI-2
TR8ISO8859P9      Turkish version ISO 8859-9 West European & Turkish
EL8PC437S         IBM-PC Code Page 437 8-bit (Greek modification)
EL8EBCDIC875      EBCDIC Code Page 875 8-bit Greek
EL8PC737          IBM-PC Code Page 737 8-bit Greek/Latin
LT8PC772          IBM-PC Code Page 772 8-bit Lithuanian (Latin/Cyrillic)
LT8PC774          IBM-PC Code Page 774 8-bit Lithuanian (Latin)
CDN8PC863         IBM-PC Code Page 863 8-bit Canadian French
AR8ASMO8X         ASMO Extended 708 8-bit Latin/Arabic
AR8NAFITHA711     Nafitha Enhanced 711 Server 8-bit Latin/Arabic
AR8SAKHR707       SAKHR 707 Server 8-bit Latin/Arabic
AR8MUSSAD768      Mussa'd Alarabi/2 768 Server 8-bit Latin/Arabic
AR8ADOS710        Arabic MS-DOS 710 Server 8-bit Latin/Arabic
AR8ADOS720        Arabic MS-DOS 720 Server 8-bit Latin/Arabic
AR8APTEC715       APTEC 715 Server 8-bit Latin/Arabic
AR8MSWIN1256      MS Windows Code Page 1256 8-bit Latin/Arabic
AR8NAFITHA721     Nafitha International 721 Server 8-bit Latin/Arabic
AR8SAKHR706       SAKHR 706 Server 8-bit Latin/Arabic
AR8ARABICMAC      Mac Client 8-bit Latin/Arabic
AR8ARABICMACS     Mac Server 8-bit Latin/Arabic
JA16VMS           JVMS 16-bit Japanese
JA16EUC           EUC 16-bit Japanese
JA16SJIS          Shift-JIS 16-bit Japanese
JA16DBCS          IBM DBCS 16-bit Japanese
JA16HP            HP 16-bit Japanese
JA16EBCDIC930     IBM DBCS Code Page 290 16-bit Japanese
JA16TOSHIBAEUC    Toshiba EUC 16-bit Japanese
KO16KSC5601       KSC5601 16-bit Korean
KO16DBCS          IBM DBCS 16-bit Korean
ZHS16CGB231280    CGB2312-80 16-bit Simplified Chinese
ZHT32EUC          EUC 32-bit Traditional Chinese
ZHT32SOPS         SOPS 32-bit Traditional Chinese
ZHT16DBT          Taiwan Taxation 16-bit Traditional Chinese
ZHT32TRIS         TRIS 32-bit Traditional Chinese
ZHT16BIG5         BIG5 16-bit Traditional Chinese
AL24UTFFSS        Unicode UTF-FSS
JA16TSTSET2       ASCII-based 16-bit Test Character Set
JA16TSTSET        Shift-sensitive ASCII-based Test Character Set

Arabic/Hebrew Display Character Sets

The following Arabic/Hebrew display character sets are supported in Oracle Server release 7.3:

Name              Description
AR8ASMO708PLUS    ASMO 708 Plus 8-bit Latin/Arabic
AR7ASMO449PLUS    ASMO 449 Plus 7-bit Latin/Arabic
AR7AMEER          Ameer 7-bit Latin/Arabic
AR8XBASIC         XBASIC Right-to-Left Arabic Character Set
AR8NAFITHA711T    Nafitha Enhanced 711 Client 8-bit Latin/Arabic
AR8SAKHR707T      SAKHR 707 Client 8-bit Latin/Arabic
AR8MUSSAD768T     Mussa'd Alarabi/2 768 Client 8-bit Latin/Arabic
AR8ADOS710T       Arabic MS-DOS 710 Client 8-bit Latin/Arabic
AR8ADOS720T       Arabic MS-DOS 720 Client 8-bit Latin/Arabic
AR8APTEC715T      APTEC 715 Client 8-bit Latin/Arabic
AR8NAFITHA721T    Nafitha International 721 Client 8-bit Latin/Arabic
AR7SEDCOT         SEDCO/ESPRIT/DATA GENERAL 7-bit Latin/Arabic
AR8HPARABIC8T     HP ARABIC8 8-bit Latin/Arabic

_____________________________________________________________________

 

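Excerpted from itpub

Both AL16UTF16 and UTF8 are valid choices for the national character set.
AL16UTF16 is a fixed-width, two-byte Unicode character set.

UTF8 is a variable-width Unicode character set that uses one to three bytes per character.
European characters take one to two bytes in UTF8 but two bytes in AL16UTF16, so UTF8 saves space for them.
Asian characters take three bytes in UTF8, which is more space than they need in AL16UTF16.

Because AL16UTF16 is a fixed-width encoding, it is faster to process than the variable-width UTF8.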
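The storage trade-off can be checked with a short sketch. It uses Python's "utf-8" and "utf-16-be" codecs as stand-ins for the Oracle character sets UTF8 and AL16UTF16, which is an assumption; the three sample characters are arbitrary representatives of each group.

```python
# Bytes needed per character in each encoding ("-be" avoids a byte-order mark).
samples = {
    "A": "Latin letter",
    "\u00e9": "accented European letter (e-acute)",
    "\u4e2d": "Chinese character",
}

sizes = {}
for ch, label in samples.items():
    utf8_len = len(ch.encode("utf-8"))
    utf16_len = len(ch.encode("utf-16-be"))
    sizes[ch] = (utf8_len, utf16_len)
    print(f"{label}: UTF-8 {utf8_len} byte(s), UTF-16 {utf16_len} byte(s)")
```

For these samples UTF-8 is smaller for the Latin letter, equal for the accented letter, and larger for the Chinese character.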

 

Character Set Types

The CREATE DATABASE statement has the CHARACTER SET clause and the additional optional clause NATIONAL CHARACTER SET to declare the character set to be used as the database character set and the national character set. Neither character set can be changed after creating the database. If no NATIONAL CHARACTER SET clause is present, the national character set defaults to the database character set.

Because the database character set is used to identify and to hold SQL and PL/SQL source code, it must have either EBCDIC or 7-bit ASCII as a subset, whichever is native to the platform. Therefore, it is not possible to use a fixed-width, multibyte character set as the database character set, only as the national character set.

The data types NCHAR, NVARCHAR2, and NCLOB are provided to declare columns as variants of the basic types CHAR, VARCHAR2, and CLOB, to note that they are stored using the national character set and not the database character set.

  • To declare a fixed-length character item that uses the national character set, use the data type specification NCHAR [(size)].
  • To declare a variable-length character item that uses the national character set, use the data type specification NVARCHAR2 (size).
  • To declare a character large object (CLOB) item containing fixed-width, multibyte characters that uses the national character set, use the data type specification NCLOB (size).

The database character set stores variable-width characters; the national character set stores fixed-width and variable-width multibyte characters.

 

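Efficiency
  From the way UTF8 encodes characters, it follows that:
  1. Each English letter or digit takes 1 byte;
  2. Letters of other European and Slavic alphabets take 2 bytes;
  3. Each Chinese character takes 3 bytes.
  UTF8 is therefore very attractive for English text but less economical for Chinese: a Chinese character takes only 2 bytes in either ANSI or Unicode/UCS2 encoding, but 3 bytes in UTF8.
  The following statistics show the average number of bytes per character when text is stored in UTF8:
  1. Latin-based languages: about 1.1 bytes;
  2. Greek, Russian, Arabic, and Hebrew: about 1.7 bytes;
  3. Most other scripts, such as Chinese, Japanese, Korean, and Hindi: about 3 bytes;
  4. Only very rarely used characters and symbols take 4 bytes.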
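The byte counts above follow from the code point ranges that standard UTF-8 assigns to each length. The sketch below spells out that rule and computes an average for two sample strings; the function names utf8_length and average_bytes are hypothetical, and the samples are illustrative only, not the data behind the statistics quoted above.

```python
def utf8_length(code_point: int) -> int:
    """Number of bytes standard UTF-8 uses for one code point."""
    if code_point < 0x80:
        return 1   # ASCII letters, digits, punctuation
    if code_point < 0x800:
        return 2   # accented Latin, Greek, Cyrillic, Arabic, Hebrew
    if code_point < 0x10000:
        return 3   # most other scripts, including Chinese, Japanese, Korean
    return 4       # supplementary characters

def average_bytes(text: str) -> float:
    """Average UTF-8 bytes per character for a piece of text."""
    return len(text.encode("utf-8")) / len(text)

# The rule agrees with Python's own encoder for one character from each range.
for ch in ["A", "\u00e9", "\u0436", "\u4e2d"]:
    assert utf8_length(ord(ch)) == len(ch.encode("utf-8"))

print(average_bytes("Oracle"))  # 1.0
print(average_bytes("数据库"))   # 3.0
```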
