Oracle Character Set

彦祖的小号

Category: Oracle. Tags: database

Article link: https://blog.csdn.net/linsuhangoracle/article/details/118513233

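Summary: Oracle Database character sets include single-byte and multibyte encoding schemes, such as 7-bit and 8-bit encodings. Character sets such as ASCII and ISO 8859-1 support different numbers of languages. Oracle follows a specific naming convention and maintains lists of binary subsets and supersets. Length semantics distinguish byte counts from character counts. The database character set and the national character set are specified when the database is created, and character types such as NCHAR are supported. Choosing different character sets can lead to data conversion, potential data loss, and extra overhead. For multilingual support, the Unicode character set AL32UTF8 is recommended.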

(1) Character Set Encoding

A code point (or code value) is the numeric code assigned to a character.

A group of characters (for example, alphabetic characters, ideographs, symbols, punctuation marks, and control characters) can be encoded as a character set. An encoded character set assigns a unique numeric code to each character in the character set. The numeric codes are called code points or encoded values.
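
As a small illustration of the character-to-code mapping, the sketch below uses Python's built-in ord and chr. Unicode serves here only as a convenient example of an encoded character set, and the helper name code_points is made up for this note.

```python
def code_points(text):
    """Return the numeric code point of every character in text."""
    return [ord(ch) for ch in text]


# Each character maps to exactly one numeric code, and chr() reverses the mapping.
for ch in "Az9":
    code = ord(ch)
    assert chr(code) == ch
    print(f"{ch!r} -> {code} (hex {code:X})")
```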

A single character set can support multiple languages, but it is limited by its character repertoire.

Different character sets support different character repertoires. Because character sets are typically based on a particular writing script, they can support multiple languages. When character sets were first developed, they had a limited character repertoire. Even now there can be problems using certain characters across platforms.

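The characters listed below can be converted regardless of which Oracle character set is in use. For any other character, check whether the database character set supports it.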

The following CHAR and VARCHAR characters are represented in all Oracle Database character sets and can be transported to any platform:

Uppercase and lowercase English characters A through Z and a through z
Arabic digits 0 through 9
The following punctuation marks: % ' ' ( ) * + - , . / \ : ; < > = ! _ & ~ { } | ^ ? $ # @ " [ ]
The following control characters: space, horizontal tab, vertical tab, form feed

If you are using characters outside this set, then take care that your data is supported in the database character set that you have chosen.
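
A quick way to spot data that needs this extra care is to scan it for characters outside the list. The following is a minimal sketch, not an Oracle utility: the names PORTABLE_CHARS and non_portable_chars are hypothetical, and the punctuation string is transcribed by hand from the list above (the list shows a quote mark twice, so only the plain apostrophe is included here).

```python
import string

# Punctuation transcribed from the list above; the repeated quote mark in that
# list is represented only by the plain apostrophe (an assumption).
PORTABLE_PUNCTUATION = "%'()*+-,./\\:;<>=!_&~{}|^?$#@\"[]"
# Space, horizontal tab, vertical tab, form feed.
PORTABLE_CONTROLS = " \t\v\f"

PORTABLE_CHARS = frozenset(
    string.ascii_letters + string.digits + PORTABLE_PUNCTUATION + PORTABLE_CONTROLS
)


def non_portable_chars(text):
    """Return characters outside the portable set, in order of first appearance."""
    found = []
    for ch in text:
        if ch not in PORTABLE_CHARS and ch not in found:
            found.append(ch)
    return found


print(non_portable_chars("SELECT * FROM dual;"))  # nothing outside the set
print(non_portable_chars("caf\u00e9"))            # the accented letter is reported
```

Any character this check reports should be verified against the chosen database character set before the data is loaded.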

How are Characters Encoded?

Single-Byte Encoding Schemes

Each character is stored in exactly one byte.

Single-byte encoding schemes are efficient. They take up the least amount of space to represent characters and are easy to process and program with because one character can be represented in one byte.

Single-byte encoding schemes are classified as one of the following types:

7-bit encoding schemes

Single-byte 7-bit encoding schemes can define up to 128 characters and normally support just one language. One of the most common single-byte character sets, used since the early days of computing, is ASCII (American Standard Code for Information Interchange).

8-bit encoding schemes

Single-byte 8-bit encoding schemes can define up to 256 characters and often support a group of related languages. One example is ISO 8859-1, which supports many Western European languages. The following figure shows the ISO 8859-1 8-bit encoding scheme.
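
The difference between the 7-bit and 8-bit repertoires can be seen with Python's standard codecs. This is only a sketch: the codec names "ascii" and "iso-8859-1" are Python's, not Oracle character set names, and encoded_size is a helper invented for this note.

```python
def encoded_size(text, encoding):
    """Return the byte length of text in encoding, or None if some character
    has no code in that character set."""
    try:
        return len(text.encode(encoding))
    except UnicodeEncodeError:
        return None


word = "na\u00efve"  # five characters, one of them an accented letter

# 7-bit ASCII has no code for the accented letter, so encoding fails.
print("ascii:", encoded_size(word, "ascii"))
# ISO 8859-1 covers it, and every character takes exactly one byte.
print("iso-8859-1:", encoded_size(word, "iso-8859-1"))
```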

Multibyte Encoding Schemes

Multibyte encoding schemes are needed for the ideographic scripts of Asian languages such as Chinese or Japanese, because these languages use thousands of characters. These encoding schemes use either a fixed number or a variable number of bytes to represent each character.

Fixed-width multibyte encoding schemes

In a fixed-width multibyte encoding scheme, each character is represented by a fixed number of bytes. The number of bytes is at least two in a multibyte encoding scheme.
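
To make the fixed-width idea concrete, the sketch below uses Python's "utf-32-be" codec as a stand-in, since it stores every character in four bytes. It is not meant to represent any particular Oracle character set, and fixed_width_units is a hypothetical helper name.

```python
def fixed_width_units(text, encoding="utf-32-be", width=4):
    """Encode text and cut the result into equal-sized per-character chunks."""
    data = text.encode(encoding)
    return [data[i:i + width] for i in range(0, len(data), width)]


units = fixed_width_units("A\u4e2d")
# Both the Latin letter and the Chinese character occupy the same number of bytes,
# so character N always starts at byte offset N * width.
for unit in units:
    print(unit.hex(), len(unit))
```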

Variable-width multibyte encoding schemes

A variable-width encoding scheme uses one or more bytes to represent a single character. Some multibyte encoding schemes use certain bits to indicate the number of bytes that represents a character. For example, if two bytes is the maximum number of bytes used to represent a character, then the most significant bit can be used to indicate whether that byte is a single-byte character or the first byte of a double-byte character.
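
The most-significant-bit rule described above can be sketched in a few lines. The example assumes an encoding with at most two bytes per character and uses Python's "gbk" codec for the test data; split_double_byte is a hypothetical helper, and it does no validation of malformed input.

```python
def split_double_byte(data):
    """Split bytes from a one-or-two-byte encoding into per-character chunks.

    A byte with the most significant bit set is treated as the first byte of
    a double-byte character; any other byte is a single-byte character.
    """
    chunks = []
    i = 0
    while i < len(data):
        width = 2 if data[i] & 0x80 else 1
        chunks.append(data[i:i + width])
        i += width
    return chunks


encoded = "A\u4e2db\u6587".encode("gbk")  # mixes single-byte and double-byte characters
for chunk in split_double_byte(encoded):
    print(chunk.hex(), "->", chunk.decode("gbk"))
```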

Shift-sensitive variable-width multibyte encoding schemes

Some variable-width encoding schemes use control codes to differentiate between single-byte and multibyte characters with the same code values. A shift-out code indicates that the following character is multibyte. A shift-in code indicates that the following character is single-byte. Shift-sensitive encoding schemes are used primarily on IBM platforms. Note that ISO-2022 character sets cannot be used as database character sets, but they can be used for applications such as a mail server.
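
Python's standard library has no codec for the IBM shift-out/shift-in encodings, so the sketch below uses its "iso2022_jp" codec instead. That codec switches between single-byte and double-byte modes with three-byte escape sequences rather than single shift codes, but the principle is the same: the same byte values mean different things depending on the current shift state. The helper shift_sequences is made up for this note and assumes every escape sequence is three bytes long, which holds for this codec.

```python
ESC = 0x1B


def shift_sequences(data):
    """Return the three-byte escape sequences found in ISO-2022-JP data."""
    return [data[i:i + 3] for i, byte in enumerate(data) if byte == ESC]


text = "A\u65e5B"  # Latin letter, one kanji, Latin letter
encoded = text.encode("iso2022_jp")

# Every byte stays in the 7-bit range; the escape sequences alone tell the
# decoder whether the following bytes are single-byte or double-byte characters.
print(encoded)
print(shift_sequences(encoded))
assert encoded.decode("iso2022_jp") == text
```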

Naming Convention for Oracle Database Character Sets

Oracle Database uses the following naming convention for its character set names:

<region><number of bits used to represent a character><standard character set name>[S|C]

The optional S or C suffix distinguishes character sets that can be used only on the server (S) or only on the client (C).

Keep in mind that:

You should use the server character set (S) on the Macintosh platform. The Macintosh client character sets are obsolete. On EBCDIC platforms, use the server character set (S) on the server and the client character set (C) on the client.
UTF8 and UTFE are exceptions to the naming convention.
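
The convention can be explored with a small parser. This is a rough sketch under stated assumptions: split_charset_name is a hypothetical helper, the regular expression is my own reading of the pattern above, and the optional S or C suffix is left attached to the standard name because it cannot be told apart by pattern alone (JA16SJIS, for example, ends in an S that belongs to the name).

```python
import re

# Names the text lists as exceptions to the naming convention.
EXCEPTIONS = {"UTF8", "UTFE"}

# <region letters><bits per character><standard name, with any S/C suffix>
_NAME = re.compile(r"^(?P<region>[A-Z]+)(?P<bits>\d+)(?P<rest>[A-Z0-9]+)$")


def split_charset_name(name):
    """Split a character set name into (region, bits, rest), or return None
    when the name is a listed exception or does not fit the pattern."""
    name = name.upper()
    if name in EXCEPTIONS:
        return None
    match = _NAME.match(name)
    if match is None:
        return None
    return match.group("region"), int(match.group("bits")), match.group("rest")


for name in ["US7ASCII", "WE8ISO8859P1", "JA16SJIS", "ZHS16GBK", "UTF8"]:
    print(name, "->", split_charset_name(name))
```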

The following table shows examples of Oracle Database character set names.