code point

Each complete Unicode character is identified by a code point.
A single Java char is a code unit.
The Unicode standard was originally designed as a fixed-width 16-bit character
encoding. It has since been changed to allow for characters whose representation
requires more than 16 bits. The range of legal code points is now U+0000 to
U+10FFFF, using the hexadecimal U+n notation. Characters whose code points are
greater than U+FFFF are called supplementary characters. To represent the complete
range of characters using only 16-bit units, the Unicode standard defines an
encoding called UTF-16. In this encoding, supplementary characters are represented
as pairs of 16-bit code units, the first from the high-surrogates range,
(U+D800 to U+DBFF), the second from the low-surrogates range (U+DC00 to
U+DFFF). For characters in the range U+0000 to U+FFFF, the values of code points
and UTF-16 code units are the same.
The Java programming language represents text in sequences of 16-bit code
units, using the UTF-16 encoding. A few APIs, primarily in the Character class,
use 32-bit integers to represent code points as individual entities. The Java platform
provides methods to convert between the two representations.
(From JLS-3.0)
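To make the code point / code unit distinction concrete, here is a minimal sketch. The class name SurrogateDemo, the helper toSurrogates, and the sample code point U+1F600 are assumptions chosen for illustration; the Character and String methods are standard JDK APIs. It shows one supplementary code point occupying two char code units, and derives the surrogate pair by hand from the ranges quoted above.

```java
public class SurrogateDemo {
    // Hypothetical helper: split a supplementary code point into its
    // UTF-16 high and low surrogates using the ranges from the quoted text.
    public static char[] toSurrogates(int codePoint) {
        if (!Character.isSupplementaryCodePoint(codePoint)) {
            throw new IllegalArgumentException("not a supplementary code point");
        }
        int offset = codePoint - 0x10000;               // 20-bit value
        char high = (char) (0xD800 + (offset >> 10));   // top 10 bits
        char low = (char) (0xDC00 + (offset & 0x3FF));  // bottom 10 bits
        return new char[] { high, low };
    }

    public static void main(String[] args) {
        int cp = 0x1F600; // sample supplementary code point (assumption)
        String s = new String(Character.toChars(cp));

        System.out.println(s.length());                       // code units: 2
        System.out.println(s.codePointCount(0, s.length()));  // code points: 1
        System.out.println(Integer.toHexString(s.codePointAt(0))); // 1f600

        char[] pair = toSurrogates(cp);
        System.out.println(Integer.toHexString(pair[0]));     // d83d
        System.out.println(Integer.toHexString(pair[1]));     // de00

        // The JDK converts back the other way as well.
        System.out.println(Character.toCodePoint(pair[0], pair[1]) == cp); // true
    }
}
```

In real code, Character.toChars, Character.highSurrogate, and Character.lowSurrogate already do this conversion; the manual arithmetic is only there to show where the two 16-bit units come from.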
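An int value can represent every Unicode code point, including supplementary code points. The 21 low-order (least significant) bits of the int hold the code point, and the 11 high-order (most significant) bits must be zero.
Why are 21 bits enough?
The range of legal code points is now U+0000 to U+10FFFF.
Characters whose code points are greater than U+FFFF are called supplementary characters; they range from 0x10000 to 0x10FFFF. In binary:
0x10000  = 0000 0001 0000 0000 0000 0000
0x10FFFF = 0001 0000 1111 1111 1111 1111
The largest value, 0x10FFFF, has its highest set bit in the 21st position from the right, so every code point, supplementary or not, fits in the low 21 bits of an int.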
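The bit count can be checked with a short sketch. The class name CodePointBits and the helper bitsNeeded are assumptions for illustration; Integer.numberOfLeadingZeros, Character.MAX_CODE_POINT, and Character.isValidCodePoint are standard JDK members.

```java
public class CodePointBits {
    // Hypothetical helper: number of bits needed to write a
    // non-negative int in binary (0 needs 0 bits by this definition).
    public static int bitsNeeded(int value) {
        return 32 - Integer.numberOfLeadingZeros(value);
    }

    public static void main(String[] args) {
        System.out.println(bitsNeeded(0xFFFF));    // 16: largest BMP value
        System.out.println(bitsNeeded(0x10000));   // 17: smallest supplementary
        System.out.println(bitsNeeded(0x10FFFF));  // 21: largest code point

        // The upper 11 bits of the largest legal code point are all zero.
        System.out.println((Character.MAX_CODE_POINT >>> 21) == 0);   // true

        // One past the maximum is rejected by the JDK.
        System.out.println(Character.isValidCodePoint(0x110000));     // false
    }
}
```

Note that 21 bits could hold values up to 0x1FFFFF, so the "upper 11 bits are zero" condition is necessary but not sufficient; Character.isValidCodePoint checks the actual upper bound of 0x10FFFF.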

