Java In-Memory Encoding: Text Encoding in Java Memory

Latest recommended article published on 2023-06-17 08:00:00

leeloo deng


Tags: Java in-memory encoding

Article link: https://blog.csdn.net/weixin_42098759/article/details/114570216


1. Introduction to Encoding

1.1 Concepts in brief: character, character set, coded character set, code point, code unit, and character encoding scheme

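First, we need to be clear about the following concepts: character, character set, coded character set, code point, code unit, and character encoding scheme.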

A character is just an abstract minimal unit of text. It doesn't have a fixed shape (that would be a glyph), and it doesn't have a value. "A" is a character, and so is "€", the symbol for the common currency of Germany, France, and numerous other European countries.

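In short, a character is the smallest abstract unit of text. It has no concrete shape (shape belongs to the notion of a glyph). "A" is a character, and so is "€".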

A character set is a collection of characters. For example, the Han characters are the characters originally invented by the Chinese, which have been used to write Chinese, Japanese, Korean, and Vietnamese.

In short, a character set is a collection of characters.

A coded character set is a character set where each character has been assigned a unique number. At the core of the Unicode standard is a coded character set that assigns the letter "A" the number 0041 (hexadecimal) and the symbol "€" the number 20AC (hexadecimal). The Unicode standard always uses hexadecimal numbers, and writes them with the prefix "U+", so the number for "A" is written as "U+0041".

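In short, a coded character set is a character set in which every character has been assigned a unique number. The core of the Unicode standard is such a coded character set, in which "A" corresponds to 0041 (hexadecimal) and "€" corresponds to 20AC (hexadecimal). Unicode writes these numbers in hexadecimal with the prefix "U+"; for example, "A" is written as U+0041.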
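The mapping from a character to its Unicode number can be checked from Java. The following is a minimal sketch, not part of the original article: the class name `CodePointNotation` and the helper `toUPlus` are hypothetical names chosen for illustration, while `String.codePointAt` and `String.format` are standard JDK methods.

```java
public class CodePointNotation {
    // Hypothetical helper: formats the first code point of s in "U+XXXX" notation.
    public static String toUPlus(String s) {
        int codePoint = s.codePointAt(0);          // number assigned by the coded character set
        return String.format("U+%04X", codePoint); // hexadecimal, at least four digits
    }

    public static void main(String[] args) {
        System.out.println(toUPlus("A"));      // U+0041
        System.out.println(toUPlus("\u20AC")); // U+20AC (the euro sign)
    }
}
```

The euro sign is written as the escape `\u20AC` so that the example does not depend on the encoding of the source file.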

Code points are the numbers that can be used in a coded character set. A coded character set defines a range of valid code points, but doesn't necessarily assign characters to all those code points. The valid code points for Unicode are U+0000 to U+10FFFF. Unicode 4.0 assigns characters to 96,382 of these more than a million code points.
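The distinction between a valid code point and an assigned one can be probed with the JDK's `Character` class. This is a sketch under stated assumptions: the class `CodePointRange` and the helper `describe` are hypothetical names, and which code points count as assigned depends on the Unicode version bundled with the running JDK, so the totals will not necessarily match the Unicode 4.0 figure quoted above.

```java
public class CodePointRange {
    // Hypothetical helper: reports whether value lies in the valid code point range
    // and, if so, whether the JDK's Unicode data assigns a character to it.
    public static String describe(int value) {
        if (!Character.isValidCodePoint(value)) {
            return "not a code point";
        }
        return Character.isDefined(value) ? "assigned" : "unassigned";
    }

    public static void main(String[] args) {
        // The valid range, U+0000 to U+10FFFF, is exposed as constants.
        System.out.printf("U+%04X to U+%X%n",
                Character.MIN_CODE_POINT, Character.MAX_CODE_POINT);

        System.out.println(describe(0x20AC));   // the euro sign: assigned
        System.out.println(describe(0x110000)); // one past the maximum: not a code point
    }
}
```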
