Text Encoding in Java Memory

1. Introduction to Encoding

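1.1 Concepts in brief: character, character set, coded character set, code point, code unit, and character encoding scheme

The first step is to be clear about what the terms character, character set, coded character set, code point, code unit, and character encoding scheme each mean.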

A character is just an abstract minimal unit of text. It doesn't have a fixed shape (that would be a glyph), and it doesn't have a value. "A" is a character, and so is "€", the symbol for the common currency of Germany, France, and numerous other European countries.

In short: a character is the smallest abstract unit of text. It has no concrete shape (shape belongs to the glyph). "A" is a character, and so is "€".

A character set is a collection of characters. For example, the Han characters are the characters originally invented by the Chinese, which have been used to write Chinese, Japanese, Korean, and Vietnamese.

In short: a character set is a collection of characters.

A coded character set is a character set where each character has been assigned a unique number. At the core of the Unicode standard is a coded character set that assigns the letter "A" the hexadecimal number 0041 and the symbol "€" the hexadecimal number 20AC. The Unicode standard always uses hexadecimal numbers, and writes them with the prefix "U+", so the number for "A" is written as "U+0041".

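In short: a coded character set is a character set in which every character has been assigned a unique number. The core of the Unicode standard is such a coded character set, in which "A" maps to 0041 (hexadecimal) and "€" maps to 20AC (hexadecimal). Unicode writes these numbers in hexadecimal with the prefix "U+", so "A" is written as U+0041.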
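The mapping from characters to numbers can be checked from Java itself. The following is a minimal sketch, not part of the original article: the class name `CodePointNotation` and the helper `toUPlus` are made up for illustration, while `String.codePoints()` (Java 8+) and `Character.getName(int)` are standard JDK calls.

```java
public class CodePointNotation {
    // Hypothetical helper: formats a number in Unicode's "U+" notation,
    // padded to at least four hexadecimal digits.
    static String toUPlus(int codePoint) {
        return String.format("U+%04X", codePoint);
    }

    public static void main(String[] args) {
        // \u20AC is the euro sign; the escape avoids any dependency on
        // the encoding of this source file.
        String text = "A\u20AC";

        // Print the number assigned to each character, plus its Unicode name.
        text.codePoints().forEach(cp ->
                System.out.println(toUPlus(cp) + " " + Character.getName(cp)));
    }
}
```

Printing the character name instead of the character itself keeps the output independent of the console's encoding.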

Code points are the numbers that can be used in a coded character set. A coded character set defines a range of valid code points, but doesn't necessarily assign characters to all those code points. The valid code points for Unicode are U+0000 to U+10FFFF. Unicode 4.0 assigns characters to 96,382 of these more than a million code points.
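The difference between a valid code point and an assigned one can be seen with the JDK's `Character` class. This is a rough sketch under stated assumptions: `CodePointRange`, `inCodeSpace`, and `countDefined` are illustrative names, and the count it prints depends on the Unicode version bundled with the running JDK. `Character.isDefined` also counts private-use and surrogate code points, so the result will not match the 96,382 figure quoted for Unicode 4.0.

```java
public class CodePointRange {
    // True if the value lies inside the Unicode code space U+0000..U+10FFFF.
    static boolean inCodeSpace(int value) {
        return Character.isValidCodePoint(value);
    }

    // Counts the code points that the running JDK's Unicode tables treat as defined.
    // Note: this includes private-use and surrogate code points.
    static int countDefined() {
        int defined = 0;
        for (int cp = Character.MIN_CODE_POINT; cp <= Character.MAX_CODE_POINT; cp++) {
            if (Character.isDefined(cp)) {
                defined++;
            }
        }
        return defined;
    }

    public static void main(String[] args) {
        System.out.printf("Code space: U+%04X .. U+%04X%n",
                Character.MIN_CODE_POINT, Character.MAX_CODE_POINT);
        System.out.println("0x10FFFF valid? " + inCodeSpace(0x10FFFF));
        System.out.println("0x110000 valid? " + inCodeSpace(0x110000));

        int total = Character.MAX_CODE_POINT - Character.MIN_CODE_POINT + 1;
        System.out.println("Defined in this JDK: " + countDefined() + " of " + total);
    }
}
```

The gap between the two numbers on the last line is the set of code points that are valid but have no character assigned.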
