Demonstrating Charset Encodings


import java.nio.ByteBuffer;
import java.nio.charset.Charset;

/**
 * Charset encoding test. Run the same input string, which contains
 * some non-ascii characters, through several Charset encoders and dump out
 * the hex values of the resulting byte sequences.
 */
public class DecodeTest {
    public static void main(String[] args) {
        // This is the character sequence to encode
        String input = "\u00bfMa\u00f1ana?";
        String[] charsetNames = {
            "US-ASCII", "ISO-8859-1", "UTF-8", "UTF-16BE",
            "UTF-16LE", "UTF-16"
        };
        for (int i = 0; i < charsetNames.length; i++) {
            doEncode(Charset.forName(charsetNames[i]), input);
        }
    }

    private static void doEncode(Charset cs, String input) {
        ByteBuffer bb = cs.encode(input);
        System.out.println("Charset: " + cs.name());
        System.out.println(" input :" + input);
        System.out.println("Encoded: ");
        for (int i = 0; bb.hasRemaining(); i++) {
            int b = bb.get();
            // Mask to get the unsigned value of the byte (0-255)
            int ival = ((int) b) & 0xff;
            char c = (char) ival;
            // Keep tabular alignment pretty
            if (i < 10) System.out.print(" ");
            // Print index number
            System.out.print(" " + i + ": ");
            // Better formatted output is coming someday...
            if (ival < 16) System.out.print("0");
            // Print the hex value of the byte
            System.out.print(Integer.toHexString(ival));
            // If the byte seems to be the value of a
            // printable character, print it. No guarantee
            // it will be.
            if (Character.isWhitespace(c) || Character.isISOControl(c)) {
                System.out.println("");
            } else {
                System.out.println(" (" + c + ")");
            }
        }
        System.out.println("");
    }
}


Output

Charset: US-ASCII
input :¿Mañana?
Encoded:
0: 3f (?)
1: 4d (M)
2: 61 (a)
3: 3f (?)
4: 61 (a)
5: 6e (n)
6: 61 (a)
7: 3f (?)

Charset: ISO-8859-1
input :¿Mañana?
Encoded:
0: bf (¿)
1: 4d (M)
2: 61 (a)
3: f1 (ñ)
4: 61 (a)
5: 6e (n)
6: 61 (a)
7: 3f (?)

Charset: UTF-8
input :¿Mañana?
Encoded:
0: c2 (Â)
1: bf (¿)
2: 4d (M)
3: 61 (a)
4: c3 (Ã)
5: b1 (±)
6: 61 (a)
7: 6e (n)
8: 61 (a)
9: 3f (?)

Charset: UTF-16BE
input :¿Mañana?
Encoded:
0: 00
1: bf (¿)
2: 00
3: 4d (M)
4: 00
5: 61 (a)
6: 00
7: f1 (ñ)
8: 00
9: 61 (a)
10: 00
11: 6e (n)
12: 00
13: 61 (a)
14: 00
15: 3f (?)

Charset: UTF-16LE
input :¿Mañana?
Encoded:
0: bf (¿)
1: 00
2: 4d (M)
3: 00
4: 61 (a)
5: 00
6: f1 (ñ)
7: 00
8: 61 (a)
9: 00
10: 6e (n)
11: 00
12: 61 (a)
13: 00
14: 3f (?)
15: 00

Charset: UTF-16
input :¿Mañana?
Encoded:
0: fe (þ)
1: ff (ÿ)
2: 00
3: bf (¿)
4: 00
5: 4d (M)
6: 00
7: 61 (a)
8: 00
9: f1 (ñ)
10: 00
11: 61 (a)
12: 00
13: 6e (n)
14: 00
15: 61 (a)
16: 00
17: 3f (?)


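UTF-16BE and UTF-16LE encode each char (16-bit code unit) as a two-byte value and differ only in byte order. A decoder for such data must therefore either know in advance how the data was encoded or be able to determine the byte order from the encoded stream itself. The UTF-16 encoding recognizes a byte-order mark: the Unicode character \uFEFF. The byte-order mark has this special meaning only at the very beginning of an encoded stream; if the value appears later, it is mapped according to its defined Unicode value (zero-width no-break space). A foreign, little-endian system may prepend the mark in its own byte order (the bytes FF FE) and encode the stream as UTF-16LE. Because the UTF-16 encoding prepends and recognizes the byte-order mark, systems with different internal byte orders can exchange Unicode data.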

[table]
|UTF-16BE|No byte-order mark; big-endian byte order|
|UTF-16LE|No byte-order mark; little-endian byte order|
[/table]
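
The byte-order mark behavior described above can be sketched as follows. This is an illustrative example, not part of the original listing; the class name BomDemo and the method decodeUtf16 are hypothetical. It assumes the standard Java charset behavior in which the generic UTF-16 decoder consumes a leading byte-order mark to choose the byte order, while UTF-16BE treats the same bytes as an ordinary \uFEFF character.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only
class BomDemo {
    // Decode with the generic UTF-16 charset, which inspects a leading BOM
    static String decodeUtf16(byte[] bytes) {
        return StandardCharsets.UTF_16.decode(ByteBuffer.wrap(bytes)).toString();
    }

    // Decode with UTF-16BE, which does not give the BOM any special meaning
    static String decodeUtf16Be(byte[] bytes) {
        return StandardCharsets.UTF_16BE.decode(ByteBuffer.wrap(bytes)).toString();
    }

    public static void main(String[] args) {
        // "Ma" with a big-endian BOM (FE FF) and with a little-endian BOM (FF FE)
        byte[] bigEndian = { (byte) 0xfe, (byte) 0xff, 0x00, 0x4d, 0x00, 0x61 };
        byte[] littleEndian = { (byte) 0xff, (byte) 0xfe, 0x4d, 0x00, 0x61, 0x00 };

        // Both streams decode to the same text because the BOM selects the byte order
        System.out.println(decodeUtf16(bigEndian));
        System.out.println(decodeUtf16(littleEndian));

        // Under UTF-16BE the mark is kept as a character, so the result is one char longer
        String kept = decodeUtf16Be(bigEndian);
        System.out.println(kept.length() + " chars, first is U+"
            + Integer.toHexString(kept.charAt(0)).toUpperCase());
    }
}
```

Choosing the generic UTF-16 charset on the reading side is what lets the two systems interoperate without agreeing on a byte order in advance.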

For more information, see Chapter 6 of Java NIO, published by O'Reilly.