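The earliest was, of course, ASCII: codes 0-31 (plus 127) are control characters for peripherals such as printers, and codes 32-126 are the letters, digits, and symbols found on a keyboard.
That was not enough for Chinese, so GB2312 kept the single-byte ASCII range as is and added a second byte for each Chinese character.
GB2312 later ran short of characters (rare and traditional characters were missing), which led to GBK (GuoBiaoKuoZhan, "national standard extension") (see also: the Latin1 encoding).
With every country defining its own encoding things became chaotic, so a single worldwide standard emerged: Unicode, with its UTF encodings (UTF-8, UTF-16, and so on).
The full set of Chinese characters, including the scripts of ethnic minorities: GB18030.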
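To make the "keep the ASCII byte, add a second byte" idea concrete, here is a minimal sketch that prints the raw bytes of one letter and one Chinese character. The class name `AsciiCompat` and the helper `hex` are made up for illustration, and the sketch assumes the JDK in use ships the GB2312 charset (full JDK builds normally do).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class AsciiCompat {
    // Hypothetical helper: encodes s with cs and formats the bytes as
    // space-separated uppercase hex, e.g. "C4 E3".
    public static String hex(String s, Charset cs) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(cs)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Assumption: GB2312 is available in this JDK.
        Charset gb2312 = Charset.forName("GB2312");
        String ni = "\u4F60"; // the character 你, written as an escape so the source stays ASCII

        System.out.println(hex("A", StandardCharsets.US_ASCII)); // 41
        System.out.println(hex("A", gb2312));                    // 41
        System.out.println(hex("A", StandardCharsets.UTF_8));    // 41
        System.out.println(hex(ni, gb2312));                     // C4 E3
        System.out.println(hex(ni, StandardCharsets.UTF_8));     // E4 BD A0
    }
}
```

The letter keeps the same single byte in all three encodings, while the Chinese character becomes two bytes in GB2312 and three in UTF-8.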
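Some observations to start the discussion:
A common Chinese character takes 3 bytes in UTF-8 and 2 bytes in GBK.
An ASCII letter takes 1 byte in UTF-8 and 1 byte in GBK.
By that count, in UTF-8, 1 KB (1,024 bytes) holds about 341 Chinese characters or 1,024 letters, and 1 MB (1,048,576 bytes) holds about 349,525 Chinese characters or 1,048,576 letters.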
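The capacity arithmetic is plain integer division of the byte budget by the bytes per character. A small sketch, with a made-up class `Capacity` and helper `charsThatFit`, assuming 3 bytes per Chinese character and 1 byte per letter in UTF-8:

```java
public class Capacity {
    // Hypothetical helper: how many fixed-width characters fit in totalBytes.
    public static long charsThatFit(long totalBytes, int bytesPerChar) {
        return totalBytes / bytesPerChar;
    }

    public static void main(String[] args) {
        long kb = 1024L;
        long mb = 1024L * 1024L;

        System.out.println(charsThatFit(kb, 3)); // 341 Chinese characters per KB
        System.out.println(charsThatFit(kb, 1)); // 1024 letters per KB
        System.out.println(charsThatFit(mb, 3)); // 349525 Chinese characters per MB
        System.out.println(charsThatFit(mb, 1)); // 1048576 letters per MB
    }
}
```

Real text mixes character widths, so these figures are upper bounds for pure Chinese or pure ASCII content, not a prediction for an arbitrary file.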
public class CharMain {
    // Charsets to compare. Java's "utf-16" encoder prepends a 2-byte byte order mark (BOM).
    private static final String[] CHARSETS = {"utf-8", "utf-16", "GBK", "GB18030", "GB2312"};

    public static void main(String[] args) throws Exception {
        printLengths("你");
        printLengths("s");
        printLengths("3");
        printLengths("!");
        // Note: this is the 7-character ASCII string "0x20001", not the code point U+20001.
        printLengths("0x20001");
    }

    // Prints a header for the string, then its encoded length in bytes for each charset.
    private static void printLengths(String a) throws Exception {
        System.out.println("================" + a + "============");
        for (String charset : CHARSETS) {
            System.out.println(a.getBytes(charset).length);
        }
    }
}
Output:
================你============
3
4
2
2
2
================s============
1
4
1
1
1
================3============
1
4
1
1
1
================!============
1
4
1
1
1
================0x20001============
7
16 ("0x20001" here is a 7-character ASCII string: a 2-byte byte order mark plus 7 characters at 2 bytes each, 2 + 2*7 = 16)
7
7
7
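Two points in the output are worth checking separately. First, the 4 reported for utf-16 on a single character comes from the 2-byte byte order mark that Java's "UTF-16" encoder writes before the data. Second, the string "0x20001" is seven ASCII characters, so it never exercises the supplementary code point U+20001. The sketch below covers both; the class `Utf16AndSupplementary` and helper `len` are made up for illustration, and it assumes the JDK in use ships the GB18030 charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class Utf16AndSupplementary {
    // Hypothetical helper: encoded length of s in bytes.
    public static int len(String s, Charset cs) {
        return s.getBytes(cs).length;
    }

    // Builds the one-code-point string for U+20001 (a surrogate pair in Java).
    public static String supplementary() {
        return new String(Character.toChars(0x20001));
    }

    public static void main(String[] args) {
        // "UTF-16" writes a BOM first; the BE/LE variants do not.
        System.out.println(len("s", StandardCharsets.UTF_16));   // 4
        System.out.println(len("s", StandardCharsets.UTF_16BE)); // 2
        System.out.println(len("s", StandardCharsets.UTF_16LE)); // 2

        String supp = supplementary();
        System.out.println(supp.length());                          // 2 (two UTF-16 code units)
        System.out.println(supp.codePointCount(0, supp.length()));  // 1 (one code point)
        System.out.println(len(supp, StandardCharsets.UTF_8));      // 4
        System.out.println(len(supp, StandardCharsets.UTF_16BE));   // 4
        // Assumption: GB18030 is available in this JDK.
        System.out.println(len(supp, Charset.forName("GB18030")));  // 4
    }
}
```

So "3 bytes per Chinese character in UTF-8" holds for characters in the Basic Multilingual Plane, while characters beyond it take 4 bytes in UTF-8, UTF-16, and GB18030 alike.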