Java Character Encoding

Latest recommended article published on 2021-09-11 14:13:33

guo_yang


Article link: https://blog.csdn.net/guo_yang/article/details/82186070


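To understand character encoding in Java, a few concepts need to be clear first:
Character: a symbol that people use; a single sign in the abstract sense, independent of how it is stored.
Byte: a unit of computer storage made up of eight bits.
String: a sequence of characters.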

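Character set: defines which characters can be represented; in other words, it describes a range.
Encoding: specifies how each character is stored as bytes.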

What is usually called Unicode is a character set.
UTF-8 is one of several ways of encoding Unicode; UTF-16 and UTF-32 are others.
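
To make the difference between a character set and an encoding concrete, here is a small sketch that encodes the same character, 中 (U+4E2D), in three ways. The class name EncodingComparison and the helper toHex are my own names for illustration, and the sketch assumes the Java runtime supports the GB2312 charset (common JDK builds do).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodingComparison {
    // Formats a byte array as space-separated, two-digit uppercase hex values.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The character 中, written as an escape so the source file's own encoding does not matter.
        String ch = "\u4E2D";
        System.out.println("UTF-8:    " + toHex(ch.getBytes(StandardCharsets.UTF_8)));    // E4 B8 AD
        System.out.println("UTF-16BE: " + toHex(ch.getBytes(StandardCharsets.UTF_16BE))); // 4E 2D
        // Assumption: the runtime provides a charset named GB2312.
        System.out.println("GB2312:   " + toHex(ch.getBytes(Charset.forName("GB2312")))); // D6 D0
    }
}
```

The character is the same in all three cases; only the bytes used to store it change.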

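In Java, the string class java.lang.String works with Unicode strings. One way to think about it: every string in Java is held in Unicode as its internal code, whatever encoding the text originally came from. More precisely, a String is a sequence of UTF-16 code units, which for characters such as 中 and 文 are identical to their Unicode code points.
Suppose my Java source file is saved as UTF-8: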

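Java code

// Prints the "internal code" of a string as Java represents it, one hex value per char.
// Each char is a UTF-16 code unit, which equals the Unicode code point
// for characters in the Basic Multilingual Plane.
public static void printStrCoding(String st) {
    for (int i = 0; i < st.length(); i++) {
        int j = (int) st.charAt(i);
        System.out.println(Integer.toHexString(j));
    }
}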
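
One caveat about walking a string with charAt: it returns UTF-16 code units, so a character outside the Basic Multilingual Plane shows up as two surrogate values rather than one code point. The sketch below, using String.codePoints() (Java 8 and later), prints real code points instead. The class name CodePointDemo and the method codePointsHex are illustrative names of mine, not part of the original code.

```java
import java.util.List;
import java.util.stream.Collectors;

class CodePointDemo {
    // Returns the Unicode code points of a string as lowercase hex strings,
    // one entry per code point rather than one per char.
    static List<String> codePointsHex(String s) {
        return s.codePoints()
                .mapToObj(Integer::toHexString)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // 中 (U+4E2D) followed by U+1F600, which needs the surrogate pair D83D DE00 in UTF-16.
        String s = "\u4E2D\uD83D\uDE00";
        System.out.println(s.length());                      // 3 chars (UTF-16 code units)
        System.out.println(s.codePointCount(0, s.length())); // 2 code points
        System.out.println(codePointsHex(s));                // [4e2d, 1f600]
    }
}
```

For text made only of characters like 中文, both approaches print the same values.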


Java code

String name = "中文";
System.out.println(name.length()); // prints 2
byte[] bytes = name.getBytes("UTF-8"); // may throw UnsupportedEncodingException
System.out.println(bytes.length); // prints 6
for (int i = 0; i < bytes.length; i++) {
    // Mask with 0xFF so that a negative byte is not sign-extended to ffffffXX.
    int j = bytes[i] & 0xFF;
    System.out.println("coding: ------------------" + Integer.toHexString(j));
}
// UTF-8 encoding
// coding: ------------------e4
// coding: ------------------b8
// coding: ------------------ad
// coding: ------------------e6
// coding: ------------------96
// coding: ------------------87
printStrCoding(name);
// Unicode code points
// 4e2d
// 6587
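
A note on printing bytes as hex: Java's byte type is signed, so a byte with its high bit set is negative, and passing it straight to Integer.toHexString gives a sign-extended value such as ffffffe4 instead of e4. A minimal sketch of the difference follows; the class name ByteHexDemo and its two helper methods are my own illustrative names.

```java
class ByteHexDemo {
    // Widens the byte to int as-is; negative bytes are sign-extended.
    static String signExtendedHex(byte b) {
        return Integer.toHexString(b);
    }

    // Masks with 0xFF first, keeping only the low eight bits.
    static String maskedHex(byte b) {
        return Integer.toHexString(b & 0xFF);
    }

    public static void main(String[] args) {
        byte first = (byte) 0xE4; // first UTF-8 byte of 中
        System.out.println(signExtendedHex(first)); // ffffffe4
        System.out.println(maskedHex(first));       // e4
    }
}
```

Only the last two hex digits carry the actual byte value; the leading ffffff is an artifact of the widening conversion.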


As another test, save the two characters 中文 in a file encoded as GB2312:

Java code

File f = new File("/home/linwei007/temp/aa");
// FileReader decodes with the platform default charset, which evidently
// was not GB2312 on this machine, so the GB2312 bytes are misread.
BufferedReader in = new BufferedReader(new FileReader(f));
String rs = in.readLine(); // reads the two characters 中文
in.close();
System.out.println("coding: ------------------" + rs); // prints garbled text
printStrCoding(rs); // the output shows that Java did not recognize these characters
bytes = name.getBytes("GB2312"); // get the byte sequence in this encoding
System.out.println(bytes.length); // prints 4
for (int i = 0; i < bytes.length; i++) {
    int j = bytes[i] & 0xFF; // mask to avoid sign extension
    System.out.println("coding: ------------------" + Integer.toHexString(j));
}
// GB2312 encoding:
// coding: ------------------d6
// coding: ------------------d0
// coding: ------------------ce
// coding: ------------------c4
String newString = new String(bytes, "GB2312");
System.out.println("coding: ------------------" + newString);
// prints the correct Chinese text
printStrCoding(newString);
// Unicode code points
// 4e2d
// 6587


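Clearly, whichever encoding is used, the same character always has the same Unicode code point in Java. What matters is reading and writing with the correct encoding.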
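
As a rough sketch of what "the correct encoding" means in practice, the code below writes 中文 to a temporary file as GB2312 and decodes it back twice: once with the matching charset and once with UTF-8. The class name ExplicitCharsetIo and the method roundTrip are hypothetical names I chose for this example, and a temporary file stands in for the real path. The sketch again assumes the runtime supports GB2312.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class ExplicitCharsetIo {
    // Writes text to a temporary file using writeCharset, then decodes the
    // file's bytes using readCharset and returns the result.
    static String roundTrip(String text, Charset writeCharset, Charset readCharset) {
        try {
            Path tmp = Files.createTempFile("charset-demo", ".txt");
            try {
                Files.write(tmp, text.getBytes(writeCharset));
                return new String(Files.readAllBytes(tmp), readCharset);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String text = "\u4E2D\u6587"; // 中文
        Charset gb2312 = Charset.forName("GB2312"); // assumption: supported by this runtime

        // Matching charsets: the original text comes back unchanged.
        System.out.println(text.equals(roundTrip(text, gb2312, gb2312))); // true

        // Mismatched charsets: the GB2312 bytes are not valid UTF-8, so the result is garbled.
        System.out.println(text.equals(roundTrip(text, gb2312, StandardCharsets.UTF_8))); // false
    }
}
```

When reading line by line as in the test above, the same idea applies: wrapping a FileInputStream in new InputStreamReader(stream, "GB2312") names the charset explicitly instead of relying on the platform default that FileReader uses.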
