黑马程序员: Character Encoding and Decoding

dsp_001

Category: Java study. Tags: 黑马程序员, java, encoding and decoding, UTF-8

Article link: https://blog.csdn.net/dsp_001/article/details/8862001

For background on character sets and character encodings, see http://www.crifan.com/files/doc/docbook/char_encoding/release/html/char_encoding.html#enc_iso8859.

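Common code tables:

*ASCII: American Standard Code for Information Interchange. Each character fits in 7 bits of one byte. It cannot represent Chinese.

*ISO8859-1: the Latin (Western European) code table, historically also the default code table used by Tomcat. Each character uses all 8 bits of one byte. It cannot represent Chinese.

*GB2312: a Chinese code table.

*GBK: an extension of GB2312 that adds more Chinese characters.

*Unicode: the international standard character set, covering many scripts. Unicode is only a character set: it assigns each character a numeric code point but does not specify how that code point is stored as bytes. Java uses the Unicode character set.

*UTF-8: one of the encodings of Unicode. Its defining feature is that it is variable-length: a character takes 1 to 4 bytes, depending on the character.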
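To make the size differences between these code tables concrete, here is a minimal sketch that counts the bytes a character occupies in each one. The class name `CodeTableSizes` and the helper `byteLength` are made up for this illustration, and the sketch assumes the JDK in use ships the GBK charset. The Chinese character is written as a Unicode escape so the result does not depend on the encoding of the source file.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CodeTableSizes {
    // Number of bytes the given text occupies in the given charset.
    static int byteLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    public static void main(String[] args) {
        Charset gbk = Charset.forName("GBK");   // assumption: GBK is available in this JDK
        String latin = "A";
        String chinese = "\u4F60";              // first character of the test string used in this post
        String emoji = "\uD83D\uDE00";          // U+1F600, stored as a surrogate pair in a Java String

        System.out.println("A in ASCII:       " + byteLength(latin, StandardCharsets.US_ASCII));
        System.out.println("A in UTF-8:       " + byteLength(latin, StandardCharsets.UTF_8));
        System.out.println("U+4F60 in GBK:    " + byteLength(chinese, gbk));
        System.out.println("U+4F60 in UTF-8:  " + byteLength(chinese, StandardCharsets.UTF_8));
        System.out.println("U+1F600 in UTF-8: " + byteLength(emoji, StandardCharsets.UTF_8));
    }
}
```

The same Chinese character takes 2 bytes in GBK and 3 bytes in UTF-8, which is why bytes produced by one code table cannot simply be read with the other.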

Code 1:

public class Encode{
     public static void main(String[] args)throws Exception{
          String s1="你好";
          byte[] bs1=s1.getBytes();
          String s2=new String(bs1);
          System.out.println("s1="+s1);
          System.out.println("s2="+s2);
     }
}

Output:

s1=你好
s2=你好

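Explanation: s1.getBytes() encodes with the platform's default charset, which was GBK on the author's machine, and new String(bs1) decodes with that same default, so s2 equals s1. The default is not GBK everywhere (since JDK 18 it is UTF-8 unless configured otherwise), so the byte values shown in the later examples assume a GBK default.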
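Since the no-argument getBytes() and new String(byte[]) depend on the platform default, one way to make the round trip behave the same on every machine is to name the charset on both sides. The following is a sketch under that assumption; `ExplicitCharset` and `roundTrip` are hypothetical names used only here.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ExplicitCharset {
    // Encode and decode with the same, explicitly named charset.
    static String roundTrip(String text, Charset charset) {
        byte[] bytes = text.getBytes(charset);
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        String s1 = "\u4F60\u597D"; // the two-character test string from Code 1, as Unicode escapes

        // Shows which charset the no-argument methods would use on this machine.
        System.out.println("default charset = " + Charset.defaultCharset());

        // Both round trips restore the original string, whatever the default is.
        System.out.println(roundTrip(s1, Charset.forName("GBK")).equals(s1));
        System.out.println(roundTrip(s1, StandardCharsets.UTF_8).equals(s1));
    }
}
```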

Code 2:

import java.util.*;
public class Encode{
     public static void main(String[] args)throws Exception{
          String str1="你好";
          byte[] bytes1=str1.getBytes();
          System.out.println("str1="+str1);
          System.out.println("bytes1="+Arrays.toString(bytes1));

          String str2=new String(bytes1,"UTF-8");
          System.out.println("str2="+str2);
     }
}

Output:

str1=你好
bytes1=[-60, -29, -70, -61]
str2=???

Explanation: str1.getBytes() encodes str1 with the default charset, GBK, giving bytes1=[-60, -29, -70, -61]. new String(bytes1,"UTF-8") then wrongly decodes those bytes as UTF-8, which produces the garbled str2=???.
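new String(bytes, charset) never reports a decoding problem; it silently substitutes characters. If the goal is to detect a wrong charset rather than display garbage, a CharsetDecoder configured with CodingErrorAction.REPORT throws instead. This is a sketch, not part of the original post; `StrictDecode`, `decodeStrict` and `isValid` are illustrative names.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class StrictDecode {
    // Decode the bytes, throwing instead of substituting characters on bad input.
    static String decodeStrict(byte[] bytes, Charset charset) throws CharacterCodingException {
        return charset.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes))
                .toString();
    }

    // True if the bytes form a valid sequence in the given charset.
    static boolean isValid(byte[] bytes, Charset charset) {
        try {
            decodeStrict(bytes, charset);
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        Charset gbk = Charset.forName("GBK"); // assumption: GBK is available in this JDK
        byte[] gbkBytes = "\u4F60\u597D".getBytes(gbk); // same bytes as bytes1 in Code 2

        System.out.println("valid as UTF-8: " + isValid(gbkBytes, StandardCharsets.UTF_8));
        System.out.println("valid as GBK:   " + isValid(gbkBytes, gbk));
    }
}
```

Note that a successful strict decode only proves the bytes are well-formed in that charset, not that it is the charset they were written in.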

Code 3:

import java.util.*;
public class Encode{
     public static void main(String[] args)throws Exception{
          String str1="你好";
          byte[] bytes1=str1.getBytes();
          System.out.println("str1="+str1);
          System.out.println("bytes1="+Arrays.toString(bytes1));

          String str2=new String(bytes1,"ISO8859-1");
          System.out.println("str2="+str2);
          
          byte[] bytes2=str2.getBytes("ISO8859-1");
          String str3=new String(bytes2);
          System.out.println("bytes2="+Arrays.toString(bytes2));
          System.out.println("str3="+str3);
     }
}

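Output:

str1=你好
bytes1=[-60, -29, -70, -61]
str2=????
bytes2=[-60, -29, -70, -61]
str3=你好

Explanation: str1.getBytes() encodes with GBK, and new String(bytes1,"ISO8859-1") decodes with ISO8859-1, which yields the garbled str2=????. Because ISO8859-1 maps every byte value to exactly one character, str2.getBytes("ISO8859-1") encodes str2 back into bytes2, which is identical to bytes1. Decoding bytes2 with the correct charset, GBK (the default here), then gives str3, which equals the original str1.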
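The recovery in Code 3 works because decoding with ISO8859-1 and encoding again with ISO8859-1 returns exactly the bytes that went in. A small sketch can check this for all 256 byte values; `Latin1RoundTrip`, `allByteValues` and `throughLatin1` are names invented for this example.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Latin1RoundTrip {
    // Every possible byte value, 0x00 through 0xFF.
    static byte[] allByteValues() {
        byte[] all = new byte[256];
        for (int i = 0; i < all.length; i++) {
            all[i] = (byte) i;
        }
        return all;
    }

    // Decode as ISO-8859-1, then encode the resulting string as ISO-8859-1 again.
    static byte[] throughLatin1(byte[] bytes) {
        String decoded = new String(bytes, StandardCharsets.ISO_8859_1);
        return decoded.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] original = allByteValues();
        byte[] restored = throughLatin1(original);
        // Prints true: no byte value is lost on the way through ISO-8859-1.
        System.out.println(Arrays.equals(original, restored));
    }
}
```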

Code 3 shows one way to recover from garbled text. That approach does not work for the situation in Code 2, however, as the following code shows:

Code 4:

import java.util.*;
public class Encode{
     public static void main(String[] args)throws Exception{
          String str1="你好";
          byte[] bytes1=str1.getBytes();
          System.out.println("str1="+str1);
          System.out.println("bytes1="+Arrays.toString(bytes1));

          String str2=new String(bytes1,"UTF-8");
          System.out.println("str2="+str2);
 
          byte[] bytes2=str2.getBytes("UTF-8");
          String str3=new String(bytes2,"GBK");
          System.out.println("bytes2="+Arrays.toString(bytes2));
          System.out.println("str3="+str3);
     }
}

Output:

str1=你好
bytes1=[-60, -29, -70, -61]
str2=???
bytes2=[-17, -65, -67, -17, -65, -67, -17, -65, -67]
str3=锟斤拷锟?

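Explanation: Unlike Code 3, str3 here is garbled rather than equal to str1. The reason is that bytes1=[-60, -29, -70, -61] is not a valid UTF-8 byte sequence. When new String(bytes1,"UTF-8") finds no matching character, it substitutes the Unicode replacement character U+FFFD for each malformed sequence, which is why str2=???. str2.getBytes("UTF-8") then encodes those replacement characters, each as [-17, -65, -67], giving bytes2=[-17, -65, -67, -17, -65, -67, -17, -65, -67] instead of the original bytes1=[-60, -29, -70, -61]. The original bytes are gone, so decoding bytes2 with GBK cannot produce the desired result.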
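The loss of information described here can be checked directly. The sketch below (with the made-up names `ReplacementChar`, `decodeAsUtf8` and `onlyReplacementChars`) decodes the GBK bytes as UTF-8, confirms that nothing but U+FFFD survives, and shows that re-encoding does not give the original bytes back. It assumes the JDK in use ships the GBK charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ReplacementChar {
    static final char REPLACEMENT = '\uFFFD';

    // Lenient decode: malformed input is replaced, never reported.
    static String decodeAsUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // True if the string is non-empty and consists solely of U+FFFD.
    static boolean onlyReplacementChars(String s) {
        return !s.isEmpty() && s.chars().allMatch(c -> c == REPLACEMENT);
    }

    public static void main(String[] args) {
        byte[] gbkBytes = "\u4F60\u597D".getBytes(Charset.forName("GBK"));
        String decoded = decodeAsUtf8(gbkBytes);
        byte[] reencoded = decoded.getBytes(StandardCharsets.UTF_8);

        System.out.println("only U+FFFD left: " + onlyReplacementChars(decoded));
        System.out.println("U+FFFD in UTF-8:  "
                + Arrays.toString(String.valueOf(REPLACEMENT).getBytes(StandardCharsets.UTF_8)));
        System.out.println("bytes preserved:  " + Arrays.equals(gbkBytes, reencoded));
    }
}
```

Any malformed input collapses to the same replacement character, so there is nothing left from which to rebuild the original bytes. This is the difference from the ISO8859-1 case in Code 3.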
