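Sometimes when Java produces garbled text (mojibake), I want to know which charset the bytes were encoded with and which charset they were then decoded with, so that I can quickly adjust the encoding and fix the problem. But garbled text all looks like garbage, so how can you tell which two charsets were involved? I had some spare time today, so I wrote a demo to try it out. The code and results are below: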
```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// StringUtils is assumed to be the Apache Commons Lang 3 class.
import org.apache.commons.lang3.StringUtils;

public class CharsetDemo {
    public static void main(String[] args) {
        String source = "中文测试";
        Charset gbkCharset = Charset.forName("gbk");
        Charset utf8Charset = StandardCharsets.UTF_8;
        Charset iso88591Charset = StandardCharsets.ISO_8859_1;
        Charset defaultCharset = Charset.defaultCharset();
        System.out.printf("defaultCharset:%s%n", defaultCharset);
        System.out.println(StringUtils.repeat("=", 20));

        // Each label "A=>B" means: encode source to bytes with A, then decode those bytes as B.
        String str1 = StringUtils.toEncodedString(source.getBytes(gbkCharset), utf8Charset);
        System.out.printf("gbk=>utf-8:%s%n", str1);
        String str4 = StringUtils.toEncodedString(source.getBytes(utf8Charset), gbkCharset);
        System.out.printf("utf-8=>gbk:%s%n", str4);
        String str2 = StringUtils.toEncodedString(source.getBytes(iso88591Charset), utf8Charset);
        System.out.printf("iso8859-1=>utf-8:%s%n", str2);
        String str5 = StringUtils.toEncodedString(source.getBytes(utf8Charset), iso88591Charset);
        System.out.printf("utf-8=>iso8859-1:%s%n", str5);
        String str3 = StringUtils.toEncodedString(source.getBytes(gbkCharset), iso88591Charset);
        System.out.printf("gbk=>iso8859-1:%s%n", str3);
        String str6 = StringUtils.toEncodedString(source.getBytes(iso88591Charset), gbkCharset);
        System.out.printf("iso8859-1=>gbk:%s%n", str6);
    }
}
```
Output:
```
defaultCharset:UTF-8
=========================
gbk=>utf-8:���IJ���
utf-8=>gbk:涓枃娴嬭瘯
iso8859-1=>utf-8:????
utf-8=>iso8859-1:䏿æµè¯
gbk=>iso8859-1:ÖÐÎÄ²âÊÔ
iso8859-1=>gbk:????
=========================
```
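The two `????` rows and the `�` characters have a simple cause that shows up when you look at the raw bytes. The following is a minimal sketch, not part of the original demo; the class name `MojibakeBytes` and the helpers `hex` and `canEncode` are made up for illustration. It prints the bytes `getBytes` produces for each charset and whether the charset can represent the text at all.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MojibakeBytes {
    // Formats bytes as space-separated upper-case hex pairs, e.g. "E4 B8 AD".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // True if the charset has a byte sequence for every character of the text.
    static boolean canEncode(String text, Charset charset) {
        return charset.newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        String source = "中文测试";
        Charset[] charsets = {
            Charset.forName("GBK"), StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1
        };
        for (Charset cs : charsets) {
            System.out.printf("%-10s canEncode=%-5b bytes=%s%n",
                    cs.name(), canEncode(source, cs), hex(source.getBytes(cs)));
        }
    }
}
```

ISO-8859-1 has no Chinese characters, so `getBytes` substitutes `?` (byte `3F`) for each one and the ISO-8859-1 line shows `3F 3F 3F 3F`. The original text is already gone at that point, which is why both `iso8859-1=>...` rows print `????` no matter which charset decodes them. In the `gbk=>utf-8` row the loss happens on the decoding side instead: byte sequences that are not valid UTF-8 are replaced with `�` (U+FFFD).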
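I wrote the demo in IDEA, and the project's source files default to UTF-8. So can the characters in the garbled output be used to roughly work out which charset the text was encoded with and which one it was decoded with, and then adjust the encoding accordingly?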
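One way to act on that idea is to reverse the mistake: turn the garbled string back into bytes with the charset it was wrongly decoded as, then decode those bytes with each other candidate. Below is a minimal sketch under that assumption. The class `MojibakeRepair` and its methods `redecode` and `candidates` are hypothetical names, not part of the demo above, and the candidate list is just the three charsets used in this post.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class MojibakeRepair {
    // Undo one wrong decode: recover the bytes using the charset the text was
    // wrongly decoded with, then decode them with the charset that was intended.
    static String redecode(String garbled, Charset wronglyDecodedAs, Charset actual) {
        return new String(garbled.getBytes(wronglyDecodedAs), actual);
    }

    // Try every ordered pair of candidate charsets and keep the results that
    // survive without replacement characters. A human still picks the right one.
    static List<String> candidates(String garbled, List<Charset> charsets) {
        List<String> results = new ArrayList<>();
        for (Charset wrong : charsets) {
            // If this charset cannot even represent the garbled text,
            // the text cannot have come from decoding with it.
            if (!wrong.newEncoder().canEncode(garbled)) {
                continue;
            }
            for (Charset actual : charsets) {
                if (wrong.equals(actual)) {
                    continue;
                }
                String fixed = redecode(garbled, wrong, actual);
                if (fixed.indexOf('\uFFFD') < 0) {
                    results.add(wrong.name() + " -> " + actual.name() + ": " + fixed);
                }
            }
        }
        return results;
    }

    public static void main(String[] args) {
        // Reproduce the "utf-8=>iso8859-1" row, then try to repair it.
        String garbled = new String("中文测试".getBytes(StandardCharsets.UTF_8),
                StandardCharsets.ISO_8859_1);
        List<Charset> charsets = Arrays.asList(
                Charset.forName("GBK"), StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);
        candidates(garbled, charsets).forEach(System.out::println);
    }
}
```

This only works when the wrong decode kept every byte. ISO-8859-1 maps each byte to exactly one character, so text wrongly decoded as ISO-8859-1 can always be turned back into the original bytes. Text that already contains `?` or `�` has lost information and cannot be repaired this way. I have not checked whether the `utf-8=>gbk` case always round-trips; that depends on every byte pair being a valid GBK character.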
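Disclaimer:
I have not written systematic unit tests for this, and I do not know how reliable the approach is. It is for reference only.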