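Garbled text (mojibake) is a common problem in web development. The fix is to specify the character encoding explicitly whenever a byte array is decoded into a string.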
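As a minimal sketch of that idea (the class name DecodeSketch, the helper decode, and the sample text are illustrative and not part of the original test), the following decodes the same GBK bytes twice: once with the matching charset and once with a mismatched one. It assumes the running JRE provides the GBK charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DecodeSketch {
    // Decode raw bytes with an explicitly named charset
    // instead of relying on the platform default.
    public static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // Assumption: the JRE ships the GBK charset.
        Charset gbk = Charset.forName("GBK");

        // "\u4f60\u597d" is the two-character greeting used in the test string,
        // written with Unicode escapes so the source file encoding does not matter.
        byte[] bytes = "\u4f60\u597d".getBytes(gbk);

        // Matching charset: the original text comes back.
        System.out.println(decode(bytes, gbk));
        // Mismatched charset: each byte becomes an unrelated Latin-1 character.
        System.out.println(decode(bytes, StandardCharsets.ISO_8859_1));
    }
}
```

The point of the sketch is only that the charset passed at decode time must be the one that produced the bytes; the bytes themselves carry no record of their encoding.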
Testing...
public class CodeTest {
    public static void main(String[] args) {
        execute();
    }

    private static void execute() {
        String s = "hello,你好!";
        byte[] bytesISO8859 = null;
        byte[] bytesGBK = null;
        try {
            bytesISO8859 = s.getBytes("iso-8859-1");
            bytesGBK = s.getBytes("GBK");
        } catch (java.io.UnsupportedEncodingException e) {
            e.printStackTrace();
            return; // nothing to print if a charset is unavailable
        }
        System.out.println("--------------\n8859 bytes:");
        System.out.println("bytes is: " + arrayToString(bytesISO8859));
        System.out.println("hex format is: " + encodeHex(bytesISO8859));
        System.out.println();
        System.out.println("--------------\nGBK bytes:");
        System.out.println("bytes is: " + arrayToString(bytesGBK));
        System.out.println("hex format is: " + encodeHex(bytesGBK));
    }

    public static final String encodeHex(byte[] bytes) {
        StringBuilder buff = new StringBuilder(bytes.length * 3);
        for (int i = 0; i < bytes.length; i++) {
            // A byte is 8 bits, i.e. two hex digits. Integer.toHexString widens it
            // to a 32-bit int, so a negative byte is sign-extended to eight hex
            // digits (e.g. ffffffa3). Masking with 0xff keeps only the low 8 bits.
            String b = Integer.toHexString(bytes[i] & 0xff);
            // Pad values below 0x10 so that every byte prints as two digits.
            if (b.length() < 2) {
                buff.append('0');
            }
            buff.append(b);
            buff.append(" ");
        }
        return buff.toString();
    }

    public static final String arrayToString(byte[] bytes) {
        StringBuilder buff = new StringBuilder();
        for (int i = 0; i < bytes.length; i++) {
            buff.append(bytes[i]).append(" ");
        }
        return buff.toString();
    }
}
Result:
--------------
8859 bytes:
bytes is: 104 101 108 108 111 63 63 63 63
hex format is: 68 65 6c 6c 6f 3f 3f 3f 3f
--------------
GBK bytes:
bytes is: 104 101 108 108 111 -93 -84 -60 -29 -70 -61 -93 -95
hex format is: 68 65 6c 6c 6f a3 ac c4 e3 ba c3 a3 a1
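As the output shows, the ISO-8859-1 byte array extracted from s has length 9, and every character that ISO-8859-1 cannot represent (the two Chinese characters plus the full-width comma and exclamation mark) has become 63, which is the ASCII code for "?". The GBK array, by contrast, has 13 bytes, because each of those four characters is encoded as two bytes. This is why software written abroad often shows garbled text full of "?" when it runs in a Chinese-language environment: the encoding was not handled correctly.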
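The substitution with "?" is lossy: once the bytes have been produced, the original characters cannot be recovered. The sketch below illustrates this and shows one way to detect the problem before encoding. The class name LossyEncodeSketch and the helpers canEncode and roundTrip are hypothetical names introduced for illustration; the test string is the same one used above, written with Unicode escapes.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class LossyEncodeSketch {
    // True if every character of s can be represented in the given charset.
    public static boolean canEncode(String s, Charset charset) {
        return charset.newEncoder().canEncode(s);
    }

    // Encode to bytes and decode again with the same charset.
    public static String roundTrip(String s, Charset charset) {
        return new String(s.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        // Same text as the test string: "hello" followed by a full-width comma,
        // two Chinese characters, and a full-width exclamation mark.
        String s = "hello\uff0c\u4f60\u597d\uff01";

        // ISO-8859-1 cannot represent the last four characters.
        System.out.println(canEncode(s, StandardCharsets.ISO_8859_1)); // false

        // getBytes replaces each unmappable character with '?', so the text is lost.
        System.out.println(roundTrip(s, StandardCharsets.ISO_8859_1)); // hello????

        // A charset that covers the characters round-trips without loss.
        System.out.println(roundTrip(s, StandardCharsets.UTF_8).equals(s)); // true
    }
}
```

Checking canEncode first is one possible design choice; another is simply to use an encoding such as UTF-8 or GBK that covers the text on both the writing and the reading side.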