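UTF-8 in Java bytecode
According to Oracle's official documentation, the UTF-8 encoding used for strings in the constant pool is a modified form of UTF-8: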
String content is encoded in modified UTF-8. Modified UTF-8 strings are encoded so that code point sequences that contain only non-null ASCII characters can be represented using only 1 byte per code point, but all code points in the Unicode codespace can be represented.
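To see where the modified form departs from standard UTF-8, here is a minimal sketch that relies on `java.io.DataOutputStream.writeUTF`, which is documented to write modified UTF-8. The class name `ModifiedUtf8Demo` and the method `encode` are made up for this illustration, and the sketch skips the 2-byte length prefix that `writeUTF` writes before the payload.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical demo class, not part of the original post.
class ModifiedUtf8Demo {
    // Encodes s as modified UTF-8 and returns the payload bytes as spaced hex.
    static String encode(String s) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeUTF(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        byte[] all = buf.toByteArray();
        StringBuilder sb = new StringBuilder();
        // Start at index 2 to skip the length prefix written by writeUTF.
        for (int i = 2; i < all.length; i++) {
            if (i > 2) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", all[i] & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode("java"));         // 6A 61 76 61
        System.out.println(encode("\0"));           // C0 80
        System.out.println(encode("\uD83D\uDE00")); // ED A0 BD ED B8 80
    }
}
```

Non-null ASCII takes one byte per character, the null character takes two bytes instead of a single 00, and a supplementary character such as U+1F600 is written as two 3-byte sequences (one per surrogate) rather than the 4-byte form of standard UTF-8.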
Encoding example
6A 61 76 61 2F 6C 61 6E 67 2F 4F 62 6A 65 63 74
Converted to binary:
01101010
01100001
01110110
01100001
00101111
01101100
01100001
01101110
01100111
00101111
01001111
01100010
01101010
01100101
01100011
01110100
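The highest bit of every byte is 0, so each byte is a single-byte encoding of one ASCII character.
Looking each value up in the ASCII table gives: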
java/lang/Object
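The manual lookup can also be done in code. The following is a minimal sketch that only handles the single-byte case: it checks that the highest bit of every byte is clear and then decodes the bytes as ASCII. The names `AsciiConstantDecoder`, `parseHex`, `isSingleByteOnly` and `decode` are made up for this example.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of the original post.
class AsciiConstantDecoder {
    // Parses space-separated hex pairs such as "6A 61" into raw bytes.
    static byte[] parseHex(String hex) {
        String[] parts = hex.trim().split("\\s+");
        byte[] out = new byte[parts.length];
        for (int i = 0; i < parts.length; i++) {
            out[i] = (byte) Integer.parseInt(parts[i], 16);
        }
        return out;
    }

    // True when every byte has its highest bit clear, i.e. the single-byte form.
    static boolean isSingleByteOnly(byte[] bytes) {
        for (byte b : bytes) {
            if ((b & 0x80) != 0) {
                return false;
            }
        }
        return true;
    }

    // Decodes the hex dump as ASCII; multi-byte sequences are out of scope here.
    static String decode(String hex) {
        byte[] bytes = parseHex(hex);
        if (!isSingleByteOnly(bytes)) {
            throw new IllegalArgumentException("multi-byte sequence, not handled by this sketch");
        }
        return new String(bytes, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        // prints java/lang/Object
        System.out.println(decode("6A 61 76 61 2F 6C 61 6E 67 2F 4F 62 6A 65 63 74"));
    }
}
```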
Helper code
Feel free to take it if you need it.
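// Converts each byte to its 8-bit binary string, left-padded with zeros.
// Returns null for a null or empty array.
public static String hexArray2Binary(byte[] bytes) {
    if (bytes == null || bytes.length == 0) {
        return null;
    }
    // StringBuilder is enough here: the buffer is never shared between threads.
    StringBuilder sb = new StringBuilder(bytes.length * 8);
    for (int i = 0; i < bytes.length; i++) {
        // Mask with 0xFF so negative bytes are not sign-extended to 32 bits.
        String str = Integer.toBinaryString(bytes[i] & 0xFF);
        // toBinaryString drops leading zeros, so pad back to 8 digits.
        for (int j = str.length(); j < 8; j++) {
            sb.append('0');
        }
        sb.append(str);
    }
    return sb.toString();
}

// Convenience overload for a single byte.
public static String hex2Binary(byte b) {
    return hexArray2Binary(new byte[] { b });
}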
// In the main method
String s2 = HexUtil.deleteWhitespace("6A 61 76 61 2F 6C 61 6E 67 2F 4F 62 6A 65 63 74");
System.out.println(s2);
byte[] bytes = HexUtil.hex2byte(s2);
System.out.println("bytes length " + bytes.length);
String s1 = HexUtil.hexArray2Binary(bytes);
System.out.println(s1);
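The snippet above calls `HexUtil.deleteWhitespace` and `HexUtil.hex2byte`, whose source is not included in this post. As a rough sketch of what they might do, assuming `deleteWhitespace` strips all whitespace and `hex2byte` turns each pair of hex digits into one byte (the class name `HexUtilSketch` is made up so it is not mistaken for the real `HexUtil`):

```java
// Hypothetical stand-ins for the HexUtil helpers; the real ones may differ.
class HexUtilSketch {
    // Removes every whitespace character, e.g. "6A 61" becomes "6A61".
    static String deleteWhitespace(String s) {
        return s.replaceAll("\\s+", "");
    }

    // Converts a hex string with an even number of digits into bytes.
    static byte[] hex2byte(String hex) {
        if (hex.length() % 2 != 0) {
            throw new IllegalArgumentException("hex string must have an even length");
        }
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            // Each byte comes from two consecutive hex digits.
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    public static void main(String[] args) {
        String compact = deleteWhitespace("6A 61 76 61");
        System.out.println(compact);                  // 6A61766A would be wrong; this prints 6A617661
        System.out.println(hex2byte(compact).length); // 4
    }
}
```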