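Having looked at ClassVisitor before, I got curious about how Java relates to ASTs. For JavaScript, the relationship with the AST is explained here: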
https://segmentfault.com/a/1190000016231512
Hmm, it turns out I had it mixed up: it is JavaCC that is related to AST syntax trees. For reference, see
also my private JavaCC notes.
Summary
Always draw diagrams: once I drew a mind map, everything felt much clearer.
Bytecode structure reference:
Chapter 4. The class File Format
Or the tutorial: Java ASM series (004): ClassFile quick reference [with source code], lsieun, 51CTO blog
Code on GitHub: java8-classfile-tutorial: Java ClassFile
Mirror on GitHub: https://github.com/CodePpoi/java8-classfile-tutorial-master
A .class file follows the data structure below:
ClassFile {
u4 magic; // magic number, always 0xCAFEBABE
u2 minor_version; // u2 = 2 unsigned bytes, u4 = 4 unsigned bytes
u2 major_version;
u2 constant_pool_count;
cp_info constant_pool[constant_pool_count-1]; // cp_info is a composite type, but it is itself built from u1/u2/u4 fields; valid indexes run from 1 to constant_pool_count-1
u2 access_flags;
u2 this_class;
u2 super_class;
u2 interfaces_count;
u2 interfaces[interfaces_count];
u2 fields_count;
field_info fields[fields_count];
u2 methods_count;
method_info methods[methods_count];
u2 attributes_count;
attribute_info attributes[attributes_count];
}
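To see the first fields of this structure in action, here is a minimal sketch that reads magic, minor_version, major_version and constant_pool_count from raw bytes. The class name ClassHeaderReader and its hex helper are my own, for illustration only; with a real file the bytes would come from something like Files.readAllBytes on a hypothetical HelloWorld.class path. It relies on java.nio.ByteBuffer being big-endian by default, which matches the class file's byte order.

```java
import java.nio.ByteBuffer;

// Sketch: reads the fixed-size start of a ClassFile (class and method names are illustrative).
final class ClassHeaderReader {

    static final class Header {
        final long magic;            // u4, kept unsigned in a long
        final int minorVersion;      // u2
        final int majorVersion;      // u2
        final int constantPoolCount; // u2

        Header(long magic, int minorVersion, int majorVersion, int constantPoolCount) {
            this.magic = magic;
            this.minorVersion = minorVersion;
            this.majorVersion = majorVersion;
            this.constantPoolCount = constantPoolCount;
        }
    }

    // Converts a hex dump such as "CAFEBABE" into raw bytes.
    static byte[] hex(String s) {
        byte[] out = new byte[s.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(s.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    static Header read(byte[] classBytes) {
        ByteBuffer buf = ByteBuffer.wrap(classBytes); // big-endian by default
        long magic = buf.getInt() & 0xFFFFFFFFL;      // mask so u4 stays unsigned
        int minor = buf.getShort() & 0xFFFF;          // mask so u2 stays unsigned
        int major = buf.getShort() & 0xFFFF;
        int cpCount = buf.getShort() & 0xFFFF;
        if (magic != 0xCAFEBABEL) {
            throw new IllegalArgumentException("not a class file");
        }
        return new Header(magic, minor, major, cpCount);
    }

    public static void main(String[] args) {
        // First 10 bytes of the HelloWorld dump: magic, minor/major version, constant_pool_count
        Header h = read(hex("CAFEBABE00000034001C"));
        System.out.printf("magic=%X minor=%d major=%d constant_pool_count=%d%n",
                h.magic, h.minorVersion, h.majorVersion, h.constantPoolCount);
    }
}
```

major_version 0x34 = 52 corresponds to Java 8, and constant_pool_count 0x1C = 28 means the pool holds entries #1 to #27.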
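Besides ASM, there are other bytecode manipulation libraries, such as Javassist.
At the lowest layer sits the JVM Specification: every class file must follow the ClassFile format it defines.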
Let's analyze the bytecode of the following code:
package sample;

import java.io.Serializable;

public class HelloWorld extends Exception implements Cloneable, Serializable {
    private static final int intValue = 10;

    public void test() {
        int a = 1;
        int b = 2;
        int c = a + b;
    }
}
Its bytecode, dumped section by section:
magic
CAFEBABE
compiler_version
00000034
constant_pool_count
001C
constant_pool
|001| 0A00030017
|002| 070018
|003| 070019
|004| 07001A
|005| 07001B
|006| 010008696E7456616C7565
|007| 01000149
|008| 01000D436F6E7374616E7456616C7565
|009| 030000000A
|010| 0100063C696E69743E
|011| 010003282956
|012| 010004436F6465
|013| 01000F4C696E654E756D6265725461626C65
|014| 0100124C6F63616C5661726961626C655461626C65
|015| 01000474686973
|016| 0100134C73616D706C652F48656C6C6F576F726C643B
|017| 01000474657374
|018| 01000161
|019| 01000162
|020| 01000163
|021| 01000A536F7572636546696C65
|022| 01000F48656C6C6F576F726C642E6A617661
|023| 0C000A000B
|024| 01001173616D706C652F48656C6C6F576F726C64
|025| 0100136A6176612F6C616E672F457863657074696F6E
|026| 0100136A6176612F6C616E672F436C6F6E6561626C65
|027| 0100146A6176612F696F2F53657269616C697A61626C65
class_info
002100020003000200040005
fields_count
0001
fields
|000| 001A0006000700010008000000020009
methods_count
0002
methods
|000| 0001000A000B0001000C0000002F00010001000000052AB70001B100000002000D00000006000100000005000E0000000C000100000005000F00100000
|001| 00010011000B0001000C0000005D0002000400000009043C053D1B1C603EB100000002000D000000120004000000090002000A0004000B0008000C000E0000002A000400000009000F00100000000200070012000700010004000500130007000200080001001400070003
attributes_count
0001
attributes
|000| 0015000000020016
First, let's go through the constant pool entries in order:
|001| CONSTANT_Methodref {Value='#3.#23', HexCode='0A00030017'}
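HexCode is the entry's bytes in hexadecimal. 0A (10) is the tag for CONSTANT_Methodref, and 0003 and 0017 are indexes into the constant pool: class_index #3 (java/lang/Exception) and name_and_type_index #23 (<init>:()V), i.e. the superclass constructor that HelloWorld's constructor calls.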
|002| CONSTANT_Class {Value='#24', HexCode='070018'}
|003| CONSTANT_Class {Value='#25', HexCode='070019'}
|004| CONSTANT_Class {Value='#26', HexCode='07001A'}
|005| CONSTANT_Class {Value='#27', HexCode='07001B'}
|006| CONSTANT_Utf8 {Value='intValue', HexCode='010008696E7456616C7565'}
|007| CONSTANT_Utf8 {Value='I', HexCode='01000149'}
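In these HexCodes, 07 is the tag for CONSTANT_Class, and the u2 after it is the index of the class name, so 0018 points to entry #24. 01 is the tag for CONSTANT_Utf8: it is followed by a u2 byte length and then the string bytes, so 010008696E7456616C7565 is 8 bytes spelling "intValue".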
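A rough sketch of decoding one constant pool entry by its leading u1 tag. It covers only the five tags that appear in this HelloWorld (Utf8 = 1, Integer = 3, Class = 7, Methodref = 10, NameAndType = 12); ConstantPoolEntryDecoder is a hypothetical name. The class file actually stores strings as "modified UTF-8", which coincides with plain UTF-8 for ASCII names like these, so the sketch simply decodes them as UTF-8.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Sketch: decodes a single constant pool entry (hypothetical helper, HelloWorld's tags only).
final class ConstantPoolEntryDecoder {

    // Converts a hex dump such as "070018" into raw bytes.
    static byte[] hex(String s) {
        byte[] out = new byte[s.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(s.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    static String decode(byte[] entry) {
        ByteBuffer buf = ByteBuffer.wrap(entry);
        int tag = buf.get() & 0xFF; // u1 tag selects the layout of the rest
        switch (tag) {
            case 1: { // CONSTANT_Utf8: u2 length, then that many bytes
                byte[] data = new byte[buf.getShort() & 0xFFFF];
                buf.get(data);
                // Really "modified UTF-8"; identical to UTF-8 for plain ASCII names
                return "CONSTANT_Utf8 '" + new String(data, StandardCharsets.UTF_8) + "'";
            }
            case 3: // CONSTANT_Integer: u4 big-endian value
                return "CONSTANT_Integer " + buf.getInt();
            case 7: // CONSTANT_Class: u2 name_index, pointing to a Utf8 entry
                return "CONSTANT_Class #" + (buf.getShort() & 0xFFFF);
            case 10: { // CONSTANT_Methodref: u2 class_index, u2 name_and_type_index
                int classIndex = buf.getShort() & 0xFFFF;
                int nameAndTypeIndex = buf.getShort() & 0xFFFF;
                return "CONSTANT_Methodref #" + classIndex + ".#" + nameAndTypeIndex;
            }
            case 12: { // CONSTANT_NameAndType: u2 name_index, u2 descriptor_index
                int nameIndex = buf.getShort() & 0xFFFF;
                int descriptorIndex = buf.getShort() & 0xFFFF;
                return "CONSTANT_NameAndType #" + nameIndex + ":#" + descriptorIndex;
            }
            default:
                throw new IllegalArgumentException("tag " + tag + " not handled in this sketch");
        }
    }

    public static void main(String[] args) {
        String[] samples = {"0A00030017", "070018", "010008696E7456616C7565", "030000000A", "0C000A000B"};
        for (String s : samples) {
            System.out.println(s + " -> " + decode(hex(s)));
        }
    }
}
```

A complete parser would need the remaining tags too, and must remember that CONSTANT_Long and CONSTANT_Double entries each occupy two constant pool slots.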
Entry #6, intValue, is the name of a field. A field follows this structure:
field_info {
u2 access_flags; // public, private, etc.
u2 name_index;
u2 descriptor_index;
u2 attributes_count;
attribute_info attributes[attributes_count];
}
The field and its breakdown:
|000| intValue:I
HexCode: 001A0006000700010008000000020009
access_flags='001A'([ACC_PRIVATE,ACC_STATIC,ACC_FINAL])
name_index='0006'(#6)
descriptor_index='0007'(#7)
attributes_count='0001'(1)
--->ConstantValue=0008000000020009
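A sketch of parsing this field_info by hand, including turning access_flags 0x001A into flag names (0x0002 + 0x0008 + 0x0010). FieldInfoParser is a made-up name, and attribute bodies are printed as raw hex rather than interpreted, because knowing that #8 means ConstantValue would require looking it up in the constant pool.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Sketch: parses one field_info (hypothetical helper; attribute bodies kept as hex).
final class FieldInfoParser {

    // Field access flags and their bit masks, as listed in the JVM spec
    private static final int[] MASKS = {0x0001, 0x0002, 0x0004, 0x0008, 0x0010, 0x0040, 0x0080, 0x1000, 0x4000};
    private static final String[] NAMES = {"ACC_PUBLIC", "ACC_PRIVATE", "ACC_PROTECTED", "ACC_STATIC",
            "ACC_FINAL", "ACC_VOLATILE", "ACC_TRANSIENT", "ACC_SYNTHETIC", "ACC_ENUM"};

    static byte[] hex(String s) {
        byte[] out = new byte[s.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(s.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    // Lists every flag whose bit is set in access_flags.
    static List<String> flags(int accessFlags) {
        List<String> names = new ArrayList<>();
        for (int i = 0; i < MASKS.length; i++) {
            if ((accessFlags & MASKS[i]) != 0) {
                names.add(NAMES[i]);
            }
        }
        return names;
    }

    static String parse(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        int access = buf.getShort() & 0xFFFF;
        int nameIndex = buf.getShort() & 0xFFFF;
        int descriptorIndex = buf.getShort() & 0xFFFF;
        int attributesCount = buf.getShort() & 0xFFFF;
        StringBuilder sb = new StringBuilder();
        sb.append(flags(access)).append(" name=#").append(nameIndex)
          .append(" descriptor=#").append(descriptorIndex);
        for (int i = 0; i < attributesCount; i++) {
            int attrNameIndex = buf.getShort() & 0xFFFF; // u2 attribute_name_index
            int attrLength = buf.getInt();               // u4 attribute_length (small here)
            StringBuilder info = new StringBuilder();
            for (int j = 0; j < attrLength; j++) {
                info.append(String.format("%02X", buf.get() & 0xFF));
            }
            sb.append(" attr#").append(attrNameIndex).append('[').append(info).append(']');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(parse(hex("001A0006000700010008000000020009")));
    }
}
```

For intValue this yields one attribute, #8 with the 2-byte body 0009: since #8 is "ConstantValue", that body is constantvalue_index #9.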
Entry #7, "I", is the field descriptor for type int.
Next, look at these entries:
|008| CONSTANT_Utf8 {Value='ConstantValue', HexCode='01000D436F6E7374616E7456616C7565'}
|009| CONSTANT_Integer {Value='10', HexCode='030000000A'}
|010| CONSTANT_Utf8 {Value='<init>', HexCode='0100063C696E69743E'}
|011| CONSTANT_Utf8 {Value='()V', HexCode='010003282956'}
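#9, the CONSTANT_Integer 10, is the initial value of intValue. The field's ConstantValue attribute 0008000000020009 links the two: attribute_name_index #8 ("ConstantValue"), attribute_length 2, constantvalue_index #9.
#10, <init>, is the special method name of the no-argument constructor (generated by the compiler here, since HelloWorld declares none).
#11 is not test() itself but the method descriptor ()V, which <init> and test() share: () means no parameters and V means the return type is void. In its HexCode, 28 and 29 are the ASCII codes of '(' and ')', and the final 56 is the ASCII code of 'V'.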
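Descriptors such as I, ()V and Lsample/HelloWorld; follow a small grammar, so a short recursive function can decode them. This sketch (DescriptorDecoder is a hypothetical name) handles the primitive letters, L...; class types and [ array types, and renders methods as "returnType (params)".

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: turns JVM descriptors into Java-like type names (hypothetical helper).
final class DescriptorDecoder {

    // Decodes one type starting at pos[0] and advances pos[0] past it.
    private static String type(String d, int[] pos) {
        char c = d.charAt(pos[0]++);
        switch (c) {
            case 'B': return "byte";
            case 'C': return "char";
            case 'D': return "double";
            case 'F': return "float";
            case 'I': return "int";
            case 'J': return "long";
            case 'S': return "short";
            case 'Z': return "boolean";
            case 'V': return "void"; // only legal as a method return type
            case 'L': { // Lpkg/Name; becomes pkg.Name
                int end = d.indexOf(';', pos[0]);
                String name = d.substring(pos[0], end).replace('/', '.');
                pos[0] = end + 1;
                return name;
            }
            case '[': return type(d, pos) + "[]"; // array of the type that follows
            default: throw new IllegalArgumentException("bad descriptor: " + d);
        }
    }

    // Field descriptor, e.g. "I" or "Lsample/HelloWorld;".
    static String field(String d) {
        return type(d, new int[] {0});
    }

    // Method descriptor, e.g. "()V" becomes "void ()".
    static String method(String d) {
        if (d.isEmpty() || d.charAt(0) != '(') {
            throw new IllegalArgumentException("not a method descriptor: " + d);
        }
        int[] pos = {1};
        List<String> params = new ArrayList<>();
        while (d.charAt(pos[0]) != ')') {
            params.add(type(d, pos));
        }
        pos[0]++; // skip ')'
        return type(d, pos) + " (" + String.join(", ", params) + ")";
    }

    public static void main(String[] args) {
        System.out.println(field("I"));
        System.out.println(field("Lsample/HelloWorld;"));
        System.out.println(method("()V"));
        System.out.println(method("(ILjava/lang/String;[J)Z"));
    }
}
```

The last call is an invented descriptor, not from HelloWorld, included only to exercise parameters, class types and arrays.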
Methods must follow the method_info structure:
method_info {
u2 access_flags;
u2 name_index;
u2 descriptor_index;
u2 attributes_count;
attribute_info attributes[attributes_count];
}
The <init> method breaks down as follows:
methods_count='0002' (2)
methods
|000| <init>:()V
HexCode: 0001000A000B0001000C0000002F00010001000000052AB70001B100000002000D00000006000100000005000E0000000C000100000005000F00100000
access_flags='0001'([ACC_PUBLIC])
name_index='000A'(#10)
descriptor_index='000B'(#11)
attributes_count='0001'(1)
--->Code=000C0000002F00010001000000052AB70001B100000002000D00000006000100000005000E0000000C000100000005000F00100000
The test method breaks down as follows:
|001| test:()V
HexCode: 00010011000B0001000C0000005D0002000400000009043C053D1B1C603EB100000002000D000000120004000000090002000A0004000B0008000C000E0000002A000400000009000F00100000000200070012000700010004000500130007000200080001001400070003
access_flags='0001'([ACC_PUBLIC])
name_index='0011'(#17)
descriptor_index='000B'(#11)
attributes_count='0001'(1)
--->Code=000C0000005D0002000400000009043C053D1B1C603EB100000002000D000000120004000000090002000A0004000B0008000C000E0000002A000400000009000F00100000000200070012000700010004000500130007000200080001001400070003
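Because every attribute carries its own u4 attribute_length, a parser can skip attributes it does not understand. This sketch (MethodInfoSkimmer is an illustrative name) reads the method_info header and jumps over each attribute body; it is fed the two method HexCodes from the dump above.

```java
import java.nio.ByteBuffer;

// Sketch: reads a method_info header and skips attribute bodies (hypothetical helper).
final class MethodInfoSkimmer {

    static byte[] hex(String s) {
        byte[] out = new byte[s.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(s.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    static String skim(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        int access = buf.getShort() & 0xFFFF;
        int nameIndex = buf.getShort() & 0xFFFF;
        int descriptorIndex = buf.getShort() & 0xFFFF;
        int attributesCount = buf.getShort() & 0xFFFF;
        StringBuilder sb = new StringBuilder(String.format(
                "access=0x%04X name=#%d descriptor=#%d", access, nameIndex, descriptorIndex));
        for (int i = 0; i < attributesCount; i++) {
            int attrNameIndex = buf.getShort() & 0xFFFF;
            int attrLength = buf.getInt();
            buf.position(buf.position() + attrLength); // jump over the attribute body
            sb.append(String.format(" attr#%d(len=%d)", attrNameIndex, attrLength));
        }
        if (buf.hasRemaining()) {
            throw new IllegalStateException("trailing bytes after method_info");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(skim(hex("0001000A000B0001000C0000002F00010001000000052AB70001B100000002000D00000006000100000005000E0000000C000100000005000F00100000")));
        System.out.println(skim(hex("00010011000B0001000C0000005D0002000400000009043C053D1B1C603EB100000002000D000000120004000000090002000A0004000B0008000C000E0000002A000400000009000F00100000000200070012000700010004000500130007000200080001001400070003")));
    }
}
```

Both methods have a single attribute #12 ("Code"), of 47 bytes for <init> and 93 bytes for test.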
Each method ends with a Code attribute, whose structure is:
Code_attribute {
u2 attribute_name_index;
u4 attribute_length;
u2 max_stack;
u2 max_locals;
u4 code_length;
u1 code[code_length];
u2 exception_table_length;
{ u2 start_pc;
u2 end_pc;
u2 handler_pc;
u2 catch_type;
} exception_table[exception_table_length];
u2 attributes_count;
attribute_info attributes[attributes_count];
}
Parsing the test method's Code attribute with I_Attributes_Method gives:
attributes_count='0001' (1)
attributes
--->|000| Code:
HexCode: 000C0000005D0002000400000009043C053D1B1C603EB100000002000D000000120004000000090002000A0004000B0008000C000E0000002A000400000009000F00100000000200070012000700010004000500130007000200080001001400070003
attribute_name_index='000C' (#12)
attribute_length='0000005D' (93)
max_stack='0002' (2)
max_locals='0004' (4)
code_length='00000009' (9)
code: 043C053D1B1C603EB1
exception_table_length='0000' (0)
attributes_count='0002' (2)
LineNumberTable: 000D000000120004000000090002000A0004000B0008000C
LocalVariableTable: 000E0000002A000400000009000F00100000000200070012000700010004000500130007000200080001001400070003
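A sketch that walks the same Code_attribute layout field by field, assuming the input starts at attribute_name_index as in the HexCode above. CodeAttributeParser is a hypothetical name; the exception table (8 bytes per entry) and the nested attributes are skipped rather than decoded.

```java
import java.nio.ByteBuffer;

// Sketch: walks a Code_attribute field by field (hypothetical helper).
final class CodeAttributeParser {

    static byte[] hex(String s) {
        byte[] out = new byte[s.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(s.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    static String parse(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        int nameIndex = buf.getShort() & 0xFFFF; // attribute_name_index (#12 = "Code")
        int length = buf.getInt();               // attribute_length: bytes after this field
        int maxStack = buf.getShort() & 0xFFFF;
        int maxLocals = buf.getShort() & 0xFFFF;
        int codeLength = buf.getInt();
        StringBuilder code = new StringBuilder();
        for (int i = 0; i < codeLength; i++) {
            code.append(String.format("%02X", buf.get() & 0xFF));
        }
        int exceptionTableLength = buf.getShort() & 0xFFFF;
        // Each exception_table entry is 4 x u2 = 8 bytes; skipped in this sketch
        buf.position(buf.position() + 8 * exceptionTableLength);
        int attributesCount = buf.getShort() & 0xFFFF;
        StringBuilder sb = new StringBuilder(String.format(
                "name=#%d length=%d max_stack=%d max_locals=%d code=%s exceptions=%d",
                nameIndex, length, maxStack, maxLocals, code, exceptionTableLength));
        for (int i = 0; i < attributesCount; i++) {
            int attrNameIndex = buf.getShort() & 0xFFFF; // e.g. #13 LineNumberTable
            int attrLength = buf.getInt();
            buf.position(buf.position() + attrLength);   // skip the nested attribute body
            sb.append(String.format(" attr#%d(len=%d)", attrNameIndex, attrLength));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(parse(hex("000C0000005D0002000400000009043C053D1B1C603EB100000002000D000000120004000000090002000A0004000B0008000C000E0000002A000400000009000F00100000000200070012000700010004000500130007000200080001001400070003")));
    }
}
```

max_locals = 4 covers this, a, b and c, and max_stack = 2 is what iadd needs with two ints on the operand stack.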
Parsing the code bytes 043C053D1B1C603EB1 further with K_Code_Locals gives:
code = 043C053D1B1C603EB1
=== === === === === === === === ===
0000: iconst_1 // 04
0001: istore_1 // 3C
0002: iconst_2 // 05
0003: istore_2 // 3D
0004: iload_1 // 1B
0005: iload_2 // 1C
0006: iadd // 60
0007: istore_3 // 3E
0008: return // B1
=== === === === === === === === ===
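The decoding above is essentially a table lookup from opcode byte to mnemonic. A minimal sketch, assuming only the opcodes that occur in this class, including aload_0 and invokespecial from the <init> code 2AB70001B1; invokespecial is the only one here with an operand (a u2 constant pool index). TinyDisassembler is an illustrative name, and the offset format just mimics the listing above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch: disassembles only the opcodes HelloWorld uses (hypothetical helper, not a full table).
final class TinyDisassembler {

    private static final Map<Integer, String> NO_OPERAND = new HashMap<>();
    static {
        NO_OPERAND.put(0x04, "iconst_1");
        NO_OPERAND.put(0x05, "iconst_2");
        NO_OPERAND.put(0x1B, "iload_1");
        NO_OPERAND.put(0x1C, "iload_2");
        NO_OPERAND.put(0x2A, "aload_0");
        NO_OPERAND.put(0x3C, "istore_1");
        NO_OPERAND.put(0x3D, "istore_2");
        NO_OPERAND.put(0x3E, "istore_3");
        NO_OPERAND.put(0x60, "iadd");
        NO_OPERAND.put(0xB1, "return");
    }
    private static final int INVOKESPECIAL = 0xB7; // followed by a u2 constant pool index

    static byte[] hex(String s) {
        byte[] out = new byte[s.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(s.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    static List<String> disassemble(byte[] code) {
        List<String> out = new ArrayList<>();
        int pc = 0;
        while (pc < code.length) {
            int op = code[pc] & 0xFF;
            if (op == INVOKESPECIAL) {
                int index = ((code[pc + 1] & 0xFF) << 8) | (code[pc + 2] & 0xFF);
                out.add(String.format("%04d: invokespecial #%d", pc, index));
                pc += 3; // opcode + 2 operand bytes
            } else if (NO_OPERAND.containsKey(op)) {
                out.add(String.format("%04d: %s", pc, NO_OPERAND.get(op)));
                pc += 1;
            } else {
                throw new IllegalArgumentException(String.format("opcode 0x%02X not in this sketch", op));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        disassemble(hex("043C053D1B1C603EB1")).forEach(System.out::println);
        disassemble(hex("2AB70001B1")).forEach(System.out::println);
    }
}
```

A real disassembler needs the full opcode table from the JVM spec, plus special handling for variable-length instructions such as tableswitch.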
LocalVariableTable:
index start_pc length name_and_type
0 0 9 this:Lsample/HelloWorld;
1 2 7 a:I
2 4 5 b:I
3 8 1 c:I
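Each LocalVariableTable entry is five u2 values: start_pc, length, name_index, descriptor_index and index (the local variable slot). In this sketch the needed CONSTANT_Utf8 entries are copied by hand from the constant pool dump above (#7, #15, #16, #18 to #20) instead of being parsed; LocalVariableTableParser is a hypothetical name.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch: decodes a LocalVariableTable attribute (hypothetical helper).
final class LocalVariableTableParser {

    static byte[] hex(String s) {
        byte[] out = new byte[s.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(s.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    // Rows formatted as "index start_pc length name:descriptor", like the listing above.
    static List<String> parse(byte[] bytes, Map<Integer, String> utf8) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        buf.getShort(); // attribute_name_index (#14 = "LocalVariableTable")
        buf.getInt();   // attribute_length
        int tableLength = buf.getShort() & 0xFFFF;
        List<String> rows = new ArrayList<>();
        for (int i = 0; i < tableLength; i++) {
            int startPc = buf.getShort() & 0xFFFF;
            int length = buf.getShort() & 0xFFFF;
            int nameIndex = buf.getShort() & 0xFFFF;
            int descriptorIndex = buf.getShort() & 0xFFFF;
            int index = buf.getShort() & 0xFFFF; // local variable slot
            rows.add(index + " " + startPc + " " + length + " "
                    + utf8.get(nameIndex) + ":" + utf8.get(descriptorIndex));
        }
        return rows;
    }

    // The CONSTANT_Utf8 entries this table refers to, copied from the constant pool dump.
    static Map<Integer, String> helloWorldUtf8() {
        Map<Integer, String> m = new HashMap<>();
        m.put(7, "I");
        m.put(15, "this");
        m.put(16, "Lsample/HelloWorld;");
        m.put(18, "a");
        m.put(19, "b");
        m.put(20, "c");
        return m;
    }

    public static void main(String[] args) {
        parse(hex("000E0000002A000400000009000F00100000000200070012000700010004000500130007000200080001001400070003"),
                helloWorldUtf8()).forEach(System.out::println);
    }
}
```

Slot 0 holds this because test() is an instance method, which is why max_locals is 4 rather than 3.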
That's enough analysis for now; more later.
Mind map
The mind map of the bytecode structure: