Learning ASM Bytecode

Having looked at ClassVisitor earlier, I got curious about how Java relates to ASTs. For the relationship between JavaScript and ASTs, see:

https://segmentfault.com/a/1190000016231512

It turns out I had it wrong: it is JavaCC that is related to the AST (abstract syntax tree). See:

JavaCC parser

and my private notes on JavaCC.

Summary

Always draw a diagram. Once I had drawn a mind map, everything felt much clearer.

For the bytecode (class file) structure, see:

Chapter 4. The class File Format

or the tutorial "Java ASM Series (004): ClassFile Quick Reference [with source code]" by lsieun on the 51CTO blog.

The accompanying code is on GitHub: java8-classfile-tutorial (Java ClassFile)

Backup GitHub repository: https://github.com/CodePpoi/java8-classfile-tutorial-master

Every .class file follows the data structure below:

ClassFile {
    u4             magic; // magic number, always 0xCAFEBABE
    u2             minor_version; // u2 is an unsigned two-byte value
    u2             major_version;
    u2             constant_pool_count;
    cp_info        constant_pool[constant_pool_count-1]; // cp_info is a compound type, but compound types are themselves built from u1, u2, u4, etc.
    u2             access_flags;
    u2             this_class;
    u2             super_class;
    u2             interfaces_count;
    u2             interfaces[interfaces_count];
    u2             fields_count;
    field_info     fields[fields_count];
    u2             methods_count;
    method_info    methods[methods_count];
    u2             attributes_count;
    attribute_info attributes[attributes_count];
}
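To make the fixed-width fields concrete, here is a minimal sketch that reads the first four ClassFile fields from a byte array. The class and method names (`ClassHeaderSketch`, `readHeader`) are made up for illustration and are not part of ASM or the tutorial code; the sample bytes are the first ten bytes of the HelloWorld.class dump analyzed in this post.

```java
import java.nio.ByteBuffer;

// Hypothetical helper for illustration only; not part of ASM or the tutorial code.
class ClassHeaderSketch {
    /** Returns {minor_version, major_version, constant_pool_count}. */
    static int[] readHeader(byte[] classBytes) {
        ByteBuffer buf = ByteBuffer.wrap(classBytes); // big-endian by default, like class files
        int magic = buf.getInt();                     // u4
        if (magic != 0xCAFEBABE) {
            throw new IllegalArgumentException("not a class file");
        }
        int minor = buf.getShort() & 0xFFFF;          // u2, masked to read it as unsigned
        int major = buf.getShort() & 0xFFFF;          // u2
        int constantPoolCount = buf.getShort() & 0xFFFF; // u2
        return new int[] {minor, major, constantPoolCount};
    }

    public static void main(String[] args) {
        // CAFEBABE 0000 0034 001C
        byte[] head = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE,
                       0x00, 0x00, 0x00, 0x34, 0x00, 0x1C};
        int[] h = readHeader(head);
        System.out.println("minor=" + h[0] + " major=" + h[1]
                + " constant_pool_count=" + h[2]);
        // prints: minor=0 major=52 constant_pool_count=28
    }
}
```

Major version 52 corresponds to Java 8. A constant_pool_count of 28 means the pool holds entries #1 through #27, because index 0 is not used.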

Besides ASM, there are other libraries for manipulating bytecode, such as Javassist.

At the lowest layer sits the JVM Specification: every class file must conform to the ClassFile format it defines.

Let's analyze the bytecode of the following code:


package sample;

import java.io.Serializable;

public class HelloWorld extends Exception implements Cloneable, Serializable {
    private static final int intValue = 10;

    public void test() {
        int a = 1;
        int b = 2;
        int c = a + b;
    }
}

The bytecode is as follows:

magic
CAFEBABE

compiler_version
00000034

constant_pool_count
001C

constant_pool
|001| 0A00030017
|002| 070018
|003| 070019
|004| 07001A
|005| 07001B
|006| 010008696E7456616C7565
|007| 01000149
|008| 01000D436F6E7374616E7456616C7565
|009| 030000000A
|010| 0100063C696E69743E
|011| 010003282956
|012| 010004436F6465
|013| 01000F4C696E654E756D6265725461626C65
|014| 0100124C6F63616C5661726961626C655461626C65
|015| 01000474686973
|016| 0100134C73616D706C652F48656C6C6F576F726C643B
|017| 01000474657374
|018| 01000161
|019| 01000162
|020| 01000163
|021| 01000A536F7572636546696C65
|022| 01000F48656C6C6F576F726C642E6A617661
|023| 0C000A000B
|024| 01001173616D706C652F48656C6C6F576F726C64
|025| 0100136A6176612F6C616E672F457863657074696F6E
|026| 0100136A6176612F6C616E672F436C6F6E6561626C65
|027| 0100146A6176612F696F2F53657269616C697A61626C65

class_info
002100020003000200040005

fields_count
0001

fields
|000| 001A0006000700010008000000020009

methods_count
0002

methods
|000| 0001000A000B0001000C0000002F00010001000000052AB70001B100000002000D00000006000100000005000E0000000C000100000005000F00100000
|001| 00010011000B0001000C0000005D0002000400000009043C053D1B1C603EB100000002000D000000120004000000090002000A0004000B0008000C000E0000002A000400000009000F00100000000200070012000700010004000500130007000200080001001400070003

attributes_count
0001

attributes
|000| 0015000000020016



First, let's go through the constant pool entries one by one:

|001| CONSTANT_Methodref {Value='#3.#23', HexCode='0A00030017'}

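HexCode is the entry's bytes in hexadecimal. The tag 0A (10) denotes the CONSTANT_Methodref type, and 0003 and 0017 are constant pool indexes: entry #3 (the class) and entry #23 (the name and type).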

    |002| CONSTANT_Class {Value='#24', HexCode='070018'}
    |003| CONSTANT_Class {Value='#25', HexCode='070019'}
    |004| CONSTANT_Class {Value='#26', HexCode='07001A'}
    |005| CONSTANT_Class {Value='#27', HexCode='07001B'}
    |006| CONSTANT_Utf8 {Value='intValue', HexCode='010008696E7456616C7565'}
    |007| CONSTANT_Utf8 {Value='I', HexCode='01000149'}

In the HexCode, the tag 07 denotes the CONSTANT_Class type, and 0018 is the index of constant pool entry #24, which holds the class name.
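The tag-then-payload layout can be sketched as a small decoder that takes one entry's HexCode and prints it in the same style as the listing. This is only an illustration: `ConstantPoolEntrySketch` and its methods are hypothetical names, it handles just the five tags that occur in this class file, and it decodes Utf8 payloads as plain UTF-8, which is an assumption that holds for the ASCII strings here (the JVM actually uses a modified UTF-8).

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; covers only the tags seen in this post.
class ConstantPoolEntrySketch {
    static byte[] hexToBytes(String hex) {
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    static String describe(String hexCode) {
        ByteBuffer buf = ByteBuffer.wrap(hexToBytes(hexCode));
        int tag = buf.get() & 0xFF; // u1 tag selects the entry layout
        switch (tag) {
            case 1: { // CONSTANT_Utf8: u2 length, then that many bytes
                byte[] bytes = new byte[buf.getShort() & 0xFFFF];
                buf.get(bytes);
                // Assumption: ASCII-only content, so plain UTF-8 decoding is enough.
                return "CONSTANT_Utf8 '" + new String(bytes, StandardCharsets.UTF_8) + "'";
            }
            case 3: // CONSTANT_Integer: u4 value
                return "CONSTANT_Integer " + buf.getInt();
            case 7: // CONSTANT_Class: u2 name_index
                return "CONSTANT_Class #" + (buf.getShort() & 0xFFFF);
            case 10: { // CONSTANT_Methodref: u2 class_index, u2 name_and_type_index
                int classIndex = buf.getShort() & 0xFFFF;
                int nameAndTypeIndex = buf.getShort() & 0xFFFF;
                return "CONSTANT_Methodref #" + classIndex + ".#" + nameAndTypeIndex;
            }
            case 12: { // CONSTANT_NameAndType: u2 name_index, u2 descriptor_index
                int nameIndex = buf.getShort() & 0xFFFF;
                int descriptorIndex = buf.getShort() & 0xFFFF;
                return "CONSTANT_NameAndType #" + nameIndex + ":#" + descriptorIndex;
            }
            default:
                throw new IllegalArgumentException("tag not handled in this sketch: " + tag);
        }
    }

    public static void main(String[] args) {
        System.out.println(describe("0A00030017"));             // CONSTANT_Methodref #3.#23
        System.out.println(describe("070018"));                 // CONSTANT_Class #24
        System.out.println(describe("010008696E7456616C7565")); // CONSTANT_Utf8 'intValue'
    }
}
```

Following the indexes by hand, #3 points to #25 (java/lang/Exception) and #23 is the name-and-type pair #10:#11, so entry #1 refers to the superclass constructor.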

Entry #6, intValue, is the name of a field. Fields follow the structure below:

field_info {
    u2             access_flags; // public, private, etc.
    u2             name_index;
    u2             descriptor_index;
    u2             attributes_count;
    attribute_info attributes[attributes_count];
}

The corresponding field and its breakdown are as follows:

|000| intValue:I
HexCode: 001A0006000700010008000000020009
access_flags='001A'([ACC_PRIVATE,ACC_STATIC,ACC_FINAL])
name_index='0006'(#6)
descriptor_index='0007'(#7)
attributes_count='0001'(1)
--->ConstantValue=0008000000020009

The "I" in entry #7 is the descriptor for the int type.
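As a rough sketch of how the fixed part of field_info can be read, the code below decodes the access_flags bit mask and the three u2 values that follow it. `FieldInfoSketch` is a hypothetical name used only here; the flag values are the field access flags from the JVM specification.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not part of ASM or the tutorial code.
class FieldInfoSketch {
    // Field access flags as defined by the JVM specification.
    private static final Map<Integer, String> FIELD_FLAGS = new LinkedHashMap<>();
    static {
        FIELD_FLAGS.put(0x0001, "ACC_PUBLIC");
        FIELD_FLAGS.put(0x0002, "ACC_PRIVATE");
        FIELD_FLAGS.put(0x0004, "ACC_PROTECTED");
        FIELD_FLAGS.put(0x0008, "ACC_STATIC");
        FIELD_FLAGS.put(0x0010, "ACC_FINAL");
        FIELD_FLAGS.put(0x0040, "ACC_VOLATILE");
        FIELD_FLAGS.put(0x0080, "ACC_TRANSIENT");
        FIELD_FLAGS.put(0x1000, "ACC_SYNTHETIC");
        FIELD_FLAGS.put(0x4000, "ACC_ENUM");
    }

    static List<String> decodeFieldFlags(int accessFlags) {
        List<String> names = new ArrayList<>();
        for (Map.Entry<Integer, String> e : FIELD_FLAGS.entrySet()) {
            if ((accessFlags & e.getKey()) != 0) { // each flag is a single bit
                names.add(e.getValue());
            }
        }
        return names;
    }

    /** Reads the four leading u2 values of a field_info; attributes are not decoded. */
    static String describeField(byte[] fieldInfo) {
        ByteBuffer buf = ByteBuffer.wrap(fieldInfo);
        int accessFlags = buf.getShort() & 0xFFFF;
        int nameIndex = buf.getShort() & 0xFFFF;
        int descriptorIndex = buf.getShort() & 0xFFFF;
        int attributesCount = buf.getShort() & 0xFFFF;
        return decodeFieldFlags(accessFlags) + " name=#" + nameIndex
                + " descriptor=#" + descriptorIndex + " attributes=" + attributesCount;
    }

    public static void main(String[] args) {
        // 001A 0006 0007 0001, followed by the ConstantValue attribute 0008 00000002 0009
        byte[] field = {0x00, 0x1A, 0x00, 0x06, 0x00, 0x07, 0x00, 0x01,
                        0x00, 0x08, 0x00, 0x00, 0x00, 0x02, 0x00, 0x09};
        System.out.println(describeField(field));
        // prints: [ACC_PRIVATE, ACC_STATIC, ACC_FINAL] name=#6 descriptor=#7 attributes=1
    }
}
```

The value 0x001A is simply 0x0002 | 0x0008 | 0x0010, which is why the three modifiers private, static and final come out of a single u2.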

Next, look at the following entries:

    |008| CONSTANT_Utf8 {Value='ConstantValue', HexCode='01000D436F6E7374616E7456616C7565'}
    |009| CONSTANT_Integer {Value='10', HexCode='030000000A'}
    |010| CONSTANT_Utf8 {Value='<init>', HexCode='0100063C696E69743E'}
    |011| CONSTANT_Utf8 {Value='()V', HexCode='010003282956'}

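Entry #9, a CONSTANT_Integer, holds 10, the initial value of the field intValue.

Entry #10, <init>, is the special name the JVM uses for constructors.

Entry #11, ()V, is a method descriptor. It is shared by the no-argument constructor and by public void test(): () means the method takes no parameters, and V means the return type is void. The trailing 56 in its HexCode is the ASCII code of 'V'.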
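To illustrate how descriptors such as I and ()V are read, here is a small sketch that turns a descriptor into Java-like text. `DescriptorSketch` and its method names are invented for this example, and it does no validation beyond rejecting unknown type characters.

```java
// Hypothetical helper for illustration only; not part of ASM or the tutorial code.
class DescriptorSketch {
    /** Converts a single type descriptor, e.g. "I" or "[Ljava/lang/String;". */
    static String typeToJava(String t) {
        int dims = 0;
        while (t.charAt(dims) == '[') {
            dims++; // each leading '[' adds one array dimension
        }
        String base;
        switch (t.charAt(dims)) {
            case 'B': base = "byte"; break;
            case 'C': base = "char"; break;
            case 'D': base = "double"; break;
            case 'F': base = "float"; break;
            case 'I': base = "int"; break;
            case 'J': base = "long"; break;
            case 'S': base = "short"; break;
            case 'Z': base = "boolean"; break;
            case 'V': base = "void"; break;
            case 'L': // object type: L<internal name>;
                base = t.substring(dims + 1, t.length() - 1).replace('/', '.');
                break;
            default:
                throw new IllegalArgumentException("unknown descriptor: " + t);
        }
        StringBuilder sb = new StringBuilder(base);
        for (int k = 0; k < dims; k++) {
            sb.append("[]");
        }
        return sb.toString();
    }

    /** Returns the index just past the type descriptor starting at 'start'. */
    private static int typeEnd(String desc, int start) {
        int i = start;
        while (desc.charAt(i) == '[') {
            i++;
        }
        return desc.charAt(i) == 'L' ? desc.indexOf(';', i) + 1 : i + 1;
    }

    /** Converts a method descriptor, e.g. "()V", to "returnType (paramTypes)". */
    static String methodToJava(String desc) {
        if (desc.charAt(0) != '(') {
            throw new IllegalArgumentException("not a method descriptor: " + desc);
        }
        StringBuilder params = new StringBuilder();
        int i = 1;
        while (desc.charAt(i) != ')') {
            int end = typeEnd(desc, i);
            if (params.length() > 0) {
                params.append(", ");
            }
            params.append(typeToJava(desc.substring(i, end)));
            i = end;
        }
        return typeToJava(desc.substring(i + 1)) + " (" + params + ")";
    }

    public static void main(String[] args) {
        System.out.println(typeToJava("I"));                   // int
        System.out.println(typeToJava("Lsample/HelloWorld;")); // sample.HelloWorld
        System.out.println(methodToJava("()V"));               // void ()
    }
}
```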

Methods must follow the method_info structure:

method_info {
    u2             access_flags;
    u2             name_index;
    u2             descriptor_index;
    u2             attributes_count;
    attribute_info attributes[attributes_count];
}

The <init> method looks like this:

methods_count='0002' (2)
methods
|000| <init>:()V
HexCode: 0001000A000B0001000C0000002F00010001000000052AB70001B100000002000D00000006000100000005000E0000000C000100000005000F00100000
access_flags='0001'([ACC_PUBLIC])
name_index='000A'(#10)
descriptor_index='000B'(#11)
attributes_count='0001'(1)
--->Code=000C0000002F00010001000000052AB70001B100000002000D00000006000100000005000E0000000C000100000005000F00100000

The test method looks like this:

|001| test:()V
HexCode: 00010011000B0001000C0000005D0002000400000009043C053D1B1C603EB100000002000D000000120004000000090002000A0004000B0008000C000E0000002A000400000009000F00100000000200070012000700010004000500130007000200080001001400070003
access_flags='0001'([ACC_PUBLIC])
name_index='0011'(#17)
descriptor_index='000B'(#11)
attributes_count='0001'(1)
--->Code=000C0000005D0002000400000009043C053D1B1C603EB100000002000D000000120004000000090002000A0004000B0008000C000E0000002A000400000009000F00100000000200070012000700010004000500130007000200080001001400070003

As you can see, each method ends with a Code attribute, whose structure is as follows:

Code_attribute {
    u2 attribute_name_index;
    u4 attribute_length;
    u2 max_stack;
    u2 max_locals;
    u4 code_length;
    u1 code[code_length];
    u2 exception_table_length;
    {   u2 start_pc;
        u2 end_pc;
        u2 handler_pc;
        u2 catch_type;
    } exception_table[exception_table_length];
    u2 attributes_count;
    attribute_info attributes[attributes_count];
}

Parsing the Code attribute by running I_Attributes_Method gives the following result:

attributes_count='0001' (1)
attributes
--->|000| Code:
HexCode: 000C0000005D0002000400000009043C053D1B1C603EB100000002000D000000120004000000090002000A0004000B0008000C000E0000002A000400000009000F00100000000200070012000700010004000500130007000200080001001400070003
attribute_name_index='000C' (#12)
attribute_length='0000005D' (93)
max_stack='0002' (2)
max_locals='0004' (4)
code_length='00000009' (9)
code: 043C053D1B1C603EB1
exception_table_length='0000' (0)
attributes_count='0002' (2)
    LineNumberTable: 000D000000120004000000090002000A0004000B0008000C
    LocalVariableTable: 000E0000002A000400000009000F00100000000200070012000700010004000500130007000200080001001400070003
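The same breakdown can be reproduced with a short sketch that walks the Code_attribute fields in order. `CodeAttributeSketch` and `CodeHeader` are hypothetical names, and the sketch stops after attributes_count, so the nested LineNumberTable and LocalVariableTable are not decoded. The input is the leading part of the test method's Code attribute shown above.

```java
import java.nio.ByteBuffer;

// Hypothetical helper for illustration only; not part of ASM or the tutorial code.
class CodeAttributeSketch {
    static final class CodeHeader {
        int attributeNameIndex;
        long attributeLength;
        int maxStack;
        int maxLocals;
        byte[] code;
        int exceptionTableLength;
        int attributesCount;
    }

    static byte[] hexToBytes(String hex) {
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    static CodeHeader parse(byte[] attribute) {
        ByteBuffer buf = ByteBuffer.wrap(attribute);
        CodeHeader h = new CodeHeader();
        h.attributeNameIndex = buf.getShort() & 0xFFFF;   // u2
        h.attributeLength = buf.getInt() & 0xFFFFFFFFL;   // u4, read as unsigned
        h.maxStack = buf.getShort() & 0xFFFF;             // u2
        h.maxLocals = buf.getShort() & 0xFFFF;            // u2
        int codeLength = buf.getInt();                    // u4
        h.code = new byte[codeLength];
        buf.get(h.code);                                  // u1 code[code_length]
        h.exceptionTableLength = buf.getShort() & 0xFFFF; // u2
        // Each exception_table entry is four u2 values (8 bytes); skip them here.
        buf.position(buf.position() + 8 * h.exceptionTableLength);
        h.attributesCount = buf.getShort() & 0xFFFF;      // u2
        return h;
    }

    public static void main(String[] args) {
        // Code attribute of test(), truncated right after attributes_count.
        CodeHeader h = parse(hexToBytes(
                "000C0000005D0002000400000009043C053D1B1C603EB100000002"));
        System.out.println("attribute_length=" + h.attributeLength
                + " max_stack=" + h.maxStack + " max_locals=" + h.maxLocals
                + " code_length=" + h.code.length
                + " attributes_count=" + h.attributesCount);
        // prints: attribute_length=93 max_stack=2 max_locals=4 code_length=9 attributes_count=2
    }
}
```

The max_locals value of 4 matches the source: slot 0 holds this, and slots 1 to 3 hold a, b and c.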

Taking the code bytes 043C053D1B1C603EB1 and parsing them further with K_Code_Locals yields:

code = 043C053D1B1C603EB1
=== === ===  === === ===  === === ===
0000: iconst_1             // 04
0001: istore_1             // 3C
0002: iconst_2             // 05
0003: istore_2             // 3D
0004: iload_1              // 1B
0005: iload_2              // 1C
0006: iadd                 // 60
0007: istore_3             // 3E
0008: return               // B1
=== === ===  === === ===  === === ===
LocalVariableTable:
index  start_pc  length  name_and_type
    0         0       9  this:Lsample/HelloWorld;
    1         2       7  a:I
    2         4       5  b:I
    3         8       1  c:I
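For a feel of what such a disassembly step involves, here is a minimal sketch with an opcode table limited to the instructions that appear in this class. `MiniDisassemblerSketch` is a hypothetical name and is not how K_Code_Locals is implemented; a real disassembler needs the full instruction set and the operand layout of every opcode.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; knows just the opcodes used in this post.
class MiniDisassemblerSketch {
    private static final Map<Integer, String> OPCODES = new HashMap<>();
    static {
        OPCODES.put(0x04, "iconst_1");
        OPCODES.put(0x05, "iconst_2");
        OPCODES.put(0x1B, "iload_1");
        OPCODES.put(0x1C, "iload_2");
        OPCODES.put(0x2A, "aload_0");
        OPCODES.put(0x3C, "istore_1");
        OPCODES.put(0x3D, "istore_2");
        OPCODES.put(0x3E, "istore_3");
        OPCODES.put(0x60, "iadd");
        OPCODES.put(0xB1, "return");
        OPCODES.put(0xB7, "invokespecial"); // followed by a u2 constant pool index
    }

    static List<String> disassemble(String hexCode) {
        byte[] code = new byte[hexCode.length() / 2];
        for (int i = 0; i < code.length; i++) {
            code[i] = (byte) Integer.parseInt(hexCode.substring(2 * i, 2 * i + 2), 16);
        }
        List<String> lines = new ArrayList<>();
        int pc = 0;
        while (pc < code.length) {
            int opcode = code[pc] & 0xFF;
            String name = OPCODES.get(opcode);
            if (name == null) {
                throw new IllegalArgumentException(
                        String.format("opcode 0x%02X not handled in this sketch", opcode));
            }
            if (opcode == 0xB7) {
                // Two operand bytes form a big-endian constant pool index.
                int index = ((code[pc + 1] & 0xFF) << 8) | (code[pc + 2] & 0xFF);
                lines.add(String.format("%04d: %s #%d", pc, name, index));
                pc += 3;
            } else {
                lines.add(String.format("%04d: %s", pc, name));
                pc += 1;
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        disassemble("043C053D1B1C603EB1").forEach(System.out::println); // test()
        System.out.println("---");
        disassemble("2AB70001B1").forEach(System.out::println);         // <init>
    }
}
```

Run on the <init> bytes 2AB70001B1, the sketch gives aload_0, invokespecial #1 and return: the constructor loads this and calls the method at constant pool entry #1, which is the superclass constructor.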

That is all the analysis for now; more to come later.

Mind map

[Figure: mind map of the bytecode structure]
