ART in Depth 6 -- Understanding the Dex File Format (3)

双刃剑客

Last modified 2022-11-24 15:22:59

First published 2022-11-24 11:40:21

Original article: https://blog.csdn.net/doon/article/details/78016219

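This article is based on Android 7.1. The version I obtained from the BSP differs slightly from AOSP, so the source code referenced here may not match exactly what readers find. Source snippets are cited as <path relative to the Android project>:<line number> <class name> <function name>. If the line numbers don't match, use the class and function names to locate the corresponding code.

This section describes the format of Dex code, which is the core of what the Dex virtual machine executes.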

CodeItem
Contents of the CodeItem structure
A method's code is stored in a CodeItem structure, defined as follows:
art/runtime/dex_file.h:281

  // Raw code_item.
  struct CodeItem {
    uint16_t registers_size_;            // the number of registers used by this code
                                         //   (locals + parameters)
    uint16_t ins_size_;                  // the number of words of incoming arguments to the method
                                         //   that this code is for
    uint16_t outs_size_;                 // the number of words of outgoing argument space required
                                         //   by this code for method invocation
    uint16_t tries_size_;                // the number of try_items for this instance. If non-zero,
                                         //   then these appear as the tries array just after the
                                         //   insns in this instance.
    uint32_t debug_info_off_;            // file offset to debug info stream
    uint32_t insns_size_in_code_units_;  // size of the insns array, in 2 byte code units
    uint16_t insns_[1];                  // actual array of bytecode.

.....
  };

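This is an open-ended, variable-length structure. Its last member, insns_, marks the start of the instruction array. The array length is given by insns_size_in_code_units_, counted in 2-byte code units, so the instructions occupy insns_size_in_code_units_ * 2 bytes.
registers_size_ is the total number of registers the method uses. ins_size_ is the number of 32-bit words taken by the incoming arguments. This count includes `this` for instance methods, and each long or double takes two words. outs_size_ is the number of words of outgoing-argument space the method needs for the calls it makes.
If registers_size_ is N and ins_size_ is M, the arguments occupy registers v(N-M) through v(N-1).
outs_size_ must also be <= registers_size_. Dex code can pass arguments only through registers, so the registers used for passing arguments are counted in the total.
A method's register storage is registers_size_ * 4 bytes, since each register is 32 bits wide. A long or double is 8 bytes and occupies two adjacent registers, which together form a wide register pair. The Dalvik bytecode specification places no alignment requirement on register pairs, so a pair may start at an odd-numbered register.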
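The register arithmetic above can be sketched in a few lines. This is an illustration only: `CodeItemHeader`, `FirstInRegister`, `RegisterBytes` and `InsnsBytes` are hypothetical names, not ART APIs. The struct simply mirrors the fixed-size fields of the CodeItem shown earlier and omits `insns_`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical mirror of the fixed-size header of ART's CodeItem
// (field names follow art/runtime/dex_file.h; insns_ is omitted).
struct CodeItemHeader {
  uint16_t registers_size;
  uint16_t ins_size;
  uint16_t outs_size;
  uint16_t tries_size;
  uint32_t debug_info_off;
  uint32_t insns_size_in_code_units;
};

// Incoming arguments occupy the last ins_size registers: v(N-M) .. v(N-1).
uint16_t FirstInRegister(const CodeItemHeader& c) {
  assert(c.ins_size <= c.registers_size);  // ins must fit in the register file
  return static_cast<uint16_t>(c.registers_size - c.ins_size);
}

// Each virtual register is 32 bits, so the frame needs 4 bytes per register.
uint32_t RegisterBytes(const CodeItemHeader& c) {
  return static_cast<uint32_t>(c.registers_size) * 4u;
}

// The instruction array is counted in 16-bit code units.
uint32_t InsnsBytes(const CodeItemHeader& c) {
  return c.insns_size_in_code_units * 2u;
}
```

Consider a static method with parameters (long, int). It has ins_size_ = 3. With registers_size_ = 5, the long lands in the pair v2/v3 and the int in v4.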

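tries_size_ is the number of TryItem entries (it may be 0). These entries record the try-catch/finally information.

debug_info_off_ is the offset of the debug info stream (line numbers and local variables), measured from the start of the file. It is 0 when the method has no debug info.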

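Information implied by insns_
The data starting at insns_ actually consists of two parts: the instructions, then the try/catch information. The try/catch parts are present only when tries_size_ is non-zero. The layout is:

Dex instructions, insns_size_in_code_units_ * 2 bytes long
u2 padding, present only when tries_size_ is non-zero and insns_size_in_code_units_ is odd, so that the TryItem array is 4-byte aligned
TryItem[tries_size_], the try information
the handler list size, encoded as uleb128
the array of handler data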
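The layout list above translates into simple offset arithmetic. The sketch below computes byte offsets relative to the start of a code_item. It assumes the 16-byte fixed header (four u2 fields plus two u4 fields) and the 8-byte TryItem from dex_file.h. `CodeItemLayout` and `ComputeLayout` are hypothetical names used for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Byte offsets, relative to the start of a code_item, of the parts that
// follow the fixed header. Hypothetical helper for illustration.
struct CodeItemLayout {
  uint32_t insns_off;      // start of the instruction array
  uint32_t padding_bytes;  // 0 or 2
  uint32_t tries_off;      // start of TryItem[tries_size]
  uint32_t handlers_off;   // start of the uleb128 handler-list size
};

constexpr uint32_t kCodeItemHeaderSize = 16;  // 4 x u2 + 2 x u4
constexpr uint32_t kTryItemSize = 8;          // u4 start_addr + u2 + u2

// tries_off and handlers_off are only meaningful when tries_size != 0.
CodeItemLayout ComputeLayout(uint32_t insns_size_in_code_units, uint16_t tries_size) {
  CodeItemLayout l{};
  l.insns_off = kCodeItemHeaderSize;
  uint32_t insns_end = l.insns_off + insns_size_in_code_units * 2;
  // Padding exists only when there are tries and insns_size is odd,
  // so that the TryItem array starts on a 4-byte boundary.
  l.padding_bytes = (tries_size != 0 && (insns_size_in_code_units & 1)) ? 2 : 0;
  l.tries_off = insns_end + l.padding_bytes;
  if (tries_size != 0) {
    assert(l.tries_off % 4 == 0);  // code_item itself is 4-byte aligned
  }
  l.handlers_off = l.tries_off + tries_size * kTryItemSize;
  return l;
}
```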

TryCatch
Try/catch information is stored in two kinds of structures: TryItem and handler.

TryItem
TryItem is defined as follows:
art/runtime/dex_file.h:300

  struct TryItem {
    uint32_t start_addr_;
    uint16_t insn_count_;
    uint16_t handler_off_;                                                                                                           
....
  };

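start_addr_ is the dex pc where the try block starts. It is relative to CodeItem.insns_ and counted in 2-byte code units.
insn_count_ is the number of 2-byte code units covered by the try block, starting at start_addr_. The last covered code unit is start_addr_ + insn_count_ - 1.
handler_off_ is the byte offset of this try block's handler. It is measured from the start of the handler list, which begins right after the TryItem array, at the uleb128 list size. In other words, the base is the end of the instructions (plus any padding) plus sizeof(TryItem) * codeItem.tries_size_.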
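The coverage rule for `start_addr_`/`insn_count_` amounts to a half-open interval test. Because try ranges in a Dex file must be non-overlapping and sorted by address, the matching TryItem can be found by binary search. The sketch below does exactly that. `FindTryIndex` and `HandlerByteOffset` are hypothetical helpers, and the struct copies only the three fields shown above.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct TryItem {          // same fields as art/runtime/dex_file.h
  uint32_t start_addr_;   // first covered code unit
  uint16_t insn_count_;   // number of covered code units
  uint16_t handler_off_;  // byte offset from the start of the handler list
};

// Returns the index of the TryItem covering dex_pc, or -1 if none does.
// Tries are non-overlapping and sorted by address, so binary search works.
int FindTryIndex(const std::vector<TryItem>& tries, uint32_t dex_pc) {
  int lo = 0;
  int hi = static_cast<int>(tries.size()) - 1;
  while (lo <= hi) {
    int mid = lo + (hi - lo) / 2;
    const TryItem& t = tries[mid];
    if (dex_pc < t.start_addr_) {
      hi = mid - 1;
    } else if (dex_pc >= t.start_addr_ + t.insn_count_) {
      lo = mid + 1;
    } else {
      return mid;  // start_addr_ <= dex_pc < start_addr_ + insn_count_
    }
  }
  return -1;
}

// handler_off_ is counted from the start of the handler list, i.e. the byte
// right after the TryItem array (where the uleb128 list size is stored).
uint32_t HandlerByteOffset(uint32_t handler_list_off, const TryItem& t) {
  return handler_list_off + t.handler_off_;
}
```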

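Handler structure
Handler data is variable-length. Immediately after the TryItem array comes a uleb128-encoded handler_size, which gives the number of handlers in the list that follows.
A handler contains several catch entries and, optionally, one finally entry. Catch and finally entries are essentially the same. The only difference is that a finally entry does not name a Throwable type, so it catches everything.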

In ART, the entries of a handler are traversed with a CatchHandlerIterator.
The decoded form of one catch/finally entry is defined inside CatchHandlerIterator:
art/runtime/dex_file.h:1635 CatchHandlerIterator

    struct CatchHandlerItem {
      uint16_t type_idx_;  // type index of the caught exception type
      uint32_t address_;  // handler address
    } handler_;

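type_idx_ is an index into the DexFile's type_ids array and identifies the exception type being caught.
address_ is the start of the catch/finally code, relative to CodeItem.insns_ and counted in 2-byte code units.
For a finally entry, type_idx_ is DexFile::kDexNoIndex16 (0xffff).
These are decoded values. In the file, each value is stored in LEB128 form. The finally entry comes last and stores only address_, with no type_idx_.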

Several catch entries grouped together form one handler, with the following layout (pseudocode):

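sleb128 remaining_count;       // signed; see below
struct {
   uleb128 type_idx;           // index into type_ids
   uleb128 address;            // handler address, in code units
} catches[abs(remaining_count)];
uleb128 finally_address;       // present only if remaining_count <= 0

sleb128 and uleb128 denote signed and unsigned LEB128 encoding.
remaining_count encodes both the number of catch entries and whether a finally entry follows:
- If remaining_count > 0, there are exactly remaining_count catch entries and no finally entry.
- If remaining_count <= 0, its absolute value is the number of catch entries, and a finally entry follows them.
- In particular, remaining_count == 0 means there are no catch entries, only a finally entry.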
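The decoding rules above can be sketched as a small parser. It loosely follows the way ART's CatchHandlerIterator reads one handler: a signed LEB128 count, then uleb128 type/address pairs, then an optional catch-all address. `ReadULeb128`, `ReadSLeb128`, `DecodeHandler` and `kNoTypeIndex` are hypothetical names written for this illustration. `kNoTypeIndex` stands in for `DexFile::kDexNoIndex16`.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

constexpr uint32_t kNoTypeIndex = 0xFFFF;  // stands in for DexFile::kDexNoIndex16

struct CatchEntry {
  uint32_t type_idx;  // kNoTypeIndex for the finally (catch-all) entry
  uint32_t address;   // handler start, in code units from insns_
};

// Unsigned LEB128: 7 payload bits per byte, low bits first; a set high
// bit means another byte follows. At most 5 bytes for a 32-bit value.
uint32_t ReadULeb128(const uint8_t*& p) {
  uint32_t result = 0;
  int shift = 0;
  uint8_t byte;
  do {
    byte = *p++;
    result |= static_cast<uint32_t>(byte & 0x7f) << shift;
    shift += 7;
  } while ((byte & 0x80) != 0 && shift < 35);
  return result;
}

// Signed LEB128: same as above, then sign-extend from the last payload bit.
int32_t ReadSLeb128(const uint8_t*& p) {
  uint32_t result = 0;
  int shift = 0;
  uint8_t byte;
  do {
    byte = *p++;
    result |= static_cast<uint32_t>(byte & 0x7f) << shift;
    shift += 7;
  } while ((byte & 0x80) != 0 && shift < 35);
  if (shift < 32 && (byte & 0x40) != 0) {
    result |= ~0u << shift;
  }
  return static_cast<int32_t>(result);
}

// Decodes one handler starting at p.
std::vector<CatchEntry> DecodeHandler(const uint8_t* p) {
  int32_t remaining_count = ReadSLeb128(p);
  bool has_finally = remaining_count <= 0;
  uint32_t count = static_cast<uint32_t>(has_finally ? -remaining_count : remaining_count);
  std::vector<CatchEntry> entries;
  for (uint32_t i = 0; i < count; ++i) {
    uint32_t type_idx = ReadULeb128(p);  // evaluated in order: type, then address
    uint32_t address = ReadULeb128(p);
    entries.push_back({type_idx, address});
  }
  if (has_finally) {
    entries.push_back({kNoTypeIndex, ReadULeb128(p)});
  }
  return entries;
}
```

For example, the bytes 7f 03 10 85 01 decode as remaining_count = -1. That gives one catch of type index 3 at code unit 0x10, followed by a finally entry at code unit 133 (0x85 0x01 is the two-byte uleb128 form of 133).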
Each TryItem refers to one handler, and together they form one try-catch/finally construct.

A method with several try-catch/finally constructs therefore has several handlers.

Switch
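Dex has two forms of switch: packed switch and sparse switch. Packed switch is used when the case values are consecutive, each differing from the previous one by 1. Sparse switch is used when the gaps between case values are not uniform.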

packed switch
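The instruction format is packed-switch vAA, +BBBBBBBB
+BBBBBBBB is the offset of the packed-switch payload. It is a signed 32-bit value, in 2-byte code units, relative to the current instruction.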
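packed-switch, sparse-switch and fill-array-data all use the 31t instruction format, written "AA|op BBBBlo BBBBhi" in the Dalvik instruction-format documentation. The first code unit holds the opcode in its low byte and vAA in its high byte. The next two code units hold the low and high halves of the signed offset. The sketch below decodes that shape and locates the payload. The opcode values 0x2b (packed-switch), 0x2c (sparse-switch) and 0x26 (fill-array-data) are taken from the Dalvik bytecode reference. `Insn31t`, `Decode31t` and `PayloadDexPc` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Format 31t ("AA|op BBBBlo BBBBhi"), shared by packed-switch (0x2b),
// sparse-switch (0x2c) and fill-array-data (0x26).
struct Insn31t {
  uint8_t opcode;
  uint8_t vAA;
  int32_t offset;  // in 16-bit code units, relative to this instruction
};

Insn31t Decode31t(const uint16_t* insn) {
  Insn31t d;
  d.opcode = static_cast<uint8_t>(insn[0] & 0xff);
  d.vAA = static_cast<uint8_t>(insn[0] >> 8);
  uint32_t raw = static_cast<uint32_t>(insn[1]) |
                 (static_cast<uint32_t>(insn[2]) << 16);
  d.offset = static_cast<int32_t>(raw);  // reinterpret as signed
  return d;
}

// dex pc (code units from insns_) of the payload this instruction refers to.
uint32_t PayloadDexPc(uint32_t insn_dex_pc, const Insn31t& d) {
  int64_t target = static_cast<int64_t>(insn_dex_pc) + d.offset;
  assert(target >= 0);  // the payload must lie inside the method
  return static_cast<uint32_t>(target);
}
```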
The packed-switch payload has the following format:

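Name   Format   Description
ident   ushort = 0x0100   identifying pseudo-opcode
size   ushort   number of entries in the table
first_key   int   first (and lowest) switch case value
targets   int[]   list of size relative branch targets. The targets are relative to the address of the switch opcode, not of this table.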
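The total size of the structure is size * 4 + 4 + 4 bytes: 4 bytes for ident and size, 4 for first_key, and 4 per target. That is size * 2 + 4 code units.
The target values are relative to the packed-switch instruction, not to the start of the code, so they cannot be used as dex pcs directly. The dex pc of a branch target is the switch instruction's dex pc plus the target value.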
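A packed switch is resolved by a subtraction and a bounds check: index = vAA - first_key, and that index selects a target. The sketch below reads the payload as 16-bit code units, so 32-bit fields are reassembled from two little-endian halves. When the value is out of range, it returns 3, the length of the switch instruction, which means fall through. `PackedSwitchOffset`, `PackedSwitchPayloadUnits` and `ReadInt32` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint16_t kPackedSwitchIdent = 0x0100;
constexpr int32_t kSwitchInsnSize = 3;  // packed-switch is 3 code units (31t)

// Reads a 32-bit little-endian value stored as two code units.
int32_t ReadInt32(const uint16_t* p) {
  return static_cast<int32_t>(static_cast<uint32_t>(p[0]) |
                              (static_cast<uint32_t>(p[1]) << 16));
}

// Branch offset (code units, relative to the packed-switch instruction)
// for test_val; returns kSwitchInsnSize to fall through when out of range.
int32_t PackedSwitchOffset(const uint16_t* payload, int32_t test_val) {
  assert(payload[0] == kPackedSwitchIdent);
  uint16_t size = payload[1];
  int32_t first_key = ReadInt32(&payload[2]);
  const uint16_t* targets = &payload[4];
  int64_t index = static_cast<int64_t>(test_val) - first_key;
  if (index >= 0 && index < size) {
    return ReadInt32(&targets[index * 2]);
  }
  return kSwitchInsnSize;  // no matching case: continue with the next instruction
}

// Payload length in code units: ident + size + first_key (2) + 2 per target.
uint32_t PackedSwitchPayloadUnits(uint16_t size) { return size * 2u + 4u; }
```

In every case, the new dex pc is the switch instruction's dex pc plus the returned value.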

sparse switch
The instruction format is sparse-switch vAA, +BBBBBBBB
+BBBBBBBB has the same meaning as above.
The sparse-switch payload has the following format:

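Name   Format   Description
ident   ushort = 0x0200   identifying pseudo-opcode
size   ushort   number of entries in the table
keys   int[]   list of size key values, sorted low to high
targets   int[]   list of size relative branch targets, each corresponding to the key value at the same index. The targets are relative to the address of the switch opcode, not of this table.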
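The keys list is sorted in ascending order.
To dispatch, look up the value of vAA in keys. If it is found, use its index to fetch the corresponding entry in targets, which is again relative to the switch instruction. If no key matches, execution falls through to the next instruction.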
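Because keys are sorted, the lookup can be a binary search. The sketch below takes that approach; it is one reasonable way to implement the lookup, not a claim about how any particular runtime does it. The payload layout is ident, size, then size keys, then size targets, each key and target taking two code units. `SparseSwitchOffset` and `ReadInt32` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint16_t kSparseSwitchIdent = 0x0200;
constexpr int32_t kSwitchInsnSize = 3;  // sparse-switch is 3 code units (31t)

// Reads a 32-bit little-endian value stored as two code units.
int32_t ReadInt32(const uint16_t* p) {
  return static_cast<int32_t>(static_cast<uint32_t>(p[0]) |
                              (static_cast<uint32_t>(p[1]) << 16));
}

// Branch offset (code units, relative to the sparse-switch instruction)
// for test_val; returns kSwitchInsnSize to fall through when no key matches.
int32_t SparseSwitchOffset(const uint16_t* payload, int32_t test_val) {
  assert(payload[0] == kSparseSwitchIdent);
  int size = payload[1];
  const uint16_t* keys = &payload[2];
  const uint16_t* targets = &payload[2 + size * 2];
  int lo = 0;
  int hi = size - 1;
  while (lo <= hi) {  // binary search over the ascending keys
    int mid = lo + (hi - lo) / 2;
    int32_t key = ReadInt32(&keys[mid * 2]);
    if (test_val < key) {
      hi = mid - 1;
    } else if (test_val > key) {
      lo = mid + 1;
    } else {
      return ReadInt32(&targets[mid * 2]);  // same index as the matching key
    }
  }
  return kSwitchInsnSize;
}
```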

Table
Java code often fills arrays with constant data. In Dex this corresponds to fill-array-data, whose instruction format is
fill-array-data vAA, +BBBBBBBB
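vAA holds a reference to an array object. +BBBBBBBB is the offset of the fill-array-data-payload, encoded the same way as for the switch instructions. The payload structure is: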

Name   Format   Description
ident   ushort = 0x0300   identifying pseudo-opcode
element_width   ushort   number of bytes in each element
size   uint   number of elements in the table
data   ubyte[]   data values
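The data bytes can be copied directly into the array. For that reason, this format can only be used for arrays with primitive element types, not for arrays of objects.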
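Since data is raw element bytes, executing fill-array-data is essentially a bounds check followed by a memcpy. The sketch below parses the payload and copies it into a `std::vector`. It makes two assumptions: the vector stands in for a primitive Java array, and the host is little-endian like the Dex file, so no byte swapping is done. The payload length formula, (size * element_width + 1) / 2 + 4 code units, rounds the data up to whole code units. `ArrayPayload`, `ParseFillArrayData`, `FillArrayDataUnits` and `FillArray` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

constexpr uint16_t kFillArrayDataIdent = 0x0300;

struct ArrayPayload {
  uint16_t element_width;  // bytes per element
  uint32_t size;           // number of elements
  const uint8_t* data;     // raw little-endian element bytes
};

ArrayPayload ParseFillArrayData(const uint16_t* payload) {
  assert(payload[0] == kFillArrayDataIdent);
  ArrayPayload p;
  p.element_width = payload[1];
  p.size = static_cast<uint32_t>(payload[2]) |
           (static_cast<uint32_t>(payload[3]) << 16);
  p.data = reinterpret_cast<const uint8_t*>(&payload[4]);
  return p;
}

// Payload length in code units: 4 header units + data rounded up to whole units.
uint32_t FillArrayDataUnits(const ArrayPayload& p) {
  return (p.size * p.element_width + 1) / 2 + 4;
}

// Copies the raw bytes into a primitive array (assumes a little-endian host).
// T stands in for the Java array's primitive element type.
template <typename T>
void FillArray(std::vector<T>& array, const ArrayPayload& p) {
  assert(sizeof(T) == p.element_width);
  assert(array.size() >= p.size);  // the array must be large enough
  std::memcpy(array.data(), p.data, static_cast<std::size_t>(p.size) * p.element_width);
}
```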
————————————————
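Copyright notice: This article is an original work by CSDN blogger "漂流的代码", licensed under CC 4.0 BY-SA. When reposting, please include a link to the original source and this notice.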