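Yesterday afternoon two friends and I worked through this document. It mostly describes the bytecode encoding of the Dalvik virtual machine, which is fairly simple. What follows is an informal translation; some parts I still don't fully understand and will need to check against the code to confirm what they do, the switch instructions in particular, which I find confusing. This will do for now; I'll revisit it later.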

1 General design

1.1 The machine model and calling conventions are meant to approximately imitate common real architectures and C-style calling conventions:

In short: the VM's machine model and calling conventions roughly mirror real hardware and C-style calls:

1.1.1 The VM is register-based, and frames are fixed in size upon creation. Each frame consists of a particular number of registers (specified by the method) as well as any adjunct data needed to execute the method, such as (but not limited to) the program counter and a reference to the .dex file that contains the method.

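The Dalvik VM is register-based. Each frame (the execution space of one method invocation) has its size fixed when it is created. A frame holds a number of registers (specified by the method) plus any extra data needed to run the method, for example (but not only) the program counter and a reference to the .dex file that contains the method.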

1.1.2 When used for bit values (such as integers and floating point numbers), registers are considered 32 bits wide. Adjacent register pairs are used for 64-bit values. There is no alignment requirement for register pairs.

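For plain bit values (such as integers and floating point numbers) a register is 32 bits wide. A 64-bit value occupies a pair of adjacent 32-bit registers, and the pair has no alignment requirement.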
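To make the register-pair rule concrete, here is a minimal sketch of storing a 64-bit value in two adjacent 32-bit registers. The names `store_wide` and `load_wide` are made up for illustration, and putting the low word in the lower-numbered register is an assumption of this sketch; the text above does not state the order.

```python
MASK32 = 0xFFFFFFFF

def store_wide(registers, index, value):
    """Store a 64-bit value into registers[index] and registers[index + 1]."""
    value &= 0xFFFFFFFFFFFFFFFF
    registers[index] = value & MASK32              # low word (assumed order)
    registers[index + 1] = (value >> 32) & MASK32  # high word

def load_wide(registers, index):
    """Rebuild the 64-bit value from the adjacent register pair."""
    return registers[index] | (registers[index + 1] << 32)

regs = [0] * 4
# An odd starting index is fine: pairs have no alignment requirement.
store_wide(regs, 1, 0x1122334455667788)
```

The pair starts at v1 here only to show that any starting index is allowed.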

1.1.3 When used for object references, registers are considered wide enough to hold exactly one such reference.

When a register holds an object reference, it is wide enough to hold exactly one reference.

1.1.4 In terms of bitwise representation, (Object) null == (int) 0.

At the bit level, a null object reference is the same as integer zero: (Object) null == (int) 0.

1.1.5 The N arguments to a method land in the last N registers of the method's invocation frame, in order. Wide arguments consume two registers. Instance methods are passed a this reference as their first argument.

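If a method has N arguments, they are passed in the last N registers of the frame, in register-index order. For example, with 10 registers (v0 to v9) and 3 argument words, the arguments sit in v7, v8 and v9, and v7 holds the first one. A 64-bit argument takes two registers. For an instance method, the first argument is the this reference.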
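The argument placement rule can be sketched as follows. The function names are hypothetical, and the use of the one-letter type codes "J" (long) and "D" (double) to mark wide arguments is an assumption borrowed from dex type descriptors, not something stated in this text.

```python
def arg_word_count(param_types, is_static):
    """Count argument words: wide types take two, 'this' adds one."""
    words = 0 if is_static else 1        # instance methods receive 'this' first
    for t in param_types:
        words += 2 if t in ("J", "D") else 1   # J = long, D = double (assumed codes)
    return words

def argument_registers(registers_size, arg_words):
    """Return the register numbers that hold the argument words, in order."""
    if arg_words > registers_size:
        raise ValueError("more argument words than registers in the frame")
    first = registers_size - arg_words   # arguments land in the LAST N registers
    return list(range(first, registers_size))
```

For a frame of 10 registers and 3 argument words, `argument_registers(10, 3)` gives the registers v7, v8 and v9.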

1.2 The storage unit in the instruction stream is a 16-bit unsigned quantity. Some bits in some instructions are ignored / must-be-zero.

The smallest storage unit in the instruction stream is a 16-bit unsigned value. Some bits in some instructions are ignored or must be zero.
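As a rough illustration of the 16-bit storage unit, the sketch below turns raw bytes into code units. Two things here are assumptions taken from the companion format documents rather than from this text: that units are stored little-endian, and that the opcode sits in the low 8 bits of the first unit. `code_units` and `opcode_of` are made-up helper names.

```python
import struct

def code_units(data):
    """Split raw bytes into 16-bit unsigned code units (little-endian assumed)."""
    count = len(data) // 2
    return list(struct.unpack("<%dH" % count, data[:count * 2]))

def opcode_of(unit):
    """Return the opcode byte of a code unit (low 8 bits, assumed layout)."""
    return unit & 0xFF

units = code_units(b"\x01\x21\x0e\x00")
```

With these bytes the first unit carries opcode 0x01 (move) and the second carries 0x0e (return-void).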

1.3 Instructions aren't gratuitously limited to a particular type. For example, instructions that move 32-bit register values without interpretation don't have to specify whether they are moving ints or floats.

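Instructions are not tied to a particular type unless they need to be. For example, a move instruction copies a 32-bit register value without caring whether it is an int or a float.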

1.4 There are separately enumerated and indexed constant pools for references to strings, types, fields, and methods.

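References to strings, types, fields, and methods each have their own constant pool, and each pool is enumerated and indexed separately.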

1.5 Bitwise literal data is represented in-line in the instruction stream.

1.6 Because, in practice, it is uncommon for a method to need more than 16 registers, and because needing more than eight registers is reasonably common, many instructions are limited to only addressing the first 16 registers. When reasonably possible, instructions allow references to up to the first 256 registers. In addition, some instructions have variants that allow for much larger register counts, including a pair of catch-all move instructions that can address registers in the range v0 – v65535. In cases where an instruction variant isn't available to address a desired register, it is expected that the register contents get moved from the original register to a low register (before the operation) and/or moved from a low result register to a high register (after the operation).

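A method rarely needs more than 16 registers, although needing more than eight is fairly common, so many instructions can only address the first 16 registers. Where reasonably possible, instructions can address the first 256 registers, and a few variants, including a pair of catch-all move instructions, reach v0 to v65535. If no variant of an instruction can address the register you want, the value is first moved into a low register (before the operation) and/or moved from a low result register to a high one (after it).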
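One way to picture the register-range limits is a helper that picks the narrowest move variant able to address a given pair of registers. This is only a sketch: `pick_move` is a made-up name, and the field widths (4/4, 8/16, 16/16 bits) are the ones listed for move, move/from16 and move/16 in the instruction table in section 2.

```python
def pick_move(dest, src):
    """Pick the narrowest 32-bit move variant that can address both registers."""
    if not (0 <= dest <= 0xFFFF and 0 <= src <= 0xFFFF):
        raise ValueError("register number out of range v0..v65535")
    if dest <= 0xF and src <= 0xF:
        return "move"            # vA, vB: 4-bit fields
    if dest <= 0xFF:
        return "move/from16"     # vAA, vBBBB: 8-bit dest, 16-bit source
    return "move/16"             # vAAAA, vBBBB: 16-bit fields
```

A tool that needs to run, say, a 4-bit-only instruction on v300 could use such a helper to emit the extra move into a low register first.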

1.7 There are several "pseudo-instructions" that are used to hold variable-length data payloads, which are referred to by regular instructions (for example, fill-array-data). Such instructions must never be encountered during the normal flow of execution. In addition, the instructions must be located on even-numbered bytecode offsets (that is, 4-byte aligned). In order to meet this requirement, dex generation tools must emit an extra nop instruction as a spacer if such an instruction would otherwise be unaligned. Finally, though not required, it is expected that most tools will choose to emit these instructions at the ends of methods, since otherwise it would likely be the case that additional instructions would be needed to branch around them.

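Some "pseudo-instructions" hold variable-length data payloads that regular instructions refer to (for example, fill-array-data). They must never be reached during normal execution. They must start at an even-numbered bytecode offset (4-byte aligned); to guarantee this, a dex generation tool emits one extra nop as a spacer whenever the payload would otherwise be unaligned. Although it is not required, most tools are expected to place these payloads at the end of a method, because otherwise extra branch instructions would be needed to jump around them.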
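The alignment rule amounts to a small check when a payload is appended. The sketch below models a method body as a list of 16-bit code units, so an even list length means a 4-byte-aligned offset; `append_payload` is a hypothetical helper, not part of any real tool.

```python
NOP = 0x0000  # opcode 00, format 10x

def append_payload(code, payload):
    """Append a payload at an even code-unit offset, padding with one nop if needed.

    Returns the offset (in 16-bit code units) where the payload starts.
    """
    if len(code) % 2 != 0:
        code.append(NOP)         # spacer so the payload is 4-byte aligned
    offset = len(code)
    code.extend(payload)
    return offset

method_code = [0x000E]                       # a lone return-void, one code unit
payload_at = append_payload(method_code, [0x0300, 0x0001, 0x0000, 0x0000])
```

Here the method body is one unit long, so a single nop is inserted and the payload starts at offset 2.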

1.8 When installed on a running system, some instructions may be altered, changing their format, as an install-time static linking optimization. This is to allow for faster execution once linkage is known. See the associated instruction formats document for the suggested variants. The word "suggested" is used advisedly; it is not mandatory to implement these.

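When an application is installed on a running system, some instructions may be rewritten into a different format as an install-time static linking optimization, so that they run faster once linkage is known. The suggested variants are listed in the instruction formats document; they are only suggestions, and implementing them is not mandatory.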

1.9 Human-syntax and mnemonics:

Syntax and mnemonics:

1.9.1 Dest-then-source ordering for arguments.

The destination argument comes first, followed by the source.

1.9.2 Some opcodes have a disambiguating name suffix to indicate the type(s) they operate on:

Some opcodes carry a name suffix that indicates the type(s) they operate on:

  1. Type-general 32-bit opcodes are unmarked.

    Type-general 32-bit opcodes carry no suffix.


  2. Type-general 64-bit opcodes are suffixed with -wide.

    Type-general 64-bit opcodes use the -wide suffix.


  3. Type-specific opcodes are suffixed with their type (or a straightforward abbreviation), one of: -boolean -byte -char -short -int -long -float -double -object -string -class -void.

    A type suffix names the type the opcode operates on, one of: -boolean -byte -char -short -int -long -float -double -object -string -class -void.


  4. Some opcodes have a disambiguating suffix to distinguish otherwise-identical operations that have different instruction layouts or options. These suffixes are separated from the main names with a slash ("/") and mainly exist at all to make there be a one-to-one mapping with static constants in the code that generates and interprets executables (that is, to reduce ambiguity for humans).

    Some operations exist in several instruction layouts or with different options; a suffix after a slash ("/") tells these variants apart.


  5. In the descriptions here, the width of a value (indicating, e.g., the range of a constant or the number of registers possibly addressed) is emphasized by the use of a character per four bits of width.

    Each letter stands for four bits. For example, vAA is an 8-bit register number.


  6. For example, in the instruction "move-wide/from16 vAA, vBBBB":

    Taking "move-wide/from16 vAA, vBBBB" apart:

    1. "move" is the base opcode, indicating the base operation (move a register's value).

      That is, "move" names the basic operation.


    2. "wide" is the name suffix, indicating that it operates on wide (64 bit) data.

      That is, the instruction works on 64-bit data.


    3. "from16" is the opcode suffix, indicating a variant that has a 16-bit register reference as a source.

      That is, the source register number is 16 bits wide.


    4. "vAA" is the destination register (implied by the operation; again, the rule is that destination arguments always come first), which must be in the range v0 – v255.

      Two letters means 8 bits, hence the range v0 to v255.


    5. "vBBBB" is the source register, which must be in the range v0 – v65535.

      Four letters means 16 bits, hence the range v0 to v65535.

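The naming rules above can be turned into a small parser. This is only an illustrative sketch: `split_mnemonic` and `max_register` are made-up names, and the parser simply treats a trailing type word from the suffix list as the type suffix, which works for names such as move-wide/from16 but is not meant to handle every mnemonic (conversion opcodes like int-to-long, for instance).

```python
TYPE_SUFFIXES = {
    "wide", "boolean", "byte", "char", "short", "int", "long",
    "float", "double", "object", "string", "class", "void",
}

def split_mnemonic(name):
    """Split a mnemonic into (base, type suffix, variant suffix)."""
    main, _, variant = name.partition("/")      # "/" separates the layout variant
    base, type_suffix = main, None
    head, sep, tail = main.rpartition("-")
    if sep and tail in TYPE_SUFFIXES:           # trailing type word, e.g. "-wide"
        base, type_suffix = head, tail
    return base, type_suffix, variant or None

def max_register(placeholder):
    """'vAA' has two letters, so 8 bits, so the highest register is v255."""
    letters = placeholder[1:]                   # drop the leading 'v'
    return (1 << (4 * len(letters))) - 1
```

For "move-wide/from16 vAA, vBBBB" this yields the base "move", the type suffix "wide", the variant "from16", and register limits of 255 and 65535.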


2 Summary of instruction set

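Op & Format | Mnemonic/Syntax | Arguments | Description
00 10x | nop | | Waste cycles. Note: the data-bearing pseudo-instructions ("packed-switch-payload", "sparse-switch-payload", "fill-array-data-payload") are tagged with this opcode; in that case the high-order byte of the opcode unit indicates the kind of data.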
01 12x | move vA, vB | A: destination register (4 bits); B: source register (4 bits) | Move the contents of one non-object register to another.
02 22x | move/from16 vAA, vBBBB | A: destination register (8 bits); B: source register (16 bits) | Move the contents of one non-object register to another.
03 32x | move/16 vAAAA, vBBBB | A: destination register (16 bits); B: source register (16 bits) | Move the contents of one non-object register to another.
04 12x | move-wide vA, vB | A: destination register pair (4 bits); B: source register pair (4 bits) | Move the contents of one register-pair to another. Note: it is legal to move from vN to either vN-1 or vN+1, so the two pairs may overlap.
05 22x | move-wide/from16 vAA, vBBBB | A: destination register pair (8 bits); B: source register pair (16 bits) | Same as move-wide.
06 32x | move-wide/16 vAAAA, vBBBB | A: destination register pair (16 bits); B: source register pair (16 bits) | Same as move-wide.
07 12x | move-object vA, vB | A: destination register (4 bits); B: source register (4 bits) | Move the contents of one object-bearing register to another.
08 22x | move-object/from16 vAA, vBBBB | A: destination register (8 bits); B: source register (16 bits) | Same as move-object.
09 32x | move-object/16 vAAAA, vBBBB | A: destination register (16 bits); B: source register (16 bits) | Same as move-object.
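0a 11x | move-result vAA | A: destination register (8 bits) | Used together with an invoke-kind instruction: moves the result of that call into vAA. The result is a single-word, non-object value.
0b 11x | move-result-wide vAA | A: destination register pair (8 bits) | Same as above, for a double-word result.
0c 11x | move-result-object vAA | A: destination register (8 bits) | Same as above, for an object result. Besides invoke-kind, the result of filled-new-array must also be fetched with this instruction.
0d 11x | move-exception vAA | A: destination register (8 bits) | Save a just-caught exception into the given register. This must be the first instruction of an exception handler.
0e 10x | return-void | | Return from a void method.
0f 11x | return vAA | A: return value register (8 bits) | Return a 32-bit non-object value.
10 11x | return-wide vAA | A: return value register-pair (8 bits) | Return a 64-bit value.
11 11x | return-object vAA | A: return value register (8 bits) | Return an object reference.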
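12 11n | const/4 vA, #+B | A: destination register (4 bits); B: signed int (4 bits) | Sign-extend the 4-bit signed literal to 32 bits and move it into the register.
13 21s | const/16 vAA, #+BBBB | A: destination register (8 bits); B: signed int (16 bits) | Same as above, with a 16-bit literal.
14 31i | const vAA, #+BBBBBBBB | A: destination register (8 bits); B: arbitrary 32-bit constant | Move the given literal value into the register.
15 21h | const/high16 vAA, #+BBBB0000 | A: destination register (8 bits); B: signed int (16 bits) | Move the literal into the register with zeros filled in on the right, so the 16-bit literal becomes the high 16 bits of the 32-bit value.
16 21s | const-wide/16 vAA, #+BBBB | A: destination register (8 bits); B: signed int (16 bits) | Sign-extend the 16-bit literal to 64 bits and move it into the register pair.
17 31i | const-wide/32 vAA, #+BBBBBBBB | A: destination register (8 bits); B: signed int (32 bits) | Same as above, with a 32-bit literal.
18 51l | const-wide vAA, #+BBBBBBBBBBBBBBBB | A: destination register (8 bits); B: arbitrary double-width (64-bit) constant | Same as above, with a full 64-bit literal.
19 21h | const-wide/high16 vAA, #+BBBB000000000000 | A: destination register (8 bits); B: signed int (16 bits) | Same as above, but the 16-bit literal supplies only the high 16 bits of the 64-bit value.
1a 21c | const-string vAA, string@BBBB | A: destination register (8 bits); B: string index | Move a reference to a string into the register; the literal is a 16-bit index into the string constant pool.
1b 31c | const-string/jumbo vAA, string@BBBBBBBB | A: destination register (8 bits); B: string index | Same as above, with a 32-bit index.
1c 21c | const-class vAA, type@BBBB | A: destination register (8 bits); B: type index | Move a reference to a class into the register; the literal is an index into the type constant pool.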
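1d 11x | monitor-enter vAA | A: reference-bearing register (8 bits) | Acquire the monitor for the indicated object.
1e 11x | monitor-exit vAA | A: reference-bearing register (8 bits) | Release the monitor for the indicated object.
1f 21c | check-cast vAA, type@BBBB | A: reference-bearing register (8 bits); B: type index (16 bits) | Throw a ClassCastException if the reference in vAA cannot be cast to type@BBBB. vAA holds an object reference whose actual type is only known at run time; B is an index into the type pool.
20 22c | instance-of vA, vB, type@CCCC | A: destination register (4 bits); B: reference-bearing register (4 bits); C: type index (16 bits) | vB holds an object reference and C is a type index. Store 1 in vA if the reference is an instance of the type given by C, otherwise store 0. Because vB always holds a reference, the result is always 0 when C refers to a primitive type.
21 12x | array-length vA, vB | A: destination register (4 bits); B: array reference-bearing register (4 bits) | Store the length of the array referenced by vB in vA.
22 21c | new-instance vAA, type@BBBB | A: destination register (8 bits); B: type index | Create a new instance of the type given by B and store a reference to it in vAA.
23 22c | new-array vA, vB, type@CCCC | A: destination register (4 bits); B: size register; C: type index | Create a new array of type C with the size held in vB and store a reference to it in vA.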
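24 35c | filled-new-array {vC, vD, vE, vF, vG}, type@BBBB | A: array size and argument word count (4 bits); B: type index (16 bits); C..G: argument registers (4 bits each) | Construct an array of the given type and size, filling it with the supplied contents. The type must be an array type, and the contents must be single-word (so no arrays of long or double, but reference types are acceptable). The resulting instance is retrieved with move-result-object.
25 3rc | filled-new-array/range {vCCCC .. vNNNN}, type@BBBB | A: array size and argument word count (8 bits); B: type index (16 bits); C: first argument register (16 bits); N = A + C - 1 | Construct an array of the given type and size, filling it with the supplied contents; the result is retrieved as above. For example, if A is 6 and C is register index 4, then N = A + C - 1 = 9, so the registers used are v4 to v9.
26 31t | fill-array-data vAA, +BBBBBBBB (see the "fill-array-data-payload format" section) | A: array reference (8 bits); B: signed "branch" offset to table data pseudo-instruction (32 bits) | Fill the given array with the indicated data. B is an offset to a data table; the data found at that offset is copied into the array referenced by vAA.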
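27 11x | throw vAA | A: exception-bearing register (8 bits) | Throw the indicated exception.
28 10t | goto +AA | A: signed branch offset (8 bits) | Unconditionally jump to the indicated instruction. The offset +AA must not be 0.
29 20t | goto/16 +AAAA | A: signed branch offset (16 bits) | Same as above, with a 16-bit offset.
2a 30t | goto/32 +AAAAAAAA | A: signed branch offset (32 bits) | Same as above, with a 32-bit offset.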
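2b 31t | packed-switch vAA, +BBBBBBBB (see the "packed-switch-payload format" section) | A: register to test; B: signed "branch" offset to table data pseudo-instruction (32 bits) | Implements a switch statement. vAA is the value being switched on, and B points to a table of branch targets, one per value in a contiguous integer range. If the value in vAA falls in that range, execution jumps to the matching target; otherwise it continues with the next instruction.
2c 31t | sparse-switch vAA, +BBBBBBBB (see the "sparse-switch-payload format" section) | A: register to test; B: signed "branch" offset to table data pseudo-instruction (32 bits) | Implements a switch statement using a table of key/target pairs; if no key matches, execution continues with the next instruction.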
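2d..31 23x | cmpkind vAA, vBB, vCC | A: destination register (8 bits); B: first source register or pair; C: second source register or pair | Compare the values held in two registers or register pairs and store the result in vAA: 0 if vBB equals vCC, 1 if vBB is greater than vCC, -1 if vBB is less than vCC. The "bias" of the floating point variants says how a comparison involving NaN is treated: "lt bias" instructions return -1, "gt bias" instructions return 1.
  2d: cmpl-float (lt bias), 2e: cmpg-float (gt bias), 2f: cmpl-double (lt bias), 30: cmpg-double (gt bias), 31: cmp-long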

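32..37 22t | if-test vA, vB, +CCCC | A: first register to test (4 bits); B: second register to test (4 bits); C: signed branch offset (16 bits) | Branch on a comparison of two registers. vA is compared with vB; if the test succeeds, execution jumps to the instruction at offset C, otherwise it continues with the next instruction.
  32: if-eq, 33: if-ne, 34: if-lt, 35: if-ge, 36: if-gt, 37: if-le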

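38..3d 21t | if-testz vAA, +BBBB | A: register to test (8 bits); B: signed branch offset (16 bits) | Branch on a comparison of vAA with 0. If the test succeeds (for if-nez, if vAA is not 0), execution jumps to the instruction at offset B.
  38: if-eqz, 39: if-nez, 3a: if-ltz, 3b: if-gez, 3c: if-gtz, 3d: if-lez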

3e..43 10x | (unused) | | Reserved.
44..51 23x | arrayop vAA, vBB, vCC | A: value register or pair; may be source or dest (8 bits); B: array register (8 bits); C: index register (8 bits) | Perform the identified array operation: load the element at index vCC of the array in vBB into vAA (aget), or store vAA into that element (aput).
  44: aget, 45: aget-wide, 46: aget-object, 47: aget-boolean, 48: aget-byte, 49: aget-char, 4a: aget-short
  4b: aput, 4c: aput-wide, 4d: aput-object, 4e: aput-boolean, 4f: aput-byte, 50: aput-char, 51: aput-short

52..5f 22c | iinstanceop vA, vB, field@CCCC | A: value register or pair; may be source or dest (4 bits); B: object register (4 bits); C: instance field reference index (16 bits) | Same idea as arrayop, except that C is the index of an instance field of the object in vB.
  52: iget, 53: iget-wide, 54: iget-object, 55: iget-boolean, 56: iget-byte, 57: iget-char, 58: iget-short
  59: iput, 5a: iput-wide, 5b: iput-object, 5c: iput-boolean, 5d: iput-byte, 5e: iput-char, 5f: iput-short

60..6d 21c | sstaticop vAA, field@BBBB | A: value register or pair; may be source or dest (8 bits); B: static field reference index (16 bits) | Same as above, except that B is the index of a static field.
  60: sget, 61: sget-wide, 62: sget-object, 63: sget-boolean, 64: sget-byte, 65: sget-char, 66: sget-short
  67: sput, 68: sput-wide, 69: sput-object, 6a: sput-boolean, 6b: sput-byte, 6c: sput-char, 6d: sput-short

6e..72 35c | invoke-kind {vC, vD, vE, vF, vG}, meth@BBBB | A: argument word count (4 bits); B: method reference index (16 bits); C..G: argument registers (4 bits each) | Call the indicated method. C..G are the argument registers, A is the number of argument words, and B is the index of the method.
  6e: invoke-virtual, 6f: invoke-super, 70: invoke-direct, 71: invoke-static, 72: invoke-interface

73 10x | (unused) | |

74..78 3rc | invoke-kind/range {vCCCC .. vNNNN}, meth@BBBB | A: argument word count (8 bits); B: method reference index (16 bits); C: first argument register (16 bits); N = A + C - 1 | Same as invoke-kind, except that the argument list can be longer.
79..7a 10x | (unused) | |

7b..8f 12x | unop vA, vB | A: destination register or pair (4 bits); B: source register or pair (4 bits) | Perform the identified unary operation (negation, bitwise complement, type conversion) on the source register, storing the result in the destination register.
  7b: neg-int, 7c: not-int, 7d: neg-long, 7e: not-long, 7f: neg-float, 80: neg-double
  81: int-to-long, 82: int-to-float, 83: int-to-double, 84: long-to-int, 85: long-to-float, 86: long-to-double
  87: float-to-int, 88: float-to-long, 89: float-to-double, 8a: double-to-int, 8b: double-to-long, 8c: double-to-float
  8d: int-to-byte, 8e: int-to-char, 8f: int-to-short

90..af 23x | binop vAA, vBB, vCC | A: destination register or pair (8 bits); B: first source register or pair (8 bits); C: second source register or pair (8 bits) | Perform the identified arithmetic or bitwise operation on the two source registers, storing the result in the destination register.
  90: add-int, 91: sub-int, 92: mul-int, 93: div-int, 94: rem-int, 95: and-int, 96: or-int, 97: xor-int, 98: shl-int, 99: shr-int, 9a: ushr-int
  9b: add-long, 9c: sub-long, 9d: mul-long, 9e: div-long, 9f: rem-long, a0: and-long, a1: or-long, a2: xor-long, a3: shl-long, a4: shr-long, a5: ushr-long
  a6: add-float, a7: sub-float, a8: mul-float, a9: div-float, aa: rem-float
  ab: add-double, ac: sub-double, ad: mul-double, ae: div-double, af: rem-double

b0..cf 12x | binop/2addr vA, vB | A: destination and first source register or pair (4 bits); B: second source register or pair (4 bits) | Same operations as binop, in two-address form: vA is both the first source and the destination, so the result overwrites the first operand.
  b0: add-int/2addr, b1: sub-int/2addr, b2: mul-int/2addr, b3: div-int/2addr, b4: rem-int/2addr, b5: and-int/2addr, b6: or-int/2addr, b7: xor-int/2addr, b8: shl-int/2addr, b9: shr-int/2addr, ba: ushr-int/2addr
  bb: add-long/2addr, bc: sub-long/2addr, bd: mul-long/2addr, be: div-long/2addr, bf: rem-long/2addr, c0: and-long/2addr, c1: or-long/2addr, c2: xor-long/2addr, c3: shl-long/2addr, c4: shr-long/2addr, c5: ushr-long/2addr
  c6: add-float/2addr, c7: sub-float/2addr, c8: mul-float/2addr, c9: div-float/2addr, ca: rem-float/2addr
  cb: add-double/2addr, cc: sub-double/2addr, cd: mul-double/2addr, ce: div-double/2addr, cf: rem-double/2addr

d0..d7 22s | binop/lit16 vA, vB, #+CCCC | A: destination register (4 bits); B: source register (4 bits); C: signed int constant (16 bits) | Apply the operation to the value in vB and the literal C, storing the result in vA. rsub-int has no suffix.
  d0: add-int/lit16, d1: rsub-int (reverse subtract), d2: mul-int/lit16, d3: div-int/lit16, d4: rem-int/lit16, d5: and-int/lit16, d6: or-int/lit16, d7: xor-int/lit16

d8..e2 22b | binop/lit8 vAA, vBB, #+CC | Same as binop/lit16, but with 8-bit fields | Same as binop/lit16, but with an 8-bit literal.
e3..fe 10x | (unused) | |

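ff - | (extended opcode) | | ff marks an extended opcode; the second byte is the actual opcode.
00ff 41c | const-class/jumbo vAAAA, type@BBBBBBBB | A: destination register (16 bits); B: type index (32 bits) | Move a reference to the class given by type index B into vAAAA.
The remaining extended instructions are not listed here. They match their non-extended counterparts, only with wider indices.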
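A few rows of the table involve small bit-level rules that are easy to get wrong, so here is a hedged sketch of three of them: the sign extension done by const/4 and const/16, the left shift implied by const/high16, and the NaN bias of the cmpl/cmpg comparisons. The function names are invented for this example and do not come from any real implementation.

```python
import math

def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    sign_bit = 1 << (bits - 1)
    return (value & (sign_bit - 1)) - (value & sign_bit)

def const_high16(literal16):
    """const/high16: the 16-bit literal becomes the high half of a 32-bit value."""
    return (literal16 & 0xFFFF) << 16

def cmp_float(b, c, gt_bias):
    """cmpg-float when gt_bias is True, cmpl-float otherwise."""
    if math.isnan(b) or math.isnan(c):
        return 1 if gt_bias else -1     # the bias only matters for NaN
    if b == c:
        return 0
    return 1 if b > c else -1
```

For instance, the 4-bit literal 0xF in const/4 stands for -1, and const/high16 with 0x3F80 produces 0x3F800000, which is the bit pattern of the float 1.0.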


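3 packed-switch-payload format

Name | Format | Description
ident | ushort = 0x0100 | identifying pseudo-opcode
size | ushort | number of entries in the switch table
firstkey | int | first (lowest) case value of the switch
targets | int[] | list of branch targets, one per case

This payload backs the packed-switch instruction. Total size, in 16-bit code units: (size * 2) + 4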
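Since the switch handling was the confusing part, here is a minimal sketch of how a packed-switch lookup could work given the fields above. It assumes, as the field names suggest, that targets[i] belongs to the case value firstkey + i; `packed_switch_target` and `packed_switch_units` are made-up names.

```python
def packed_switch_target(value, first_key, targets):
    """Return the branch target for value, or None to fall through."""
    index = value - first_key            # cases cover a contiguous range
    if 0 <= index < len(targets):
        return targets[index]
    return None                          # no match: continue with next instruction

def packed_switch_units(size):
    """Payload size in 16-bit code units: ident + size + firstkey (2) + targets."""
    return size * 2 + 4
```

With cases 10, 11 and 12, `packed_switch_target(11, 10, [5, 9, 13])` picks the second target, and a value of 13 falls through.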

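4 sparse-switch-payload format

Name | Format | Description
ident | ushort = 0x0200 | identifying pseudo-opcode
size | ushort | number of entries in the switch table
keys | int[] | list of case values of the switch
targets | int[] | list of branch targets, one per key

Total size, in 16-bit code units: (size * 4) + 2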
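A sparse-switch lookup can be sketched in the same way. This version assumes the keys are stored in ascending order so that a binary search applies; that ordering is an assumption of the sketch and is not stated in the table above. The helper names are invented.

```python
from bisect import bisect_left

def sparse_switch_target(value, keys, targets):
    """Return the branch target whose key equals value, or None to fall through."""
    i = bisect_left(keys, value)         # keys assumed sorted low to high
    if i < len(keys) and keys[i] == value:
        return targets[i]
    return None

def sparse_switch_units(size):
    """Payload size in 16-bit code units: ident + size + keys (2 each) + targets (2 each)."""
    return size * 4 + 2
```

If the keys were not sorted, a plain linear scan over `zip(keys, targets)` would give the same result.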

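5 fill-array-data-payload format

Name | Format | Description
ident | ushort = 0x0300 | identifying pseudo-opcode
elementwidth | ushort | number of bytes in each element
size | uint | number of elements
data | ubyte[] | data values

Total size, in 16-bit code units: (size * elementwidth + 1) / 2 + 4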
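The size formula can be checked by building a payload by hand. The sketch below packs the header fields in the order of the table and pads the data to a whole code unit; little-endian byte order and signed element values are assumptions made for this example, and both function names are hypothetical.

```python
import struct

def fill_array_units(size, element_width):
    """Payload size in 16-bit code units, using integer division."""
    return (size * element_width + 1) // 2 + 4

def build_fill_array_payload(values, element_width):
    """Pack ident, elementwidth, size and data into bytes (little-endian assumed)."""
    fmt = {1: "b", 2: "h", 4: "i", 8: "q"}[element_width]
    data = struct.pack("<%d%s" % (len(values), fmt), *values)
    if len(data) % 2:
        data += b"\x00"                  # pad to a whole 16-bit code unit
    header = struct.pack("<HHI", 0x0300, element_width, len(values))
    return header + data

payload = build_fill_array_payload([1, 2, 3], 1)
```

Three one-byte elements give an 8-byte header plus 3 data bytes plus 1 padding byte, that is 6 code units, which matches `fill_array_units(3, 1)`.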

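6 Mathematical operation details

I have not reproduced this table. It states how each math operation corresponds to its C-language equivalent; floating point operations must follow IEEE 754 rules.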
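As one example of what "corresponds to C" means in practice, the sketch below emulates 32-bit integer division and remainder that truncate toward zero, as C and Java do. This is an assumption about div-int and rem-int based on that correspondence, since the table itself is not reproduced here; Python's own `//` and `%` round toward negative infinity, so they cannot be used directly. The function names are invented.

```python
def to_int32(x):
    """Wrap an integer to a signed 32-bit two's-complement value."""
    x &= 0xFFFFFFFF
    return x - 0x100000000 if x & 0x80000000 else x

def div_int(b, c):
    """32-bit division truncated toward zero (assumed div-int behaviour)."""
    if c == 0:
        raise ZeroDivisionError("division by zero")
    q = abs(b) // abs(c)
    if (b < 0) != (c < 0):
        q = -q
    return to_int32(q)

def rem_int(b, c):
    """Remainder matching div_int, so the result takes the sign of b."""
    return to_int32(b - div_int(b, c) * c)
```

Note the overflow corner case: dividing the smallest 32-bit integer by -1 wraps back to the same value in this sketch.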