A rough translation of dalvik-bytecode.html

weixin_33739541

Original link: http://blog.51cto.com/devilogic/1213896

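Yesterday afternoon two friends and I studied this document, which mostly describes the bytecode encoding of the Dalvik virtual machine. The encoding is fairly simple. What follows is an informal translation; some parts I still don't fully understand and will need to check against the code to confirm, especially the switch instructions, which I found confusing. I'll leave it at this for now and revisit it later.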

1 General design

1.1 The machine model and calling conventions are meant to approximately imitate common real architectures and C-style calling conventions:

The VM's machine model and calling conventions roughly mirror those of real hardware and of C.

1.1.1 The VM is register-based, and frames are fixed in size upon creation. Each frame consists of a particular number of registers (specified by the method) as well as any adjunct data needed to execute the method, such as (but not limited to) the program counter and a reference to the .dex file that contains the method.

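Dalvik is register-based. The size of a method's frame is fixed when the frame is created; the frame holds the number of registers the method declares, plus whatever bookkeeping execution needs, such as the program counter and a reference to the .dex file containing the method.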

1.1.2 When used for bit values (such as integers and floating point numbers), registers are considered 32 bits wide. Adjacent register pairs are used for 64-bit values. There is no alignment requirement for register pairs.

Integers and floating point numbers are held in 32-bit registers; a 64-bit value occupies two adjacent registers, and the pair may start at any register index.

1.1.3 When used for object references, registers are considered wide enough to hold exactly one such reference.

A register that holds an object reference is wide enough for exactly one reference.

1.1.4 In terms of bitwise representation, (Object) null == (int) 0.

At the bit level, a null reference is the integer 0: (Object) null == (int) 0.

1.1.5 The N arguments to a method land in the last N registers of the method's invocation frame, in order. Wide arguments consume two registers. Instance methods are passed a this reference as their first argument.

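A method with N argument words receives them in the last N registers of its frame, in order. For example, in a frame with ten registers (v0 – v9), a method with three single-word arguments receives them in v7, v8, and v9, with v7 holding the first. A 64-bit argument takes two registers, and for an instance method the first argument is the this reference.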
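The placement rule in 1.1.5 can be sketched in a few lines of Python. This is only an illustration: the function name and the way parameters are described (a list of widths, 1 for a 32-bit or reference value and 2 for a long or double) are my own assumptions, not part of the Dalvik documentation.

```python
def argument_registers(registers_size, param_widths, is_static):
    """Return the register index at which each argument starts.

    registers_size: number of registers in the frame (v0 .. v[registers_size - 1]).
    param_widths:   hypothetical description of the declared parameters,
                    1 for a 32-bit or reference value, 2 for long/double.
    is_static:      instance methods get `this` as an implicit first argument.
    """
    widths = list(param_widths)
    if not is_static:
        widths.insert(0, 1)  # `this` is a reference, so it takes one register
    ins_size = sum(widths)
    if ins_size > registers_size:
        raise ValueError("frame is too small for its arguments")
    first = registers_size - ins_size  # arguments fill the LAST ins_size registers
    starts = []
    for width in widths:
        starts.append(first)
        first += width
    return starts
```

For a static method with three single-word parameters in a ten-register frame this gives v7, v8, v9, matching the example above.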

1.2 The storage unit in the instruction stream is a 16-bit unsigned quantity. Some bits in some instructions are ignored / must-be-zero.

The smallest unit of the instruction stream is an unsigned 16-bit value. Some bits of some instructions are ignored or must be zero.

1.3 Instructions aren't gratuitously limited to a particular type. For example, instructions that move 32-bit register values without interpretation don't have to specify whether they are moving ints or floats.

Instructions are not tied to a type unless they need to be. A plain move, for example, copies a 32-bit register whether it holds an int or a float.

1.4 There are separately enumerated and indexed constant pools for references to strings, types, fields, and methods.

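Strings, types, fields, and methods each have their own constant pool, and an entry is referenced by its index within the corresponding pool.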

1.5 Bitwise literal data is represented in-line in the instruction stream.

1.6 Because, in practice, it is uncommon for a method to need more than 16 registers, and because needing more than eight registers is reasonably common, many instructions are limited to only addressing the first 16 registers. When reasonably possible, instructions allow references to up to the first 256 registers. In addition, some instructions have variants that allow for much larger register counts, including a pair of catch-all move instructions that can address registers in the range v0 – v65535. In cases where an instruction variant isn't available to address a desired register, it is expected that the register contents get moved from the original register to a low register (before the operation) and/or moved from a low result register to a high register (after the operation).

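In practice a method rarely needs more than 16 registers, while needing more than eight is fairly common, so many instructions can address only the first 16 registers (v0 – v15). Where reasonably possible an instruction can address the first 256, and a few variants, including a pair of catch-all moves, reach v0 – v65535. When no variant can address a register directly, its contents are moved to a low register before the operation, and/or the result is moved from a low register to a high one afterwards.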
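As a rough illustration of how a tool might cope with these addressing limits, the sketch below picks the narrowest of the three non-object 32-bit move variants from the instruction table (move, move/from16, move/16). The function name is hypothetical; only the field widths come from the table.

```python
def pick_move(dst, src):
    """Pick the narrowest 32-bit, non-object move variant that can encode dst and src."""
    if not (0 <= dst <= 0xFFFF and 0 <= src <= 0xFFFF):
        raise ValueError("register index outside v0..v65535")
    if dst <= 0xF and src <= 0xF:
        return "move"         # format 12x: 4-bit destination, 4-bit source
    if dst <= 0xFF:
        return "move/from16"  # format 22x: 8-bit destination, 16-bit source
    return "move/16"          # format 32x: 16-bit destination, 16-bit source
```

An instruction that can only see v0 – v15 would then be preceded by such a move to bring a high register down, and followed by another to put the result back.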

1.7 There are several "pseudo-instructions" that are used to hold variable-length data payloads, which are referred to by regular instructions (for example, fill-array-data). Such instructions must never be encountered during the normal flow of execution. In addition, the instructions must be located on even-numbered bytecode offsets (that is, 4-byte aligned). In order to meet this requirement, dex generation tools must emit an extra nop instruction as a spacer if such an instruction would otherwise be unaligned. Finally, though not required, it is expected that most tools will choose to emit these instructions at the ends of methods, since otherwise it would likely be the case that additional instructions would be needed to branch around them.

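A few "pseudo-instructions" hold variable-length data payloads that regular instructions (for example, fill-array-data) refer to. Normal execution must never reach them. They must start at an even code-unit offset (4-byte aligned), so a dex generation tool emits one extra nop as a spacer whenever a payload would otherwise be unaligned. Tools are expected, though not required, to place payloads at the end of a method; placed anywhere else, they would need extra branch instructions to jump around them.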
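The alignment rule can be sketched as follows. The representation is an assumption made for the example: a method body is modelled as a Python list of 16-bit code units, and append_payload is a hypothetical helper, not a function of any real dex tool.

```python
NOP = 0x0000  # a nop occupies exactly one 16-bit code unit


def append_payload(code_units, payload_units):
    """Append a data payload to a method body, padding with one nop if needed.

    Returns the offset, in 16-bit code units, at which the payload starts.
    """
    if len(code_units) % 2 != 0:  # odd code-unit offset means not 4-byte aligned
        code_units.append(NOP)    # spacer so the payload starts on an even offset
    offset = len(code_units)
    code_units.extend(payload_units)
    return offset
```

Because the payload goes at the end of the method, nothing has to branch around it.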

1.8 When installed on a running system, some instructions may be altered, changing their format, as an install-time static linking optimization. This is to allow for faster execution once linkage is known. See the associated instruction formats document for the suggested variants. The word "suggested" is used advisedly; it is not mandatory to implement these.

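At install time some instructions may be rewritten into a different format as a static linking optimization, which speeds up execution once linkage is known. The suggested variants are listed in the instruction formats document; as the word "suggested" implies, implementing them is optional.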

1.9 Human-syntax and mnemonics:

Syntax and mnemonics:

1.9.1 Dest-then-source ordering for arguments.

Arguments are written destination first, then source.

1.9.2 Some opcodes have a disambiguating name suffix to indicate the type(s) they operate on:

Some opcodes carry a name suffix that shows which type(s) they operate on:

Type-general 32-bit opcodes are unmarked.

Opcodes that work on any 32-bit value carry no suffix.
Type-general 64-bit opcodes are suffixed with -wide.

Opcodes that work on any 64-bit value carry the -wide suffix.
Type-specific opcodes are suffixed with their type (or a straightforward abbreviation), one of: -boolean -byte -char -short -int -long -float -double -object -string -class -void.

Opcodes tied to one specific type carry that type's name, or an abbreviation of it, as the suffix.
Some opcodes have a disambiguating suffix to distinguish otherwise-identical operations that have different instruction layouts or options. These suffixes are separated from the main names with a slash ("/") and mainly exist at all to make there be a one-to-one mapping with static constants in the code that generates and interprets executables (that is, to reduce ambiguity for humans).

Where otherwise-identical operations differ in instruction layout or options, a suffix after a slash ("/") tells the variants apart.
In the descriptions here, the width of a value (indicating, e.g., the range of a constant or the number of registers possibly addressed) is emphasized by the use of a character per four bits of width.

For instance, vAA is an 8-bit field: each letter stands for four bits.
For example, in the instruction "move-wide/from16 vAA, vBBBB":

Taking "move-wide/from16 vAA, vBBBB" apart:
1. "move" is the base opcode, indicating the base operation (move a register's value).
  
  The base opcode: "move" alone names the operation being performed.
2. "wide" is the name suffix, indicating that it operates on wide (64 bit) data.
  
  The name suffix: "wide" means the operands are 64-bit values.
3. "from16" is the opcode suffix, indicating a variant that has a 16-bit register reference as a source.
  
  The opcode suffix: "from16" means the source register is given as a 16-bit reference.
4. "vAA" is the destination register (implied by the operation; again, the rule is that destination arguments always come first), which must be in the range v0 – v255.
  
  The destination register: "vAA" is an 8-bit field, so it ranges over v0 – v255.
5. "vBBBB" is the source register, which must be in the range v0 – v65535.
  
  The source register: "vBBBB" is a 16-bit field, so it ranges over v0 – v65535.

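To make the "move-wide/from16 vAA, vBBBB" example concrete, here is a small decoding sketch. The opcode value (05) and format (22x) come from the instruction table; the bit layout used for format 22x (opcode in the low byte of the first code unit, AA in its high byte, BBBB in the second code unit) is my reading of the companion instruction formats document and should be treated as an assumption here. Both function names are hypothetical.

```python
def decode_22x(units):
    """Decode a format-22x instruction from two 16-bit code units."""
    opcode = units[0] & 0xFF      # low byte of the first code unit
    vaa = (units[0] >> 8) & 0xFF  # high byte: 8-bit register index
    vbbbb = units[1] & 0xFFFF     # second code unit: 16-bit register index
    return opcode, vaa, vbbbb


def format_move_wide_from16(units):
    """Render a move-wide/from16 instruction in the human-readable syntax."""
    opcode, dst, src = decode_22x(units)
    if opcode != 0x05:
        raise ValueError("not a move-wide/from16 instruction")
    return f"move-wide/from16 v{dst}, v{src}"
```

Note how the destination comes first in the printed form, and how the 8-bit and 16-bit field widths match the two-letter and four-letter register names.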
2 Summary of the instruction set

Op & Format	Mnemonic/Syntax	Arguments	Description
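00 10x	nop		Waste cycles.
			Note: the data-bearing pseudo-instructions ("packed-switch-payload", "sparse-switch-payload", "fill-array-data-payload") are tagged with this opcode;
			in that case the high-order byte of the opcode unit indicates the nature of the data.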
01 12x	move vA, vB	A: destination register (4 bits)	Move the contents of one non-object register to another.
		B: source register (4 bits)
02 22x	move/from16 vAA, vBBBB	A: destination register (8 bits)	Move the contents of one non-object register to another.
		B: source register (16 bits)
03 32x	move/16 vAAAA, vBBBB	A: destination register (16 bits)	Move the contents of one non-object register to another.
		B: source register (16 bits)
04 12x	move-wide vA, vB	A: destination register pair (4 bits)	Move the contents of one register pair to another.
		B: source register pair (4 bits)	Note: it is legal to move from vN to either vN-1 or vN+1, so an implementation must read both halves of the source pair before writing anything.
05 22x	move-wide/from16 vAA, vBBBB	A: destination register pair (8 bits)	Same as move-wide.
		B: source register pair (16 bits)
06 32x	move-wide/16 vAAAA, vBBBB	A: destination register pair (16 bits)	Same as move-wide.
		B: source register pair (16 bits)
07 12x	move-object vA, vB	A: destination register (4 bits)	Move the contents of one object-bearing register to another.
		B: source register (4 bits)
08 22x	move-object/from16 vAA, vBBBB	A: destination register (8 bits)	Same as move-object.
		B: source register (16 bits)
09 32x	move-object/16 vAAAA, vBBBB	A: destination register (16 bits)	Same as move-object.
		B: source register (16 bits)
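0a 11x	move-result vAA	A: destination register (8 bits)	Move the single-word, non-object result of the most recent invoke-kind into vAA. It must be the instruction immediately after the invoke-kind whose result is to be kept.
0b 11x	move-result-wide vAA	A: destination register pair (8 bits)	Same as move-result, for a double-word result.
0c 11x	move-result-object vAA	A: destination register (8 bits)	Same as move-result, for an object result. Besides invoke-kind, it is also required immediately after filled-new-array to obtain the new array.
0d 11x	move-exception vAA	A: destination register (8 bits)	Save a just-caught exception into vAA. This may only appear as the first instruction of an exception handler.
0e 10x	return-void		Return from a void method.
0f 11x	return vAA	A: return value register (8 bits)	Return a single-width (32-bit) non-object value.
10 11x	return-wide vAA	A: return value register-pair (8 bits)	Return a double-width (64-bit) value.
11 11x	return-object vAA	A: return value register (8 bits)	Return an object reference.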
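12 11n	const/4 vA, #+B	A: destination register (4 bits)	Move the given 4-bit signed literal, sign-extended to 32 bits, into the register.
		B: signed int (4 bits)
13 21s	const/16 vAA, #+BBBB	A: destination register (8 bits)	Same as const/4, with a 16-bit signed literal.
		B: signed int (16 bits)
14 31i	const vAA, #+BBBBBBBB	A: destination register (8 bits)	Move the given 32-bit literal into the register.
		B: arbitrary 32-bit constant
15 21h	const/high16 vAA, #+BBBB0000	A: destination register (8 bits)	Move the given 16-bit literal, right-zero-extended to 32 bits, into the register (the literal becomes the high 16 bits).
		B: signed int (16 bits)
16 21s	const-wide/16 vAA, #+BBBB	A: destination register (8 bits)	Move the given 16-bit literal, sign-extended to 64 bits, into the register pair.
		B: signed int (16 bits)
17 31i	const-wide/32 vAA, #+BBBBBBBB	A: destination register (8 bits)	Same as const-wide/16, with a 32-bit signed literal.
		B: signed int (32 bits)
18 51l	const-wide vAA, #+BBBBBBBBBBBBBBBB	A: destination register (8 bits)	Move the given 64-bit literal into the register pair.
		B: arbitrary double-width (64-bit) constant
19 21h	const-wide/high16 vAA, #+BBBB000000000000	A: destination register (8 bits)	Move the given 16-bit literal, right-zero-extended to 64 bits, into the register pair (the literal becomes the high 16 bits).
		B: signed int (16 bits)
1a 21c	const-string vAA, string@BBBB	A: destination register (8 bits)	Move a reference to the string at the given 16-bit index in the string constant pool into the register.
		B: string index
1b 31c	const-string/jumbo vAA, string@BBBBBBBB	A: destination register (8 bits)	Same as const-string, with a 32-bit index.
		B: string index
1c 21c	const-class vAA, type@BBBB	A: destination register (8 bits)	Move a reference to the class at the given index in the type constant pool into the register.
		B: type index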
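1d 11x	monitor-enter vAA	A: reference-bearing register (8 bits)	Acquire the monitor for the indicated object.
1e 11x	monitor-exit vAA	A: reference-bearing register (8 bits)	Release the monitor for the indicated object.
1f 21c	check-cast vAA, type@BBBB	A: reference-bearing register (8 bits)	Throw a ClassCastException if the reference in vAA cannot be cast to the type at type@BBBB.
		B: type index (16 bits)	Note: vAA always holds a reference, so the check necessarily fails at run time if B refers to a primitive type.
20 22c	instance-of vA, vB, type@CCCC	A: destination register (4 bits)	Store 1 in vA if the reference in vB is an instance of the type at index C, or 0 otherwise.
		B: reference-bearing register (4 bits)	Note: vB always holds a reference, so the result is always 0 if C refers to a primitive type.
		C: type index (16 bits)
21 12x	array-length vA, vB	A: destination register (4 bits)	Store in vA the length, in entries, of the array referenced by vB.
		B: array reference-bearing register (4 bits)
22 21c	new-instance vAA, type@BBBB	A: destination register (8 bits)	Construct a new instance of the type at index B and store a reference to it in vAA. The type must be a non-array class.
		B: type index
23 22c	new-array vA, vB, type@CCCC	A: destination register (4 bits)	Construct a new array of type C with the size held in vB, and store a reference to it in vA. The type must be an array type.
		B: size register
		C: type index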
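24 35c	filled-new-array {vC, vD, vE, vF, vG},	A: array size and argument word count (4 bits)	Construct an array of the given type and size, filling it with the supplied contents. The type must be an array type, and the contents must be single-word
	type@BBBB	B: type index (16 bits)	(that is, no arrays of long or double; reference types are acceptable). The constructed instance is retrieved with an immediately following move-result-object.
		C..G: argument registers (4 bits each)
25 3rc	filled-new-array/range {vCCCC .. vNNNN},	A: array size and argument word count (8 bits)	Construct an array of the given type and size, filling it with the supplied contents. The result is retrieved as for filled-new-array.
	type@BBBB	B: type index (16 bits)	Example: if A is 6 and C is 4, then N = A + C - 1 = 9,
		C: first argument register (16 bits)	so the registers used are v4 through v9.
		N = A + C - 1
26 31t	fill-array-data vAA, +BBBBBBBB	A: array reference (8 bits)	Fill the array referenced by vAA with the indicated data. B is a signed offset to a data table; the data found there is copied into the array.
	(see the "fill-array-data-payload format" section)	B: signed "branch" offset to table data pseudo-instruction (32 bits)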
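27 11x	throw vAA	A: exception-bearing register (8 bits)	Throw the indicated exception.
28 10t	goto +AA	A: signed branch offset (8 bits)	Unconditionally jump to the indicated instruction.
			Note: the branch offset +AA must not be 0.
29 20t	goto/16 +AAAA	A: signed branch offset (16 bits)	Unconditional jump with a 16-bit offset.
2a 30t	goto/32 +AAAAAAAA	A: signed branch offset (32 bits)	Unconditional jump with a 32-bit offset.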
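2b 31t	packed-switch vAA, +BBBBBBBB	A: register to test	Jump to a new instruction based on the value in vAA, using a table of offsets that covers a contiguous range of integer values; fall through to the next instruction if there is no match.
	(see the "packed-switch-payload format" section)	B: signed "branch" offset to table data pseudo-instruction (32 bits)
2c 31t	sparse-switch vAA, +BBBBBBBB	A: register to test	Jump to a new instruction based on the value in vAA, using an ordered table of value-offset pairs; fall through to the next instruction if there is no match.
	(see the "sparse-switch-payload format" section)	B: signed "branch" offset to table data pseudo-instruction (32 bits)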
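2d..31 23x	cmpkind vAA, vBB, vCC	A: destination register (8 bits)	Compare the values in the two source registers or pairs and store in vAA: 0 if vBB == vCC, 1 if vBB > vCC, -1 if vBB < vCC.
	2d:cmpl-float(lt bias)	B: first source register or pair	The bias only matters for floating-point comparisons involving NaN: "lt bias" (cmpl) returns -1,
	2e:cmpg-float(gt bias)	C: second source register or pair	and "gt bias" (cmpg) returns 1.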
	2f:cmpl-double(lt bias)
	30:cmpg-double(gt bias)
	31:cmp-long
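32..37 22t	if-test vA, vB, +CCCC	A: first register to test (4 bits)	Branch on the comparison of two registers.
	32:if-eq	B: second register to test (4 bits)	If vA compares with vB as specified, jump to the instruction at offset C; otherwise continue with the next instruction.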
	33:if-ne	C: signed branch offset (16 bits)
	34:if-lt
	35:if-ge
	36:if-gt
	37:if-le
38..3d 21t	if-testz vAA, +BBBB	A: register to test (8 bits)	If vAA compares with 0 as specified, jump to the instruction at offset B; otherwise continue with the next instruction.
	38:if-eqz	B: signed branch offset (16 bits)
	39:if-nez
	3a:if-ltz
	3b:if-gez
	3c:if-gtz
	3d:if-lez
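3e..43 10x	(unused)		Reserved.
44..51 23x	arrayop vAA, vBB, vCC	A: value register or pair; may be source or dest (8 bits)	Perform the identified array operation at index vCC of the array referenced by vBB, loading into (aget) or storing from (aput) vAA.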
	44:aget	B: array register (8 bits)
	45:aget-wide	C: index register (8 bits)
	46:aget-object
	47:aget-boolean
	48:aget-byte
	49:aget-char
	4a:aget-short
	4b:aput
	4c:aput-wide
	4d:aput-object
	4e:aput-boolean
	4f:aput-byte
	50:aput-char
	51:aput-short
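52..5f 22c	iinstanceop vA, vB, field@CCCC	A: value register or pair; may be source or dest (4 bits)	Perform the identified instance-field operation on the object referenced by vB, loading into (iget) or storing from (iput) vA. C is the index of the instance field.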
	52:iget	B: object register (4 bits)
	53:iget-wide	C: instance field reference index (16 bits)
	54:iget-object
	55:iget-boolean
	56:iget-byte
	57:iget-char
	58:iget-short
	59:iput
	5a:iput-wide
	5b:iput-object
	5c:iput-boolean
	5d:iput-byte
	5e:iput-char
	5f:iput-short
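60..6d 21c	sstaticop vAA, field@BBBB	A: value register or pair; may be source or dest (8 bits)	Perform the identified static-field operation, loading into (sget) or storing from (sput) vAA. B is the index of the static field.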
	60:sget	B: static field reference index (16 bits)
	61:sget-wide
	62:sget-object
	63:sget-boolean
	64:sget-byte
	65:sget-char
	66:sget-short
	67:sput
	68:sput-wide
	69:sput-object
	6a:sput-boolean
	6b:sput-byte
	6c:sput-char
	6d:sput-short
6e..72 35c	invoke-kind {vC, vD, vE, vF, vG}, meth@BBBB	A: argument word count (4 bits)	Call the indicated method. C..G are the argument registers, A is the argument word count, and B is the method index.
	6e:invoke-virtual	B: method reference index (16 bits)
	6f:invoke-super	C..G: argument registers (4 bits each)
	70:invoke-direct
	71:invoke-static
	72:invoke-interface
73 10x	(unused)
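74..78 3rc	invoke-kind/range {vCCCC .. vNNNN}, meth@BBBB	A: argument word count (8 bits)	Same as invoke-kind, but the arguments are a contiguous range of registers, so more of them can be passed.
	74:invoke-virtual/range	B: method reference index (16 bits)
	75:invoke-super/range	C: first argument register (16 bits)
	76:invoke-direct/range	N = A + C - 1
	77:invoke-static/range
	78:invoke-interface/range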
79..7a 10x	(unused)
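7b..8f 12x	unop vA, vB	A: destination register or pair (4 bits)	Perform the identified unary operation (negation, bitwise complement, or type conversion) on the source register, storing the result in the destination register.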
	7b:neg-int	B: source register or pair (4 bits)
	7c:not-int
	7d:neg-long
	7e:not-long
	7f:neg-float
	80:neg-double
	81:int-to-long
	82:int-to-float
	83:int-to-double
	84:long-to-int
	85:long-to-float
	86:long-to-double
	87:float-to-int
	88:float-to-long
	89:float-to-double
	8a:double-to-int
	8b:double-to-long
	8c:double-to-float
	8d:int-to-byte
	8e:int-to-char
	8f:int-to-short
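90..af 23x	binop vAA, vBB, vCC	A: destination register or pair (8 bits)	Perform the identified binary operation on the two source registers, storing the result in the destination register.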
	90:add-int	B: first source register or pair (8 bits)
	91:sub-int	C: second source register or pair (8 bits)
	92:mul-int
	93:div-int
	94:rem-int
	95:and-int
	96:or-int
	97:xor-int
	98:shl-int
	99:shr-int
	9a:ushr-int
	9b:add-long
	9c:sub-long
	9d:mul-long
	9e:div-long
	9f:rem-long
	a0:and-long
	a1:or-long
	a2:xor-long
	a3:shl-long
	a4:shr-long
	a5:ushr-long
	a6:add-float
	a7:sub-float
	a8:mul-float
	a9:div-float
	aa:rem-float
	ab:add-double
	ac:sub-double
	ad:mul-double
	ae:div-double
	af:rem-double
b0..cf 12x	binop/2addr vA, vB	A: destination and first source register or pair (4 bits)	Same as binop, but in two-address form: vA is both the first source and the destination.
	b0:add-int/2addr	B: second source register or pair (4 bits)
	b1:sub-int/2addr
	b2:mul-int/2addr
	b3:div-int/2addr
	b4:rem-int/2addr
	b5:and-int/2addr
	b6:or-int/2addr
	b7:xor-int/2addr
	b8:shl-int/2addr
	b9:shr-int/2addr
	ba:ushr-int/2addr
	bb:add-long/2addr
	bc:sub-long/2addr
	bd:mul-long/2addr
	be:div-long/2addr
	bf:rem-long/2addr
	c0:and-long/2addr
	c1:or-long/2addr
	c2:xor-long/2addr
	c3:shl-long/2addr
	c4:shr-long/2addr
	c5:ushr-long/2addr
	c6:add-float/2addr
	c7:sub-float/2addr
	c8:mul-float/2addr
	c9:div-float/2addr
	ca:rem-float/2addr
	cb:add-double/2addr
	cc:sub-double/2addr
	cd:mul-double/2addr
	ce:div-double/2addr
	cf:rem-double/2addr
d0..d7 22s	binop/lit16 vA, vB, #+CCCC	A: destination register (4 bits)	Perform the identified binary operation on the value in vB and the literal C, storing the result in vA.
	d0:add-int/lit16	B: source register (4 bits)	Note: rsub-int has no suffix.
	d1:rsub-int (reverse subtract)	C: signed int constant (16 bits)
	d2:mul-int/lit16
	d3:div-int/lit16
	d4:rem-int/lit16
	d5:and-int/lit16
	d6:or-int/lit16
	d7:xor-int/lit16
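d8..e2 22b	binop/lit8 vAA, vBB, #+CC	A: destination register (8 bits); B: source register (8 bits); C: signed int constant (8 bits)	Same as binop/lit16, but with 8-bit register fields and an 8-bit literal.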
e3..fe 10x	(unused)
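ff -	(extended opcode)		ff marks an extended opcode; the other byte of the code unit selects the actual opcode.
00ff 41c	const-class/jumbo vAAAA, type@BBBBBBBB	A: destination register (16 bits)	Move a reference to the class at the given type index into the register.
		B: type index (32 bits)
The remaining extended instructions are omitted: they match their non-extended counterparts,
only with wider index fields.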
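A few of the table entries are easier to follow as code. The sketch below models the literal handling of const/4 and const/high16, the result of cmpl-float and cmpg-float (including the NaN bias), and the N = A + C - 1 rule of the /range forms. All function names are hypothetical, and Python integers and floats stand in for register contents.

```python
import math


def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    sign = 1 << (bits - 1)
    return (value ^ sign) - sign


def const_4(b):
    """const/4: a 4-bit literal, sign-extended to 32 bits."""
    return sign_extend(b, 4)


def const_high16(bbbb):
    """const/high16: the literal fills the high 16 bits, the low 16 bits are zero."""
    return sign_extend((bbbb & 0xFFFF) << 16, 32)


def cmp_float(b, c, gt_bias):
    """cmpl-float (gt_bias=False) or cmpg-float (gt_bias=True) on vBB and vCC."""
    if math.isnan(b) or math.isnan(c):
        return 1 if gt_bias else -1  # the bias decides the result for NaN
    return (b > c) - (b < c)         # 1, 0 or -1


def last_range_register(a, c):
    """Last register of a /range instruction: N = A + C - 1."""
    return a + c - 1
```

For example, const_high16(0x3F80) yields 0x3F800000, which is the bit pattern of the float 1.0; this is why const/high16 is handy for loading common float constants.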

3 packed-switch-payload format

Name	Format	Description
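ident	ushort=0x0100	identifying pseudo-opcode
size	ushort	number of entries in the table
first_key	int	first (and lowest) switch case value
targets	int[]	list of size relative branch targets

This table emulates a switch statement over consecutive case values. Its total length is (size * 2) + 4 16-bit code units.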
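Since the switch instructions were the confusing part, here is a sketch of how the packed table could be read. It assumes the payload is given as a list of 16-bit code units and that each int is stored as two code units with the low half first (little-endian); the function names are hypothetical.

```python
def to_int32(lo, hi):
    """Combine two 16-bit code units (low half first) into a signed 32-bit int."""
    value = (hi << 16) | lo
    return value - (1 << 32) if value & 0x80000000 else value


def parse_packed_switch(units):
    """Return (first_key, targets) from a packed-switch-payload."""
    if units[0] != 0x0100:
        raise ValueError("not a packed-switch-payload")
    size = units[1]
    first_key = to_int32(units[2], units[3])
    targets = [to_int32(units[4 + 2 * i], units[5 + 2 * i]) for i in range(size)]
    return first_key, targets


def packed_switch_target(units, value):
    """Return the relative branch offset for value, or None to fall through."""
    first_key, targets = parse_packed_switch(units)
    index = value - first_key  # the cases are first_key, first_key + 1, ...
    if 0 <= index < len(targets):
        return targets[index]
    return None
```

The lookup is a plain array index, which is why only the first key is stored: case i of the table corresponds to the value first_key + i.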

4 sparse-switch-payload format

Name	Format	Description
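ident	ushort=0x0200	identifying pseudo-opcode
size	ushort	number of entries in the table
keys	int[]	list of size key values, sorted low to high
targets	int[]	list of size relative branch targets, one per key

Its total length is (size * 4) + 2 16-bit code units.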
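The sparse table can be read the same way. In this sketch the payload is again a list of 16-bit code units with each int stored low half first, the keys are taken to be sorted low to high so a binary search applies, and the function names are hypothetical.

```python
from bisect import bisect_left


def parse_sparse_switch(units):
    """Return (keys, targets) from a sparse-switch-payload."""
    if units[0] != 0x0200:
        raise ValueError("not a sparse-switch-payload")
    size = units[1]

    def int_at(pos):
        # Two code units, low half first, read as a signed 32-bit int.
        value = (units[pos + 1] << 16) | units[pos]
        return value - (1 << 32) if value & 0x80000000 else value

    keys = [int_at(2 + 2 * i) for i in range(size)]
    targets = [int_at(2 + 2 * size + 2 * i) for i in range(size)]
    return keys, targets


def sparse_switch_target(units, value):
    """Return the relative branch offset for value, or None to fall through."""
    keys, targets = parse_sparse_switch(units)
    i = bisect_left(keys, value)  # relies on the keys being sorted
    if i < len(keys) and keys[i] == value:
        return targets[i]
    return None
```

Unlike the packed form, every key is stored explicitly, so the table costs four code units per case instead of two.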

5 fill-array-data-payload format

Name	Format	Description
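ident	ushort=0x0300	identifying pseudo-opcode
element_width	ushort	number of bytes in each element
size	uint	number of elements in the table
data	ubyte[]	the data values

Its total length is (size * element_width + 1) / 2 + 4 16-bit code units.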
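The size formula can be checked against an actual byte layout. The sketch below builds a payload with Python's struct module, assuming little-endian fields and a zero pad byte when the data has an odd length; both helper names are hypothetical.

```python
import struct


def fill_array_payload_units(size, element_width):
    """Total payload length in 16-bit code units: (size * element_width + 1) / 2 + 4."""
    return (size * element_width + 1) // 2 + 4


def build_fill_array_payload(values, element_width):
    """Build the payload bytes for a list of signed integer values."""
    fmt = {1: "b", 2: "h", 4: "i", 8: "q"}[element_width]
    data = struct.pack("<" + fmt * len(values), *values)
    if len(data) % 2:
        data += b"\x00"  # pad to a whole code unit
    # ident (ushort), element_width (ushort), size (uint): four code units
    header = struct.pack("<HHI", 0x0300, element_width, len(values))
    return header + data
```

The "+ 1" in the formula is what rounds an odd number of data bytes up to a whole code unit, and the "+ 4" covers the header.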

6 Mathematical operation details

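This table is omitted here. It maps each arithmetic operation to its C-language equivalent; floating-point operations must follow IEEE 754 rules.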

Reposted from: https://blog.51cto.com/devilogic/1213896
