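Preface
While reverse engineering I had run into decompiling Python bytecode before, but never paid it much attention: whatever I didn't know, I just looked up. Then, of all places, it showed up at the provincial contest and I was nearly in tears 😭. Luckily I still managed to solve it.
I'm taking this chance to study the topic systematically so I don't get caught out next time. This post uses Python 3.8.5; other versions produce somewhat different bytecode, but the overall picture is much the same.
[Writeup] The 4th Zhejiang Provincial College Student Cybersecurity Skills Challenge (2021):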
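What is Python bytecode?
Python source is first compiled to bytecode, which the Python virtual machine then executes. Bytecode is an intermediate language that resembles assembly: one Python statement maps to several bytecode instructions, and the virtual machine executes them one at a time to run the program.
Python's dis module disassembles Python code into these bytecode instructions.
Each line of its output has the following structure:
source line number | offset of the instruction within the function | instruction name | instruction argument | resolved argument value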
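To make those columns concrete, here is a minimal sketch that pulls the same fields out programmatically with dis.get_instructions. The function add_one and the helper columns are made up for illustration, and the source line column is left out to keep the sketch short.

```python
import dis


def add_one(x):
    # Hypothetical sample function; it only exists to be disassembled.
    return x + 1


def columns(func):
    """Return (offset, opname, arg, argrepr) for every instruction of func."""
    return [(ins.offset, ins.opname, ins.arg, ins.argrepr)
            for ins in dis.get_instructions(func)]


for offset, opname, arg, argrepr in columns(add_one):
    print(offset, opname, arg, argrepr)
```

The exact instructions printed depend on the interpreter version, so treat the output as something to read rather than something to match byte for byte.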
Source:
import dis  # not shown in the original snippet, but line 1 of the listing imports it

str = [88, 117, 124, 124, 127, 48, 71, 127, 98, 124, 116, 48, 61, 61, 121, 116, 33, 32, 100, 62]

def test():
    for st in str:
        print(chr(st ^ 16), end='')

test()
Bytecode:
  1           0 LOAD_CONST               0 (0)
              2 LOAD_CONST               1 (None)
              4 IMPORT_NAME              0 (dis)
              6 STORE_NAME               0 (dis)

  3           8 LOAD_CONST               2 (88)
             10 LOAD_CONST               3 (117)
             12 LOAD_CONST               4 (124)
             14 LOAD_CONST               4 (124)
             16 LOAD_CONST               5 (127)
             18 LOAD_CONST               6 (48)
             20 LOAD_CONST               7 (71)
             22 LOAD_CONST               5 (127)
             24 LOAD_CONST               8 (98)
             26 LOAD_CONST               4 (124)
             28 LOAD_CONST               9 (116)
             30 LOAD_CONST               6 (48)
             32 LOAD_CONST              10 (61)
             34 LOAD_CONST              10 (61)
             36 LOAD_CONST              11 (121)
             38 LOAD_CONST               9 (116)
             40 LOAD_CONST              12 (33)
             42 LOAD_CONST              13 (32)
             44 LOAD_CONST              14 (100)
             46 LOAD_CONST              15 (62)
             48 BUILD_LIST              20
             50 STORE_NAME               1 (str)

  5          52 LOAD_CONST              16 (<code object test at 0x0170E2F8, file "1.py", line 5>)
             54 LOAD_CONST              17 ('test')
             56 MAKE_FUNCTION            0
             58 STORE_NAME               2 (test)

  9          60 LOAD_NAME                2 (test)
             62 CALL_FUNCTION            0
             64 POP_TOP
             66 LOAD_CONST               1 (None)
             68 RETURN_VALUE

Disassembly of <code object test at 0x0170E2F8, file "1.py", line 5>:
  6           0 LOAD_GLOBAL              0 (str)
              2 GET_ITER
        >>    4 FOR_ITER                24 (to 30)
              6 STORE_FAST               0 (st)

  7           8 LOAD_GLOBAL              1 (print)
             10 LOAD_GLOBAL              2 (chr)
             12 LOAD_FAST                0 (st)
             14 LOAD_CONST               1 (16)
             16 BINARY_XOR
             18 CALL_FUNCTION            1
             20 LOAD_CONST               2 ('')
             22 LOAD_CONST               3 (('end',))
             24 CALL_FUNCTION_KW         2
             26 POP_TOP
             28 JUMP_ABSOLUTE            4
        >>   30 LOAD_CONST               0 (None)
             32 RETURN_VALUE
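As a sanity check, the disassembly of test can be read back into source by hand. The sketch below is my own reconstruction, not decompiler output: the names DATA and decode are made up, and it returns the string instead of printing one character at a time.

```python
# The 20 constants pushed before BUILD_LIST 20 in the module-level listing.
DATA = [88, 117, 124, 124, 127, 48, 71, 127, 98, 124,
        116, 48, 61, 61, 121, 116, 33, 32, 100, 62]


def decode(values, key=16):
    # FOR_ITER / STORE_FAST: take one element per pass.
    # LOAD_FAST, LOAD_CONST (16), BINARY_XOR, then the chr call:
    # XOR the element with 16 and turn the result into a character.
    return ''.join(chr(v ^ key) for v in values)


print(decode(DATA))
```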
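The instructions are explained in detail below.
Variables
1. CONST
LOAD_CONST pushes a constant (a number, a string, None, a code object and so on) onto the stack; it commonly supplies the arguments of a function call: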
 11          52 LOAD_NAME                2 (test)
             54 LOAD_CONST              16 ('nice')
             56 CALL_FUNCTION            1
             58 POP_TOP
Corresponding source:
test('nice')
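2. Local variables
LOAD_FAST loads the value of a local variable, i.e. reads it, for use in a computation or as a call argument; STORE_FAST saves a value into a local variable. The excerpt below is Python 2 bytecode (note the 3-byte instruction offsets): INPLACE_DIVIDE exists only there, and Python 3.8 emits INPLACE_TRUE_DIVIDE for the same statement.
 61          77 LOAD_FAST                0 (n)
             80 LOAD_FAST                3 (p)
             83 INPLACE_DIVIDE
             84 STORE_FAST               0 (n)
n /= p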
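That raises a question: a function's parameters are local variables too, so how do we tell a parameter from any other local?
We can write a small test to work it out:
import dis

str = ''

def test(arg):
    str = 'idi10t'
    print(arg, str)

dis.dis(test)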
  6           0 LOAD_CONST               1 ('idi10t')
              2 STORE_FAST               1 (str)

  7           4 LOAD_GLOBAL              0 (print)
              6 LOAD_FAST                0 (arg)
              8 LOAD_FAST                1 (str)
             10 CALL_FUNCTION            2
             12 POP_TOP
             14 LOAD_CONST               0 (None)
             16 RETURN_VALUE
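The conclusion, as a rule of thumb: a parameter is not initialized inside the function. If no STORE_FAST to a variable appears between the start of the function and its first LOAD_FAST, that variable is a parameter; any other local must be initialized with STORE_FAST before it is used. Parameters also take the lowest variable indices (arg is 0 above, str is 1).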
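3. Global variables
LOAD_GLOBAL loads a global name, which covers function names, class names, module names and built-ins such as print; STORE_GLOBAL assigns to a global variable:
import dis

def test():
    global str
    str = 'idi10t'
    print(str)

dis.dis(test)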
  5           0 LOAD_CONST               1 ('idi10t')
              2 STORE_GLOBAL             0 (str)

  6           4 LOAD_GLOBAL              1 (print)
              6 LOAD_GLOBAL              0 (str)
              8 CALL_FUNCTION            1
             10 POP_TOP
             12 LOAD_CONST               0 (None)
             14 RETURN_VALUE
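The effect of the global statement is easy to confirm by comparing two otherwise identical functions. This is a small sketch with made-up names (set_global, set_local, opnames); it only checks whether STORE_GLOBAL shows up among the emitted instructions.

```python
import dis


def set_global():
    global counter
    counter = 1          # assignment to a declared global


def set_local():
    counter = 1          # plain local assignment
    return counter


def opnames(func):
    """Collect the set of instruction names used by func."""
    return {ins.opname for ins in dis.get_instructions(func)}


print('STORE_GLOBAL' in opnames(set_global))
print('STORE_GLOBAL' in opnames(set_local))
```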
Common data types
list
BUILD_LIST creates a list from the values on top of the stack; its argument is the number of elements:
str = [88, 117, 124, 124, 127, 48, 71, 127, 98, 124, 116, 48, 61, 61, 121, 116, 33, 32, 100, 62]
  3           8 LOAD_CONST               2 (88)
             10 LOAD_CONST               3 (117)
             12 LOAD_CONST               4 (124)
             14 LOAD_CONST               4 (124)
             16 LOAD_CONST               5 (127)
             18 LOAD_CONST               6 (48)
             20 LOAD_CONST               7 (71)
             22 LOAD_CONST               5 (127)
             24 LOAD_CONST               8 (98)
             26 LOAD_CONST               4 (124)
             28 LOAD_CONST               9 (116)
             30 LOAD_CONST               6 (48)
             32 LOAD_CONST              10 (61)
             34 LOAD_CONST              10 (61)
             36 LOAD_CONST              11 (121)
             38 LOAD_CONST               9 (116)
             40 LOAD_CONST              12 (33)
             42 LOAD_CONST              13 (32)
             44 LOAD_CONST              14 (100)
             46 LOAD_CONST              15 (62)
             48 BUILD_LIST              20
             50 STORE_NAME               1 (str)
Now look at another way of creating a list, a list comprehension:
str = [88, 117, 124, 124, 127, 48, 71, 127, 98, 124, 116, 48, 61, 61, 121, 116, 33, 32, 100, 62]

[x for x in str if x != 48]
  1           0 LOAD_CONST               0 (88)
              2 LOAD_CONST               1 (117)
              4 LOAD_CONST               2 (124)
              6 LOAD_CONST               2 (124)
              8 LOAD_CONST               3 (127)
             10 LOAD_CONST               4 (48)
             12 LOAD_CONST               5 (71)
             14 LOAD_CONST               3 (127)
             16 LOAD_CONST               6 (98)
             18 LOAD_CONST               2 (124)
             20 LOAD_CONST               7 (116)
             22 LOAD_CONST               4 (48)
             24 LOAD_CONST               8 (61)
             26 LOAD_CONST               8 (61)
             28 LOAD_CONST               9 (121)
             30 LOAD_CONST               7 (116)
             32 LOAD_CONST              10 (33)
             34 LOAD_CONST              11 (32)
             36 LOAD_CONST              12 (100)
             38 LOAD_CONST              13 (62)
             40 BUILD_LIST              20
             42 STORE_NAME               0 (str)

  3          44 LOAD_CONST              14 (<code object <listcomp> at 0x016FE2F8, file "1.py", line 3>)
             46 LOAD_CONST              15 ('<listcomp>')
             48 MAKE_FUNCTION            0
             50 LOAD_NAME                0 (str)
             52 GET_ITER
             54 CALL_FUNCTION            1
             56 POP_TOP
             58 LOAD_CONST              16 (None)
             60 RETURN_VALUE
Disassembly of <code object <listcomp> at 0x016FE2F8, file "1.py", line 3>:
  3           0 BUILD_LIST               0  # create the empty result list
              2 LOAD_FAST                0 (.0)
        >>    4 FOR_ITER                16 (to 22)
              6 STORE_FAST               1 (x)
              8 LOAD_FAST                1 (x)
             10 LOAD_CONST               0 (48)
             12 COMPARE_OP               3 (!=)
             14 POP_JUMP_IF_FALSE        4  # condition false: jump back to FOR_ITER and skip this x (a continue, not a break)
             16 LOAD_FAST                1 (x)  # load the x that passed the test
             18 LIST_APPEND              2  # append that x to the result list
             20 JUMP_ABSOLUTE            4
        >>   22 RETURN_VALUE
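Read instruction by instruction, the listcomp code object is just an ordinary filtering loop. The following hand-written equivalent is a sketch (the names data and listcomp_by_hand are mine), with each line annotated with the instructions it corresponds to in the listing above.

```python
data = [88, 117, 124, 124, 127, 48, 71, 127, 98, 124,
        116, 48, 61, 61, 121, 116, 33, 32, 100, 62]


def listcomp_by_hand(iterable):
    result = []                 # BUILD_LIST 0
    for x in iterable:          # FOR_ITER / STORE_FAST x
        if x != 48:             # COMPARE_OP (!=) / POP_JUMP_IF_FALSE
            result.append(x)    # LOAD_FAST x / LIST_APPEND
    return result               # RETURN_VALUE


print(listcomp_by_hand(data))
```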
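dict
BUILD_MAP creates a dict: its argument is the number of key/value pairs it pops from the stack, so BUILD_MAP 0 gives an empty dict. STORE_NAME then binds the dict to its name, and STORE_SUBSCR performs the later item assignment str['age'] = 3:
str = {'name' : 'id10t'}
str['age'] = 3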
  1           0 LOAD_CONST               0 ('name')
              2 LOAD_CONST               1 ('id10t')
              4 BUILD_MAP                1
              6 STORE_NAME               0 (str)

  2           8 LOAD_CONST               2 (3)
             10 LOAD_NAME                0 (str)
             12 LOAD_CONST               3 ('age')
             14 STORE_SUBSCR
             16 LOAD_CONST               4 (None)
             18 RETURN_VALUE
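To try this out, it helps to wrap the two statements in a function and disassemble that. A minimal sketch follows; build is a made-up name, and because the code is inside a function the stores appear as STORE_FAST rather than STORE_NAME.

```python
import dis


def build():
    d = {'name': 'id10t'}   # key and value are pushed, then the dict is built
    d['age'] = 3            # value, container and key are pushed, then stored
    return d


dis.dis(build)
print(build())
```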
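slice
The data in this part is borrowed directly from another author's post, and it is Python 2 bytecode.
BUILD_SLICE creates a slice object; lists, tuples and strings can all be accessed through slices.
Note that in Python 2 BUILD_SLICE is used for slices of the form [x:y:z], combined with BINARY_SUBSCR to read the slice and with STORE_SUBSCR to modify it.
SLICE+n handles reads of the form [a:b] and STORE_SLICE+n handles writes of that form. Both opcode families were removed in Python 3, which uses BUILD_SLICE for every slice. The value of n means:
SLICE+0()
    Implements TOS = TOS[:].
SLICE+1()
    Implements TOS = TOS1[TOS:].
SLICE+2()
    Implements TOS = TOS1[:TOS].
SLICE+3()
    Implements TOS = TOS2[TOS1:TOS].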
 13           0 LOAD_CONST               1 (1)
              3 LOAD_CONST               2 (2)
              6 LOAD_CONST               3 (3)
              9 BUILD_LIST               3
             12 STORE_FAST               0 (k1)  # k1 = [1, 2, 3]

 14          15 LOAD_CONST               4 (10)
             18 BUILD_LIST               1
             21 LOAD_FAST                0 (k1)
             24 LOAD_CONST               5 (0)
             27 LOAD_CONST               1 (1)
             30 LOAD_CONST               1 (1)
             33 BUILD_SLICE              3
             36 STORE_SUBSCR  # k1[0:1:1] = [10]

 15          37 LOAD_CONST               6 (11)
             40 BUILD_LIST               1
             43 LOAD_FAST                0 (k1)
             46 LOAD_CONST               1 (1)
             49 LOAD_CONST               2 (2)
             52 STORE_SLICE+3  # k1[1:2] = [11]

 16          53 LOAD_FAST                0 (k1)
             56 LOAD_CONST               1 (1)
             59 LOAD_CONST               2 (2)
             62 SLICE+3
             63 STORE_FAST               1 (a)  # a = k1[1:2]

 17          66 LOAD_FAST                0 (k1)
             69 LOAD_CONST               5 (0)
             72 LOAD_CONST               1 (1)
             75 LOAD_CONST               1 (1)
             78 BUILD_SLICE              3
             81 BINARY_SUBSCR
             82 STORE_FAST               2 (b)  # b = k1[0:1:1]
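In Python 3 there is only one mechanism: the bracket syntax builds a slice object, which is then used as an ordinary subscript. The sketch below replays the four statements from the listing and repeats the two writes with explicit slice() objects; the name k2 is mine, added only for the comparison.

```python
# The statements recovered from the listing, written with slice syntax.
k1 = [1, 2, 3]
k1[0:1:1] = [10]
k1[1:2] = [11]
a = k1[1:2]
b = k1[0:1:1]

# The same writes with explicit slice objects as the subscript.
k2 = [1, 2, 3]
k2[slice(0, 1, 1)] = [10]
k2[slice(1, 2)] = [11]

print(k1, a, b, k2)
```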
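Loops
while
From Python 3.8 onward SETUP_LOOP no longer exists. In earlier versions it roughly meant pushing a block for the loop onto the block stack; 3.8 compiles the loop with plain conditional and absolute jumps:
i = 0
while i < 10:
    i += 1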
  1           0 LOAD_CONST               0 (0)
              2 STORE_NAME               0 (i)

  2     >>    4 LOAD_NAME                0 (i)
              6 LOAD_CONST               1 (10)
              8 COMPARE_OP               0 (<)
             10 POP_JUMP_IF_FALSE       22

  3          12 LOAD_NAME                0 (i)
             14 LOAD_CONST               2 (1)
             16 INPLACE_ADD
             18 STORE_NAME               0 (i)
             20 JUMP_ABSOLUTE            4
        >>   22 LOAD_CONST               3 (None)
             24 RETURN_VALUE
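The >> markers in the listing flag instructions that some jump lands on, and spotting them is usually the quickest way to find loop boundaries. A small sketch that lists them programmatically follows; count_to_ten and jump_targets are made-up names, and the actual offsets depend on the Python version.

```python
import dis


def count_to_ten():
    i = 0
    while i < 10:
        i += 1
    return i


def jump_targets(func):
    """Offsets of all instructions that are the target of a jump."""
    return [ins.offset
            for ins in dis.get_instructions(func)
            if ins.is_jump_target]


print(jump_targets(count_to_ten))
```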
for
The typical for ... in structure in Python:
for i in range(8):
  2           4 LOAD_NAME                1 (range)
              6 LOAD_CONST               1 (8)
              8 CALL_FUNCTION            1
             10 GET_ITER
        >>   12 FOR_ITER                38 (to 52)
             14 STORE_NAME               2 (i)
             ...
             50 JUMP_ABSOLUTE           12
        >>   52 LOAD_CONST               6 (None)
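GET_ITER and FOR_ITER correspond to the iterator protocol, so the loop can be spelled out by hand with iter() and next(). This is a sketch of the idea rather than what the interpreter literally runs; the loop body is elided in the listing, so the append below is a placeholder of my own.

```python
def for_by_hand(iterable):
    seen = []
    it = iter(iterable)          # GET_ITER
    while True:
        try:
            i = next(it)         # FOR_ITER: fetch the next item ...
        except StopIteration:
            break                # ... or leave the loop once exhausted
        seen.append(i)           # placeholder body, not from the post
    return seen


print(for_by_hand(range(8)))
```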
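if
POP_JUMP_IF_FALSE and JUMP_FORWARD are the usual instructions for branching: POP_JUMP_IF_FALSE pops the condition result and jumps to the target offset if it is false; JUMP_FORWARD jumps unconditionally (its argument is a delta relative to the next instruction, and dis shows the resolved target in parentheses).
i = 0
if i < 5:
    print('i < 5')
elif i > 5:
    print('i > 5')
else:
    print('i = 5')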
  1           0 LOAD_CONST               0 (0)
              2 STORE_NAME               0 (i)

  2           4 LOAD_NAME                0 (i)
              6 LOAD_CONST               1 (5)
              8 COMPARE_OP               0 (<)
             10 POP_JUMP_IF_FALSE       22

  3          12 LOAD_NAME                1 (print)
             14 LOAD_CONST               2 ('i < 5')
             16 CALL_FUNCTION            1
             18 POP_TOP
             20 JUMP_FORWARD            26 (to 48)

  4     >>   22 LOAD_NAME                0 (i)
             24 LOAD_CONST               1 (5)
             26 COMPARE_OP               4 (>)
             28 POP_JUMP_IF_FALSE       40

  5          30 LOAD_NAME                1 (print)
             32 LOAD_CONST               3 ('i > 5')
             34 CALL_FUNCTION            1
             36 POP_TOP
             38 JUMP_FORWARD             8 (to 48)

  7     >>   40 LOAD_NAME                1 (print)
             42 LOAD_CONST               4 ('i = 5')
             44 CALL_FUNCTION            1
             46 POP_TOP
        >>   48 LOAD_CONST               5 (None)
             50 RETURN_VALUE
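When rebuilding an if/elif/else chain from a dump, I find it handy to first list only the conditional jumps and where they go. Below is a minimal sketch with made-up names (classify, conditional_jumps); it matches on the POP_JUMP prefix because the exact opcode names vary between Python versions.

```python
import dis


def classify(i):
    if i < 5:
        return 'i < 5'
    elif i > 5:
        return 'i > 5'
    else:
        return 'i = 5'


def conditional_jumps(func):
    """(offset, opname, target offset) for every POP_JUMP_* instruction."""
    return [(ins.offset, ins.opname, ins.argval)
            for ins in dis.get_instructions(func)
            if ins.opname.startswith('POP_JUMP')]


for row in conditional_jumps(classify):
    print(row)
```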
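Other instructions
Those are the more commonly used instructions. There are many more that are not covered one by one here; see the official documentation for details (the version referred to here is the one for Python 3.8).
Afterword
Reading always pays off, and the more the better.
References: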