Python .pyc file parsing: how to understand the contents of a .pyc file

I have a .pyc file. I need to understand its contents to learn how Python's disassembler works, i.e. how can I generate output like dis.dis(function) from the contents of a .pyc file?

For example:

>>> def sqr(x):
...     return x*x
...
>>> import dis
>>> dis.dis(sqr)
  2           0 LOAD_FAST                0 (x)
              3 LOAD_FAST                0 (x)
              6 BINARY_MULTIPLY
              7 RETURN_VALUE

I need to get output like this from the .pyc file.

Solution

.pyc files contain a small metadata header followed by a marshalled code object. To load the code object and disassemble it, use:

import dis, marshal, sys

header_sizes = [
    # (size, first version this applies to)
    # pyc files were introduced in 0.9.2 way, way back in June 1991.
    (8, (0, 9, 2)),  # 2 bytes magic number, \r\n, 4 bytes UNIX timestamp
    (12, (3, 3)),    # added 4 bytes source file size
    # (byte offsets below are 0-based, end-exclusive)
    # bytes 4-8 are flags, meaning of bytes 8-16 depends on what flags are set
    # bit 0 not set: 8-12 timestamp, 12-16 file size
    # bit 0 set: 8-16 file hash (SipHash-2-4, k0 = the pyc magic number, k1 = 0)
    (16, (3, 7)),    # inserted 4 bytes bit flag field at 4-8
    # future versions may add more bytes still, at which point we can extend
    # this table. It is correct for Python versions up to 3.9
]

header_size = next(s for s, v in reversed(header_sizes) if sys.version_info >= v)

pycfile = "example.pyc"  # placeholder (hypothetical path): the .pyc file to inspect
with open(pycfile, "rb") as f:
    metadata = f.read(header_size)  # first header_size bytes are metadata
    code = marshal.load(f)          # rest is a marshalled code object

dis.dis(code)
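The comments in header_sizes describe the 16-byte layout used since Python 3.7. As a rough sketch, assuming a 3.7+ pyc whose 32-bit fields are little-endian (which is how importlib writes them), the individual header fields can be decoded with struct. The helper name read_pyc_header and the demo module name sqr_module are made up for illustration:

```python
import importlib.util
import os
import py_compile
import struct
import tempfile


def read_pyc_header(path):
    """Decode the 16-byte header of a .pyc written by Python 3.7+ (PEP 552).

    Assumption: the file really is a 3.7+ pyc; older headers are shorter
    and have no flags field, so this helper would misread them.
    """
    with open(path, "rb") as f:
        header = f.read(16)
    if len(header) < 16:
        raise ValueError("file too short for a 3.7+ pyc header")
    # bytes 0-4: magic number, bytes 4-8: little-endian flags word
    magic, flags = struct.unpack("<4sI", header[:8])
    info = {"magic": magic, "flags": flags}
    if flags & 0b01:
        # bit 0 set: hash-based pyc, bytes 8-16 hold the source hash
        info["source_hash"] = header[8:16]
        # bit 1 set: the import system should re-check the hash against the source
        info["check_source"] = bool(flags & 0b10)
    else:
        # bit 0 clear: timestamp-based pyc, bytes 8-12 mtime, 12-16 source size
        info["mtime"], info["source_size"] = struct.unpack("<II", header[8:16])
    return info


# Demo: compile a throwaway module (hypothetical name sqr_module) and decode its header.
SOURCE = b"def sqr(x):\n    return x*x\n"
with tempfile.TemporaryDirectory() as tmp:
    src_path = os.path.join(tmp, "sqr_module.py")
    with open(src_path, "wb") as f:
        f.write(SOURCE)
    pyc_path = py_compile.compile(
        src_path,
        cfile=os.path.join(tmp, "sqr_module.pyc"),
        doraise=True,
        # force a timestamp-based pyc even if SOURCE_DATE_EPOCH is set
        invalidation_mode=py_compile.PycInvalidationMode.TIMESTAMP,
    )
    header = read_pyc_header(pyc_path)

print(header)
print("written by this interpreter:", header["magic"] == importlib.util.MAGIC_NUMBER)
```

The timestamp mode is forced explicitly because py_compile switches to hash-based pycs when the SOURCE_DATE_EPOCH environment variable is set.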

Demo with the bisect module:

>>> import bisect
>>> import dis, marshal
>>> import sys
>>> header_sizes = [(8, (0, 9, 2)), (12, (3, 3)), (16, (3, 7))]
>>> header_size = next(s for s, v in reversed(header_sizes) if sys.version_info >= v)
>>> pycfile = getattr(bisect, '__cached__', bisect.__file__)
>>> with open(pycfile, "rb") as f:
...     metadata = f.read(header_size)  # first header_size bytes are metadata
...     code = marshal.load(f)          # rest is a marshalled code object
...
>>> dis.dis(code)

  1           0 LOAD_CONST               0 ('Bisection algorithms.')
              2 STORE_NAME               0 (__doc__)

  3           4 LOAD_CONST              12 ((0, None))
              6 LOAD_CONST               3 (<code object insort_right at 0x..., file "...", line 3>)
              8 LOAD_CONST               4 ('insort_right')
             10 MAKE_FUNCTION            1 (defaults)
             12 STORE_NAME               1 (insort_right)

 15          14 LOAD_CONST              13 ((0, None))
             16 LOAD_CONST               5 (<code object bisect_right at 0x..., file "...", line 15>)
             18 LOAD_CONST               6 ('bisect_right')
             20 MAKE_FUNCTION            1 (defaults)
             22 STORE_NAME               2 (bisect_right)

 36          24 LOAD_CONST              14 ((0, None))
             26 LOAD_CONST               7 (<code object insort_left at 0x..., file "...", line 36>)
             28 LOAD_CONST               8 ('insort_left')
             30 MAKE_FUNCTION            1 (defaults)
             32 STORE_NAME               3 (insort_left)

 49          34 LOAD_CONST              15 ((0, None))
             36 LOAD_CONST               9 (<code object bisect_left at 0x..., file "...", line 49>)
             38 LOAD_CONST              10 ('bisect_left')
             40 MAKE_FUNCTION            1 (defaults)
             42 STORE_NAME               4 (bisect_left)

 71          44 SETUP_FINALLY           12 (to 58)

 72          46 LOAD_CONST               1 (0)
             48 LOAD_CONST              11 (('*',))
             50 IMPORT_NAME              5 (_bisect)
             52 IMPORT_STAR
             54 POP_BLOCK
             56 JUMP_FORWARD            20 (to 78)

 73     >>   58 DUP_TOP
             60 LOAD_NAME                6 (ImportError)
             62 COMPARE_OP              10 (exception match)
             64 POP_JUMP_IF_FALSE       76
             66 POP_TOP
             68 POP_TOP
             70 POP_TOP

 74          72 POP_EXCEPT
             74 JUMP_FORWARD             2 (to 78)
        >>   76 END_FINALLY

 77     >>   78 LOAD_NAME                2 (bisect_right)
             80 STORE_NAME               7 (bisect)

 78          82 LOAD_NAME                1 (insort_right)
             84 STORE_NAME               8 (insort)
             86 LOAD_CONST               2 (None)
             88 RETURN_VALUE

Disassembly of <code object insort_right at 0x..., file "...", line 3>:
 12           0 LOAD_GLOBAL              0 (bisect_right)
              2 LOAD_FAST                0 (a)
              4 LOAD_FAST                1 (x)
              6 LOAD_FAST                2 (lo)
              8 LOAD_FAST                3 (hi)
             10 CALL_FUNCTION            4
             12 STORE_FAST               2 (lo)

 13          14 LOAD_FAST                0 (a)
             16 LOAD_METHOD              1 (insert)
             18 LOAD_FAST                2 (lo)
             20 LOAD_FAST                1 (x)
             22 CALL_METHOD              2
             24 POP_TOP
             26 LOAD_CONST               1 (None)
             28 RETURN_VALUE

Disassembly of <code object bisect_right at 0x..., file "...", line 15>:
 26           0 LOAD_FAST                2 (lo)
              2 LOAD_CONST               1 (0)
              4 COMPARE_OP               0 (<)
              6 POP_JUMP_IF_FALSE       16

 27           8 LOAD_GLOBAL              0 (ValueError)
             10 LOAD_CONST               2 ('lo must be non-negative')
             12 CALL_FUNCTION            1
             14 RAISE_VARARGS            1

 28     >>   16 LOAD_FAST                3 (hi)
             18 LOAD_CONST               3 (None)
             20 COMPARE_OP               8 (is)
             22 POP_JUMP_IF_FALSE       32

 29          24 LOAD_GLOBAL              1 (len)
             26 LOAD_FAST                0 (a)
             28 CALL_FUNCTION            1
             30 STORE_FAST               3 (hi)

 30     >>   32 LOAD_FAST                2 (lo)
             34 LOAD_FAST                3 (hi)
             36 COMPARE_OP               0 (<)
             38 POP_JUMP_IF_FALSE       80

 31          40 LOAD_FAST                2 (lo)
             42 LOAD_FAST                3 (hi)
             44 BINARY_ADD
             46 LOAD_CONST               4 (2)
             48 BINARY_FLOOR_DIVIDE
             50 STORE_FAST               4 (mid)

 32          52 LOAD_FAST                1 (x)
             54 LOAD_FAST                0 (a)
             56 LOAD_FAST                4 (mid)
             58 BINARY_SUBSCR
             60 COMPARE_OP               0 (<)
             62 POP_JUMP_IF_FALSE       70
             64 LOAD_FAST                4 (mid)
             66 STORE_FAST               3 (hi)
             68 JUMP_ABSOLUTE           32

 33     >>   70 LOAD_FAST                4 (mid)
             72 LOAD_CONST               5 (1)
             74 BINARY_ADD
             76 STORE_FAST               2 (lo)
             78 JUMP_ABSOLUTE           32

 34     >>   80 LOAD_FAST                2 (lo)
             82 RETURN_VALUE

Disassembly of <code object insort_left at 0x..., file "...", line 36>:
 45           0 LOAD_GLOBAL              0 (bisect_left)
              2 LOAD_FAST                0 (a)
              4 LOAD_FAST                1 (x)
              6 LOAD_FAST                2 (lo)
              8 LOAD_FAST                3 (hi)
             10 CALL_FUNCTION            4
             12 STORE_FAST               2 (lo)

 46          14 LOAD_FAST                0 (a)
             16 LOAD_METHOD              1 (insert)
             18 LOAD_FAST                2 (lo)
             20 LOAD_FAST                1 (x)
             22 CALL_METHOD              2
             24 POP_TOP
             26 LOAD_CONST               1 (None)
             28 RETURN_VALUE

Disassembly of <code object bisect_left at 0x..., file "...", line 49>:
 60           0 LOAD_FAST                2 (lo)
              2 LOAD_CONST               1 (0)
              4 COMPARE_OP               0 (<)
              6 POP_JUMP_IF_FALSE       16

 61           8 LOAD_GLOBAL              0 (ValueError)
             10 LOAD_CONST               2 ('lo must be non-negative')
             12 CALL_FUNCTION            1
             14 RAISE_VARARGS            1

 62     >>   16 LOAD_FAST                3 (hi)
             18 LOAD_CONST               3 (None)
             20 COMPARE_OP               8 (is)
             22 POP_JUMP_IF_FALSE       32

 63          24 LOAD_GLOBAL              1 (len)
             26 LOAD_FAST                0 (a)
             28 CALL_FUNCTION            1
             30 STORE_FAST               3 (hi)

 64     >>   32 LOAD_FAST                2 (lo)
             34 LOAD_FAST                3 (hi)
             36 COMPARE_OP               0 (<)
             38 POP_JUMP_IF_FALSE       80

 65          40 LOAD_FAST                2 (lo)
             42 LOAD_FAST                3 (hi)
             44 BINARY_ADD
             46 LOAD_CONST               4 (2)
             48 BINARY_FLOOR_DIVIDE
             50 STORE_FAST               4 (mid)

 66          52 LOAD_FAST                0 (a)
             54 LOAD_FAST                4 (mid)
             56 BINARY_SUBSCR
             58 LOAD_FAST                1 (x)
             60 COMPARE_OP               0 (<)
             62 POP_JUMP_IF_FALSE       74
             64 LOAD_FAST                4 (mid)
             66 LOAD_CONST               5 (1)
             68 BINARY_ADD
             70 STORE_FAST               2 (lo)
             72 JUMP_ABSOLUTE           32

 67     >>   74 LOAD_FAST                4 (mid)
             76 STORE_FAST               3 (hi)
             78 JUMP_ABSOLUTE           32

 68     >>   80 LOAD_FAST                2 (lo)
             82 RETURN_VALUE

Note that this output separates the top-level code object, which defines the module, from the code objects of the functions and classes it contains. In Python 3.6 and older, dis.dis() won't recurse into those nested code objects. In those versions, if you want to analyse the contained functions, you'll need to load the nested code objects from the top-level code.co_consts tuple. For example, the insort_right function's code object is loaded with LOAD_CONST 3, so you look for the code object at that index:

>>> code.co_consts[3]
<code object insort_right at 0x..., file "...", line 3>
>>> dis.dis(code.co_consts[3])
 12           0 LOAD_GLOBAL              0 (bisect_right)
              2 LOAD_FAST                0 (a)
              4 LOAD_FAST                1 (x)
              6 LOAD_FAST                2 (lo)
              8 LOAD_FAST                3 (hi)
             10 CALL_FUNCTION            4
             12 STORE_FAST               2 (lo)

 13          14 LOAD_FAST                0 (a)
             16 LOAD_METHOD              1 (insert)
             18 LOAD_FAST                2 (lo)
             20 LOAD_FAST                1 (x)
             22 CALL_METHOD              2
             24 POP_TOP
             26 LOAD_CONST               1 (None)
             28 RETURN_VALUE
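Instead of reading the LOAD_CONST index off the listing by hand, the nested code objects can be collected by walking co_consts recursively, which also works on 3.6 and older where dis.dis() does not recurse. This is a minimal sketch: iter_code_objects is a hypothetical helper, and compile() stands in for the code object you would get from marshal.load() (both produce the same code type):

```python
import dis
import types


def iter_code_objects(code, prefix=""):
    """Yield (dotted_name, code_object) for code and every code object nested in it.

    Function bodies, class bodies, lambdas, generator expressions and (before
    3.12) list/dict/set comprehensions are stored as constants, so a recursive
    walk over co_consts finds them all.
    """
    name = prefix + code.co_name
    yield name, code
    for const in code.co_consts:
        if isinstance(const, types.CodeType):
            yield from iter_code_objects(const, name + ".")


# compile() stands in for the code object that marshal.load() returns from a .pyc.
SOURCE = """
def sqr(x):
    return x*x

class Box:
    def area(self):
        return sqr(self.side)
"""
module_code = compile(SOURCE, "<demo>", "exec")
found = dict(iter_code_objects(module_code))
print(sorted(found))
dis.dis(found["<module>.sqr"])
```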

Personally, I would avoid parsing a .pyc file with anything other than the matching Python version and its marshal module. The marshal format is essentially an internal serialisation format that changes with the needs of Python itself. New features such as list comprehensions, with statements and async/await required additions to the format, and the format is not documented anywhere other than in the C source code.
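One practical way to follow that advice is to compare the file's magic number with importlib.util.MAGIC_NUMBER before calling marshal.load(), so that a .pyc written by a different Python version is rejected instead of misparsed. A sketch, assuming a 3.7+ interpreter (16-byte header); load_pyc_code and sqr_module are hypothetical names:

```python
import importlib.util
import marshal
import os
import py_compile
import sys
import tempfile


def load_pyc_code(path):
    """Return the top-level code object of a .pyc written by this interpreter.

    Assumes Python 3.7+ (16-byte header). Files whose magic number differs
    are rejected, because the marshal format may change between versions.
    """
    with open(path, "rb") as f:
        header = f.read(16)
        if header[:4] != importlib.util.MAGIC_NUMBER:
            raise ValueError(
                f"{path}: magic {header[:4]!r} does not match "
                f"{importlib.util.MAGIC_NUMBER!r} used by Python {sys.version.split()[0]}"
            )
        return marshal.load(f)  # the rest of the file is the marshalled code object


# Demo with a throwaway module (hypothetical name sqr_module).
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "sqr_module.py")
    with open(src, "w") as f:
        f.write("def sqr(x):\n    return x*x\n")
    pyc = py_compile.compile(src, cfile=os.path.join(tmp, "sqr_module.pyc"), doraise=True)
    code = load_pyc_code(pyc)

    # Simulate a pyc from another Python version by overwriting the magic number.
    with open(pyc, "rb") as f:
        data = f.read()
    bogus = os.path.join(tmp, "bogus.pyc")
    with open(bogus, "wb") as f:
        f.write(b"\x00\x00\r\n" + data[4:])
    try:
        load_pyc_code(bogus)
        rejected = False
    except ValueError as exc:
        rejected = True
        print("rejected:", exc)

print(code)
```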

If you do go this route and manage to read a code object by some means other than the marshal module, you'll have to reconstruct the disassembly from the code object's attributes yourself; see the dis module source for how to do this. For example, you'll need the co_firstlineno and co_lnotab attributes to build a map from bytecode offsets to line numbers (Python 3.10 and later deprecate co_lnotab in favour of the co_lines() method).
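To see roughly what dis does with those attributes, the sketch below walks co_code two bytes at a time and pairs offsets with source line numbers. It assumes CPython 3.6+ "wordcode", resolves only constant arguments, and leans on dis.findlinestarts() for the offset-to-line map rather than decoding co_lnotab by hand; decode_co_code and render are hypothetical names. On 3.11 and later the listing also shows the CACHE entries that dis.dis() hides by default.

```python
import dis


def decode_co_code(code):
    """Walk code.co_code by hand and yield (offset, opname, arg, note) tuples.

    Assumes CPython 3.6+ wordcode: each instruction is one opcode byte
    followed by one argument byte, and EXTENDED_ARG prefixes carry the
    higher bits of the next instruction's argument.
    """
    raw = code.co_code
    extended_arg_op = dis.opmap["EXTENDED_ARG"]
    extended = 0
    for offset in range(0, len(raw), 2):
        op = raw[offset]
        arg = raw[offset + 1] | extended
        if op == extended_arg_op:
            extended = arg << 8  # carried into the next instruction
            continue
        extended = 0
        note = ""
        # Only constant lookups are resolved here; names, locals, jumps and
        # comparisons need version-specific decoding rules.
        if op in dis.hasconst and arg < len(code.co_consts):
            note = repr(code.co_consts[arg])
        yield offset, dis.opname[op], arg, note


def render(code):
    """Format decode_co_code() output with source line numbers, dis-style."""
    # dis.findlinestarts() reads the line table (co_lnotab, or co_lines() on
    # 3.10+) and yields (offset, lineno) pairs for offsets that start a line.
    line_starts = dict(dis.findlinestarts(code))
    rows = []
    for offset, name, arg, note in decode_co_code(code):
        lineno = line_starts.get(offset)
        line_col = "" if lineno is None else str(lineno)
        rows.append(f"{line_col:>4} {offset:>6} {name:<24} {arg:>5} {note}".rstrip())
    return "\n".join(rows)


def sqr(x):
    return x*x


print(render(sqr.__code__))
```

Printing the raw argument for every instruction, including ones that dis.dis() shows without one, keeps the decoder version-agnostic at the cost of some noise.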
