I have a .pyc file. I need to understand its contents in order to learn how Python's disassembler works, i.e. how can I generate output like that of dis.dis(function) from the contents of a .pyc file?
For example:
>>> def sqr(x):
...     return x*x
...
>>> import dis
>>> dis.dis(sqr)
  2           0 LOAD_FAST                0 (x)
              3 LOAD_FAST                0 (x)
              6 BINARY_MULTIPLY
              7 RETURN_VALUE
I need to get output like this from the .pyc file.
Solution
.pyc files contain some metadata followed by a marshalled code object. To load the code object and disassemble it, use:
import dis, marshal, sys

header_sizes = [
    # (size, first version this applies to)
    # pyc files were introduced in 0.9.2 way, way back in June 1991.
    (8, (0, 9, 2)),  # 2 bytes magic number, \r\n, 4 bytes UNIX timestamp
    (12, (3, 3)),  # added 4 bytes file size
    # bytes 4-8 are flags, meaning of 9-16 depends on what flags are set
    # bit 0 not set: 9-12 timestamp, 13-16 file size
    # bit 0 set: 9-16 file hash (SipHash-2-4, k0 = 4 bytes of the file, k1 = 0)
    (16, (3, 7)),  # inserted 4 bytes bit flag field at 4-8
    # future version may add more bytes still, at which point we can extend
    # this table. It is correct for Python versions up to 3.9
]
header_size = next(s for s, v in reversed(header_sizes) if sys.version_info >= v)

# pycfile is the path to the .pyc file to inspect
with open(pycfile, "rb") as f:
    metadata = f.read(header_size)  # first header_size bytes are metadata
    code = marshal.load(f)  # rest is a marshalled code object

dis.dis(code)
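To make the header comments above concrete, here is a minimal sketch that splits the 16-byte header used by Python 3.7 and newer into its fields. The function name parse_pyc_header and the dictionary keys are my own choices for illustration, not part of any standard library API, and the layout is only the one described in the table above.

```python
import struct


def parse_pyc_header(header):
    """Split a 16-byte .pyc header (Python 3.7+ layout) into named fields.

    Hypothetical helper for illustration; older layouts are not handled.
    """
    if len(header) != 16:
        raise ValueError("expected exactly 16 header bytes")
    # 2-byte magic number, the bytes b"\r\n", then a 4-byte little-endian flags field
    magic_number, crlf, flags = struct.unpack("<H2sI", header[:8])
    if crlf != b"\r\n":
        raise ValueError("bytes 2-4 are not \\r\\n, this does not look like a .pyc header")
    fields = {"magic_number": magic_number, "flags": flags}
    if flags & 0x1:
        # bit 0 set: the remaining 8 bytes are a hash of the source file
        fields["source_hash"] = header[8:16]
    else:
        # bit 0 not set: 4-byte source mtime followed by 4-byte source size
        mtime, size = struct.unpack("<II", header[8:16])
        fields["mtime"] = mtime
        fields["source_size"] = size
    return fields
```

You would call it as parse_pyc_header(metadata) after reading the header as shown above, provided header_size came out as 16.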
Demo with the bisect module:
>>> import bisect
>>> import dis, marshal
>>> import sys
>>> header_sizes = [(8, (0, 9, 2)), (12, (3, 3)), (16, (3, 7))]
>>> header_size = next(s for s, v in reversed(header_sizes) if sys.version_info >= v)
>>> pycfile = getattr(bisect, '__cached__', bisect.__file__)
>>> with open(pycfile, "rb") as f:
...     metadata = f.read(header_size)  # first header_size bytes are metadata
...     code = marshal.load(f)  # rest is a marshalled code object
...
>>> dis.dis(code)
1 0 LOAD_CONST 0 ('Bisection algorithms.')
2 STORE_NAME 0 (__doc__)
3 4 LOAD_CONST 12 ((0, None))
6 LOAD_CONST 3 (<code object insort_right at 0x..., file "...", line 3>)
8 LOAD_CONST 4 ('insort_right')
10 MAKE_FUNCTION 1 (defaults)
12 STORE_NAME 1 (insort_right)
15 14 LOAD_CONST 13 ((0, None))
16 LOAD_CONST 5 (<code object bisect_right at 0x..., file "...", line 15>)
18 LOAD_CONST 6 ('bisect_right')
20 MAKE_FUNCTION 1 (defaults)
22 STORE_NAME 2 (bisect_right)
36 24 LOAD_CONST 14 ((0, None))
26 LOAD_CONST 7 (<code object insort_left at 0x..., file "...", line 36>)
28 LOAD_CONST 8 ('insort_left')
30 MAKE_FUNCTION 1 (defaults)
32 STORE_NAME 3 (insort_left)
49 34 LOAD_CONST 15 ((0, None))
36 LOAD_CONST 9 (<code object bisect_left at 0x..., file "...", line 49>)
38 LOAD_CONST 10 ('bisect_left')
40 MAKE_FUNCTION 1 (defaults)
42 STORE_NAME 4 (bisect_left)
71 44 SETUP_FINALLY 12 (to 58)
72 46 LOAD_CONST 1 (0)
48 LOAD_CONST 11 (('*',))
50 IMPORT_NAME 5 (_bisect)
52 IMPORT_STAR
54 POP_BLOCK
56 JUMP_FORWARD 20 (to 78)
73 >> 58 DUP_TOP
60 LOAD_NAME 6 (ImportError)
62 COMPARE_OP 10 (exception match)
64 POP_JUMP_IF_FALSE 76
66 POP_TOP
68 POP_TOP
70 POP_TOP
74 72 POP_EXCEPT
74 JUMP_FORWARD 2 (to 78)
>> 76 END_FINALLY
77 >> 78 LOAD_NAME 2 (bisect_right)
80 STORE_NAME 7 (bisect)
78 82 LOAD_NAME 1 (insort_right)
84 STORE_NAME 8 (insort)
86 LOAD_CONST 2 (None)
88 RETURN_VALUE
Disassembly of <code object insort_right at 0x..., file "...", line 3>:
12 0 LOAD_GLOBAL 0 (bisect_right)
2 LOAD_FAST 0 (a)
4 LOAD_FAST 1 (x)
6 LOAD_FAST 2 (lo)
8 LOAD_FAST 3 (hi)
10 CALL_FUNCTION 4
12 STORE_FAST 2 (lo)
13 14 LOAD_FAST 0 (a)
16 LOAD_METHOD 1 (insert)
18 LOAD_FAST 2 (lo)
20 LOAD_FAST 1 (x)
22 CALL_METHOD 2
24 POP_TOP
26 LOAD_CONST 1 (None)
28 RETURN_VALUE
Disassembly of <code object bisect_right at 0x..., file "...", line 15>:
26 0 LOAD_FAST 2 (lo)
2 LOAD_CONST 1 (0)
4 COMPARE_OP 0 (<)
6 POP_JUMP_IF_FALSE 16
27 8 LOAD_GLOBAL 0 (ValueError)
10 LOAD_CONST 2 ('lo must be non-negative')
12 CALL_FUNCTION 1
14 RAISE_VARARGS 1
28 >> 16 LOAD_FAST 3 (hi)
18 LOAD_CONST 3 (None)
20 COMPARE_OP 8 (is)
22 POP_JUMP_IF_FALSE 32
29 24 LOAD_GLOBAL 1 (len)
26 LOAD_FAST 0 (a)
28 CALL_FUNCTION 1
30 STORE_FAST 3 (hi)
30 >> 32 LOAD_FAST 2 (lo)
34 LOAD_FAST 3 (hi)
36 COMPARE_OP 0 (<)
38 POP_JUMP_IF_FALSE 80
31 40 LOAD_FAST 2 (lo)
42 LOAD_FAST 3 (hi)
44 BINARY_ADD
46 LOAD_CONST 4 (2)
48 BINARY_FLOOR_DIVIDE
50 STORE_FAST 4 (mid)
32 52 LOAD_FAST 1 (x)
54 LOAD_FAST 0 (a)
56 LOAD_FAST 4 (mid)
58 BINARY_SUBSCR
60 COMPARE_OP 0 (<)
62 POP_JUMP_IF_FALSE 70
64 LOAD_FAST 4 (mid)
66 STORE_FAST 3 (hi)
68 JUMP_ABSOLUTE 32
33 >> 70 LOAD_FAST 4 (mid)
72 LOAD_CONST 5 (1)
74 BINARY_ADD
76 STORE_FAST 2 (lo)
78 JUMP_ABSOLUTE 32
34 >> 80 LOAD_FAST 2 (lo)
82 RETURN_VALUE
Disassembly of <code object insort_left at 0x..., file "...", line 36>:
45 0 LOAD_GLOBAL 0 (bisect_left)
2 LOAD_FAST 0 (a)
4 LOAD_FAST 1 (x)
6 LOAD_FAST 2 (lo)
8 LOAD_FAST 3 (hi)
10 CALL_FUNCTION 4
12 STORE_FAST 2 (lo)
46 14 LOAD_FAST 0 (a)
16 LOAD_METHOD 1 (insert)
18 LOAD_FAST 2 (lo)
20 LOAD_FAST 1 (x)
22 CALL_METHOD 2
24 POP_TOP
26 LOAD_CONST 1 (None)
28 RETURN_VALUE
Disassembly of <code object bisect_left at 0x..., file "...", line 49>:
60 0 LOAD_FAST 2 (lo)
2 LOAD_CONST 1 (0)
4 COMPARE_OP 0 (<)
6 POP_JUMP_IF_FALSE 16
61 8 LOAD_GLOBAL 0 (ValueError)
10 LOAD_CONST 2 ('lo must be non-negative')
12 CALL_FUNCTION 1
14 RAISE_VARARGS 1
62 >> 16 LOAD_FAST 3 (hi)
18 LOAD_CONST 3 (None)
20 COMPARE_OP 8 (is)
22 POP_JUMP_IF_FALSE 32
63 24 LOAD_GLOBAL 1 (len)
26 LOAD_FAST 0 (a)
28 CALL_FUNCTION 1
30 STORE_FAST 3 (hi)
64 >> 32 LOAD_FAST 2 (lo)
34 LOAD_FAST 3 (hi)
36 COMPARE_OP 0 (<)
38 POP_JUMP_IF_FALSE 80
65 40 LOAD_FAST 2 (lo)
42 LOAD_FAST 3 (hi)
44 BINARY_ADD
46 LOAD_CONST 4 (2)
48 BINARY_FLOOR_DIVIDE
50 STORE_FAST 4 (mid)
66 52 LOAD_FAST 0 (a)
54 LOAD_FAST 4 (mid)
56 BINARY_SUBSCR
58 LOAD_FAST 1 (x)
60 COMPARE_OP 0 (<)
62 POP_JUMP_IF_FALSE 74
64 LOAD_FAST 4 (mid)
66 LOAD_CONST 5 (1)
68 BINARY_ADD
70 STORE_FAST 2 (lo)
72 JUMP_ABSOLUTE 32
67 >> 74 LOAD_FAST 4 (mid)
76 STORE_FAST 3 (hi)
78 JUMP_ABSOLUTE 32
68 >> 80 LOAD_FAST 2 (lo)
82 RETURN_VALUE
Note that this separates out the top-level code object, which defines the module, from the code objects of the functions and classes it contains. In Python 3.6 and older, dis.dis() does not recurse into nested code objects. In those versions, if you want to analyse the contained functions, you need to load the nested code objects from the top-level code.co_consts tuple. For example, the insort_right function's code object is loaded with LOAD_CONST 3, so you look for the code object at that index:
>>> code.co_consts[3]
<code object insort_right at 0x..., file "...", line 3>
>>> dis.dis(code.co_consts[3])
12 0 LOAD_GLOBAL 0 (bisect_right)
2 LOAD_FAST 0 (a)
4 LOAD_FAST 1 (x)
6 LOAD_FAST 2 (lo)
8 LOAD_FAST 3 (hi)
10 CALL_FUNCTION 4
12 STORE_FAST 2 (lo)
13 14 LOAD_FAST 0 (a)
16 LOAD_METHOD 1 (insert)
18 LOAD_FAST 2 (lo)
20 LOAD_FAST 1 (x)
22 CALL_METHOD 2
24 POP_TOP
26 LOAD_CONST 1 (None)
28 RETURN_VALUE
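If you don't want to look up each index by hand, the same co_consts lookup can be generalised. The following is a rough sketch, assuming only that nested code objects appear as types.CodeType entries in co_consts; the helper name iter_code_objects is made up for this example.

```python
import types


def iter_code_objects(code):
    """Yield a code object and, depth-first, every code object nested in it.

    Hypothetical helper: nested functions, classes and comprehensions are
    stored as code objects in the co_consts tuple of their parent.
    """
    yield code
    for const in code.co_consts:
        if isinstance(const, types.CodeType):
            yield from iter_code_objects(const)
```

On Python 3.6 and older you could then loop over iter_code_objects(code) and call dis.dis() on each item; on 3.7 and newer dis.dis() already recurses, so this is mostly useful for listing or filtering the nested objects by co_name.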
Personally, I would avoid trying to parse the .pyc file with anything other than the matching Python version and its marshal module. The marshal format is essentially an internal serialisation format that changes with the needs of Python itself. New features such as list comprehensions, with statements and async/await require additions to the format, which is not documented anywhere other than in the C source code.
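One way to guard against a version mismatch before unmarshalling is to compare the first four bytes of the file with the magic number of the running interpreter, which importlib.util exposes as MAGIC_NUMBER. This is only a sketch; the two function names are invented for illustration.

```python
import importlib.util


def magic_matches_interpreter(magic):
    """Return True if these 4 bytes equal the running interpreter's pyc magic."""
    return bytes(magic[:4]) == importlib.util.MAGIC_NUMBER


def check_pyc_version(path):
    """Raise ValueError if the .pyc at path was written by a different bytecode version."""
    with open(path, "rb") as f:
        magic = f.read(4)
    if not magic_matches_interpreter(magic):
        raise ValueError("%s was not compiled for this Python version" % path)
```

Calling check_pyc_version(pycfile) before marshal.load() turns an obscure unmarshalling failure into a clear error message.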
If you do go this route, and manage to read a code object by means other than the marshal module, you'll have to produce the disassembly yourself from the various attributes of the code object; see the dis module source for details on how to do this (for example, you'll have to use the co_firstlineno and co_lnotab attributes to create a bytecode-offset-to-line-number map).
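For comparison, when you do have a real code object, the dis module already exposes the pieces of that mapping, so you can see what a hand-written parser would need to reproduce. The sketch below uses dis.findlinestarts() and dis.get_instructions() (available from Python 3.4); the two wrapper functions are my own naming and return plain data instead of printing.

```python
import dis


def offset_to_line_map(code):
    """Map each bytecode offset that starts a source line to its line number."""
    # Newer Python versions may report None for offsets without a source line.
    return {off: line for off, line in dis.findlinestarts(code) if line is not None}


def simple_disassemble(code):
    """Return (line or None, offset, opname, argrepr) rows, similar to dis.dis output."""
    lines = offset_to_line_map(code)
    return [
        (lines.get(ins.offset), ins.offset, ins.opname, ins.argrepr)
        for ins in dis.get_instructions(code)
    ]
```

The exact opcodes and offsets in the rows differ between Python versions, which is one more reason to disassemble with the interpreter that produced the file.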