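Original title: Python Bytecode Deobfuscation
Preface
This post grew out of a challenge left over from the last time I played NISCCTF2019, which involved reversing a pyc file. I use that challenge as an excuse to go through the knowledge and tools related to Python bytecode deobfuscation, and then build on the existing work to produce some results of my own. This is part one: it lays the groundwork and surveys the existing tools.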
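pyc file structure
A pyc file is the bytecode file produced by compiling Python source code. Although Python is an interpreted language, the interpreter does not execute source text directly: it first compiles the source to bytecode and then interprets that bytecode. Python 2 and Python 3 bytecode differ and are not interchangeable. For now and for the foreseeable future, the de facto standard implementation is CPython.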
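The compile-then-interpret step described above can be observed from inside the interpreter. The following is only a minimal sketch: the names `source` and `code` and the `"<example>"` filename are made up for illustration, and the snippet is written in Python 3 syntax, so the opcodes it prints will not match those of a Python 2 pyc.

```python
import dis

# A tiny piece of source code, held as a string (illustrative only).
source = "x = 1 + 2\nprint(x)\n"

# compile() turns the source text into a code object. Its co_code
# attribute holds the raw bytecode that CPython's interpreter loop runs.
code = compile(source, "<example>", "exec")

print(type(code.co_code))  # the bytecode is a plain bytes object
print(code.co_names)       # names referenced by the bytecode
dis.dis(code)              # human-readable disassembly; varies by version
```

Running the same snippet under different CPython releases gives different disassembly, which is the reason a pyc file has to record which interpreter version produced it.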
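A pyc file consists of 3 parts:
The first 4 bytes are the Magic Number, which identifies the Python version that produced this pyc. The mapping between values and versions is defined in Python/import.c.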
/* Magic word to reject .pyc files generated by other Python versions.
   It should change for each incompatible change to the bytecode.

   The value of CR and LF is incorporated so if you ever read or write
   a .pyc file in text mode the magic number will be wrong; also, the
   Apple MPW compiler swaps their values, botching string constants.

   The magic numbers must be spaced apart atleast 2 values, as the
   -U interpeter flag will cause MAGIC+1 being used. They have been
   odd numbers for some time now.

   There were a variety of old schemes for setting the magic number.
   The current working scheme is to increment the previous value by
   10.

   Known values:
       Python 1.5:   20121
       Python 1.5.1: 20121
       Python 1.5.2: 20121
       Python 1.6:   50428
       Python 2.0:   50823
       Python 2.0.1: 50823
       Python 2.1:   60202
       Python 2.1.1: 60202
       Python 2.1.2: 60202
       Python 2.2:   60717
       Python 2.3a0: 62011
       Python 2.3a0: 62021
       Python 2.3a0: 62011 (!)
       Python 2.4a0: 62041
       Python 2.4a3: 62051
       Python 2.4b1: 62061
       Python 2.5a0: 62071
       Python 2.5a0: 62081 (ast-branch)
       Python 2.5a0: 62091 (with)
       Python 2.5a0: 62092 (changed WITH_CLEANUP opcode)
       Python 2.5b3: 62101 (fix wrong code: for x, in ...)
       Python 2.5b3: 62111 (fix wrong code: x += yield)
       Python 2.5c1: 62121 (fix wrong lnotab with for loops and
                            storing constants that should have been removed)
       Python 2.5c2: 62131 (fix wrong code: for x, in ... in listcomp/genexp)
       Python 2.6a0: 62151 (peephole optimizations and STORE_MAP opcode)
       Python 2.6a1: 62161 (WITH_CLEANUP optimization)
       Python 2.7a0: 62171 (optimize list comprehensions/change LIST_APPEND)
       Python 2.7a0: 62181 (optimize conditional branches:
                            introduce POP_JUMP_IF_FALSE and POP_JUMP_IF_TRUE)
       Python 2.7a0  62191 (introduce SETUP_WITH)
       Python 2.7a0  62201