An improved pyinstxtractor.py - a tool for unpacking and decompiling exe files built with PyInstaller

How it came about

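I used the pyinstxtractor.py found online to extract an exe built with PyInstaller, but uncompyle6 could not decompile the extracted pyc files and reported an error.
Comparing an original pyc file with the extracted one showed the following:
(Figure: comparing the pyc files in Notepad++)
The body of the extracted file is identical to the original, but its file header differs from the original pyc header. (Note: in the figure, the byte e3 marks the start of the pyc body; everything before it is the header.)
(Figure: the extracted file)
The files in the PYZ_00…pyz_extracted folder showed the same header mismatch,
which means the pyinstxtractor.py found online has a bug.
After careful analysis I rewrote pyinstxtractor.py; the code is below.
(Pay attention to the modified parts and their comments.)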
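
The comparison described above can also be done in a few lines of Python instead of a hex view. The following is a minimal sketch, assuming the Python 3.7+ layout in which the header is 16 bytes (magic number, flags, timestamp, source size). `split_pyc` and `compare_pyc` are hypothetical helper names, not part of pyinstxtractor.py, and the "bad" file is simulated in memory rather than taken from a real extraction.

```python
import importlib.util
import marshal

HEADER_SIZE = 16  # assumption: Python 3.7+ layout (magic, flags, mtime, source size)


def split_pyc(raw, header_size=HEADER_SIZE):
    """Split raw pyc bytes into (header, body)."""
    return raw[:header_size], raw[header_size:]


def compare_pyc(raw_a, raw_b, header_size=HEADER_SIZE):
    """Return (headers_equal, bodies_equal) for two pyc byte strings."""
    head_a, body_a = split_pyc(raw_a, header_size)
    head_b, body_b = split_pyc(raw_b, header_size)
    return head_a == head_b, body_a == body_b


# Simulate a correct pyc and one with a wrong header but the same body
code = compile("x = 1", "<demo>", "exec")
body = marshal.dumps(code)
good = importlib.util.MAGIC_NUMBER + b"\x00" * 12 + body
bad = b"\x00" * HEADER_SIZE + body

print(compare_pyc(good, bad))        # (False, True)
# The body starts with the marshal type code 'c' (code object);
# the e3 seen in a hex view is that byte with the high flag bit set.
print(body[0] & 0x7F == ord("c"))    # True
```

A result of (False, True) is the situation described here: the marshalled code is intact and only the header needs repairing.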

Source code
# coding:utf-8
# Adapted from the pyinstxtractor.py found online
r"""
PyInstaller Extractor v2.1 (Supports pyinstaller 3.3+, 3.2, 3.1, 3.0, 2.1, 2.0)
Author : Extreme Coders
E-mail : extremecoders(at)hotmail(dot)com
Web    : https://0xec.blogspot.com
Date   : 29-November-2017
Url    : https://sourceforge.net/projects/pyinstallerextractor/

For any suggestions, leave a comment on
https://forum.tuts4you.com/topic/34455-pyinstaller-extractor/

This script extracts a pyinstaller generated executable file.
Pyinstaller installation is not needed. The script has it all.

For best results, it is recommended to run this script in the
same version of python as was used to create the executable.
This is just to prevent unmarshalling errors(if any) while
extracting the PYZ archive.

Usage : Just copy this script to the directory where your exe resides
        and run the script with the exe file name as a parameter

C:\path\to\exe\>python pyinstxtractor.py <filename>
$ /path/to/exe/python pyinstxtractor.py <filename>

Licensed under GNU General Public License (GPL) v3.
You are free to modify this source.

CHANGELOG
================================================

Version 1.1 (Jan 28, 2014)
-------------------------------------------------
- First Release
- Supports only pyinstaller 2.0

Version 1.2 (Sept 12, 2015)
-------------------------------------------------
- Added support for pyinstaller 2.1 and 3.0 dev
- Cleaned up code
- Script is now more verbose
- Executable extracted within a dedicated sub-directory

(Support for pyinstaller 3.0 dev is experimental)

Version 1.3 (Dec 12, 2015)
-------------------------------------------------
- Added support for pyinstaller 3.0 final
- Script is compatible with both python 2.x & 3.x (Thanks to Moritz Kroll @ Avira Operations GmbH & Co. KG)

Version 1.4 (Jan 19, 2016)
-------------------------------------------------
- Fixed a bug when writing pyc files >= version 3.3 (Thanks to Daniello Alto: https://github.com/Djamana)

Version 1.5 (March 1, 2016)
-------------------------------------------------
- Added support for pyinstaller 3.1 (Thanks to Berwyn Hoyt for reporting)

Version 1.6 (Sept 5, 2016)
-------------------------------------------------
- Added support for pyinstaller 3.2
- Extractor will use a random name while extracting unnamed files.
- For encrypted pyz archives it will dump the contents as is. Previously, the tool would fail.

Version 1.7 (March 13, 2017)
-------------------------------------------------
- Made the script compatible with python 2.6 (Thanks to Ross for reporting)

Version 1.8 (April 28, 2017)
-------------------------------------------------
- Support for sub-directories in .pyz files (Thanks to Moritz Kroll @ Avira Operations GmbH & Co. KG)

Version 1.9 (November 29, 2017)
-------------------------------------------------
- Added support for pyinstaller 3.3
- Display the scripts which are run at entry (Thanks to Michael Gillespie @ malwarehunterteam for the feature request)

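***** Version 2.0 (2020-12-13) *****
- Fixed a bug in extracting pyc files.
***** Version 2.1.1 (2021-2-23) *****
- Fixed a bug in extracting pyc files from the PYZ archive; compatible with almost all Python 3 versions; pyz files can now be extracted directly.
***** Version 2.2 (2022-7-25) *****
- Compatible with Python 3.10.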
"""

from __future__ import print_function
import os
import struct
import marshal
import zlib
import sys
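try:
    import imp  # removed in Python 3.12
except ImportError:
    # Minimal stand-in so that imp.get_magic() keeps working on Python 3.12+
    import importlib.util
    class imp(object):
        get_magic = staticmethod(lambda: importlib.util.MAGIC_NUMBER)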
import types
from uuid import uuid4 as uniquename
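# Added: xdis supplies the pyc magic number for each Python version
try:
    from xdis.magics import magics
except ImportError:
    # Stop here; extraction would otherwise fail later with a NameError
    sys.exit("Error: the xdis module is required; install it with pip (pip install xdis).")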

__version__='2.2'

class CTOCEntry:
    def __init__(self, position, cmprsdDataSize, uncmprsdDataSize, cmprsFlag, typeCmprsData, name):
        self.position = position
        self.cmprsdDataSize = cmprsdDataSize
        self.uncmprsdDataSize = uncmprsdDataSize
        self.cmprsFlag = cmprsFlag
        self.typeCmprsData = typeCmprsData
        self.name = name


class PyInstArchive:
    PYINST20_COOKIE_SIZE = 24           # For pyinstaller 2.0
    PYINST21_COOKIE_SIZE = 24 + 64      # For pyinstaller 2.1+
    MAGIC = b'MEI\014\013\012\013\016'  # Magic number which identifies pyinstaller

    def __init__(self, path):
        self.filePath = path


    def open(self):
        try:
            self.fPtr = open(self.filePath, 'rb')
            self.fileSize = os.stat(self.filePath).st_size
        except:
            print('[*] Error: Could not open {0}'.format(self.filePath))
            return False
        return True


    def close(self):
        try:
            self.fPtr.close()
        except:
            pass


    def checkFile(self):
        print('[*] Processing {0}'.format(self.filePath))
        # Check if it is a 2.0 archive
        self.fPtr.seek(self.fileSize - self.PYINST20_COOKIE_SIZE, os.SEEK_SET)
        magicFromFile = self.fPtr.read(len(self.MAGIC))

        if magicFromFile == self.MAGIC:
            self.pyinstVer = 20     # pyinstaller 2.0
            print('[*] Pyinstaller version: 2.0')
            return True

        # Check for pyinstaller 2.1+ before bailing out
        self.fPtr.seek(self.fileSize - self.PYINST21_COOKIE_SIZE, os.SEEK_SET)
        magicFromFile = self.fPtr.read(len(self.MAGIC))

        if magicFromFile == self.MAGIC:
            print('[*] Pyinstaller version: 2.1+')
            self.pyinstVer = 21     # pyinstaller 2.1+
            return True

        print('[*] Error : Unsupported pyinstaller version or not a pyinstaller archive')
        return False


    def getCArchiveInfo(self):
        try:
            if self.pyinstVer == 20:
                self.fPtr.seek(self.fileSize - self.PYINST20_COOKIE_SIZE, os.SEEK_SET)

                # Read CArchive cookie
                (magic, lengthofPackage, toc, tocLen, self.pyver) = \
                struct.unpack('!8siiii', self.fPtr.read(self.PYINST20_COOKIE_SIZE))

            elif self.pyinstVer == 21:
                self.fPtr.seek(self.fileSize - self.PYINST21_COOKIE_SIZE, os.SEEK_SET)

                # Read CArchive cookie
                (magic, lengthofPackage, toc, tocLen, self.pyver, pylibname) = \
                struct.unpack('!8siiii64s', self.fPtr.read(self.PYINST21_COOKIE_SIZE))

        except:
            print('[*] Error : The file is not a pyinstaller archive')
            return False

        print('[*] Python version: {0}'.format(self.pyver))

        # Overlay is the data appended at the end of the PE
        self.overlaySize = lengthofPackage
        self.overlayPos = self.fileSize - self.overlaySize
        self.tableOfContentsPos = self.overlayPos + toc
        self.tableOfContentsSize = tocLen

        print('[*] Length of package: {0} bytes'.format(self.overlaySize))
        return True


    def parseTOC(self):
        # Go to the table of contents
        self.fPtr.seek(self.tableOfContentsPos, os.SEEK_SET)

        self.tocList = []
        parsedLen = 0

        # Parse table of contents
        while parsedLen < self.tableOfContentsSize:
            (entrySize, ) = struct.unpack('!i', self.fPtr.read(4))
            nameLen = struct.calcsize('!iiiiBc')

            (entryPos, cmprsdDataSize, uncmprsdDataSize, cmprsFlag, typeCmprsData, name) = \
            struct.unpack( \
                '!iiiBc{0}s'.format(entrySize - nameLen), \
                self.fPtr.read(entrySize - 4))

            name = name.decode('utf-8').rstrip('\0')
            if len(name) == 0:
                name = str(uniquename())
                print('[!] Warning: Found an unnamed file in CArchive. Using random name {0}'.format(name))

            self.tocList.append( \
                                CTOCEntry(                      \
                                    self.overlayPos + entryPos, \
                                    cmprsdDataSize,             \
                                    uncmprsdDataSize,           \
                                    cmprsFlag,                  \
                                    typeCmprsData,              \
                                    name                        \
                                ))

            parsedLen += entrySize
        print('[*] Found {0} files in CArchive'.format(len(self.tocList)))



    def extractFiles(self):
        print('[*] Beginning extraction...please standby')
        extractionDir = os.path.join(os.getcwd(), os.path.basename(self.filePath) + '_extracted')

        if not os.path.exists(extractionDir):
            os.mkdir(extractionDir)

        os.chdir(extractionDir)
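        # Added: build the pyc header (magic number followed by zeroed fields)
        # pyver is major*10+minor in older archives (e.g. 37) and
        # major*100+minor in newer ones (e.g. 310, for Python 3.10 and later)
        if self.pyver >= 100:
            pymajor, pyminor = divmod(self.pyver, 100)
        else:
            pymajor, pyminor = divmod(self.pyver, 10)
        magic = magics["%d.%d" % (pymajor, pyminor)]
        if (pymajor, pyminor) >= (3, 7): # improved in version 2.2.1
            pycheader = magic + b'\x00' * 12 # flags, timestamp, source size
        elif (pymajor, pyminor) >= (3, 3):
            pycheader = magic + b'\x00' * 8  # timestamp, source size
        else:
            pycheader = magic + b'\x00' * 4  # timestamp only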

        for entry in self.tocList:
            basePath = os.path.dirname(entry.name)
            if basePath != '':
                # Check if path exists, create if not
                if not os.path.exists(basePath):
                    os.makedirs(basePath)

            self.fPtr.seek(entry.position, os.SEEK_SET)
            data = self.fPtr.read(entry.cmprsdDataSize)

            if entry.cmprsFlag == 1:
                data = zlib.decompress(data)
                # Malware may tamper with the uncompressed size
                # Comment out the assertion in such a case
                assert len(data) == entry.uncmprsdDataSize # Sanity Check

            # Write every entry; the with block closes the file for all entry types
            with open(entry.name, 'wb') as f:
                if entry.typeCmprsData == b's':
                    # Entry-point scripts are stored without a pyc header, so prepend one
                    print('[+] Possible entry point: {0}'.format(entry.name))
                    f.write(pycheader)
                f.write(data)

            if entry.typeCmprsData == b'z' or entry.typeCmprsData == b'Z':
                self._extractPyz(entry.name)

    # Added in version 2.1
    def _checkPyz(self,name):
        with open(name, 'rb') as f:
            pyzMagic = f.read(4)
            return pyzMagic == b'PYZ\0' # Sanity Check

    def _extractPyz(self, name):
        dirName =  name + '_extracted'
        # Create a directory for the contents of the pyz
        if not os.path.exists(dirName):
            os.mkdir(dirName)

        with open(name, 'rb') as f:
            pyzMagic = f.read(4)
            assert pyzMagic == b'PYZ\0' # Sanity Check

            pycHeader = f.read(4) # Python magic value

            if imp.get_magic() != pycHeader:
                print('[!] Warning: The script is running in a different python version than the one used to build the executable')
                print('    Run this script in Python{0} to prevent extraction errors(if any) during unmarshalling'.format(self.pyver))

            (tocPosition, ) = struct.unpack('!i', f.read(4))
            f.seek(tocPosition, os.SEEK_SET)

            try:
                toc = marshal.load(f)
            except:
                print('[!] Unmarshalling FAILED. Cannot extract {0}. Extracting remaining files.'.format(name))
                return

            print('[*] Found {0} files in PYZ archive'.format(len(toc)))

            # From pyinstaller 3.1+ toc is a list of tuples
            if type(toc) == list:
                toc = dict(toc)

            for key in toc.keys():
                (ispkg, pos, length) = toc[key]
                f.seek(pos, os.SEEK_SET)

                fileName = key
                try:
                    # for Python > 3.3 some keys are bytes object some are str object
                    fileName = key.decode('utf-8')
                except:
                    pass

                # Make sure destination directory exists, ensuring we keep inside dirName
                destName = os.path.join(dirName, fileName.replace("..", "__"))
                destDirName = os.path.dirname(destName)
                if not os.path.exists(destDirName):
                    os.makedirs(destDirName)

                try:
                    data = f.read(length)
                    data = zlib.decompress(data)
                except:
                    print('[!] Error: Failed to decompress {0}, probably encrypted. Extracting as is.'.format(fileName))
                    open(destName + '.pyc.encrypted', 'wb').write(data)
                    continue

                with open(destName + '.pyc', 'wb') as pycFile:
                    pycFile.write(pycHeader)      # Write pyc magic
                    pycFile.write(b'\0' * 4)      # Write timestamp

                    if self.pyver>=37: # improved in version 2.2.1
                        # original code: b'\0' * 4
                        pycFile.write(b'\0' * 8)
                    elif self.pyver>=33:
                        pycFile.write(b'\0' * 4) # Size parameter added in Python 3.3
                    pycFile.write(data)


def main():
    if len(sys.argv) < 2:
        print('[*] Usage: pyinstxtractor.py <filename>')

    else:
        arch = PyInstArchive(sys.argv[1])
        if arch.open():
            if arch.checkFile():
                if arch.getCArchiveInfo():
                    arch.parseTOC()
                    arch.extractFiles()
                    arch.close()
                    print('[*] Successfully extracted pyinstaller archive: {0}'.format(sys.argv[1]))
                    print('')
                    print('''You can now use a python decompiler \
on the pyc files within the extracted directory''')
                    # Added: warn if no decompiler is available
                    try:
                        import uncompyle6
                    except ImportError:
                        print("Warning: no pyc decompiler (uncompyle6) appears to be installed")

                    return
            # Added in version 2.1: treat the input as a standalone pyz file
            elif arch._checkPyz(sys.argv[1]):
                arch.pyver=100 # default pyver
                arch._extractPyz(sys.argv[1])

            arch.close()


if __name__ == '__main__':
    main()
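
The header handling in extractFiles and _extractPyz rests on two version-dependent details: how the archive encodes the Python version, and how many bytes follow the magic number in a pyc file. The sketch below isolates both. It assumes that older archives store the version as major*10+minor (for example 37) and newer ones as major*100+minor (for example 310); `decode_pyver` and `build_pyc_header` are illustrative names that do not appear in the script, and the magic value used in the demo is only a placeholder.

```python
def decode_pyver(pyver):
    """Turn the integer from the archive cookie into (major, minor).

    Assumption: values of 100 or more use major*100+minor,
    smaller values use major*10+minor.
    """
    if pyver >= 100:
        return divmod(pyver, 100)
    return divmod(pyver, 10)


def build_pyc_header(magic, version):
    """Return a pyc header with every field after the magic zeroed."""
    if version >= (3, 7):
        padding = 12  # flags + timestamp + source size
    elif version >= (3, 3):
        padding = 8   # timestamp + source size
    else:
        padding = 4   # timestamp only
    return magic + b"\x00" * padding


for raw in (27, 36, 37, 310):
    version = decode_pyver(raw)
    header = build_pyc_header(b"\x00\x00\r\n", version)  # placeholder magic
    print(raw, version, len(header))
```

The loop prints header lengths of 8, 12, 16 and 16 bytes for the four sample values, which matches the three layouts the script has to produce.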

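Using uncompyle6

uncompyle6 is a Python library for decompiling pyc files.
On Windows, press Win+R and enter cmd to open a command prompt.
First run: pip install uncompyle6
Then run: python -m uncompyle6 filename.pyc. After a short wait the decompiled output is printed.
To write the decompiled output to a specific py file instead, run: python -m uncompyle6 filename.pyc > output.py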
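
To decompile many extracted files at once, the same command can be driven from Python. This is a rough sketch that only reuses the `python -m uncompyle6 filename.pyc` invocation shown above; `output_path_for`, `build_command` and `decompile_tree` are hypothetical helpers, and the sketch assumes uncompyle6 is installed for the interpreter that runs it.

```python
import os
import subprocess
import sys


def output_path_for(pyc_path):
    """Map name.pyc to name.py; any other name just gets '.py' appended."""
    root, ext = os.path.splitext(pyc_path)
    return root + ".py" if ext == ".pyc" else pyc_path + ".py"


def build_command(pyc_path):
    """Build the same command line as 'python -m uncompyle6 filename.pyc'."""
    return [sys.executable, "-m", "uncompyle6", pyc_path]


def decompile_tree(top):
    """Decompile every .pyc under top, writing each result next to its source."""
    for dirpath, _dirnames, filenames in os.walk(top):
        for name in filenames:
            if not name.endswith(".pyc"):
                continue
            pyc_path = os.path.join(dirpath, name)
            with open(output_path_for(pyc_path), "w", encoding="utf-8") as out:
                # stdout goes to the .py file, like '>' on the command line
                result = subprocess.run(build_command(pyc_path), stdout=out)
            if result.returncode != 0:
                print("failed:", pyc_path, file=sys.stderr)


# Example (directory name is a placeholder):
# decompile_tree("myprogram.exe_extracted")
```

Running one subprocess per file keeps a crash in the decompiler from stopping the whole batch, at the cost of starting the interpreter once per file.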

If you would rather skip these steps, I wrote a script that calls uncompyle6; on Windows it can be run directly by double-clicking:

import sys,os,traceback
import uncompyle6.bin.uncompile as uncompiler
__version__='2.0.1'

def run_uncompile(filename):
    flag=False # set when anything (a warning or an error) is written to sys.stderr
    _w=sys.stderr.write
    def w(*arg,**kw):
        nonlocal flag
        flag=True
        _w(*arg,**kw)
    def start_check(): # start monitoring sys.stderr
        sys.stderr.write=w
    def end_check():  # stop monitoring sys.stderr
        sys.stderr.write=_w

    tofilename=filename[:-1] # name.pyc -> name.py
    if os.path.isfile(tofilename):
        result=input("File %s already exists. Replace it? (y/n) "%tofilename)
        if not result.lower().startswith('y'):return
    try:
        sys.stdout=open(tofilename,"w",encoding="utf-8")
        sys.argv[1]=filename
        start_check()
        uncompiler.main_bin()
    except Exception:
        end_check()
        print("Failed to decompile %s; see %s for the error message"% (filename,tofilename)
              ,file=sys.stderr)
        #traceback.print_exc()
        traceback.print_exc(file=sys.stdout)
    else:
        end_check()
        if not flag:
            print("Decompiled %s successfully"%filename,file=sys.stderr)
        else:
            print("Decompiling %s produced warnings or errors"%filename,file=sys.stderr)
            print("Press Enter to continue...",end='',file=sys.stderr)
            input()
    finally:
        # Close the output file (never the real stdout), then restore stdout
        # so that the next prompt is visible
        if sys.stdout is not sys.__stdout__:
            sys.stdout.close()
        sys.stdout=sys.__stdout__

if __name__=="__main__":
    try:
        if len(sys.argv)>1:
            files=sys.argv[1:]
            sys.argv[0]=uncompiler.__file__
            sys.argv[1:]=['']
            for file in files:
                if not file.endswith(".pyc"):
                    print("Warning: %s may not be a pyc file"%file,file=sys.stderr)
                run_uncompile(file)
        else:
            file=input("Drag a file into this window and press Enter (or type a file name):\n").strip('"')
            sys.argv[0]=uncompiler.__file__
            sys.argv.append('')
            run_uncompile(file)
    finally:
        sys.stdout=sys.__stdout__
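
The script above detects warnings by temporarily replacing sys.stderr.write. An alternative that avoids patching the stream object is to wrap the stream and swap the wrapper in with contextlib.redirect_stderr. The following is a sketch of that variation, not the method used in the script; `WriteWatcher` is a hypothetical class name.

```python
import contextlib
import io
import sys


class WriteWatcher(io.TextIOBase):
    """Forward writes to another stream and remember whether any happened."""

    def __init__(self, target):
        super().__init__()
        self.target = target
        self.touched = False

    def writable(self):
        return True

    def write(self, text):
        if text:
            self.touched = True
        return self.target.write(text)

    def flush(self):
        self.target.flush()


watcher = WriteWatcher(sys.stderr)
with contextlib.redirect_stderr(watcher):
    # Stand-in for the decompiler call; anything printed to stderr is noticed
    print("warning: something looks off", file=sys.stderr)

print(watcher.touched)  # True
```

Like the original approach, this only notices output from code that looks up sys.stderr at the time it writes; code that saved a reference to the stream earlier bypasses the wrapper.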

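Conclusion
I wrote this pyinstxtractor.py in order to extract - the Mulan (木兰) programming language …
The effort paid off: with uncompyle6 I successfully recovered the source code (available here: ulang - Gitcode).
That concludes this article; the above is the experience I wanted to pass on to later readers.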
