LLM4Decompile Open-Source Project Tutorial

Latest recommended article published on 2024-08-12 08:40:48

邹岩讳Sally


Article link: https://blog.csdn.net/gitblog_00776/article/details/141117905


LLM4Decompile project URL: https://gitcode.com/gh_mirrors/ll/LLM4Decompile

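Project Introduction

LLM4Decompile is an open-source project that uses large language models (LLMs) to decompile binary code. It aims to translate binary code into high-level source code. By optimizing the LLM training process and introducing new training methods and model architectures, it substantially improves the readability and executability of the decompiled code. The project includes several models ranging from 1.3B to 33B parameters, which are reported to significantly outperform GPT-4o and Ghidra on the HumanEval and ExeBench benchmarks.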

Quick Start

Environment Setup

First, clone the project repository and set up the development environment:

git clone https://github.com/albertan017/LLM4Decompile.git
cd LLM4Decompile
conda create -n llm4decompile python=3.9 -y
conda activate llm4decompile
pip install -r requirements.txt
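
The usage example in this tutorial shells out to gcc, so it can help to confirm that the toolchain is on PATH before running it. The following is a minimal sketch only: the helper name `missing_tools` is hypothetical, and including `objdump` in the default list is an assumption (a disassembler is not named in the setup steps above).

```python
import shutil


def missing_tools(tools=("gcc", "objdump")):
    """Return the names from `tools` that cannot be found on PATH.

    The default tool list is an assumption for illustration; adjust it
    to whatever your workflow actually invokes.
    """
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing tools:", ", ".join(missing))
    else:
        print("All required tools found.")
```

If the list is not empty, install the missing packages with your system's package manager before continuing.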

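Usage Example

The following simple example shows how to compile C code into a binary at each optimization level, as the first step toward disassembling it into assembly instructions: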

import subprocess
import os

OPT = ["O0", "O1", "O2", "O3"]
fileName = 'samples/sample'  # path to the source file, without the .c extension

for opt_state in OPT:
    output_file = fileName + '_' + opt_state
    input_file = fileName + '.c'
    compile_command = f'gcc -o {output_file} {input_file} -{opt_state} -lm'  # compile with GCC at this optimization level
    subprocess.run(compile_command, shell=True, check=True)  # check=True raises if compilation fails