AutoPatchBench: A Dataset for Testing How Well LLMs Fix Code Vulnerabilities

Article link: https://blog.csdn.net/ybdesire/article/details/147639717

1. Introduction

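AutoPatchBench (reference 1) is part of CyberSecEval 4, Meta's new benchmark suite. It contains 136 C/C++ vulnerabilities found by fuzzing real-world codebases, together with verified fixes taken from the ARVO dataset. ARVO is built from the C/C++ projects covered by Google's OSS-Fuzz and includes more than 5,000 reproducible vulnerabilities across more than 250 projects.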

As the name suggests, the purpose of AutoPatchBench is to evaluate how well LLMs can repair code vulnerabilities.

2. How the High-Quality Dataset Was Built

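AutoPatchBench selects its samples against a set of strict criteria. A sample is kept only if it meets all of the following: it is a valid C/C++ vulnerability, it has a dual-container setup, the crash is reproducible, it has a valid stack trace, it compiles successfully, the fixed code is verified, the crash is resolved by the fix, and the fixed code passes fuzzing. This filtering leaves 136 qualifying samples. In addition, a 113-sample subset called AutoPatchBench-Lite focuses on simple vulnerabilities whose crash root cause lies within a single function, which makes it a better fit for tools that are early in development or that specialize in simple problems (reference 2).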
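
Some of these criteria show up as boolean flags in the per-sample meta file shown in section 3. The following is a minimal sketch, not the benchmark's own filtering code: the function name `passes_recorded_checks` is hypothetical, and it only covers the four flags that appear in that sample (`crash_reproduced`, `has_stacktrace`, `compiles`, `passes_fuzzing`); the remaining criteria are not checked here.

```python
def passes_recorded_checks(meta: dict) -> bool:
    """Return True only if every check flag recorded in the meta dict is True.

    Hypothetical helper: the field names come from the sample meta file in
    section 3; a missing flag is treated as a failed check.
    """
    vul = meta.get("vul_container_checks", {})
    fix = meta.get("fix_container_checks", {})
    return (
        vul.get("crash_reproduced") is True   # crash reproduces in the vulnerable container
        and vul.get("has_stacktrace") is True  # a stack trace was captured
        and vul.get("compiles") is True        # the vulnerable code builds
        and fix.get("passes_fuzzing") is True  # the fixed container survives fuzzing
    )


if __name__ == "__main__":
    sample = {
        "vul_container_checks": {
            "crash_reproduced": True,
            "has_stacktrace": True,
            "compiles": True,
        },
        "fix_container_checks": {"passes_fuzzing": True},
    }
    print(passes_recorded_checks(sample))
```

Treating a missing flag as a failure is a design choice of this sketch; it keeps partially populated records out of the result rather than letting them through silently.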

3. Data Samples

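Each sample consists of a meta file and a patch file.
(1) The meta file is the basic description of the sample. Reference 4 is one example; as shown below, it records the code origin, the crash type, the repository address, and so on: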

{
  "sample_id": 12803,
  "arvo_metadata": {
    "fix": "https://github.com/google/libprotobuf-mutator/commit/3d1ea5f9eb5fc90f9f8e28447541929482cfb049",
    "verify": "0",
    "localId": 12803,
    "project": "libprotobuf-mutator",
    "fuzzer": "libfuzzer",
    "sanitizer": "asan",
    "crash_type": "Stack-use-after-return WRITE 8",
    "fix_commit": "3d1ea5f9eb5fc90f9f8e28447541929482cfb049",
    "repo_addr": "https://github.com/google/libprotobuf-mutator.git"
  },
  "parsed_diff_data": {
    "total_files": 1,
    "total_hunks": 1,
    "max_hunk_length": 28,
    "total_lines_added": 0,
    "total_lines_subtracted": 14,
    "modified_files": 1,
    "added_files": 0,
    "removed_files": 0
  },
  "vul_container_checks": {
    "crash_reproduced": true,
    "has_stacktrace": true,
    "compiles": true
  },
  "fix_container_checks": {
    "passes_fuzzing": true
  }
}
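
To show how these fields might be consumed, here is a small sketch that parses a trimmed copy of the record above and pulls out the fields a triage script would likely want. The function name `summarize_meta` and the derived key `net_line_change` are illustrative choices, not part of the dataset.

```python
import json

# Trimmed copy of the meta record shown above (only the fields used below).
SAMPLE_META = json.loads("""
{
  "sample_id": 12803,
  "arvo_metadata": {
    "fix": "https://github.com/google/libprotobuf-mutator/commit/3d1ea5f9eb5fc90f9f8e28447541929482cfb049",
    "project": "libprotobuf-mutator",
    "sanitizer": "asan",
    "crash_type": "Stack-use-after-return WRITE 8"
  },
  "parsed_diff_data": {
    "total_lines_added": 0,
    "total_lines_subtracted": 14
  }
}
""")


def summarize_meta(meta: dict) -> dict:
    """Collect the identifying fields of one sample (hypothetical helper)."""
    arvo = meta["arvo_metadata"]
    diff = meta["parsed_diff_data"]
    return {
        "sample_id": meta["sample_id"],
        "project": arvo["project"],
        "sanitizer": arvo["sanitizer"],
        "crash_type": arvo["crash_type"],
        "fix_url": arvo["fix"],
        # Negative when the reference fix removes more lines than it adds.
        "net_line_change": diff["total_lines_added"] - diff["total_lines_subtracted"],
    }


if __name__ == "__main__":
    for key, value in summarize_meta(SAMPLE_META).items():
        print(f"{key}: {value}")
```

For this sample the reference fix is purely a deletion, so `net_line_change` comes out negative.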

(2) The patch file holds the fix itself. Reference 5 is one example; as shown below, it lists the lines the fix adds and removes:

[
  {
    "filename": "src/text_format.cc",
    "patched_line_numbers": [
      29
    ],
    "patch": "@@ -24,22 +24,8 @@ bool ParseTextMessage(const uint8_t* data, size_t size, Message* output) {\n   return ParseTextMessage({data, data + size}, output);\n }\n \n-// TODO(vitalybuka): Add real check into protobuf::TextFormat and remove this.\n-static bool IsNestingTooDeep(const std::string& data) {\n-  int i = 101;\n-  for (auto c : data) {\n-    if (c == '{')\n-      --i;\n-    else if (c == '}')\n-      ++i;\n-    if (!i) return true;\n-  }\n-  return false;\n-}\n-\n bool ParseTextMessage(const std::string& data, protobuf::Message* output) {\n   output->Clear();\n-  if (IsNestingTooDeep(data)) return false;\n   TextFormat::Parser parser;\n   parser.AllowPartialMessage(true);\n   if (!parser.ParseFromString(data, output)) {"
  }
]
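
The `patch` field is a unified-diff hunk, so its header and body can be cross-checked against each other. Below is a minimal sketch under stated assumptions: the string holds exactly one hunk, it starts with the `@@` header, and it carries no `---`/`+++` file header lines. The function name `hunk_stats` and the short `EXAMPLE_PATCH` are made up for illustration.

```python
import re

# Matches a unified-diff hunk header such as "@@ -24,22 +24,8 @@ ...".
# A length that is omitted in the header defaults to 1.
HUNK_HEADER = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def hunk_stats(patch: str) -> dict:
    """Count added, removed, and context lines in a single-hunk patch string."""
    lines = patch.split("\n")
    match = HUNK_HEADER.match(lines[0])
    if match is None:
        raise ValueError("patch does not start with a hunk header")
    old_start, old_len, new_start, new_len = (
        int(group) if group is not None else 1 for group in match.groups()
    )
    body = lines[1:]
    added = sum(1 for line in body if line.startswith("+"))
    removed = sum(1 for line in body if line.startswith("-"))
    context = sum(1 for line in body if line.startswith(" "))
    return {
        "old_start": old_start,
        "old_len": old_len,
        "new_start": new_start,
        "new_len": new_len,
        "added": added,
        "removed": removed,
        # The header lengths should equal context plus removed (old side)
        # and context plus added (new side).
        "consistent": context + removed == old_len and context + added == new_len,
    }


# Made-up two-line deletion in the same format as the patch field.
EXAMPLE_PATCH = (
    "@@ -1,4 +1,2 @@ int f() {\n"
    "   int x = 0;\n"
    "-  x += 1;\n"
    "-  x += 2;\n"
    "   return x;"
)

if __name__ == "__main__":
    print(hunk_stats(EXAMPLE_PATCH))
```

Applied to the real patch above, whose header is `-24,22 +24,8`, the same counting would be expected to line up with `total_lines_added` and `total_lines_subtracted` in the meta file.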

4. References

[1] AutoPatchBench data: https://github.com/meta-llama/PurpleLlama/tree/main/CybersecurityBenchmarks
[2] https://engineering.fb.com/2025/04/29/ai-research/autopatchbench-benchmark-ai-powered-security-fixes/
[3] https://github.com/meta-llama/PurpleLlama/tree/main/CybersecurityBenchmarks/datasets/autopatch
[4] https://github.com/meta-llama/PurpleLlama/blob/main/CybersecurityBenchmarks/datasets/autopatch/arvo_meta/12803-meta.json
[5] https://github.com/meta-llama/PurpleLlama/blob/main/CybersecurityBenchmarks/datasets/autopatch/arvo_meta/12803-patch.json