zerox - Convert PDF to Markdown using vision models

Latest recommended article published on 2025-04-08 09:24:32

小众AI

Category: AI Open Source; Tags: pdf, artificial intelligence, AI programming

Article link: https://blog.csdn.net/puterkey/article/details/145116618

7900 Stars, 478 Forks, 39 Issues, 17 contributors, MIT License, Python

Code: https://github.com/getomni-ai/zerox

Homepage: OmniAI. Automate document workflows

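zerox builds on vision-model API services to convert PDF documents into Markdown. It first converts the source file (for example pdf or docx) into images, then sends each image to a vision model, and finally combines all the results into one complete Markdown file.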

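Main features

A dead simple way to OCR a document for AI ingestion. Documents are meant to be a visual representation after all, with odd layouts, tables, charts, and so on. Vision models are a natural fit!

Pass in a file (pdf, docx, image, etc.)
Convert that file into a series of images
Pass each image to GPT and ask nicely for Markdown
Aggregate the responses and return Markdown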
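
The four steps above can be sketched as a small pipeline. This is a minimal illustration, not zerox's actual code: `renderPages` and `imageToMarkdown` are hypothetical stand-ins for the file-to-image step and the vision-model call, and the two fakes at the bottom exist only so the sketch runs without external tools or an API key.

```typescript
// Hypothetical types for illustration only; they are not part of zerox's API.
type PageImage = { page: number; data: Uint8Array };
type RenderPages = (filePath: string) => Promise<PageImage[]>;
type ImageToMarkdown = (image: PageImage) => Promise<string>;

// Render the file to images, transcribe each image, then join the results.
async function documentToMarkdown(
  filePath: string,
  renderPages: RenderPages,
  imageToMarkdown: ImageToMarkdown,
): Promise<string> {
  const images = await renderPages(filePath); // file -> series of images
  const pages: string[] = [];
  for (const image of images) {
    pages.push(await imageToMarkdown(image)); // image -> Markdown
  }
  return pages.join("\n\n"); // aggregate in page order
}

// Stand-ins: a two-page "document" and a "model" that echoes the page number.
const fakeRender: RenderPages = async () => [
  { page: 1, data: new Uint8Array(0) },
  { page: 2, data: new Uint8Array(0) },
];
const fakeVision: ImageToMarkdown = async (image) => `# Page ${image.page}`;
```

Passing the two steps in as functions keeps the sketch independent of any particular image converter or model provider.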

Node Zerox: installation and usage

npm install zerox

Zerox uses graphicsmagick and ghostscript for the pdf => image processing step. These should be pulled in automatically, but you may need to install them manually.

On Linux, use:

sudo apt-get update
sudo apt-get install -y graphicsmagick

Node usage

**With a file URL**

import { zerox } from "zerox";

const result = await zerox({
  filePath: "https://omni-demo-data.s3.amazonaws.com/test/cs101.pdf",
  openaiAPIKey: process.env.OPENAI_API_KEY,
});

**From a local path**

import path from "path";
import { zerox } from "zerox";

const result = await zerox({
  filePath: path.resolve(__dirname, "./cs101.pdf"),
  openaiAPIKey: process.env.OPENAI_API_KEY,
});
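
Both calls resolve to a result object. Assuming that object exposes a `pages` array whose entries carry a page number and the Markdown `content` (this shape is not shown in the article, so check it against the version you install), the pages could be merged into a single Markdown string roughly like this. `PageLike`, `ResultLike`, and `pagesToMarkdown` are hypothetical names.

```typescript
// Assumed result shape, for illustration only; verify against your zerox version.
type PageLike = { page: number; content: string };
type ResultLike = { pages: PageLike[] };

// Sort pages by number, drop empty ones, and join them with blank lines.
function pagesToMarkdown(result: ResultLike): string {
  return result.pages
    .slice() // copy so the caller's array is not reordered
    .sort((a, b) => a.page - b.page)
    .map((p) => p.content.trim())
    .filter((text) => text.length > 0)
    .join("\n\n");
}
```

Sorting explicitly is a defensive choice: if pages are processed concurrently, it is safer not to rely on the order in which they come back.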

Options

const result = await zerox({
  // Required
  filePath: "path/to/file",
  openaiAPIKey: process.env.OPENAI_API_KEY,

  // Optional
  cleanup: true, // Clear images from tmp after run.
  concurrency: 10, // Number of pages to run at a time.
  correctOrientation: true, // True by default, attempts to identify and correct page orientation.
  errorMode: ErrorMode.IGNORE, // ErrorMode.THROW or ErrorMode.IGNORE, defaults to ErrorMode.IGNORE.
  maintainFormat: false, // Slower but helps maintain consistent formatting.
  maxRetries: 1, // Number of retries to attempt on a failed page, defaults to 1.
  maxTesseractWorkers: -1, // Maximum number of tesseract workers. Zerox will start with a lower number and only reach maxTesseractWorkers if needed.
  model: "gpt-4o-mini", // Model to use (gpt-4o-mini or gpt-4o).
  onPostProcess: async ({ page, progressSummary }) => Promise<void>, // Callback function to r
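
Options such as `concurrency`, `maxRetries`, and `errorMode` describe a common pattern: work on a bounded number of pages at once, retry a failed page, then either throw or record the failure and move on. The sketch below shows that pattern in isolation. It is an assumption about how such options are typically implemented, not zerox's source; `processPages`, `PageResult`, and `throwOnError` are hypothetical names.

```typescript
// Hypothetical helper illustrating bounded concurrency with retries.
type PageResult = { page: number; content: string | null; error?: string };

async function processPages(
  pageCount: number,
  worker: (page: number) => Promise<string>,
  opts: { concurrency: number; maxRetries: number; throwOnError: boolean },
): Promise<PageResult[]> {
  const results: PageResult[] = new Array(pageCount);
  let next = 0; // index of the next unclaimed page

  async function runOne(page: number): Promise<PageResult> {
    let lastError = "";
    // One initial attempt plus up to maxRetries retries.
    for (let attempt = 0; attempt <= opts.maxRetries; attempt++) {
      try {
        return { page, content: await worker(page) };
      } catch (err) {
        lastError = err instanceof Error ? err.message : String(err);
      }
    }
    if (opts.throwOnError) {
      throw new Error(`page ${page} failed: ${lastError}`);
    }
    return { page, content: null, error: lastError }; // "ignore" behaviour
  }

  // Each lane claims pages until none are left; claiming is synchronous,
  // so two lanes never take the same index.
  async function lane(): Promise<void> {
    while (next < pageCount) {
      const index = next++;
      results[index] = await runOne(index + 1);
    }
  }

  const laneCount = Math.max(1, Math.min(opts.concurrency, pageCount));
  const lanes: Promise<void>[] = [];
  for (let i = 0; i < laneCount; i++) {
    lanes.push(lane());
  }
  await Promise.all(lanes);
  return results;
}
```

With `throwOnError: false`, a page that keeps failing ends up with `content: null` and an `error` message while the remaining pages still complete.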
