PolyglotPDF Tutorial

舒莲菲Peace

Published 2025-04-01 10:36:13

Article link: https://blog.csdn.net/gitblog_00770/article/details/146903023

PolyglotPDF (PDF translation) is a multilingual PDF processing tool. It supports online and offline translation while maintaining the original layout, and performs OCR on scanned PDFs, which the project describes as faster than ocrmypdf. It provides a Web UI for comparing the translation against the original PDF, includes chat-with-PDF functionality, and offers academic PDF search based on the Semantic Scholar API. Project repository: https://gitcode.com/gh_mirrors/po/PolyglotPDF

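1. Project Introduction

PolyglotPDF is an advanced PDF processing tool. It uses specialized techniques to recognize text, tables, and formulas in PDF documents very quickly, typically finishing within 1 second. It also provides OCR and translation that preserves the original document formatting. Translating a full document typically takes less than 10 seconds, although the speed may vary with the translation API provider.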

2. Quick Start

Installation

There are several ways to use PolyglotPDF. One is to install the library:

pip install EbookTranslator

Basic usage

EbookTranslator your_file.pdf

Usage with parameters

EbookTranslator your_file.pdf -o en -t zh -b 1 -e 10 -c /path/to/config.json -d 300
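When the command is launched from a script, it can help to assemble the argument list in one place. The following is a minimal sketch, not part of PolyglotPDF: `build_command` is a hypothetical helper, and the meaning of each flag (`-o` source language, `-t` target language, `-b` and `-e` first and last page, `-c` config file, `-d` DPI) is inferred from the keyword arguments in this tutorial's Python example. Whether each flag can be omitted independently is also an assumption.

```python
import shlex


def build_command(pdf_path, original="en", target="zh",
                  begin=None, end=None, config=None, dpi=None):
    """Assemble an EbookTranslator argument list (hypothetical helper).

    Only the flags shown in this tutorial are used; their meanings are
    inferred from the tutorial's Python example, not from official docs.
    """
    cmd = ["EbookTranslator", pdf_path, "-o", original, "-t", target]
    if begin is not None:
        cmd += ["-b", str(begin)]   # assumed: first page to translate
    if end is not None:
        cmd += ["-e", str(end)]     # assumed: last page to translate
    if config is not None:
        cmd += ["-c", config]       # path to the JSON config file
    if dpi is not None:
        cmd += ["-d", str(dpi)]     # assumed: rendering resolution
    return cmd


cmd = build_command("your_file.pdf", "en", "zh", begin=1, end=10,
                    config="/path/to/config.json", dpi=300)
# Print the command as a shell-safe string for inspection.
print(shlex.join(cmd))
```

To actually run the translation, the list can be passed to `subprocess.run(cmd, check=True)`, provided the `EbookTranslator` command is installed and on the PATH.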

Using it from Python code

from EbookTranslator import main_function

translator = main_function(
    pdf_path="your_file.pdf",
    original_language="en",
    target_language="zh",
    bn=1,
    en=10,
    config_path="/path/to/config.json",
    DPI=300
)
translator.main()

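3. Use Cases and Best Practices

Case 1: translate an English PDF document into Chinese while preserving the original formatting.
Best practice: set the translation API key and model name in the configuration file in advance, so that the translator can be invoked quickly.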
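A quick pre-flight check can catch an empty key before a translation run starts. The sketch below is illustrative only: the key names `translation_services`, `auth_key`, and `model_name`, and the service name `ExampleService`, are hypothetical placeholders, so check the config.json shipped with the project for the real layout before adapting it.

```python
import json
import tempfile
from pathlib import Path


def missing_credentials(config_path, service):
    """Return required fields that are absent or empty for one service.

    The schema used here is a hypothetical placeholder, not the
    project's documented config format.
    """
    config = json.loads(Path(config_path).read_text(encoding="utf-8"))
    entry = config.get("translation_services", {}).get(service, {})
    return [field for field in ("auth_key", "model_name")
            if not entry.get(field)]


# Demonstration with a throwaway config file that has an empty key.
sample = {
    "translation_services": {
        "ExampleService": {"auth_key": "", "model_name": "example-model"}
    }
}
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "config.json"
    path.write_text(json.dumps(sample), encoding="utf-8")
    print(missing_credentials(path, "ExampleService"))
```

An empty list means both fields are filled in for the chosen service; anything else names the fields that still need a value.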

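4. Typical Ecosystem Projects

OCR: uses Tesseract for OCR to improve the accuracy of text recognition.
Translation services: integrates multiple translation APIs, including Doubao, Deepseek, Qwen, and OpenAI, giving a flexible choice of translation service.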

That covers the basic usage of PolyglotPDF. I hope you find it helpful.

Authoring statement: parts of this article were generated with AI assistance (AIGC) and are provided for reference only.