How To Split-Rename-Move a Batch of PDF Files Based on Contents of the Files

Article Update 13-March-2020: I removed the full source code and the code snippets. The article that remains should act as a "design roadmap" for members who want to write the code in the programming language of their choice. If you are interested in discussing the program further, please contact me via the EE message system.

INTRODUCTION

This Article is a follow-up to the Article entitled How To Rename-Move a Batch of PDF Files Based on Contents of the Files, recently published here at Experts Exchange.

I considered adding the new feature (splitting a single document into multiple documents) to that Article and program, but concluded that it is a significant enough enhancement to warrant a new Article and program.

PREVIOUS ARTICLE

To understand this Article, it will be helpful to read the previous Article, but to get things going here right away, here's a summary of the previous problem and solution.

There is a large batch of PDF files, all with cryptic names, such as [D123456.PDF]. Inside each file on the first line of the first page (always starting at a fixed column and running to the end of the line) is a human-friendly identifier for the file, such as [John Smith]. The requirement is to loop through all of the files in a specified folder in an automated fashion, changing the file names from, for example,

D123456.PDF

to

D123456 John Smith.PDF

That is, add the identifier from the first line of the first page to the file name.
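The author removed the original source code, so what follows is only a minimal Python sketch of the renaming rule. It assumes the first line of page 1 has already been pulled out as text by some PDF-to-text tool. The helper names, the sample line "Customer:   John Smith", and the column value 12 are all hypothetical. The sketch plans the new names without touching any files.

```python
import os
from typing import Dict


def renamed_filename(filename: str, first_line: str, start_col: int) -> str:
    """Append the identifier from the first line of page 1 to the file name.

    start_col is a HYPOTHETICAL 0-based column where the identifier begins;
    the identifier runs from that column to the end of the line.
    """
    stem, ext = os.path.splitext(filename)
    identifier = first_line[start_col:].strip()
    if not identifier:
        # Nothing at the expected column: leave the name unchanged.
        return filename
    return f"{stem} {identifier}{ext}"


def plan_renames(first_lines: Dict[str, str], start_col: int) -> Dict[str, str]:
    """Dry run over a batch: map each original name to its new name."""
    return {
        name: renamed_filename(name, line, start_col)
        for name, line in first_lines.items()
    }
```

For example, `renamed_filename("D123456.PDF", "Customer:   John Smith", 12)` gives `"D123456 John Smith.PDF"`. Each planned pair could then be applied with `os.rename(os.path.join(folder, old), os.path.join(folder, new))`. Keeping the planning step separate from the rename makes the naming logic easy to check before any files are moved.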

NEW REQUIREMENT

Following publication of the previous Article and the program that implements the solution, the Original Poster (OP) of the question that prompted the Article asked if an enhancement is possible. Specifically, a single PDF file may be composed of what are really multiple PDF files, and the OP wants the program to split the single PDF into multiple PDFs. For example, pages 1 to 3 of [D123456.PDF] may be an invoice for John Smith, while page 4 may be a different invoice, and pages 5 to 6 yet another invoice. With the previous program, the 6-page [D123456.PDF] would simply be renamed to [D123456 John Smith.PDF], still containing all six pages (three invoices). The OP wants the program to split the original PDF file and create three PDFs, one for each of the invoices. The program still has to rename the files based on content, but, in addition, has to provide a suffix for the multiple files, such as
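The splitting requirement can be sketched in the same way. The text does not state how the program recognizes where one invoice ends and the next begins, and the exact suffix format is cut off, so both are hypothetical here:

- A page is assumed to start a new document when the first line of its extracted text has an identifier at the fixed column.
- Multiple parts are assumed to be numbered "-1", "-2", and so on.

The sketch expects one text string per page, already extracted. It only plans the split. Writing the pages out is left to an external tool, with PDFtk's `cat` operation shown as one possible choice.

```python
import os
from typing import List, Tuple

# (first_page, last_page, identifier); pages are 1-based and inclusive.
DocRange = Tuple[int, int, str]


def document_ranges(page_texts: List[str], start_col: int) -> List[DocRange]:
    """Group consecutive pages into separate documents.

    HYPOTHETICAL rule: a page starts a new document when the first line of
    its text has a non-blank identifier starting at start_col; any other
    page continues the current document. Page 1 always starts a document.
    """
    ranges: List[DocRange] = []
    for page_no, text in enumerate(page_texts, start=1):
        lines = text.splitlines()
        first_line = lines[0] if lines else ""
        identifier = first_line[start_col:].strip()
        if identifier or not ranges:
            ranges.append((page_no, page_no, identifier))
        else:
            # Continuation page: extend the current document's last page.
            first, _, ident = ranges[-1]
            ranges[-1] = (first, page_no, ident)
    return ranges


def split_outputs(filename: str, ranges: List[DocRange]) -> List[Tuple[str, str]]:
    """Pair each page range (PDFtk-style, e.g. "1-3" or "4") with an output name.

    The "-1", "-2", ... suffix is a HYPOTHETICAL naming scheme, added only
    when the input really contains more than one document.
    """
    stem, ext = os.path.splitext(filename)
    outputs = []
    for i, (first, last, identifier) in enumerate(ranges, start=1):
        label = f" {identifier}" if identifier else ""
        suffix = f"-{i}" if len(ranges) > 1 else ""
        pages = f"{first}-{last}" if last > first else str(first)
        outputs.append((pages, f"{stem}{label}{suffix}{ext}"))
    return outputs


def pdftk_commands(filename: str, ranges: List[DocRange]) -> List[List[str]]:
    """Build (but do not run) one PDFtk 'cat' command per output document."""
    return [
        ["pdftk", filename, "cat", pages, "output", out_name]
        for pages, out_name in split_outputs(filename, ranges)
    ]
```

Consider a six-page file whose pages 1, 4, and 5 carry identifiers. This yields the ranges 1-3, 4, and 5-6, matching the three-invoice example above. Each command list could be passed to `subprocess.run(cmd, check=True)`. A file that holds a single document gets no suffix, so it is named just as in the previous program.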
