Python之利用PyPDF2库实现对PDF的删除和合并

概述

PyPDF2是Python中用于对PDF操作的第三方库,提供了删除、合并、裁剪、转换等操作
最主要有四个类:
The PdfFileReader Class
The PdfFileMerger Class
The PageObject Class
The PdfFileWriter Class

安装

打开命令行键入

pip install PyPDF2

一、The PdfFileReader Class

 PyPDF2.PdfFileReader(stream, strict=True, warndest=None, overwriteWarnings=True)

Parameters:
stream – A File object or an object that supports the standard read and seek methods similar to a File object. Could also be a string representing a path to a PDF file.

strict (bool) – Determines whether user should be warned of all problems and also causes some correctable problems to be fatal. Defaults to True.

warndest – Destination for logging warnings (defaults to sys.stderr).

overwriteWarnings (bool) – Determines whether to override Python’s warnings.py module with a custom implementation (defaults to True).

1、getNumPages()

Calculates the number of pages in this PDF file.

Returns: number of pages
Return type: int
Raises PdfReadError:
if file is encrypted and restrictions prevent this action.

2、getPage(pageNumber)

Retrieves a page by number from this PDF file.

Parameters: pageNumber (int)
– The page number to retrieve (pages begin at zero)
Returns: a PageObject instance.
Return type: PageObject

二、The PdfFileWriter Class

class PyPDF2.PdfFileWriter
This class supports writing PDF files out, given pages produced by another class (typically PdfFileReader).

1、addPage(page)

Adds a page to this PDF file. The page is usually acquired from a PdfFileReader instance.

Parameters: page (PageObject) – The page to add to the document. Should be an instance of PageObject

2、write(stream)

Writes the collection of pages added to this object out as a PDF file.

Parameters: stream – An object to write the file to. The object must support the write method and the tell method, similar to a file object.

三、The PdfFileMerger Class

Initializes a PdfFileMerger object. PdfFileMerger merges multiple PDFs into a single PDF. It can concatenate, slice, insert, or any combination of the above.
初始化一个PdfFileMerger对象,PdfFileMerger 用来将多个PDF合并为一个PDF,它能够连接,切割,插入或者以上的任意组合

See the functions merge() (or append()) and write() for usage information.

Parameters: strict (bool) – Determines whether user should be warned of all problems and also causes some correctable problems to be fatal. Defaults to True.

方法1、append()

append(fileobj, bookmark=None, pages=None, import_bookmarks=True)

Identical to the merge() method, but assumes you want to concatenate all pages onto the end of the file instead of specifying a position.
和merge()方法相同,但假定的是你想要把全部页面连接到文件的最后而不是指定位置

Parameters:
fileobj – A File Object or an object that supports the standard read and seek methods similar to a File Object. Could also be a string representing a path to a PDF file.
一个文件对象(python中用open()创建的对象)或者类似文件对象的能够支持标准读取和寻找方法的对象,也可以是一个代表指向PDF文件路径的字符串
bookmark (str) – Optionally, you may specify a bookmark to be applied at the beginning of the included file by supplying the text of the bookmark.
pages – can be a Page Range or a (start, stop[, step]) tuple to merge only the specified range of pages from the source document into the output document.
可以是一个页码序列或者一个(start, stop[, step])元组,用来合并指定范围的源文件页面到输出文件

import_bookmarks (bool) – You may prevent the source document’s bookmarks from being imported by specifying this as False.

在这里插入代码片

方法2、merge()

merge(position, fileobj, bookmark=None, pages=None, import_bookmarks=True)

Merges the pages from the given file into the output file at the specified page number.
从指定位置合并来自给定文件的页面到输出文件

Parameters:
position (int) – The page number to insert this file. File will be inserted after the given number.
插入文件的页码数,将插入到给定页数的后面
0口1 口2 口3
fileobj – A File Object or an object that supports the standard read and seek methods similar to a File Object. Could also be a string representing a path to a PDF file.
bookmark (str) – Optionally, you may specify a bookmark to be applied at the beginning of the included file by supplying the text of the bookmark.
pages – can be a Page Range or a (start, stop[, step]) tuple to merge only the specified range of pages from the source document into the output document.

import_bookmarks (bool) – You may prevent the source document’s bookmarks from being imported by specifying this as False.

注意:position和pages均指的是下图绿色数字,pages的范围是绿色数字之间囊括的页面
在这里插入图片描述

方法3、write()

write(fileobj)
Writes all data that has been merged to the given output file.
将所有被合并的数据写入到给定的输出文件中

Parameters: fileobj – Output file. Can be a filename or any kind of file-like object.
输出文件,可以是一个文件名或者所有类似文件对象的对象

实例一:删除

#PDF_delete.py
from PyPDF2 import PdfFileWriter, PdfFileReader

def PDF_delete(index):
    output = PdfFileWriter()  # 声明一个用于输出PDF的实例
    input1 = PdfFileReader(open("C:/Users/Yuanzheng/Desktop/Test1.pdf", "rb"))  # 读取本地PDF文件
    pages = input1.getNumPages()  # 读取文档的页数
    for i in range(pages):
        if i + 1 in index:
            continue  # 待删除的页面
        output.addPage(input1.getPage(i))  # 读取PDF的第i页,添加到输出Output实例中
    outputStream = open("C:/Users/Yuanzheng/Desktop/Test-Output1.pdf", "wb")
    output.write(outputStream)  # 把编辑后的文档保存到本地
PDF_delete([2])

实例二:合并

#PDF_merger.py
from PyPDF2 import PdfFileMerger

merger = PdfFileMerger()

input1 =open("C:/Users/Yuanzheng/Desktop/Test1.pdf","rb")
input2  = open("C:/Users/Yuanzheng/Desktop/Test2.pdf","rb")

merger.append(fileobj= input1)
merger.merge(position=0,fileobj=input2,pages=(1,3))

output = open("C:/Users/Yuanzheng/Desktop/PyPDF-Output2.pdf","wb")
merger.write(output)

Reference:https://pythonhosted.org/PyPDF2/PageObject.html

  • 4
    点赞
  • 13
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
可以使用 PyPDF2 实现 PDF 的分割和合并。 首先需要安装 PyPDF2,可以使用以下命令进行安装: ``` pip install PyPDF2 ``` 接下来就可以开始实现分割和合并功能了。 ## PDF 分割 PDF 分割可以将一个 PDF 文件拆分成多个单独的文件,可以按照页面数或者指定的页面进行拆分。 以下是一个简单的分割示例,将一个 PDF 文件按照每两页分割成一个文件: ```python import os from PyPDF2 import PdfFileReader, PdfFileWriter def split_pdf(input_path): with open(input_path, 'rb') as input_pdf: input_pdf_reader = PdfFileReader(input_pdf) output_pdf_writer = PdfFileWriter() for i in range(input_pdf_reader.getNumPages()): if i % 2 == 0: output_pdf_writer.addPage(input_pdf_reader.getPage(i)) with open(f'{os.path.splitext(input_path)[0]}_{i//2}.pdf', 'wb') as output_pdf: output_pdf_writer.write(output_pdf) output_pdf_writer = PdfFileWriter() else: output_pdf_writer.addPage(input_pdf_reader.getPage(i)) if input_pdf_reader.getNumPages() % 2 != 0: with open(f'{os.path.splitext(input_path)[0]}_{input_pdf_reader.getNumPages()//2}.pdf', 'wb') as output_pdf: output_pdf_writer.write(output_pdf) ``` 以上代码将每两页作为一组,将其拆分成多个文件。拆分后的文件名为原文件名加上下划线和组数,例如 `input_0.pdf`、`input_1.pdf` 等。 ## PDF 合并 PDF 合并可以将多个 PDF 文件合并成一个文件。 以下是一个简单的合并示例,将多个 PDF 文件合并成一个文件: ```python from PyPDF2 import PdfFileMerger def merge_pdf(input_paths, output_path): pdf_merger = PdfFileMerger() for input_path in input_paths: with open(input_path, 'rb') as input_pdf: pdf_merger.append(input_pdf) with open(output_path, 'wb') as output_pdf: pdf_merger.write(output_pdf) ``` 以上代码将所有输入的 PDF 文件按照顺序合并成一个文件,并保存到指定的输出文件中。 以上就是使用 PyPDF2 实现 PDF 分割和合并的简单示例。

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值