PyPDF2

g863402758

Installation

Install it directly with pip:
pip install PyPDF2

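PyPDF2 provides four commonly used main classes: PdfFileReader, PdfFileMerger, PageObject, and PdfFileWriter. These are the class names from PyPDF2 releases before 3.0, which the code in this article assumes; version 3.0 renamed them to PdfReader, PdfMerger, and PdfWriter.

Basic PDF Reading and Writing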

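from PyPDF2 import PdfFileReader, PdfFileWriter

infn = 'infn.pdf'
outfn = 'outfn.pdf'

# Keep the input file open until the output has been written
with open(infn, 'rb') as fin:
    # Create a PdfFileReader object
    pdf_input = PdfFileReader(fin)
    # Get the number of pages in the PDF
    page_count = pdf_input.getNumPages()
    print(page_count)
    # Get a PageObject (page indices start at 0, so this is the first page)
    page = pdf_input.getPage(0)

    # Create a PdfFileWriter object
    pdf_output = PdfFileWriter()
    # Add the PageObject to the PdfFileWriter
    pdf_output.addPage(page)
    # Write the result to a file
    with open(outfn, 'wb') as fout:
        pdf_output.write(fout)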

Example: Merging and Splitting PDFs

from PyPDF2 import PdfFileReader, PdfFileWriter

def split_pdf(infn, outfn):
    pdf_output = PdfFileWriter()
    # Keep the input file open until the output has been written
    with open(infn, 'rb') as fin:
        pdf_input = PdfFileReader(fin)
        # Get the total number of pages in the PDF
        page_count = pdf_input.getNumPages()
        print(page_count)
        # Write every page after the fifth one (index 5 onward) to a new file
        for i in range(5, page_count):
            pdf_output.addPage(pdf_input.getPage(i))
        with open(outfn, 'wb') as fout:
            pdf_output.write(fout)

def merge_pdf(infnList, outfn):
    pdf_output = PdfFileWriter()
    open_files = []  # input files must stay open until the output is written
    try:
        for infn in infnList:
            fin = open(infn, 'rb')
            open_files.append(fin)
            pdf_input = PdfFileReader(fin)
            # Get the total number of pages in this PDF
            page_count = pdf_input.getNumPages()
            print(page_count)
            for i in range(page_count):
                pdf_output.addPage(pdf_input.getPage(i))
        with open(outfn, 'wb') as fout:
            pdf_output.write(fout)
    finally:
        for fin in open_files:
            fin.close()

if __name__ == '__main__':
    infn = 'infn.pdf'
    outfn = 'outfn.pdf'
    split_pdf(infn, outfn)

The source code for this example is available at https://github.com/xchaoinfo/Py-example-by-xchaoinfo.
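
The split_pdf function hardcodes the page at which the split starts. A minimal sketch of a more general variant follows, assuming the same pre-3.0 PyPDF2 API used in this article. The names page_indices and split_pdf_range are hypothetical helpers introduced here for illustration, not part of PyPDF2. PyPDF2 is imported inside the function so that the index helper can be tried on its own.

```python
def page_indices(page_count, start=0, stop=None):
    """Return the zero-based page indices selected by the slice [start:stop].

    Slicing a range clamps out-of-range bounds, so asking for pages past
    the end of the document yields fewer indices instead of failing.
    """
    return list(range(page_count)[start:stop])


def split_pdf_range(infn, outfn, start=0, stop=None):
    """Copy pages [start:stop] of infn into outfn (hypothetical helper)."""
    # Assumption: PyPDF2 < 3.0, which still provides these class names.
    from PyPDF2 import PdfFileReader, PdfFileWriter

    with open(infn, 'rb') as fin:
        pdf_input = PdfFileReader(fin)
        pdf_output = PdfFileWriter()
        for i in page_indices(pdf_input.getNumPages(), start, stop):
            pdf_output.addPage(pdf_input.getPage(i))
        # Write while the input file is still open.
        with open(outfn, 'wb') as fout:
            pdf_output.write(fout)


print(page_indices(8, 5))     # same selection as split_pdf on an 8-page file
print(page_indices(8, 2, 4))  # only the third and fourth pages
```

Calling split_pdf_range('infn.pdf', 'outfn.pdf', 5) should then behave like split_pdf, while other start and stop values select different parts of the document.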

Reference: PyPDF2 Documentation

Reposted from: https://zhuanlan.zhihu.com/p/26647491

Easy Concatenation with `pdfcat`

PyPDF2 contains a growing variety of sample programs meant to demonstrate its features. It also contains useful scripts such as pdfcat, located within the Scripts folder. This script makes it easy to concatenate PDF files by using Python slicing syntax. Because we are slicing PDF pages, we refer to the slices as page ranges.

Page range expression examples:

`:`	all pages	`-1`	last page
`22`	just the 23rd page	`:-1`	all but the last page
`0:3`	the first three pages	`-2`	second-to-last page
`:3`	the first three pages	`-2:`	last two pages
`5:`	from the sixth page onward	`-3:-1`	third & second to last

The third stride or step number is also recognized:

`::2`	0 2 4 ... to the end
`1:10:2`	1 3 5 7 9
`::-1`	all pages in reverse order
`3:0:-1`	3 2 1 but not 0
`2::-1`	2 1 0
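
Because page ranges follow Python slicing syntax, the tables above can be checked against ordinary Python slices. The sketch below is not pdfcat's actual parser: parse_page_range is a hypothetical helper that turns a range expression into a slice and applies it to a list of page indices, treating a bare number as a single page.

```python
def parse_page_range(expr):
    """Convert a page range expression such as '0:3' or '-1' into a slice."""
    parts = expr.split(':')
    if len(parts) > 3:
        raise ValueError('too many colons in page range: %r' % expr)
    if len(parts) == 1:
        # A bare number selects exactly one page.
        index = int(parts[0])
        # For -1 the stop must be None, because slice(-1, 0) would be empty.
        return slice(index, index + 1 if index != -1 else None)
    # Empty fields mean "use the default", exactly as in Python slicing.
    return slice(*[int(p) if p else None for p in parts])


pages = list(range(30))  # stand-in for a 30-page document
for expr in [':3', '22', '-2', '-3:-1', '1:10:2', '3:0:-1', '2::-1']:
    print(expr, pages[parse_page_range(expr)])
```

Each printed list holds zero-based page indices, so `22` prints index 22, which is the 23rd page.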

Usage for pdfcat is as follows:

 
  >>> pdfcat [-h] [-o output.pdf] [-v] input.pdf [page_range...] ...

You can add as many input files as you like. You may also specify as many page ranges as needed for each file.

Optional arguments:

`-h, --help`	Show the help message and exit
`-o, --output`	Follow this argument with the output PDF file. Will be created if it doesn’t exist.
`-v, --verbose`	Show page ranges as they are being read

Examples:

 
  >>> pdfcat -o output.pdf head.pdf content.pdf :6 7: tail.pdf -1

Concatenates all of head.pdf, all but page seven of content.pdf, and the last page of tail.pdf, producing output.pdf.
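
To see why `:6 7:` skips exactly one page, the same selection can be reproduced with plain Python slices. This is only an illustration on lists that stand in for pages, assuming made-up page counts of 2, 10, and 4 for the three files. It does not call pdfcat.

```python
# Stand-ins for the pages of each input file (page counts are made up).
head = ['head[%d]' % i for i in range(2)]
content = ['content[%d]' % i for i in range(10)]
tail = ['tail[%d]' % i for i in range(4)]

# head.pdf has no page range, so all of its pages are used.
# content.pdf gets two ranges, ':6' and '7:', which together omit index 6.
# tail.pdf gets '-1', its last page.
output = head + content[:6] + content[7:] + tail[-1:]

print(len(output))             # 2 + 6 + 3 + 1 pages
print('content[6]' in output)  # False: the seventh page is skipped
```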

 
  >>> pdfcat chapter*.pdf >book.pdf

You can specify the output file by redirection.

  >>> pdfcat chapter?.pdf chapter10.pdf >book.pdf

In case you don’t want chapter 10 before chapter 2.
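
The reason for listing chapter10.pdf separately is that shell wildcards typically expand in lexicographic order, where chapter10.pdf sorts before chapter2.pdf. When the file list is built in Python instead, a natural sort avoids the problem. A minimal sketch follows; natural_key is a hypothetical helper and the file names are examples.

```python
import re


def natural_key(name):
    """Split a name into text and number chunks so that 10 sorts after 2."""
    return [int(chunk) if chunk.isdigit() else chunk
            for chunk in re.split(r'(\d+)', name)]


names = ['chapter10.pdf', 'chapter2.pdf', 'chapter1.pdf']
print(sorted(names))                   # lexicographic: chapter10 before chapter2
print(sorted(names, key=natural_key))  # chapter order: 1, 2, 10
```

The sorted list could then be passed to a merge function such as merge_pdf from the earlier example.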

Another conversion tool: pdftk

Usage:

pdftk 1.pdf 2.pdf 3.pdf cat output 123.pdf
