PyPDF2

Installation

Install it directly with pip:
pip install PyPDF2

PyPDF2 provides four commonly used main classes: PdfFileReader, PdfFileMerger, PageObject, and PdfFileWriter.

Reading and writing a PDF

from PyPDF2 import PdfFileReader, PdfFileWriter
infn = 'infn.pdf'
outfn = 'outfn.pdf'
# Create a PdfFileReader object
pdf_input = PdfFileReader(open(infn, 'rb'))
# Get the number of pages in the PDF
page_count = pdf_input.getNumPages()
print(page_count)
# Return a PageObject; page indices start at 0, so this is the first page
page = pdf_input.getPage(0)

# Create a PdfFileWriter object
pdf_output = PdfFileWriter()
# Add the PageObject to the PdfFileWriter
pdf_output.addPage(page)
# Write the result to a file, closing the output file afterwards
with open(outfn, 'wb') as f:
    pdf_output.write(f)

Example: merging and splitting PDFs

from PyPDF2 import PdfFileReader, PdfFileWriter

def split_pdf(infn, outfn):
    pdf_output = PdfFileWriter()
    pdf_input = PdfFileReader(open(infn, 'rb'))
    # Get the total number of pages in the PDF
    page_count = pdf_input.getNumPages()
    print(page_count)
    # Write every page after the fifth (index 5 onward) to a new file
    for i in range(5, page_count):
        pdf_output.addPage(pdf_input.getPage(i))
    pdf_output.write(open(outfn, 'wb'))

def merge_pdf(infnList, outfn):
    pdf_output = PdfFileWriter()
    for infn in infnList:
        pdf_input = PdfFileReader(open(infn, 'rb'))
        # Get the total number of pages in the PDF
        page_count = pdf_input.getNumPages()
        print(page_count)
        # Append every page of this input file to the output
        for i in range(page_count):
            pdf_output.addPage(pdf_input.getPage(i))
    pdf_output.write(open(outfn, 'wb'))

if __name__ == '__main__':
    infn = 'infn.pdf'
    outfn = 'outfn.pdf'
    split_pdf(infn, outfn)

The source code for this example can be found at github.com/xchaoinfo/Py.

Reference: PyPDF2 Documentation

Reposted from: https://zhuanlan.zhihu.com/p/26647491

Easy Concatenation with pdfcat

PyPDF2 contains a growing variety of sample programs meant to demonstrate its features. It also contains useful scripts such as pdfcat, located within the Scripts folder. This script makes it easy to concatenate PDF files by using Python slicing syntax. Because we are slicing PDF pages, we refer to the slices as page ranges.

Page range expression examples:

:       all pages                     -1      last page
22      just the 23rd page            :-1     all but the last page
0:3     the first three pages         -2      second-to-last page
:3      the first three pages         -2:     last two pages
5:      from the sixth page onward    -3:-1   third & second to last

The third stride or step number is also recognized:

::2       0 2 4 ... to the end
1:10:2    1 3 5 7 9
::-1      all pages in reverse order
3:0:-1    3 2 1 but not 0
2::-1     2 1 0
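
Because these expressions follow Python slicing rules, they can be tried out on a plain list of page indices. The sketch below is only an illustration: parse_page_range and select_pages are hypothetical helper names, not part of PyPDF2 or pdfcat, and the parsing is a simplification that assumes well-formed input.

```python
def parse_page_range(expr):
    """Turn a page range string such as "0:3" or "-1" into a Python slice.

    Hypothetical helper for illustration; not PyPDF2's own parser.
    """
    if ":" not in expr:
        # A bare index such as "22" or "-2" selects exactly one page.
        index = int(expr)
        # For -1 the slice must run to the end, because slice(-1, 0) is empty.
        stop = None if index == -1 else index + 1
        return slice(index, stop)
    # Empty fields (as in ":3" or "::2") mean "use the default", i.e. None.
    parts = [int(field) if field else None for field in expr.split(":")]
    return slice(*parts)


def select_pages(page_count, expr):
    """Return the zero-based page indices that a range expression selects."""
    return list(range(page_count))[parse_page_range(expr)]


if __name__ == "__main__":
    # Print the selection for a 12-page document.
    for expr in [":", "0:3", "-2", "-2:", "::2", "1:10:2", "3:0:-1", "2::-1"]:
        print(expr, select_pages(12, expr))
```

The indices returned are zero-based, which is why 22 corresponds to the 23rd page in the table above.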

Usage for pdfcat is as follows:

>>> pdfcat [-h] [-o output.pdf] [-v] input.pdf [page_range...] ...

You can add as many input files as you like. You may also specify as many page ranges as needed for each file.

Optional arguments:
-h, --help       Show the help message and exit
-o, --output     Follow this argument with the output PDF file. Will be created if it doesn’t exist.
-v, --verbose    Show page ranges as they are being read

Examples:

>>> pdfcat -o output.pdf head.pdf content.pdf :6 7: tail.pdf -1

Concatenates all of head.pdf, all but page seven of content.pdf, and the last page of tail.pdf, producing output.pdf.

>>> pdfcat chapter*.pdf >book.pdf

You can specify the output file by redirection.

>>> pdfcat chapter?.pdf chapter10.pdf >book.pdf

In case you don’t want chapter 10 before chapter 2.
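
The ordering problem comes from how file names sort: a glob such as chapter*.pdf typically expands in lexicographic order, which places chapter10.pdf ahead of chapter2.pdf. The following sketch reproduces that ordering in Python and shows one possible way to sort numerically instead. The file names and the natural_key helper are assumptions made for illustration only.

```python
import re


def natural_key(name):
    """Sort key that compares digit runs as numbers (hypothetical helper)."""
    # "chapter10.pdf" -> ["chapter", 10, ".pdf"], so 10 sorts after 2.
    return [int(tok) if tok.isdigit() else tok for tok in re.split(r"(\d+)", name)]


# Example file names, assumed for illustration.
names = ["chapter2.pdf", "chapter10.pdf", "chapter1.pdf"]

# Plain string sorting compares character by character.
lexicographic = sorted(names)

# Sorting with the key above keeps the chapters in numeric order.
natural = sorted(names, key=natural_key)

if __name__ == "__main__":
    print(lexicographic)
    print(natural)
```

A list sorted this way could then be passed to pdfcat explicitly, instead of relying on the shell's expansion order.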



Another conversion tool: pdftk

Usage:

pdftk 1.pdf 2.pdf 3.pdf cat output 123.pdf

