Installation
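Install it directly with pip. The examples below use the PyPDF2 1.x API (PdfFileReader, getNumPages() and so on). PyPDF2 3.0 dropped these names in favor of PdfReader/PdfWriter, so to run the code as written, install a release older than 3.0:
pip install "PyPDF2<3.0"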
PyPDF2 provides four main, commonly used classes: PdfFileReader, PdfFileMerger, PageObject and PdfFileWriter.
Simple PDF reading and writing
from PyPDF2 import PdfFileReader, PdfFileWriter
infn = 'infn.pdf'
outfn = 'outfn.pdf'
# Create a PdfFileReader object
pdf_input = PdfFileReader(open(infn, 'rb'))
# Get the number of pages in the PDF
page_count = pdf_input.getNumPages()
print(page_count)
# Return a PageObject (indices start at 0, so this is the first page)
page = pdf_input.getPage(0)
# Create a PdfFileWriter object
pdf_output = PdfFileWriter()
# Add the PageObject to the PdfFileWriter
pdf_output.addPage(page)
# Write the result to a file (the with block closes it afterwards)
with open(outfn, 'wb') as f:
    pdf_output.write(f)
Example: splitting and merging PDFs
from PyPDF2 import PdfFileReader, PdfFileWriter

def split_pdf(infn, outfn):
    pdf_output = PdfFileWriter()
    pdf_input = PdfFileReader(open(infn, 'rb'))
    # Get the total number of pages in the PDF
    page_count = pdf_input.getNumPages()
    print(page_count)
    # Write every page after page 5 (index 5 onward) to a new file
    for i in range(5, page_count):
        pdf_output.addPage(pdf_input.getPage(i))
    with open(outfn, 'wb') as f:
        pdf_output.write(f)

def merge_pdf(infnList, outfn):
    pdf_output = PdfFileWriter()
    for infn in infnList:
        # The input files are left open on purpose: PyPDF2 reads page
        # content lazily, so it must still be readable at write() time
        pdf_input = PdfFileReader(open(infn, 'rb'))
        # Get the total number of pages in the PDF
        page_count = pdf_input.getNumPages()
        print(page_count)
        for i in range(page_count):
            pdf_output.addPage(pdf_input.getPage(i))
    with open(outfn, 'wb') as f:
        pdf_output.write(f)

if __name__ == '__main__':
    infn = 'infn.pdf'
    outfn = 'outfn.pdf'
    split_pdf(infn, outfn)
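split_pdf keeps a single, fixed range of pages. A common variation is to cut a document into consecutive chunks of N pages. The index arithmetic for that can be sketched in plain Python; chunk_page_ranges is a hypothetical helper for illustration, not part of PyPDF2:

```python
def chunk_page_ranges(page_count, chunk_size):
    """Split page indices 0..page_count-1 into consecutive chunks.

    Hypothetical helper (not a PyPDF2 API): each returned range holds the
    0-based indices that one output file would receive via getPage(i).
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    # The last chunk is shorter when page_count is not a multiple of chunk_size
    return [range(start, min(start + chunk_size, page_count))
            for start in range(0, page_count, chunk_size)]


# A 12-page document split into files of at most 5 pages each
for pages in chunk_page_ranges(12, 5):
    print(list(pages))
# [0, 1, 2, 3, 4]
# [5, 6, 7, 8, 9]
# [10, 11]
```

In split_pdf, each of these ranges could take the place of range(5, page_count), with a fresh PdfFileWriter and output file name per chunk.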
The source code for these examples is available at https://github.com/xchaoinfo/Py-example-by-xchaoinfo.
Reference: PyPDF2 Documentation
Reposted from: https://zhuanlan.zhihu.com/p/26647491
Easy Concatenation with pdfcat
PyPDF2 contains a growing variety of sample programs meant to demonstrate its features. It also contains useful scripts such as pdfcat, located in the Scripts folder. This script makes it easy to concatenate PDF files by using Python slicing syntax. Because we are slicing PDF pages, we refer to the slices as page ranges. As in Python, page indices start at 0.
Page range expression examples:
: | all pages |
22 | just the 23rd page |
0:3 | the first three pages |
:3 | the first three pages |
5: | from the sixth page onward |
-1 | last page |
:-1 | all but the last page |
-2 | second-to-last page |
-2: | last two pages |
-3:-1 | third & second to last |
A third number, the stride (step), is also recognized:
::2 | 0 2 4 ... to the end |
1:10:2 | 1 3 5 7 9 |
::-1 | all pages in reverse order |
3:0:-1 | 3 2 1 but not 0 |
2::-1 | 2 1 0 |
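Every row in the tables above follows Python's own slice rules, so the mapping from expression to pages can be sketched with a built-in slice object. This is a minimal illustration, assuming an expression is either a single integer or up to three colon-separated integers; parse_page_range and select_pages are hypothetical names, not pdfcat's or PyPDF2's actual parser:

```python
def parse_page_range(expr):
    """Turn a page range expression such as '22', '-2:' or '::2' into a slice.

    Sketch only: assumes the expression uses plain Python slice syntax.
    """
    parts = expr.split(":")
    if len(parts) == 1:
        # A single index selects exactly one page. slice(-1, 0) would be
        # empty, so the last page (-1) needs an open-ended stop instead.
        i = int(parts[0])
        return slice(i, i + 1 if i != -1 else None)
    if len(parts) > 3:
        raise ValueError("too many colons in page range: %r" % expr)
    # Empty fields mean "use the default", exactly as in Python slicing
    return slice(*[int(p) if p else None for p in parts])


def select_pages(expr, page_count):
    """Return the 0-based page indices that expr selects."""
    return list(range(page_count))[parse_page_range(expr)]


# A 10-page document
print(select_pages("-3:-1", 10))   # [7, 8]
print(select_pages("3:0:-1", 10))  # [3, 2, 1]
print(select_pages("1:10:2", 10))  # [1, 3, 5, 7, 9]
```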
Usage for pdfcat is as follows:
$ pdfcat [-h] [-o output.pdf] [-v] input.pdf [page_range...] ...
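You can add as many input files as you like. You may also specify as many page ranges as needed for each file. Each page range applies to the input file it follows; a file with no page range contributes all of its pages.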
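Going by the head.pdf/content.pdf/tail.pdf example in this section, the argument list reads as a sequence of files, each followed by zero or more page ranges. A minimal sketch of that grouping, assuming any argument that looks like a slice (digits, optional minus signs, up to two colons) is a page range and everything else is a file name; group_arguments and PAGE_RANGE_RE are hypothetical, not pdfcat's real code:

```python
import re

# Assumption: a page range is up to three optional signed integers joined
# by at most two colons, e.g. "22", ":", "-3:-1", "1:10:2"
PAGE_RANGE_RE = re.compile(r"(-?\d+)?(:(-?\d+)?){0,2}")


def group_arguments(args):
    """Pair each input file with the page ranges that follow it.

    A file with no page ranges gets ":" (all pages).
    """
    groups = []
    for arg in args:
        if arg and PAGE_RANGE_RE.fullmatch(arg):
            if not groups:
                raise ValueError("page range %r comes before any file" % arg)
            groups[-1][1].append(arg)
        else:
            groups.append((arg, []))
    return [(name, ranges or [":"]) for name, ranges in groups]


print(group_arguments(["head.pdf", "content.pdf", ":6", "7:", "tail.pdf", "-1"]))
# [('head.pdf', [':']), ('content.pdf', [':6', '7:']), ('tail.pdf', ['-1'])]
```

One consequence of this design is that a file literally named like a range (for example 12, with no extension) would be misread as a page range.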
Optional arguments:
-h, --help      Show the help message and exit.
-o, --output    Follow this argument with the output PDF file. It will be created if it doesn't exist.
-v, --verbose   Show page ranges as they are being read.
Examples:
$ pdfcat -o output.pdf head.pdf content.pdf :6 7: tail.pdf -1
Concatenates all of head.pdf, all but page seven of content.pdf (:6 keeps pages one to six and 7: keeps page eight onward, so only index 6 is dropped), and the last page of tail.pdf, producing output.pdf.
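$ pdfcat chapter*.pdf >book.pdf
You can also specify the output file by shell redirection instead of -o.
$ pdfcat chapter?.pdf chapter10.pdf >book.pdf
Use this form if you don't want chapter 10 to come before chapter 2: the shell expands chapter*.pdf in alphabetical order, which puts chapter10.pdf ahead of chapter2.pdf, whereas ? matches exactly one character, so chapter?.pdf picks up chapter1.pdf to chapter9.pdf but not chapter10.pdf.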
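The ordering problem comes from comparing names character by character: '1' sorts before '2', so chapter10.pdf lands ahead of chapter2.pdf. Python's plain sorted() shows the same effect (a shell's exact collation also depends on its locale). A "natural" sort that compares embedded numbers as integers fixes it; natural_key is a hypothetical helper for illustration, not part of PyPDF2 or pdfcat:

```python
import re


def natural_key(name):
    """Split 'chapter10.pdf' into ['chapter', 10, '.pdf'] so that numeric
    parts compare as numbers instead of character by character."""
    parts = re.split(r"(\d+)", name)
    # re.split with a capturing group puts the digit runs at odd indices
    return [int(p) if i % 2 else p for i, p in enumerate(parts)]


files = ["chapter1.pdf", "chapter10.pdf", "chapter2.pdf"]
print(sorted(files))                   # ['chapter1.pdf', 'chapter10.pdf', 'chapter2.pdf']
print(sorted(files, key=natural_key))  # ['chapter1.pdf', 'chapter2.pdf', 'chapter10.pdf']
```

If you build the file list in Python instead of the shell, something like sorted(glob.glob('chapter*.pdf'), key=natural_key) could be passed to the merge_pdf function above.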
Another tool for the job: pdftk
Usage:
pdftk 1.pdf 2.pdf 3.pdf cat output 123.pdf
This concatenates 1.pdf, 2.pdf and 3.pdf, in that order, into 123.pdf.