Python cut file to another folder, Python + PyPdf: crop a region of a page and paste it into another page...

Latest recommended article published on 2022-12-01 16:49:29

瑞恩的奇幻博物馆

Let's say you have a PDF page that contains several complex elements.

The objective is to crop a region of the page (to extract only one of the elements) and then paste it into another PDF page.

Here is a simplified version of my code:

import PyPDF2


def extract_tree(in_file, out_file):
    with open(in_file, 'rb') as infp:
        # Read the document that contains the tree (on its first page)
        reader = PyPDF2.PdfFileReader(infp)
        page = reader.getPage(0)

        # Crop the tree. The coordinates below are only illustrative
        page.cropBox.lowerLeft = [100, 200]
        page.cropBox.upperRight = [250, 300]

        # Create an empty document and add a single page containing only the cropped page
        writer = PyPDF2.PdfFileWriter()
        writer.addPage(page)

        # Write while the input file is still open
        with open(out_file, 'wb') as outfp:
            writer.write(outfp)


def insert_tree_into_page(tree_document, text_document):
    with open(text_document, 'rb') as text_fp, open(tree_document, 'rb') as tree_fp:
        # Load the first page of the document containing 'text text text text...'
        text_page = PyPDF2.PdfFileReader(text_fp).getPage(0)

        # Load the previously cropped tree (cropped using 'extract_tree')
        tree_page = PyPDF2.PdfFileReader(tree_fp).getPage(0)

        # Overlay the text page and the tree crop
        text_page.mergeScaledTranslatedPage(page2=tree_page, scale=1.0, tx=100, ty=200)

        # Save the result into a new empty document
        output = PyPDF2.PdfFileWriter()
        output.addPage(text_page)
        with open('merged_document.pdf', 'wb') as output_stream:
            output.write(output_stream)


# First, crop the tree and save it into cropped_document.pdf
extract_tree('document1.pdf', 'cropped_document.pdf')

# Now merge document2.pdf with cropped_document.pdf
insert_tree_into_page('cropped_document.pdf', 'document2.pdf')

The method "extract_tree" seems to work. It generates a PDF file that shows only the cropped region (in the example, the tree).

The problem is that when I try to paste the tree into the new page, the star and the house from the original page are pasted as well.
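A likely explanation, sketched here with a toy model rather than real PyPDF2 objects: setting the crop box only changes a rectangle stored on the page, while the drawing instructions for every element stay in the page's content stream. A merge that copies the content stream therefore brings the star and the house along. The dictionary layout and the helpers `make_page`, `crop` and `merge` are hypothetical and exist only for illustration.

```python
# Toy model of a PDF page: a crop rectangle plus a list of drawing operations.
# This is NOT PyPDF2's data model; every name here is illustrative only.
def make_page(contents, crop_box):
    return {"/CropBox": list(crop_box), "/Contents": list(contents)}


def crop(page, lower_left, upper_right):
    # Cropping rewrites the box only; the drawing operations are untouched.
    page["/CropBox"] = [*lower_left, *upper_right]


def merge(base, overlay):
    # A naive merge appends the overlay's operations and ignores its crop box.
    base["/Contents"] = base["/Contents"] + overlay["/Contents"]


tree_page = make_page(["draw star", "draw tree", "draw house"], [0, 0, 612, 792])
text_page = make_page(["draw text"], [0, 0, 612, 792])

crop(tree_page, (100, 200), (250, 300))
merge(text_page, tree_page)

print(tree_page["/CropBox"])   # the box changed
print(text_page["/Contents"])  # but all three drawings were merged
```

In this model the cropped file looks correct in a viewer because the viewer honours the crop box, yet nothing was actually removed from the page.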

Solution

I had the exact same issue. In the end, the solution for me was to make a small edit to the source code of PyPDF2 (taken from this pull request, which never made it into the master branch). You need to insert these lines into the method _mergePage of the class PageObject inside the file pdf.py:

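page2Content = ContentStream(page2Content, self.pdf)
# Prepend "x y width height re", "W" and "n" so that page2 is clipped to its trim box.
# list(...) keeps the operands a real list on Python 3, where map() returns an iterator.
page2Content.operations.insert(0, [list(map(FloatObject, [page2.trimBox.getLowerLeft_x(), page2.trimBox.getLowerLeft_y(), page2.trimBox.getWidth(), page2.trimBox.getHeight()])), "re"])
page2Content.operations.insert(1, [[], "W"])
page2Content.operations.insert(2, [[], "n"])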

(see the pull request for exactly where to put them). With that done, you can then crop the section of a pdf you want, and merge it with another page with no issues. There's no need to save the cropped section into a separate pdf, unless you want to.
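To make the effect of those three inserted operations more concrete, here is a small library-independent sketch that builds the same operator sequence as plain text. In PDF content streams, `re` appends a rectangle to the current path, `W` marks that path as the clipping path, and `n` ends the path without painting it. The helper names `clip_operators` and `clip_content` are hypothetical, and the sample drawing stream is made up; this is not PyPDF2 code.

```python
def clip_operators(x, y, width, height):
    """Return the PDF operators that restrict drawing to a rectangle."""
    # 're' appends a rectangle, 'W' makes it the clipping path,
    # 'n' ends the path without filling or stroking it.
    return "{:g} {:g} {:g} {:g} re W n".format(x, y, width, height)


def clip_content(content, lower_left, upper_right):
    """Wrap a content stream so that it only draws inside the given box."""
    x0, y0 = lower_left
    x1, y1 = upper_right
    clip = clip_operators(x0, y0, x1 - x0, y1 - y0)
    # 'q' and 'Q' save and restore the graphics state, so the clip does not
    # affect anything drawn after this stream.
    return "q\n{}\n{}\nQ".format(clip, content)


# A made-up stream that strokes a line from (0, 0) to (300, 400)
clipped = clip_content("0 0 m 300 400 l S", (100, 200), (250, 300))
print(clipped)
```

With the clip in place, anything the stream draws outside the rectangle from (100, 200) to (250, 300) is simply not rendered, which is what keeps the star and the house out of the merged page.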

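from PyPDF2 import PdfFileReader, PdfFileWriter

with open('document1.pdf', 'rb') as tree_fp, open('document2.pdf', 'rb') as text_fp:
    tree_page = PdfFileReader(tree_fp).getPage(0)
    text_page = PdfFileReader(text_fp).getPage(0)

    # Crop the tree in place; no intermediate file is needed
    tree_page.cropBox.lowerLeft = [100, 200]
    tree_page.cropBox.upperRight = [250, 300]

    # Overlay the cropped tree on the text page (relies on the patched _mergePage)
    text_page.mergeScaledTranslatedPage(page2=tree_page, scale=1.0, tx=100, ty=200)

    output = PdfFileWriter()
    output.addPage(text_page)
    # Write before the input files are closed
    with open('merged_document.pdf', 'wb') as out_fp:
        output.write(out_fp)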

Maybe there's a better way of doing this that inserts that code without directly editing the source code. I'd be grateful if anyone finds a way to do it as this admittedly is a slightly dodgy hack.
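On the tx and ty arguments used in the merge calls above: it may help to remember that the offset is applied to coordinates of the whole source page, not to the cropped rectangle. The sketch below assumes the merge transform is a uniform scale followed by a translation, so a point (x, y) ends up at (scale * x + tx, scale * y + ty). The helper names `place` and `offset_for_target` are hypothetical.

```python
def place(point, scale, tx, ty):
    """Apply a uniform scale followed by a translation to a point."""
    x, y = point
    return (scale * x + tx, scale * y + ty)


def offset_for_target(crop_lower_left, target_lower_left, scale=1.0):
    """Return (tx, ty) that moves the crop's lower-left corner onto the target."""
    x0, y0 = crop_lower_left
    target_x, target_y = target_lower_left
    return (target_x - scale * x0, target_y - scale * y0)


# Values from the example: crop corner (100, 200), scale 1.0, tx=100, ty=200
print(place((100, 200), 1.0, 100, 200))           # (200.0, 400.0)

# Offsets needed if the tree's corner should land at (100, 200) instead
print(offset_for_target((100, 200), (100, 200)))  # (0.0, 0.0)
```

Under that assumption, the example values put the tree's lower-left corner at (200, 400) on the text page, so the offsets need adjusting if a different position is intended.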
