Python cut a file to another folder, Python + PyPdf: crop a region of a page and paste it into another page...

Let's say you have a PDF page with various complex elements inside.

The objective is to crop a region of the page (to extract only one of the elements) and then paste it into another PDF page.

[Image Oy6BA.png: example page containing a tree, a star, and a house]

Here is a simplified version of my code:

import PyPDF2
import pyPdf

def extract_tree(in_file, out_file):
    with open(in_file, 'rb') as infp:
        # Read the document that contains the tree (on its first page)
        reader = pyPdf.PdfFileReader(infp)
        page = reader.getPage(0)

        # Crop the tree. The coordinates below are only for illustration
        page.cropBox.lowerLeft = [100, 200]
        page.cropBox.upperRight = [250, 300]

        # Create an empty document and add a single page containing only the cropped page
        writer = pyPdf.PdfFileWriter()
        writer.addPage(page)

        # Write while the input file is still open, because the page data is read lazily
        with open(out_file, 'wb') as outfp:
            writer.write(outfp)

def insert_tree_into_page(tree_document, text_document):
    # Load the first page of the document containing 'text text text text...'
    text_page = PyPDF2.PdfFileReader(open(text_document, 'rb')).getPage(0)

    # Load the previously cropped tree (cropped using 'extract_tree')
    tree_page = PyPDF2.PdfFileReader(open(tree_document, 'rb')).getPage(0)

    # Overlay the text page and the tree crop
    text_page.mergeScaledTranslatedPage(page2=tree_page, scale=1.0, tx=100, ty=200)

    # Save the result into a new empty document
    output = PyPDF2.PdfFileWriter()
    output.addPage(text_page)
    with open('merged_document.pdf', 'wb') as output_stream:
        output.write(output_stream)

# First, crop the tree and save it into cropped_document.pdf
extract_tree('document1.pdf', 'cropped_document.pdf')

# Now merge document2.pdf with cropped_document.pdf
insert_tree_into_page('cropped_document.pdf', 'document2.pdf')

The function "extract_tree" seems to work: it generates a PDF file containing only the cropped region (in the example, the tree).

The problem is that when I try to paste the tree into the new page, the star and the house from the original image are pasted as well.

Solution

I had the exact same issue. In the end, the solution for me was to make a small edit to the source code of PyPDF2 (from this pull request, which never made it into the master branch). What you need to do is insert these lines into the method _mergePage of the class PageObject inside the file pdf.py:

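page2Content = ContentStream(page2Content, self.pdf)
# Clip page2's content to its trim box: "re" adds the rectangle to the path,
# "W" turns the path into the clipping path, "n" ends the path without painting it.
# A list is built (rather than a lazy map) so the operands survive repeated iteration on Python 3.
page2Content.operations.insert(0, [[FloatObject(v) for v in (
    page2.trimBox.getLowerLeft_x(), page2.trimBox.getLowerLeft_y(),
    page2.trimBox.getWidth(), page2.trimBox.getHeight())], "re"])
page2Content.operations.insert(1, [[], "W"])
page2Content.operations.insert(2, [[], "n"])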

(see the pull request for exactly where to put them). With that done, you can crop the section of a PDF you want and merge it with another page with no issues. There's no need to save the cropped section into a separate PDF unless you want to.

from PyPDF2 import PdfFileReader, PdfFileWriter

tree_page = PdfFileReader(open('document1.pdf', 'rb')).getPage(0)
text_page = PdfFileReader(open('document2.pdf', 'rb')).getPage(0)

# Crop the tree, then overlay it directly on the text page
tree_page.cropBox.lowerLeft = [100, 200]
tree_page.cropBox.upperRight = [250, 300]
text_page.mergeScaledTranslatedPage(page2=tree_page, scale=1.0, tx=100, ty=200)

output = PdfFileWriter()
output.addPage(text_page)
with open('merged_document.pdf', 'wb') as output_stream:
    output.write(output_stream)
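To make it clearer what the three inserted operations do, here is a small library-free sketch that builds the same clip prefix as raw content-stream text. In PDF content streams, "re" appends a rectangle to the current path, "W" intersects the clipping path with it, and "n" ends the path without painting, so everything drawn afterwards is limited to the rectangle. The helper name clip_prefix is made up for this illustration and is not part of PyPDF2.

```python
def clip_prefix(x, y, width, height):
    """Return content-stream bytes that restrict later drawing to a rectangle.

    Hypothetical helper for illustration only; PyPDF2 builds the same
    operators through ContentStream.operations instead of raw bytes.
    """
    def num(value):
        # Write numbers as plain decimals without trailing zeros
        return ("%.4f" % value).rstrip("0").rstrip(".")

    rect = " ".join(num(v) for v in (x, y, width, height))
    # "re" adds the rectangle, "W" makes it the clip, "n" ends the path
    return ("%s re\nW\nn\n" % rect).encode("ascii")


# Crop box from the example: lower left (100, 200), upper right (250, 300)
print(clip_prefix(100, 200, 250 - 100, 300 - 200))
```

Note that the rectangle is given as x, y, width, height, which is why the patch uses getWidth() and getHeight() rather than the upper-right corner.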

Maybe there's a better way of doing this that inserts that code without directly editing the source code. I'd be grateful if anyone finds a way to do it, as this is admittedly a slightly dodgy hack.
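One possible way to avoid editing the library is to apply the same clip to the cropped page yourself before merging. The following is only a sketch and has not been checked against the pull request: it assumes PyPDF2 1.x, where PyPDF2.pdf.ContentStream, PageObject.getContents() and the page.pdf attribute are available, and the function names clip_operations and clip_page_to_crop_box are made up here. It also clips to cropBox (the box set in the example) rather than trimBox as the patch does.

```python
def clip_operations(x, y, width, height):
    """Return the clip prefix as [operands, operator] pairs (plain floats)."""
    return [
        [[float(x), float(y), float(width), float(height)], "re"],  # rectangle
        [[], "W"],  # use the rectangle as the clipping path
        [[], "n"],  # end the path without painting it
    ]


def clip_page_to_crop_box(page):
    """Prepend a clip matching page.cropBox to the page's own content.

    Assumes a PyPDF2 1.x PageObject; hypothetical helper, not library API.
    """
    # Imported lazily so clip_operations() stays usable without PyPDF2
    from PyPDF2.generic import FloatObject, NameObject
    from PyPDF2.pdf import ContentStream

    content = page.getContents()
    if content is None:
        return page  # nothing is drawn on this page

    content = ContentStream(content, page.pdf)
    box = page.cropBox
    operations = clip_operations(box.getLowerLeft_x(), box.getLowerLeft_y(),
                                 box.getWidth(), box.getHeight())
    for index, (operands, operator) in enumerate(operations):
        operands = [FloatObject(value) for value in operands]
        content.operations.insert(index, [operands, operator])

    # Replace the page's content with the clipped version
    page[NameObject("/Contents")] = content
    return page


# Intended use with the example above (not executed here):
#   tree_page.cropBox.lowerLeft = [100, 200]
#   tree_page.cropBox.upperRight = [250, 300]
#   clip_page_to_crop_box(tree_page)
#   text_page.mergeScaledTranslatedPage(page2=tree_page, scale=1.0, tx=100, ty=200)
```

Because the clip is stored in the tree page's own content, it is expressed in that page's coordinates and should be scaled and translated together with the rest of the content when the pages are merged.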
