Getting the page size (height and width) of a PDF in Python

License: CC 4.0 BY-SA


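This article shows how to get the height and width of a PDF page in Python using the pdfplumber and PyPDF2 libraries. It gives two solutions for the first page of a PDF: the pdfplumber approach is simpler but slower, while the PyPDF2 approach is faster but may need extra handling for encrypted files.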


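Problem

As the title says, the goal is to get the height and width of a PDF page. Only the first page is measured here.

Solutions

There are two solutions, one using the pdfplumber package and one using PyPDF2.

Solution 1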

import time
import pdfplumber

path = 'E:/data/DT_test/PDF_test/all_type.pdf'

def run(path):
    # Open the PDF and read the first page's size, in PDF units
    # (points, 1/72 inch by default).
    with pdfplumber.open(path) as pdf:
        page_1 = pdf.pages[0]
        return page_1.height, page_1.width

start = time.time()
height, width = run(path)
print('height: %s, width: %s' % (height, width))  # height: 841.920, width: 595.200
print('cost time:', time.time() - start)  # cost time: 0.07300710678100586
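
The values printed above are in PDF points rather than millimetres. As a rough illustration, the sketch below converts them and guesses the paper format. It assumes the default PDF unit of 1/72 inch per point; the helper names `points_to_mm` and `guess_paper_size` and the small size table are made up for this example and are not part of pdfplumber or PyPDF2.

```python
# Assumption: sizes are in default PDF user-space units, 1 point = 1/72 inch.
POINTS_PER_INCH = 72.0
MM_PER_INCH = 25.4

# Illustrative table of a few common paper sizes in millimetres (short side, long side).
PAPER_SIZES_MM = {
    "A3": (297.0, 420.0),
    "A4": (210.0, 297.0),
    "A5": (148.0, 210.0),
    "Letter": (215.9, 279.4),
}


def points_to_mm(points):
    """Convert a length in PDF points to millimetres."""
    return float(points) / POINTS_PER_INCH * MM_PER_INCH


def guess_paper_size(width_pt, height_pt, tolerance_mm=1.0):
    """Return the name of a matching paper size, or None if nothing is close."""
    # Sort so that portrait and landscape pages are treated the same way.
    short_mm, long_mm = sorted((points_to_mm(width_pt), points_to_mm(height_pt)))
    for name, (paper_short, paper_long) in PAPER_SIZES_MM.items():
        if (abs(short_mm - paper_short) <= tolerance_mm
                and abs(long_mm - paper_long) <= tolerance_mm):
            return name
    return None


# The first-page size reported by both solutions in this article.
print(round(points_to_mm(841.92), 1), round(points_to_mm(595.2), 1))  # 297.0 210.0
print(guess_paper_size(595.2, 841.92))  # A4
```

So the test file's first page is an A4 page in portrait orientation.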

Solution 2

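import time
# PyPDF2 >= 2.0. The older PdfFileReader / getPage names were removed in PyPDF2 3.0.
from PyPDF2 import PdfReader

path = 'E:/data/DT_test/PDF_test/all_type.pdf'

def run(path):
    # Use a context manager so the file handle is closed afterwards.
    with open(path, 'rb') as f:
        page_1 = PdfReader(f).pages[0]
        box = page_1.mediabox
        # Use the box's width/height instead of its upper-right corner,
        # because the lower-left corner is not always (0, 0).
        width, height = float(box.width), float(box.height)
        # A page rotated by 90 or 270 degrees is displayed with its sides swapped.
        if page_1.get('/Rotate', 0) in [90, 270]:
            return width, height
        return height, width

start = time.time()
height, width = run(path)
print('height: %s, width: %s' % (height, width))  # height: 841.92, width: 595.2
print('cost time:', time.time() - start)  # cost time: 0.007000923156738281 (as measured in the original post)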
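
The PyPDF2 approach can fail on encrypted files. Here is a minimal sketch of one way to deal with that, assuming PyPDF2 2.0 or later, where the reader class is `PdfReader` and exposes `is_encrypted`, `decrypt(password)` and `pages`. The function name `first_page_size` is made up for this example, and page rotation is ignored to keep it short.

```python
def first_page_size(reader, password=""):
    """Return (height, width) of the first page, decrypting first if needed.

    `reader` is assumed to behave like PyPDF2's PdfReader: it exposes
    `is_encrypted`, `decrypt(password)` and `pages`.
    """
    if reader.is_encrypted:
        # Many "protected" PDFs open with an empty user password, so try that by default.
        # decrypt() returns a falsy value (0) when the password does not match.
        if not reader.decrypt(password):
            raise ValueError("could not decrypt the PDF with the given password")
    box = reader.pages[0].mediabox
    return float(box.height), float(box.width)


# Typical use (requires PyPDF2 to be installed):
#     from PyPDF2 import PdfReader
#     with open(path, "rb") as f:
#         height, width = first_page_size(PdfReader(f))
```

Passing the reader in, rather than a path, keeps the decryption step separate from file handling, so the same function works for files, in-memory streams, or a reader that has already been opened elsewhere.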