Installing pdf2image for Python on a Mac, and some pitfalls I ran into

qq_1096260969

Article link: https://blog.csdn.net/qq_36489878/article/details/103880834

pdf2image is a package that converts PDF files into images.

A Python module that wraps the pdftoppm utility to convert a PDF to PIL Image objects

You can also find the installation instructions in the project's GitHub repository.

GitHub repository: https://github.com/Belval/pdf2image

Installation:

pip install pdf2image

Each platform also needs its own extra component installed.

Windows

Windows users need to install poppler for Windows and add the path of its bin/ folder to the PATH environment variable.
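If you would rather not edit PATH, pdf2image also accepts the poppler location through the poppler_path parameter of convert_from_path. The following is a minimal sketch under that assumption; the helper name convert_with_poppler_dir and the example folder in the usage note are hypothetical and not part of the library.

```python
from pathlib import Path


def convert_with_poppler_dir(pdf_path, poppler_bin_dir):
    """Convert a PDF using the poppler binaries found in poppler_bin_dir.

    Hypothetical helper: it checks the folder first so a wrong path fails
    with a clear message instead of a poppler error.
    """
    bin_dir = Path(poppler_bin_dir)
    if not bin_dir.is_dir():
        raise FileNotFoundError(f"poppler bin folder not found: {bin_dir}")

    # Imported here so the folder check above works even before
    # pdf2image itself is installed.
    from pdf2image import convert_from_path

    # poppler_path tells pdf2image where pdftoppm and pdfinfo live.
    return convert_from_path(pdf_path, poppler_path=str(bin_dir))
```

For example, convert_with_poppler_dir('data/0.pdf', r'C:\tools\poppler\bin') would use the binaries in that folder, where the folder is a placeholder for wherever you unpacked poppler.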

Mac

Mac users also need to install poppler for Mac. The detailed steps are given below.

Linux

Most distributions ship with pdftoppm and pdftocairo. If they are not installed, use your package manager to install poppler-utils.
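On any of the three platforms, a quick way to see whether the poppler tools are reachable is to look them up on PATH from Python. This is only a sketch: the function name missing_poppler_tools is made up for illustration, and the default tool list is an assumption about which executables you care about.

```python
import shutil


def missing_poppler_tools(tools=("pdfinfo", "pdftoppm", "pdftocairo")):
    """Return the names of the given command-line tools not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_poppler_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All poppler tools were found on PATH.")
```

An empty list means every tool was found. Anything else points to a missing poppler install or a PATH that does not include its bin/ folder.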

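Using pdf2image

There are three ways to use it:

1. Pass the file path directly to read the PDF.
2. Open the file with open first, then parse the bytes with convert_from_bytes.
3. Read the file by way of a temporary directory.

import tempfile
from pdf2image import convert_from_path, convert_from_bytes

# All three approaches below read the file; the third is the best.
image = convert_from_path('data/0.pdf')
with open('data/0.pdf', 'rb') as pdf_file:
    image = convert_from_bytes(pdf_file.read())
with tempfile.TemporaryDirectory() as path:
    image_from_path = convert_from_path('data/0.pdf', output_folder=path)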

import os
import tempfile

from pdf2image import convert_from_path, convert_from_bytes
from pdf2image.exceptions import (
    PDFInfoNotInstalledError,
    PDFPageCountError,
    PDFSyntaxError
)


def pdf2image2(pdfPath, imagePath, pageNum):
    # pageNum is accepted by the function but is not used in its body.
    # Create the output folder once, before saving any page.
    os.makedirs(imagePath, exist_ok=True)

    # Method 1:
    # convert_from_path('a.pdf', dpi=500, output_folder="output", fmt="JPEG", output_file="ok", thread_count=4)
    # This converts a.pdf into files in the output folder named like ok_<thread id>-<page number>.jpg.
    # If thread_count is not given it defaults to 1, and the id is still shown in the file name.
    # The pages are written straight to disk, so this does not use much memory.

    # The variant below keeps every page in memory instead:
    # images = convert_from_path(pdfPath, dpi=96)
    # for index, image in enumerate(images):
    #     image.save(os.path.join(imagePath, 'psReport_%s.png' % index), 'PNG')

    # Method 2:
    with open(pdfPath, 'rb') as pdf_file:
        images = convert_from_bytes(pdf_file.read())
    # enumerate gives each page its own index, even if two pages look identical.
    for index, image in enumerate(images):
        image.save(os.path.join(imagePath, 'psReport_%s.png' % index), 'PNG')

    # Method 3, the most recommended one:
    # with tempfile.TemporaryDirectory() as path:
    #     images_from_path = convert_from_path(pdfPath, output_folder=path, dpi=96)
    #     for index, image in enumerate(images_from_path):
    #         image.save(os.path.join(imagePath, 'psReport_%s.png' % index), 'PNG')
    #     print(images_from_path)


pdf_path = "data/pdf/bpm.pdf"
image_path = "data/image"
page_num = 35
pdf2image2(pdf_path, image_path, page_num)
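The function above receives a page count but converts the whole document in one call. For long PDFs, one option is to convert a few pages at a time with the first_page and last_page parameters of convert_from_path, so that only one batch of images is in memory at once. The sketch below assumes the page count is already known (35 in the example above). The names page_batches and convert_in_batches and the batch size are hypothetical. Unlike the code above, it numbers the output files from 1, following the PDF page numbers.

```python
import os


def page_batches(page_count, batch_size):
    """Yield 1-based, inclusive (first_page, last_page) ranges."""
    if page_count < 0 or batch_size < 1:
        raise ValueError("page_count must be >= 0 and batch_size >= 1")
    for first in range(1, page_count + 1, batch_size):
        yield first, min(first + batch_size - 1, page_count)


def convert_in_batches(pdf_path, image_dir, page_count, batch_size=10, dpi=96):
    """Convert a PDF a few pages at a time and save each page as a PNG."""
    # Imported here so page_batches can be used without pdf2image installed.
    from pdf2image import convert_from_path

    os.makedirs(image_dir, exist_ok=True)
    for first, last in page_batches(page_count, batch_size):
        images = convert_from_path(
            pdf_path, dpi=dpi, first_page=first, last_page=last
        )
        for offset, image in enumerate(images):
            # first + offset is the 1-based page number in the PDF.
            name = "psReport_%s.png" % (first + offset)
            image.save(os.path.join(image_dir, name), "PNG")
```

With the values from the example above, the call would be convert_in_batches(pdf_path, image_path, page_num).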

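Troubleshooting notes

1. The first time, right after the plain pip install, I kept getting an error. It turned out that the poppler component was not installed. The message was: PDFInfoNotInstalledError: Unable to get page count. Is poppler installed and in PATH?

2. Since I use a Mac, I will focus on installing poppler on a Mac.

Reference link: http://macappstore.org/poppler/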

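Step 1: enter the following command in the terminal. It installs Homebrew. This Ruby-based installer has since been deprecated, and brew.sh lists the current install command: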

ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)" < /dev/null 2> /dev/null

Step 2:

brew install poppler

Once both commands have finished running, the installation is complete.
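The PDFInfoNotInstalledError from note 1 can also be caught in code and replaced with a hint for the current platform. This is a rough sketch, not part of pdf2image: poppler_install_hint and convert_or_explain are hypothetical names, the hint texts summarise the install steps from this article, and treating every platform other than macOS and Windows as Linux is a simplifying assumption.

```python
import sys


def poppler_install_hint(platform_name=None):
    """Return a short install hint for the given sys.platform value."""
    if platform_name is None:
        platform_name = sys.platform
    if platform_name == "darwin":
        return "Install Homebrew, then run: brew install poppler"
    if platform_name.startswith("win"):
        return "Install poppler for Windows and add its bin/ folder to PATH"
    # Assumption: everything else is handled like a Linux distribution.
    return "Install poppler-utils with your package manager"


def convert_or_explain(pdf_path):
    """Convert a PDF, raising a more helpful error when poppler is missing."""
    # Imported here so the hint function works without pdf2image installed.
    from pdf2image import convert_from_path
    from pdf2image.exceptions import PDFInfoNotInstalledError

    try:
        return convert_from_path(pdf_path)
    except PDFInfoNotInstalledError as exc:
        message = "poppler was not found. " + poppler_install_hint()
        raise RuntimeError(message) from exc
```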
