13. Extracting text and tables from PDFs in Python with the pdfplumber library


Python学习中的进阶者


Category column: Practical Python Skills for the Workplace; tags: python

Article link: https://blog.csdn.net/weixin_42850424/article/details/105451262

pip install pypdf2
pip install pdfplumber==0.5.14
Extracting text with pdfplumber

import pdfplumber

with pdfplumber.open("Netease Q2 2019 Earnings Release-Final.pdf") as pdf:
    first_page = pdf.pages[0]
    print(first_page.extract_text())
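The example above reads only the first page. As a minimal sketch, the same call can be applied to every page of the document. The helper names `join_page_texts` and `extract_all_text` are hypothetical and not part of pdfplumber. Depending on the pdfplumber version, `extract_text()` may return `None` or an empty string for a page without text, so the sketch skips both.

```python
def join_page_texts(page_texts):
    """Join per-page text, skipping pages that yielded None or ''."""
    return "\n\n".join(text for text in page_texts if text)


def extract_all_text(path):
    """Return the text of every page in the PDF at `path` (hypothetical helper)."""
    import pdfplumber  # third-party: pip install pdfplumber

    with pdfplumber.open(path) as pdf:
        # extract_text() is called once per page, in page order
        return join_page_texts(page.extract_text() for page in pdf.pages)
```

Calling `extract_all_text("Netease Q2 2019 Earnings Release-Final.pdf")` would then return one string with the pages separated by blank lines.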

Extracting a table with pdfplumber

import pdfplumber

with pdfplumber.open("simple_1.pdf") as pdf:
    first_page = pdf.pages[0]
    print(first_page.extract_table())
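`extract_table()` returns the table as a list of rows, each row being a list of cell strings (with `None` for empty cells), or `None` when no table is found on the page. The sketch below shows one possible way to turn that structure into dictionaries keyed by the first row. It assumes the first row is a header, which may not hold for every PDF, and `table_to_records` is a hypothetical helper name.

```python
def table_to_records(table):
    """Convert a list-of-rows table into a list of dicts keyed by the first row."""
    if not table:
        # Covers both None (no table found) and an empty table
        return []
    header, *rows = table
    # Empty header cells get a placeholder name such as "col_1"
    keys = [(cell or "").strip() or f"col_{i}" for i, cell in enumerate(header)]
    return [dict(zip(keys, row)) for row in rows]
```

For example, `table_to_records(first_page.extract_table())` could be used inside the `with` block above in place of the `print` call.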

Extracting multiple simple tables with pdfplumber

import pdfplumber

with pdfplumber.open("simple_1.pdf") as pdf:
    table_page = pdf.pages[0]
    for table in table_page.extract_tables():
        print(table)
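The loop above covers a single page. A minimal sketch of extending it to a whole document follows; `collect_tables` and `tables_in_pdf` are hypothetical helper names, and the page index is recorded only so that each table can be traced back to its page.

```python
def collect_tables(pages):
    """Return (page_index, table) pairs for every table found on the given pages."""
    found = []
    for page_index, page in enumerate(pages):
        # extract_tables() returns a list of tables; it is empty if none are found
        for table in page.extract_tables():
            found.append((page_index, table))
    return found


def tables_in_pdf(path):
    """Collect all tables from the PDF at `path` (hypothetical helper)."""
    import pdfplumber  # third-party: pip install pdfplumber

    with pdfplumber.open(path) as pdf:
        return collect_tables(pdf.pages)
```

`collect_tables` only relies on each page object having an `extract_tables()` method, so it can be tried out on stand-in objects without opening a PDF.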

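For tables without ruling lines, pass table_settings to .extract_table() so that the rows and columns are inferred from the alignment of the text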

import pdfplumber

with pdfplumber.open("Netease Q2 2019 Earnings Release-Final.pdf") as pdf:
    table_page = pdf.pages[9]
    table = table_page.extract_table(
        table_settings={
            "vertical_strategy": "text",
            "horizontal_strategy": "text",
        })
    print(table)
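With the "text" strategies, the result may contain blank rows, empty cells, and cells padded with whitespace, because the boundaries are guessed rather than read from drawn lines. The sketch below is one possible cleanup step, not something the original article shows; `clean_table` is a hypothetical helper name.

```python
def clean_table(table):
    """Strip whitespace from every cell and drop rows that are entirely empty."""
    cleaned = []
    for row in table or []:
        # None cells become empty strings so that every cell is a str
        cells = [(cell or "").strip() for cell in row]
        if any(cells):
            cleaned.append(cells)
    return cleaned
```

Applied to the example above, `clean_table(table)` would be called on the extracted table before printing or saving it.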

Writing the extracted data to Excel

import pdfplumber

with pdfplumber.open("Netease Q2 2019 Earnings Release-Final.pdf") as pdf:
    table_page = pdf.pages[9]
    table = table_page.extract_table(
        table_settings={
            "vertical_strategy": "text",
            "horizontal_strategy": "text",
        })

from openpy
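The listing breaks off at the import, so the rest of this step is not shown in the source. As a minimal sketch, assuming the third-party openpyxl package is the intended tool, the extracted rows could be written to a workbook as follows. The function name `write_table_to_xlsx` and the output file name in the usage note are hypothetical.

```python
def write_table_to_xlsx(table, path):
    """Write a list-of-rows table to the first sheet of a new .xlsx file."""
    from openpyxl import Workbook  # third-party: pip install openpyxl

    workbook = Workbook()
    sheet = workbook.active  # a new workbook starts with one active sheet
    for row in table or []:
        # append() adds the values as the next row; None leaves the cell empty
        sheet.append(list(row))
    workbook.save(path)
```

For the table extracted above, a call such as `write_table_to_xlsx(table, "table.xlsx")` would produce a workbook with one row per extracted table row.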
