Extracting content from PDF files


I_HAVE_COME


Tags: python

Article link: https://blog.csdn.net/I_HAVE_COME/article/details/106012926


Extracting text with pdfplumber
```python
import pdfplumber

with pdfplumber.open('XXX.pdf') as pdf:  # pdfplumber.open(path to the PDF)
    first_page = pdf.pages[0]            # pdf.pages[page index], counting from 0
    print(first_page.extract_text())
```
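The snippet above reads only the first page. To get the text of a whole document, one option is to loop over `pdf.pages`. The sketch below assumes a recent pdfplumber; the helper names `join_page_texts` and `extract_all_text` and the `--- page N ---` separator are hypothetical choices for illustration. The page loop lives in its own function so it works on any object that has an `extract_text()` method.

```python
def join_page_texts(pages):
    """Concatenate the text of every page, labelling each page (hypothetical helper)."""
    parts = []
    for number, page in enumerate(pages, start=1):
        # Pages without a text layer may give None (older pdfplumber) or "", so normalise to ""
        text = page.extract_text() or ""
        parts.append(f"--- page {number} ---\n{text}")
    return "\n".join(parts)


def extract_all_text(path):
    """Open the PDF at `path` and return the text of all its pages (hypothetical helper)."""
    import pdfplumber  # imported here so join_page_texts can be used without pdfplumber installed

    with pdfplumber.open(path) as pdf:
        return join_page_texts(pdf.pages)
```

Usage: `print(extract_all_text('XXX.pdf'))`.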

Extracting tables with pdfplumber
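```python
import pdfplumber

with pdfplumber.open('XXX.pdf') as pdf:
    table_page = pdf.pages[0]
    table = table_page.extract_table()  # largest table on the page as a list of rows, or None
    print(table)

    # Extract every table on the page
    for t in table_page.extract_tables():
        print(t)  # each table is a list of rows; each row is a list of cells
```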
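The loop above covers the tables of a single page. A minimal sketch that collects the tables of every page, tagged with the page number they came from, might look like this; `collect_tables` and `extract_all_tables` are hypothetical names, and the results are gathered into a list while the file is still open.

```python
def collect_tables(pages):
    """Return (page_number, table) pairs for every table on every page (hypothetical helper)."""
    found = []
    for number, page in enumerate(pages, start=1):
        for table in page.extract_tables():  # each table is a list of rows
            found.append((number, table))
    return found


def extract_all_tables(path):
    """Open the PDF at `path` and collect all of its tables (hypothetical helper)."""
    import pdfplumber  # imported here so collect_tables can be used without pdfplumber installed

    with pdfplumber.open(path) as pdf:
        return collect_tables(pdf.pages)
```

Usage: `for page_number, table in extract_all_tables('XXX.pdf'): print(page_number, table)`.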
Table extraction settings
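```python
# Infer cell boundaries from the alignment of words instead of drawn ruling lines
table = table_page.extract_table(
    table_settings={
        'vertical_strategy': 'text',
        'horizontal_strategy': 'text',
    })
```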
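By default pdfplumber uses the "lines" strategy, which builds cells from the ruling lines drawn in the PDF. The "text" setting above infers cells from word positions instead, which helps with borderless tables but may also split one word or phrase across several columns. One possible approach, sketched below, is to try the default first and fall back to the text settings only when no table is found. `TEXT_SETTINGS` and `extract_table_with_fallback` are hypothetical names.

```python
# The settings from the article: derive both row and column boundaries from word positions
TEXT_SETTINGS = {"vertical_strategy": "text", "horizontal_strategy": "text"}


def extract_table_with_fallback(page):
    """Try ruling-line detection first, then word alignment (hypothetical helper).

    Returns (table, strategy_used); table is None if neither strategy finds one.
    """
    table = page.extract_table()  # default settings, i.e. the "lines" strategy
    if table:
        return table, "lines"
    return page.extract_table(table_settings=TEXT_SETTINGS), "text"
```

Usage: `table, strategy = extract_table_with_fallback(table_page)`. Returning the strategy makes it easy to check which tables may need the column clean-up described below.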

Writing the table to an Excel file
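```python
from openpyxl import Workbook

workbook = Workbook()
sheet = workbook.active
for row in table:      # table: the list of rows returned by extract_table()
    sheet.append(row)
# Issue: the output contains empty rows, and single words get split across several columns
workbook.save(filename='XXX.xlsx')
```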
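When a page yields several tables, each one could go to its own worksheet. A minimal sketch using openpyxl's `Workbook`, `active`, `create_sheet` and `append`; the function name `tables_to_workbook` and the `TableN` sheet titles are hypothetical. It returns the workbook so the caller decides where to save it.

```python
from openpyxl import Workbook


def tables_to_workbook(tables):
    """Write each table (a list of rows) to its own worksheet (hypothetical helper)."""
    workbook = Workbook()
    sheet = workbook.active  # reuse the default sheet for the first table
    sheet.title = "Table1"
    for index, table in enumerate(tables, start=1):
        if index > 1:
            sheet = workbook.create_sheet(title=f"Table{index}")
        for row in table:
            sheet.append(row)  # None cells become empty Excel cells
    return workbook
```

Usage: `tables_to_workbook(table_page.extract_tables()).save('XXX.xlsx')`.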

Removing empty rows: join all the cells of a row into one string; if the result is still an empty string, the row must be empty.
```python
new_table = []
for row in table:
    if not ''.join([str(item) fo
```
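The original line above is cut off, so the exact condition the author used is not known. Below is a minimal sketch of the described idea with two assumptions of ours: pdfplumber can report missing cells as `None`, and `str(None)` is the non-empty string `'None'`, so `None` is mapped to `''` first; rows containing only whitespace are also treated as empty. `clean_cell` and `remove_empty_rows` are hypothetical names.

```python
def clean_cell(cell):
    """Convert a pdfplumber cell to a string, mapping None to '' (hypothetical helper)."""
    return "" if cell is None else str(cell)


def remove_empty_rows(table):
    """Keep only rows whose joined cell text is non-blank (hypothetical helper)."""
    new_table = []
    for row in table:
        # Join every cell of the row; strip() also drops whitespace-only rows (our assumption)
        if "".join(clean_cell(item) for item in row).strip():
            new_table.append(row)
    return new_table
```

Usage: `new_table = remove_empty_rows(table)`, then write `new_table` to the sheet instead of `table`.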
