Extracting Table Data from a PDF with Python

向着光-

Last modified: 2022-01-23 13:26:21

First published: 2022-01-23 13:22:16

Article link: https://blog.csdn.net/qq_37343780/article/details/122650265

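PDF data

Each table in the PDF is printed as two side-by-side halves. After extraction, the right half has to be appended below the left half so that the data forms one continuous table.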
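The stacking step described above can be sketched on the raw list-of-lists that `page.extract_tables()` returns, before any pandas is involved. This is only an illustration under assumptions: the helper name `stack_halves`, the default width of 4 columns per half, and the sample values are all hypothetical, and the real table may need different handling.

```python
def stack_halves(table, width=4):
    """Append the right half of a side-by-side table below its left half.

    `table` is a list of rows (lists of strings or None), with the header
    in the first row. `width` is the assumed number of columns per half.
    """
    header = table[0][:width]
    body = table[1:]
    left = [row[:width] for row in body]
    right = [row[width:2 * width] for row in body]
    # The right half may be shorter than the left one; skip its empty rows.
    right = [row for row in right
             if any(cell not in (None, "") for cell in row)]
    return [header] + left + right


# Hypothetical sample: the right half holds the continuation of the left half.
sample = [
    ["depth", "a", "b", "c", "depth", "a", "b", "c"],
    ["1", "10", "20", "30", "3", "11", "21", "31"],
    ["2", "12", "22", "32", None, None, None, None],
]
stacked = stack_halves(sample)
for row in stacked:
    print(row)
```

Working on plain lists keeps the row order explicit: all left-half rows come first, followed by the right-half rows, which is the same order that concatenating the two column slices produces.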

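# -*- coding: GBK -*-
import pdfplumber
import pandas as pd
import matplotlib.pyplot as plt

pages = range(13, 17)  # 0-based page indices to extract, i.e. PDF pages 14 to 17
page_results = []      # one stacked DataFrame per extracted table

# Open the PDF file
with pdfplumber.open(r'C:\Users\chenb\Desktop\XXX.pdf') as pdf:
    for p in pages:
        page = pdf.pages[p]
        for table in page.extract_tables():
            # The first row is the header; the remaining rows are data
            tb = pd.DataFrame(table[1:], columns=table[0])

            # Clean up the layout (inside the loop, so every table is processed
            # and a page without tables is simply skipped)
            tb = tb.drop(index=0)         # drop the first data row, as in the original
            tb1 = tb.iloc[:, 0:4]         # left half: columns 0 to 3
            tb2 = tb.iloc[:, 4:8].copy()  # right half: columns 4 to 7
            tb2.columns = tb1.columns     # same names, so the halves stack row-wise
            page_results.append(pd.concat([tb1, tb2]))

# Combine the results of all pages into one table
result_all = pd.concat(page_results, ignore_index=True)

# Save to Excel, then read the file back and check the column types
result_all.to_excel(r'C:\Users\chenb\Desktop\XXX.xlsx', index=False)
result_read = pd.read_excel(r'C:\Users\chenb\Desktop\XXX.xlsx', sheet_name='Sheet1', header=0)
print(result_read.dtypes)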


# Plot the curves
plt.figure(figsize=(10, 12))
ax = plt.subplot()
ax.xaxis.set_ticks_position('top')  # put the x-axis ticks at the top
ax.invert_yaxis()                   # flip the y-axis
# Columns 1 to 3 go on the x-axis, column 0 on the y-axis
plt.plot(result_read.iloc[:, 1], result_read.iloc[:, 0])
plt.plot(result_read.iloc[:, 2], result_read.iloc[:, 0])
plt.plot(result_read.iloc[:, 3], result_read.iloc[:, 0])

plt.show()
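One point worth checking before plotting: pdfplumber returns every cell as text (or `None` for an empty cell), so the columns may need converting to numbers for the axes to be numeric. The following is a minimal sketch of such a conversion in plain Python. The helper name `to_number`, the sample values, and the choice to strip commas as thousands separators are assumptions for illustration, not part of the original script.

```python
def to_number(cell):
    """Return the cell as a float, or None if it is empty or not numeric."""
    if cell is None:
        return None
    # Assumption: commas are thousands separators and can be removed.
    text = cell.strip().replace(",", "")
    try:
        return float(text)
    except ValueError:
        return None


# Hypothetical column as it might come out of an extracted table
column = ["1.5", " 2,000 ", "", None, "n/a"]
values = [to_number(c) for c in column]
print(values)
```

In the pandas version of the script, `pd.to_numeric(series, errors="coerce")` plays a similar role for a whole column, turning values it cannot parse into NaN.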

Extraction result
