Extracting table data from a PDF with Python: how can I extract charts, tables, and graphs from a PDF file using Python?

Yungever

I searched quite a bit but couldn't find a solution to this kind of problem, so I'm posting a clear question about it. Most answers cover image or text extraction, which are comparatively easier.

I need to extract tables as text (CSV) and graphs as images from PDFs.

Can anyone help me with efficient Python 3.6 code for this?

So far I can extract JPEGs by scanning for startmark = b"\xff\xd8" and endmark = b"\xff\xd9", but not all tables and graphs in a PDF are stored as plain JPEGs, so my code fails badly on them.
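For reference, the marker-scanning approach described above might look roughly like the sketch below. The function name `extract_jpegs` is made up for illustration, and the scan is deliberately naive: it assumes each image is stored as a raw JPEG byte stream and that the first end marker after a start marker closes that image, which breaks for JPEGs with embedded thumbnails. It also shows why the approach misses tables and vector graphs: those are usually drawn with text and path operators, so there are no JPEG markers to find.

```python
from typing import List

START_MARK = b"\xff\xd8"  # JPEG start-of-image marker
END_MARK = b"\xff\xd9"    # JPEG end-of-image marker


def extract_jpegs(data: bytes) -> List[bytes]:
    """Return every byte run that starts with START_MARK and ends with END_MARK.

    Naive scan: it only finds images embedded as raw JPEG streams and
    does not parse the PDF structure at all.
    """
    images = []
    pos = 0
    while True:
        start = data.find(START_MARK, pos)
        if start == -1:
            break
        end = data.find(END_MARK, start + len(START_MARK))
        if end == -1:
            break
        end += len(END_MARK)  # include the end marker in the slice
        images.append(data[start:end])
        pos = end
    return images
```

A typical use would be to read the file with `open(path, "rb").read()`, pass the bytes to `extract_jpegs`, and write each returned chunk to its own `.jpg` file.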

For example, I want to extract the table on page 11 and the graphs on page 12 of the PDF at the link given below, as images or in whatever form is feasible. How should I go about it?

Solution

For extracting tables, you can use camelot.
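A minimal sketch of how that could look, assuming camelot is installed (`pip install camelot-py`) and the PDF is text-based rather than scanned. The helper names `csv_name` and `extract_tables_to_csv` and the output file naming are invented for this example; only `camelot.read_pdf` and the `df` attribute of each table come from the library.

```python
import os


def csv_name(pdf_path, page, index):
    """Build an output file name (hypothetical naming scheme for this sketch)."""
    stem = os.path.splitext(os.path.basename(pdf_path))[0]
    return "{}-page-{}-table-{}.csv".format(stem, page, index)


def extract_tables_to_csv(pdf_path, page, out_dir="."):
    """Write every table camelot detects on one page to its own CSV file."""
    import camelot  # third-party; imported here so the helper above works without it

    # pages is a string such as "11" or "1,3-5"; the default flavor is "lattice",
    # which expects ruled lines. Try flavor="stream" for tables separated by
    # whitespace only.
    tables = camelot.read_pdf(pdf_path, pages=str(page))

    written = []
    for i in range(len(tables)):
        out_path = os.path.join(out_dir, csv_name(pdf_path, page, i + 1))
        # Each table exposes its cells as a pandas DataFrame.
        tables[i].df.to_csv(out_path, index=False, header=False)
        written.append(out_path)
    return written
```

For the question's case, the call would be something like `extract_tables_to_csv("report.pdf", 11)`, where `report.pdf` stands in for the linked file. If nothing is detected the returned list is empty, which is usually a cue to try the other flavor.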

Here is an article about it.