Python slate package (program module) - PyPI - Python中文网

Latest recommended article published 2023-01-17 11:57:24

weixin_39908616



Article tags: python taslate

slate is a Python package that simplifies extracting text from PDF files. It depends on the pdfminer package.

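slate provides one class, PDF. PDF takes a file-like object, extracts all of the text from the document, and represents each page as a string of text:

```python
>>> import slate
>>> with open('example.pdf', 'rb') as f:  # PDFs are binary files, so open in 'rb' mode
...     doc = slate.PDF(f)
...
>>> doc
[..., ..., ...]
>>> doc[1]
'Text from page 2...'
```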
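As the example shows, the result can be indexed like a Python list holding one string per page. The sketch below assumes exactly that (a list of page strings) and uses a hand-written stand-in list instead of a real slate.PDF result, so it runs without slate or a PDF file. `full_text` and `pages_containing` are hypothetical helper names, and joining pages with a form feed is a choice made here, not something slate is known to do.

```python
from typing import List


def full_text(pages: List[str], sep: str = "\f") -> str:
    """Join per-page strings into one string.

    The default separator is a form feed, a common page break marker in
    plain-text PDF output; pass sep="\n" for plain newlines instead.
    """
    return sep.join(pages)


def pages_containing(pages: List[str], term: str) -> List[int]:
    """Return the 1-based page numbers whose text contains `term` (case-insensitive)."""
    needle = term.lower()
    return [i + 1 for i, text in enumerate(pages) if needle in text.lower()]


# Stand-in for a slate.PDF result; real page text would come from the PDF.
pages = [
    "Introduction\nslate wraps pdfminer.",
    "Text from page 2...",
    "Appendix: pdfminer notes",
]

print(len(pages))                           # 3
print(pages_containing(pages, "PDFMINER"))  # [1, 3]
```

Because the helpers only rely on list indexing and iteration, they should work unchanged on any object that behaves like a list of strings.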

If your PDF is password-protected, pass the password as the second argument:

```python
>>> with open('secrets.pdf', 'rb') as f:
...     doc = slate.PDF(f, 'password')
...
>>> doc[0]
"My mother doesn't know this, but..."
```

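In real scripts you may not want the password written into the source, as in the example above. A minimal sketch, assuming the password is supplied through an environment variable: `PDF_PASSWORD` and `pdf_password` are hypothetical names chosen here, and treating an empty string as "no password" is an assumption based on the PDF convention of an empty user password.

```python
import os


def pdf_password(env_var: str = "PDF_PASSWORD") -> str:
    """Read a PDF password from an environment variable.

    PDF_PASSWORD is a hypothetical variable name for this sketch.
    Returns "" when the variable is unset; "" is assumed to mean
    "no password" (the empty user password).
    """
    return os.environ.get(env_var, "")


# Usage (commented out: needs slate installed and a real encrypted file):
# import slate
# with open('secrets.pdf', 'rb') as f:
#     doc = slate.PDF(f, pdf_password())
```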
More complex operations

If you want to access images, font files, and other information, take some time to learn the pdfminer API.

What's wrong with pdfminer?

- Getting simple things done, like extracting text, is quite complex. The program is not designed to return Python objects, which makes interfacing with it irritating.
- It's an extremely complete set of tools, with multiple and moderately steep learning curves.
- It's not written with hackability in mind.
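These complaints explain slate's design goal: hide the extraction machinery behind an object that behaves like an ordinary Python list. The sketch below illustrates that wrapper idea only; it is not slate's actual source. The pdfminer part is replaced by a hypothetical injected `extract` callable, and `PageList` and `fake_extract` are illustrative names. `fake_extract` is a toy that splits raw bytes on form feeds so the example runs without any PDF library; it does not parse PDFs.

```python
import io
from typing import BinaryIO, Callable, Iterable


class PageList(list):
    """Hypothetical wrapper: a plain list of page strings built from a file-like object.

    `extract` is any callable that takes a binary file object and returns
    one text string per page (in practice, something built on pdfminer).
    """

    def __init__(self, fileobj: BinaryIO, extract: Callable[[BinaryIO], Iterable[str]]):
        # Store each page's text as a list element, so callers get
        # ordinary Python objects instead of parser/device internals.
        super().__init__(extract(fileobj))


def fake_extract(fileobj: BinaryIO) -> Iterable[str]:
    # Toy stand-in for a real extractor: pages are separated by form feeds.
    return fileobj.read().decode("utf-8").split("\f")


doc = PageList(io.BytesIO(b"page one\fpage two"), fake_extract)
print(doc[1])                 # page two
print(isinstance(doc, list))  # True
```

Injecting the extractor keeps the list-like interface independent of how the text is actually obtained, which is the kind of hackability the last point above says pdfminer lacks.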
