How can I extract text from a PDF with a Python library?

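I have used pdfminer for this before. Install it with: pip install pdfminer (on Python 3, the maintained fork is installed with: pip install pdfminer.six)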

# -*- coding: utf-8 -*-
# Written for Python 3 with pdfminer.six. On Python 2, use
# cStringIO.StringIO for both buffers and `print txt`.
from io import BytesIO, StringIO

import requests
from pdfminer.converter import TextConverter
from pdfminer.layout import LAParams
from pdfminer.pdfinterp import PDFResourceManager, PDFPageInterpreter
from pdfminer.pdfpage import PDFPage


def pdf_txt(url):
    """Download the PDF at `url` and return its text content."""
    rsrcmgr = PDFResourceManager()
    retstr = StringIO()  # collects the extracted text
    codec = 'utf-8'
    laparams = LAParams()
    device = TextConverter(rsrcmgr, retstr, codec=codec, laparams=laparams)
    # The parser needs a seekable binary stream, so wrap the downloaded bytes.
    fp = BytesIO(requests.get(url).content)
    interpreter = PDFPageInterpreter(rsrcmgr, device)
    password = ""
    maxpages = 0  # 0 means no page limit
    caching = True
    pagenos = set()  # an empty set means all pages
    for page in PDFPage.get_pages(fp,
                                  pagenos,
                                  maxpages=maxpages,
                                  password=password,
                                  caching=caching,
                                  check_extractable=True):
        interpreter.process_page(page)
    fp.close()
    device.close()
    text = retstr.getvalue()  # renamed from `str` to avoid shadowing the builtin
    retstr.close()
    return text


txt = pdf_txt('http://pythonscraping.com/pages/warandpeace/chapter1.pdf')
print(txt)

# If the PDF contains Chinese text, write the output to a file instead:
# open('pdf.txt', 'w', encoding='utf-8').write(txt)

python readpdf.py

'''
CHAPTER I
"Well, Prince, so Genoa and Lucca are now just family estates of
the Buonapartes. But I warn you, if you don't tell me that this
means war, if you still try to defend the infamies and horrors
perpetrated by that Antichrist- I really believe he is Antichrist- I will
have nothing more to do with you and you are no longer my friend,
no longer my 'faithful slave,' as you call yourself! But how do you
do? I see I have frightened you- sit down and tell me all the news."
It was in July, 1805, and the speaker was the well-known
Anna Pavlovna Scherer, maid of honor and favorite of the
Empress Marya Fedorovna. With these words she greeted Prince
Vasili Kuragin, a man of high rank and importance, who was the
first to arrive at her reception. Anna Pavlovna had had a cough for
some days. She was, as she said, suffering from la grippe; grippe
being then a new word in St. Petersburg, used only by the elite.
All her invitations without exception, written in French,
and delivered by a scarlet-liveried footman that morning, ran as
'''
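
The script's last comment suggests writing the text to a file when the PDF contains Chinese. Below is a minimal sketch of that step, assuming Python 3. The helper name save_text and the sample string are hypothetical and only stand in for the value returned by pdf_txt(); they are not part of pdfminer.

```python
import io
import os
import tempfile


def save_text(text, path):
    """Write extracted text to `path` as UTF-8; return the character count."""
    # An explicit encoding avoids errors on platforms whose default codec
    # cannot represent Chinese characters.
    with io.open(path, "w", encoding="utf-8") as out:
        return out.write(text)


if __name__ == "__main__":
    # Hypothetical stand-in for the string returned by pdf_txt(url).
    sample = u"第一章 CHAPTER I"
    target = os.path.join(tempfile.gettempdir(), "pdf.txt")
    save_text(sample, target)
    # Read the file back to confirm the text survived the round trip.
    with io.open(target, encoding="utf-8") as f:
        print(f.read() == sample)
```

Opening the file in text mode with an explicit encoding keeps the result independent of the console's locale, which is usually where printing Chinese text fails.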
