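It's another beautiful day! If converting PDFs to Word has been giving you a headache, try Python's PDFMiner3K, a very handy library. The script below reads the text of a PDF into a string, using StringIO as the in-memory file object that collects the extracted text, and then writes that string to a text file: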
#!/usr/bin/env python
# -*- coding:utf-8 -*-
# Author: どうでもいい菌
# URL: https://space.bilibili.com/450634867
from pdfminer.pdfinterp import PDFResourceManager, process_pdf
from pdfminer.converter import TextConverter
from pdfminer.layout import LAParams
from io import StringIO
import os
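def readpdf(pdf_file):
    # Extract the text of an open binary PDF file object and return it as a string
    rsrcmgr = PDFResourceManager()
    retstr = StringIO()  # in-memory text buffer that collects the extracted text
    laparams = LAParams()
    device = TextConverter(rsrcmgr, retstr, laparams=laparams)
    process_pdf(rsrcmgr, device, pdf_file)
    device.close()
    content = retstr.getvalue()
    retstr.close()
    return content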
while True:
    Pdf_file_address = input('Enter the PDF file path: ')
    Txt_file_address = input('Enter the output file path: ')
    # Only the input PDF has to exist already; the output file is created on write
    if os.path.isfile(Pdf_file_address):
        break
    else:
        print("Sorry, I cannot find the address.")
pdfFile = open(str(Pdf_file_address), 'rb')
outputString = readpdf(pdfFile)
print(outputString)
pdfFile.close()
# Write as UTF-8 so non-ASCII text is saved the same way on every platform
fileWriter = open(str(Txt_file_address), 'w', encoding='utf-8')
fileWriter.write(outputString)
fileWriter.close()
print('Written.')
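To see what StringIO is doing inside readpdf, here is a small sketch that uses only the standard library. The helper name collect_text and the sample strings are made up for illustration and are not part of PDFMiner3K. It only shows that a StringIO object accepts write() calls like a text file and then hands back everything as one string through getvalue(), which is roughly how the converter's output ends up in a plain Python string.

```python
from io import StringIO


def collect_text(chunks):
    """Write each chunk into an in-memory text buffer and return the combined string.

    collect_text is a hypothetical helper for illustration only.
    """
    buffer = StringIO()  # behaves like a file opened in text mode, but lives in memory
    try:
        for chunk in chunks:
            buffer.write(chunk)  # the same call one would make on a real text file
        return buffer.getvalue()  # everything written so far, as a single str
    finally:
        buffer.close()


if __name__ == "__main__":
    # Made-up sample strings standing in for text extracted page by page
    print(collect_text(["page 1 text\n", "page 2 text\n"]))
```

Because the buffer lives in memory, nothing touches the disk until the script writes the final string to the output file itself.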
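OK, that's all the code. For a detailed walkthrough and the results, watch the video: I'll be putting out a video tutorial that explains each step in detail. If you like it, remember to leave a like!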