1. Set up a Python development environment in VS Code
2. Install the pdf2docx library
pip install pdf2docx
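To confirm that the install succeeded before writing any conversion code, a small check like the following may help. It is only a sketch: `installed_version` is a hypothetical helper name, and it relies on the standard-library `importlib.metadata` module (Python 3.8+), not on anything from pdf2docx itself.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Prints the version that pip installed, or None when the package is absent.
print("pdf2docx:", installed_version("pdf2docx"))
```

If this prints None, the package was probably installed into a different interpreter than the one selected in VS Code.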
3. Write the code
3.1 Convert a PDF to DOCX with parse
Write pdf2docxParse.py
from pdf2docx import parse
pdf_file = 'demo-image-overlap.pdf'
docx_file = 'demo-image-overlap.docx'
parse(pdf_file, docx_file)
Run pdf2docxParse.py
python pdf2docxParse.py
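The call above converts the whole document. As far as I know, parse also accepts a `pages` list of zero-based page indices, so a subset of pages can be converted. The sketch below assumes that argument behaves as described; `to_zero_based` and `parse_selected` are hypothetical helper names introduced here for illustration, not part of pdf2docx.

```python
def to_zero_based(pages_1_based):
    """Convert human page numbers (1, 2, 3, ...) to zero-based indices."""
    if any(p < 1 for p in pages_1_based):
        raise ValueError("page numbers start at 1")
    return [p - 1 for p in pages_1_based]


def parse_selected(pdf_file, docx_file, pages_1_based):
    """Convert only the listed pages (hypothetical helper, not part of pdf2docx)."""
    # Imported here so to_zero_based can be used without pdf2docx installed.
    from pdf2docx import parse

    # Assumption: parse takes a `pages` list of zero-based page indices.
    parse(pdf_file, docx_file, pages=to_zero_based(pages_1_based))
```

For example, parse_selected('demo-image-overlap.pdf', 'demo-image-overlap.docx', [1, 3]) would request only the first and third pages.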
3.2 Convert a PDF to DOCX with Converter
3.2.1 Write pdf2docxConvert.py
from pdf2docx import Converter
pdf_file = 'demo-image-overlap.pdf'
docx_file = 'demo-image-overlap.docx'
cv = Converter(pdf_file)
cv.convert(docx_file, start=0, end=None)
cv.close()
3.2.2 Run pdf2docxConvert.py
python pdf2docxConvert.py
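In the script above, cv.close() is skipped if convert raises, for example on a damaged PDF. One possible way to guard against that is a try/finally block, sketched below. `convert_safely` and its `converter_cls` parameter are hypothetical names added here; the parameter exists only so the function can be exercised with a stand-in class, and it defaults to pdf2docx's Converter.

```python
def convert_safely(pdf_file, docx_file, start=0, end=None, converter_cls=None):
    """Convert pages start..end and always close the converter afterwards."""
    if converter_cls is None:
        # Default to the real converter; imported lazily so a stand-in
        # class can be supplied without pdf2docx being installed.
        from pdf2docx import Converter as converter_cls

    cv = converter_cls(pdf_file)
    try:
        # start=0 is the first page; end=None means "through the last page".
        cv.convert(docx_file, start=start, end=end)
    finally:
        # Runs whether convert succeeded or raised.
        cv.close()
```

Calling convert_safely('demo-image-overlap.pdf', 'demo-image-overlap.docx') should behave like the script above, with the file handle released even when conversion fails.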
3.3 Pass the PDF path on the command line
3.3.1 Write SMQHPdf2Docx.py
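'''
@Description: Convert a PDF to DOCX from the command line
@Author: 少莫千华
@Time: 2023-06-11
'''
import argparse
from pdf2docx import Converter


def main(pdf_file, docx_file):
    # Convert every page, from the first (start=0) through the last (end=None).
    cv = Converter(pdf_file)
    cv.convert(docx_file, start=0, end=None)
    cv.close()


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    # --pdf is required: it is the path of the PDF to convert.
    parser.add_argument("--pdf", type=str, required=True)
    args = parser.parse_args()
    # The output name is the input path with '.docx' appended.
    main(args.pdf, args.pdf + '.docx')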
3.3.2 Run SMQHPdf2Docx.py
python SMQHPdf2Docx.py --pdf demo-image-overlap.pdf
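Because the script appends '.docx' to the input path, the command above writes demo-image-overlap.pdf.docx. If a name like demo-image-overlap.docx is preferred, one option is to swap the suffix with pathlib, as in the sketch below. The `--docx` flag, `default_docx_path`, and `build_parser` are hypothetical additions for illustration and are not part of the original script.

```python
import argparse
from pathlib import Path


def default_docx_path(pdf_path):
    """Replace the final suffix of pdf_path with '.docx'."""
    return str(Path(pdf_path).with_suffix(".docx"))


def build_parser():
    parser = argparse.ArgumentParser(description="Convert a PDF to DOCX")
    parser.add_argument("--pdf", required=True, help="path of the input PDF")
    # Hypothetical optional flag: explicit output path.
    parser.add_argument("--docx", default=None, help="path of the output DOCX")
    return parser


def main(argv=None):
    args = build_parser().parse_args(argv)
    docx_file = args.docx or default_docx_path(args.pdf)

    # Imported lazily so the path helpers work without pdf2docx installed.
    from pdf2docx import Converter

    cv = Converter(args.pdf)
    try:
        cv.convert(docx_file, start=0, end=None)
    finally:
        cv.close()
    return docx_file
```

To use this as a script, call main() under the usual `if __name__ == "__main__":` guard. Running it with --pdf demo-image-overlap.pdf would then target demo-image-overlap.docx.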
3.4 Conversion result
PDF
DOCX