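Using python-docx
Description: use the docx library to extract all the headings from a Word document and convert them to Markdown format.
To do: image extraction, table extraction.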
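The heading-to-Markdown conversion described above can be sketched on its own, without opening a .docx file. This is only a minimal sketch under stated assumptions: the helper name `heading_to_markdown` and the sample data are hypothetical, and it assumes headings use the built-in style names 'Heading 1' to 'Heading 6'. It does not add the title numbering that demo.py keeps track of.

```python
import re


def heading_to_markdown(style_name: str, text: str) -> str:
    """Return a Markdown line for one paragraph.

    Hypothetical helper: assumes built-in style names 'Heading 1'..'Heading 6'.
    Paragraphs with any other style are returned unchanged.
    """
    match = re.fullmatch(r'Heading ([1-6])', style_name)
    if match is None:
        return text
    level = int(match.group(1))
    # One '#' per heading level, then a space and the heading text.
    return f"{'#' * level} {text}"


# Stand-in (style name, text) pairs; with python-docx these values would come
# from paragraph.style.name and paragraph.text.
sample = [
    ('Heading 1', 'Introduction'),
    ('Normal', 'Body text.'),
    ('Heading 2', 'Scope'),
]
markdown_lines = [heading_to_markdown(style, text) for style, text in sample]
```

Keeping the conversion in a function that takes plain strings makes it possible to check the mapping without a Word file at hand.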
demo.py
from docx import Document

path = '2_test.docx'  # file path
wordfile = Document(path)  # load the document
paragraphs = wordfile.paragraphs
list_txt = []
title1_number = 0
title2_number = 0
for paragraph in paragraphs:
    print(paragraph.style.name)
    print(paragraph.text)
    if paragraph.style.name == 'Heading 1':
        title1_number += 1
        title1 = f'# {