0. Installing the python-docx module
Windows: pip install python-docx
macOS: pip3 install python-docx
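To confirm that the installation worked, the installed version can be queried from Python. This is a minimal sketch that uses only the standard library (Python 3.8+); the helper name installed_version is made up for illustration. Note that the package is installed as python-docx but imported as docx.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version of a package, or None if it is missing."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("python-docx")
if version is None:
    print("python-docx is not installed")
else:
    print("python-docx version:", version)
```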
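1. Word document structure
Document: the document as a whole
Paragraph: a paragraph
Run: a run, i.e. a stretch of text that shares the same formatting
The sample document used here has three paragraphs in total.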
2. Extracting text
2.1 Getting paragraph instances and the paragraph count:
.paragraphs returns a list containing one instance per paragraph
from docx import Document
doc = Document("0.docx")
print(doc.paragraphs)
print(len(doc.paragraphs))
Output:
[<docx.text.paragraph.Paragraph object at 0x000001F88E2F2E80>, <docx.text.paragraph.Paragraph object at 0x000001F88E2F2C88>, <docx.text.paragraph.Paragraph object at 0x000001F88E2F2EF0>]
3
The output shows that the document has three paragraphs.
2.2 Extracting paragraph content
from docx import Document
doc = Document("0.docx")
for paragraph in doc.paragraphs:
    print(paragraph.text)
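Output:
以上便是excel与python结合的第二部分内容,后续将会持续更新excel,ppt,爬虫,人工智能等相关内容,敬请关注
(In English: "That concludes part two of combining Excel with Python; more on Excel, PPT, web scraping, artificial intelligence, and related topics will follow, so stay tuned.")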
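2.3 Getting runs
Take this sentence from the sample document:
excel与python结合的第二部分内容,后续将会持续更新excel,ppt,爬虫,人工智能
Each stretch of text with its own formatting is one run. The sentence above consists of 7 runs.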
from docx import Document
doc = Document("0.docx")
paragraph = doc.paragraphs[1]
runs = paragraph.runs
print(runs)
[<docx.text.run.Run object at 0x000001F88E2F2E10>, <docx.text.run.Run object at 0x000001F88E2F2C88>, <docx.text.run.Run object at 0x000001F88E2F2E80>, <docx.text.run.Run object at 0x000001F88E2F2DD8>, <docx.text.run.Run object at 0x000001F88E2F2EB8>, <docx.text.run.Run object at 0x000001F88E2F2F28>, <docx.text.run.Run object at 0x000001F88E2F2F60>]
paragraph.runs returns a list containing one instance per run
2.4 Extracting the content of runs
from docx import Document
doc = Document("0.docx")
paragraph = doc.paragraphs[1]
runs = paragraph.runs
print(runs)
for run in runs:
    print(run.text)
Output of the loop:
excel与python结合的第二部分内容,
后续将会持续更新excel
,
ppt
,
爬虫
,人工智能
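That concludes part one of combining Word with Python. Upcoming posts will continue to cover Excel, PPT, web scraping, artificial intelligence, and related topics, so stay tuned.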