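1. Installing the module
Windows users: open a command prompt and run: pip install python-pptx
Mac users: open Terminal and run: pip3 install python-pptx
On Windows, if the installation fails, you can point pip at the Tsinghua mirror in China from cmd:
pip install -i https://pypi.tuna.tsinghua.edu.cn/simple python-pptx
Import the module: import pptx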
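2. Reading the contents of a PPT document
First, here is what the basic building blocks of a PPT are called in python-pptx:
Slide: one page of the presentation.
Shape: a box inserted on a slide; it can be a drawn shape or a text box.
Run: a run of text inside a paragraph, usually just a few characters that share the same formatting.
Paragraph: a paragraph, often carrying a bullet or a number such as "1.".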
2.1 Slides (slide)
1) Getting the slides
.slides -> returns a list-like collection containing every slide
from pptx import Presentation
prs = Presentation('示例文件.pptx')
for slide in prs.slides:
    print(slide)
Output:
<pptx.slide.Slide object at 0x0000000003737318>
<pptx.slide.Slide object at 0x0000000003737228>
<pptx.slide.Slide object at 0x0000000003737818>
<pptx.slide.Slide object at 0x0000000003737408>
<pptx.slide.Slide object at 0x00000000037377C8>
<pptx.slide.Slide object at 0x00000000037376D8>
<pptx.slide.Slide object at 0x0000000003737F98>
<pptx.slide.Slide object at 0x00000000037372C8>
<pptx.slide.Slide object at 0x00000000037373B8>
…
2.2 Shapes (shape)
1) Getting the shapes
from pptx import Presentation
prs = Presentation('示例文件.pptx')
for slide in prs.slides:
    for shape in slide.shapes:
        print(shape)
Output:
<pptx.shapes.autoshape.Shape object at 0x000000000379C390>
<pptx.shapes.picture.Picture object at 0x000000000379C4E0>
<pptx.shapes.picture.Picture object at 0x000000000379C0F0>
<pptx.shapes.placeholder.SlidePlaceholder object at 0x000000000379C080>
<pptx.shapes.placeholder.SlidePlaceholder object at 0x000000000379C400>
<pptx.shapes.placeholder.SlidePlaceholder object at 0x000000000379C390>
<pptx.shapes.graphfrm.GraphicFrame object at 0x000000000379C080>
<pptx.shapes.placeholder.SlidePlaceholder object at 0x000000000379C400>
…
2) Printing the text in a shape
shape.has_text_frame -> whether the shape can hold text
shape.text_frame -> get the shape's text frame
# Get the text on every slide
from pptx import Presentation
prs = Presentation('示例文件.pptx')
for slide in prs.slides:
    for shape in slide.shapes:
        if shape.has_text_frame:
            text_frame = shape.text_frame
            print(text_frame.text)
# Get the text on a single slide (i == 3 is the fourth slide, since indexes start at 0)
from pptx import Presentation
prs = Presentation('示例文件.pptx')
for i, slide in enumerate(prs.slides):
    if i == 3:
        for shape in slide.shapes:
            if shape.has_text_frame:
                text_frame = shape.text_frame
                print(text_frame.text)
2.3 Paragraphs (paragraph)
1) Printing the paragraphs in a shape
text_frame.paragraphs -> get the paragraphs in a shape
for paragraph in text_frame.paragraphs:
    print(paragraph.text)
from pptx import Presentation
prs = Presentation('示例文件.pptx')
for slide in prs.slides:
    for shape in slide.shapes:
        if shape.has_text_frame:
            text_frame = shape.text_frame
            for paragraph in text_frame.paragraphs:
                print(paragraph.text)
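"""
Note:
This approach also retrieves the text in a shape,
but it is more flexible: get each shape first, then each paragraph in that shape.
You can add a condition on the paragraph, for example to keep only the n-th paragraph.
"""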
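3. Writing content to a PPT document
Slide masters and placeholders
slide_layouts: the layouts. A slide master consists of several layouts, indexed from 0.
Placeholder: a predefined component of a slide, defined in the slide master and its layouts.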
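3.1 Adding a slide and content
1) Choosing a PPT template
a. Using the built-in template
prs = Presentation()
prs.slide_layouts[index]
The built-in template ships with a set of common layouts; pick one by index, starting from 0. The number of layouts depends on the template, and len(prs.slide_layouts) reports it.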
b. Using a custom PPT template
prs = Presentation('template.pptx')
2) Checking the placeholder ids
prs.slide_layouts[0]  # the first layout of the first slide master
from pptx import Presentation
prs = Presentation("示例文件2.pptx")
slide = prs.slides.add_slide(prs.slide_layouts[0])  # create a slide from the first layout
for shape in slide.placeholders:  # every placeholder on this slide
    phf = shape.placeholder_format
    print(f'{phf.idx}--{shape.name}--{phf.type}')  # idx -- shape name -- placeholder type
Output:
0--Title 1--TITLE (1)