Python Office Automation: Working with PPT (Basic Usage of python-pptx)

牧文山

Last modified: 2022-03-09 18:47:42


First published: 2020-08-25 22:56:34

Article link: https://blog.csdn.net/weixin_42750611/article/details/108029796


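1. Installing the module

On Windows, open a command prompt and run: pip install python-pptx

On macOS, open Terminal and run: pip3 install python-pptx

If the installation fails on Windows, you can point pip at the Tsinghua mirror in China by passing its URL with the -i option:

pip install -i https://pypi.tuna.tsinghua.edu.cn/simple python-pptx

Import the module: import pptx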

2. Reading the content of a PPT document


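First, here is what the basic building blocks of a PPT correspond to in python-pptx:

Slide: a slide, that is, one page of the presentation.

Shape: a box placed on a slide; it can be a drawn shape, a picture, or a text box.

Run: a run of text inside a paragraph, usually just a few characters that share the same formatting.

Paragraph: a paragraph of text inside a shape, often carrying a bullet or a number such as "1.".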

2.1 Slides

1) Getting the slides

prs.slides -> a list-like collection containing every slide

from pptx import Presentation

prs = Presentation('示例文件.pptx')
for slide in prs.slides:
    print(slide)

Output:

<pptx.slide.Slide object at 0x0000000003737318>
<pptx.slide.Slide object at 0x0000000003737228>
<pptx.slide.Slide object at 0x0000000003737818>
<pptx.slide.Slide object at 0x0000000003737408>
<pptx.slide.Slide object at 0x00000000037377C8>
<pptx.slide.Slide object at 0x00000000037376D8>
<pptx.slide.Slide object at 0x0000000003737F98>
<pptx.slide.Slide object at 0x00000000037372C8>
<pptx.slide.Slide object at 0x00000000037373B8>
…

2.2 Shapes

1) Getting the shapes

from pptx import Presentation

prs = Presentation('示例文件.pptx')
for slide in prs.slides:
    for shape in slide.shapes:
        print(shape)

Output:

<pptx.shapes.autoshape.Shape object at 0x000000000379C390>
<pptx.shapes.picture.Picture object at 0x000000000379C4E0>
<pptx.shapes.picture.Picture object at 0x000000000379C0F0>
<pptx.shapes.placeholder.SlidePlaceholder object at 0x000000000379C080>
<pptx.shapes.placeholder.SlidePlaceholder object at 0x000000000379C400>
<pptx.shapes.placeholder.SlidePlaceholder object at 0x000000000379C390>
<pptx.shapes.graphfrm.GraphicFrame object at 0x000000000379C080>
<pptx.shapes.placeholder.SlidePlaceholder object at 0x000000000379C400>

…

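2) Printing the text in a shape

shape.has_text_frame -> whether the shape has a text frame (that is, whether it can contain text)
shape.text_frame -> the shape's text frame

# Get the text on every slide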
from pptx import Presentation

prs = Presentation('示例文件.pptx')
for slide in prs.slides:
    for shape in slide.shapes:
        if shape.has_text_frame:
            text_frame = shape.text_frame
            print(text_frame.text)

# Get the text on one particular slide (index 3 is the fourth slide)
from pptx import Presentation

prs = Presentation('示例文件.pptx')
for i, slide in enumerate(prs.slides):
    if i == 3:
        for shape in slide.shapes:
            if shape.has_text_frame:
                text_frame = shape.text_frame
                print(text_frame.text)

2.3 Paragraphs

1) Printing the paragraphs in a shape

text_frame.paragraphs -> the paragraphs inside a shape
for paragraph in text_frame.paragraphs:
    print(paragraph.text)

from pptx import Presentation

prs = Presentation('示例文件.pptx')
for slide in prs.slides:
    for shape in slide.shapes:
        if shape.has_text_frame:
            text_frame = shape.text_frame
            for paragraph in text_frame.paragraphs:
                print(paragraph.text)

# Note:
# This approach also retrieves the text in each Shape,
# but it is more flexible: get each Shape first, then each paragraph inside it.
# A condition on the paragraph lets us pick out only the n-th paragraph.

3. Writing content to a PPT document

Slide masters and placeholders


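slide_layouts: the layouts. A slide master is made up of several layouts, indexed from 0.

Placeholder: a component of a slide that is defined in the PPT master and its layouts, such as the title or body area.

3.1 Adding a slide and content

1) Choosing a PPT template

a. Using the built-in template

prs = Presentation()
prs.slide_layouts[index]

The built-in template ships with a set of commonly used layouts, selected by index; len(prs.slide_layouts) shows how many are available.

b. Using a custom PPT template

prs = Presentation('template.pptx')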

2) Checking the placeholder idx values

prs.slide_layouts[0] # the first layout of the first slide master

from pptx import Presentation

prs = Presentation("示例文件2.pptx")
slide = prs.slides.add_slide(prs.slide_layouts[0])  # create a new slide from the first layout
for shape in slide.placeholders:         # iterate over every placeholder on this slide
    phf = shape.placeholder_format
    print(f'{phf.idx}--{shape.name}--{phf.type}')  # idx -- placeholder shape name -- placeholder type