python-pptx basics: batch-processing pptx files with Python

Magician_liu

Published 2024-07-14 16:03:03

Article link: https://blog.csdn.net/Magician_liu/article/details/140418069

Working with PPT files in Python

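PowerPoint (PPT) is one of the Office applications we use most in everyday work. When a PPT task is tedious and repetitive, scripting it in Python can save a lot of time: for example, changing a keyword wherever it appears in a deck, or replacing the data at a given position with whatever data we want.

python-pptx (imported as pptx) is a third-party library, not part of the Python standard library, so install it from the command line first: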

pip install python-pptx

Note: like python-docx, python-pptx only works with .pptx files; it cannot open legacy .ppt files.

from pptx import Presentation
file_path='./magician.ppt'
prs = Presentation(file_path)
print(prs)

out

raise PackageNotFoundError("Package not found at '%s'" % pkg_file)
pptx.exc.PackageNotFoundError: Package not found at './magician.ppt'
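
In a batch job it can help to filter out such files before opening them. The sketch below relies on the fact that a .pptx file is a ZIP-based package while a legacy .ppt file is not. The helper name `looks_like_pptx` is made up for illustration and is not part of python-pptx; it is only a heuristic, not a full validation.

```python
import zipfile
from pathlib import Path


def looks_like_pptx(path):
    """Heuristic: True if `path` has a .pptx suffix and is a ZIP archive.

    Hypothetical helper, not part of python-pptx. It does not verify that
    the archive really contains a presentation.
    """
    p = Path(path)
    return p.suffix.lower() == ".pptx" and p.is_file() and zipfile.is_zipfile(p)


if __name__ == "__main__":
    # The two paths are the sample names used in this article.
    for name in ["./magician.ppt", "./magician.pptx"]:
        print(name, looks_like_pptx(name))
```

Files that fail the check can be skipped or converted to .pptx in PowerPoint before processing.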

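1. Introduction

Think for a moment: what is a ppt made of?

1. To edit a presentation we first open the corresponding file; in code this is the Presentation object.

2. Inside the file we pick a slide; in code this is the Slide object.

3. Each slide is divided into several blocks (editing areas); in code these are Shape objects, which hold a text area, a picture, and so on.

4. Each text area can contain several paragraphs (paragraph in code), and each paragraph contains runs, which split the paragraph into stretches of text that share the same formatting. This mirrors paragraph in Word.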

For batch operations on Word files, see my other blog post:

python-docx basics: batch-processing docx files with Python

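The structure is shown in the figure below:
[Figure: structure diagram]
In short, a PPT file is a presentation, and its basic structure is: presentation (file) > slide (page) > shape. Shapes fall into two kinds: those that contain text and those that do not (pure pictures, for example).

For a shape that contains text, we can get its inner text frame. A text frame can in turn be treated as a small Word document made of paragraphs (paragraph) and text runs (run).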

The ppt file prepared for the examples has the structure shown below:
[Figure: structure of the sample ppt file]

2. Opening a ppt file

from pptx import Presentation
file_path = './magician.pptx'  # path of the file to open; if omitted, a new empty presentation is created
prs = Presentation(file_path)
print(prs)  # the presentation object

out:

<pptx.presentation.Presentation object at 0x000001AA692D8730>

Process finished with exit code 0

3. Locating slides

prs.slides gives a list-like collection of all the slides in the presentation:

from pptx import Presentation
file_path='./magician.pptx'
prs = Presentation(file_path)
for slide in prs.slides:
    print(slide)

out

<pptx.slide.Slide object at 0x00000265EA739060>
<pptx.slide.Slide object at 0x00000265EA738F10>
<pptx.slide.Slide object at 0x00000265EA738EB0>

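Process finished with exit code 0

The ppt does have three slides, so the result checks out.

4. Getting shapes

Once you are familiar with multi-level structures like those of Excel and Word, the structure of a PPT is easy to understand. Each slide holds a collection of shapes (shape).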

from pptx import Presentation
file_path='./magician.pptx'
prs = Presentation(file_path)
for slide in prs.slides:
    for shape in slide.shapes:
        print(shape)

out:

<pptx.shapes.placeholder.SlidePlaceholder object at 0x00000232BE0B8F10>
<pptx.shapes.placeholder.SlidePlaceholder object at 0x00000232BE0B8E80>
<pptx.shapes.group.GroupShape object at 0x00000232BE0B8F10>
<pptx.shapes.autoshape.Shape object at 0x00000232BE0B8E80>
<pptx.shapes.autoshape.Shape object at 0x00000232BE0B8F10>
<pptx.shapes.group.GroupShape object at 0x00000232BE0B8E80>
<pptx.shapes.autoshape.Shape object at 0x00000232BE0B8F10>
<pptx.shapes.connector.Connector object at 0x00000232BE0B8E80>
<pptx.shapes.connector.Connector object at 0x00000232BE0B8F10>
<pptx.shapes.autoshape.Shape object at 0x00000232BE0B8E80>
<pptx.shapes.group.GroupShape object at 0x00000232BE0B8F10>
<pptx.shapes.autoshape.Shape object at 0x00000232BE0B8E80>
<pptx.shapes.autoshape.Shape object at 0x00000232BE0B8F10>
<pptx.shapes.group.GroupShape object at 0x00000232BE0B8E80>
<pptx.shapes.autoshape.Shape object at 0x00000232BE0B8F10>
<pptx.shapes.connector.Connector object at 0x00000232BE0B8E80>
<pptx.shapes.connector.Connector object at 0x00000232BE0B8F10>
<pptx.shapes.autoshape.Shape object at 0x00000232BE0B8E80>
<pptx.shapes.group.GroupShape object at 0x00000232BE0B8F10>
<pptx.shapes.autoshape.Shape object at 0x00000232BE0B8E80>
<pptx.shapes.autoshape.Shape object at 0x00000232BE0B8F10>
<pptx.shapes.group.GroupShape object at 0x00000232BE0B8E80>
<pptx.shapes.autoshape.Shape object at 0x00000232BE0B8F10>
<pptx.shapes.connector.Connector object at 0x00000232BE0B8E80>
<pptx.shapes.connector.Connector object at 0x00000232BE0B8F10>
<pptx.shapes.autoshape.Shape object at 0x00000232BE0B8E80>
<pptx.shapes.autoshape.Shape object at 0x00000232BE0B8DC0>
<pptx.shapes.autoshape.Shape object at 0x00000232BE0B8E80>
<pptx.shapes.autoshape.Shape object at 0x00000232BE0B8DC0>
<pptx.shapes.autoshape.Shape object at 0x00000232BE0B8E80>
<pptx.shapes.picture.Picture object at 0x00000232BE0B8DC0>
<pptx.shapes.autoshape.Shape object at 0x00000232BE0B8CD0>
<pptx.shapes.autoshape.Shape object at 0x00000232BE0B8DC0>
<pptx.shapes.placeholder.SlidePlaceholder object at 0x00000232BE0B8CD0>
<pptx.shapes.picture.Picture object at 0x00000232BE0B8DC0>
<pptx.shapes.picture.Picture object at 0x00000232BE0B8CD0>

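Process finished with exit code 0

5. Getting text box content

To get the text, it is natural to look one level below shape. From what we learned with Word, we can also infer that text is carried by paragraphs (paragraph) and text runs (run).

The following code is the natural first attempt at getting the text: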

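from pptx import Presentation
prs = Presentation('./magician.pptx')
for slide in prs.slides:
    for shape in slide.shapes:
        # works only when the shape has a text_frame; other shapes (e.g. pictures) raise AttributeError, see section 6
        for paragraph in shape.text_frame.paragraphs:
            print(paragraph.text)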

Or:

from pptx import Presentation
prs = Presentation('./magician.pptx')
for slide in prs.slides:
    for shape in slide.shapes:
        for paragraph in shape.text_frame.paragraphs:
            for run in paragraph.runs:
                print(run.text)

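6. Checking whether a shape has text

As the figure above shows, a picture shape contains no text at all.

Whether a shape holds text depends on whether it contains a text frame.

The attributes related to text frames are:

shape.has_text_frame: whether the shape has a text frame

shape.text_frame: the shape's text frame, through which the text is read

In a PPT the text frame is what carries the text, so first check has_text_frame and then read the text.

The code is as follows:

from pptx import Presentation
prs = Presentation('./magician.pptx')
for slide in prs.slides:
    for shape in slide.shapes:
        if shape.has_text_frame:
            text_frame = shape.text_frame
            print(text_frame.text)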

out




主讲人：***
节目效果炸裂的秘诀


主讲人：***
节目效果炸裂的秘诀


主讲人：***
节目效果炸裂的秘诀



青铜动态图表
开发工具
透视表
函数


王者动态图表

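Process finished with exit code 0

7. Accessing paragraphs and runs

Each text frame can be treated as a small Word file with two levels inside it, paragraphs and runs:

from pptx import Presentation
from pptx.util import Pt
prs = Presentation('./magician.pptx')
for slide in prs.slides:
    for shape in slide.shapes:
        if shape.has_text_frame:
            text_frame = shape.text_frame
            for paragraph in text_frame.paragraphs:
                for run in paragraph.runs:
                    run.text = 'new text content'
                    # run.font.size = Pt(12)  # works as in Word; see my other note on Word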

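Note: to preserve the original formatting of the ppt, set the text through run; assigning to paragraph.text resets the formatting.

8. Writing a ppt

Creating a brand-new PPT works much like creating a Word file: if no path is given when instantiating Presentation, a blank presentation is created.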

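import pptx
ppt = pptx.Presentation(pptx='xxx.pptx')  # omit the argument to start from a blank presentation
# locate the slide, shape, paragraph and run to change
# then assign the new values
ppt.save('output_path.pptx')

9. Illustrated example (understand it at a glance)

[Figure: illustrated example]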