使用python-docx实现对word文档里的字符串、图片批量替换

最新推荐文章于 2025-04-23 10:19:32 发布

GIS开发者

最新推荐文章于 2025-04-23 10:19:32 发布

阅读量7.2k

点赞数 11

分类专栏： Python 文章标签： python docx word 文档替换

本文链接：https://blog.csdn.net/GISuuser/article/details/125619199

版权

Python 专栏收录该内容

30 篇文章

订阅专栏

python-docx 是用于创建和更新Microsoft Word（.docx）文件的Python库。它的API文档也非常简单，看完之后，只能简单理解一些基础用法。最近碰到一个需求，需要对word模版里的内容进行统一替换，替换内容比较多。从网上查到了很多种基于python-docx 的做法，但都有一定的缺陷，不能适用于各种场景。

网上的做法整体是两种：

对paragraph中的文字进行替换。但是这有一个问题，原来整段的文字格式都会丢失。
遍历paragraph.runs中，对其中的每一段文字进行判断和替换。有个漏洞，就是需要被替换的字符串可能会被拆分到多个run中，导致匹配不到

相对比较好的解决办法：

对runs中的内容进行一定程度的拼接，但是有缺点，部分文字的样式可能会消失，可以尽量让每一段文字的样式保持一致来避免这种情况。

安装方法

pip install python-docx

文档中文字替换

# coding=utf-8
from docx import Document
from docx.shared import Cm


def replace_word(doc, tag, pv):
    # replace in paragraph

    for paragraph in doc.paragraphs:
        if tag not in paragraph.text:
            continue
        tmp = ''
        runs = paragraph.runs
        for i, run in enumerate(runs):
            tmp += run.text  # 合并run字符串
            if tag in tmp:
                # 如果存在匹配得字符串，那么将当前得run替换成合并后得字符串
                run.text = run.text.replace(run.text, tmp)
                run.text = run.text.replace(tag, pv)
                tmp = ''
            else:
                # 如果没匹配到目标字符串则把当前run置空
                run.text = run.text.replace(run.text, '')
            if i == len(runs) - 1:
                # 如果是当前段落一直没有符合规则得字符串直接将当前run替换为tmp
                run.add_text(tmp)


if __name__ == '__main__':
    document = Document(r"E:\测试数据\模版.docx")
    replace_word(document, "#{cropName}", "冬小麦")

表格中文字替换

如果需要替换表格中文字，则和上面的方法不一样：

# coding=utf-8
from docx import Document
from docx.shared import Cm


def replace_word(doc, tag, pv):
    for table in doc.tables:
        for row in table.rows:
            for cell in row.cells:
                # 如果只是为了内容，直接替换cell.text,但是为了保存原有格式，需要将每个单元格的文本当作一段看待，以此提取出run来不修改原格式

                for paragraph in cell.paragraphs:
                    if tag in paragraph.text:
                        has_replaced = False
                        for run in paragraph.runs:
                            run.clear()
                            if not has_replaced:
                                run.add_text(pv)
                                has_replaced = True


if __name__ == '__main__':
    document = Document(r"E:\测试数据\模版.docx")
    replace_word(document, "#{cropName}", "冬小麦")

文档中图片替换

# coding=utf-8
from docx import Document
from docx.shared import Cm


def replace_picture(doc, tag, pic, width=14.63):
    for paragraph in doc.paragraphs:
        if tag in paragraph.text:
            has_replaced = False
            for run in paragraph.runs:
                run.clear()
                if not has_replaced:
                    run.add_picture(pic, width=Cm(width))
                    has_replaced = True


if __name__ == '__main__':
    document = Document(r"E:\测试数据\01遥感验标数据及模板\01遥感验标数据及模板\模版.docx")
    replace_picture(document, "#{qztp}1", ur"E:\测试数据\1.jpg")