Dynamically Modify Word Document Content with Python, Preserve Formatting, and Batch-Generate PDFs

雨田Larry

Article link: https://blog.csdn.net/leiliz/article/details/115297817

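This article shows how to use Python's python-docx library (imported as docx) to modify the content of a Word document while preserving its formatting, and how to convert the result to PDF through win32com. First install the required libraries, python-docx and pywin32, then write code that replaces specific text in a Word template, for example 'AME' and 'XX', and finally batch-convert the processed Word documents to PDF.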

Preface

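Suppose you have a Word template that must be filled in with personnel information, and there are hundreds or thousands of people. Filling it in by hand wastes time and is error-prone. If you happen to know Python, read on.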

1. Libraries to install

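Library for working with Word: python-docx (the import name is docx; the unrelated PyPI package named docx is not what this code uses)
pip install python-docx
Library for converting to PDF: win32com, which is installed in Python as pywin32 (Windows only, and Microsoft Word must be installed)
pip install pywin32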

2. Core logic: replacement

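(1) Get the data to fill in. In most cases it comes from Excel (convenient to read with Pandas) or JSON.
(2) In the Word document, put a unique identifier string at each position to be filled in. Keep it as short as possible: I previously used NAME, and it ended up split into N and AME, so the match failed. Then open the Word document in code, find the identifier string, replace it with the source data, and save the document again.
(3) Converting to PDF is then straightforward.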
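
Step (2) warns that an identifier such as NAME can end up split into N and AME. Below is a minimal sketch of one way to handle that case, assuming the pieces sit in consecutive runs of the same paragraph: join the run texts, locate the identifier, and write the value into the first run it touches. The name replace_across_runs is hypothetical and not part of python-docx. It only relies on each run having a writable text attribute, so the demo uses SimpleNamespace stand-ins instead of a real document.

```python
from types import SimpleNamespace


def replace_across_runs(runs, placeholder, value):
    """Replace the first occurrence of placeholder, even if it spans runs.

    runs: sequence of objects with a writable .text attribute
    (for example paragraph.runs from python-docx).
    Returns True if a replacement was made, otherwise False.
    """
    full_text = "".join(run.text for run in runs)
    start = full_text.find(placeholder)
    if start == -1:
        return False
    end = start + len(placeholder)

    offset = 0
    inserted = False
    for run in runs:
        run_start, run_end = offset, offset + len(run.text)
        offset = run_end
        if run_end <= start or run_start >= end:
            continue  # this run does not overlap the placeholder
        # Keep whatever lies outside the placeholder in this run.
        before = run.text[:start - run_start] if start > run_start else ""
        after = run.text[end - run_start:] if end < run_end else ""
        if inserted:
            run.text = before + after
        else:
            # The value inherits the formatting of the first overlapping run.
            run.text = before + str(value) + after
            inserted = True
    return True


# Stand-ins for python-docx runs: "NAME" is split as "N" + "AME".
demo_runs = [
    SimpleNamespace(text="Hello N"),
    SimpleNamespace(text="AME"),
    SimpleNamespace(text="!"),
]
replace_across_runs(demo_runs, "NAME", "张三")
print([run.text for run in demo_runs])
```

With a real document, the call would be replace_across_runs(paragraph.runs, 'NAME', value) for each paragraph. Only the first occurrence in the paragraph is replaced, which is usually enough for a unique identifier.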

The code that replaces the Word content is as follows:

from docx import Document
import pandas as pd
import json

def replaceText(wb, t, value):
    # Replace the identifier t with value in body paragraphs and in tables
    for x in wb.paragraphs:
        # Keep t short, ideally a single run; otherwise it may be split across runs.
        # If a replacement fails, debug here and inspect x.text and x.runs
        if t in x.text:
            for run in x.runs:  # Editing the text inside each run preserves its formatting
                if t in run.text:
                    run.text = run.text.replace(t, str(value))

    for table in wb.tables:  # Iterate over all tables in the document
        for row in table.rows:  # Iterate over all rows in the table
            for cell in row.cells:  # Iterate over all cells in the row
                if t in cell.text:
                    for paragraph in cell.paragraphs:
                        if t in paragraph.text:
                            for run in paragraph.runs:
                                if t in run.text:
                                    run.text = run.text.replace(t, str(value))

# Centering in a Word table: pad the string with leading spaces. The 11 is the maximum number of characters the cell holds without wrapping
def getCenterText(text):
    text = text.replace(' ', '')
    for i in range(11 - len(text)):
        text = " " + text
    return text

# Entry point
if __name__ == '__main__':
    # loan_data = pd.read_excel(r"C:\Users\Administrator\Desktop\排名\汇总.xlsx",
    #                           sheet_name="Sheet1", header=0, names=None, index_col=0)
    # jsonstr = loan_data.to_json(orient='records', force_ascii=False)

    loan_data = [
        {"AME": "张三", "XX": "优秀"},
        {"AME": "李四", "XX": "良好"}
    ]

    for j in loan_data:
        wb = Document(r"C:\Users\Administrator\Desktop\排名\模版.docx")
        replaceText(wb, 'AME', j.get('AME'))  # Replace AME in the Word document with 张三 or 李四
        replaceText(wb, 'XX', getCenterText(j.get('XX')))  # Table data needs to be centered
        wb.save(r"C:\Users\Administrator\Desktop\排名\结果(%s).docx" % j.get('AME'))
        print(j.get('AME'))
    print("Done")
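
The script above hard-codes loan_data, while step (1) says the data usually comes from Excel or JSON. Here is a minimal sketch of loading the same list of dictionaries from JSON text with the standard library. The name load_records and the sample text are hypothetical. Converting every value to a string (with null becoming an empty string) is an assumption, made so that getCenterText, which calls text.replace, does not fail on missing or numeric values.

```python
import json


def load_records(json_text):
    """Parse a JSON array of objects into a list of dicts with string values."""
    records = json.loads(json_text)
    if not isinstance(records, list):
        raise ValueError("expected a JSON array of objects")
    cleaned = []
    for record in records:
        # None (JSON null) becomes "", everything else is converted with str().
        cleaned.append(
            {key: "" if val is None else str(val) for key, val in record.items()}
        )
    return cleaned


# Hypothetical sample using the same identifiers as the script above.
sample = '[{"AME": "张三", "XX": "优秀"}, {"AME": "李四", "XX": null}]'
loan_data = load_records(sample)
print(loan_data)
```

For Excel input, pandas' read_excel followed by to_dict(orient="records") should give a list of the same shape, assuming pandas and an Excel engine are installed.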

The code that converts to PDF is as follows:

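from win32com.client import Dispatch
from os import walk
from os.path import join, splitext

wdFormatPDF = 17  # Word's file-format constant for PDF
def doc2pdf(input_file):
    word = Dispatch('Word.Application')
    try:
        doc = word.Documents.Open(input_file)
        # Build the output name with splitext so that .doc files also get a .pdf name
        doc.SaveAs(splitext(input_file)[0] + ".pdf", FileFormat=wdFormatPDF)
        doc.Close()
    finally:
        word.Quit()  # Quit Word even if opening or saving fails

# Entry point
if __name__ == '__main__':
    # Convert every Word document under this folder to PDF
    directory = "C:\\Users\\Administrator\\Desktop\\排名"
    for root, dirs, filenames in walk(directory):
        for file in filenames:
            print(file)
            # Skip the temporary lock files Word creates, whose names start with ~$
            if file.lower().endswith((".doc", ".docx")) and not file.startswith("~$"):
                doc2pdf(join(root, file))
    print("All done")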
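
doc2pdf above starts and quits Word once per file. The following is a hedged sketch of a variant that collects the files first and reuses a single Word instance, which may be faster for large batches. The names find_word_files and convert_all are hypothetical. The win32com import sits inside convert_all because it only works on Windows with Word installed, and the sketch uses only the COM calls already used above (Dispatch, Documents.Open, SaveAs, Close, Quit).

```python
from pathlib import Path

WD_FORMAT_PDF = 17  # same value as wdFormatPDF in the script above


def find_word_files(directory):
    """Return .doc/.docx files under directory, skipping ~$ lock files."""
    return sorted(
        path
        for path in Path(directory).rglob("*")
        if path.is_file()
        and path.suffix.lower() in {".doc", ".docx"}
        and not path.name.startswith("~$")
    )


def convert_all(directory):
    """Convert every Word file under directory to PDF with one Word instance."""
    # Imported here so the module can be loaded on machines without pywin32.
    from win32com.client import Dispatch

    files = find_word_files(directory)
    word = Dispatch("Word.Application")
    try:
        for path in files:
            source = path.resolve()  # Word is given an absolute path
            doc = word.Documents.Open(str(source))
            try:
                doc.SaveAs(str(source.with_suffix(".pdf")), FileFormat=WD_FORMAT_PDF)
            finally:
                doc.Close()
    finally:
        word.Quit()
    return files
```

Calling convert_all on the folder used above would write each PDF next to its Word file. Note that the template itself is also converted if it lives in the same folder.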