Batch-Replacing Keywords in Excel and Word Files with Python

PythonFun

Last modified: 2023-08-16 17:18:35



First published: 2023-08-16 14:27:26

Article link: https://blog.csdn.net/henanlion/article/details/132318727

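This article presents a Python solution for batch-replacing keywords across many Excel and Word files while keeping the original formatting and removing the source files. It walks through traversing the files, reading the replacement table, and processing docx and xlsx files, along with a few points to watch out for.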


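I. The Problem

Sometimes we have a pile of Excel or Word files on hand, and the boss suddenly asks for a few terms to be changed in all of them. Sound like a nightmare? Each file has to be opened and run through find-and-replace one by one. With a handful of files that is manageable, but with hundreds or thousands you quickly lose track: you can no longer tell which files are done, or whether the ones you did were changed completely.

So the question is: how do we replace keywords across many Excel and Word files without changing their formatting, while also removing the original versions? The Office files may be spread over different folders, so each modified file has to be stored back in its original folder. The program should also run on both Windows and macOS.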

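II. Approach

Because the program has to run on more than one platform, we set VBA aside and solve the problem in Python.

1. Step one: read the keyword table "批量替换表.xlsx" and build a dictionary from it, so that all the replacements can be applied in one run. The script works on docx and xlsx files. Older doc and xls files have to be converted to docx and xlsx first; see the article below.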

Batch-converting doc and xls files to docx and xlsx
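
Applying the dictionary one keyword at a time has a side effect worth knowing about: if the table maps A to B and also B to C, text that was A can end up as C. The sketch below shows one way to apply all pairs in a single pass on a plain string. The function name replace_all is hypothetical, and the sketch assumes every key and value in the dictionary is a non-empty string.

```python
import re


def replace_all(text, mapping):
    """Apply every old -> new pair in `mapping` to `text` in a single pass."""
    if not mapping:
        return text
    # Try longer keywords first, so a keyword wins over its own prefix.
    keys = sorted(mapping, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    # Each match is looked up once; replaced text is never scanned again.
    return pattern.sub(lambda m: mapping[m.group(0)], text)


sample = {"cat": "dog", "dog": "bird"}
print(replace_all("cat and dog", sample))  # prints: dog and bird
```

Two successive str.replace() calls with the same table would turn the sample into "bird and bird" instead, because the second call also rewrites the output of the first.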

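2. Step two: walk through every file in the current directory and its subdirectories, and use an if test on the file extension to pick out the docx and xlsx files. The code is as follows: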

    for root, filefolder, files in os.walk(os.curdir):
        for file in files:
            if file.startswith("~$"):
                continue  # skip the lock files Office creates for open documents
            file_path = os.path.join(root, file)
            if file.endswith(".docx"):
                for key, value in dic.items():
                    word_replace_keywords(file_path, key, value)
            # Leave the replacement table itself untouched.
            elif file.endswith(".xlsx") and file != "批量替换表.xlsx":
                for key, value in dic.items():
                    excel_replace_keywords(file_path, key, value)
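
As a variation on the traversal above, the same filtering can be sketched with pathlib. This is only an illustration: the function name find_office_files and the constant TABLE_NAME are made up for this example, and the case-insensitive extension check and the skipping of "~$" lock files are additions that the original loop does not have.

```python
from pathlib import Path

TABLE_NAME = "批量替换表.xlsx"  # the replacement table, which must not be edited


def find_office_files(base="."):
    """Return (docx_files, xlsx_files) found under `base`, as sorted lists."""
    docx_files, xlsx_files = [], []
    for path in Path(base).rglob("*"):
        # Skip folders and Office lock files such as "~$report.docx".
        if not path.is_file() or path.name.startswith("~$"):
            continue
        suffix = path.suffix.lower()  # so ".DOCX" is treated like ".docx"
        if suffix == ".docx":
            docx_files.append(path)
        elif suffix == ".xlsx" and path.name != TABLE_NAME:
            xlsx_files.append(path)
    return sorted(docx_files), sorted(xlsx_files)


if __name__ == "__main__":
    docs, sheets = find_office_files()
    print(len(docs), "docx files and", len(sheets), "xlsx files found")
```

Collecting the paths first also makes it easy to print the list and check it before any file is modified.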

3. Step three: process the docx and xlsx files separately, using the python-docx and openpyxl modules. A docx file is opened with Document(), and its text is replaced with the following code:

def info_update(doc, old, new):
    for para in doc.paragraphs:
        for run in para.runs:
            if old in run.text:
                run.text = run.text.replace(old, new)
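
One caveat with the run-by-run approach: Word often splits a sentence into several runs, and a keyword that straddles two runs is never found by the test "old in run.text". The sketch below illustrates one possible workaround on plain lists of strings that stand in for the run.text values of a paragraph. The function name replace_across_runs is hypothetical and not part of python-docx; the replacement is placed in the first run that the match touches, so it takes on that run's formatting.

```python
def replace_across_runs(texts, keyword, replacement):
    """Replace `keyword` in the joined run texts, even across run boundaries.

    `texts` is a list of strings, one per run. A new list of the same
    length is returned, so each item can be written back to its run.
    """
    texts = list(texts)
    if not keyword:
        return texts
    start = 0
    while True:
        full = "".join(texts)
        idx = full.find(keyword, start)
        if idx == -1:
            return texts
        end = idx + len(keyword)
        pos = 0
        first_done = False
        for i, t in enumerate(texts):
            run_start, run_end = pos, pos + len(t)
            pos = run_end
            if run_end <= idx or run_start >= end:
                continue  # this run does not overlap the match
            head = t[: max(idx - run_start, 0)]  # text before the match
            tail = t[max(end - run_start, 0):]   # text after the match
            if not first_done:
                texts[i] = head + replacement + tail
                first_done = True
            else:
                texts[i] = tail  # later runs only keep what follows the match
        start = idx + len(replacement)  # never rescan the replacement itself


print(replace_across_runs(["Hel", "lo wor", "ld"], "Hello", "Hi"))
# prints: ['Hi', ' wor', 'ld']
```

With python-docx, the input list could be built as [run.text for run in para.runs], and each returned string assigned back to the matching run.text.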

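Text inside the tables of a docx file is not covered by doc.paragraphs, so each table cell is handled by the function below, which replaces the keyword without changing the formatting of the run it sits in. For xlsx files, the cell values are replaced directly; see excel_replace_keywords in the full listing.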

def replace_cell_text_with_format(cell, keyword, replacement):
    # Table cells hold their own paragraphs; handle them run by run as well.
    for paragraph in cell.paragraphs:
        for run in paragraph.runs:
            if keyword in run.text:
                # Assigning to run.text swaps the text and keeps the run's formatting.
                run.text = run.text.replace(keyword, replacement)

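4. Step four: save the modified file. The full listing below saves each file back to its original path, so the new version overwrites the source file. If you save under a new name instead, delete the original afterwards with os.remove().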
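
If overwriting a file directly feels risky, one alternative is to save to a temporary name first and swap it in only after the save has succeeded. The following is a minimal sketch of that idea, not what the listing below does: the helper name save_in_place is hypothetical, and it assumes the save callable (for example doc.save or wb.save) accepts a path with an arbitrary file name.

```python
import os
from pathlib import Path


def save_in_place(path, save):
    """Write a new version of `path` via `save(tmp_path)`, then swap it in."""
    path = Path(path)
    tmp = path.with_name(path.name + ".tmp")
    try:
        save(str(tmp))         # e.g. doc.save or wb.save
        os.replace(tmp, path)  # replaces the original file with the new one
    finally:
        if tmp.exists():       # only true if saving or swapping failed
            tmp.unlink()
```

A call such as save_in_place(file_path, doc.save) would then stand in for doc.save(file_path); if the save raises an exception, the original file is left as it was.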

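III. The Code

The result is a script of a little over 70 lines that handles many files and many keywords in a single run, keeps the original formatting, and works on both Windows and macOS. The code is as follows: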

import os
from docx import Document
from openpyxl import load_workbook

def info_update(doc, old, new):
    for para in doc.paragraphs:
        for run in para.runs:
            if old in run.text:
                run.text = run.text.replace(old, new)
                
def replace_cell_text_with_format(cell, keyword, replacement):
    # Table cells hold their own paragraphs; handle them run by run as well.
    for paragraph in cell.paragraphs:
        for run in paragraph.runs:
            if keyword in run.text:
                # Assigning to run.text swaps the text and keeps the run's formatting.
                run.text = run.text.replace(keyword, replacement)

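def get_dic():
    # Column A holds the old keywords, column B their replacements.
    workbook = load_workbook('批量替换表.xlsx')
    sht = workbook.active
    dic = {}
    for c1, c2 in zip(sht["A"], sht["B"]):
        if c1.value is not None and c2.value is not None:
            # Store both as text, so numeric keywords can be searched for too.
            dic[str(c1.value)] = str(c2.value)
    return dic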

def word_replace_keywords(file_path, keyword, replacement):
    doc = Document(file_path)
    info_update(doc, keyword, replacement)
    try: 
        for table in doc.tables:
            if not any(cell.text for row in table.rows for cell in row.cells):
                continue  
            for row in table.rows:
                for cell in row.cells:
                    if keyword in cell.text:
                        replace_cell_text_with_format(cell, keyword, replacement)
    except Exception as e:
        print("Error processing table:", e)
            
    doc.save(file_path)

def excel_replace_keywords(file_path, keyword, replacement):
    wb = load_workbook(file_path)
    for sheet in wb.worksheets:
        for row in sheet.iter_rows():
            for cell in row:
                # Only touch text cells, so numbers and dates keep their type.
                if isinstance(cell.value, str) and keyword in cell.value:
                    cell.value = cell.value.replace(keyword, replacement)
    wb.save(file_path)
    wb.close()
    
def get_replaced(dic):
    for root, filefolder, files in os.walk(os.curdir):
        for file in files:
            if file.startswith("~$"):
                continue  # skip the lock files Office creates for open documents
            file_path = os.path.join(root, file)
            if file.endswith(".docx"):
                for key, value in dic.items():
                    word_replace_keywords(file_path, key, value)
            # Leave the replacement table itself untouched.
            elif file.endswith(".xlsx") and file != "批量替换表.xlsx":
                for key, value in dic.items():
                    excel_replace_keywords(file_path, key, value)

def main():
    dic = get_dic()
    get_replaced(dic)

if __name__ == "__main__":
    main()

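The advantages of this script: once the keyword table is filled in, a single run replaces everything, and the same code works on more than one platform. Compared with a VBA solution, we found the Python version faster to run, simpler to operate, and a real saving in time and effort.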