Python openpyxl 删除excel有删除线的文字

最新推荐文章于 2024-01-12 09:29:23 发布

天地之心online

最新推荐文章于 2024-01-12 09:29:23 发布

阅读量3.2k

点赞数 4

文章标签： python 开发语言后端

本文链接：https://blog.csdn.net/qq_17246289/article/details/122552551

版权

删除线分为两种, 一是单元格有strike属性, 另一种是单元格无strike属性, 单元格内的部分文本带有strike属性。

单元格的strike判断可以使用如下示例方法：

import openpyxl

book = openpyxl.load_workbook("test.xlsx")

sheet = book.worksheets[0]

for row in sheet.rows:
    for cell in row:
        if cell.font.strike:
            pass

当单元格无strike属性时，由于openpyxl不支持单元格更底层的元素解析，就只能自己写excel文档的解析方法，来获取带有删除线的内容。

xlsx后缀的Excel文档是由一组xml文档压缩而成，将xlsx文档后缀改为zip，可以解压出一组xml文件。在该目录下的./xl/sharedStrings.xml中，可以通过搜索strike属性来查看带有删除线的内容格式。为了自动处理，使用正则表达式对该xml文件进行匹配，并将匹配到的删除线内容替换为空，即可实现该功能。示例代码如下：

import os
import re
import openpyxl
import zipfile


def excel_strike_content_delete(path, final_file):
    extract_path = os.path.join(os.path.split(path)[0], "extract_path")
    if os.path.exists(extract_path):
        shutil.rmtree(extract_path)
        os.mkdir(extract_path)
    else:
        os.mkdir(extract_path)
    f = zipfile.ZipFile(path)
    for file in f.namelist():
        f.extract(file, extract_path)
    f.close()
    xml = extract_path + '/xl/sharedStrings.xml'
    with open(xml, 'rt', encoding="utf-8") as f:
        content = f.readlines()
        content = "".join([line for line in content])

    def replace(matched):
        group0 = matched.group(0)
        group1 = matched.group(1)
        group2 = matched.group(2)
        # print(group0)
        res = group0.replace(group2, "<t></t>").replace(group1, "")
        # print("res: ", res)
        return res

    pattern = r"<rPr>(<strike/>).*?</rPr>(<t>(?:.*?)</t>)"
    # print(type(content))
    line = re.sub("</si><si>", "</si>myLabel<si>", content, flags=re.S)
    # print(type(line))
    ls = line.split("myLabel")
    for i, line in enumerate(ls):
        # print("  ", line)
        line = re.sub(pattern, replace, line, flags=re.S)
        ls[i] = line
        # print(line)

    with open(xml, 'w+', encoding='utf-8') as f:
        f.write("".join(ls))

    f = zipfile.ZipFile(final_file, mode='w')
    for path, dirnames, filenames in os.walk(extract_path):
        # 去掉目标跟路径，只对目标文件夹下边的文件及文件夹进行压缩
        fpath = path.replace(extract_path, '')
        for filename in filenames:
            f.write(os.path.join(path, filename), os.path.join(fpath, filename))
    f.close()
    return os.path.abspath(final_file)

这样就将excel中的删除线部分删除掉了。

天地之心online

关注

4
点赞
踩
13

收藏

觉得还不错? 一键收藏
打赏
3
评论
Python openpyxl 删除excel有删除线的文字

删除线分为两种, 一是单元格有strike属性, 另一种是单元格无strike属性, 单元格内的部分文本带有strike属性。单元格的strike判断可以使用如下实例方法：import openpyxlbook = openpyxl.load_workbook("test.xlsx")sheet = book.worksheets[0]for row in sheet.rows: for cell in row: if cell.font.strike:
复制链接

扫一扫