Reading Excel Sheets That Contain Merged Cells

studyeboy

Article link: https://blog.csdn.net/studyeboy/article/details/135970299

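This article shows how to read an .xlsx file with Python's pandas library and the openpyxl engine, split the merged cells it contains so that the data can be processed further, and save the split data to a new Excel file.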

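The .xlsx sheet to be read contains merged cells. pandas keeps the value only in the top-left cell of each merged range and leaves the remaining cells empty, so before the data can be processed each merged range has to be split and every cell in it filled with that value. The following function does this: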

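import pandas as pd
from openpyxl import load_workbook


def read_xlsx(file='./intent_data.xlsx', sheet_name=None, header=0):
    """Read an .xlsx file and fill each merged range with its top-left value."""
    # Load the workbook with openpyxl directly (not read-only) so that the
    # merged-cell ranges are available, then hand it to pandas.
    excel = pd.ExcelFile(load_workbook(file), engine="openpyxl")
    sheet_name = sheet_name or excel.sheet_names[0]
    sheet = excel.book[sheet_name]
    df = excel.parse(sheet_name, header=header)

    for item in sheet.merged_cells:
        # bounds is (min_col, min_row, max_col, max_row), 1-based and inclusive
        top_col, top_row, bottom_col, bottom_row = item.bounds
        base_value = item.start_cell.value
        # Convert the 1-based start to a 0-based index; the inclusive 1-based
        # end already equals the exclusive 0-based end of a slice
        top_row -= 1
        top_col -= 1
        # The first rows were consumed as the header, so shift the row coordinates
        if header is not None:
            top_row -= header + 1
            bottom_row -= header + 1
        # Skip ranges that lie entirely inside the header and clamp ranges that
        # start in it, so a negative index never wraps around to the last rows
        if bottom_row <= 0:
            continue
        top_row = max(top_row, 0)
        df.iloc[top_row:bottom_row, top_col:bottom_col] = base_value

    # Set pandas display options to show all columns
    pd.set_option('display.max_columns', None)

    # Set pandas display options to show all rows
    pd.set_option('display.max_rows', None)

    # Show the full DataFrame
    # print(df)

    # Write the DataFrame to a new Excel file; index=False omits the index column
    df.to_excel(file.replace('.xlsx', '_split.xlsx'), index=False, engine="openpyxl")
    return df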
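
As a cross-check, the same result can be reached from the other direction: split the merged ranges inside the workbook with openpyxl first, then let pandas read the cleaned file. The sketch below is an alternative to the function above, not the article's method. The helper name `unmerge_and_fill`, the file names, and the sample data are all made up for illustration, and only the first sheet is handled.

```python
import tempfile
from pathlib import Path

import pandas as pd
from openpyxl import Workbook, load_workbook


def unmerge_and_fill(src, dst):
    """Hypothetical helper: split every merged range on the first sheet
    and copy the range's top-left value into each of its cells."""
    wb = load_workbook(src)
    ws = wb.worksheets[0]
    # Iterate over a copy, because unmerge_cells() mutates ws.merged_cells.
    for rng in list(ws.merged_cells.ranges):
        min_col, min_row, max_col, max_row = rng.bounds  # 1-based, inclusive
        value = ws.cell(row=min_row, column=min_col).value
        ws.unmerge_cells(rng.coord)
        for row in ws.iter_rows(min_row=min_row, max_row=max_row,
                                min_col=min_col, max_col=max_col):
            for cell in row:
                cell.value = value
    wb.save(dst)


# Build a tiny sample workbook in which "fruit" spans rows 2-3 of column A.
tmp_dir = Path(tempfile.mkdtemp())
src = tmp_dir / "demo.xlsx"
dst = tmp_dir / "demo_split.xlsx"

wb = Workbook()
ws = wb.active
ws.append(["category", "item"])
ws.append(["fruit", "apple"])
ws.append([None, "pear"])
ws.append(["veg", "leek"])
ws.merge_cells("A2:A3")
wb.save(src)

unmerge_and_fill(src, dst)
df = pd.read_excel(dst, engine="openpyxl")
print(df)
```

Because the filling happens before pandas parses the sheet, no row offset for the header is needed here. The trade-off is that the workbook is rewritten by openpyxl, whereas the function above only writes the resulting DataFrame.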