[Python] Working with Word and PowerPoint Files in Python

Most recently recommended article published on 2025-05-28 20:48:34

宅男很神经

Tags: python

Article link: https://blog.csdn.net/yangjia96/article/details/148215994

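A Deep Dive into Working with Word and PowerPoint Files in Python

1. Overview of the Microsoft Office Open XML Formats (.docx, .pptx)

Before turning to the Python libraries, it helps to understand what Word (.docx) and PowerPoint (.pptx) files actually are. Unlike the older binary .doc and .ppt formats, .docx and .pptx files are based on the Office Open XML standard (ECMA-376, ISO/IEC 29500).

Open XML standard: An open, XML-based file format standard for word-processing documents (WordprocessingML), presentations (PresentationML), and spreadsheets (SpreadsheetML).
ZIP archive: A .docx or .pptx file is in fact a standard ZIP archive.
Internal structure: Unzip one of these files and you will find a directory tree containing many XML files, media files (such as images), and .rels files that define the relationships between the parts.
- In a .docx file, the core part is usually word/document.xml, which defines the main content of the document (paragraphs, text, tables, and so on). word/styles.xml defines the styles, word/media/ holds the images, and word/_rels/document.xml.rels defines the relationships between document.xml and other parts (such as images, headers and footers, and comments).
- In a .pptx file, the core part is ppt/presentation.xml, which defines the structure of the presentation (the slide list, masters, layouts, and so on). The content of each slide lives in ppt/slides/slideX.xml, ppt/media/ holds the images, ppt/theme/themeX.xml defines the theme, ppt/slideLayouts/slideLayoutX.xml defines the slide layouts, and ppt/_rels/presentation.xml.rels and ppt/slides/_rels/slideX.xml.rels define the various relationships.

The Python libraries that handle these formats (python-docx, python-pptx) work by parsing and generating this XML file structure and wrapping it in a Python object model that is easier to work with.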
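To make the "ZIP archive of XML parts" point concrete, here is a minimal sketch that lists the parts inside a package using only the standard library. The helper name list_package_parts is made up for illustration, and the in-memory archive is a tiny stand-in rather than a valid Word document; with a real file you would pass its path instead.

```python
import io
import zipfile


def list_package_parts(package_file):
    """Return the sorted part names inside an Open XML package (.docx/.pptx)."""
    with zipfile.ZipFile(package_file) as package:
        return sorted(package.namelist())


# Build a tiny stand-in package in memory so the sketch runs without a real file.
# A real .docx contains more parts (styles, theme, settings, media, ...).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as package:
    package.writestr("[Content_Types].xml", "<Types/>")
    package.writestr("_rels/.rels", "<Relationships/>")
    package.writestr("word/document.xml", "<document/>")
    package.writestr("word/_rels/document.xml.rels", "<Relationships/>")
buffer.seek(0)

parts = list_package_parts(buffer)
for name in parts:
    print(name)
```

Calling list_package_parts("example.docx") on a real document should show the same kind of layout described above, with word/document.xml at its center.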

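2. Working with Word Files in Python: python-docx

python-docx is a Python library for creating and updating Microsoft Word .docx files. It cannot read .doc files, and it has no high-level API for inserting text in the middle of an existing paragraph (add_run() appends to the end of a paragraph; otherwise you create a new paragraph). Its strength lies in manipulating paragraphs, runs (spans of text that share the same formatting), tables, pictures, styles, and headers and footers. Footnotes and endnotes have historically not been covered by the high-level API and generally require lower-level XML work.

Installing python-docx:

pip install python-docx

2.1 Reading .docx Files: Basics

Reading a Word file usually involves the following steps:

Open the document (Document).
Access the content of the document (paragraphs, tables).
Iterate over the runs in each paragraph, or the cells in each table, to read the text.

Code example: reading basic document content

First we need a .docx file to read. We can create a simple one by hand or write one with python-docx (writing is covered later). Suppose we have a file named example_read.docx whose content looks roughly like this: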

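# Sample Word Document (Title style)
This is a normal paragraph.

This is another paragraph containing **bold** and *italic* text.

Table example:
| Column 1 | Column 2 |
|----------|----------|
| Data A | Data B |
| Data C | Data D |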

import os

import docx

# The file to read
file_to_read = "example_read.docx"

# To keep the example self-contained, first create a simple file with python-docx.
# In real use you would normally read a file that already exists.
def create_sample_docx(filename):
    # Create a new document object
    document = docx.Document()

    # Add a title paragraph using the built-in 'Title' style
    document.add_paragraph("Sample Word Document", style='Title')
    # Add a plain paragraph
    document.add_paragraph("This is a normal paragraph.")

    # Add another paragraph whose runs carry different formatting
    paragraph_with_runs = document.add_paragraph("This is another paragraph containing ")
    # Add a Run object to the paragraph
    run_bold = paragraph_with_runs.add_run("bold")
    # Make the run bold
    run_bold.bold = True
    # Keep adding text to the same paragraph
    paragraph_with_runs.add_run(" and ")
    run_italic = paragraph_with_runs.add_run("italic")
    # Make the run italic
    run_italic.italic = True
    paragraph_with_runs.add_run(" text.")

    # Add a paragraph that introduces the table
    document.add_paragraph("Table example:")

    # Add a table (number of rows, number of columns)
    table = document.add_table(rows=2, cols=2)
    # Apply a built-in table style (if available)
    table.style = 'Table Grid'

    # Fill the header row
    cell_0_0 = table.cell(0, 0)         # cell at row 0, column 0
    cell_0_0.text = "Column 1"          # set the cell text
    table.cell(0, 1).text = "Column 2"  # cell at row 0, column 1

    # Fill the data rows
    table.cell(1, 0).text = "Data A"
    table.cell(1, 1).text = "Data B"
    # Add another row
    table.add_row().cells[0].text = "Data C"  # add a row and fill its first cell
    table.rows[2].cells[1].text = "Data D"    # second cell of the row just added

    # Save the document
    document.save(filename)
    print(f"Created sample Word file for reading: {filename}")

# Make sure the sample file exists
create_sample_docx(file_to_read)

print(f"\nReading file content: {file_to_read}")

try:
    # Open the existing .docx document
    document = docx.Document(file_to_read)
    print("Document opened successfully.")

    # Access every paragraph in the document
    print("\n--- Paragraphs in the document ---")
    # document.paragraphs is a list of all Paragraph objects
    for paragraph in document.paragraphs:
        # paragraph.text is the full text of the paragraph (without formatting information)
        print(f"Paragraph text: {paragraph.text}")
        # paragraph.style is the style object applied to the paragraph;
        # paragraph.style.name is the style's name
        print(f"Paragraph style: {paragraph.style.name}")
        print("--- Runs in the paragraph ---")
        # A paragraph consists of one or more runs, each with uniform formatting.
        # paragraph.runs is a list of all Run objects
        for run in paragraph.runs:
            # run.text is the text of the run
            print(f"  Run text: '{run.text}'")
            # run.bold is True, False, or None (None means inherited from the style)
            print(f"  Run bold: {run.bold}")
            # run.italic follows the same True / False / None convention
            print(f"  Run italic: {run.italic}")
            # run.underline reports the underline setting of the run
            print(f"  Run underline: {run.underline}")
            # run.font gives access to the font object of the run
            # if run.font and run.font.color:
            #      print(f"  Run font color: {run.font.color.rgb}")  # font color needs a deeper lookup

    # Access every table in the document
    print("\n--- Tables in the document ---")
    # document.tables is a list of all Table objects
    for table_index, table in enumerate(document.tables):
        print(f"Found table {table_index + 1}")
        # table.style is the style object applied to the table
        # if table.style:
        #      print(f"  Table style: {table.style.name}")

        # Iterate over the rows of the table
        # table.rows is a sequence of Row objects
        for row_index, row in enumerate(table.rows):
            row_cells_text = []  # collects the text of every cell in this row
            # Iterate over the cells in the row
            # row.cells is a sequence of Cell objects
            for cell_index, cell in enumerate(row.cells):
                # cell.text is the text content of the cell
                row_cells_text.append(cell.text)
                # You can also go deeper via cell.paragraphs and cell.tables
                # (a cell can contain paragraphs and nested tables)
            print(f"  Table row {row_index + 1}: {row_cells_text}")

except FileNotFoundError:
    print(f"Error: file '{file_to_read}' not found.")
except Exception as e:
    # print() has no exc_info parameter; use traceback.print_exc() for a full traceback
    print(f"An error occurred while reading the Word file: {e}")
finally:
    # Clean up the sample file
    if os.path.exists(file_to_read):
        # os.remove(file_to_read)
        print(f"\nNote: sample file '{file_to_read}' was kept; delete it manually if it is not needed.")

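This basic reading example shows:

Opening a .docx file with docx.Document(filepath).
Accessing all paragraphs in the document through document.paragraphs.
Getting the full text of a paragraph from the Paragraph object's .text property.
Getting the list of Run objects in a paragraph from its .runs property; each run is a span of text with uniform formatting.
Reading a run's text and basic formatting from its .text, .bold, .italic, and .underline properties. A value of None means the setting is inherited from the style rather than set directly on the run.
Accessing all tables in the document through document.tables.
Getting the rows from a Table object's .rows property and the cells from a Row object's .cells property.
Getting the text of a cell from the Cell object's .text property.

Important: when reading with python-docx, paragraphs and tables are the main top-level content blocks. document.paragraphs and document.tables are separate lists, so neither one preserves the relative order of paragraphs and tables in the body. Other content such as pictures and shapes is not directly reachable through them either: inline pictures are exposed through document.inline_shapes, while floating shapes generally require working with the underlying drawing XML.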
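Because paragraphs and tables are sibling blocks inside the body of word/document.xml, their original order can be recovered by walking the body's children directly. The following is a minimal sketch using only the standard library; iter_body_blocks and the hand-written SAMPLE XML are illustrative stand-ins, not part of python-docx.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def _text_of(element):
    """Concatenate the text of every w:t descendant."""
    return "".join(t.text or "" for t in element.iter(f"{{{W_NS}}}t"))


def iter_body_blocks(document_xml):
    """Yield ('paragraph', text) or ('table', rows) for each top-level block, in order."""
    root = ET.fromstring(document_xml)
    body = root.find(f"{{{W_NS}}}body")
    for child in body:
        if child.tag == f"{{{W_NS}}}p":
            yield "paragraph", _text_of(child)
        elif child.tag == f"{{{W_NS}}}tbl":
            rows = [
                [_text_of(cell) for cell in row.findall(f"{{{W_NS}}}tc")]
                for row in child.findall(f"{{{W_NS}}}tr")
            ]
            yield "table", rows


# A hand-written fragment standing in for a real word/document.xml part.
SAMPLE = f"""<w:document xmlns:w="{W_NS}"><w:body>
<w:p><w:r><w:t>Intro</w:t></w:r></w:p>
<w:tbl><w:tr><w:tc><w:p><w:r><w:t>A</w:t></w:r></w:p></w:tc>
<w:tc><w:p><w:r><w:t>B</w:t></w:r></w:p></w:tc></w:tr></w:tbl>
<w:p><w:r><w:t>Outro</w:t></w:r></w:p>
</w:body></w:document>"""

blocks = list(iter_body_blocks(SAMPLE))
for kind, content in blocks:
    print(kind, content)
```

With a real file, the XML would come from zipfile.ZipFile(path).read("word/document.xml"). This sketch ignores merged cells, nested tables, and anything that is not a paragraph or a table.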

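2.2 Writing .docx Files: Basics

Writing a Word file usually involves the following steps:

Create a new document object (Document), or open an existing document to use as a template.
Add content to the document (paragraphs, tables).
Add text to the paragraphs and format it as needed.
Save the document to a file.

Code example: writing basic document content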

import datetime
import os

import docx

# Name of the file to write
file_to_write = "example_write.docx"

print(f"\nWriting content to: {file_to_write}")

try:
    # Create a new blank document
    document = docx.Document()
    print("Created a new document object.")

    # --- Add paragraphs ---

    # Add a plain paragraph
    document.add_paragraph("This is the first paragraph written.")
    print("Added the first paragraph.")

    # Add a paragraph with a built-in style
    # Any built-in Word style can be named here, such as 'Title' or 'Heading 1'
    document.add_paragraph("This is a paragraph with a heading style.", style='Heading 1')
    print("Added a paragraph with the 'Heading 1' style.")

    # Add a paragraph made of several runs with different formatting
    p = document.add_paragraph("This paragraph contains ")  # initial text of the paragraph
    print("Started a paragraph with multiple runs.")
    # Add a run to the paragraph and make it bold
    run_bold = p.add_run("some bold text")
    run_bold.bold = True
    print("  Added a bold run.")
    # Add a plain-text run
    p.add_run(" and ")
    print("  Added a plain-text run.")
    # Add a run that is italic and underlined
    run_italic_underline = p.add_run("italic underlined text")
    run_italic_underline.italic = True
    run_italic_underline.underline = True  # True, False, or a docx.enum.text.WD_UNDERLINE member
    print("  Added an italic, underlined run.")
    # End of the paragraph
    p.add_run(".")
    print("Finished the multi-run paragraph.")

    # --- Add a table ---

    document.add_paragraph("The following table was written by the script:")  # lead-in paragraph

    # Add a table (number of rows, number of columns)
    # rows is the initial row count (including the header); cols is the column count
    table = document.add_table(rows=1, cols=3)
    print("Added a 1x3 table.")
    # Apply a built-in table style
    # Available style names depend on the template; 'Table Grid' is widely available
    table.style = 'Table Grid'
    print("Applied the 'Table Grid' table style.")

    # Write into the initial row (usually the header)
    # table.rows[0] is the first row; table.rows[0].cells are its cells
    table.rows[0].cells[0].text = "Column A"
    table.rows[0].cells[1].text = "Column B"
    table.rows[0].cells[2].text = "Column C"
    print("Wrote the table header.")

    # Add more data rows and fill them in
    data_rows = [
        [1, "Apple", datetime.date(2023, 10, 27)],
        [2, "Banana", datetime.date(2023, 10, 28)],
        [3, "Cherry", datetime.date(2023, 10, 29)]
    ]

    for row_data in data_rows:
        # table.add_row() appends a new row and returns the new Row object
        new_row = table.add_row()
        # Write each value into the matching cell of the new row
        for col_index, cell_value in enumerate(row_data):
            # cell.text accepts strings only, so convert every value with str()
            new_row.cells[col_index].text = str(cell_value)

        print(f"Added and filled a data row: {row_data}")

    # --- Save the document ---
    document.save(file_to_write)

    print(f"Successfully wrote content to: {file_to_write}")

except Exception as e:
    # print() has no exc_info parameter; use traceback.print_exc() for a full traceback
    print(f"An error occurred while writing the Word file: {e}")
finally:
    # Note: the file is kept so that it can be inspected
    if os.path.exists(file_to_write):
        print(f"\nNote: generated file '{file_to_write}' was kept; delete it manually.")

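This basic writing example shows:

Creating a new blank document with docx.Document().
Adding paragraphs with document.add_paragraph(text, style), which takes the text and an optional built-in style.
Keeping a reference to the paragraph (p = document.add_paragraph(...)), adding runs with p.add_run(text), and setting the formatting properties of each run (.bold, .italic, .underline, and so on).
Adding a table with document.add_table(rows, cols), which takes the initial number of rows and columns, and optionally applying a table style (table.style).
Reaching a specific cell through table.rows[index].cells[index] and writing text by setting its .text property.
Appending a new row to the table with table.add_row().
Saving the document with document.save(filepath).

Notes on writing:

python-docx is mainly aimed at creating and modifying content. Basic layout settings such as paragraph spacing, line spacing, and page margins are exposed through paragraph_format and the section properties; anything finer usually goes through styles or lower-level XML manipulation.
Setting a cell's .text directly replaces everything the cell contained (paragraphs, runs, nested tables, and so on). To append content inside a cell, work with the paragraphs inside it instead.
The .text properties accept strings only, so non-string values (numbers, booleans, dates) must be converted first, for example str(datetime.date.today()).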

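2.3 Advanced .docx Operations

Working with styles (Styles):

Word documents rely heavily on styles to control formatting (fonts, paragraph spacing, heading levels, and so on). python-docx lets you access the styles in a document and create, modify, or apply them.

document.styles: the collection of styles in the document.
document.styles['Style Name']: looks up a specific style object by name.
A paragraph object's .style: gets or sets the style applied to the paragraph.
A run object's .style: gets or sets the character style applied to the run.

Code example: working with and applying styles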

import os

import docx
from docx.enum.style import WD_STYLE_TYPE
from docx.shared import Inches, Pt, RGBColor  # length units and RGB colors

file_styles = "document_styles.docx"

print(f"\nWriting content to: {file_styles}")

try:
    # Create a new document
    document = docx.Document()
    print("Created a new document object.")

    # --- Inspect the built-in styles ---
    # document.styles contains every style; filter on style.type
    # (WD_STYLE_TYPE.PARAGRAPH, WD_STYLE_TYPE.CHARACTER, etc.)
    print("\n--- Built-in paragraph styles available in the document ---")
    for style in document.styles:
        if style.type == WD_STYLE_TYPE.PARAGRAPH and style.builtin:
            print(f"  Built-in paragraph style: {style.name}")

    # Apply common built-in styles directly by name
    document.add_paragraph("This is a 'Normal' style paragraph.", style='Normal')
    document.add_paragraph("This is a 'Heading 1' style paragraph.", style='Heading 1')
    document.add_paragraph("This is a 'Heading 2' style paragraph.", style='Heading 2')
    document.add_paragraph("This is an 'Intense Quote' style paragraph.", style='Intense Quote')

    # --- Create a custom paragraph style ---
    # New styles are usually based on an existing one; here we use Normal.
    # add_style() needs a unique name and raises ValueError if it is already in use.
    normal_style = document.styles['Normal']
    custom_para_style = document.styles.add_style('Custom Paragraph Style', WD_STYLE_TYPE.PARAGRAPH)
    custom_para_style.base_style = normal_style  # set the parent style

    # Font properties of the style
    custom_para_style.font.name = 'Arial'
    custom_para_style.font.size = Pt(12)
    custom_para_style.font.bold = True
    custom_para_style.font.color.rgb = RGBColor(0x42, 0x24, 0xE9)  # RGB color

    # Paragraph format properties of the style
    custom_para_style.paragraph_format.space_after = Pt(12)      # space after the paragraph
    custom_para_style.paragraph_format.left_indent = Inches(0.5)  # left indent

    # Add a paragraph that uses the custom style
    document.add_paragraph("This paragraph uses the custom paragraph style.", style='Custom Paragraph Style')
    print("Created and applied a custom paragraph style.")

    # --- Create a custom character style (WD_STYLE_TYPE.CHARACTER) ---
    custom_char_style = document.styles.add_style('Highlight Character Style', WD_STYLE_TYPE.CHARACTER)
    custom_char_style.font.color.rgb = RGBColor(0xFF, 0x00, 0x00)  # red
    custom_char_style.font.underline = True

    # Add a paragraph, then a run that carries the character style
    p_char = document.add_paragraph("Text in a paragraph with ")
    run_highlight = p_char.add_run("highlighted text")
    run_highlight.style = custom_char_style
    p_char.add_run(".")
    print("Created and applied a custom character style.")

    # Note: changing properties on a style object changes every element that uses that style.
    # To change only one paragraph or run, set its own .paragraph_format or .font
    # properties instead (this creates a local override).

    # --- Save the document ---
    document.save(file_styles)
    print(f"Successfully wrote content to: {file_styles}")

except KeyError as e:
    # Looking up a style name that does not exist raises KeyError
    print(f"Style not found: {e}")
except Exception as e:
    print(f"An error occurred while writing the Word file: {e}")
finally:
    if os.path.exists(file_styles):
        print(f"\nNote: generated file '{file_styles}' was kept; delete it manually.")

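This example shows how to work with styles in python-docx:

Accessing the style collection through document.styles.
Looking up a specific style object by name (document.styles['Style Name']).
Applying a style by assigning the style object (or its name) to a paragraph's .style or a run's .style property.
Creating custom styles: document.styles.add_style(name, type) creates a new style, whose base_style can then be set.
Modifying style properties: the .font and .paragraph_format properties of a style object change its font and paragraph formatting. These changes affect every element in the document that uses the style.
Formatting a run or paragraph directly: setting a run's .font or a paragraph's .paragraph_format properties creates a local override that takes precedence over the style.

Styles are at the heart of Word formatting. Working with them through python-docx gives you consistent formatting across a document.

Working with pictures (Pictures):

Pictures can be embedded in Word documents, and python-docx lets you add them.

document.add_picture(image_path, width=None, height=None): adds a picture at the end of the document.
A run object's .add_picture(image_path, width=None, height=None): adds a picture at the position of the run.

Code example: adding pictures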

import os

import docx
from docx.shared import Inches  # Inches expresses lengths in inches

file_with_image = "document_with_image.docx"

# An image file is needed for this test.
# To keep the example self-contained we create a simple dummy image
# (requires the Pillow library: pip install Pillow).
dummy_image_path = "dummy_image.png"
try:
    from PIL import Image as PILImage
    img_pil = PILImage.new('RGB', (100, 50), color=(255, 165, 0))  # orange rectangle
    img_pil.save(dummy_image_path)
    print(f"Created dummy image file: {dummy_image_path}")
except ImportError:
    print("Note: Pillow is not installed, so no dummy image can be created. Skipping the picture example.")
    dummy_image_path = None  # mark the image as unavailable
except Exception as e:
    print(f"Error while creating the dummy image file: {e}")
    dummy_image_path = None

if dummy_image_path and os.path.exists(dummy_image_path):
    print(f"\nWriting content to: {file_with_image}")
    try:
        document = docx.Document()
        print("Created a new document object.")

        document.add_paragraph("Adding a picture to the document:")

        # --- Add a picture at the end of the document ---
        # If only the width or only the height is given, python-docx scales proportionally.
        # Lengths can be expressed with Inches, Cm, Pt, Emu, and so on.
        document.add_picture(dummy_image_path, width=Inches(3.0))  # picture 3 inches wide
        print(f"Added picture '{dummy_image_path}' at the end of the document.")

        document.add_paragraph("Adding a picture inside a paragraph:")

        # --- Add a picture inside a paragraph ---
        # The picture is placed inline in a run of the paragraph
        p_image = document.add_paragraph("The picture goes here: ")
        run_with_image = p_image.add_run()  # an empty run to hold the picture
        # Add the picture to the run, 0.5 inches high
        run_with_image.add_picture(dummy_image_path, height=Inches(0.5))
        p_image.add_run(".")  # text after the picture
        print("Added a picture inside a paragraph.")

        # --- Save the document ---
        document.save(file_with_image)
        print(f"Successfully wrote content to: {file_with_image}")

    except Exception as e:
        print(f"An error occurred while writing the Word file: {e}")
    finally:
        # Keep the generated document, remove the dummy image
        if os.path.exists(file_with_image):
            # os.remove(file_with_image)
            print(f"\nNote: generated file '{file_with_image}' was kept; delete it manually.")
        if os.path.exists(dummy_image_path):
            os.remove(dummy_image_path)
            print(f"Removed dummy image file: {dummy_image_path}")

else:
    print("Skipping the picture example because no image file is available.")

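This example shows how to add pictures to a Word document with add_picture(). A picture can be appended at the end of the document or placed inline at a specific run within a paragraph. If you specify only the width or only the height, python-docx scales the other dimension to preserve the aspect ratio.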
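The proportional scaling mentioned above is simple arithmetic. The sketch below is an approximation of that behavior written for illustration, not python-docx's own code; scaled_size and its parameters are hypothetical names, and lengths are returned in EMU (914400 per inch), the unit Word uses internally.

```python
EMU_PER_INCH = 914400


def scaled_size(native_px, dpi=72, width_in=None, height_in=None):
    """Return (width, height) in EMU, keeping the aspect ratio when one side is omitted."""
    px_w, px_h = native_px
    if width_in is None and height_in is None:
        # No size given: fall back to the native size implied by pixels and DPI.
        width_in, height_in = px_w / dpi, px_h / dpi
    elif height_in is None:
        # Only the width was given: derive the height from the pixel ratio.
        height_in = width_in * px_h / px_w
    elif width_in is None:
        # Only the height was given: derive the width from the pixel ratio.
        width_in = height_in * px_w / px_h
    return round(width_in * EMU_PER_INCH), round(height_in * EMU_PER_INCH)


# The 100x50 dummy image from the example, placed 3 inches wide and 0.5 inches high.
print(scaled_size((100, 50), width_in=3.0))
print(scaled_size((100, 50), height_in=0.5))
```

For the 100x50 image, a width of 3 inches implies a height of 1.5 inches, and a height of 0.5 inches implies a width of 1 inch.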

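Working with headers and footers (Headers and Footers):

Word documents can have headers and footers that repeat on every page. python-docx lets you access and modify them.

document.sections: a document is divided into one or more sections, and each section can have its own header and footer.
section.header: the header object of the section.
section.footer: the footer object of the section.
Header and footer objects resemble the document object: they contain paragraphs (.paragraphs) and tables (.tables), and you work with them just as you would with the document body.

Code example: working with headers and footers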

import os

import docx

file_headers_footers = "document_headers_footers.docx"

print(f"\nWriting content to: {file_headers_footers}")

try:
    document = docx.Document()
    print("Created a new document object.")

    # --- Add a header and a footer ---

    # Access the first section (a new document has exactly one)
    section = document.sections[0]
    print(f"Accessed the first section of the document: {section}")

    # Access the header of the section
    header = section.header
    # A header contains paragraphs; by default there may be one empty paragraph
    print(f"Accessed the header of the section: {header}")

    # Clear the default header content (if any)
    # for paragraph in header.paragraphs:
    #     paragraph.clear()  # remove the paragraph's content
    # or simply overwrite the first paragraph
    if len(header.paragraphs) > 0:
        header.paragraphs[0].text = "This is the custom document header"
    else:
        header.add_paragraph("This is the custom document header")  # add one if none exists

    # More paragraphs or tables can be added to the header just as in the body
    # p_header = header.add_paragraph("Extra text in the header")
    # run_header = p_header.add_run(" (bold)")
    # run_header.bold = True

    # Access the footer of the section
    footer = section.footer
    print(f"Accessed the footer of the section: {footer}")

    # A page number in Word is a field; the high-level python-docx API only adds plain text.
    # A real page-number field needs lower-level XML work or a template.
    # Plain-text example:
    if len(footer.paragraphs) > 0:
        footer.paragraphs[0].text = "Document footer - page X of Y"  # placeholder, not a live page number
        # To right-align the text:
        # from docx.enum.text import WD_ALIGN_PARAGRAPH
        # footer.paragraphs[0].alignment = WD_ALIGN_PARAGRAPH.RIGHT
    else:
        footer.add_paragraph("Document footer")

    # --- Add body content ---
    # Roughly 20 short paragraphs; increase the count if the result still fits on one page
    for i in range(1, 20):
        document.add_paragraph(f"This is paragraph {i}, added as filler content.")
    print("Added filler content.")

    # --- Save the document ---
    document.save(file_headers_footers)
    print(f"Successfully wrote content to: {file_headers_footers}")

except Exception as e:
    print(f"An error occurred while writing the Word file: {e}")
finally:
    if os.path.exists(file_headers_footers):
        # os.remove(file_headers_footers)
        print(f"\nNote: generated file '{file_headers_footers}' was kept; delete it manually.")

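This example shows how to access a document's sections and, through a section object, its header and footer. Header and footer objects are used much like the document object itself: you can add paragraphs, text, and even tables to them. Note that python-docx has no high-level API for page numbers, because a page number in Word is a dynamically updated field while the high-level API deals with static content. Adding a real page-number field means either starting from a template that already contains one or building the field with lower-level XML operations.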

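Advanced table operations (merging cells, styles, sizes):

Beyond creating tables and filling in data, python-docx also supports merging cells, applying styles, and setting sizes.

cell.merge(other_cell): merges the rectangular region whose opposite corners are the two cells, and returns the merged cell.
table.style: applies a table style.
cell.width: sets the width of a cell. Row height is set on the row (row.height) rather than on the cell. Both take the length units from docx.shared.
table.autofit: controls the table's auto-fit behavior.

Code example: advanced table operations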

import os

import docx
from docx.shared import Cm, Inches  # length units

file_advanced_table = "advanced_table.docx"

print(f"\nWriting content to: {file_advanced_table}")

try:
    document = docx.Document()
    print("Created a new document object.")

    document.add_paragraph("Advanced table example:")

    # --- Add a table ---
    table = document.add_table(rows=4, cols=4)
    table.style = 'Table Grid'
    print("Added a 4x4 table and applied a style.")

    # --- Merge cells ---
    # Merge the first two columns of the first row (A1 and B1)
    cell_a1 = table.cell(0, 0)
    cell_b1 = table.cell(0, 1)
    cell_a1.merge(cell_b1)
    cell_a1.text = "Merged cell (A1:B1)"
    cell_a1.paragraphs[0].runs[0].font.bold = True  # format the text of the merged cell

    # Merge rows 2 and 3 of the second column (B2 and B3)
    cell_b2 = table.cell(1, 1)
    cell_b3 = table.cell(2, 1)
    cell_b2.merge(cell_b3)
    cell_b2.text = "Merged cell (B2:B3)"

    # Merge the 2x2 region in the bottom-right corner (C3:D4).
    # Merging the two diagonal corner cells merges the whole rectangle in one call.
    # (Merging C3:D3 first and then merging that two-column cell with the single
    # cell C4 is rejected, because the two cells no longer describe a rectangle.)
    merged_cell = table.cell(2, 2).merge(table.cell(3, 3))
    merged_cell.text = "Merged region (C3:D4)"

    # --- Set widths ---
    # Set the width of the first column to 3 cm
    table.columns[0].width = Cm(3)
    # Setting the width of a single cell is also possible (column widths are more common)
    # table.cell(0, 0).width = Inches(2)

    # --- Fill the remaining cells ---
    table.cell(1, 0).text = "Data A"
    table.cell(1, 2).text = "Data B"
    table.cell(1, 3).text = "Data C"
    table.cell(2, 0).text = "Data D"
    # table.cell(2, 1) is part of the merged region B2:B3; its text is already set
    # table.cell(2, 2) is the top-left of the merged region C3:D4; its text is already set
    # table.cell(3, 0).text = "Data E"  # row 3, column 0
    # table.cell(3, 2) and table.cell(3, 3) are part of the merged region C3:D4

    # --- Save the document ---
    document.save(file_advanced_table)
    print(f"Successfully wrote content to: {file_advanced_table}")

except Exception as e:
    print(f"An error occurred while writing the Word file: {e}")
finally:
    if os.path.exists(file_advanced_table):
        # os.remove(file_advanced_table)
        print(f"\nNote: generated file '{file_advanced_table}' was kept; delete it manually.")

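This example demonstrates the advanced table operations:

Merging cells with cell.merge(other_cell). To merge a rectangular region, the simplest and most reliable form is table.cell(r1, c1).merge(table.cell(r2, c2)), which merges the rectangle defined by (r1, c1) and (r2, c2) in one call. Building a region up through several merge calls is possible, but each call must itself describe a rectangle; merging cells with mismatched spans raises an error.
Setting column or cell widths with table.columns[index].width and table.cell(row, col).width, using the length units from docx.shared (Inches, Cm, Pt, Emu).
Applying a table style (table.style = '...').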
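The length units listed above all reduce to English Metric Units (EMU), the integer unit used inside the XML: 914400 EMU per inch, 360000 per centimeter, and 12700 per point. A minimal standard-library sketch of the conversion follows; to_emu and to_inches are illustrative helpers, not part of docx.shared.

```python
EMU_PER_UNIT = {
    "in": 914400,  # EMU per inch
    "cm": 360000,  # EMU per centimeter
    "pt": 12700,   # EMU per point (1/72 inch)
}


def to_emu(value, unit):
    """Convert a length in inches, centimeters, or points to whole EMU."""
    return round(value * EMU_PER_UNIT[unit])


def to_inches(emu):
    """Convert EMU back to inches."""
    return emu / EMU_PER_UNIT["in"]


# The 3 cm column width from the table example, expressed in EMU and in inches.
column_width = to_emu(3, "cm")
print(column_width, round(to_inches(column_width), 3))
```

This is why a value such as Cm(3) prints as a large integer: it is simply the EMU count, and the .inches, .cm, and .pt properties on python-docx length objects convert it back.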

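Working with sections (Section):

A Word document can contain several sections, each with its own headers and footers, margins, page orientation, column settings, and so on. python-docx lets you access the existing sections and append a new one at the end of the document with document.add_section(). It has no high-level way to insert a section break in the middle of existing content, so documents with complex section layouts are often easier to build from a template.

document.sections: the list of sections in the document.
section.page_height, section.page_width: the page size.
section.top_margin, section.bottom_margin, section.left_margin, section.right_margin: the page margins.
section.orientation: the page orientation (portrait or landscape).
section.start_type: how the section starts (for example new page, continuous, new column).

Code example: reading section properties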

import os

import docx
from docx.enum.section import WD_ORIENTATION, WD_SECTION  # orientation and start-type enums

# document_headers_footers.docx is the file produced by the headers/footers example
file_sections = "document_headers_footers.docx"

# Make sure the file exists (re-create it if necessary)
if not os.path.exists(file_sections):
    print(f"\nSample file for sections not found, creating it: {file_sections}")
    try:
        # Re-use the logic of the headers/footers example to build the document
        document = docx.Document()
        section = document.sections[0]
        header = section.header
        if len(header.paragraphs) > 0:
            header.paragraphs[0].text = "Sample Header"
        else:
            header.add_paragraph("Sample Header")
        footer = section.footer
        if len(footer.paragraphs) > 0:
            footer.paragraphs[0].text = "Sample Footer"
        else:
            footer.add_paragraph("Sample Footer")
        for i in range(1, 20):
            document.add_paragraph(f"This is paragraph {i}, added as filler content.")
        document.save(file_sections)
        print(f"Created sample file: {file_sections}")
    except Exception as e:
        print(f"Error creating sample file {file_sections}: {e}. Skipping section read test.")
        file_sections = None  # mark as not available

print(f"\nReading section properties from {file_sections}:")
if file_sections and os.path.exists(file_sections):
    try:
        document = docx.Document(file_sections)
        print(f"The document contains {len(document.sections)} section(s).")

        # Iterate over all sections
        for section_index, section in enumerate(document.sections):
            print(f"\n--- Section {section_index + 1} ---")
            # Lengths are stored in EMU and can be converted, e.g. with .inches
            print(f"  Page height: {section.page_height} ({section.page_height.inches:.2f} in)")
            print(f"  Page width: {section.page_width} ({section.page_width.inches:.2f} in)")

            print(f"  Top margin: {section.top_margin} ({section.top_margin.inches:.2f} in)")
            print(f"  Bottom margin: {section.bottom_margin} ({section.bottom_margin.inches:.2f} in)")
            print(f"  Left margin: {section.left_margin} ({section.left_margin.inches:.2f} in)")
            print(f"  Right margin: {section.right_margin} ({section.right_margin.inches:.2f} in)")

            # Page orientation
            if section.orientation == WD_ORIENTATION.PORTRAIT:
                print("  Orientation: portrait")
            elif section.orientation == WD_ORIENTATION.LANDSCAPE:
                print("  Orientation: landscape")
            else:
                print(f"  Orientation: {section.orientation}")

            # Section start type
            if section.start_type == WD_SECTION.NEW_PAGE:
                print("  Section start type: new page")
            elif section.start_type == WD_SECTION.CONTINUOUS:
                print("  Section start type: continuous")
            # Other types (NEW_COLUMN, EVEN_PAGE, ODD_PAGE) fall through to here
            else:
                print(f"  Section start type: {section.start_type}")

            # --- Modifying section properties (example) ---
            # Note: changing a section property affects every page of that section.
            # For example, to switch the first section to landscape:
            # if section_index == 0:
            #     print("  Switching the first section to landscape...")
            #     section.orientation = WD_ORIENTATION.LANDSCAPE
            #     # Setting the orientation does not swap the page size, so swap it explicitly
            #     section.page_width, section.page_height = section.page_height, section.page_width

    except FileNotFoundError:  # already handled above
        print(f"Error: file '{file_sections}' not found.")
    except Exception as e:
        print(f"An error occurred while reading section properties: {e}")

    # Clean up
    # os.remove(file_sections)
    print(f"\nNote: section sample file '{file_sections}' was kept; delete it manually if it is not needed.")

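This example shows how to iterate over a document's sections and read or modify their properties, such as page size, margins, orientation, and start type. This matters whenever you need to control the layout of a document or mix page layouts (for example, inserting a landscape page). New sections can be appended at the end of the document with document.add_section(); for a section break in the middle of existing content, it is usually easier to start from a template that already contains the sections you need.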

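Working with document properties (Document Properties):

A Word document carries metadata such as title, author, subject, and keywords. python-docx lets you read and modify these document properties.

document.core_properties: the CoreProperties object, which holds standard properties such as title, author, subject, creation date, and modification date.

Code example: reading and writing document properties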

import datetime
import os

import docx

file_doc_properties = "document_properties.docx"

print(f"\nWriting content to: {file_doc_properties}")

try:
    document = docx.Document()
    print("Created a new document object.")

    # Add some content
    document.add_paragraph("This is a test document that demonstrates document properties.")

    # --- Read the default document properties ---
    # In a freshly created document these come from the default template
    print("\n--- Default document properties ---")
    core_props = document.core_properties
    print(f"  Title: {core_props.title}")
    print(f"  Author: {core_props.author}")
    print(f"  Subject: {core_props.subject}")
    print(f"  Created: {core_props.created}")
    print(f"  Modified: {core_props.modified}")
    print(f"  Category: {core_props.category}")
    print(f"  Keywords: {core_props.keywords}")

    # --- Modify the document properties ---
    print("\n--- Modifying document properties ---")
    core_props.title = "Automatically generated report"
    core_props.author = "Python Script"
    core_props.subject = "Business data analysis"
    core_props.category = "Report"
    core_props.keywords = "Python, automation, report, data"
    core_props.created = datetime.datetime.now()  # changing the creation date (usually left alone)
    # python-docx does not update the modification date on save, so set it explicitly
    core_props.modified = datetime.datetime.now()

    print("Document properties modified.")

    # --- Save the document ---
    document.save(file_doc_properties)
    print(f"Successfully wrote content to: {file_doc_properties}")

    # --- Re-open the document and read the modified properties ---
    print(f"\nRe-opening {file_doc_properties} and reading its properties:")
    document_reopened = docx.Document(file_doc_properties)
    core_props_reopened = document_reopened.core_properties
    print(f"  Title: {core_props_reopened.title}")
    print(f"  Author: {core_props_reopened.author}")
    print(f"  Subject: {core_props_reopened.subject}")
    print(f"  Created: {core_props_reopened.created}")
    print(f"  Modified: {core_props_reopened.modified}")  # the value set above
    print(f"  Category: {core_props_reopened.category}")
    print(f"  Keywords: {core_props_reopened.keywords}")

except Exception as e:
    print(f"An error occurred while reading or writing document properties: {e}")
finally:
    if os.path.exists(file_doc_properties):
        # os.remove(file_doc_properties)
        print(f"\nNote: document properties file '{file_doc_properties}' was kept; delete it manually.")

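This example shows how to access a document's core properties through document.core_properties and read or modify them. This is useful in automated report generation, where you can set the report's title, author, and other metadata.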
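Under the hood, the core properties are stored as a small XML part inside the package, conventionally docProps/core.xml. The sketch below reads a few of them with the standard library only; read_core_properties is an illustrative helper, the part path is assumed rather than resolved through the package relationships, and the in-memory archive is a stand-in for a real .docx.

```python
import io
import xml.etree.ElementTree as ET
import zipfile

NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def read_core_properties(package_file):
    """Read title, author, and keywords from docProps/core.xml (path assumed)."""
    with zipfile.ZipFile(package_file) as package:
        root = ET.fromstring(package.read("docProps/core.xml"))

    def text_of(path):
        element = root.find(path, NS)
        return element.text if element is not None else None

    return {
        "title": text_of("dc:title"),
        "author": text_of("dc:creator"),
        "keywords": text_of("cp:keywords"),
    }


# A stand-in package holding only the core properties part.
CORE_XML = (
    f'<cp:coreProperties xmlns:cp="{NS["cp"]}" xmlns:dc="{NS["dc"]}">'
    "<dc:title>Automated report</dc:title>"
    "<dc:creator>Python Script</dc:creator>"
    "<cp:keywords>Python, report</cp:keywords>"
    "</cp:coreProperties>"
)
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as package:
    package.writestr("docProps/core.xml", CORE_XML)
buffer.seek(0)

props = read_core_properties(buffer)
print(props)
```

Note that the author is stored as dc:creator in the XML; python-docx maps it to the author property.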

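2.4 Using Template Files

In many real applications you do not start from a blank document. Instead you load a Word template file (.docx) with the formatting already designed, containing static content and placeholders, and fill in the data programmatically.

python-docx can load an existing .docx file: document = docx.Document(template_path).

Handling placeholders:

Placeholders in a template are usually text in a specific format (for example [[ReportTitle]], {Name}, or {{Date}}). You iterate over the document's paragraphs and tables, find the placeholder text, and replace it with the actual data. Be aware that Word may split a placeholder across several runs, so a search within a single run can miss it.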
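One simple way to cope with placeholders that are split across runs is to join the run texts, replace in the joined string, and write the result back into the first run. This is a sketch of that strategy, not the only option: it keeps the first run's formatting and drops any formatting differences between the affected runs. replace_in_paragraph is a hypothetical helper; it only needs an object with a .runs list whose items have a .text attribute, so the demo uses plain stand-in objects instead of a real python-docx paragraph.

```python
from types import SimpleNamespace


def replace_in_paragraph(paragraph, replacements):
    """Replace placeholders even when split across runs. Return True if text changed."""
    runs = paragraph.runs
    if not runs:
        return False

    full_text = "".join(run.text for run in runs)
    new_text = full_text
    for placeholder, value in replacements.items():
        new_text = new_text.replace(placeholder, str(value))

    if new_text == full_text:
        return False  # nothing matched; leave the run structure untouched

    # Put the whole text into the first run and empty the others.
    runs[0].text = new_text
    for run in runs[1:]:
        run.text = ""
    return True


# Stand-in paragraph whose placeholder is split across two runs.
paragraph = SimpleNamespace(runs=[
    SimpleNamespace(text="Dear [[Cust"),
    SimpleNamespace(text="omerName]]:"),
])

changed = replace_in_paragraph(paragraph, {"[[CustomerName]]": "Zhang San"})
print(changed, [run.text for run in paragraph.runs])
```

With python-docx, the same function could be called for each paragraph in document.paragraphs and for the paragraphs inside each table cell. Paragraphs without a placeholder are left exactly as they were.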

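Code example: filling in data from a template

First, create a Word template file by hand, for example report_template.docx, with the following content:

Report title: [[ReportTitle]]

Generated on: {{ReportDate}}

Dear [[CustomerName]]:

This is your monthly report.

Total sales: [[TotalSales]]

Detailed data table:
<<Table>> # marks where the table should be inserted, or how it should be handled

<<EndTable>>

If you have any questions, please contact us.

Best regards,
The Reporting Team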
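import datetime
import os
import re  # regular expressions for finding placeholders
import sys
from typing import Any, Dict

import docx

# Path of the template file
template_file = "report_template.docx"
# Path of the output file
output_file_filled = "filled_report.docx"

# Make sure the template exists (created by hand or by code).
# Here we assume report_template.docx was created by hand with the placeholders above.
if not os.path.exists(template_file):
    print(f"Error: template file '{template_file}' not found. Please create it first.")
    # A basic template could also be generated here with python-docx,
    # but complex templates are easier to create by hand
    sys.exit(1)  # without the template the program cannot continue

print(f"\nFilling template '{template_file}' and saving to '{output_file_filled}'")

# The data to fill in
report_data = {
    'ReportTitle': 'October 2023 Business Report',
    'ReportDate': datetime.date(2023, 10, 31).strftime('%Y-%m-%d'),  # date formatted as a string
    'CustomerName': 'Zhang San',
    'TotalSales': '15,750.75 yuan',
    # Data for the table (a list of lists)
    'TableData': [
        ['Product', 'Units', 'Amount'],
        ['Product A', 100, 5000.00],
        ['Product B', 200, 10000.00],
        ['Product C', 50, 750.75],
    ]
}

def fill_template(template_filepath: str, output_filepath: str, data: Dict[str, Any]):
    """
    Load the template file, fill its placeholders from the dictionary, and handle the table.
    """
    try:
        # Load the template document
        document = docx.Document(template_filepath)
        print("Template document loaded.")

        # --- Fill the text placeholders ---
        # Iterate over all paragraphs of the document
        print("Filling text placeholders...")
        for paragraph in document.paragraphs:
            # Iterate over the runs of the paragraph (a placeholder may be split across runs).
            # A simpler approach that may lose formatting is to work on paragraph.text
            # (which replaces the whole run structure).
            # A more robust approach is to find the run that holds the placeholder and
            # replace only that run's text,
            # or, more advanced: find placeholders spanning runs, replace them, and merge the runs.

            # Simple string replacement (may break the run structure and formatting)
            # text = paragraph.text
            # for key, value in data.items():
            #     placeholder = f'[[{key}]]'       # placeholder format [[Key]]
            #     placeholder2 = f'{{{{{key}}}}}'  # placeholder format {{Key}}
            #     text = text.replace(placeholder, str(value))
            #     text = text.replace(placeholder2, str(value))
            # paragraph.text = text  # setting the text directly replaces the run structure

            # Finding and replacing placeholders at the run level preserves formatting better.
            # This has to handle placeholders that span several runs, which is more involved.
            # A common strategy is to first merge all of the paragraph's runs into one,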