Splitting a Long Markdown Document at Second-Level Headings with Python

牧文山

Published 2024-09-17 15:51:57

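Below is a simple Python script that splits a Markdown document at each second-level heading (##) and saves every section as a separate file in a given output folder.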

import os

def split_markdown_by_headers(input_path, output_folder):
    # Make sure the output folder exists
    os.makedirs(output_folder, exist_ok=True)

    # Name of the file being collected, and the lines collected so far
    current_file = None
    current_content = []

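    # Read the input file line by line
    with open(input_path, 'r', encoding='utf-8') as file:
        for line in file:
            # Check whether the line is a second-level heading
            if line.startswith('## '):
                # If a section is already being collected, save it first
                if current_file:
                    save_current_file(current_file, current_content, output_folder)
                    current_content = []  # Reset the buffer

                # The new file name is the heading text without the leading '## '
                current_file = line[3:].strip().replace(' ', '_') + '.md'

            # Append the current line to the buffer. Lines before the first
            # '## ' heading end up at the top of the first section's file.
            current_content.append(line)

    # Save the last section
    if current_file:
        save_current_file(current_file, current_content, output_folder)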

def save_current_file(filename, content, folder):
    # Build the full output path
    output_path = os.path.join(folder, filename)
    # Write the collected lines to the file
    with open(output_path, 'w', encoding='utf-8') as file:
        file.writelines(content)

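if __name__ == '__main__':
    # Ask the user for the input path
    input_path = input("Enter the path to the Markdown file: ")
    # The output folder is fixed
    output_path = 'output_md_files'

    # Split the document
    split_markdown_by_headers(input_path, output_path)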
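
The script works well for simple documents, but a few cases may trip it up: a heading that contains characters such as `/` or `:` is not a valid file name, two sections with the same heading overwrite each other, and a line starting with `## ` inside a fenced code block is treated as a heading. The following is a minimal sketch of one way to handle these cases, not part of the original script. The names `safe_filename` and `split_sections` are hypothetical helpers introduced here for illustration. The fence detection is deliberately simplified (it just toggles on any line that begins with three backticks or three tildes), and unlike the script above it drops any text that appears before the first `## ` heading.

```python
import re

# Built with multiplication so the markers do not look like a real fence here.
FENCE_MARKERS = ('`' * 3, '~' * 3)


def safe_filename(title, used):
    """Turn a heading into a file name that is safe on common file systems."""
    # Replace characters that are invalid in Windows or POSIX file names.
    name = re.sub(r'[\\/:*?"<>|]', '_', title.strip()).replace(' ', '_')
    name = name or 'untitled'
    # Append a counter when the same heading appears more than once.
    count = used.get(name, 0) + 1
    used[name] = count
    return f'{name}.md' if count == 1 else f'{name}_{count}.md'


def split_sections(text):
    """Return (filename, content) pairs, one per second-level heading."""
    sections = []
    used = {}
    in_fence = False
    for line in text.splitlines(keepends=True):
        # Toggle on fence markers so '## ' inside code is not a heading.
        if line.lstrip().startswith(FENCE_MARKERS):
            in_fence = not in_fence
        if not in_fence and line.startswith('## '):
            sections.append((safe_filename(line[3:], used), [line]))
        elif sections:
            sections[-1][1].append(line)
        # Lines before the first '## ' heading are dropped in this sketch.
    return [(name, ''.join(lines)) for name, lines in sections]
```

Because `split_sections` only returns names and contents, each pair can be passed on to `save_current_file(name, [content], folder)` from the script above to write the files, and the splitting logic can be checked without touching the disk.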