Extracting a table of contents from a txt file with Python (whether or not the article already lists its contents at the beginning)

Latest recommended article published on 2024-05-10 18:56:08

喜欢地上爬的孩子

Categories: python, nlp. Tags: python

Article link: https://blog.csdn.net/Thefreelittle/article/details/121373855

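Regular expressions are used to identify the headings in the article: a line counts as a heading if it starts with a digit, or with an opening parenthesis followed by a digit.

Scanning stops at the reference list, i.e. at the first line containing 参考文献(References).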

import re

# A heading starts with a digit ("1 ...") or with "(" followed by a digit ("(1) ...").
HEADING_PATTERN = re.compile(r'\(?\d')


def main_read_txt():
    """Print the heading lines of the article, stopping at the reference list."""
    path = "txt\\zhengwen.txt"
    with open(path, "r", encoding='utf-8') as f:
        for line in f:
            # The reference list marks the end of the body text.
            if '参考文献(References)' in line:
                break
            line = line.strip('\n')  # remove the newline character from the line
            # Skip very short lines, such as a lone page break ('\x0c').
            if len(line) >= 3 and HEADING_PATTERN.match(line):
                print(line)


if __name__ == '__main__':
    main_read_txt()
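
If the headings are needed for further processing instead of just being printed, the same rule can be wrapped in a function that returns a list. The following is only a minimal sketch: the function name `extract_headings`, the `stop_marker` parameter and the sample lines are made up for illustration and do not come from the original script.

```python
import re
from typing import Iterable, List

# Same rule as the script above: a heading starts with a digit,
# or with "(" followed by a digit.
HEADING = re.compile(r'\(?\d')


def extract_headings(lines: Iterable[str],
                     stop_marker: str = '参考文献(References)') -> List[str]:
    """Collect heading-like lines until the stop marker is found (hypothetical helper)."""
    headings = []
    for line in lines:
        if stop_marker in line:
            break  # the reference list ends the body text
        line = line.strip('\n')
        if len(line) >= 3 and HEADING.match(line):
            headings.append(line)
    return headings


# Made-up sample text, used only to show the behaviour.
sample = [
    "1 Introduction\n",
    "Some body text.\n",
    "1.1 Background\n",
    "(1) First point\n",
    "\x0c\n",
    "2 Method\n",
    "参考文献(References)\n",
    "[1] A reference entry\n",
    "3 Not reached\n",
]
print(extract_headings(sample))
# ['1 Introduction', '1.1 Background', '(1) First point', '2 Method']
```

Note that the rule is deliberately loose: any body line that happens to start with a digit (a year, a measurement) is also treated as a heading, so the result may need manual filtering.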