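Extracting a table of contents from a .txt file with Python (whether or not the article already puts its own table of contents at the beginning)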


喜欢地上爬的孩子

Categories: python, nlp. Tags: python

Original link: https://blog.csdn.net/thefreelittle/article/details/121373855


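Headings in the article are identified with regular expressions: a line counts as a heading if it starts with a digit (e.g. "1 Introduction", "2.1 Method") or with a parenthesised digit (e.g. "(1) Data source").

Extraction stops at the References section, i.e. at the first line containing 参考文献(References).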
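The two rules can be tried on a handful of lines before pointing the script at a real file. This is a minimal sketch, assuming headings start with a digit or a parenthesised digit as in the script that follows; is_heading, STOP_MARKER and SAMPLE are hypothetical names, and the sample lines are invented for illustration.

```python
import re

# Assumed heading rule: the line starts with a digit ("2.1 Method")
# or with "(" followed by a digit ("(1) Data source").
HEADING_RE = re.compile(r'\(?\d')
STOP_MARKER = '参考文献(References)'  # marks the end of the body text


def is_heading(line: str) -> bool:
    """Return True if a line looks like a numbered heading."""
    line = line.strip('\n')
    # Lines shorter than 3 characters are treated as noise
    return len(line) >= 3 and HEADING_RE.match(line) is not None


# Hypothetical sample lines, not taken from a real paper
SAMPLE = [
    "1 Introduction",
    "This paper studies ...",
    "(1) Data source",
    "2.1 Method",
    "参考文献(References)",
    "1 Smith J. Some cited work",
]

for text in SAMPLE:
    if STOP_MARKER in text:  # everything after this is the reference list
        break
    print(is_heading(text), text)
# True 1 Introduction
# False This paper studies ...
# True (1) Data source
# True 2.1 Method
```

Note that the rule is purely positional: a body line that happens to begin with a number (a year, for instance) is also reported as a heading.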

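import os
import re

# A heading starts with a digit ("1 Introduction", "2.1 Method") or with a
# parenthesised digit ("(1) ..."); this single pattern is equivalent to
# checking re.match(r'\d', line) or re.match(r'\(\d', line).
HEADING_RE = re.compile(r'\(?\d')


# Extract the headings from the body text
def main_read_txt():
    path = os.path.join("txt", "zhengwen.txt")  # portable form of "txt\\zhengwen.txt"
    with open(path, "r", encoding='utf-8') as f:
        for line in f:
            # Stop as soon as the References section is reached
            if '参考文献(References)' in line:
                break
            line = line.strip('\n')  # remove the trailing newline
            # Lines shorter than 3 characters (including a lone form-feed
            # page break '\x0c') are skipped
            if len(line) >= 3 and HEADING_RE.match(line):
                print(line)


if __name__ == '__main__':
    main_read_txt()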
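When the article carries its own table of contents at the top, the script above prints every heading twice (once from the TOC, once from the body), and a TOC entry such as "参考文献(References) ..... 9" would stop it before the body is reached. One possible refinement, sketched below, collects the headings into a list, keeps only the first occurrence of each, and indents them by numbering depth (2.1.3 is depth 3). This is a sketch under stated assumptions, not the author's method: build_toc, heading_level and the sample data are hypothetical; TOC entries are assumed to end in optional dot leaders plus a page number; and the real References heading is assumed to stand alone on its line.

```python
import re

# Equivalent to the two re.match checks (r'\d' and r'\(\d') in the script above
HEADING_RE = re.compile(r'\(?\d')
NUMBER_RE = re.compile(r'\d+(?:\.\d+)*')  # "2", "2.1", "2.1.3"
# Assumed TOC layout: optional dot leaders, then a page number at line end
PAGE_SUFFIX_RE = re.compile(r'(?:\s*[.·…]{2,}\s*|\s+)\d+\s*$')
STOP_MARKER = '参考文献(References)'


def heading_level(heading):
    """Nesting depth from the leading number: '2' -> 1, '2.1.3' -> 3, '(1)' -> 1."""
    m = NUMBER_RE.match(heading.lstrip('('))
    return m.group().count('.') + 1 if m else 1


def build_toc(lines):
    """Return indented headings up to the References heading, keeping only
    the first occurrence of each so a leading TOC does not double them."""
    seen = set()
    toc = []
    for raw in lines:
        line = raw.strip()  # also removes form-feed '\x0c' page breaks
        # Assumption: the real References heading stands alone on its line,
        # so a TOC entry like "参考文献(References) ..... 9" is not a stop.
        if line == STOP_MARKER:
            break
        if len(line) < 3 or not HEADING_RE.match(line):
            continue
        key = PAGE_SUFFIX_RE.sub('', line)  # "1 Introduction .... 1" -> "1 Introduction"
        if key in seen:
            continue
        seen.add(key)
        toc.append('  ' * (heading_level(key) - 1) + key)
    return toc


# Made-up input: a TOC at the top, then the body, then the references
with_toc = [
    "目录",
    "1 Introduction ........ 1",
    "2 Method ........ 3",
    "2.1 Data ........ 3",
    "参考文献(References) ........ 9",
    "\x0c",
    "1 Introduction",
    "Some body text.",
    "2 Method",
    "2.1 Data",
    "(1) Sampling",
    "参考文献(References)",
    "1 Smith J. A cited paper",
]
body_only = with_toc[6:]  # the same article without its own TOC

print("\n".join(build_toc(with_toc)))
# 1 Introduction
# 2 Method
#   2.1 Data
# (1) Sampling
```

Calling build_toc(body_only) returns the same four entries, which is the "whether or not there is a TOC" behaviour the title aims for. The page-number rule is a heuristic: a genuine heading that ends in a number, such as "3 Experiment 2", would lose that trailing number.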

Comments and discussion are welcome.
