Splitting a txt file in Python: splitting text into sentences

I'd think of this more as a lookbehind than a lookahead:

```python
import re

# article_content contains all the article's paragraphs;
# in this case, a single paragraph.
article_content = ["""Recognizing the rising opportunity Jerusalem Venture Partners opened up their Cyber Labs incubator, giving a home to many of the city’s promising young companies. International corporates like EMC have also established major centers in the park, leading the way for others to follow! On a visit last June, the park had already grown to two buildings with the ground being broken for the construction of more in the near future. This is really interesting! What do you think?"""]

split_article_content = []
for element in article_content:
    # Split on whitespace that is preceded by ., ! or ?; the lookbehind
    # only checks that character, so the punctuation stays with its sentence.
    split_article_content += re.split(r"(?<=[.!?])\s+", element)

print(*split_article_content, sep='\n\n')
```

Output:

```
% python3 test.py

Recognizing the rising opportunity Jerusalem Venture Partners opened up their Cyber Labs incubator, giving a home to many of the city’s promising young companies.

International corporates like EMC have also established major centers in the park, leading the way for others to follow!

On a visit last June, the park had already grown to two buildings with the ground being broken for the construction of more in the near future.

This is really interesting!

What do you think?

%
```
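To see why the lookbehind matters, here is a small comparison sketch on a made-up, shortened sample string (not the article text). A plain `re.split(r"[.!?]\s+", ...)` consumes the punctuation as part of the separator, so it disappears from the sentences; the lookbehind `(?<=[.!?])` only checks the preceding character without consuming it, so each sentence keeps its `.`, `!` or `?`.

```python
import re

# Made-up sample text for illustration only.
text = "This is really interesting! What do you think? On a visit last June."

# Consuming split: the punctuation is part of the match, so it is removed.
consumed = re.split(r"[.!?]\s+", text)

# Lookbehind split: only the whitespace is matched; the lookbehind merely
# asserts that the previous character is ., ! or ?, so the punctuation stays.
kept = re.split(r"(?<=[.!?])\s+", text)

print(consumed)
print(kept)
```

In both results the last sentence keeps its period, but only because no whitespace follows it.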

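Python offers several ways to split a text into chapters. Here are a few of them:

1. Split on keywords: find the keywords that mark chapter titles in the text, then use them to split the text into chapters. For example:

```python
import re

text = "Chapter 1: This is the body of chapter one. Chapter 2: This is the body of chapter two."
# Find every chapter title, e.g. ['Chapter 1', 'Chapter 2']
chapter_titles = re.findall(r'Chapter \d+', text)
# Escape each title so it is matched literally, then split on any of them
chapters = re.split('|'.join(map(re.escape, chapter_titles)), text)[1:]  # drop the leading empty string
```

This code first uses a regular expression to find all chapter titles, then splits the text on those titles. Note that `re.split` treats its first argument as a regular expression pattern, unlike `str.split`, which splits on a literal string; that is why the titles are passed through `re.escape`. The `[1:]` discards the empty string before the first title (it would also discard any text that precedes the first title), and each resulting chunk still starts with the `: ` that followed its title.

2. Split by line count: read the file line by line with Python's file functions and divide the lines into equal-sized chapters. For example:

```python
import math

with open('text.txt', 'r', encoding='utf-8') as f:
    lines = f.readlines()
num_chapters = 10  # assume the text has 10 chapters
# Round up so there are at most num_chapters chunks; max(1, ...) avoids a zero step for short files
chapter_size = max(1, math.ceil(len(lines) / num_chapters))
chapters = [lines[i:i + chapter_size] for i in range(0, len(lines), chapter_size)]
```

This code opens a text file and reads it line by line with `readlines`. It then divides the lines into chunks of equal size, so each chapter is a list of lines. This only makes sense when the chapters really are of similar length.

3. Split by text structure: if the chapter structure is fixed, you can use regular expressions or string functions to find where each chapter starts and ends. For example:

```python
import re

text = "Chapter 1: This is the body of chapter one. Chapter 2: This is the body of chapter two."
# Start offset of each chapter title
chapter_starts = [m.start() for m in re.finditer(r'Chapter \d+', text)]
# Each chapter ends where the next one starts; the last one ends at len(text)
chapter_ends = chapter_starts[1:] + [len(text)]
chapters = [text[start:end] for start, end in zip(chapter_starts, chapter_ends)]
```

This code uses a regular expression to find where each chapter starts and takes the start of the next chapter as the end of the current one. The result is a list of strings, one per chapter, each including its title. Note: the length of the text is used as the end position of the last chapter, so that the last chapter contains all of the remaining text.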
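A variation on methods 1 and 3, sketched under the same assumption that titles look like `Chapter <number>`: wrapping the pattern in a capturing group makes `re.split` return the matched titles as well, so titles and bodies can be paired directly, and any text before the first title (a hypothetical "Preface." here) is kept instead of discarded.

```python
import re

# Made-up sample text; "Preface." stands in for text before the first title.
text = "Preface. Chapter 1: This is the body of chapter one. Chapter 2: This is the body of chapter two."

# The capturing group makes re.split keep the titles:
# [preamble, title1, body1, title2, body2, ...]
parts = re.split(r"(Chapter \d+)", text)

preamble = parts[0].strip()
# Pair each title with the body that follows it, stripping the
# separator colon and surrounding spaces from the body.
chapters = [(title, body.strip(" :")) for title, body in zip(parts[1::2], parts[2::2])]

print(preamble)
for title, body in chapters:
    print(title, "->", body)
```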