Code Example: Extracting an Article Summary with NLTK

The latest recommended article was published on 2024-07-11 08:30:12.

光英的记忆



Category: NLTK

Article link: https://blog.csdn.net/qq_29678299/article/details/90521884


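This article shows how to use NLTK, Python's natural language processing library, to extract a summary from an article. In a few simple steps we process a long text, identify its key sentences, and produce a condensed overview of the piece.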

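Before the full script, here is a minimal sketch of the core idea it relies on: score each sentence by its density of nouns and named entities, then keep the top-k sentences in their original order. This is an illustration under stated assumptions, not the author's code. The names `sentence_score` and `summarize` and the `position_bonus` parameter are hypothetical; `position_bonus` is one possible way to encode the news-story heuristic (key details first, summary last) mentioned in the script's comments. To stay self-contained, the toy sentences are hand-tagged instead of being run through `nltk.pos_tag` and `nltk.ne_chunk`, so no NLTK models need to be downloaded.

```python
from typing import List, Tuple

# Each sentence is a list of (token, POS tag) pairs, the shape nltk.pos_tag returns.
Tagged = List[Tuple[str, str]]


def sentence_score(tagged: Tagged, no_of_ners: int) -> float:
    """(named entities + nouns) / tokens, mirroring the script's score."""
    if not tagged:
        return 0.0
    # pos.startswith("NN") matches NN, NNS, NNP and NNPS.
    no_of_nouns = sum(1 for _, pos in tagged if pos.startswith("NN"))
    return (no_of_ners + no_of_nouns) / len(tagged)


def summarize(sentences: List[str], tagged_sents: List[Tagged],
              ner_counts: List[int], k: int = 2,
              position_bonus: float = 0.0) -> List[str]:
    """Return the k highest-scoring sentences in their original order.

    position_bonus (hypothetical) is added to the first and last sentence,
    following the news-story heuristic described in the script's comments.
    """
    last = len(tagged_sents) - 1
    scored = []
    for i, (tagged, ners) in enumerate(zip(tagged_sents, ner_counts)):
        score = sentence_score(tagged, ners)
        if i in (0, last):
            score += position_bonus
        scored.append((score, i))
    # Highest score first; ties broken by earlier position.
    best = sorted(scored, key=lambda p: (-p[0], p[1]))[:k]
    # Restore document order so the summary reads naturally.
    return [sentences[i] for i in sorted(i for _, i in best)]


# Hypothetical toy input, hand-tagged so the sketch runs without NLTK models.
sentences = [
    "Apple opened a store in Paris.",
    "It was a sunny day and people were happy.",
    "The store sells phones.",
]
tagged_sents = [
    [("Apple", "NNP"), ("opened", "VBD"), ("a", "DT"), ("store", "NN"),
     ("in", "IN"), ("Paris", "NNP"), (".", ".")],
    [("It", "PRP"), ("was", "VBD"), ("a", "DT"), ("sunny", "JJ"), ("day", "NN"),
     ("and", "CC"), ("people", "NNS"), ("were", "VBD"), ("happy", "JJ"), (".", ".")],
    [("The", "DT"), ("store", "NN"), ("sells", "VBZ"), ("phones", "NNS"), (".", ".")],
]
ner_counts = [2, 0, 0]  # e.g. "Apple" and "Paris" treated as named entities

print(summarize(sentences, tagged_sents, ner_counts, k=2))
# ['Apple opened a store in Paris.', 'The store sells phones.']
```

Sorting the chosen indices before returning keeps the summary in the article's own order, which usually reads better than ranking order. As in the script, a proper noun that is also a named entity is counted twice, which deliberately favors entity-heavy sentences.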

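import sys
import nltk
from sklearn.feature_extraction.text import TfidfVectorizer

# The tokenizer, tagger and NE chunker need NLTK data packages, e.g.
# nltk.download('punkt'), nltk.download('averaged_perceptron_tagger'),
# nltk.download('maxent_ne_chunker'), nltk.download('words').
# Package names differ in newer NLTK releases (e.g. 'punkt_tab').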


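# Extract an article summary.
# Once we have the no_of_nouns and no_of_ners scores for every sentence, we can combine them into more sophisticated rules.
# For example, a typical news story opens with the important details of the topic, and its last sentence summarizes the whole story.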
# Read the article; the with-block closes the file automatically
with open('nyt.txt', 'r', encoding='utf-8') as f:
    news_contents = f.read()
result = []
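# Sentence tokenization
for sent_no, sentence in enumerate(nltk.sent_tokenize(news_contents)):
    tokens = nltk.word_tokenize(sentence)  # word tokenization, done once and reused below
    no_tokens_of = len(tokens)
    tagged = nltk.pos_tag(tokens)  # part-of-speech tagging
    # Count all nouns: NN, NNS, NNP and NNPS all start with 'NN'
    no_of_nouns = len([word for word, pos in tagged if pos.startswith('NN')])
    ners = nltk.ne_chunk(tagged, binary=False)  # named entity recognition
    # Named-entity chunks are Tree objects with a label() method; plain (word, tag) tuples are not
    no_of_ners = len([chunk for chunk in ners if hasattr(chunk, 'label')])
    # Score: density of nouns and named entities in the sentence
    score = (no_of_ners + no_of_nouns) / float(no_tokens_of)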
    result.append((sent_no, no_tokens_of, no_of_ners, no_of_nouns,