Extracting English key phrases with pytextrank on top of spaCy


MasonYyp


Article link: https://blog.csdn.net/make_progress/article/details/116943867


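This post shows how to use pytextrank for keyword extraction and summarization. It covers setting up the environment, choosing a suitable spaCy model, and running pytextrank on a sample text and reading the results.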


1 References

# Blog post in Chinese
https://www.5axxw.com/wiki/content/475klz

# Basic pytextrank usage
https://spacy.io/universe/project/spacy-pytextrank
https://derwen.ai/docs/ptr/start/

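2 Setting up the development environment

(1) Install Python 3.8

Note: installing pytextrank may fail on Python 3.6 and earlier. I originally used Python 3.6 and could not get it installed.

sudo apt install python3.8

(2) Install pytextrank

pytextrank is built on spaCy, so spaCy is required. Installing pytextrank installs spaCy automatically. spaCy is a natural language processing (NLP) library written in Python and Cython.

pip install pytextrank

The pytextrank version you install determines which spaCy version, and therefore which spaCy model version, you need. I used the following versions:

python=3.8.0, pytextrank=3.1.1, spacy=3.0.6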
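
To compare your own environment with the versions listed above, a small check like the following may help. It is only a sketch: `installed_versions` is a hypothetical helper name, and it relies on the standard-library `importlib.metadata` module, which is available from Python 3.8 onward.

```python
from importlib import metadata


def installed_versions(packages):
    """Return {package name: version string, or None if not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution is not installed in this environment.
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(["pytextrank", "spacy"]).items():
        print(name, version or "not installed")
```

If either package shows as not installed, rerun the `pip install pytextrank` step with the same interpreter that runs this check.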

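3 Installing the spaCy data and model

Installing a model online depends on network access and speed, and in my experience it usually fails. Downloading the model package and installing it offline is the more reliable route.

spaCy model documentation

https://spacy.io/models

Model download page

https://github.com/explosion/spacy-models/releases

Installing the model offline

# I used en_core_web_sm 3.0.0; the model version has to match the installed spaCy version
pip install en_core_web_sm-3.0.0.tar.gz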
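
Before downloading a model archive, it can be worth checking that its version lines up with your spaCy version. The sketch below assumes spaCy's naming convention in which the first two numbers of a model version (major.minor) correspond to the spaCy release line the model targets. `model_matches_spacy` is a hypothetical helper, not part of spaCy.

```python
def model_matches_spacy(model_version, spacy_version):
    """True when both version strings share the same major.minor prefix."""
    return model_version.split(".")[:2] == spacy_version.split(".")[:2]


# The versions used in this post: model 3.0.0 with spaCy 3.0.6.
print(model_matches_spacy("3.0.0", "3.0.6"))  # True
# A model built for the 3.0 line checked against spaCy 3.1.0.
print(model_matches_spacy("3.0.0", "3.1.0"))  # False
```

This only compares version strings. The authoritative compatibility range is the one declared in the model package's own metadata.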

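4 A simple example

import spacy
# pytextrank must be imported even though it is never referenced directly:
# the import registers the "textrank" pipeline component with spaCy.
import pytextrank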

# example text
text = "Compatibility of systems of linear constraints over the set of natural numbers. Criteria of compatibility of a system of linear Diophantine equations, strict inequations, and nonstrict inequations are considered. Upper bounds for components of a minimal set of solutions and algorithms of construction of minimal generating sets of solutions for all types of systems are given. These criteria and the corresponding algorithms for constructing a minimal supporting set of solutions can be used in solving all the considered types systems and systems of mixed types."

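# Load the spaCy model
nlp = spacy.load("en_core_web_sm")

# Add the PyTextRank component to the pipeline
nlp.add_pipe("textrank")
doc = nlp(text)

# Print each phrase with its rank and count
for phrase in doc._.phrases:
    # The phrase text
    print(phrase.text)
    # The phrase's rank (weight) and its occurrence count
    print(phrase.rank, phrase.count)
    # The list of chunks (spans) in the text that make up this phrase
    print(phrase.chunks)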
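
The loop above prints every phrase. In practice you often want only the highest-ranked ones. The sketch below shows one possible way to do that. `top_phrases` is a hypothetical helper that uses only the `text` and `rank` attributes seen in the example, and the sample objects and their rank values are made up so that the snippet runs without a spaCy model.

```python
from types import SimpleNamespace


def top_phrases(phrases, n=5, min_rank=0.0):
    """Return (text, rank) pairs for the n highest-ranked phrases."""
    # Keep only phrases at or above the rank threshold.
    kept = [p for p in phrases if p.rank >= min_rank]
    # Sort explicitly by rank, highest first, instead of relying on input order.
    kept.sort(key=lambda p: p.rank, reverse=True)
    return [(p.text, p.rank) for p in kept[:n]]


# Stand-in objects with invented ranks (not real pytextrank output).
sample = [
    SimpleNamespace(text="linear constraints", rank=0.12),
    SimpleNamespace(text="minimal set", rank=0.20),
    SimpleNamespace(text="systems", rank=0.05),
]
print(top_phrases(sample, n=2))
```

With the pipeline from the example, the same helper would be called as `top_phrases(doc._.phrases, n=10)`.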