1. Install spaCy as described here: https://spacy.io/usage
My choices: "Select pipeline for efficiency" corresponds to en_core_web_sm, and "Select pipeline for accuracy" corresponds to en_core_web_trf.
2. Download the spaCy model archives manually, then install them with:
pip install en_core_web_sm-3.0.0.tar.gz
pip install en_core_web_trf-3.0.0.tar.gz
en_core_web_sm archive: https://github.com/explosion/spacy-models/releases/tag/en_core_web_sm-3.0.0
en_core_web_trf archive: https://github.com/explosion/spacy-models/releases/tag/en_core_web_trf-3.0.0
3. A typical NLP processing pipeline:
Zhihu overview: https://zhuanlan.zhihu.com/p/63110761
Official docs: https://spacy.io/usage/linguistic-features
4. spaCy label reference:
POS tagging labels: https://melaniewalsh.github.io/Intro-Cultural-Analytics/features/Text-Analysis/POS-Keywords.html
Dependency labels: https://github.com/clir/clearnlp-guidelines/blob/master/md/specifications/dependency_labels.md
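Instead of looking labels up in the references above, spacy.explain returns a human-readable description for any POS tag or dependency label:

```python
import spacy

# Map label abbreviations to descriptions from spaCy's built-in glossary.
print(spacy.explain("NN"))     # noun, singular or mass
print(spacy.explain("nsubj"))  # nominal subject
print(spacy.explain("dobj"))   # direct object
```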
5. Token attributes in spaCy: https://www.jianshu.com/p/488e29470755
6. Inspecting and printing the spaCy dependency tree:
import spacy
from nltk import Tree

nlp = spacy.load("en_core_web_sm")
doc = nlp(u"This is some sentence that spacy will not appreciate .")

# Print each token, its head, all of its children, its left children
# (children whose index precedes the token), and its right children
# (children whose index follows it).
# Note: children are directly attached tokens; indirectly attached
# tokens are descendants, not children.
for token in doc:
    print(token.text, token.head.text, [child for child in token.children],
          [left_child for left_child in token.lefts],
          [right_child for right_child in token.rights])

def to_nltk_tree(node):
    if node.n_lefts + node.n_rights > 0:
        return Tree(node.orth_, [to_nltk_tree(child) for child in node.children])
    return node.orth_

for sent in doc.sents:
    to_nltk_tree(sent.root).pretty_print()
7. Batch-processing texts: https://spacy.io/usage/processing-pipelines
texts = ["This is a text", "These are lots of texts", "..."]
docs = list(nlp.pipe(texts))