Python spaCy library usage summary [work in progress]

阿.荣.

Category: python; Tags: nlp

Article link: https://blog.csdn.net/bmicnj/article/details/107189649

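This article introduces Python's spaCy library, a powerful NLP toolkit focused on performance and practicality. It covers installation, word tokenization, English sentence segmentation, lemmatization, part-of-speech tagging, named entity recognition, noun phrase extraction, and word-vector similarity.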

(Summary generated automatically by CSDN.)

Notes on using the spaCy library

1. Installation
2. Usage

1. Installation

See the summary at the end of my separate post on spaCy installation problems.

2. Usage

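spaCy is a Python natural language processing toolkit, first released in mid-2014 and billed as "Industrial-Strength Natural Language Processing in Python". It makes heavy use of Cython to speed up its modules, which sets it apart from the more academically oriented NLTK and makes it practical for production use.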

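import spacy
# Install the pipeline first with: python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")  # the pipeline name must be a quoted string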
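If the named pipeline is not installed, spacy.load fails with an OSError. Below is a minimal sketch of a loader that falls back to a tokenizer-only pipeline, assuming spaCy v3; `load_pipeline` and the choice of fallback are hypothetical, not part of spaCy itself.

```python
import spacy


def load_pipeline(name="en_core_web_sm"):
    """Load a trained spaCy pipeline, falling back to a blank English one.

    spacy.load raises OSError when no installed package or path matches
    `name`. The blank fallback still tokenizes text, but it has no tagger,
    parser, or entity recognizer, so those annotations will be missing.
    """
    try:
        return spacy.load(name)
    except OSError:
        # Hypothetical fallback choice: tokenizer-only English pipeline.
        return spacy.blank("en")


nlp = load_pipeline()
print(nlp.lang, nlp.pipe_names)  # pipe_names is [] for the blank fallback
```

Falling back silently is convenient for quick experiments; in production code it is usually better to let the error surface so a missing model is noticed.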

Official documentation: spaCy (https://spacy.io/usage/linguistic-features)

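Early versions mainly supported English and German; current releases ship trained pipelines for many more languages, including Chinese.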
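A quick way to try a language without downloading anything is a blank pipeline, which carries that language's tokenizer rules and exceptions but no trained components. A sketch assuming spaCy v3; `blank_tokens` is a hypothetical helper.

```python
import spacy


def blank_tokens(lang, text):
    """Tokenize `text` with a blank pipeline for language code `lang`.

    A blank pipeline has the language's tokenizer rules and exceptions
    but no trained components (no tagger, parser, or NER).
    """
    nlp = spacy.blank(lang)
    return [token.text for token in nlp(text)]


print(blank_tokens("en", "This isn't a test."))
print(blank_tokens("de", "Das ist ein Test."))
```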

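Features include word tokenization, English sentence segmentation, lemmatization, part-of-speech tagging, named entity recognition, noun phrase extraction, similarity computation, and more.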
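Most of the features listed above surface as attributes on each token. A sketch assuming spaCy v3; `feature_table` is a hypothetical helper, and it runs here on a blank pipeline so that no model download is required.

```python
import spacy


def feature_table(doc):
    """Hypothetical helper: one row of linguistic attributes per token.

    With a trained pipeline (e.g. en_core_web_sm) the lemma, POS, tag and
    entity fields are filled in; with a blank pipeline only "text" is
    populated and the other fields are empty strings.
    """
    return [
        {
            "text": token.text,      # word tokenization
            "lemma": token.lemma_,   # lemmatization
            "pos": token.pos_,       # coarse part-of-speech tag
            "tag": token.tag_,       # fine-grained tag
            "ent": token.ent_type_,  # named entity label, "" if none
        }
        for token in doc
    ]


doc = spacy.blank("en")("Apple buys a startup")
for row in feature_table(doc):
    print(row)
```

With en_core_web_sm installed, passing `spacy.load("en_core_web_sm")(text)` to the helper fills in the remaining fields. Noun phrases come from `doc.noun_chunks` (which needs the parser), and similarity from `doc.similarity(other)`, which gives meaningful scores only with a pipeline that has word vectors, such as en_core_web_md.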

2.1 Word tokenization (doc: token)

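Splits English words and punctuation marks into separate tokens. If the text contains Chinese, the Chinese part is split only at the spaces between characters.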
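The Chinese behavior described above can be checked with a short sketch. It assumes spaCy v3 and uses a blank English pipeline (`spacy.blank("en")`), which applies the same rule-based English tokenizer without needing a download; the helper name `tokenize` is made up for illustration.

```python
import spacy

# Assumption: a blank English pipeline (rule-based tokenizer only).
nlp = spacy.blank("en")


def tokenize(text):
    """Return the token strings produced by the English tokenizer."""
    return [token.text for token in nlp(text)]


# English clitics and punctuation become separate tokens.
print(tokenize("it's word tokenize test for spacy"))

# Chinese has no spaces between words, and the English rules only split
# it at whitespace, so each space-delimited run stays one token.
print(tokenize("我爱 自然语言处理"))

# Tokenization is non-destructive: tokens plus their trailing
# whitespace reproduce the input exactly.
doc = nlp("it's word tokenize test for spacy")
assert "".join(token.text_with_ws for token in doc) == doc.text
```

For real Chinese word segmentation, a Chinese pipeline (language code "zh") is the better fit than the English tokenizer.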

In [3]: test_doc = nlp(u"it's word tokenize test for spacy")

In [4]: print(test_doc)
it's word tokenize test for spacy

In [5]: for token in test_doc:
   ...:     print(token)
   ...:
it
's
word
tokenize
test
for
spacy

test_doc is a spacy.tokens.doc.Doc object; iterating over it yields spacy.tokens.token.Token objects.
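To see the object types involved, here is a small sketch assuming spaCy v3 and a blank English pipeline; the sample string is reused from the session above.

```python
import spacy
from spacy.tokens import Doc, Span, Token

nlp = spacy.blank("en")  # assumption: blank pipeline, tokenizer only
test_doc = nlp("it's word tokenize test for spacy")

# nlp(...) returns a Doc; indexing gives a Token; slicing gives a Span.
print(type(test_doc).__name__, type(test_doc[0]).__name__, type(test_doc[1:3]).__name__)

# Each Token records its index in the Doc (i) and its character offset (idx).
for token in test_doc:
    print(token.i, token.idx, token.text)
```

A Span is a view into the Doc rather than a copy, which is why sentences (doc.sents) and noun chunks are also returned as Span objects.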

2.2 English sentence segmentation (doc.sents: sent)
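Sentence boundaries do not have to come from a trained model. A minimal sketch, assuming spaCy v3, that swaps in the rule-based `sentencizer` component on a blank English pipeline instead of the parser-based segmentation used in the session below; the two sample sentences are taken from that session's input text.

```python
import spacy

# Assumption: a blank English pipeline plus spaCy's rule-based
# "sentencizer", which starts a new sentence after sentence-final
# punctuation. Trained pipelines such as en_core_web_sm instead take
# sentence boundaries from the dependency parser.
nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")  # spaCy v3 adds components by name

text = (
    "Natural language processing (NLP) deals with the application of "
    "computational models to text or speech data. NLP technologies are "
    "having a dramatic impact on the way people interact with computers."
)

sentences = [sent.text for sent in nlp(text).sents]
for sentence in sentences:
    print(sentence)
```

The sentencizer is fast and needs no model, but it only looks at punctuation, so abbreviations or unpunctuated text can trip it up where the parser would not.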

In [6]: test_doc = nlp(u'Natural language processing (NLP) deals with the application of computational models to text or speech data. Application areas within NLP include automatic (machine) translation between languages; dialogue systems, which allow a human to interact with a machine using natural language; and information extraction, where the goal is to transform unstructured text into structured (database) representations that can be searched and browsed in flexible ways. NLP technologies are having a dramatic impact on the way people interact with computers, on the way people interact with each other through the use of language, and on the way people access the vast amount of linguistic data now in electronic form. From a scientific viewpoint, NLP involves fundamental questions of how to structure formal models (for example statistical models) of natural language phenomena, and of how to design algorithms that implement these models.')

In [7]: for sent in test_doc.sents:
   ...:     print(sent)
   ...:
Natural language processing (NLP) deals with the application of computational models to text or speech data.
Application areas within NLP include automatic (machine) translation between languages; dialogue systems, which allow a human to interact with a machine using natural language; and information extraction