Summary of spaCy Usage Examples

When using spaCy for natural language processing, common use cases include tokenization, named entity recognition, part-of-speech tagging, and dependency parsing. Below are some common examples with the corresponding code:

Tokenization

Split text into basic units such as words and punctuation marks.

import spacy

# Load the English model
nlp = spacy.load("en_core_web_sm")
# Text to tokenize
text = "This is a sample sentence."
doc = nlp(text)

# Print the tokens
for token in doc:
    print(token.text)

Output:

This
is
a
sample
sentence
.
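
Tokens carry more than their surface text. As a small extension of the example above, the sketch below also prints each token's lemma, punctuation flag, and stop-word flag (all standard Token attributes):

import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("This is a sample sentence.")

# Print the surface text, lemma, punctuation flag, and stop-word flag
for token in doc:
    print(token.text, token.lemma_, token.is_punct, token.is_stop)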

Named Entity Recognition

Identify named entities in the text, such as person names, place names, and organizations.

import spacy

# Load the English model
nlp = spacy.load("en_core_web_sm")
# Input text
text = "Apple is a big company, headquartered in Cupertino, California."
# Process the text
doc = nlp(text)
# Extract the named entities
for ent in doc.ents:
    print(ent.text, ent.label_)

Output:

Apple ORG
Cupertino GPE
California GPE
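
If a label such as GPE is unfamiliar, spacy.explain() returns a short human-readable description:

import spacy

# Look up descriptions for entity labels
print(spacy.explain("ORG"))  # Companies, agencies, institutions, etc.
print(spacy.explain("GPE"))  # Countries, cities, states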

Part-of-Speech Tagging

Tag each word in the text with its part of speech.

import spacy

# Load the English model
nlp = spacy.load("en_core_web_sm")

# Input text
text = "This is a sample sentence."

# Process the text
doc = nlp(text)

# Print the part-of-speech tag for each token
for token in doc:
    print(token.text, token.pos_)

Output:

This PRON
is AUX
a DET
sample NOUN
sentence NOUN
. PUNCT
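
token.pos_ is the coarse universal tag; token.tag_ holds the fine-grained tag, which spacy.explain() can decode as well. A small sketch:

import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("This is a sample sentence.")

# Print the coarse tag, the fine-grained tag, and its description
for token in doc:
    print(token.text, token.pos_, token.tag_, spacy.explain(token.tag_))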

Dependency Parsing

Analyze the dependency relations between the words in the text.

import spacy

# Load the English model
nlp = spacy.load("en_core_web_sm")

# Input text
text = "Apple is looking at buying U.K. startup for $1 billion"

# Process the text
doc = nlp(text)

# Print each token's dependency label, head, head POS, and children
for token in doc:
    print(token.text, token.dep_, token.head.text, token.head.pos_,
          [child for child in token.children])

Output:

Apple nsubj looking VERB []
is aux looking VERB []
looking ROOT looking VERB [Apple, is, at, startup]
at prep looking VERB [buying]
buying pcomp at ADP [U.K.]
U.K. dobj buying VERB []
startup dep looking VERB [for]
for prep startup NOUN [billion]
$ quantmod billion NUM []
1 compound billion NUM []
billion pobj for ADP [$, 1]
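
The parse tree is easier to read visually. spaCy's built-in displacy module can render it; the sketch below writes the tree to an HTML file (the file name dep_tree.html is arbitrary):

import spacy
from spacy import displacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Apple is looking at buying U.K. startup for $1 billion")

# Render the dependency tree as a standalone HTML page and save it
html = displacy.render(doc, style="dep", page=True)
with open("dep_tree.html", "w", encoding="utf-8") as f:
    f.write(html)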

Sentence Segmentation

Split the text into sentences.

import spacy
nlp = spacy.load("en_core_web_sm")
# Optional here: the model's parser already sets sentence boundaries,
# and the rule-based sentencizer does not override them
nlp.add_pipe("sentencizer")
doc = nlp("This is a sentence. This is another sentence.")
for sentence in doc.sents:
    print(sentence)

Output:

This is a sentence.
This is another sentence.
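
If only sentence boundaries are needed, loading a full model is unnecessary; a blank pipeline with just the rule-based sentencizer is much faster. A minimal sketch:

import spacy

# Blank English pipeline with only the sentencizer
nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")

doc = nlp("This is a sentence. This is another sentence.")
for sentence in doc.sents:
    print(sentence.text)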

Keyword Extraction

A simple approach: keep the nouns and proper nouns in the text as keyword candidates.

import spacy

nlp = spacy.load("en_core_web_sm")
text = """
    Please ignore that NLLB is not made to translate this large number of tokens at once. Again, I am more interest in the computational limits I have.

I already use torch.no_grad() and put the model in evaluation mode which I read online should safe some memory. My full code to run the inference looks like this:
    """

doc = nlp(text)
# Keep nouns and proper nouns as keyword candidates
keywords = [token.text for token in doc if token.pos_ in ['NOUN', 'PROPN']]
print(keywords)

Output:

['NLLB', 'number', 'tokens', 'interest', 'limits', 'torch.no_grad', 'model', 'evaluation', 'mode', 'memory', 'code', 'inference']
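
A common refinement is to rank candidates by frequency rather than list every occurrence, for example with collections.Counter over lowercased noun lemmas. A sketch under that assumption:

import spacy
from collections import Counter

nlp = spacy.load("en_core_web_sm")
doc = nlp("Please ignore that NLLB is not made to translate this "
          "large number of tokens at once.")

# Count lemmatized nouns and proper nouns, skipping stop words
counts = Counter(
    token.lemma_.lower()
    for token in doc
    if token.pos_ in ("NOUN", "PROPN") and not token.is_stop
)
print(counts.most_common(5))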

Sentence Similarity Comparison

Compare the semantic similarity of sentences using word vectors.

import spacy

# The large model includes word vectors, which similarity comparison requires
nlp = spacy.load("en_core_web_lg")

doc1 = nlp("the person wear red T-shirt")
doc2 = nlp("this person is walking")
doc3 = nlp("the boy wear red T-shirt")

# Compare the documents pairwise
print(doc1.similarity(doc2))
print(doc1.similarity(doc3))
print(doc2.similarity(doc3))

Output:

0.7003971105290047
0.9671912343259517
0.6121211244876517
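
Doc.similarity on vector-based models defaults to the cosine similarity of the averaged word vectors, so the first number above can be reproduced manually with numpy (already a spaCy dependency):

import numpy as np
import spacy

nlp = spacy.load("en_core_web_lg")
doc1 = nlp("the person wear red T-shirt")
doc2 = nlp("this person is walking")

# Cosine similarity between the averaged document vectors
cos = np.dot(doc1.vector, doc2.vector) / (
    np.linalg.norm(doc1.vector) * np.linalg.norm(doc2.vector)
)
print(cos)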

