Code with the error:
from spacy.lang.en import English
from tqdm import tqdm

nlp = English()
nlp.add_pipe(nlp.create_pipe('sentencizer'))

def normalize(text):
    text = text.lower().strip()
    doc = nlp(text)
    filtered_sentences = []
    for sentence in tqdm(doc.sents):  # the error is raised here
        filtered_sentences.append(sentence.text)
    return filtered_sentences
Error:
ValueError: [E030] Sentence boundaries unset. You can add the 'sentencizer' component to the pipeline with: nlp.add_pipe(nlp.create_pipe('sentencizer')) Alternatively, add the dependency parser, or set sentence boundaries by setting doc[i].is_sent_start.
Cause:
This is currently a limitation of the sentencizer, because the is_sentenced property is based on whether the Token.is_sent_start properties were changed. However, for the first token in a sentence, this will always default to True. So if the sentence only contains one token, there is no way for spaCy to tell whether the sentence boundaries have been set or not.
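Given that explanation, a workaround is to follow the error message's own suggestion and set doc[i].is_sent_start explicitly after processing, so the boundaries count as "set" even for a one-token text. The sketch below assumes this guard is enough for the normalize function above (tqdm is omitted for brevity); the try/except covers both the spaCy v2 API (create_pipe) and the v3 API (add_pipe with a string name):

```python
from spacy.lang.en import English

nlp = English()
# spaCy v3 takes the component name directly; fall back to the v2 API
try:
    nlp.add_pipe("sentencizer")
except ValueError:
    nlp.add_pipe(nlp.create_pipe("sentencizer"))

def normalize(text):
    text = text.lower().strip()
    doc = nlp(text)
    # Workaround for E030: explicitly mark the first token as a
    # sentence start, so spaCy can tell the boundaries were set
    # even when the text contains only a single token.
    if len(doc) > 0:
        doc[0].is_sent_start = True
    return [sentence.text for sentence in doc.sents]
```

With this guard, a single-token input such as "Hi" yields a one-sentence list instead of raising E030.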