NLP: The NLTK, spaCy, and pattern Libraries

专心致志写BUG

Category: NLP Notes

Article link: https://blog.csdn.net/weixin_43975374/article/details/107483763


NLTK

Word frequency counting (Frequency)
Stop word removal (stopwords)
Sentence and word tokenization (tokenize)
Stemming
Lemmatization
Part-of-speech tagging (POS Tag)
WordNet in NLTK
Usage guide: https://blog.csdn.net/asialee_bird/article/details/85936784
Fixing the "No module named 'en_core_web_sm'" error (en_core_web_sm is a spaCy model, not part of NLTK): https://blog.csdn.net/weixin_43975374/article/details/107442194
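
As a rough illustration of two of the NLTK features listed above (frequency counting and stemming), here is a minimal sketch. The sample sentence is made up for this example, and `wordpunct_tokenize` is used instead of `word_tokenize` only because it is a regex-based tokenizer that needs no downloaded data; stop word removal, lemmatization, and POS tagging would additionally require fetching the relevant resources with `nltk.download(...)`.

```python
from nltk import FreqDist
from nltk.stem import PorterStemmer
from nltk.tokenize import wordpunct_tokenize

# Sample text invented for this sketch.
text = "The cats run. The dog keeps running after the cats."

# Regex-based tokenization; keep alphabetic tokens only and lowercase them.
tokens = [t.lower() for t in wordpunct_tokenize(text) if t.isalpha()]

# Word frequency counting: FreqDist behaves like collections.Counter.
freq = FreqDist(tokens)
print(freq.most_common(2))  # [('the', 3), ('cats', 2)]

# Stemming with the Porter algorithm (rule-based, no corpus needed).
stemmer = PorterStemmer()
print(stemmer.stem("running"))  # run
print(stemmer.stem("cats"))     # cat
```

Stemming only strips suffixes by rule, so its output is not always a dictionary word; lemmatization (via WordNet) is the usual choice when real base forms are needed.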

spaCy

Sentence segmentation (sentencizer)
Tokenization
Part-of-speech tagging
Lemmatization
Stop word identification
Dependency parsing
Noun chunk extraction
Named entity recognition
Coreference resolution
Dependency parse visualization (displaCy)
Knowledge extraction
Official site: https://spacy.io/
Usage guide: https://www.jianshu.com/p/e6b3565e159d
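
The first few spaCy features listed above (sentence segmentation, tokenization, stop word identification) can be sketched without downloading a trained model. This is a minimal example assuming spaCy v3, where `nlp.add_pipe` takes a component name; the sample sentence is invented here. POS tagging, lemmatization, dependency parsing, and named entity recognition need a trained pipeline such as `en_core_web_sm`, loaded with `spacy.load("en_core_web_sm")`.

```python
import spacy

# Blank English pipeline: tokenizer and language defaults only, no trained model.
nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")  # rule-based sentence splitter (spaCy v3 API)

# Sample text invented for this sketch.
doc = nlp("Apple is looking at buying a startup. The deal is worth one billion dollars.")

# Sentence segmentation
sentences = [sent.text for sent in doc.sents]

# Tokenization
tokens = [token.text for token in doc]

# Stop word identification: keep tokens that are neither stop words nor punctuation.
content_words = [t.text for t in doc if not t.is_stop and not t.is_punct]

print(sentences)
print(tokens)
print(content_words)
```

Replacing `spacy.blank("en")` with a loaded trained pipeline makes attributes such as `token.pos_`, `token.lemma_`, `token.dep_`, and `doc.ents` available on the same `doc` object.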

pattern

Official site: https://github.com/clips/pattern

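Its biggest advantage over the two libraries above is that it can produce whichever form of a verb you ask for: a given tense, person, number, or participle.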

Detailed walkthrough: https://blog.csdn.net/weixin_43975374/article/details/107484781

from pattern.en import conjugate, PRESENT, INFINITIVE, PAST, SG, PLURAL, PROGRESSIVE

vb_word = "be"

# Present tense, singular, first / second / third person
print(conjugate(vb_word, tense=PRESENT, person=1, number=SG))
print(conjugate(vb_word, tense=PRESENT, person=2, number=SG))
print(conjugate(vb_word, tense=PRESENT, person=3, number=SG))

# Present tense, plural
print(conjugate(vb_word, tense=PRESENT, number=PLURAL))

# Present participle (present tense + progressive aspect)
print(conjugate(vb_word, tense=PRESENT, aspect=PROGRESSIVE))

# Infinitive (base form)
print(conjugate(vb_word, tense=INFINITIVE))

# Past participle (past tense + progressive aspect)
print(conjugate(vb_word, tense=PAST, aspect=PROGRESSIVE))