NLTK
- Word frequency counting (Frequency)
- Stop word removal (stopwords)
- Sentence and word tokenization (tokenize)
- Stemming
- Lemmatization
- Part-of-speech tagging (POS Tag)
- WordNet in NLTK
- Usage guide: https://blog.csdn.net/asialee_bird/article/details/85936784
- Fixing "No module named 'en_core_web_sm'" (this error concerns a spaCy model, not NLTK): https://blog.csdn.net/weixin_43975374/article/details/107442194
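As a rough illustration of a few of the NLTK features listed above (tokenization, stop word removal, frequency counting, stemming), here is a minimal sketch. It deliberately uses only components that work without downloading NLTK data: `RegexpTokenizer` stands in for `word_tokenize`, and the tiny `stop_words` set is a hand-written assumption rather than NLTK's `stopwords` corpus. The sample sentence is made up for the example.

```python
from nltk import FreqDist
from nltk.stem import PorterStemmer
from nltk.tokenize import RegexpTokenizer

text = "The cats are chasing the mice. The mice are running away."

# Regex-based tokenizer: keeps runs of word characters, needs no data download
tokenizer = RegexpTokenizer(r"\w+")
tokens = [t.lower() for t in tokenizer.tokenize(text)]

# Hand-written stop word list (assumption for illustration only)
stop_words = {"the", "are", "away"}
content_tokens = [t for t in tokens if t not in stop_words]

# Word frequency counting
freq = FreqDist(content_tokens)
print(freq.most_common(2))

# Stemming with the Porter algorithm
stemmer = PorterStemmer()
stems = [stemmer.stem(t) for t in content_tokens]
print(stems)
```

The corpus-backed tools in the list (`word_tokenize`, `sent_tokenize`, the `stopwords` corpus, the WordNet lemmatizer, the POS tagger) follow the same pattern but first need their data fetched with `nltk.download(...)`.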
spaCy
- Sentence segmentation (sentencizer)
- Tokenization
- Part-of-speech tagging
- Lemmatization
- Stop word detection
- Dependency parsing
- Noun chunk extraction
- Named entity recognition
- Coreference resolution
- Dependency visualization (displaCy)
- Knowledge extraction
- Official site: https://spacy.io/
- Usage guide: https://www.jianshu.com/p/e6b3565e159d
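A small sketch of the first few spaCy features in the list (sentence segmentation, tokenization, stop word detection), assuming the spaCy v3 API. It uses a blank English pipeline so that it runs without a trained model. The sample text is invented for the example.

```python
import spacy

# Blank English pipeline: tokenizer and lexical attributes only, no trained model
nlp = spacy.blank("en")
# Rule-based sentence splitter (string name is the spaCy v3 way to add it)
nlp.add_pipe("sentencizer")

doc = nlp("Apple is looking at a startup. It is based in London.")

# Sentence segmentation
sentences = [sent.text for sent in doc.sents]
# Tokenization
tokens = [token.text for token in doc]
# Stop word detection: keep tokens that are neither stop words nor punctuation
content_words = [t.text for t in doc if not t.is_stop and not t.is_punct]

print(sentences)
print(tokens)
print(content_words)
```

POS tagging, dependency parsing, noun chunks, and named entity recognition need a trained pipeline such as `en_core_web_sm`, which is installed separately with `python -m spacy download en_core_web_sm` and loaded with `spacy.load("en_core_web_sm")`. A missing model is the usual cause of the "No module named 'en_core_web_sm'" error.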
pattern
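Official site: https://github.com/clips/pattern
Its main advantage over the two libraries above is that it can produce the different tense forms of a verb on request.
Detailed walkthrough: https://blog.csdn.net/weixin_43975374/article/details/107484781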
from pattern.en import conjugate, PRESENT, INFINITIVE, PAST, SG, PLURAL, PROGRESSIVE
vb_word = "be"
print(conjugate(vb_word, tense=PRESENT, person=1, number=SG))  # 1st person singular present: "am"
print(conjugate(vb_word, tense=PRESENT, person=2, number=SG))  # 2nd person singular present: "are"
print(conjugate(vb_word, tense=PRESENT, person=3, number=SG))  # 3rd person singular present: "is"
print(conjugate(vb_word, tense=PRESENT, number=PLURAL))  # plural present: "are"
print(conjugate(vb_word, tense=PRESENT, aspect=PROGRESSIVE))  # present participle: "being"
print(conjugate(vb_word, tense=INFINITIVE))  # infinitive: "be"
print(conjugate(vb_word, tense=PAST, aspect=PROGRESSIVE))  # past participle: "been"