NLTK vs Sklearn vs Gensim

Latest recommended article published 2023-12-22 09:19:01

阿满子

Category: Programming Languages. Tags: nlp, nltk, sklearn, gensim

Use Cases for NLTK, Sklearn, and Gensim

Quoting answers from Quora:

Yuval Feinstein's answer:
Generally,
- NLTK is used primarily for general NLP tasks (tokenization, POS tagging, parsing, etc.)
- Sklearn is used primarily for machine learning (classification, clustering, etc.)
- Gensim is used primarily for topic modeling and document similarity.

Roland Bischof's answer:
- NLTK specializes in gathering and classifying unstructured text. If you need, e.g., a POS tagger, lemmatizer, or dependency analyzer, you'll find them there, and sometimes nowhere else. It offers quite a broad range of tools developed mainly in academic research. But most often it is not very well optimized: using NLTK libraries often means accepting a huge performance loss. If you do text gathering or preprocessing, it's fine to begin with, until you find faster alternatives.

- SKLEARN is much more an analysis tool than a gathering tool. It is very well documented, well optimized, and covers a broad range of statistical methods.

- GENSIM is a very well optimized, but also highly specialized, library for jobs on the periphery of “WORD2DOC”. That is, it offers an easy, surprisingly effective, and fast AI approach to unstructured text. If you are interested in production, you might also have a look at TensorFlow, which offers a mathematically generalized, yet highly performant, model.

Although they overlap considerably, I personally prefer using NLTK for preprocessing, GENSIM as a kind of base platform, and SKLEARN for third-step processing.
