Keyword Extraction from Text with TF-IDF in sklearn

Latest recommended article published on 2024-01-09 00:58:22

FIXLS

Categories: NLP, sklearn

Article link: https://blog.csdn.net/baidu_15113429/article/details/80805181

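This post shows how to use the TF-IDF method in the sklearn library to extract keywords from text, helping readers understand and practice text analysis.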
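
For reference, here is a sketch of the weighting that `TfidfVectorizer` applies, assuming its default settings (`smooth_idf=True`, `sublinear_tf=False`, `norm='l2'`). The symbols are introduced here for illustration: $n$ is the number of documents, $\mathrm{df}(t)$ the number of documents containing term $t$, and $\mathrm{tf}(t, d)$ the raw count of $t$ in document $d$.

```latex
\mathrm{idf}(t) = \ln\frac{1 + n}{1 + \mathrm{df}(t)} + 1,
\qquad
w(t, d) = \frac{\mathrm{tf}(t, d)\,\mathrm{idf}(t)}
               {\sqrt{\sum_{t'} \bigl(\mathrm{tf}(t', d)\,\mathrm{idf}(t')\bigr)^{2}}}
```

As a worked check on the sample corpus used in this post: the word "this" occurs twice in the first document and appears in 3 of the 4 documents, so $\mathrm{idf} = \ln(5/4) + 1 \approx 1.2231$ and the unnormalized weight is about $2.4463$. Dividing by the L2 norm of that document's row (about $3.5013$) gives roughly $0.6987$.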

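# encoding=utf-8
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy corpus; repeated words ("This", "second") raise the term frequency.
corpus = [
    'This This is the first document.',
    'This This is the second second document.',
    'And the third one.',
    'Is this the first document?',
]

# Learn the vocabulary and IDF weights, then build the sparse TF-IDF matrix.
tfidf_model = TfidfVectorizer()
tfidf_matrix = tfidf_model.fit_transform(corpus)

# get_feature_names() was removed in scikit-learn 1.2; use get_feature_names_out().
word_dict = tfidf_model.get_feature_names_out().tolist()
print(word_dict)     # vocabulary; list position = column index in the matrix
print(tfidf_matrix)  # sparse entries: (document index, word index)  weight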

Output of test_tfidfVectorizer.py:

"C:\Program Files\Anaconda3\python.exe" D:/pycharmprogram/csgwork/find_classification_keys/test_tfidfVectorizer.py
['and', 'document', 'first', 'is', 'one', 'second', 'the', 'third', 'this']
  (0, 8)	0.6986804246371375
  (0, 3)	0.34934021231856877
  (0, 6)	0.2856085141790751
  (0, 2)	0.43150466158747897
  (0, 1)	0.