python sklearn TfidfVectorizer

ShawDa

Article link: https://blog.csdn.net/sinat_36811967/article/details/79630158

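This post shows how to use TfidfVectorizer from Python's sklearn library and walks through its computation on a small example. The output can differ slightly from a hand calculation with the textbook formula, because sklearn smooths the IDF by default, but the principle of IDF (inverse document frequency) is still visible: the more documents a term appears in, the lower its IDF. In addition, the squared elements of each document's TF-IDF vector sum to 1, which reflects the L2 normalization that TfidfVectorizer applies.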

Summary generated automatically by CSDN.

Reference: http://python.jobbole.com/81311/

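# -*- coding:utf-8 -*-

from sklearn.feature_extraction.text import TfidfVectorizer, HashingVectorizer
import math
import numpy as np

# Four short documents used as the toy corpus
corpus = ['This is the first document.',
          'This is the second second document.',
          'And the third one.',
          'Is this the first document?']
# min_df=1 (the default) keeps every term that appears in at least one document
vectorizer = TfidfVectorizer(min_df=1)
vectorizer.fit_transform(corpus)
# get_feature_names() was removed in scikit-learn 1.2; get_feature_names_out() replaces it
print(vectorizer.get_feature_names_out())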
print(TfidfVectorizer().f
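
The hand check described in the summary can be sketched as follows. This is a minimal illustration, not the author's original code, and it assumes scikit-learn's default settings (`smooth_idf=True`, `norm='l2'`, `sublinear_tf=False`), under which the IDF of a term is `ln((1 + n) / (1 + df)) + 1`, where `n` is the number of documents and `df` is the number of documents containing the term. Names such as `manual_idf` and `manual_tfidf` are introduced here for illustration only.

```python
import math

import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

corpus = [
    'This is the first document.',
    'This is the second second document.',
    'And the third one.',
    'Is this the first document?',
]

vectorizer = TfidfVectorizer()
tfidf = vectorizer.fit_transform(corpus).toarray()

# Tokenize exactly the way the vectorizer does (lowercasing, default token pattern)
analyzer = vectorizer.build_analyzer()
tokenized = [analyzer(doc) for doc in corpus]

# Terms in column order, so manual arrays line up with the sklearn matrix
vocab = vectorizer.vocabulary_
terms = sorted(vocab, key=vocab.get)
n_docs = len(corpus)

# Smoothed IDF (assumed sklearn default): ln((1 + n) / (1 + df)) + 1
manual_idf = np.array([
    math.log((1 + n_docs) / (1 + sum(t in toks for toks in tokenized))) + 1
    for t in terms
])

# Raw term counts per document, weighted by IDF, then L2-normalized row by row
counts = np.array([[toks.count(t) for t in terms] for toks in tokenized],
                  dtype=float)
weighted = counts * manual_idf
manual_tfidf = weighted / np.linalg.norm(weighted, axis=1, keepdims=True)

print(dict(zip(terms, manual_idf.round(3))))
print(np.allclose(manual_idf, vectorizer.idf_))   # manual IDF vs. sklearn IDF
print(np.allclose(manual_tfidf, tfidf))           # manual TF-IDF vs. sklearn TF-IDF
print((tfidf ** 2).sum(axis=1))                   # squared elements of each row
```

A term such as "the", which occurs in every document, ends up with the smallest IDF, while a term such as "second", which occurs in only one document, gets a larger one. The last line prints the per-document sum of squares that the summary refers to.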
