Using Tfidf

qq_37373452

Category: sklearn  Tags: Tfidf

# Only TfidfVectorizer is used below; it handles both counting and TF-IDF weighting.
from sklearn.feature_extraction.text import TfidfVectorizer


corpus = [
    'This is the first document.',
    'This is the second second document.',
    'And the third one.',
    'Is this the first document?',
]


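# Fit the vectorizer on the corpus to learn the vocabulary and IDF weights.
tfidf = TfidfVectorizer()
model = tfidf.fit(corpus)  # fit() returns the vectorizer itself
print(model)

# get_feature_names() was removed in scikit-learn 1.2; use get_feature_names_out().
names = tfidf.get_feature_names_out()

# Transform one document into a 1 x n_features sparse row of TF-IDF weights.
f = model.transform(['This is the first document.'])
scores = f.data       # non-zero TF-IDF values
indices = f.indices   # vocabulary (column) index of each value

# Pair each term with its TF-IDF score.
tf_scores = [(names[i], s) for i, s in zip(indices, scores)]
print(tf_scores)

# Sort by score, lowest first.
tf_scores = sorted(tf_scores, key=lambda x: x[1])
print(tf_scores)

print(names)      # full vocabulary, in column order
print(f.indices)  # column indices of the non-zero entries
print(f)          # sparse row: (row, column) and weight pairs
print(f.data)     # the weights alone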
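
To make the printed scores easier to interpret, here is a minimal pure-Python sketch of the weighting that TfidfVectorizer applies with its documented defaults (lowercasing, tokens of two or more word characters, smooth_idf=True, norm='l2', sublinear_tf=False). The helper names tokenize, fit_idf and tfidf_row are hypothetical and are not part of scikit-learn; this is an illustration under those assumptions, not the library's implementation.

```python
import math
import re
from collections import Counter

# Assumed to mirror TfidfVectorizer's default token pattern: 2+ word characters.
TOKEN = re.compile(r"(?u)\b\w\w+\b")


def tokenize(text):
    """Lowercase the text and split it into tokens."""
    return TOKEN.findall(text.lower())


def fit_idf(corpus):
    """Return {term: idf} with the smoothed formula ln((1 + n) / (1 + df)) + 1."""
    n = len(corpus)
    df = Counter()
    for doc in corpus:
        df.update(set(tokenize(doc)))  # count each term once per document
    return {term: math.log((1 + n) / (1 + d)) + 1 for term, d in df.items()}


def tfidf_row(doc, idf):
    """Return {term: weight} for one document, L2-normalized."""
    tf = Counter(t for t in tokenize(doc) if t in idf)  # ignore unseen terms
    raw = {t: count * idf[t] for t, count in tf.items()}
    norm = math.sqrt(sum(w * w for w in raw.values()))
    return {t: w / norm for t, w in raw.items()} if norm else raw


corpus = [
    'This is the first document.',
    'This is the second second document.',
    'And the third one.',
    'Is this the first document?',
]

idf = fit_idf(corpus)
row = tfidf_row('This is the first document.', idf)

# Print terms from lowest to highest weight, as in the sorted list above.
for term, weight in sorted(row.items(), key=lambda x: x[1]):
    print(term, round(weight, 4))
```

In this corpus "the" occurs in every document, so it gets the smallest IDF and the lowest weight, while "first" occurs in only two of the four documents and ranks highest for this sentence.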